San Francisco startup makes data science a sport
SAN FRANCISCO (AP) - Strange secrets hide in numbers. For instance, an orange used car is least likely to be a lemon.
This unexpected finding came courtesy of a data jockey who goes by the Internet alias SirGuessalot, though he wasn't actually guessing at all. Instead, he and his partner, PlanetThanet, relied on the hard math skills that make them top contenders in a sport tailor-made for the 21st century: competitive number-crunching.
The used car defect prediction contest is one of dozens hosted by San Francisco online startup Kaggle, whose creators believe they can tap the global geek population’s instinct for one-upmanship to mine better answers faster from the world’s ever-rising mountain of data.
“Competitions bring together a wide variety of people into a wide variety of problems,” said Jeremy Howard, who became Kaggle’s president and chief scientist after winning multiple competitions himself. “You get people looking at stuff they’d never look at otherwise.”
While the used car contest was fun, Kaggle has its eye on weightier scientific problems. In one contest, an English major who trained himself in data science built a model for predicting the progress of HIV infections in individual patients. In another, a scientist who studies glaciers for a living won a NASA-backed Kaggle competition to measure the shapes of galaxies, a step toward mapping the universe's dark matter.
The data problems that need solving are so important that those who find the solutions should be paid like professional athletes, said Kaggle founder Anthony Goldbloom. By turning data mining into a crowdsourced contest, he hopes he's created a way to make that happen. Already one of Kaggle's contests offers a multimillion-dollar prize.
“We want to see the best data scientists earning more than Tiger Woods,” said Goldbloom, who started the company in his native Australia and recently came to San Francisco’s South of Market startup haven.
The job market for mathematicians and statisticians has heated up as the sheer volume of data generated by ever-faster, cheaper computing resources explodes.
Data storage has become so inexpensive that a 2011 McKinsey and Co. report estimated that a disk drive capable of storing all the world’s music would cost about $600. Walmart stores 10 times more data on customer transactions and other parts of its operation than is contained in the entire Library of Congress, according to the same report.
Businesses now analyze the so-called "big data" deluge to divine everything from which online ads customers will click to how much inventory they need to keep on hand. Political candidates analyze data to predict voting patterns. Dating websites try to predict ideal mates.
Kaggle competitions focus on creating and testing formulas that can be used to make predictions based on the contents of giant datasets.
The more accurate the formula, the better its chances of answering complex questions, such as which used cars are least likely to break down.
Goldbloom argues that no matter how many data scientists a company hires, relying only on in-house talent means it can't know whether it's getting the best solution.
In a Kaggle contest, competitors find out how they stack up against fellow contestants as soon as they submit their solutions. They can keep trying for the duration of the contests, which typically run three months and are highlighted on the company website.
As the first entries come in, the accuracy of competing models improves by leaps, Goldbloom said. As the contests progress, the improvement curve flattens out. Goldbloom and Howard believe that shows the competitive approach pushes data scientists toward the best solutions within human reach.
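The instant-feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kaggle's actual system. The article does not say how entries are scored, so mean absolute error stands in as the metric, and the `Leaderboard` class, its `submit` method and the team names are all hypothetical. Each submission is scored against answers contestants never see. Only a team's best score counts, reflecting the idea that entrants can keep trying for the length of a contest.

```python
from dataclasses import dataclass, field


def mean_absolute_error(predictions, answers):
    """Average absolute gap between predicted and true values; lower is better.

    Assumed metric for illustration only; real contests each pick their own.
    """
    if len(predictions) != len(answers):
        raise ValueError("a submission needs exactly one prediction per answer")
    return sum(abs(p - a) for p, a in zip(predictions, answers)) / len(answers)


@dataclass
class Leaderboard:
    # Hidden ground truth: contestants see their score, never these values.
    answers: list
    # Best (lowest) error each team has achieved so far.
    best: dict = field(default_factory=dict)

    def submit(self, team, predictions):
        """Score a submission on the spot and return the team's current rank."""
        score = mean_absolute_error(predictions, self.answers)
        # Teams may resubmit for the length of the contest; only their best counts.
        if team not in self.best or score < self.best[team]:
            self.best[team] = score
        return self.rank(team)

    def standings(self):
        """(team, error) pairs ordered from best to worst."""
        return sorted(self.best.items(), key=lambda item: item[1])

    def rank(self, team):
        """1-based position of a team in the current standings."""
        names = [name for name, _ in self.standings()]
        return names.index(team) + 1


# Made-up answers and predictions, purely for illustration.
board = Leaderboard(answers=[1.0, 0.0, 1.0, 1.0])
print(board.submit("team_a", [0.9, 0.2, 0.8, 1.0]))  # first entry on the board
print(board.submit("team_b", [1.0, 0.5, 0.5, 0.5]))  # larger error, ranks behind team_a
print(board.standings())
```

Here the first call returns 1 and the second returns 2. Team_b's average error (0.375) is higher than team_a's (roughly 0.125). Keeping only each team's best score means a bad resubmission never costs a contestant ground, which is what lets entrants experiment freely until the deadline.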