Yesterday AOL proudly announced the release of 20 million web queries from 650,000 users (screenshot), with each user “anonymized,” but identified by a unique ID. This is appalling – it means that potentially thousands of social security numbers and email addresses are now free for spammers and thieves to harvest, along with a lot of other personally identifying information. Think about what you search for – email addresses, people’s addresses, business secrets and even social security numbers come to mind. AOL quickly realized their mistake and pulled the plug, but not before the dataset had taken on a life of its own.
So, spammers and thieves are having a field day, but now that it’s out, we might as well use it for educational purposes. It’s a big, unwieldy file, but I’ll try to post some real estate search patterns by tomorrow. If you’re hoping to do your own analysis on this dataset, I wager that there will be a nice web interface for you to use within a week (Consumerist thinks so too). I’ll let you know when it pops up.
More on the ramifications of the release at TechCrunch. If you’re going to cancel your AOL account, good luck.
I saw that on Jim’s site early this morning and my jaw dropped. It is definitely shocking that they allowed this to be released. The data has the potential to be completely disruptive to thousands of AOL users. I’m sure glad I never took part in their service.
Boy am I glad I dumped my AOL stock.
Perhaps AOL was attempting to emulate Microsoft? The MSN Search Team recently gave selected university researchers access to 15 million real-user queries (which were also filtered and anonymized before the researchers got access).
The big difference is that the researchers are under strict license in using the data, and since MS is providing them with grant money, the likelihood of a leak is very slim. AOL appears to have put the data in the public domain, so Joe Consumer (or John Criminal Mind) can do what they want with it, while MS kept the researchers on a very short leash.
Yes, try out the AOL search database yourself. It is just fun to look at some of the search data:
http://data.aolsearchlogs.com/log/random.cgi
Another site where you can search this data is here:
http://www.datablunder.com/logitems/query/
Here’s a *quick* site where you can search the AOL data for yourself:
http://www.frogspy.com