A Dutch researcher has discovered that he could convert most of the data within Google Profiles into a single SQL statement and expose, among other data, the usernames and Gmail addresses of some 35,000,000 people.
The researcher, Matthijs R. Koot, explained in a blog post that a publicly known XML file points to more than 7,000 files named sitemap-NNN(N).txt, each containing 5,000 hyperlinks to Google Profiles, for some 35,000,000 links in all. Koot spent roughly a month assembling this information into a database, and claims that in that time Google did not throttle, block, or CAPTCHA him, or otherwise make his mass downloading difficult in any way.
Koot notes that Google Profiles gives users the choice of using their username in their Google Profile URL, but warns that doing so can make an individual's email address publicly accessible. The 35,000,000 profiles he assembled belong to users who chose to put their usernames in their Google Profile URLs to make them easier to find and remember.
Other information he was able to access includes, in many cases, users' professions, employers, education, and locations, as well as links to their Twitter accounts, Picasa photo albums, LinkedIn profiles, and sometimes other services.
The researcher describes this information as a spear-phishing attack waiting to happen. In a second blog post, Koot says his efforts are "directed at inciting, or poking up, debate about privacy – NOT to create DISTRUST but to achieve REALISTIC trust." He goes on to argue that this is another instance of Google, or any other tech company for that matter, equating "implied consent" with checking a box.
For more depth on this issue, see Matthijs R. Koot's two original blog posts on the subject.