Dutch researcher downloads 35 million Google Profiles

25 May 2011 BY Bas van den Beld

Aren’t they lovely, the new Google Profiles? You can put so much information in them. Information which everybody can see. And download… We’ve discussed the privacy matters around the profiles before, and I will soon be talking about the presentation I did at SMX about the profiles too. But there is a lot more to Google Profiles: a Dutch researcher was able to download 35 million Google Profiles, data included, and import them into his own database.

The researcher, Matthijs Koot of the University of Amsterdam, is writing a research paper about anonymity and privacy. For that research he decided to look at Google Profiles, and he noted that a lot of the information can be downloaded pretty easily.

Last February Koot created “a database containing ALL ~35.000.000 Google Profiles without Google throttling, blocking, CAPTCHAing or otherwise make more difficult mass-downloading attempts.” He was able to import all the data into one of his own databases. He used a sitemap from Google to download all the data.
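To give an idea of why a sitemap makes this so easy, here is a minimal sketch in Python. It is not Koot's code: the sample XML is made up for illustration, and it only assumes that the sitemap follows the standard sitemaps.org format, in which every page is listed in a `<loc>` element. The function name `list_sitemap_urls` is hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample (assumption): the real sitemap files are not
# reproduced in this article, so these two entries are placeholders.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://profiles.google.com/12345678901234567890</loc></url>
  <url><loc>https://profiles.google.com/USERNAME</loc></url>
</urlset>
"""


def list_sitemap_urls(xml_text):
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


urls = list_sitemap_urls(SAMPLE_SITEMAP)
for url in urls:
    print(url)
```

The point is simply that a sitemap is a ready-made list of addresses: once you have it, no crawling or guessing is needed to know where every public profile lives.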

The scary part is that the database contains “Twitter conversations (also stored in the OZ_initData variable) , person names, aliases/nicknames, multiple past educations (institute, study, start/end date), multiple past work experiences (employer, function, start/end date), links to Picasa photoalbums, …. — and in ~15.000.000 cases, also the username and therefore @gmail.com address.”

Google doesn’t mind

The information which has been downloaded is freely accessible to everybody. Google actually permits this itself by allowing the profiles to be indexed. Koot published the code he used on his blog and is now hoping Google won’t kick him off Blogger, the platform he blogs on.

Google Netherlands has already responded, saying that there is nothing wrong here. The data listed in the sitemaps is, after all, already publicly visible. It is not a leak; the data is there already. This, of course, is the ‘easy’ answer. Yes, it is data which is public already, but should it be downloadable that easily? Also, the data in the sitemap can, with some help, easily be connected to personal data already gathered. If you have somebody’s e-mail address, for example, you can enrich the profile you have on them with the data in the Google Profiles.

With Google Profiles being pushed to be more of the ‘landing page’ for your online identity, Google also pushed the option to give the profile a nicer URL, namely one with your username in it. A Google Profiles URL can look either like this: https://profiles.google.com/12345678901234567890 or like this: https://profiles.google.com/USERNAME. The latter of course looks nicer, but it also exposes your username and so allows the data to be connected to your e-mail address.
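How little it takes to make that connection can be shown with a short Python sketch. The function name `gmail_address_from_profile_url` is hypothetical, and the logic rests on one assumption taken from Koot's description: a profile URL that ends in a username rather than a number implies the matching @gmail.com address.

```python
from urllib.parse import urlparse


def gmail_address_from_profile_url(url):
    """Return the implied @gmail.com address, or None for a numeric-ID URL."""
    # Take the first path segment, e.g. "USERNAME" or "1234...".
    path = urlparse(url).path.strip("/")
    identifier = path.split("/")[0]
    # Assumption: an all-digit identifier is an anonymous numeric profile ID.
    if not identifier or identifier.isdigit():
        return None
    return identifier + "@gmail.com"


numeric = gmail_address_from_profile_url(
    "https://profiles.google.com/12345678901234567890"
)
custom = gmail_address_from_profile_url("https://profiles.google.com/USERNAME")
print(numeric)  # None
print(custom)   # USERNAME@gmail.com
```

The numeric URL gives nothing away, while the customized one hands over an e-mail address without a single request being sent to Google.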

Google specifically mentions in their privacy settings that this can make your name more visible in the search results:

“To make it easier for people to find your profile, you can customize your URL with your Google email username. (Note this can make your Google email address publicly discoverable.)”

Because these connections can be made, it is much easier to misuse the data. This is something spammers and phishers will gladly take advantage of. Even though Google ‘officially’ isn’t doing anything wrong, having the data out there and so easily downloadable doesn’t seem right.

Again, it is clear that Google has to watch its step and that you need to be careful about what you actually put on the web. With data elements like this all over the web, it seems inevitable that somebody will start connecting the dots, and the data…

AUTHORED BY:

Bas van den Beld is a speaker, trainer and online marketing strategist. Bas is the founder of Stateofdigital.com. -- You can hire Bas to speak, train or consult.
  • http://www.andil.co.uk Andy

    Heaven forbid someone collects the data that I’ve put online in my public profile. I don’t understand worries like this about online privacy, when the users have knowingly put that information there in the first place. If the user doesn’t want their information available, they don’t HAVE to create a public profile.

    • http://www.basvandenbeld.com Bas van den Beld

      Completely true, but do you think many people will realize that? And can you see the spamming-possibilities here?

  • David

    The phonebooks contain troves of information about people and they have been distributed for years.
    It’s *public* data, and we should stop being so alarmist about these topics.

    • http://hauntingthunder.wordpress.com/ Maurice

      @david Well, you normally only get your local phone book, not the whole thing, and most countries don’t allow reverse lookups of phone books and have ex-directory listings for a reason. Talk to any phone company employee about what might happen if they looked up the address for a phone number for a “friend”.

      Jealous exes tracking down their partners and crooks finding the addresses of people in the witness protection system have led to several murders in the UK.

      I do wonder if at some stage a stalker or ex-husband will use Google Profiles to track someone down and do them harm.

  • David

    For the record: ‘Tweets’ are also public.

    • http://www.basvandenbeld.com Bas van den Beld

      I agree David, it is public data, and we maybe shouldn’t be that afraid. But I do believe the danger lies within people not knowing about this (hence I publish about it) and the danger is that there will be those who start collecting public data and making a database out of it.

      As you may know, e-mail regulations say that e-mail addresses can only be stored in specific databases with the permission of the owner of the e-mail address. If I download all the Google addresses, I am doing something which is not allowed. The same goes for addresses on websites, for example: you cannot ‘just take them’ and put them in a database.

      The problem here is that Google is doing nothing wrong, but it is providing those who might do wrong a very useful tool.

  • http://yoyoseo.com Dana

    OUCH! This should or shouldn’t scare us? Well, your point to David above is the kicker. It’s one thing to have public information streaming in disparate locations, but to have it all in one place that is openly accessible for people to see and correlate our conversations, comings and goings… THAT is a grave concern.

    Thanks for making us aware.

  • http://remark.no Herbert

    Exactly, it “doesn’t seem right”. But that’s between my ears.
    What is right, is what is according to the TOS when I signed up.
    And frankly, I don’t know all the details in it, and so you get this “this doesn’t seem right” feeling..

    People walk away from their responsibility and throw away the ability to think for themselves.
    Face – the – consequences. Please.

    Now I sound all grumpy etc, but I really think this is a good thing.
    If only laziness wouldn’t beat the crap out of awareness.. :P
