With more and more victims of identity theft minted every day, figuring out if you’re one of the unlucky masses with a leaked email password is yeoman’s work. Now one security researcher is trying to make it easy with PwnedList.com, a Web site that collects leaked and stolen data, then tells Internet users whether their information is in it.
PwnedList is the brainchild of Alen Puzic, a security researcher who works for HP’s TippingPoint DVLabs on the Advanced Security Intelligence team. The biggest challenge, he says, is staying on top of the tsunami of leaked records – which are pouring in at a rate of 40,000 to 50,000 a week. Puzic chatted(*) with Threatpost editor Paul Roberts via Skype this week.
Paul Roberts: Tell me about PwnedList.com.
Alen Puzic: Sure. It all started out as a small security project I was doing on the side. I had some free time at home and I thought it would be fun to spider the web and see how many account leaks I could find. So I wrote a simple crawler that would go through sites like pastebin and various underground forums, as well as Twitter, and harvest as much account data as possible. It was shocking to see how much data is just lying out there. I got thousands of hits in the first couple of hours alone. That got me thinking. If this data was so easily accessible to me, then I should do something to help people whose accounts have been compromised before somebody with less benevolent intent finds the data. So in early June of this year I started to design a web site, pwnedlist.com. The rest is sort of history.
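The interview does not describe how the crawler works internally, so the following is only a minimal sketch of one step such a tool might perform: pulling email addresses out of the raw text of a paste. The function name `extract_emails`, the regular expression, and the lowercasing step are all assumptions made for illustration, and fetching pages from paste sites is deliberately left out.

```python
import re

# Assumption: a deliberately simple pattern for illustration. It will miss
# some valid addresses and is not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(paste_text):
    """Return the set of lowercased email addresses found in a block of text."""
    return {match.lower() for match in EMAIL_RE.findall(paste_text)}
```

Returning a set means an address that appears several times in one dump is counted once, which matters when the same leak is reposted across several paste sites.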
Paul Roberts: What exactly were you searching for? Stolen data takes all kinds of forms.
Alen Puzic: In particular I was searching for the Gawker data dump from late last year as well as any other leaks that might be out there. I was focusing on Facebook logins for a while because I thought those would be popular. I also started to follow a bunch of “hacktivist” groups on Twitter, and reading their feeds led me in the right direction. I’ve also had a few benevolent hackers contribute data to the site anonymously. I still get anonymous submissions every now and then. The amount of data out there is ridiculous, and it’s not just limited to account credentials. There are personal details such as phone numbers, addresses, and even worse, credit card numbers, but I don’t store those. What I had realized (was) that a lot of people were influenced by groups such (as Anonymous spin-off) Lulzsec and followed suit, and they treat data breaches as trophies, just dumping them onto the Internet for all the world to see. I figured somebody needs to help out, and provide a safe portal for ordinary people to check if their accounts have been compromised. I never realized the amount of data harvested would grow so fast. I also didn’t expect to get such a good response from the users. I get tons of email from people thanking me for the web site and others who want to help us out and improve the site.
Paul Roberts: So how much data do you have? How many records? And what kind of data do you store?
Alen Puzic: So currently we have just a little over five million records collected. (The number on the web site has not been updated yet.) The only data we store in our public-facing databases are hashes of emails and usernames from account dumps. That way, if our Web server gets compromised and somebody gets access to the database, they can’t get anything but five million SHA512 hashes. In a private database we have a lot more data, as well as copies of all data leaks we’ve collected.
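The idea of storing only SHA-512 hashes and checking a visitor's address against them can be sketched as follows. This is an illustrative sketch, not PwnedList's actual code: the names `hash_identifier` and `LeakIndex`, the in-memory set standing in for the database, and the trim-and-lowercase normalization are all assumptions.

```python
import hashlib


def hash_identifier(identifier):
    """Return the SHA-512 hex digest of an email address or username."""
    # Assumption: normalize so that "Alice@Example.com " and
    # "alice@example.com" map to the same stored hash.
    normalized = identifier.strip().lower()
    return hashlib.sha512(normalized.encode("utf-8")).hexdigest()


class LeakIndex:
    """Hypothetical stand-in for the public database: it holds hashes only."""

    def __init__(self):
        self._hashes = set()

    def add(self, identifier):
        # Only the digest is kept; the plaintext identifier is discarded.
        self._hashes.add(hash_identifier(identifier))

    def contains(self, identifier):
        # Hash the query the same way and look for an exact match.
        return hash_identifier(identifier) in self._hashes
```

A lookup works because hashing is deterministic: the site hashes whatever address the visitor types and compares digests, so it never needs the plaintext list on the Web server. Note that unsalted hashes of guessable values such as email addresses can still be tested against a list of candidate addresses by anyone who obtains the database.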
Paul Roberts: Ok. And all that data’s correlated? Are there plans to host or offer access to the other (non email) data?
Alen Puzic: Yes, we have correlations between all (the) data collected. In the future we plan to host hashes of credit cards and phone numbers…but that is something we’re still working on. Especially credit cards, as that is an extremely sensitive issue. We also have plans to alert companies of any sensitive data (documents, spreadsheets, etc) leaked from their network. We already had to do that twice this year.
Paul Roberts: So, as it turns out, my personal email address is in your database. If I gave it to you, could you tell me what other data you have?
Alen Puzic: I will be able to tell you the date we acquired the data and (very soon) the details of the data leak, such as the group that leaked it and the company the breach happened at.
Paul Roberts: How would you recommend people who read this interview use PwnedList.com?
Alen Puzic: I would recommend that folks check their emails on pwnedlist on a monthly basis. Then when we add automated alerts they can set up notifications for all of their accounts and we’ll send them an email if we ever come across an account of theirs.
Paul Roberts: Is this volunteer or for profit (or somewhere in between)?
Alen Puzic: Right now it’s most definitely volunteer and it will always remain so for the average user. However, when we decide to offer notifications for corporations we might charge for it.
Paul Roberts: How far back does the data go?
Alen Puzic: Our data only goes back as far as mid 2010. We have thought about getting older data but at some point it becomes irrelevant. Freshness of data is important.
Paul Roberts: How fast is the database growing? Are you still crawling or just counting on submissions?
Alen Puzic: We still both crawl and accept submissions. We crawl sites like pastebin, pastie.org, yourpaste.net, pastebay.com, justpaste.it. We follow the “right” people on Twitter and harvest links they post, and we get a feed from partner sites like www.cyberwarnews.info. I’d say 80% of our data comes from crawling, the other 20% from submissions. We are growing at an average rate of 40-60 thousand accounts a week, sometimes less, sometimes a lot more. For example, we just recently got a submission of almost 300,000 accounts.
Paul Roberts: Where do most of these records come from? Are they from big hacks like Heartland Payments, or from smaller hacks or inadvertent leaks – misconfigured systems, open file shares/FTP servers, etc.?
Alen Puzic: My perception is that most of our data comes from smaller hacks or inadvertent leaks. I’d say about 60% of all the data I’ve collected comes from those two. The other leaks are more targeted, like the Sony hacks.
(*) This interview was transcribed from a Skype chat. We added punctuation and filled in some missing words, but otherwise tried to leave it as is.