Millions of Social Profiles Leaked by Chinese Data-Scrapers

A cloud misconfig by SocialArks exposed 318 million records gleaned from Facebook, Instagram and LinkedIn.

More than 400GB of public and private profile data for 214 million social-media users from around the world has been exposed to the internet – including details for celebrities and social-media influencers in the U.S. and elsewhere.

The leak stems from a misconfigured Elasticsearch database owned by Chinese social-media management company SocialArks, which contained personally identifiable information (PII) from users of Facebook, Instagram, LinkedIn and other platforms, according to researchers at Safety Detectives.

The server was found to be publicly exposed without password protection or encryption during routine IP-address checks on potentially unsecured databases, researchers said. It contained more than 318 million records in total.
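Safety Detectives has not published its scanning method. As a rough, hypothetical illustration of what "publicly exposed without password protection" means in practice, the Python sketch below checks a cluster you administer. It sends one unauthenticated request to the Elasticsearch root endpoint and treats an HTTP 200 reply as a sign that authentication is not enforced, and a 401 or 403 as a sign that it is. The function names and the example address are assumptions, and a cluster deliberately configured for anonymous access could also answer 200.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map the HTTP status of an unauthenticated request to a rough verdict."""
    if status == 200:
        # The node answered without credentials: authentication is not enforced.
        return "exposed"
    if status in (401, 403):
        # The node demanded or refused credentials: security appears to be on.
        return "auth-required"
    return "inconclusive"


def check_own_cluster(base_url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET to the cluster root and classify the reply.

    base_url is hypothetical, e.g. "http://localhost:9200"; only point this
    at infrastructure you are authorized to test.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx replies; err.code is the status.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar errors.
        return "unreachable"
```

A result of "exposed" from `check_own_cluster("http://localhost:9200")` would mean that anyone able to reach that port can query the cluster's indices, which is the situation the researchers describe.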

SocialArks’ data-management platform is used for programmatic advertising and marketing. It bills itself as a “cross-border social-media management company dedicated to solving the current problems of brand building, marketing, marketing, social customer management in China’s foreign trade industry.”

The data included reams of North American users’ information. Source: Safety Detectives.

The affected server, hosted by Tencent, was divided into separate indices for the data obtained from each social-media source, which allowed researchers to examine each platform’s data in more detail.

“Our research team was able to determine that the entirety of the leaked data was ‘scraped’ from social-media platforms, which is both unethical and a violation of Facebook’s, Instagram’s and LinkedIn’s terms of service,” researchers said, in a Monday blog post.

The scraped profiles included 11,651,162 Instagram user profiles; 66,117,839 LinkedIn user profiles; 81,551,567 Facebook user profiles; and 55,300,000 Facebook profiles that were deleted within a few hours after the open server was discovered.
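As a quick arithmetic check, the four per-platform figures reported by the researchers add up to roughly the 214 million users cited at the top of this story; the separate 318 million figure counts database records rather than profiles. The dictionary and function names below are illustrative only.

```python
# Per-index profile counts as reported by Safety Detectives.
PROFILE_COUNTS = {
    "Instagram": 11_651_162,
    "LinkedIn": 66_117_839,
    "Facebook": 81_551_567,
    "Facebook (deleted after discovery)": 55_300_000,
}


def total_profiles(counts: dict[str, int]) -> int:
    """Sum the profile counts across all platform indices."""
    return sum(counts.values())


if __name__ == "__main__":
    total = total_profiles(PROFILE_COUNTS)
    print(f"{total:,} profiles")  # prints "214,620,568 profiles"
```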

The public profile data included biographies, profile pictures, follower totals, location settings, contact details such as email addresses and phone numbers, number of comments, frequently used hashtags, company names, job titles and more.

“Social media data scraped for marketing purposes will inevitably include sensitive information,” Jack Mannino, CEO at nVisium, told Threatpost. “For every privacy-conscious person using social media, there is an exponentially greater number of people publicly sharing intimate details about their private lives. To protect yourself, restrict public access to your profile and media assets, be sensible about what you post online, and be careful what permissions you grant to applications that may abuse, misuse or steal your information.”

Beyond the collated public data, however, the database also inexplicably held private data for social-media users.

“SocialArks’ database stored personal data for Instagram and LinkedIn users such as private phone numbers and email addresses for users that did not divulge such information publicly on their accounts,” researchers said. “How SocialArks could possibly have access to such data in the first place remains unknown…It remains unclear how the company managed to obtain private data from multiple secure sources…Moreover, the company’s server had insufficient security and was left completely unsecured.”

Threatpost has reached out to SocialArks for more information.

SocialArks secured the database the same day that Safety Detectives alerted the company to the issue.

SocialArks suffered a similar data breach in August, which affected 66 million LinkedIn users, 11.6 million Instagram accounts and 81.5 million Facebook accounts, about 159 million in all. The information exposed also consisted of scraped, publicly available data such as full names, country of residence, place of work, position, subscriber data and contact information, as well as direct links to profiles.

Having a central repository for such information opens the door to high-volume, automated social-engineering attacks, experts warned.

“Most data scraping is completely innocuous and carried out by web developers, business intelligence analysts, honest businesses such as travel booker sites, as well as being done for market research purposes online,” the researchers said. “However, even if such data is obtained legally – if it is stored without adequate cybersecurity, large leaks affecting millions of people can occur. When private information including phone numbers, email addresses and birth information is extracted and/or leaked, criminals are empowered to commit heinous acts including identity theft and financial fraud.”

Dirk Schrader, global vice president at New Net Technologies, said that the fact the scraping took place at all, whether of public or private information, is itself of interest.

“Public profiles have been scraped before and the giants in that space usually try to block mass scraping attempts as the intention behind is to get access to their ‘oil,’” he told Threatpost. “Why it hasn’t worked in this case would be an interesting fact to know. As a likely affected LinkedIn user, my choices are limited. Either I accept that scraping will happen, or I can reduce my profile which limits my ability to make business connections to a certain extent. How much information a user provides is their choice. Scraping itself, especially when the data collected is so badly secured, increases the likelihood to be targeted with specific attacks and unwanted emails.”