National Science Foundation Funds Purdue Data-Anonymization Project
A group of researchers from Purdue University has been awarded $1.5 million from the National Science Foundation to help fund an ongoing project that's investigating how well current techniques for anonymizing data are working and whether there's a need for better methods.
The grant will support the team's work, which brings together computer scientists and linguists to study how people can still be identified through textual clues even after explicitly identifying data has been removed. The Purdue anonymization project has been under way for some time and also includes researchers from other institutions, including Indiana University and the Kinsey Institute.
The question of how well data anonymization works has become an important one in recent years as the volume of data collected by advertisers, merchants, Web sites, health care organizations and other companies has grown sharply. That data is the lifeblood of many of these organizations, and they mine and analyze it constantly for new insights into customer behavior, buying patterns and potential marketing opportunities.
Consumers in many cases know little about how their data is collected, analyzed and sold to other companies, and privacy advocates have been putting pressure on a variety of organizations to improve their disclosures, as well as their efforts to keep user data private. By way of compromise, some organizations have taken to anonymizing certain kinds of data by removing identifiable portions, such as names, birth dates and Social Security numbers. And many data-protection laws have carved out exemptions for data breaches that involve anonymized data.
But there are questions about how well those techniques work, as well as whether the subsequent analysis of anonymized data has any validity.
"Textual data, even when explicit identifiers are removed (names, dates, locations), can contain highly identifiable information. For example, a sample of chief complaint fields from the Indiana Network for Patient Care (INPC) found several instances of 'phantom limb pain'. Amputees can be visually identifiable, but the HIPAA Safe Harbor rules do not list this as 'identifying information'. Any policy explicitly listing all types of identifying data is likely to fail. Through a joint effort with computer science and linguistics, the project is developing new methods to remove specific details from text while preserving meaning, eliminating such highly identifiable information without a priori knowledge of what would be identifying," the Purdue team's project page explains.
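The gap the project page describes can be illustrated with a minimal sketch of list-based redaction. This is not the Purdue team's method; the patterns, the function name `redact_explicit` and the sample note are all made up for illustration. The sketch strips a few explicit identifier types and shows that the free-text phrase from the quote passes through untouched.

```python
import re

# Hypothetical patterns for two explicit identifier types.
# This is an illustrative subset, not the HIPAA Safe Harbor list.
EXPLICIT_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def redact_explicit(text, known_names=()):
    """Replace listed identifier types with placeholders.

    Anything that is not on the list is left exactly as written.
    """
    for label, pattern in EXPLICIT_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    for name in known_names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text


# Invented sample record; no real patient data.
note = "John Doe, DOB 04/12/1961, SSN 123-45-6789, reports phantom limb pain."
redacted = redact_explicit(note, known_names=["John Doe"])
print(redacted)
# [NAME], DOB [DATE], SSN [SSN], reports phantom limb pain.
```

The name, date and number are gone, but the clinical phrase that could single out an amputee remains, because no rule in the list anticipated it.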
The project, led by Chris Clifton, Victor Raskin, Chyi-Kong Chang, and Luo Si at Purdue, is a long-term effort that encompasses not just computer science approaches, but also linguistic analysis.