
De-Anonymization

What Is De-Anonymization?
De-anonymization is a technique used in data mining that attempts to re-identify encrypted or obscured information. De-anonymization, also referred to as data re-identification, cross-references anonymized information with other available data in order to identify a person, group, or transaction.



Understanding De-Anonymization
Digital technology is rapidly changing how business is done across many sectors of the economy. In recent years, fintech companies have introduced many digital products to the financial industry. These products have promoted financial inclusion by giving more consumers access to financial products and services at a lower cost than traditional financial institutions offer. The growing use of technology has in turn increased the collection, storage, and use of data.
Tools such as social media platforms, digital payment platforms, and smartphones have generated vast amounts of data that companies use to improve their interactions with consumers. This data, known as big data, is a cause for concern among individuals and regulatory authorities, who are calling for more laws to protect users’ identities and privacy.
How De-Anonymization Works
In the age of big data, where sensitive information about a user’s online activities is shared instantaneously through cloud computing, data anonymization tools have been employed to protect users’ identities. Anonymization masks the personally identifiable information (PII) of users transacting in fields such as health services, social media, and e-commerce. PII includes information like date of birth, Social Security number (SSN), ZIP code, and IP address. The need to mask the digital trails left behind by online activities has led to the implementation of anonymization strategies like encryption, deletion, generalization, and perturbation. Although data scientists use these strategies to sever sensitive information from the shared data, they still preserve the original information, which leaves open the possibility of re-identification.
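The masking strategies named above can be illustrated with a minimal Python sketch. The record layout, the field names, and the `anonymize` helper are all hypothetical, and a salted one-way hash stands in for the encryption step; a real system would choose its techniques and parameters according to its own data and legal requirements.

```python
import hashlib
import random


def anonymize(record, salt, rng):
    """Return a masked copy of one record (all field names are hypothetical)."""
    pseudonym = hashlib.sha256((salt + record["ssn"]).encode()).hexdigest()[:12]
    return {
        # Stand-in for encryption: replace the SSN with a salted one-way hash.
        "user_id": pseudonym,
        # Deletion: the name and raw SSN are simply not copied over.
        # Generalization: keep only the birth year and a 3-digit ZIP prefix.
        "birth_year": record["date_of_birth"][:4],
        "zip3": record["zip_code"][:3],
        # Perturbation: add small random noise to a numeric value.
        "purchase_total": round(record["purchase_total"] + rng.uniform(-5, 5), 2),
    }


record = {
    "name": "Alice Example",
    "ssn": "000-00-0000",
    "date_of_birth": "1985-07-14",
    "zip_code": "02139",
    "purchase_total": 120.0,
}
masked = anonymize(record, salt="demo-salt", rng=random.Random(0))
```

Note that the masked record still carries a birth year and a partial ZIP code. Attributes like these are what make later re-identification possible.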
De-anonymization reverses the process of anonymization by matching shared but limited data sets with data sets that are easily accessible online. Data miners can then combine pieces of information from each available data set to reconstruct a person’s identity or a transaction. For example, a data miner could combine data sets shared by a telecommunications company, a social media site, and an e-commerce platform with publicly available census results to determine the name and frequent activities of a user.
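The cross-referencing described above can be sketched in Python as a simple join on shared attributes. The data sets, field names, and the `link_records` helper are invented for illustration; this is a simplified sketch of the general idea, not a description of any particular tool.

```python
def link_records(anonymized, public, keys):
    """Attach a name to each anonymized row that matches exactly one public row."""
    # Index the public data set by the values of the shared attributes.
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)

    matches = []
    for row in anonymized:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            matches.append({**row, "name": candidates[0]["name"]})
    return matches


# Hypothetical "anonymized" purchase data with names removed.
anonymized_purchases = [
    {"zip_code": "02139", "birth_year": 1985, "sex": "F", "item": "glucose monitor"},
    {"zip_code": "02139", "birth_year": 1990, "sex": "M", "item": "running shoes"},
]
# Hypothetical publicly available data that includes names.
public_census = [
    {"name": "Alice Example", "zip_code": "02139", "birth_year": 1985, "sex": "F"},
    {"name": "Bob Example", "zip_code": "02139", "birth_year": 1990, "sex": "M"},
    {"name": "Carl Example", "zip_code": "02139", "birth_year": 1990, "sex": "M"},
]

reidentified = link_records(
    anonymized_purchases, public_census, keys=("zip_code", "birth_year", "sex")
)
```

In this toy data, only the first purchase is re-identified, because exactly one public record shares its ZIP code, birth year, and sex. The second purchase matches two people and stays ambiguous, which shows why coarser shared attributes make linkage harder.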
How De-Anonymization Is Used
Re-identification can succeed when new information is released or when an anonymization strategy is implemented improperly. With a vast supply of data and limited time, data analysts and miners rely on shortcuts known as heuristics when making decisions. While heuristics save valuable time and resources in combing through a data set, a poorly chosen heuristic can leave gaps that others can exploit. Data miners seeking to de-anonymize a data set, for either legal or illegal purposes, can identify and use these gaps.
Personally identifiable information obtained illegally through de-anonymization can be sold in underground marketplaces, which themselves operate as anonymized platforms. Information that falls into the wrong hands can be used for coercion, extortion, and intimidation, raising privacy concerns and imposing enormous costs on businesses that fall victim.
De-anonymization can also be used legally. For example, the Silk Road website, an underground marketplace for illegal drugs, was hosted on an anonymizing network called Tor, which uses onion routing to obfuscate the IP addresses of its users. The Tor network also hosts other illegal markets trading in guns, stolen credit cards, and sensitive corporate information. Using complex de-anonymization tools, the FBI successfully cracked and shut down Silk Road as well as sites engaged in child pornography.
Successful re-identification has shown that anonymity is not guaranteed. Even if groundbreaking anonymization tools were implemented today to mask data, that data could be re-identified within a few years as new technology and new data sets become available.
Related terms:
Big Data
Big data refers to large, diverse sets of information from a variety of sources that grow at ever-increasing rates.
Data Anonymization
Data anonymization seeks to protect private or sensitive data by deleting or encrypting personally identifiable information from a database.
Data Breach
A data breach is the unauthorized access and retrieval of sensitive information by an individual, group, or software system.
Data Mining
Data mining is a process used by companies to turn raw data into useful information by using software to look for patterns in large batches of data.
Financial Technology (Fintech)
Fintech, a portmanteau of 'financial technology,' is used to describe new tech that seeks to improve and automate the delivery and use of financial services.
Heuristics
Heuristics are a problem-solving method that uses shortcuts to produce good-enough solutions within a limited time.
Personally Identifiable Information (PII)
Personally identifiable information (PII) is information that, when used alone or with other relevant data, can identify an individual.
Silk Road
The Silk Road was a digital black market platform that was popular for hosting money laundering activities and illegal drug transactions using cryptocurrencies for payment.
Stealth Address (Cryptocurrency)
Stealth addresses hide the identity of the receiver of a blockchain transaction, ensuring stronger privacy and anonymity on the Monero network.