Researchers have discovered that individual participants in single-cell gene expression datasets are vulnerable to linking attacks, where hackers can uncover private genetic and physical trait information without consent. This article explores the implications of this finding and the importance of developing robust privacy policies to protect research participants.
Related topics: Single-cell RNA sequencing, Genetic privacy, Ethical issues in biotechnology

Revealing Vulnerabilities That Had Gone Unnoticed
In a short period, single-cell gene expression analysis has transformed our understanding of how complex biological systems are regulated in health and disease. However, the growing accessibility of these data has prompted serious debate over the privacy of research participants. Historically, researchers have considered single-cell datasets to be at relatively low risk of information leakage because of the inherent “noise”, or stochastic nature, of the data.
Research published in the journal Cell suggests that the opposite is true. Researchers based at the New York Genome Center, Columbia University, and Brown University found that the identities of individuals in single-cell gene expression datasets can be susceptible to “linking attacks”. Such attacks allow hackers to infer the private genetic and phenotypic information of research participants without requiring access to the original consent forms or study metadata.
Hacking the Noise: A Disturbing Discovery
To begin, the researchers took data from a lupus study and the OneK1k cohort and applied a filter-matching tool to link individuals to genetic and phenotypic data using publicly available bulk expression quantitative trait loci (eQTLs). The most surprising finding was that even more accurate linking could be achieved with cell-type-specific eQTLs.
In addition, they showed that even when eQTL data are not available, individuals can still be linked to their corresponding genetic and phenotypic profiles. Using this approach, they trained a model that successfully predicted genetic data that individuals had not agreed to share, although only for the smaller group.
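The general idea behind eQTL-based linking can be illustrated with a toy simulation. The sketch below is not the researchers' tool or method: it assumes a deliberately simple model in which each gene's expression equals a known effect size times the genotype (coded 0, 1, or 2) plus random noise, and every function name and parameter is hypothetical. An attacker who knows the effect sizes guesses each person's genotypes from their expression values, then picks the record in a separate genotype database that agrees with the most guesses.

```python
import random

def simulate(n_people=20, n_snps=50, effect=1.0, noise_sd=0.3, seed=0):
    """Toy data: genotypes coded 0/1/2, expression = effect * genotype + noise."""
    rng = random.Random(seed)
    genotypes = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_snps)]
                 for _ in range(n_people)]
    expression = [[effect * g + rng.gauss(0.0, noise_sd) for g in person]
                  for person in genotypes]
    return genotypes, expression

def predict_genotypes(expr_row, effect=1.0):
    """Invert the assumed linear model and clamp the guess to 0, 1, or 2."""
    return [min(2, max(0, round(x / effect))) for x in expr_row]

def link(expression, genotype_db, effect=1.0):
    """For each expression profile, return the index of the database record
    whose genotypes agree with the most predicted genotypes."""
    links = []
    for expr_row in expression:
        predicted = predict_genotypes(expr_row, effect)
        scores = [sum(p == g for p, g in zip(predicted, record))
                  for record in genotype_db]
        links.append(scores.index(max(scores)))
    return links

genotypes, expression = simulate()

# Shuffle the genotype database so that record j belongs to person order[j];
# a correct link therefore cannot arise from matching positions alone.
order = list(range(len(genotypes)))
random.Random(1).shuffle(order)
genotype_db = [genotypes[i] for i in order]

links = link(expression, genotype_db)
accuracy = sum(order[j] == i for i, j in enumerate(links)) / len(links)
print(f"Fraction of expression profiles linked to the right record: {accuracy:.2f}")
```

Even with noisy expression values, agreement across many variants quickly singles out one record in this toy setting, which is why noise alone offers limited protection. Real single-cell data are far messier than this simulation, so the numbers it prints say nothing about the accuracy reported in the study.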
Data Privacy and Future Research Implications
Privacy concerns: The ability to link individuals across different datasets, irrespective of their health status or the nature of the data, poses a serious risk. As corresponding author Dr. Gamze Gürsoy explained, “The ability to take data that was generated in a different lab and even processed on a different method, and then use it to link individuals with an entirely separate anonymized dataset is somewhat surprising and brings up an actual privacy concern for single-cell data.”
This finding highlights an urgent need for explicit consent policies that clearly outline the privacy risks to donors of single-cell data. Policy makers and researchers must also take the lead in creating legislation that makes it difficult for attackers to exploit this sensitive information. Addressing these privacy concerns will allow the enormous potential of single-cell data to be used in full to advance biological research while protecting the rights of study participants.