Analysis of sensitive information leakage in functional genomics signal profiles through genomic deletions

Nat Commun. 2018 Jun 22;9(1):2453. doi: 10.1038/s41467-018-04875-5.

Abstract

Functional genomics experiments, such as RNA-seq, provide information about gene expression under different conditions, such as disease and normal, that is not specific to an individual. There is great interest in sharing these data. However, privacy concerns often preclude sharing of the raw reads. To enable safe sharing, aggregated summaries such as read-depth signal profiles and gene expression levels are used instead. Projects such as GTEx and ENCODE share these summaries because they ostensibly do not leak much identifying information. Here, we attempt to quantify the validity of this assumption by measuring the leakage of genomic deletions from signal profiles. We present information-theoretic measures of the degree to which one can genotype these deletions. We then develop practical genotyping approaches and demonstrate how to use them to identify an individual within a large cohort in the context of linking attacks. Finally, we present an anonymization method that removes much of the leakage from signal profiles.
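
To illustrate why a read-depth signal profile can reveal a deletion, the following is a minimal sketch and not the method of the paper. It assumes a profile stored as a plain list of per-base read depths and a candidate deletion with known coordinates. The function names (`dip_ratio`, `looks_deleted`, `smooth_dip`), the flank width, and the threshold are all hypothetical choices made for illustration. The idea is that a deleted region shows up as a dip in signal relative to its flanks, and that overwriting the dip with a value taken from the flanks hides it.

```python
from statistics import mean, median


def dip_ratio(signal, start, end, flank=5):
    """Mean depth inside [start, end) divided by mean depth in the flanks.

    Returns None when the flanks have zero signal, because an unexpressed
    region carries no evidence either way.
    """
    inside = signal[start:end]
    flanks = signal[max(0, start - flank):start] + signal[end:end + flank]
    if not inside or not flanks:
        raise ValueError("empty region or empty flanks")
    flank_mean = mean(flanks)
    if flank_mean == 0:
        return None
    return mean(inside) / flank_mean


def looks_deleted(signal, start, end, flank=5, threshold=0.2):
    """Guess that the region is deleted if its depth is far below the flanks.

    The threshold is an illustrative assumption, not a calibrated value.
    """
    ratio = dip_ratio(signal, start, end, flank)
    return ratio is not None and ratio < threshold


def smooth_dip(signal, start, end, flank=5):
    """Return a copy of the profile with [start, end) set to the flank median.

    This is a simple stand-in for an anonymization step: it removes the dip
    that the check above relies on.
    """
    flanks = signal[max(0, start - flank):start] + signal[end:end + flank]
    fill = median(flanks)
    return signal[:start] + [fill] * (end - start) + signal[end:]


# Toy profile: an expressed region with a sharp dip at positions 5..8.
profile = [20, 22, 19, 21, 20, 0, 1, 0, 0, 21, 20, 22, 19, 20]
smoothed = smooth_dip(profile, 5, 9)
```

On this toy profile, `looks_deleted(profile, 5, 9)` flags the dip, while the same check on `smoothed` does not. A real analysis would also have to account for noise, expression variability, and heterozygous deletions, which this sketch ignores.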

MeSH terms

  • Base Sequence
  • Data Anonymization*
  • Genetic Privacy*
  • Genomics*
  • Genotyping Techniques
  • Humans
  • Sequence Analysis, RNA
  • Sequence Deletion*