Towards Protecting Sensitive Text with Differential Privacy

Adam Roegiest, Alexander K. Hudek, and Sam Fletcher • August 2021 • IEEE TrustCom 2021

Natural language processing often requires handling privacy-sensitive text. To avoid revealing confidential information, data owners and practitioners can use differential privacy, which provides a mathematically provable definition of privacy preservation. In this work, we explore the possibility of applying differential privacy to feature hashing. Feature hashing is a common technique for handling out-of-dictionary vocabulary and for creating a lookup table that finds feature weights in constant time. Traditionally, differential privacy involves adding noise to hide the true value of data points. We show that, because the output space is finite when using feature hashing, a noiseless approach is also theoretically sound. This approach opens up the possibility of applying strong differential privacy protections to NLP models trained with feature hashing. Preliminary experiments show that even common words can be protected with (0.04, 10^-5)-differential privacy, with only a minor reduction in model utility.
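
As a rough illustration of the feature hashing idea mentioned in the abstract, the sketch below maps arbitrary tokens into a fixed number of buckets, so any word, including one never seen before, lands in a finite output space and its weight can be looked up in constant time. The names `NUM_BUCKETS`, `bucket_index`, `hash_features`, and `token_weight`, the table size, and the use of MD5 are all assumptions made for this example; they are not taken from the paper, and the sketch does not implement the paper's differential privacy mechanism.

```python
import hashlib

# Assumed table size for illustration; the paper's setting is not stated here.
NUM_BUCKETS = 2 ** 10


def bucket_index(token: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a token to a bucket in [0, num_buckets) with a stable hash."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_buckets


def hash_features(tokens, num_buckets: int = NUM_BUCKETS):
    """Build a fixed-length count vector; colliding tokens share a bucket."""
    vector = [0] * num_buckets
    for token in tokens:
        vector[bucket_index(token, num_buckets)] += 1
    return vector


# Hypothetical weight table: one weight per bucket, not per vocabulary word.
weights = [0.0] * NUM_BUCKETS


def token_weight(token: str) -> float:
    """Constant-time weight lookup, even for out-of-dictionary tokens."""
    return weights[bucket_index(token)]


features = hash_features("the parties agree to the terms".split())
```

The vector length depends only on the number of buckets and not on the vocabulary, which is the finite output space the abstract refers to.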

