Privacy-preserving data publishing is a challenging problem in data privacy. Straightforward approaches, such as removing identifiers to anonymize records, do not provide meaningful protection against inference attacks.
In this talk, I will describe a new framework to share sensitive datasets in a privacy-preserving way. I will show how to construct a mechanism to synthesize full data records using a probabilistic generative model. A key feature of this technique is that privacy is not achieved by modifying the generative model or adding noise. Instead, a privacy test is used to decide whether each synthesized record can safely be published. On the theoretical front, I will show that appropriately randomizing the privacy test yields differential privacy.
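To make the synthesize-then-test idea concrete, below is a minimal Python sketch of one possible instantiation. The toy Gaussian generative model, the distance-based privacy test, the Laplace-noised threshold, and all function and parameter names are illustrative assumptions for exposition only, not the actual construction presented in the talk.

```python
# Hypothetical sketch: sample candidate records from a generative model and
# publish only those that pass a randomized privacy test.
import numpy as np

rng = np.random.default_rng(0)


def fit_toy_generative_model(real_data):
    """Toy stand-in for a probabilistic generative model: a Gaussian fitted to the real records."""
    mean = real_data.mean(axis=0)
    cov = np.cov(real_data, rowvar=False)
    return lambda: rng.multivariate_normal(mean, cov)


def randomized_privacy_test(candidate, real_data, threshold=0.5, noise_scale=0.25):
    """Pass the candidate only if it is sufficiently far from every real record.

    The test itself is randomized by adding Laplace noise to the threshold;
    per the talk, appropriately randomizing the test is what yields differential
    privacy, but this particular noise mechanism is an assumption.
    """
    nearest = np.min(np.linalg.norm(real_data - candidate, axis=1))
    noisy_threshold = threshold + rng.laplace(scale=noise_scale)
    return nearest > noisy_threshold


def publish_synthetic_records(real_data, n_records=10, max_attempts=10_000):
    """Draw candidates from the generative model; keep only those that pass the privacy test."""
    sample = fit_toy_generative_model(real_data)
    published = []
    for _ in range(max_attempts):
        if len(published) >= n_records:
            break
        candidate = sample()
        if randomized_privacy_test(candidate, real_data):
            published.append(candidate)
    return np.array(published)


if __name__ == "__main__":
    real_data = rng.normal(size=(500, 4))  # stand-in for sensitive records
    synthetic = publish_synthetic_records(real_data)
    print(f"published {len(synthetic)} synthetic records")
```

Note that the generative model itself is left untouched and no noise is added to the records; privacy comes entirely from which candidates the randomized test allows to be released.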
On the experimental front, I will apply the framework to various types of data, including census microdata, location trajectories, and images.