CS700: Graduate Seminar in Computer Science & Informatics

Achieving Privacy in Heterogeneous Data

There is an increasing need for individuals, companies, universities, and health care providers to share information with external researchers and institutions. The privacy of individuals must be preserved when this data is released, and in many cases this is required by law. A considerable amount of research in the data privacy community has been devoted to formalizing the notion of identifiability and developing techniques for anonymization, but this work has focused mainly on structured data. HIDE (Heterogeneous Information DE-identification) is a framework for anonymizing (or de-identifying) both structured and unstructured data. This talk will introduce the information extraction and anonymization techniques used in the current implementation of HIDE, including Conditional Random Fields and k-anonymization. The talk will also introduce our current research directions, including how we plan to incorporate the recently developed Differential Privacy techniques into the HIDE framework.

James Gardner is a Ph.D. student in the Computer Science and Informatics Program at Emory University. His research interests include Natural Language Processing, Information Retrieval, Artificial Intelligence, Data Mining, and Data Privacy.