HIDE: Health Information DE-identification
Overview
Health informatics is receiving a tremendous amount of attention nationally and locally as a strategic area of technological development in the 21st century. Recent discussions have addressed the development of national and regional health information networks, as well as the bench-to-bedside translation of genomic information into practice. Recent provisions for standardizing health care transactions will make it faster and easier to share health information. However, such data sharing has been stymied by restrictions concerning the privacy, security, and quality of the data.
The objective of this project is to develop an integrated and adaptive Health Information DE-identification (HIDE) framework for publishing and sharing health data while preserving data privacy. There are two research thrusts. First, the project will explore novel techniques for de-identifying unstructured (text) data and will integrate them with techniques for de-identifying structured (relational) data. Second, it will investigate important application requirements and develop algorithms that take them into account during de-identification. The envisioned outcome is a suite of algorithms and techniques, together with a set of open-source software tools, that will allow medical information service providers and computer science researchers to manage and share privacy-constrained data more effectively and efficiently. While the project is focused on the health domain, the resulting algorithms and techniques will be widely applicable in other application domains.
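As a rough illustration of what de-identifying structured and unstructured data together can mean, the following minimal Python sketch coarsens a few fields of a record and redacts simple patterns in its free-text note. It is not the HIDE implementation: the field names (name, age, zip, note), the regular expressions, and the function names are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical patterns for illustration only; not the HIDE project's rules.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def redact_text(note: str) -> str:
    """Replace date and phone-number patterns in free text with placeholder tags."""
    note = DATE_RE.sub("[DATE]", note)
    note = PHONE_RE.sub("[PHONE]", note)
    return note


def generalize_record(record: dict) -> dict:
    """Coarsen identifying fields of a structured record (assumed schema)."""
    out = dict(record)
    out.pop("name", None)                   # drop the direct identifier
    decade = (record["age"] // 10) * 10     # age -> 10-year range
    out["age"] = f"{decade}-{decade + 9}"
    out["zip"] = record["zip"][:3] + "**"   # keep only the first 3 ZIP digits
    return out


def deidentify(record: dict) -> dict:
    """Apply the structured and unstructured steps to one record."""
    out = generalize_record(record)
    out["note"] = redact_text(record.get("note", ""))
    return out


if __name__ == "__main__":
    sample = {
        "name": "Jane Doe",
        "age": 47,
        "zip": "30322",
        "note": "Seen on 3/12/2008; call 404-555-0100.",
    }
    print(deidentify(sample))
```

Pattern matching of this kind misses names and other identifiers embedded in free text, which is one reason de-identifying unstructured data calls for more capable techniques than fixed rules.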
People
James Gardner
Kumudhavalli Rangachari
Li Xiong
Publications
J. Gardner, L. Xiong. HIDE: A Health Information DE-identification System. To appear in 21st IEEE International Symposium on Computer-Based Medical Systems (CBMS), 2008
L. Xiong, K. Rangachari. Towards Application-Oriented Data Anonymization. To appear in SIAM Workshop on Practical Privacy-Preserving Data Mining, in conjunction with SIAM International Conference on Data Mining (SDM), 2008
Collaborators
Christopher Flowers (Director of Oncology Informatics, Winship Cancer Institute)
Michael Graiser (Genetics, Winship Cancer Institute)
Acknowledgement
This research is supported in part by the Emory University Research Committee Fund, the Emory ITSC fund, and the Emory University Faculty Startup Fund.
Any opinions, findings, and conclusions or recommendations expressed in the
project material are those of the authors and do not necessarily reflect
the views of the sponsors.
Last updated: 3/12/2008