SHARE: Secure Heterogeneous information AggREgation

Overview

Advances in distributed service-oriented computing and global communications have created a strong technology push for large-scale data integration among organizations and enterprises. However, data privacy has become an increasingly important concern in information integration, because organizations and individuals do not want to reveal their private databases for various legal and commercial reasons. For example, organizations in the same market sector often compete and collaborate at the same time, in constantly evolving alliances. Many such organizations want to find out aggregate statistics about sales in the sector without disclosing the sales data in their private databases. Privacy-preserving data sharing is therefore becoming increasingly important for large-scale, mission-critical data integration applications. Ideally, given a database query or a data mining task spanning multiple private databases, we wish to compute the answer without revealing any information about an individual database beyond the query or mining result itself. One way to tackle this problem in practice is to relax the privacy constraint, allowing efficient information integration while minimizing the amount of information disclosed. There is a growing need for efficient, specialized protocols that facilitate such privacy-preserving data integration tasks.

Our research formalizes the notion of loss of privacy in terms of the information revealed, and aims to develop efficient decentralized protocols for important operations that enable information integration across multiple private databases with minimal information disclosure. We perform formal analysis and experimental evaluation of the protocols in terms of their correctness, efficiency, and privacy characteristics.
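
To make the sales example above concrete, the following is a minimal sketch of a textbook secure-sum computation based on additive secret sharing. It is not one of the protocols developed in this project; it only illustrates how several parties could learn a sector-wide total while no party sees another party's raw figure. The function names, the modulus, and the sample values are hypothetical. The sketch simulates all parties in a single process and assumes honest-but-curious participants, with non-negative values whose total is smaller than the modulus.

```python
import secrets

# Hypothetical modulus; it must exceed the largest possible total.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n_parties shares that sum to value mod modulus."""
    # The first n-1 shares are uniformly random; the last one fixes the sum.
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_values, modulus=MODULUS):
    """Simulate a secure sum: each party reveals only random-looking shares."""
    n = len(private_values)
    # distributed[i][j] is the share that party i sends to party j.
    distributed = [make_shares(v, n, modulus) for v in private_values]
    # Party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # The published partial sums add up to the true total.
    return sum(partial_sums) % modulus


if __name__ == "__main__":
    sales = [1200, 560, 980]  # made-up private sales figures, one per party
    print(secure_sum(sales))
```

In this sketch each individual share is uniformly random, so a party learns nothing about another party's value from the shares it receives unless all other parties pool what they hold. The only thing disclosed is the aggregate result, which matches the ideal described above of revealing nothing beyond the query answer.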

People

  • Pawel Jurczyk
  • Li Xiong
  • Subramanyam Chitti (Oracle)
  • Ling Liu (Georgia Tech)

Publications

  • L. Xiong, P. Jurczyk, L. Liu. Mining Distributed Private Databases using Random Response Protocols. In NSF Symposium on Next Generation of Data Mining and Cyber-Enabled Discovery for Innovation (NGDM), October, 2007.

  • L. Xiong, S. Chitti, L. Liu. Preserving Data Privacy in Outsourcing Data Aggregation Services. ACM Transactions on Internet Technology (TOIT), 7(3), August, 2007.

  • L. Xiong, S. Chitti, L. Liu. Mining Multiple Private Databases using a kNN Classifier. In ACM Annual Symposium of Applied Computing (SAC), Data Mining Track, Seoul, Korea, March, 2007.

  • L. Xiong, S. Chitti, L. Liu. K Nearest Neighbor Classification across Multiple Private Databases. In 15th ACM Conference on Information and Knowledge Management (CIKM 2006), Arlington, November, 2006 (Poster Paper).

  • L. Xiong, S. Chitti, L. Liu. Topk Queries across Multiple Private Databases. In 25th International Conference on Distributed Computing Systems (ICDCS 2005), Columbus, June, 2005.

Acknowledgement

    This research was previously supported by an NSF ITR grant and is currently supported by an Emory College faculty startup fund.

    Any opinions, findings, and conclusions or recommendations expressed in the project material are those of the authors and do not necessarily reflect the views of the sponsors.


    Last updated: 1/16/2007