Pawel Jurczyk
FRIL - Fine-Grained Record Integration and Linkage Tool
FRIL is a Java-based, fine-grained probabilistic record integration and linkage tool that incorporates a rich collection of record distance metrics, search methods, and analysis tools. Throughout its workflow, FRIL exposes a broad set of user-tunable parameters, complemented by graphical visualization tools that help users understand the effects of parameter choices.
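To make the idea of distance metrics combined with user-tunable parameters concrete, here is a minimal sketch of weighted, field-by-field record comparison. It is not FRIL's actual API: the class `RecordLinkageSketch`, its method names, the choice of edit distance as the metric, and the weights and threshold are all assumptions made for illustration.

```java
import java.util.Map;

/** Hypothetical sketch of weighted record comparison; not FRIL's real API. */
final class RecordLinkageSketch {

    /** Levenshtein edit distance between two strings (two-row dynamic programming). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1.0 means the two strings are identical. */
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty values are treated as equal
        }
        return 1.0 - (double) editDistance(a, b) / maxLen;
    }

    /** Weighted average of per-field similarities; the weights are the tunable parameters. */
    static double matchScore(Map<String, String> left, Map<String, String> right,
                             Map<String, Double> weights) {
        double total = 0.0;
        double weightSum = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            String l = left.getOrDefault(e.getKey(), "");
            String r = right.getOrDefault(e.getKey(), "");
            total += e.getValue() * similarity(l, r);
            weightSum += e.getValue();
        }
        return weightSum == 0.0 ? 0.0 : total / weightSum;
    }

    /** Two records are linked when their score reaches the (assumed) acceptance threshold. */
    static boolean isMatch(Map<String, String> left, Map<String, String> right,
                           Map<String, Double> weights, double threshold) {
        return matchScore(left, right, weights) >= threshold;
    }

    public static void main(String[] args) {
        Map<String, String> a = Map.of("name", "Jon Smith", "city", "Atlanta");
        Map<String, String> b = Map.of("name", "John Smith", "city", "Atlanta");
        Map<String, Double> weights = Map.of("name", 0.7, "city", 0.3);
        System.out.println("score = " + matchScore(a, b, weights));
        System.out.println("linked at 0.9? " + isMatch(a, b, weights, 0.9));
    }
}
```

Raising or lowering the threshold, or shifting weight between fields, changes which record pairs are linked; this is the kind of parameter effect that visual feedback is meant to make easier to judge.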
List of selected publications:
  • Pawel Jurczyk, James J. Lu, Li Xiong, Janet D. Cragan, Adolfo Correa, Fine-grained record integration and linkage tool, Birth defects research. Part A, Clinical and molecular teratology, 2008 Nov;82(11):822-9.
  • Pawel Jurczyk, James J. Lu, Li Xiong, Janet D. Cragan, Adolfo Correa, FRIL: A Tool for Comparative Record Linkage, American Medical Informatics Association (AMIA) 2008 Annual Symposium
DObjects - Distributed Data Objects Framework
This project is an attempt to build a general distributed data objects framework that facilitates the use of data from many heterogeneous data sources. At a high level, a system administrator specifies the general configuration of data objects (object names along with their attributes, data mappings, and locations). Nodes connecting to the system provide data in the form defined by that configuration.
Even though the system physically consists of many independent nodes, logically it can be regarded as one large metasystem. Users connect to any of the nodes and can query for any data that is available at any physical location.
The initial implementation of the system is being developed using the H2O metacomputing framework.
This research is still at an early stage. Future work will investigate fault tolerance, dynamic reconfiguration, background/continuous queries, and data synchronization in dynamic environments.
List of selected publications:
  • Pawel Jurczyk and Li Xiong, Dynamic Query Processing for P2P Data Services in the Cloud, DEXA '09 (to appear)
  • Pawel Jurczyk and Li Xiong, DObjects: Enabling Distributed Data Services for Metacomputing Platforms (Demo), VLDB '08
  • Pawel Jurczyk, Li Xiong and Vaidy Sunderam, DObjects: Enabling Distributed Data Services for Metacomputing Platforms, ICCS '08 (Best Paper Award)
Discovering Authorities in Question Answer Communities
Question-Answer portals such as Naver and Yahoo! Answers are quickly becoming rich sources of knowledge on many topics that are not well served by general web search engines. Unfortunately, the quality of the submitted answers is uneven, ranging from excellent, detailed answers to snappy and insulting remarks or even advertisements for commercial content. Furthermore, user feedback for many topics is sparse and can be insufficient to reliably distinguish good answers from bad ones. Hence, estimating the authority of users is a crucial task for this emerging domain, with potential applications to answer ranking, spam detection, and incentive mechanism design.
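The publications for this project explore HITS-style link analysis for author ranking. The sketch below shows how such an authority estimate could be computed; it is an illustration under stated assumptions, not the code used in that work. It assumes users are numbered from 0, that each edge points from the user who asked a question to a user who answered it, and that the class and method names (`HitsSketch`, `authorityScores`) are hypothetical.

```java
import java.util.Arrays;

/** Illustrative HITS iteration over an asker-to-answerer graph; names are hypothetical. */
final class HitsSketch {

    /**
     * Computes authority scores for users 0..userCount-1.
     * Each edge is {asker, answerer}; indices are assumed to be valid.
     * The returned vector is L2-normalized (unless the graph has no edges).
     */
    static double[] authorityScores(int userCount, int[][] edges, int iterations) {
        double[] hub = new double[userCount];
        double[] auth = new double[userCount];
        Arrays.fill(hub, 1.0);
        Arrays.fill(auth, 1.0);

        for (int it = 0; it < iterations; it++) {
            // Authority update: an answerer gains authority from the askers (hubs) they answered.
            double[] newAuth = new double[userCount];
            for (int[] e : edges) {
                newAuth[e[1]] += hub[e[0]];
            }
            normalize(newAuth);

            // Hub update: an asker is a good hub if their questions attract authoritative answerers.
            double[] newHub = new double[userCount];
            for (int[] e : edges) {
                newHub[e[0]] += newAuth[e[1]];
            }
            normalize(newHub);

            auth = newAuth;
            hub = newHub;
        }
        return auth;
    }

    /** Scales the vector to unit Euclidean length; leaves an all-zero vector unchanged. */
    static void normalize(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        double norm = Math.sqrt(sumSq);
        if (norm == 0.0) {
            return;
        }
        for (int i = 0; i < v.length; i++) {
            v[i] /= norm;
        }
    }

    public static void main(String[] args) {
        // Users 0 and 1 ask questions; user 2 answers both of them, user 3 answers only user 0.
        int[][] edges = {{0, 2}, {1, 2}, {0, 3}};
        System.out.println(Arrays.toString(authorityScores(4, edges, 20)));
    }
}
```

In the small example in `main`, user 2 answers questions from two different askers and so ends up with a higher authority score than user 3, while users who never answer receive a score of zero.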
The research resulted in the following publications:
  • Pawel Jurczyk and Eugene Agichtein, HITS on Question Answer Portals: an Exploration of Link Analysis for Author Ranking (poster), ACM SIGIR International Conference on Research and Development in Information Retrieval (SIGIR), 2007
  • Pawel Jurczyk and Eugene Agichtein, Discovering Authorities in Question Answer Communities by Using Link Analysis (poster), Conference on Information and Knowledge Management (CIKM), 2007
Peer-to-Peer computing with H2O and JXTA

H2O is a Java-based, component-oriented, lightweight resource-sharing platform for metacomputing. It allows services to be deployed into a container not only by the container's owner but also by any authorized client or third party. As its communication mechanism, H2O uses RMIX, an interoperable and extensible communication library. JXTA is a set of open protocols that allows any connected device on a network to communicate and collaborate in a P2P manner.

The main goal of this work is to build a uniform global computational network using the H2O distributed computing framework and JXTA P2P technology. This network will give users new possibilities for building distributed computing systems: H2O kernels behind firewalls will be accessible to users, and JXTA group management will make it possible to create virtual groups of kernels, enabling dynamic ad hoc collaborations.

The research resulted in the following publications:
  • Pawel Jurczyk, Maciej Golenia, Maciej Malawski, Dawid Kurzyniec, Marian Bubak, Vaidy Sunderam, A System for Distributed Computing Based on H2O and JXTA, in: Bubak, M., Turala, M., Wiatr, K. (Eds.), Proceedings of Cracow Grid Workshop - CGW'04, December 13-15 2004, ACC-Cyfronet UST, 2005, Krakow, pp. 257-268.
  • Pawel Jurczyk, Maciej Golenia, Maciej Malawski, Dawid Kurzyniec, Marian Bubak, Vaidy Sunderam, Enabling Remote Method Invocations in Peer-to-Peer Environments: RMIX over JXTA, in PPAM 2005, Poznan, Poland, 2005.

This research was also the subject of my M.S. thesis: Peer-to-Peer computing with H2O and JXTA.