CS700: Graduate Seminar in Computer Science & Informatics

MCDB: The Monte Carlo Database System

Analysts working with large data sets often use statistical models to "guess" at unknown, inaccurate, or missing information associated with the data stored in a database. For example, an analyst for a manufacturer may wish to know, "What would my profits have been if I'd increased my margins by 5% last year?" The answer depends on how much the higher prices would have affected each customer's demand, which is not recorded anywhere and must itself be estimated with some statistical model. In this talk, I'll describe MCDB, a prototype database system designed for just such scenarios. MCDB allows an analyst to attach arbitrary stochastic models to the database data in order to "guess" the values of unknown or inaccurate data, such as each customer's unseen demand function. These stochastic models are used to generate multiple possible database instances (a.k.a. "possible worlds") in Monte Carlo fashion, and the underlying database query is run over each instance. In this way, fine-grained stochastic models become first-class citizens within the database. MCDB can be used for diverse tasks such as risk assessment and large-scale, data-driven simulation.
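The "possible worlds" idea in the abstract can be illustrated with a minimal Python sketch. This is not MCDB's interface or implementation; the toy `ORDERS` table, the normally distributed price elasticity, and the function names are all assumptions made for illustration, and the 5% margin question is simplified to a 5% price increase.

```python
import random
import statistics

# Hypothetical toy table: (customer, units sold last year, unit price, unit cost).
ORDERS = [
    ("acme", 100, 10.0, 8.0),
    ("globex", 250, 10.0, 8.0),
    ("initech", 40, 10.0, 8.0),
]


def sample_world(rng, price_increase=0.05):
    """Draw one possible database instance under the higher price."""
    world = []
    for customer, units, price, cost in ORDERS:
        # Assumed stochastic model: each customer's price elasticity of
        # demand is unknown, so it is drawn from a normal distribution.
        elasticity = rng.gauss(-1.5, 0.5)
        new_units = max(0.0, units * (1 + elasticity * price_increase))
        world.append((customer, new_units, price * (1 + price_increase), cost))
    return world


def profit_query(world):
    """The ordinary query, run unchanged over a single instance."""
    return sum(units * (price - cost) for _, units, price, cost in world)


def monte_carlo(num_worlds=1000, seed=0):
    """Run the query over many sampled instances and summarize the answers."""
    rng = random.Random(seed)
    results = [profit_query(sample_world(rng)) for _ in range(num_worlds)]
    return statistics.mean(results), statistics.stdev(results)


if __name__ == "__main__":
    mean, spread = monte_carlo()
    print(f"estimated profit: {mean:.2f} (std. dev. {spread:.2f})")
```

The query result is then a distribution rather than a single number, which is what makes this style of analysis usable for risk assessment.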
Chris Jermaine received a BA from the Mathematics Department at UCSD, an MSc from the Computer Science and Engineering Department at OSU (advisor Renee Miller), and a PhD from the College of Computing at Georgia Tech (advisor Ed Omiecinski). He is the recipient of a 2008 Alfred P. Sloan Foundation Research Fellowship, a National Science Foundation CAREER award, and a 2007 ACM SIGMOD Best Paper Award. He has been at Rice since January 2009 and was previously on the faculty of the computer science department at the University of Florida. Chris is an associate editor for ACM Transactions on Database Systems, IEEE Transactions on Knowledge and Data Engineering, and the VLDB Journal.