Center for Comprehensive Informatics and Department of Biomedical Informatics Special Presentation

 

Speaker: Michael Franklin, PhD, Professor of Computer Science, University of California, Berkeley

Title: AMPLab:  Making Sense of Big Data with Algorithms, Machines and People

 

When: Wednesday, December 7, 2011, 10:00-11:00am

Where: Emory University, 208 White Hall (across the street from the Math/CS Building)

 

Abstract:

Organizations of all types are trying to figure out how to extract value from their data.  The key challenge is that the massive scale and diversity of the continuous flood of information we are faced with break existing technologies.  State-of-the-art machine learning algorithms do not scale to massive data sets.  Existing data analytics frameworks cope poorly with incomplete and dirty data and cannot process heterogeneous multi-format information.  Current large-scale processing architectures struggle with diversity of programming models and job types and do not support the rapid marshalling and unmarshalling of resources to solve specific problems.  All of these limitations lead to a Scalability Dilemma: beyond a point, our current systems tend to perform worse as they are given more data and more processing resources and as they involve more people, which is exactly the opposite of what should happen.

To address these issues, a group of researchers from machine learning, systems, databases, and networking at Berkeley started a new five-year research effort called the AMPLab, where AMP stands for "Algorithms, Machines, and People".  AMPLab envisions a world where massive data, cloud computing, communication, and people resources can be continually, flexibly, and dynamically brought to bear on a range of hard problems by huge numbers of people connected to the cloud via increasingly powerful mobile and other client devices.  The team is developing a new data analytics stack that implements this vision.  AMPLab's research is supported in part by 19 leading technology companies, including founding sponsors Google and SAP.

This talk will give an overview of the AMPLab motivation and research agenda and discuss several of the lab’s initial projects.  One such project, CrowdDB, is developing infrastructure to support hybrid cloud/crowd query answering systems, leveraging the very different skills and the performance, reliability, and cost characteristics of large groups of machines and large groups of people.  More information is at http://amplab.cs.berkeley.edu.

 

About the Speaker:

Michael Franklin is a Professor of Computer Science at UC Berkeley, focusing on new approaches for data management and data analysis.  At Berkeley he directs the Algorithms, Machines and People Laboratory (AMPLab).  He is founder and CTO of Truviso, Inc., a real-time data analytics company that enables customers to quickly make sense of diverse, high-speed, continuous streams of information.  He is a Fellow of the Association for Computing Machinery, and a recipient of the National Science Foundation CAREER award and the ACM SIGMOD "Test of Time" award.  He was recently awarded the Outstanding Advisor Award from the Computer Science Graduate Student Association at Berkeley.  He is currently serving as a committee member on the US National Academy of Sciences study on Analysis of Massive Data.  He received his Ph.D. from the University of Wisconsin in 1993.