CS700: Graduate Seminar in Computer Science & Informatics

Pattern Discovery Under Saturation Constraints

The problem of characterizing and detecting surprisingly recurrent sequence patterns, such as substrings or motifs, and related associations or rules is pursued ubiquitously in order to compress data, unveil structure, infer succinct descriptions, extract and classify features, etc. In Molecular Biology, some such patterns are variously implicated in facets of biological structure and function. Because of that, Pattern Discovery constitutes one of the most heavily explored, flourishing and arguably useful applications of Computational Molecular Biology. The very notion of a pattern still embodies subtleties and ambiguities, as do related concepts such as class and structure. Moreover, the discovery, particularly on a massive scale, of surprising patterns and correlations thereof poses interesting methodological and algorithmic problems, some of which appear hardly surmountable. This talk offers a brief account of algorithmic pattern discovery under constraints of saturation, displays some of its applications, and highlights issues, products and challenges that have emerged in recent and current work.
Alberto Apostolico is a Professor in the College of Computing at Georgia Tech. His research interests are in the analysis and design of algorithms and their applications. His recent work deals with algorithms and data structures for combinatorial pattern matching and discovery problems arising in text editing, data compression, picture processing, biomolecular sequence analysis, etc.