OSDI 06 Seattle WiP: Pattern Mining Kernel Trace Data to Detect Systemic Problems
Christopher LaRosa, Li Xiong, Ken Mandelberg

Abstract:
Systemic performance problems resulting from inter-process or process-to-operating system interactions are difficult to diagnose using traditional system administration tools. Recent kernel-level tracing tools, including DTrace and the Linux Trace Toolkit (LTT), provide detailed process and operating system tracing functionality that may help users isolate these difficult-to-identify problems on deployment systems. However, using these tools to diagnose systemic problems typically involves writing a series of ad hoc scripts to control an in-kernel tracing system; these scripts often rely on application-level implementation knowledge. General, system-wide traces have not been useful for identifying the causes of systemic problems because users have been stymied by a lack of proper trace analysis tools. Manually inspecting the noisy and voluminous data generated by unrestricted kernel-level tracing remains nearly impossible.

Our key idea is to address the analysis shortcomings of kernel tracing tools by applying data mining techniques, specifically frequent pattern mining, to general system-wide traces. These mining algorithms uncover systemic problems in trace data that escape detection by traditional system monitoring utilities. While our system has a number of innovative features, we will focus on the novel application of frequent itemset mining, which allows for special treatment of the temporal characteristics of kernel trace data. Our treatment overcomes the challenges presented by large gaps in process execution and by process reorderings introduced by the scheduler. Our framework also includes a data filtering technique that allows versatile and flexible cross-attribute pattern mining, and a suite of tools that maintains the full semantics of each trace event while allowing for efficient mining.
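As a rough illustration of the idea, the sketch below groups trace events into fixed-width time windows and counts the itemsets that recur across windows. It is not the authors' implementation: the event tuple layout, the `process:event` item encoding, the fixed window width, and the helper names `windows_to_transactions` and `frequent_itemsets` are all assumptions made for this example. Treating each window as an unordered set is one simple way to make a pattern insensitive to the order in which the scheduler happened to run the processes.

```python
from collections import Counter
from itertools import combinations

def windows_to_transactions(events, window_ms):
    """Group (timestamp_ms, process, event) records into per-window item sets."""
    transactions = {}
    for ts, proc, name in events:
        key = ts // window_ms  # integer window index
        transactions.setdefault(key, set()).add(f"{proc}:{name}")
    return [transactions[k] for k in sorted(transactions)]

def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset up to max_size; keep those seen in >= min_support windows."""
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}

# Hypothetical trace records: (milliseconds, process, event).
trace = [
    (10, "gtik", "write"), (20, "X", "read"),
    (110, "X", "read"), (120, "gtik", "write"),  # same pair, opposite order
    (210, "gtik", "write"), (230, "X", "read"),
    (350, "cron", "fork"),
]

transactions = windows_to_transactions(trace, window_ms=100)
patterns = frequent_itemsets(transactions, min_support=3)

for combo, support in sorted(patterns.items()):
    print(combo, support)
```

With this made-up trace, the pair of `gtik` and `X` events is reported as frequent even though the two events appear in a different order in the second window, while the one-off `cron` event is dropped.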

We will show one preliminary study with the problematic gtik GNOME applet (version 2.0), using both LTT and DTrace data, where we successfully detect frequent interactions between gtik and the X server that point directly to the problematic gtik application. Our second preliminary study shows that the approach successfully detects a poorly implemented server application that spawns many short-lived processes. These systemic problems had previously been identified only through repeated, ad hoc use of DTrace in conjunction with traditional system administration tools. Our approach improves on both efficiency and effectiveness in that it requires no custom script writing or implementation knowledge. We will briefly discuss our ongoing work, including tightening our system's coupling with the operating system and developing a library of standard frequent patterns that can be used to eliminate output noise in the form of frequent but uninteresting patterns.


Presentation Slides, Monday Nov. 6, 2006


MSKD: A System for Mining Kernel Trace Data
Christopher LaRosa

The introduction of low-impact, kernel-level tracing tools allows for comprehensive and transparent reporting of process and operating system activity. An operating system trace log provides detailed, time-series information about process [...] proper trace analysis tools that can be used to extract useful knowledge from raw trace logs.

We develop a Mining System for Kernel Trace Data (MSKD) that fuses kernel tracing tools with data mining tools. Our system transforms the output of tracing tools into compact, mine-ready data. Our system sends this data into a maximal frequent itemset mining application, harvests the output of the mining application, and reports frequent execution patterns in a human-readable format for use in debugging or optimization tasks.
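The pipeline described above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the integer encoding, the brute-force search (a stand-in for a real maximal frequent itemset miner such as MAFIA, whose interface is not shown here), and the helper names `encode`, `maximal_frequent_itemsets`, and `report` are hypothetical and are not taken from MSKD.

```python
from itertools import combinations

def encode(transactions):
    """Map each distinct event string to a small integer ID (compact, mine-ready form)."""
    ids = {}
    encoded = []
    for items in transactions:
        encoded.append(frozenset(ids.setdefault(item, len(ids)) for item in sorted(items)))
    names = {i: item for item, i in ids.items()}
    return encoded, names

def maximal_frequent_itemsets(encoded, min_support):
    """Brute-force stand-in for a maximal frequent itemset miner (small inputs only)."""
    universe = sorted(set().union(*encoded))
    frequent = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            candidate = frozenset(combo)
            support = sum(1 for t in encoded if candidate <= t)
            if support >= min_support:
                frequent.append((candidate, support))
    # An itemset is maximal if no frequent proper superset exists.
    return [(s, n) for s, n in frequent
            if not any(s < other for other, _ in frequent)]

def report(maximal, names):
    """Decode integer IDs back into readable event names."""
    return [f"{', '.join(sorted(names[i] for i in s))} (support={n})"
            for s, n in maximal]

# Hypothetical per-window event sets for a server that keeps spawning short-lived children.
transactions = [
    {"fork", "exec", "exit"},
    {"fork", "exec", "exit"},
    {"fork", "exec"},
    {"read"},
]

encoded, names = encode(transactions)
lines = report(maximal_frequent_itemsets(encoded, min_support=2), names)
print("\n".join(lines))
```

Reporting only maximal itemsets keeps the output short: every subset of a reported pattern is also frequent, so listing the subsets separately would add noise without adding information.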

We implement prototypes of our system using DTrace, the Linux Trace Toolkit (LTT), and MAFIA. In this paper we detail our practical experience using these frameworks.

Presentation Slides, Aug. 14, 2006