Suffix tries

• Indexing a text document for search operations

• Problem description:

• Given a text document:

 ``` xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.... ```

(Each x represents a character)

• Index the document so we can search for any pattern in the document.

• Prelude to a solution:

• A pattern can be found starting at any of these places:

```
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx....
||||||| ...
||||||+--------->
|||||+---------->
||||+----------->
|||+------------>
||+------------->
|+-------------->
+--------------->
```

• Specialized trie: the suffix trie

• Suffix trie:

 A suffix trie is a trie that stores every suffix of an input text.

• Example:

• Input text: minimize

• All possible suffixes:

 minimize, inimize, nimize, imize, mize, ize, ze, e

• These are exactly the strings that start at each position of the input minimize, so every pattern that occurs in the input is a prefix of one of them:

```
minimize
|||||||+ ===> e
||||||+- ===> ze
|||||+-> ===> ize
||||+--> ===> mize
|||+---> ===> imize
||+----> ===> nimize
|+-----> ===> inimize
+------> ===> minimize
```
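The enumeration above can be sketched in a few lines of Python. This is only an illustration: the function name `suffixes` and the choice of 0-based start positions are assumptions, not part of the original notes.

```python
def suffixes(text):
    """Return (start_position, suffix) pairs for every suffix of text."""
    # Position i (0-based, an assumption of this sketch) yields text[i:].
    return [(i, text[i:]) for i in range(len(text))]

for start, suffix in suffixes("minimize"):
    print(start, suffix)
```

A text of length n has exactly n non-empty suffixes, one per starting position.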

• Storing all suffixes of minimize in an (ordinary) trie:

• Sort the suffixes (this places suffixes that share a prefix next to each other, which helps when building the trie):

```
e                  e
ze                 imize
ize                inimize
mize        ===>   ize
imize              minimize
nimize             mize
inimize            nimize
minimize           ze
```

• Trie:
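A minimal sketch of building this trie in Python, assuming each node is a plain nested dictionary keyed by character. The function name `build_suffix_trie` is hypothetical and not from the original notes.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie made of nested dicts."""
    root = {}
    for start in range(len(text)):
        node = root
        # Walk (and create, where missing) one node per character of the suffix.
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root

trie = build_suffix_trie("minimize")
print(sorted(trie))       # ['e', 'i', 'm', 'n', 'z']
print(sorted(trie["i"]))  # ['m', 'n', 'z']
```

In minimize the last character e occurs only once, so no suffix is a prefix of another and every suffix ends at a leaf. For a general text, a unique end marker is usually appended first so that this property holds.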

• The compressed suffix trie:

 This is also the suffix tree of the text minimize.
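One way to picture the compression is to merge every chain of single-child nodes into one edge labelled with a string. The sketch below assumes nested-dictionary nodes and hypothetical helper names (`build_suffix_trie`, `compress`). It also relies on every suffix ending at a leaf, which holds for minimize.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie made of nested dicts."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root

def compress(node):
    """Merge each chain of single-child nodes into one string-labelled edge."""
    compressed = {}
    for ch, child in node.items():
        label = ch
        # Follow the chain while the child has exactly one outgoing edge.
        while len(child) == 1:
            next_ch, next_child = next(iter(child.items()))
            label += next_ch
            child = next_child
        compressed[label] = compress(child)
    return compressed

tree = compress(build_suffix_trie("minimize"))
print(sorted(tree))        # ['e', 'i', 'mi', 'nimize', 'ze']
print(sorted(tree["mi"]))  # ['nimize', 'ze']
```

Practical suffix tree implementations typically store each edge label as a pair of indices into the text rather than as a copied string. This sketch copies the strings only to keep the idea visible.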

• Usage of the suffix trie

• The suffix trie is used as a preprocessed index for fast keyword retrieval in an electronic document

• Example: indexing all patterns found in the document containing the text "minimize"

• Note:

 The leaf nodes store positions where the pattern starts in the document

• How to search for a pattern using the suffix tree:

• Example:

 Search for the pattern "im" in the document

• Traverse the suffix trie using the search pattern "im":
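The traversal can be sketched in Python as follows. This is a rough illustration on the uncompressed trie, under several assumptions: nodes are nested dictionaries, each leaf stores its suffix's 0-based start position under a hypothetical marker key `END`, and the text does not contain that marker character. The names `build_suffix_trie`, `collect`, and `search` are likewise made up for this sketch.

```python
END = "$"  # hypothetical marker key; assumes "$" never occurs in the text

def build_suffix_trie(text):
    """Build a suffix trie whose leaves record where each suffix starts."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        node[END] = start  # the leaf stores the start position
    return root

def collect(node):
    """Gather all start positions stored at or below node."""
    positions = []
    for key, child in node.items():
        if key == END:
            positions.append(child)
        else:
            positions.extend(collect(child))
    return positions

def search(root, pattern):
    """Return the sorted start positions of pattern, or [] if it is absent."""
    node = root
    for ch in pattern:
        if ch not in node:
            return []  # fell off the trie: the pattern does not occur
        node = node[ch]
    return sorted(collect(node))

trie = build_suffix_trie("minimize")
print(search(trie, "im"))  # [3]
print(search(trie, "mi"))  # [0, 4]
```

Following the pattern takes one step per pattern character regardless of the document's length. The positions are then read from the leaves below the node where the pattern ends.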