Compressed tries (Patricia tries)

• Compressed tries

• A compressed trie is a trie with one additional rule:

 Each internal node has   ≥ 2   children

Such a compacted trie is also known as a Patricia trie.

• In order to enforce the above rule, the labels are generalized:

 Each node is labeled with a string of one or more characters   (in a standard trie, the label is a single character)

• Converting a standard trie to a compressed trie

• Redundant node:

• An internal node v is redundant if

 Node v is not the root node,   and   node v has exactly 1 child node

Example:
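A minimal Python sketch of this test, assuming a trie stored as nested dicts (each node maps a child's label to the child node, and a leaf is an empty dict); the function name redundant_nodes is hypothetical. It reports each redundant node by the string of labels on its root-to-node path:

```python
def redundant_nodes(node, path="", is_root=True):
    """Return the root-to-node paths of all redundant nodes in the (sub)trie `node`.

    Assumed representation: a node is a dict {child_label: child_node}, a leaf is {}.
    A node is redundant if it is not the root and has exactly 1 child.
    """
    found = []
    if not is_root and len(node) == 1:
        found.append(path)
    for label, child in node.items():
        found.extend(redundant_nodes(child, path + label, is_root=False))
    return found


# Standard trie for the keys "bear" and "bell" (one character per edge).
trie = {"b": {"e": {"a": {"r": {}}, "l": {"l": {}}}}}
print(redundant_nodes(trie))  # ['b', 'bea', 'bel']
```

The node "be" is not redundant because it has 2 children, and the leaves "bear" and "bell" have none.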

• Redundant chain of edges:

• A chain of edges:

(v0,v1), (v1,v2), ..., (vk−1,vk)

is redundant if:

 Nodes   v1, v2, ..., vk−1   are redundant, and
 nodes   v0   and   vk   are not redundant

Example:
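Under the same nested-dict assumption (child label -> child node, leaf = {}), following a redundant chain from its first edge (v0, v1) can be sketched as below; redundant_chain is a hypothetical name. Because the walk starts at v1, the node v0 (here the root) is never treated as redundant:

```python
def redundant_chain(label, child):
    """Follow the chain that starts with edge (v0, v1), where `child` is v1
    and `label` is the label of v1.

    Assumed representation: a node is a dict {child_label: child_node}, a leaf is {}.
    Walk down while the current node is redundant (exactly 1 child) and return
    the labels of v1, ..., vk together with the first non-redundant node vk.
    """
    labels = [label]
    while len(child) == 1:
        (next_label, next_child), = child.items()
        labels.append(next_label)
        child = next_child
    return labels, child


# Standard trie for "stock" and "stop": the chain root -> s -> st -> sto.
trie = {"s": {"t": {"o": {"c": {"k": {}}, "p": {}}}}}
labels, vk = redundant_chain("s", trie["s"])
print(labels)      # ['s', 't', 'o']
print(sorted(vk))  # ['c', 'p']
```

Here v1 = "s" and v2 = "st" are redundant, while vk = "sto" has 2 children and ends the chain.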

• Compression algorithm:

• Replace:

 a redundant chain of edges   (v0,v1), (v1,v2), ..., (vk−1,vk)

by one edge:

 (v0,vk)

• Replace:

 the label of vk

by the concatenated label:

 label(v1) label(v2) ... label(vk)

Example:

• Before compression:

• After compression:
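The compression algorithm can be sketched in Python as follows. Assumptions: tries are nested dicts (child label -> child node, leaf = {}), no key is a prefix of another key, and the names build_trie and compress are hypothetical. compress replaces each redundant chain by one edge whose label is the concatenation of the labels along the chain:

```python
def build_trie(keys):
    """Standard trie: one character per edge, a leaf is an empty dict."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
    return root


def compress(node):
    """Return a compressed copy of a standard trie stored as nested dicts.

    Every redundant chain (v0, v1), ..., (vk-1, vk) below `node` becomes a
    single edge (v0, vk) labeled label(v1) label(v2) ... label(vk).
    """
    result = {}
    for label, child in node.items():
        # Absorb redundant nodes (exactly 1 child) into the edge label.
        while len(child) == 1:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        result[label] = compress(child)
    return result


keys = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
before = build_trie(keys)
after = compress(before)
print(after)
# {'b': {'e': {'ar': {}, 'll': {}}, 'id': {}, 'u': {'ll': {}, 'y': {}}}, 's': {'ell': {}, 'to': {'ck': {}, 'p': {}}}}
```

Only the children of the node passed in are ever absorbed into an edge, so the root is never merged away, which matches the definition of a redundant node.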

• What is different in the implementation of a compressed trie?

• Recall how the standard trie is implemented:

• The compressed trie uses strings (instead of single characters) as keys:
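A minimal sketch of the difference, assuming both tries are stored as nested dicts (child label -> child node, leaf = {}) and that no key is a prefix of another key, so every stored key ends at a leaf. The names st_contains and ct_contains are hypothetical. In the standard trie each step consumes one character; in the compressed trie each step must match a whole string label:

```python
def st_contains(root, key):
    """Standard trie: every edge label is one character, one dict lookup per step."""
    node = root
    for ch in key:
        if ch not in node:
            return False
        node = node[ch]
    return not node  # assumption: a stored key ends at a leaf (empty dict)


def ct_contains(root, key):
    """Compressed trie: every edge label is a string that must match as a whole.

    Sibling labels start with different characters, so at most one child matches.
    """
    node, rest = root, key
    while rest:
        for label, child in node.items():
            if rest.startswith(label):
                node, rest = child, rest[len(label):]
                break
        else:
            return False  # no child label is a prefix of the remaining key
    return not node


standard = {"b": {"e": {"a": {"r": {}}, "l": {"l": {}}}, "i": {"d": {}}}}
compressed = {"b": {"e": {"ar": {}, "ll": {}}, "id": {}}}
print(ct_contains(compressed, "bell"), ct_contains(compressed, "be"))  # True False
```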

• Find a node that has only one child node:

Result:
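The step "find a node that has only one child and merge it into its parent" can be sketched one merge at a time. This assumes the same nested-dict representation (child label -> child node, leaf = {}), and merge_once is a hypothetical helper name. Repeating it until it returns False yields the compressed trie:

```python
def merge_once(node):
    """Find one redundant node below `node`, merge it with its only child and
    return True; return False if no redundant node is left.

    Assumed representation: a node is a dict {child_label: child_node}, a leaf is {}.
    The node passed in (the root) is never merged itself.
    """
    for label, child in list(node.items()):
        if len(child) == 1:
            # `child` is redundant: splice it out and concatenate the two labels.
            (grand_label, grandchild), = child.items()
            del node[label]
            node[label + grand_label] = grandchild
            return True
        if merge_once(child):
            return True
    return False


# Standard trie for "bear" and "bell".
trie = {"b": {"e": {"a": {"r": {}}, "l": {"l": {}}}}}
steps = 0
while merge_once(trie):
    steps += 1
print(steps, trie)  # 3 {'be': {'ar': {}, 'll': {}}}
```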

• Another example:

• Find a node that has only one child node:

Result:

• Caveat in a compressed trie: the labels used in the nodes have variable length

• Labels used in Compressed trie:

• variable length strings

Example:

Fact:

 Variable length data items are a pain to store...

• Solution:

• Convert the variable length string representation to a fixed length string representation.

• The fixed length string representation method:

• Store the keys in a separate array of strings

• Store the following triplet in a node to represent a substring of one of the keys:

 ( keyword_index,   start_char_pos,   end_char_pos )
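A small sketch of how such a triplet stands for a substring. The field names follow the triplet above; the class name Label, the helper label_text, and the choice that end_char_pos is inclusive are assumptions made for illustration:

```python
from typing import NamedTuple


class Label(NamedTuple):
    """Fixed-size label: three integers instead of a variable-length string."""
    keyword_index: int
    start_char_pos: int
    end_char_pos: int  # assumed inclusive: the label is keys[i][start..end]


def label_text(keys, label):
    """Recover the substring that a triplet label stands for."""
    return keys[label.keyword_index][label.start_char_pos:label.end_char_pos + 1]


keys = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
print(label_text(keys, Label(0, 2, 3)))  # ar
print(label_text(keys, Label(5, 1, 3)))  # ell
print(label_text(keys, Label(6, 1, 2)))  # to
```

Every node now stores three integers, whatever the length of the substring it represents.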

• Example:

• Compressed trie with Variable length string labels:

• Compressed trie with Fixed length string labels:

Here is the complete picture (with full nodes):
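A sketch of converting string labels to fixed-length triplet labels, assuming the compressed trie is stored as nested dicts (string label -> child node, leaf = {}) and that end positions are inclusive. The function name to_triplets is hypothetical, and picking the first key that matches the path is just one possible choice:

```python
def to_triplets(node, keys, depth=0, prefix=""):
    """Replace every string label by (keyword_index, start_char_pos, end_char_pos).

    `depth` is the number of characters on the path from the root to `node`,
    `prefix` is that path's text. End positions are inclusive (assumption).
    """
    result = {}
    for label, child in node.items():
        path = prefix + label
        # Any key starting with the full path holds this label at positions
        # depth .. depth + len(label) - 1. Assumes the trie was built from `keys`.
        index = next(i for i, key in enumerate(keys) if key.startswith(path))
        triplet = (index, depth, depth + len(label) - 1)
        result[triplet] = to_triplets(child, keys, depth + len(label), path)
    return result


keys = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
compressed = {"b": {"e": {"ar": {}, "ll": {}}, "id": {}, "u": {"ll": {}, "y": {}}},
              "s": {"ell": {}, "to": {"ck": {}, "p": {}}}}
fixed = to_triplets(compressed, keys)
print(fixed[(5, 0, 0)])  # {(5, 1, 3): {}, (6, 1, 2): {(6, 3, 4): {}, (7, 3, 3): {}}}
```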

• Properties of the compressed trie

• Properties:

• Each internal node has   ≥ 2   children   and   ≤ |Σ|   children

• A compressed trie T storing s strings (keys) has:   s   external nodes   (assuming no key is a prefix of another key)

• A compressed trie T storing s strings (keys) has:   O(s)   total number of nodes
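These properties can be checked on an example with a short sketch, assuming the nested-dict representation (string label -> child node, leaf = {}) and a 26-letter alphabet; branching_ok and count_nodes are hypothetical names. The check also tests that sibling labels start with different characters, which is why a node has at most |Σ| children:

```python
def branching_ok(node, alphabet_size):
    """True if every internal node has >= 2 and <= |Σ| children and
    sibling labels start with different characters."""
    if not node:  # external node: nothing to check
        return True
    first_chars = {label[0] for label in node}
    if len(first_chars) != len(node) or not 2 <= len(node) <= alphabet_size:
        return False
    return all(branching_ok(child, alphabet_size) for child in node.values())


def count_nodes(node):
    """Return (# internal nodes, # external nodes)."""
    if not node:
        return 0, 1  # a leaf is an external node
    internal, external = 1, 0
    for child in node.values():
        i, e = count_nodes(child)
        internal += i
        external += e
    return internal, external


# s = 8 keys: bear, bell, bid, bull, buy, sell, stock, stop
compressed = {"b": {"e": {"ar": {}, "ll": {}}, "id": {}, "u": {"ll": {}, "y": {}}},
              "s": {"ell": {}, "to": {"ck": {}, "p": {}}}}
print(branching_ok(compressed, 26))  # True
print(count_nodes(compressed))       # (6, 8)
```

The 8 keys give 8 external nodes and 6 + 8 = 14 nodes in total.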

• Reason: the number of internal nodes is largest when every internal node has exactly 2 children, i.e., when the compressed trie is shaped like a full binary tree

• In a full binary tree (every internal node has exactly 2 children):

 # external nodes = # internal nodes + 1

• Hence:

 # internal nodes in a compressed trie   ≤   s − 1   <   s

 Total # nodes   ≤   2 × s − 1   <   2 × s
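The same bound can also be derived by counting edges in two ways. Here I denotes the number of internal nodes, s the number of external nodes, and the rule "each internal node has ≥ 2 children" is assumed to hold for the root as well:

```latex
\begin{align*}
\#\text{edges} &= (\text{total \# nodes}) - 1 = I + s - 1
  && \text{(every node except the root has one parent edge)}\\
\#\text{edges} &\ge 2I
  && \text{(every internal node has at least 2 children)}\\
\Rightarrow\quad I + s - 1 &\ge 2I \;\Longrightarrow\; I \le s - 1 < s\\
\text{total \# nodes} &= I + s \le 2s - 1 < 2s
\end{align*}
```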