### An amortized performance analysis of the Splay Tree

• Performance analysis in general

• Before you can study the performance of an operation you need to know exactly what that operation is doing....

• Work done in lookup, insert and delete

• Lookup (get(k)):

• Traverse tree to find k:

• Splay node k:

• Insert (put(k,v)):

• Find location in tree to insert k:

• Splay node k:

• Delete (remove(k)):

• Traverse tree to find k to delete:

• Splay the parent node of node k:

• Now we need to count operations:

 A zig/zag operation (a tree rotation) is more "costly" than traversing one link.

 Therefore, we can ignore the cost of traversing the tree to find the key k and count only the tree rotations.

• Question:

• What is the cost of these operations:

 Zig-zig, Zig-zag, and Zig

• Classic tree-rotation operations

• Right-rotation:

• Left-rotation:

• A Zig-zig operation = 2 tree rotation operations

• Zig-zig (1):

• The zig-zig (1) operation:

• Zig-zig(1) = right-right

• The zig-zig (2) operation:

 Zig-zig(2) = left-left (It's similar to zig-zig(1))

• A Zig-zag operation = 2 tree rotation operations

• Zig-zag (1):

• The zig-zag (1) operation:

• Zig-zag(1) = right-left

• The zig-zag (2) operation:

 Zig-zag(2) = left-right (it is similar to zig-zag(1))

• A Zig operation = 1 tree rotation operation

• Zig (1):

• The zig (1) operation:

• Zig (1) = left

• The zig (2) operation:

 Zig (2) = right (It's similar to zig (1))

• Number of Tree-rotation operations to splay a node

• Recall:

 The zig-zig operation moves a node x up 2 levels.

 The zig-zag operation also moves a node x up 2 levels.

 The zig operation moves a node x up 1 level.

• The cost to splay a node x at depth d (i.e., to make node x the new root node):

• To splay a node x at depth d (d = even), we need:

 d/2 zig-zig and/or zig-zag operations = 2 × d/2 = d tree-rotation operations

• To splay a node x at depth d (d = odd), we need:

 (d−1)/2 zig-zig and/or zig-zag operations + 1 zig operation = 2 × (d−1)/2 + 1 = d tree-rotation operations

• Conclusion:

 Cost to splay node x at depth d = d tree rotation operations
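The counting argument above can be sketched in a few lines of Python. The function name `splay_step_counts` is made up for this illustration; it only does the arithmetic from the two cases (d even, d odd) and does not touch a real tree.

```python
def splay_step_counts(d):
    """Return (double_steps, zig_steps, rotations) needed to splay a node at depth d."""
    if d < 0:
        raise ValueError("depth must be non-negative")
    double_steps = d // 2   # zig-zig or zig-zag: each moves x up 2 levels (2 rotations)
    zig_steps = d % 2       # one final zig when d is odd (1 rotation)
    rotations = 2 * double_steps + zig_steps
    return double_steps, zig_steps, rotations


for d in range(6):
    print(d, splay_step_counts(d))
```

In both cases the third component equals d, which is the conclusion stated above.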

• Definitions and Notations

• Notations:

• T = the splay tree with n nodes

• v = a node in T

 n(v) = number of nodes in the subtree rooted at v

 r(v) = lg(n(v))      (the rank of node v; lg = log base 2)

• root = the root node of the splay tree T

 n(root) = 2×n + 1      (the n nodes that store keys plus the n + 1 empty leaf nodes)

 r(root) = lg(2×n + 1)   ~=   lg(2×n)   =   lg(n) + 1

Example:

• Let e = a leaf node in T

 n(e) = 1

 r(e) = lg(1) = 0
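A minimal sketch of these definitions in Python. It assumes the convention that seems to be implied by n(root) = 2×n + 1: keys live in internal nodes and every missing child is an empty leaf that counts as one node. The `Node` class and the three-key tree are hypothetical, chosen only for illustration.

```python
import math


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def n(v):
    """Number of nodes in the subtree rooted at v; None is an empty leaf (1 node)."""
    return 1 if v is None else 1 + n(v.left) + n(v.right)


def r(v):
    """Rank of v: lg(n(v))."""
    return math.log2(n(v))


root = Node(2, Node(1), Node(3))   # 3 keys, so n(root) should be 2*3 + 1
print(n(root), r(root), r(None))
```

For larger n the value lg(2×n + 1) is close to lg(n) + 1, which is the approximation used above.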

• An amortization scheme to analyze the average performance of the Splay operation

• Prelude:

 Please don't ask how they came up with this amortization scheme. It's black magic :-)

 All I will do is show you that it works...

• Recall that the cost to splay a node with depth = d:

 c(splay node v at depth d) = d (tree rotation operations)

• Recall also that in an amortization analysis, we spread the cost by:

 Saving some additional (extra) tree-rotation operations with each splay tree operation (insert, delete, lookup).

 We must make sure that the balance of saved (extra) tree-rotation operations never becomes negative (i.e., we always have enough saved).

• Black magic: The balance (extra amount) of tree-rotation operations that you need to keep (save) in a node v of the splay tree is:

 Bal(v) = r(v)

Note:

 This is what I referred to as black magic: how did they come up with this number ???

• Example: balance of tree-rotation operations kept in the nodes of a splay tree

• Definition:

 r(T) = total number of extra tree-rotation operations stored in tree T = the sum of Bal(v) = r(v) over all nodes v in T

Example:

r(T) = lg(15) + lg(11) + 2lg(5) + 3lg(3)
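As a hedged check of this sum, the sketch below rebuilds a tree with the same subtree sizes (15, 11, 5, 5, 3, 3, 3) and adds up lg(n(v)) over its nodes. The shape is reconstructed from the sizes in the formula and the keys are invented; neither is necessarily the tree in the original figure. Empty children are assumed to count as one leaf node each.

```python
import math


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def n(v):
    """Subtree size, counting each empty child (None) as one leaf node."""
    return 1 if v is None else 1 + n(v.left) + n(v.right)


def total_balance(v):
    """r(T): sum of lg(n(u)) over every node u in the subtree rooted at v."""
    if v is None:
        return 0.0   # an empty leaf contributes lg(1) = 0
    return math.log2(n(v)) + total_balance(v.left) + total_balance(v.right)


# Hypothetical 7-key tree whose subtree sizes are 15, 11, 5, 5, 3, 3, 3.
tree = Node(6,
            Node(3, Node(1, None, Node(2)), Node(5, Node(4), None)),
            Node(7))

lg = math.log2
print(total_balance(tree), lg(15) + lg(11) + 2 * lg(5) + 3 * lg(3))
```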

• How the balance "Bal(v)" changes when you splay a node

• Example: splay tree before and after splay(node 3)

Notes:

```
Before splay(3):   r(T)  = lg(15) + lg(11) + 2lg(5) + 3lg(3)

After splay(3):    r'(T) = lg(15) + lg(11) + lg(9) + lg(5) + 3lg(3)

Change in balance: Δ(r(T)) = r'(T) - r(T) = lg(9) - lg(5)
```

• The amortized cost of a splay operation

• We amortize the cost of a splay operation on a node x at depth d as follows:

• "Pay" d tree rotation operations (= the real cost to splay the node x at depth d)

• "Tag on" this extra number of operations:

 Δ(r(T))

• The amortized cost to splay a node x at depth d:

 Amortized cost of splay(x) = d + Δ(r(T))                   ....... (1)

• Claim:

• The total balance r(T) of any tree T resulting from a splay operation satisfies:

 r(T) ≥ 0

(This is trivial: every subtree contains at least one node, so n(v) ≥ 1 and r(v) = lg(n(v)) ≥ 0 for every node v. The sum r(T) can therefore never become negative...)

• \$64,000 question:

 What is the order of d + Δ(r(T)) ???

To answer this question, we must look at the amortized cost of a zig-zig, a zig-zag, a zig operation

Because: splay(x) = a sequence of these operations

• The change in r(T) caused by a zig-zig, zig-zag and zig operation --- Proposition 10.3 (Goodrich)

• Proposition 10.3

• Let δ be the change in r(T), i.e., δ = r(new tree) − r(old tree), caused by a single splay sub-step (a zig-zig, a zig-zag, or a zig)

• Then:

 δ   ≤   3(r'(x) − r(x)) − 2   if the sub-step was a zig-zig or zig-zag

 δ   ≤   3(r'(x) − r(x))   if the sub-step was a zig

• I will only show:

 δ   ≤   3(r'(x) − r(x)) − 2   if the sub step was a zig-zig

The proofs for the zig-zag and zig steps are similar to the proof for the zig-zig step (see Goodrich for details)

• Observation:

• The value r'(v) of a node v in the new tree is unchanged, except for the nodes x, y, and z that are involved in the zig-zig operation

• Reason:

 r'(v) = lg(n'(v))

 The number of nodes n'(v) in the subtree rooted at node v only changes when v = x, y, or z:

Illustration:

• Therefore:

```
δ = r( new tree ) - r( old tree )

  =   ( r'(a) + r'(b) + .... + r'(x) + r'(y) + r'(z) )
    - ( r(a)  + r(b)  + .... + r(x)  + r(y)  + r(z)  )

  = (r'(x) + r'(y) + r'(z)) - (r(x) + r(y) + r(z))
```

• From the following relationships:

Because r(v) = lg(n(v)), we have that:

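• r'(x) = r(z)      (after the zig-zig, x is the root of the subtree that z was the root of before)
• r'(y) < r'(x)      (after the zig-zig, y is a descendant of x)
• r(x) < r(y), and therefore −r(y) < −r(x)      (before the zig-zig, x is a descendant of y)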

Therefore:

```
δ = (r'(x) + r'(y) + r'(z)) - (r(x) + r(y) + r(z))     (r'(x) = r(z): these two cancel)

  = (r'(y) + r'(z)) - (r(x) + r(y))

  < (r'(x) + r'(z)) - (r(x) + r(y))                    (use: r'(y) < r'(x))

  = (r'(x) + r'(z)) - r(x) - r(y)

  < (r'(x) + r'(z)) - r(x) - r(x)                      (use: -r(y) < -r(x))

  = (r'(x) + r'(z)) - 2r(x)                            ........(2)
```

• We need one more relationship to get rid of r'(z):

• We will count nodes in subtrees:

From the figure, we see that:

```
n(x)  = n(T1) + n(T2) + 1
n'(x) = n(T1) + n(T2) + n(T3) + n(T4) + 3
n'(z) = n(T3) + n(T4) + 1

==>    n'(x) = n(x) + n'(z) + 1
<==>   n(x) + n'(z) + 1 = n'(x)
==>    n(x) + n'(z) < n'(x)            ......... (3)
```

(We need some math because we cannot simply take the log() of both sides: lg(a + b) is not equal to lg(a) + lg(b) !)

• Property:

```
if    a + b < c        (with a, b > 0)
then  lg(a) + lg(b) < 2*lg(c) - 2

Proof:

          4*a*b
  a*b  =  -----
            4

          4*a*b + a^2 - a^2 + b^2 - b^2
       =  -----------------------------
                       4

          a^2 + 2*a*b + b^2 - a^2 + 2*a*b - b^2
       =  -------------------------------------
                          4

          a^2 + 2*a*b + b^2 - (a^2 - 2*a*b + b^2)
       =  ---------------------------------------
                           4

          (a + b)^2 - (a - b)^2
       =  ---------------------
                    4

          (a + b)^2
       ≤  ---------
              4

==>    lg(a*b)        ≤  lg((a + b)^2) - lg(4)
<==>   lg(a) + lg(b)  ≤  2*lg(a + b) - 2

(Use: a + b < c)

==>    lg(a) + lg(b)  <  2*lg(c) - 2
```
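A quick numeric sanity check of this property, as a sketch rather than a proof. The helper name `property_holds` and the tested range are arbitrary choices for this illustration; `lg` is taken to be log base 2.

```python
import math


def property_holds(a, b, c):
    """Check lg(a) + lg(b) < 2*lg(c) - 2 for positive a, b with a + b < c."""
    if not (a > 0 and b > 0 and a + b < c):
        raise ValueError("requires a, b > 0 and a + b < c")
    return math.log2(a) + math.log2(b) < 2 * math.log2(c) - 2


# Try all integer pairs in a small range, with the tightest integer c = a + b + 1.
counterexamples = [(a, b) for a in range(1, 40) for b in range(1, 40)
                   if not property_holds(a, b, a + b + 1)]
print(counterexamples)
```

The case a = b is the tightest one, because there (a − b)^2 = 0 and the only slack left comes from a + b < c.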

• Now we can simplify the relationship further:

```
       n(x) + n'(z) < n'(x)                       ......... (3)

==>    lg(n(x)) + lg(n'(z)) < 2*lg(n'(x)) - 2
<==>   r(x) + r'(z) < 2*r'(x) - 2
<==>   r'(z) < 2*r'(x) - r(x) - 2                 ........ (4)

Use Equation (4) to reduce Equation (2) further:

   δ  <  r'(x) + r'(z) - 2r(x)                    ........ (2)
      <  r'(x) + (2*r'(x) - r(x) - 2) - 2r(x)
      =  3r'(x) - 3r(x) - 2

Q.E.D !
```
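The zig-zig bound can also be checked numerically from the subtree sizes alone. This is a sketch: the function name is invented, and the before/after shape is an assumption inferred from the node counts given for n(x), n'(x) and n'(z) (the mirrored zig-zig is symmetric). Each of T1..T4 is assumed to contain at least one node.

```python
import math
from itertools import product


def zig_zig_delta_and_bound(t1, t2, t3, t4):
    """Return (delta, 3*(r'(x) - r(x)) - 2) for one zig-zig step.

    t1..t4 are n(T1)..n(T4), each >= 1. Assumed shape:
      before: z has children y and T4; y has children x and T3; x has T1 and T2
      after:  x has children T1 and y; y has children T2 and z; z has T3 and T4
    """
    lg = math.log2
    n_x = t1 + t2 + 1
    n_y = n_x + t3 + 1
    n_z = n_y + t4 + 1
    n_z_new = t3 + t4 + 1
    n_y_new = t2 + n_z_new + 1
    n_x_new = t1 + n_y_new + 1        # equals n_z, so r'(x) = r(z)
    delta = (lg(n_x_new) + lg(n_y_new) + lg(n_z_new)) - (lg(n_x) + lg(n_y) + lg(n_z))
    bound = 3 * (lg(n_x_new) - lg(n_x)) - 2
    return delta, bound


# Collect every size combination (if any) where delta fails to stay below the bound.
violations = []
for sizes in product(range(1, 8), repeat=4):
    delta, bound = zig_zig_delta_and_bound(*sizes)
    if not delta < bound:
        violations.append(sizes)
print(len(violations))
```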

• The change in r(T) caused by splay(x) --- Proposition 10.4 (Goodrich)

• Proposition 10.4

• T = splay tree before applying operation splay(x)
The rank of T = r(T)

• T' = splay tree after applying operation splay(x)
The rank of T' = r'(T)

• Node x is at depth d

• Claim:

 r'(T) − r(T)  ≤  3(r(root) − r(x)) − d + 2

Proof:

• Splay(x) consists of:

 d/2 steps of zig-zig or zig-zag, if d is even

 (d−1)/2 steps of zig-zig or zig-zag + 1 zig step, if d is odd

• Each zig-zig or zig-zag at step i transforms a tree and the change in r(T) is:

```
            zig-zig or zig-zag
T_(i-1)  ---------------------->  T_i

δ_i = r(T_i) - r(T_(i-1))  ≤  3( r'(x) - r(x) ) - 2       (Proposition 10.3)

More suitable notation (r_i(x) = the rank of x in tree T_i, with r_0(x) = r(x)):

δ_i  ≤  3( r_i(x) - r_(i-1)(x) ) - 2
```

• The splay(x) operation will cause this change:

```
          δ_1       δ_2                     δ_p
T = T_0  ---> T_1  ---> T_2  ---> .... ---> T_p = T'

The change in r(T) caused by splay(x) is:

r'(T) - r(T) = δ_1 + δ_2 + ... + δ_p

             ≤   3( r_1(x) - r(x) ) - 2                (p terms)
               + 3( r_2(x) - r_1(x) ) - 2
               + 3( r_3(x) - r_2(x) ) - 2
               ...
               + 3( r_(p-1)(x) - r_(p-2)(x) ) - 2
               + 3( r'(x) - r_(p-1)(x) ) - 2

             = -3r(x) + 3r'(x) - 2*p                   (d even: p = d/2)
             = -3r(x) + 3r'(x) - d

(If d is odd, the last step is a zig. By Proposition 10.3 the bound
 for a zig is 3( r'(x) - r(x) ) without the "- 2", so the total is
 -3r(x) + 3r'(x) - (d - 1). Adding 2 to the bound covers both cases.)

r'(T) - r(T) ≤ 3r'(x) - 3r(x) - d + 2                  ...... (5)
```

• The node x is the root node in the new splay tree T', therefore:

```
r'(T) - r(T) ≤ 3r'(root) - 3r(x) - d + 2
```

• The new splay tree T' has the same number of nodes as the original splay tree T, therefore:

```
r'(root) = lg(n'(root)) = lg(n(root)) = r(root)

r'(T) - r(T) ≤ 3r(root) - 3r(x) - d + 2

Q.E.D.
```

• The order of d + Δ(r(T))

• Recall:

The amortized cost to splay a node x at depth d:

 Amortized cost of splay(x) = d + Δ(r(T))                   ....... (1)

• Recall also the \$64,000 question:

 What is the order of d + Δ(r(T)) ???

• We can now answer the \$64,000 question:

```
From Proposition 10.4:

       r'(T) - r(T)      ≤ 3r(root) - 3r(x) - d + 2
<==>   r'(T) - r(T) + d  ≤ 3r(root) - 3r(x) + 2
<==>   d + Δ(r(T))       ≤ 3r(root) - 3r(x) + 2
==>    d + Δ(r(T))       ≤ 3r(root) + 2               (because r(x) ≥ 0)
                         = 3*lg(n(root)) + 2
                         = 3*lg(2n+1) + 2
==>    d + Δ(r(T))       = O(log(n))
```

• Conclusion:

 The amortized running time of splay(x) (i.e., its average over a sequence of operations) is O(log(n))
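The bound can be observed on a small simulation. The following is a minimal sketch, not the implementation from Goodrich: the `Node` class, the helper names, the right-leaning starting chain, and the random choice of nodes to splay are all assumptions made for this illustration. Empty children are counted as one leaf node each, so n(root) = 2n + 1.

```python
import math
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def size(v):
    """n(v), counting each empty child (None) as one leaf node."""
    return 1 if v is None else 1 + size(v.left) + size(v.right)


def total_rank(v):
    """r(T): sum of lg(n(u)) over all nodes u in the subtree rooted at v."""
    if v is None:
        return 0.0                               # lg(1) = 0 for an empty leaf
    return math.log2(size(v)) + total_rank(v.left) + total_rank(v.right)


def rotate_up(x):
    """One tree rotation that lifts x above its parent."""
    p, g = x.parent, x.parent.parent
    if p.left is x:                              # right rotation
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:                                        # left rotation
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x


def splay(x):
    """Move x to the root; return the number of rotations performed."""
    rotations = 0
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:                            # zig
            rotate_up(x)
            rotations += 1
        elif (g.left is p) == (p.left is x):     # zig-zig
            rotate_up(p)
            rotate_up(x)
            rotations += 2
        else:                                    # zig-zag
            rotate_up(x)
            rotate_up(x)
            rotations += 2
    return rotations


def build_chain(n):
    """Hypothetical bad starting tree: keys 0..n-1 as a right-leaning path."""
    nodes = [Node(k) for k in range(n)]
    for parent, child in zip(nodes, nodes[1:]):
        parent.right, child.parent = child, parent
    return nodes


def worst_amortized_cost(n, trials, seed=0):
    """Splay random nodes; return the largest d + Δ(r(T)) that was observed."""
    rng = random.Random(seed)
    nodes = build_chain(n)
    root, worst = nodes[0], 0.0
    for _ in range(trials):
        x = rng.choice(nodes)
        d, v = 0, x
        while v.parent is not None:              # depth of x
            d, v = d + 1, v.parent
        r_x, r_before = math.log2(size(x)), total_rank(root)
        assert splay(x) == d                     # real cost = d rotations
        root = x
        amortized = d + (total_rank(root) - r_before)
        # Proposition 10.4, with a small tolerance for floating-point rounding
        assert amortized <= 3 * (math.log2(2 * n + 1) - r_x) + 2 + 1e-9
        worst = max(worst, amortized)
    return worst


n = 31
print(worst_amortized_cost(n, trials=300), "<=", 3 * math.log2(2 * n + 1) + 2)
```

The real cost d of a single splay can be as large as n − 1 on the starting chain, but the amortized cost d + Δ(r(T)) stays below 3*lg(2n+1) + 2 because such a splay lowers r(T) by a large amount.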

• Insert and delete modify the number of nodes in the splay tree

But the change is very small (1 node compared to n nodes).

• Further analysis (which is very straightforward - see Goodrich) can complete the proof of this claim:

 The amortized running time of insert, delete and lookup in splay trees  =  O(log(n))

I'll omit it; the most difficult part of the proof has been discussed...