### A probabilistic (run time) analysis of Skip Lists

• Problem description

• Question:

 What is the running time of the get(k) operation on a skip list?

• In other words:

 How many entries in the skip list will the get(k) operation examine ?

• Related problems:

• The running time for the get(k) method will also determine the running times for put(k,v) and remove(k)

• Because:

 the put(k,v) and remove(k) operations need to find the entry first before they can perform the corresponding operation. (Once the entry has been located, put(k,v) and remove(k) take O(1) operations to complete.)

• Preliminary result of the run time of the search operation in skip lists

• Consider the search operation in a skip list:

• Main loop in the search algorithm:

```
public HashEntry findEntry(String k)
{
   HashEntry p;

   /* -----------------
      Start at "head"
      ----------------- */
   p = head;

   /* =========================================
      Main search loop
      ========================================= */
   while ( true )
   {
      /* --------------------------------------------
         Search RIGHT until you find a LARGER entry
         -------------------------------------------- */
      while ( (p.right.key) != HashEntry.posInf &&
              (p.right.key).compareTo(k) <= 0 )
      {
         p = p.right;
      }

      /* ---------------------------------
         Go down one level if you can...
         --------------------------------- */
      if ( p.down != null )
      {
         p = p.down;
      }
      else
         break;       // We reached the LOWEST level... Exit...
   }

   return(p);         // Nearest p (<= k)
}
```

• Scanning motion through a skip list:

• Number of entries visited:

Preliminary result of the run time of the get(k) operation:

```
# entries visited =   # entries visited in level h-1
                    + # entries visited in level h-2
                    + ....
                    + # entries visited in level 0

# entries visited =   1 + # right traversals on level h-1
                    + 1 + # right traversals on level h-2
                    + ....
                    + 1 + # right traversals on level 0

# entries visited =   h + # right traversals on level h-1
                        + # right traversals on level h-2
                        + ...
                        + # right traversals on level 0

run time get(k)   ~=  h + h × ( avg # right traversals on one level )
```

So the average running time of the get(k) operation is approximately equal to:

 ``` Avg run time get(k) ~= Avg(h) + Avg(h) × ( avg # right traversals on one level ) ....... (1) ```

• There are 2 unknowns that we need to figure out:

 • The average height (Avg(h)) of a skip list

 • The average # of right traversals on one level made by a search operation

• Preliminary to computing Avg(h)

• Recall how we compute an average cost using frequencies:

```
  Case    Frequency    Cost
 ------+------------+--------
   1        f1          C1
   2        f2          C2
  ....
   n        fn          Cn
```

The average cost is equal to:

 ``` Avg Cost = f1×C1 + f2×C2 + .... + fn×Cn ```

• Recall also that:

 The (relative) frequency is approximately equal to the probability.
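As a small illustration of the frequency-weighted average, here is a minimal sketch. The class name `AverageCost`, the method `average`, and the fair-die numbers are hypothetical and not part of the notes; the sketch assumes the relative frequencies are given as fractions that sum to 1.

```java
/** Hypothetical helper (not from the notes): average cost from relative frequencies. */
final class AverageCost {

    /**
     * Returns f1*C1 + f2*C2 + ... + fn*Cn.
     * Assumes freq holds relative frequencies (fractions that sum to 1)
     * and that freq and cost have the same length.
     */
    static double average(double[] freq, double[] cost) {
        if (freq.length != cost.length) {
            throw new IllegalArgumentException("freq and cost must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < freq.length; i++) {
            sum += freq[i] * cost[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed example: a fair die, each face (cost 1..6) has relative frequency 1/6.
        double[] freq = { 1 / 6.0, 1 / 6.0, 1 / 6.0, 1 / 6.0, 1 / 6.0, 1 / 6.0 };
        double[] cost = { 1, 2, 3, 4, 5, 6 };
        System.out.println(average(freq, cost));
    }
}
```

For the fair die the result is 3.5, up to floating-point rounding. The same weighting is what the rest of the analysis applies to tower heights and right moves.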

• Probability that a skip list has height h

• Fact:

• The height of a skip list is random

• We can only talk about the average height of a skip list

• Or:

 how likely it is (the probability) that a skip list has height h

• Height of a skip list - part 1:

• The height of a skip list is the height of the highest tower:

• What determines the height of a tower:

 The height of a tower is determined by the number of consecutive successful tosses (using a fair (50% success) coin)

• Height of one tower:

• Probability that the height of a tower = i:

Note:

$$
P[\,\text{height} = i\,] = \frac{1}{2^i}
$$
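The coin-toss rule can be checked with a short simulation. This is only a sketch under stated assumptions: a tower starts with 1 layer and gains one layer per consecutive successful toss, and the coin is modeled with `java.util.Random.nextDouble() < 0.5`. The class and method names are hypothetical.

```java
import java.util.Random;

/** Hypothetical simulation (not from the notes) of the height of one tower. */
final class TowerHeightDemo {

    /**
     * Builds one tower: it starts with 1 layer (assumption) and gains a layer
     * for every consecutive successful toss of a fair coin.
     */
    static int randomHeight(Random r) {
        int height = 1;
        while (r.nextDouble() < 0.5) {
            height++;
        }
        return height;
    }

    /** Fraction of simulated towers whose height is exactly i. */
    static double frequencyOfHeight(int i, int trials, long seed) {
        Random r = new Random(seed);
        int hits = 0;
        for (int t = 0; t < trials; t++) {
            if (randomHeight(r) == i) {
                hits++;
            }
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 5; i++) {
            System.out.printf("height %d: simulated %.4f, formula 1/2^%d = %.4f%n",
                    i, frequencyOfHeight(i, 1_000_000, 42L), i, Math.pow(0.5, i));
        }
    }
}
```

With enough trials, the simulated relative frequency for each height i should be close to 1/2^i, which is the frequency-as-probability idea recalled earlier.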

• Height of a skip list - part 2:

• Height of a skip list:

Notes:

```
Skip list has height h  =     (height of tower 1 = h)
                           OR (height of tower 2 = h)
                           OR (height of tower 3 = h)
                           ...
                           OR (height of tower n = h)
```

• Probability that a skip list has height h:

$$
\begin{aligned}
P[\,\text{skip list has height } h\,]
  &= P[\,(\text{height of tower } 1 = h) \text{ OR } (\text{height of tower } 2 = h) \text{ OR } \ldots \text{ OR } (\text{height of tower } n = h)\,] \\
  &\approx P[\,\text{height of tower } 1 = h\,] + P[\,\text{height of tower } 2 = h\,] + \cdots + P[\,\text{height of tower } n = h\,] \\
  &= \frac{1}{2^h} + \frac{1}{2^h} + \cdots + \frac{1}{2^h} \\
  &= \frac{n}{2^h}
\end{aligned}
$$

• The average height of a skip list of n entries

• Computing the average height of a skip list using the "frequency weighting method" (f1*c1 + f2*c2 + ...) is very difficult

Goodrich resorts to an approximation by asking:

 Which height has the greatest probability?

• The likelihood (chance) of having a skip list of height equal to 3log(n) is approximately:

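$$
P[\,\text{skip list has height } 3\log(n)\,] = \frac{n}{2^{3\log(n)}} = \frac{n}{2^{\log(n^3)}} = \frac{n}{n^3} = \frac{1}{n^2}
$$

(Here log is the base-2 logarithm, so $2^{\log(n^3)} = n^3$.)

Example: n = 1000

$$
P[\,\text{skip list of 1000 elements has height } 3\log(1000)\,] = \frac{1}{1000^2} = 0.000001
$$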

• The likelihood (chance) of having a skip list of height equal to 2*log(n) is approximately:

$$
P[\,\text{skip list has height } 2\log(n)\,] = \frac{n}{2^{2\log(n)}} = \frac{n}{2^{\log(n^2)}} = \frac{n}{n^2} = \frac{1}{n}
$$

Example: n = 1000

$$
P[\,\text{skip list of 1000 elements has height } 2\log(1000)\,] = \frac{1}{1000} = 0.001
$$

• The likelihood (chance) of having a skip list of height equal to log(n) is approximately:

$$
P[\,\text{skip list has height } \log(n)\,] = \frac{n}{2^{\log(n)}} = \frac{n}{n} = 1
$$

That's huge for a probability (it's approximately "certain") !!!
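The three estimates above can be reproduced numerically. The sketch below simply evaluates n / 2^h for h = log(n), 2log(n) and 3log(n), taking log as the base-2 logarithm; the class and method names are hypothetical.

```java
/** Hypothetical helper (not from the notes): evaluates the estimate n / 2^h. */
final class HeightBound {

    /** Base-2 logarithm. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** The estimate n / 2^h for the chance that a skip list of n entries has height h. */
    static double estimate(int n, double h) {
        return n / Math.pow(2.0, h);
    }

    public static void main(String[] args) {
        int n = 1000;
        for (int c = 1; c <= 3; c++) {
            System.out.printf("h = %d*log(n): n/2^h = %.6f%n", c, estimate(n, c * log2(n)));
        }
    }
}
```

For n = 1000 the three values come out (up to rounding) as 1, 0.001 and 0.000001, matching the hand computations.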

• Without going into mathematical details, the above computations argue that:

• The average height (or expected height) of a skip list with n entries is:

 ``` Avg(h) of a Skip list with n entries = log(n) ```
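A simulation gives a rough sanity check of this claim. The sketch below is hypothetical (names are not from the notes) and assumes each tower starts with 1 layer and grows by one layer per successful fair coin toss, with the coin modeled by `java.util.Random.nextDouble() < 0.5`.

```java
import java.util.Random;

/** Hypothetical simulation (not from the notes) of the height of a whole skip list. */
final class SkipListHeightDemo {

    /** Height of one tower: 1 layer plus one more per consecutive successful toss. */
    static int randomHeight(Random r) {
        int height = 1;
        while (r.nextDouble() < 0.5) {
            height++;
        }
        return height;
    }

    /** Height of a skip list with n entries = height of its highest tower. */
    static int skipListHeight(int n, Random r) {
        int max = 0;
        for (int i = 0; i < n; i++) {
            max = Math.max(max, randomHeight(r));
        }
        return max;
    }

    /** Average height over many independently built skip lists of n entries. */
    static double averageHeight(int n, int trials, long seed) {
        Random r = new Random(seed);
        long total = 0;
        for (int t = 0; t < trials; t++) {
            total += skipListHeight(n, r);
        }
        return (double) total / trials;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 16, 256, 4096 }) {
            double log2n = Math.log(n) / Math.log(2.0);
            System.out.printf("n = %d: avg height = %.2f, log(n) = %.2f%n",
                    n, averageHeight(n, 2000, 1L), log2n);
        }
    }
}
```

Under these assumptions the simulated average should track log(n) (base 2), typically sitting a small additive constant above it. That constant does not affect the O(log(n)) conclusion.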

One down, one more to go!

• Average number of right-directional traversals in one layer

• Important fact:

 All the keys after the first key scanned in layer i belong to towers of height i

Example:

Note:

 After scanning key 31, the other keys scanned (38, 44) belong to a tower of height 2

• Situation in layer i:

Conclusion:

 The number of right-directional traversals in layer i ≤ number of consecutive towers of height i

• \$64,000 question:

How likely is it that the next tower you visit has height = i ?

• It is important to note that there are only 2 kinds of "towers" in layer i:

 • Columns that have height = i

 • Columns that have height > i

So only towers of height ≥ i are in layer i.

• Note:

• For those who are familiar with probability theory, we are using the conditional probability:

 we are given the fact that height ≥ i

(I am trying to explain this stuff without requiring the knowledge of conditional probability)

• Question:

• When a tower has reached the height of i, what makes it stop growing ?

Answer: the code that determines when to stop adding a layer is:

```
while ( r.nextRandom() < 0.5 )
{
   // code to add 1 layer
}
```

So r.nextRandom() returned a value that is ≥ 0.5

• When a tower has reached the height of i, what makes the tower grow higher than i ?

Answer: the same loop determines whether another layer is added:

```
while ( r.nextRandom() < 0.5 )
{
   // code to add 1 layer
}
```

So r.nextRandom() returned a value that is < 0.5

• Another question:

• What is then the probability that the next column that you visit inside layer i has height i ???

You have a 50% chance to find a tower of height i as your next column
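The 50% figure can be checked by simulation: generate many towers, keep only those that reach layer i (height ≥ i), and see what fraction stopped at exactly height i. This is a sketch with hypothetical names; it assumes towers start with 1 layer and uses `java.util.Random.nextDouble() < 0.5` as the fair coin.

```java
import java.util.Random;

/** Hypothetical simulation (not from the notes): towers that stop exactly at layer i. */
final class NextTowerDemo {

    /** Height of one tower: 1 layer plus one more per consecutive successful toss. */
    static int randomHeight(Random r) {
        int height = 1;
        while (r.nextDouble() < 0.5) {
            height++;
        }
        return height;
    }

    /**
     * Among simulated towers that are present in layer i (height >= i),
     * returns the fraction whose height is exactly i.
     */
    static double fractionStoppingAt(int i, int trials, long seed) {
        Random r = new Random(seed);
        int inLayer = 0;
        int exact = 0;
        for (int t = 0; t < trials; t++) {
            int h = randomHeight(r);
            if (h >= i) {
                inLayer++;
                if (h == i) {
                    exact++;
                }
            }
        }
        return inLayer == 0 ? Double.NaN : (double) exact / inLayer;
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 4; i++) {
            System.out.printf("layer %d: fraction with height exactly %d = %.4f%n",
                    i, i, fractionStoppingAt(i, 1_000_000, 5L));
        }
    }
}
```

The fraction should be close to 0.5 for every layer i, because a tower that has reached height i stops or grows on a single fair coin toss.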

Therefore, the average number of consecutive towers of height i in layer i can be computed as follows:

• Case 1: 0 towers of height i

• Case 2: 1 tower of height i

• Case 3: 2 towers of height i

And so on !!!!

• The average number of right (directional) moves to search in layer i is then: (see the frequencies and cost in the figures above)

$$
\text{Avg \# right moves} = 0 \cdot (0.5) + 1 \cdot (0.5)^2 + 2 \cdot (0.5)^3 + \cdots = 1 \tag{2}
$$

We can use Maple to compute this sum:

```
    |\^/|     Maple 10 (SUN SPARC SOLARIS)
._|\|   |/|_. Copyright (c) Maplesoft, a division of Waterloo Maple Inc. 2005
 \  MAPLE  /  All rights reserved. Maple is a trademark of
 <____ ____>  Waterloo Maple Inc.
      |       Type ? for help.
> sum( (k-1)*(1/2)^k, k = 1..infinity);
                                       1
```
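Without Maple, the same sum can be approximated by adding up a finite number of terms. The sketch below (hypothetical class and method names) computes partial sums of (k-1)×(1/2)^k.

```java
/** Hypothetical helper (not from the notes): partial sums of (k-1) * (1/2)^k. */
final class RightMoveSeries {

    /** Returns the sum of (k-1) * (1/2)^k for k = 1 .. terms. */
    static double partialSum(int terms) {
        double sum = 0.0;
        double power = 0.5;            // holds (1/2)^k for the current k
        for (int k = 1; k <= terms; k++) {
            sum += (k - 1) * power;
            power *= 0.5;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int terms : new int[] { 3, 5, 10, 20, 60 }) {
            System.out.printf("first %d terms: %.12f%n", terms, partialSum(terms));
        }
    }
}
```

The partial sums increase toward 1. The first 3 terms already give 0.5, and after 60 terms the difference from 1 is far below double-precision display accuracy.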

• Final result

• The final expression for the run time of get(k) in a skip list of n entries:

```
run time get(k) ~= h + h×( # right traversals in 1 level )             ....... (1)

                      (h ~= log(n))

                ~= log(n) + log(n)×( # right traversals in 1 level )   ....... (2)

                      (# right traversal in 1 level = # compare op in 1 level)

                ~= log(n) + log(n)×( 1 )

                ~= 2×log(n)                                            ........ (3)
```
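The 2×log(n) estimate can be compared against a simulated search. The sketch below is a simplification, not the linked HashEntry structure used by findEntry: it assumes the keys are 0..n-1, stores only the tower height of each key in an array, and counts one visit per level plus one per right move. All names are hypothetical.

```java
import java.util.Random;

/** Hypothetical simulation (not from the notes): counts the entries a search visits. */
final class SkipListSearchCount {

    /** Height of one tower: 1 layer plus one more per consecutive successful toss. */
    static int randomHeight(Random r) {
        int height = 1;
        while (r.nextDouble() < 0.5) {
            height++;
        }
        return height;
    }

    /**
     * Simulates the search for key k on keys 0..n-1, where key i has a tower
     * of heights[i] layers. Returns { levels visited (h), right moves }.
     */
    static int[] countMoves(int[] heights, int k) {
        int top = 0;
        for (int h : heights) {
            top = Math.max(top, h);
        }
        int p = -1;                    // -1 plays the role of the head column
        int right = 0;
        for (int level = top - 1; level >= 0; level--) {
            while (true) {
                int q = p + 1;
                // Skip keys whose tower does not reach this level.
                while (q < heights.length && heights[q] <= level) {
                    q++;
                }
                if (q >= heights.length || q > k) {
                    break;             // next key on this level is larger than k: go down
                }
                p = q;                 // move right
                right++;
            }
        }
        return new int[] { top, right };
    }

    /** Average of (levels + right moves) over searches for every key of one random skip list. */
    static double averageMoves(int n, long seed) {
        Random r = new Random(seed);
        int[] heights = new int[n];
        for (int i = 0; i < n; i++) {
            heights[i] = randomHeight(r);
        }
        long total = 0;
        for (int k = 0; k < n; k++) {
            int[] m = countMoves(heights, k);
            total += m[0] + m[1];
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        int n = 1024;
        double log2n = Math.log(n) / Math.log(2.0);
        System.out.printf("n = %d: avg entries visited = %.2f, 2*log(n) = %.2f%n",
                n, averageMoves(n, 3L), 2 * log2n);
    }
}
```

Under these assumptions the simulated average should be on the order of 2×log(n); the analysis ignores small additive constants, so an exact match is not expected.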

• Running time of put(k,v) and remove(k)

• The put() and remove() methods will finish in O(1) after they have located the appropriate entry

• Therefore:

 running time put(k,v) = O(log(n))

 running time remove(k) = O(log(n))