### Linear Hashing

• The problem with Extensible Hashing

• Main disadvantage of Extensible Hashing:

 The size of the bucket array doubles each time the parameter i increases by 1. This exponential growth rate is too fast.

• Intro to Linear Hashing (in contrast with Extensible Hashing)

• Properties of the Linear Hashing technique:

• The growth rate of the bucket array will be linear (hence its name)

• The decision to increase the size of the bucket array is flexible....

A commonly used criterion is:

• If ( the average occupancy per bucket   >   some threshold ) then:

 split one bucket into two

• Linear hashing uses overflow buckets .....

• Parameters used in Linear hashing

• Parameters used in Linear Hashing:

• n = the number of buckets that are currently in use

• There is also a derived parameter i:

 ``` i = ⌈ log2( n ) ⌉ ```

The parameter i is the number of bits needed to represent a bucket index in binary (for n = 1 the formula gives 0, but 1 bit is still used):

```
 #buckets n    bucket indexes used    i = ⌈log2(n)⌉
 ----------------------------------------------------
     1         0                      1 bit    // 1 bucket  --> bucket 0
     2         0 1                    1 bit    // 2 buckets --> bucket 0 and 1
     3         00 01 10               2 bits
     4         00 01 10 11            2 bits
     5         000 001 .. 100         3 bits
     6         000 001 .. 101         3 bits
     7         000 001 .. 110         3 bits
     8         000 001 .. 111         3 bits
     9         0000 0001 .. 1000      4 bits
     ....
```
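The table above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `bits_needed` is made up for illustration, and it treats n = 1 as needing 1 bit, as the table does.

```python
def bits_needed(n: int) -> int:
    """Number of bits i used for bucket indexes when n buckets are in use."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2 and avoids
    # floating-point rounding. For n = 1 it gives 0, so we clamp to 1 bit
    # to match the table.
    return max(1, (n - 1).bit_length())


for n in range(1, 10):
    print(n, bits_needed(n))
```

Using integer bit operations instead of `math.log2` is a design choice here: it keeps the result exact for large n.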

Note:

• The n buckets are numbered as:

 ``` 0, 1, 2, .... , (n−1) // In binary ```

• Important property:

• When the number (n − 1) is written as an i-bit binary number (for n ≥ 2):

 The first bit in the binary number is always "1"

Example:

```
  n    n-1    n-1 in binary    i = ⌈log2(n)⌉
 ---------------------------------------------
  2     1        1             1 bit
  3     2        10            2 bits
  4     3        11            2 bits
  5     4        100           3 bits
  6     5        101           3 bits
  7     6        110           3 bits
  8     7        111           3 bits
 ...
                 ^
                 |
           first bit = 1 !!
```

Consequently:

• For any number x with   (n − 1) < x ≤ 2^i − 1:

When x is written as an i-bit binary number:

 The first bit in the binary number (for x) is always "1"

• Example of parameters in the Linear hashing method

• Example:

```
 n = 2     (2 buckets in use, bucket indexes: 0 .. 1)
 i = 1     (1 bit needed to represent a bucket index)
```

Suppose the number of records r = 3 :

• Linear Hashing technique

• Hash function used in Linear Hashing:

The bucket index consists of the last  i bits in the hash function value.

• A bucket in Linear Hashing is a chain of disk blocks:

• Note

• There are only n buckets in use

• However:

 A hash key value consists of i bits.
 I.e.: A hash key value can address 2^i buckets !!!

And:

 ``` n ≤ 2^i ```

• Therefore:

• A hash key value that is > (n − 1) will lead to (what I call) ghost buckets:

"Ghost" bucket = a non-existing bucket !!!

Conclusion:

 We will need to map the "ghost" (non-existing) buckets to an existing bucket (See below)

• Recall that:

• The first bit of n − 1 (= the index of the last bucket that is currently in use), written as an i-bit binary number, must be equal to 1:

 ``` n - 1 = 1xxxxx... ```

• Therefore:

• The index of a "ghost" bucket (a non-existent bucket) is larger than n − 1, so its first bit must also be 1:

Notice that:

• When we change the first bit of a "ghost" bucket index from 1 to 0:

 The resulting index identifies a real bucket !!! (Because the index of the last real bucket, (n−1), starts with a 1 bit, every i-bit index that starts with a 0 bit is smaller than (n−1) !!!)
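The mapping from a hash value to a real bucket can be sketched in Python as follows. The function name `bucket_index` is hypothetical, and hash values are assumed to be non-negative integers.

```python
def bucket_index(hash_value: int, n: int) -> int:
    """Map a hash value to one of the n real buckets (indexes 0 .. n-1)."""
    i = max(1, (n - 1).bit_length())      # i = ceil(log2(n)), at least 1 bit
    m = hash_value & ((1 << i) - 1)       # keep the last i bits
    if m >= n:
        # m is a "ghost" bucket: its first (highest) bit is 1.
        # Changing that bit to 0 yields the index of a real bucket.
        m &= ~(1 << (i - 1))
    return m


# With n = 3 buckets (i = 2), the hash value 1111 ends in 11, a ghost bucket,
# so it is redirected to bucket 01.
print(bucket_index(0b1111, 3))
```

Whatever the value of n, the returned index stays in the range 0 .. n−1, which is the property the text argues for.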

• Criterion to increase n in Linear Hashing

• Commonly used criterion to adjust (increase) the number of buckets n in Linear Hashing:

```
 if ( Avg occupancy of a bucket > τ )
 {
    n++;
 }
```

• How to determine average occupancy of a bucket:

• We need the following parameters:

```
 n = current number of buckets in use
 r = current number of search keys stored in the buckets
 γ = block size (# search keys that can be stored in 1 block)
```

• Computation:

```
 Max # search keys in 1 block  = γ
 Max # search keys in n blocks = n × γ

 We have a total of:  r search keys in n blocks

                       r
    Avg occupancy = -------
                     n × γ
```

Example:

• Increase criteria in Linear hashing:

```
         r
 if ( ------- > τ )
       n × γ
 {
    n++;
 }
```
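The criterion is simple arithmetic, as this small Python sketch shows. The function names are made up for illustration. The sample values (γ = 2, τ = 0.85) are the ones used in the worked example later in these notes.

```python
def average_occupancy(r: int, n: int, gamma: int) -> float:
    """r search keys spread over n buckets whose blocks hold gamma keys each."""
    return r / (n * gamma)


def needs_new_bucket(r: int, n: int, gamma: int, tau: float) -> bool:
    """True when the average occupancy exceeds the threshold tau."""
    return average_occupancy(r, n, gamma) > tau


print(average_occupancy(3, 2, 2))        # 3 keys, 2 buckets, block size 2
print(needs_new_bucket(4, 2, 2, 0.85))   # 4 / (2 * 2) = 1.0 exceeds 0.85
print(needs_new_bucket(5, 3, 2, 0.85))   # 5 / (3 * 2) is about 0.83
```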

• Insert Algorithm of the Linear Hashing technique

• Insert Algorithm:

```
 Parameters:  n = current number of buckets in use
              i = ⌈ log2(n) ⌉
              r = current number of search keys stored

 Insert( x , recordPtr(x) )
 {
    k = h(x);                  // General hash function value
    m = last i bits of k;      // Linear hash function value

    /* -----------------------------------------------
       Insert (x, recordPtr(x)) in "bucket m"
       ----------------------------------------------- */
    if ( m ≤ n−1 )
    {
       /* =========================================
          m is a "real" bucket
          ========================================= */
       Insert (x, recordPtr(x)) into bucket m;
       if necessary, use an overflow block;
    }
    else
    {
       /* =========================================
          m is a "ghost" bucket
          Recall: first bit of m is 1
          I.e.:   m = 1xxxxxxxxx
          ========================================= */
       m' = m with the first bit changed to 0;
                               // I.e.: m  = 1xxxxxxxxxx
                               //       m' = 0xxxxxxxxxx
                               // Note: m' is for sure a "real" bucket !!!

       Insert (x, recordPtr(x)) into bucket m';
       if necessary, use an overflow block;
    }

    r++;                       // One more search key is stored

    /* =============================================
       Check if we need to adjust n
       ============================================= */
    if ( r/(n*γ) > τ )
    {
       Add bucket n;

       Let n = b1b2...;        // n = the index of the NEW bucket
                               // n can be a binary number of i bits OR
                               // a binary number 100...000 of i+1 bits
                               // In any case, the first bit of n (b1) = 1 !!!

       j = ⌊ log2(n) ⌋ + 1;    // Number of binary digits for n

       /* --------------------------------------
          We need to move some search keys
          into this new bucket
          -------------------------------------- */
       n' = 0b2... ;           // n' was the "real" bucket used to
                               // store search keys that were previously hashed
                               // into the "ghost" bucket "1b2..."

       /* ----------------------------------------------------
          Now that this "ghost" bucket is "real", we move
          the "ghost"'s search keys out of the "real" bucket n'
          ---------------------------------------------------- */
       for ( every search key k ∈ bucket n' ) do
       {
          if ( last j bits of h(k) == b1b2... (i.e., = n) )
          {
             move search key k into the new bucket n;
          }
       }

       n++;                    // One more bucket is now in use
       i = ⌈ log2(n) ⌉;        // Recompute i (it is now equal to j)
    }
 }
```
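For readers who want to run the insert algorithm, here is a minimal in-memory Python sketch. Several things are assumptions and not part of the notes: the class name `LinearHashTable`, the use of a Python list per bucket (so overflow blocks are implicit and not modelled as disk blocks), and the identity hash function in the demo. The three initial keys 0000, 1010 and 1111 are also assumed; only 0101, 0001 and 0111 are the inserts discussed in the example.

```python
class LinearHashTable:
    """In-memory sketch of Linear Hashing (each bucket is a Python list)."""

    def __init__(self, gamma=2, tau=0.85, n=2, hash_fn=hash):
        self.gamma = gamma                      # keys that fit in one block
        self.tau = tau                          # occupancy threshold
        self.buckets = [[] for _ in range(n)]   # n = len(self.buckets)
        self.r = 0                              # number of stored keys
        self.h = hash_fn

    def _index(self, k):
        n = len(self.buckets)
        i = max(1, (n - 1).bit_length())        # i = ceil(log2(n))
        m = k & ((1 << i) - 1)                  # last i bits of k
        if m >= n:                              # "ghost" bucket:
            m &= ~(1 << (i - 1))                # change first bit from 1 to 0
        return m

    def insert(self, x, record_ptr):
        n = len(self.buckets)
        self.buckets[self._index(self.h(x))].append((x, record_ptr))
        self.r += 1
        if self.r / (n * self.gamma) > self.tau:
            self._add_bucket()

    def _add_bucket(self):
        new = len(self.buckets)                 # index of the new bucket
        j = new.bit_length()                    # binary digits in that index
        old = new & ~(1 << (j - 1))             # same index, first bit = 0
        mask = (1 << j) - 1
        stay, move = [], []
        for x, ptr in self.buckets[old]:
            # Keys whose last j hash bits equal the new index move over.
            (move if self.h(x) & mask == new else stay).append((x, ptr))
        self.buckets[old] = stay
        self.buckets.append(move)


table = LinearHashTable(gamma=2, tau=0.85, n=2, hash_fn=lambda k: k)
for key in (0b0000, 0b1010, 0b1111, 0b0101, 0b0001, 0b0111):
    table.insert(key, None)
for index, bucket in enumerate(table.buckets):
    print(format(index, "02b"), [format(k, "04b") for k, _ in bucket])
```

With these assumed keys, the first split moves 1010 from bucket 00 to bucket 10 and the second split moves 1111 and 0111 from bucket 01 to bucket 11, ending with n = 4 buckets.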

• Lookup Algorithm of Linear Hashing

```
 Parameters:  n = current number of buckets in use
              i = ⌈ log2(n) ⌉

 Lookup( x )
 {
    k = h(x);                  // General hash function value
    m = last i bits of k;      // Linear hash function value

    if ( m ≤ n−1 )
    {
       /* =========================================
          m is a "real" bucket
          ========================================= */
       Search (x, recordPtr(x)) in bucket m
       (including the overflow block);
    }
    else
    {
       /* =========================================
          m is a "ghost" bucket
          Recall: first bit of m is 1
                  m = 1xxxxxxxxx
          ========================================= */
       m' = m where the first bit is changed to 0;
                               // I.e.: m  = 1xxxxxxxxxx
                               //       m' = 0xxxxxxxxxx
                               // Note: m' is a "real" bucket !!!

       Search (x, recordPtr(x)) in bucket m'
       (including the overflow block);
    }
 }
```
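The lookup can be sketched in Python as a standalone function. The representation is an assumption made for illustration: the table is a list of n buckets, each a list of `(key, record_ptr)` pairs with overflow blocks folded into the list, and the hash function defaults to the identity. The record pointers `"r15"` and `"r5"` are made-up placeholders.

```python
def lookup(buckets, x, h=lambda key: key):
    """Return the record pointer stored for key x, or None if x is absent."""
    n = len(buckets)
    i = max(1, (n - 1).bit_length())     # i = ceil(log2(n))
    m = h(x) & ((1 << i) - 1)            # last i bits of the hash value
    if m >= n:                           # m is a "ghost" bucket
        m &= ~(1 << (i - 1))             # use the real bucket m' instead
    for key, record_ptr in buckets[m]:   # scan the bucket (and its overflow)
        if key == x:
            return record_ptr
    return None


# Hypothetical table with n = 3 buckets; only bucket 01 is populated here.
buckets = [[], [(0b1111, "r15"), (0b0101, "r5")], []]
print(lookup(buckets, 0b1111))   # 1111 ends in 11 (ghost), so bucket 01 is searched
print(lookup(buckets, 0b0011))   # also searches bucket 01, but the key is absent
```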

• Example Linear Hashing

• Parameters:

```
 Max # search keys in 1 block (γ) = 2
 Threshold avg occupancy      (τ) = 0.85
```

• Initial State:

Average occupancy:

```
    r          3
 ------- = ------- = 0.75
  n × γ      2 × 2
```

• Insert search key K such that h(K) = 0101

• Insert 0101:

Result:

Average occupancy:

```
    r          4
 ------- = ------- = 1 > τ (0.85)
  n × γ      2 × 2
```

• We must add a new bucket:

• Add bucket 2 (= 10 (binary)):

• Transfer search keys from bucket 00 to the newly created bucket 10:

Result:

Average occupancy:

```
    r          4
 ------- = ------- = 0.6666 ≤ τ (0.85)
  n × γ      3 × 2
```

Notice that:

• n = 3 (changed) !!!
• i = 2 (changed) !!!

• We can find 1111 as follows:

Explanation:

 1111 will lead to the bucket 11 (using the last i (= 2) bits), which is a "ghost" bucket.

 The search algorithm will use the real bucket 01 instead !!!

Continue example....

• Insert search key K such that h(K) = 0001

• Insert 0001:

Result:

So: Linear Hashing uses overflow blocks !!!

Average occupancy:

```
    r          5
 ------- = ------- = 0.833333 ≤ τ (0.85)
  n × γ      3 × 2
```

No need to add another bucket.....

Continue example....

• Insert search key K such that h(K) = 0111

• Insert 0111:

Result:

Average occupancy:

```
    r          6
 ------- = ------- = 1 > τ (0.85)
  n × γ      3 × 2
```

• We must add a new bucket:

• Add bucket 3 (= 11 (binary)):

• Transfer search keys from bucket 01 to the newly created bucket 11:

Result:

Average occupancy:

```
    r          6
 ------- = ------- = 0.75 ≤ τ (0.85)
  n × γ      4 × 2
```

Notice that:

• n = 4 (changed) !!!

• Every search key in the Linear Hash table is now identified by its last 2 bits:

• Deleting from a Linear Hashing Table

• Delete was not discussed in the textbook

• But if you understand how it works, you can easily devise an algorithm to reduce the number of buckets

Hint:

 When a bucket is deleted, it becomes a "ghost" bucket. If the bucket contains some search keys, you will need to move these search keys.... Where do you move them ???