### B+-tree: introduction

• B+-tree:

• B+-tree = a tree structure whose name contains a "B", but nobody knows for certain what the "B" stands for...

Possible meanings:

 Balanced tree (because the tree is balanced)
 Boeing tree (because reportedly, someone working for Boeing developed it)

• What is a B+-tree:

 B+-tree = a dynamic multi-level index, organized as a balanced tree structure

• Definitions related to a tree in general

• Definitions:

 Root node = the node at the "top" (or bottom, depending on how you draw the tree) of the tree
 Internal node = a node that has one or more child nodes
 Leaf node = a node that does not have any child nodes

Note:

• A B+-tree is a height-balanced tree:

 Every leaf node is located at the same distance (height) from the root node
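The height-balance property can be checked mechanically by collecting the distance from the root to every leaf. A minimal Python sketch, assuming a hypothetical in-memory `Node` type whose `children` list is empty for a leaf node (a real B+-tree keeps its nodes in disk blocks, not in memory):

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory tree node; a leaf node has no children."""
    children: list["Node"] = field(default_factory=list)


def leaf_depths(node: Node, depth: int = 0) -> set[int]:
    """Return the set of distances from the starting node to every leaf below it."""
    if not node.children:          # leaf node: no child nodes
        return {depth}
    depths: set[int] = set()
    for child in node.children:    # internal node: visit every subtree
        depths |= leaf_depths(child, depth + 1)
    return depths


def is_height_balanced(root: Node) -> bool:
    """True when every leaf node is the same distance from the root node."""
    return len(leaf_depths(root)) == 1


balanced = Node([Node([Node(), Node()]), Node([Node(), Node()])])
unbalanced = Node([Node([Node(), Node()]), Node()])
print(is_height_balanced(balanced))    # True
print(is_height_balanced(unbalanced))  # False
```

In `unbalanced`, one leaf hangs directly below the root (distance 1) while the others are at distance 2, so the check fails.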

• Storing a B+-tree

• How to store a B+-tree index:

 Each (internal/leaf) node is stored as one disk block

• Structure of an internal node of a B+-tree

• Structure of an internal node (including the root node):

• Properties of an internal node:

• Each internal node is stored in one block on disk

• An internal node contains at most:

 ≤ n keys
 ≤ n+1 pointers (= database addresses) to other B+-tree nodes

[Figure: a full internal node]

• A pointer in an internal node points to:

 Another B+-tree node (= a disk block !!!)

• Every internal node (except the root node) has ≥ ⌈(n+1)/2⌉ pointers (an internal root node needs only ≥ 2 pointers)

[Figure: a minimal internal node]

[Figure: summary of an internal node, with a concrete example for n = 3]
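The size limits of an internal node follow directly from n. A small Python sketch (the function name `internal_node_bounds` is hypothetical) that derives them, using the fact that an internal node with p pointers holds p − 1 keys:

```python
import math


def internal_node_bounds(n: int) -> dict[str, int]:
    """Size limits of an internal node (other than the root) holding at most n keys."""
    min_pointers = math.ceil((n + 1) / 2)   # ⌈(n+1)/2⌉
    return {
        "max_keys": n,
        "max_pointers": n + 1,
        "min_pointers": min_pointers,
        "min_keys": min_pointers - 1,       # p pointers always come with p-1 keys
    }


print(internal_node_bounds(3))
# {'max_keys': 3, 'max_pointers': 4, 'min_pointers': 2, 'min_keys': 1}
```

For n = 3, a full internal node has 3 keys and 4 pointers, and a minimal one has 1 key and 2 pointers.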

• Structure of a leaf node of a B+-tree

• Structure of a leaf node:

• Properties of a leaf node:

• Each leaf node is stored in one block on disk

• A leaf node contains:

 ≤ n keys
 ≤ n pointers (= database addresses) to data blocks
 1 extra pointer: the last pointer points to the next leaf node (= a disk block !) in the B+-tree

[Figure: a full leaf node]

• Every leaf node (except when the leaf node is the root node) has ≥   ⌈ n/2 ⌉ keys

[Figure: a minimal leaf node]

[Figure: summary of a leaf node, with a concrete example for n = 3]
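The leaf-node limits can be derived the same way. A small Python sketch (the function name `leaf_node_bounds` is hypothetical); note that the next-leaf link is always present, independent of how many keys the leaf holds:

```python
import math


def leaf_node_bounds(n: int) -> dict[str, int]:
    """Size limits of a leaf node (other than the root) holding at most n keys."""
    return {
        "max_keys": n,
        "min_keys": math.ceil(n / 2),   # ⌈n/2⌉
        "max_data_pointers": n,         # one data-block pointer per key
        "next_leaf_pointers": 1,        # the last pointer links to the next leaf
    }


print(leaf_node_bounds(3))
# {'max_keys': 3, 'min_keys': 2, 'max_data_pointers': 3, 'next_leaf_pointers': 1}
```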

• Difference between a B-tree and a B+-tree

• B-tree:

• The leaf nodes in a B-tree are not linked

I.e.:

 The last pointer in a leaf node (which a B+-tree uses as the link to the next leaf) is not used

• In this course

• We will only use B+-trees because:

• Maintaining the links between the leaf nodes takes very little effort

• We need the ability to traverse the records in an ordered manner

 The links between the leaf nodes allow us to traverse the records in an ordered manner !!!!
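A range query shows why the leaf links matter: once the first qualifying leaf is found, the scan just follows the next-leaf pointers and never goes back up the tree. A minimal Python sketch, assuming a hypothetical in-memory `Leaf` class with `keys`, `pointers` (made-up data-block addresses) and `next` (the next-leaf link); locating the starting leaf by a root-to-leaf search is not shown:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Leaf:
    keys: list[int]
    pointers: list[str]             # hypothetical data-block addresses, one per key
    next: Optional[Leaf] = None     # link to the next (sibling) leaf node


def range_scan(start: Leaf, low: int, high: int) -> Iterator[tuple[int, str]]:
    """Yield (key, pointer) pairs with low <= key <= high in ascending key order."""
    leaf: Optional[Leaf] = start
    while leaf is not None:
        for key, ptr in zip(leaf.keys, leaf.pointers):
            if key > high:          # keys are sorted, so nothing further qualifies
                return
            if key >= low:
                yield key, ptr
        leaf = leaf.next            # follow the leaf link to the next leaf


# Three linked leaves (n = 3); the addresses are made up for illustration
c = Leaf([40, 50], ["B7", "B8"])
b = Leaf([20, 30, 35], ["B4", "B5", "B6"], c)
a = Leaf([5, 10], ["B1", "B2"], b)
print(list(range_scan(a, 10, 40)))
# [(10, 'B2'), (20, 'B4'), (30, 'B5'), (35, 'B6'), (40, 'B7')]
```

Without the `next` links (as in a B-tree), the same scan would have to return to an ancestor node to find each following leaf.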

• Therefore:

• Whenever I mention:

 B-tree

I really mean:

 B+-tree !!!

(There are a few places in the notes where I forgot to add the superscript "+")

• Summary: structural requirements of a B+-tree

• The following figure summarizes the structural requirements of a B+-tree:

• Content of an internal node:

 k search key values
 k+1 pointers to other B+-tree nodes

• Content of a leaf node:

 k search key values
 k pointers to data blocks
 1 pointer to the next leaf (sibling) node
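The node contents listed above can be written down as two small record types. A minimal Python sketch, assuming hypothetical class names `InternalNode` and `LeafNode` and plain integers standing in for block addresses; each constructor rejects a node whose pointer count does not match its key count:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class InternalNode:
    keys: list[int]        # k search key values
    children: list[int]    # k+1 pointers (hypothetical block addresses of child nodes)

    def __post_init__(self) -> None:
        if len(self.children) != len(self.keys) + 1:
            raise ValueError("an internal node with k keys needs k+1 child pointers")


@dataclass
class LeafNode:
    keys: list[int]                  # k search key values
    data_blocks: list[int]           # k pointers to data blocks
    next_leaf: Optional[int] = None  # 1 pointer to the next (sibling) leaf node

    def __post_init__(self) -> None:
        if len(self.data_blocks) != len(self.keys):
            raise ValueError("a leaf node with k keys needs k data-block pointers")


root = InternalNode(keys=[20, 40], children=[101, 102, 103])   # 2 keys, 3 pointers
leaf = LeafNode(keys=[20, 30], data_blocks=[7, 9], next_leaf=103)

try:
    LeafNode(keys=[20, 30], data_blocks=[7])
except ValueError as err:
    print(err)   # a leaf node with k keys needs k data-block pointers
```

Upper and lower occupancy limits (which depend on n) are deliberately left out here; only the key/pointer pairing is checked.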

• Properties of the search key values in an internal node of a B+-tree

• Search keys stored in an internal node satisfy the following property:

Explanation:

 All search keys in the left-most (= first) subtree are smaller than (<) k1
 All search keys in the 2nd subtree have values:   k1 ≤ key value < k2
 All search keys in the 3rd subtree have values:   k2 ≤ key value < k3
 And so on...
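During a search, this property tells us which pointer to follow: the chosen subtree is the one after the last key that is ≤ the search key. A minimal Python sketch using the standard `bisect` module (the function name `child_index` is hypothetical):

```python
from bisect import bisect_right


def child_index(keys: list[int], search_key: int) -> int:
    """0-based index of the subtree of an internal node that may hold search_key.

    keys = [k1, k2, ..., kk] in ascending order.  The result is the number of
    keys <= search_key, so search_key < k1 selects the first subtree,
    k1 <= search_key < k2 selects the second subtree, and so on.
    """
    return bisect_right(keys, search_key)


keys = [13, 24, 30]            # k1, k2, k3 of a full internal node with n = 3
for value in (5, 13, 20, 24, 31):
    print(value, "-> subtree", child_index(keys, value) + 1)
```

With these keys, the values 5, 13, 20, 24 and 31 land in subtrees 1, 2, 2, 3 and 4. Using `bisect_right` rather than `bisect_left` matters when the search key equals some ki: the rule ki ≤ key value sends it to the subtree to the right of ki.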

[Figure: example of the search key property in an internal node]

• Search keys and block pointers in a leaf node

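• Search keys and block/record pointers stored in a leaf node satisfy the following property:

 The pointer paired with search key ki is the database address of the block that contains the record with search key ki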

Note that

 A block pointer = a database address of a block
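Inside a leaf, finding the block pointer for a given search key is a binary search over the sorted keys. A minimal Python sketch, assuming the usual pairing in which the i-th block pointer belongs to the i-th key; the function name `leaf_lookup` and the block addresses are hypothetical:

```python
from bisect import bisect_left
from typing import Optional


def leaf_lookup(keys: list[int], block_ptrs: list[int], search_key: int) -> Optional[int]:
    """Return the block pointer paired with search_key, or None if it is absent.

    Assumes keys are sorted and block_ptrs[i] is the (hypothetical) database
    address of the block holding the record with search key keys[i].
    """
    i = bisect_left(keys, search_key)          # binary search in the sorted keys
    if i < len(keys) and keys[i] == search_key:
        return block_ptrs[i]
    return None


print(leaf_lookup([20, 30, 35], [501, 502, 503], 30))  # 502
print(leaf_lookup([20, 30, 35], [501, 502, 503], 25))  # None
```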

• Computing n:   the number of search keys (and pointers) in a node of the B+-tree

• Fact:

 A node (both internal and leaf) of the B+-tree is stored in one disk block

• Consequently:

 We will always pack a disk block with the maximum number of search keys !!!!

• Computing n (the number of search keys in a node):

 ```
 Let:
     Sk = length of a search key (in bytes)
     Sp = length of a pointer (in bytes)

 In 1 node, there are at most:
     n   search keys = n × Sk bytes
     n+1 pointers    = (n+1) × Sp bytes

 Therefore:
     We must find the largest value of n such that
         n × Sk + (n+1) × Sp ≤ BlockSize
 ```

Example:

 ```
 BlockSize = 4096 bytes
 Sk = 4 bytes   (search key)
 Sp = 8 bytes   (pointer)

 Then:
     4n + 8(n+1) ≤ 4096
     12n + 8     ≤ 4096
     12n         ≤ 4088
     n           ≤ 340.666...

 ===> n = 340 search keys in 1 block
 ```
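Solving the inequality for n gives n ≤ (BlockSize − Sp) / (Sk + Sp), so the largest n is an integer floor division. A small Python sketch (the function name `max_keys_per_node` is hypothetical). The same n works for leaf nodes, since a leaf also stores n keys and n+1 pointers (n data-block pointers plus the next-leaf pointer):

```python
def max_keys_per_node(block_size: int, key_size: int, ptr_size: int) -> int:
    """Largest n with n*key_size + (n+1)*ptr_size <= block_size (all sizes in bytes)."""
    # n*Sk + (n+1)*Sp <= BlockSize  <=>  n <= (BlockSize - Sp) / (Sk + Sp)
    return (block_size - ptr_size) // (key_size + ptr_size)


print(max_keys_per_node(4096, 4, 8))   # 340
print(max_keys_per_node(4096, 8, 8))   # 255
```

Check for the first case: 340 keys use 4 × 340 + 8 × 341 = 4088 bytes, while 341 keys would need 4100 bytes, which no longer fits in 4096.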

• Example of a B+-tree

• Example B+-tree (with n = 3, i.e., at most 4 pointers per internal node):

Structural requirements:

• Internal node:

 2   ≤   # pointers   ≤   4

• Leaf node:

 2   ≤   # keys   ≤   3
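These two ranges can be checked recursively over a whole tree. A minimal Python sketch for n = 3, using a small hypothetical tree (not the one from the original figure) and a hypothetical `Node` type; leaf links and key ordering are not checked. For n = 3, ⌈(n+1)/2⌉ = 2 coincides with the root's minimum of 2 pointers, so the same pointer range is applied to the root:

```python
from __future__ import annotations

from dataclasses import dataclass, field

MIN_POINTERS, MAX_POINTERS = 2, 4   # internal node, n = 3
MIN_KEYS, MAX_KEYS = 2, 3           # leaf node, n = 3


@dataclass
class Node:
    keys: list[int]
    children: list[Node] = field(default_factory=list)  # empty for a leaf node


def meets_size_rules(node: Node, is_root: bool = True) -> bool:
    """Check the n = 3 size requirements on node and on every node below it."""
    if not node.children:                       # leaf node
        if len(node.keys) > MAX_KEYS:
            return False
        # a leaf that is also the root may hold fewer than MIN_KEYS keys
        return is_root or len(node.keys) >= MIN_KEYS
    pointers = len(node.children)               # internal node
    if pointers != len(node.keys) + 1 or not MIN_POINTERS <= pointers <= MAX_POINTERS:
        return False
    return all(meets_size_rules(child, is_root=False) for child in node.children)


good = Node([5, 13, 19], [Node([2, 3]), Node([5, 7, 11]), Node([13, 17]), Node([19, 23])])
bad = Node([5], [Node([2]), Node([5, 7])])     # first leaf holds only 1 key
print(meets_size_rules(good), meets_size_rules(bad))  # True False
```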