### Choosing a join order for R ⋈ S

Slideshow:

Does is matter if we perform R ⋈ S vs S ⋈ R ?

The theory says:

Note that the theory only says:   the result set of R ⋈ S is equal to that of S ⋈ R

Does is matter if we perform R ⋈ S vs S ⋈ R ?

In practice:

Does is matter if we perform R ⋈ S vs S ⋈ R ? - Example

Does is matter if we perform R ⋈ S vs S ⋈ R ? - Example

Consequence of the asymmetric behavior of the implementation algorithm(s):

Although:

Terminology: build relation and scanned relation in the one-pass algorithm

One-pass join algorithm:

Terminology: build relation and scanned relation in the one-pass algorithm

One-pass join algorithm:

Terminology:

Terminology: first relation and second relation in the nested-loop algorithm

Nested-loop join algorithm:

Terminology: first relation and second relation in the nested-loop algorithm

Nested-loop join algorithm:

Terminology:

Choosing the join ordering for R ⋈ S

Choosing the join ordering for R ⋈ S - Example

Query:

Optimized query plan:

Choosing the join ordering for R ⋈ S - Example

Query:

After choosing the best join ordering:

(We always put the first relation at the left branch of the join tree)

• Theory vs. Implementation....

• Theory:

• The join operator is (in theory) commutative: (i.e.: symmetric in behavior)

 ``` R ⋈ S = S ⋈ R ```

• Practice:

• The algorithms that implement the join operation are:

 Asymmteric !!!

I.e.:

 The role of the first input relation is different from the role of the second input relation

• Example: the one-pass join algorithm:

Roles:

 First relation is indexed and searched Second relation is scanned

• Cost of join algorithms:

• The cost (= # disk IOs) of the execution of:

 ``` R ⋈ S and S ⋈ R ```

are different !!!

However:

• The output (= result) of the execution of:

 ``` R ⋈ S and S ⋈ R ```

are equal (same) !!!

• Nomenclature used in the various join operation....

• One-pass join algorithm:

• R ⋈ S:

 The first relation (R) in the join expresion is used to build an index. The second relation (S) in the join expresion is scanned

Graphically:

• Nomenclature:

 R is called the build relation (We build an index on this relation) S is called the scanned relation (We scan this relation)

• Therefore, in the one-pass join algorithm:

 The smaller relation is the build relation The larger relation is the scan relation

• Nested loop join algorithm:

• R ⋈ S:

 The first relation (R) in the join expresion is used in the outer loop The second relation (S) in the join expresion is used in the inner loop

In other words:

 ``` while ( R ≠ empty ) // Outer loop { Read M - 1 blocks of R and organize them into a search structure (e.g., hash table)' Rewind R; while ( S ≠ empty ) // Inner loop { Read 1 block (b) of S; for ( each tuple t ∈ b ) do { Find the tuples r1, r2, ... of R (in the search structure) that join with t Output (r1,t), (r2,t), ... } } } ```

Graphically:

• Terminology used in the nested-loop join algorithm:

 The outer relation is the first relation The inner relation is the second relation

• Index join algorithm:

• R ⋈ S:

 The second relation (S) must have an index on the join attributes

• Choosing a join order for R ⋈ S

• Suppose you want to compute the join the relations R and S and there are no constraints on your choices.

Then you have 2 ways to compute the join:

 R ⋈ S           and           S ⋈ R

• Without any other factors to help you choose, the (default) choosing strategy is as follows:

 ``` If ( T(R) ≤ T(S) ) then: choose R ⋈ S else: choose S ⋈ R ```

I.e.: choose the smaller relation as the first relation

• Example:

• Query:

 ``` SELECT movieTitle FROM StarsIn, MovieStar WHERE starName = name AND birthdateLIKE '%1960' ```

• Query plan:

The join-tree is actually like this:

• If the second input relation ( σbirthdate LIKE '%1960'(MovieStar)) is smaller than the first relation (StartsIn), we want to switch the order to: