### CS554 - Homework 7

• Question 1 (30 pts)

• Consider the join of relations R(a,b), S(b,c), T(c,d), and U(a,d), where:

 R(a,b) S(b,c) T(c,d) U(d,a) ================================================================= a V(R,a) = 100 V(U,a) = 100 b V(R,b) = 100 V(S,b) = 100 c V(S,c) = 10 V(T,c) = 20 d V(T,d) = 100 V(U,d) = 150 ================================================================= T(R) = 1000 T(S) = 100 T(T) = 100 T(U) = 1000

Questions:

• Use the dynamic programming algorithm to find the best ordering of the join operations by filling out the following table:

 Relations Size of join result Cost (min) Join order --------------+----------------------+-------------+---------------- R | | | --------------+----------------------+-------------+---------------- S | | | --------------+----------------------+-------------+---------------- T | | | --------------+----------------------+-------------+---------------- U | | | --------------+----------------------+-------------+---------------- R,S | | | --------------+----------------------+-------------+---------------- R,T | | | --------------+----------------------+-------------+---------------- R,U | | | --------------+----------------------+-------------+---------------- S,T | | | --------------+----------------------+-------------+---------------- S,U | | | --------------+----------------------+-------------+---------------- T,U | | | --------------+----------------------+-------------+---------------- | | | --------------+----------------------+-------------+---------------- | | | --------------+----------------------+-------------+---------------- | | | --------------+----------------------+-------------+---------------- | | | --------------+----------------------+-------------+---------------- | | | --------------+----------------------+-------------+---------------- | | | --------------+----------------------+-------------+---------------- | | | --------------+----------------------+-------------+----------------

• Question 2 (10 pts)

• Use the same information on the relations as Question 1

• Use the greedy algorithm to find the best join ordering by filling out the following table:

 Number of relations Best join order Cost ---------------------+--------------------+----------------- 2 | | ---------------------+--------------------+----------------- 3 | | ---------------------+--------------------+----------------- 4 | | ---------------------+--------------------+-----------------

• Question 3 (20 pts)

• Consider a relation R(a,b,c,d) that has a clustering index on a and nonclustering indexes on each of the other attributes.

The relevant parameters are:

 B(R ) = 1000, T(R ) = 5000, V(R,a) = 20, V(R,b) = 1000, V(R,c) = 5000, and            V(R,d) = 500.

Give the best query plan (index-scan or table-scan followed by a filter step)

Give the disk-I/O cost for each of the following selections:

1. σ a=1 ∧ b=2 ∧ d=3 (R)

 Best Plan: IO cost of plan: __________________________

2. σ a=1 ∧ b=2 ∧ c≥3 (R)

 Best Plan: IO cost of plan: __________________________

3. σ a=1 ∧ b≤2 ∧ c≥3 (R)

 Best Plan: IO cost of plan: __________________________

• Question 4 (Exercise 16.7.2 in text book) (10 pts)

• In terms of B(R), T(R), V(R,x), and V(R,y), express the following conditions about the cost of implementing a selection (σ) on R:

• It is better to use index-scan with a nonclustering index on x and a term that equates x to a constant than a nonclustering index on y and a term that equates y to a constant.

In other words:

 Give a condition (expressed using B(R), T(R), V(R,x), and/or V(R,y)) that states when is it better to use index-scan for σx=constant with a non-clustering index on x than to use index-scan for σy=constant with a non-clustering index on y

 Condition:

• It is better to use index-scan with a nonclustering index on x and a term that equates x to a constant than a clustering index on y and a term that equates y to a constant.

In other words:

 Give a condition (expressed using B(R), T(R), V(R,x), and/or V(R,y)) that states when is it better to use index-scan for σx=constant with a non-clustering index on x than to use index-scan for σy=constant with a clustering index on y

 Condition:

• It is better to use index-scan with a nonclustering index on x and a term that equates x to a constant than a clustering index on y and a term of the form y > C for some constant C.

In other words:

 Give a condition (expressed using B(R), T(R), V(R,x), and/or V(R,y)) that states when is it better to use index-scan for σx=constant with a non-clustering index on x than to use index-scan for σy>constant with a clustering index on y

 Condition:

• Question 5 (20 pts)

• Consider the following logical query plan:

where:

 ⋈1 is implemented using a 1-pass join algorithm ⋈2 is implemented using a 2-pass hashing-based join algorithm

The result of 1 is pipelined to 2

Question:

• Show that the minimum # buffers needed to perform this operation is:

 B(R) + sqrt( B(R)2 + 4 B(R⋈S) ) M ≥ ---------------------------------- + 1 2

(Use a separate sheet of paper to write down your proof -- because I don't know how much space you will need)

• Question 6 (10 pts)

• Consider the following logical query plan:

where:

 R(a,b) S(b,c) T(c,d) U(d,e) ================================================================= b V(R,b) = 200 V(S,b) = 100 c V(S,c) = 200 V(T,c) = 100 d V(T,d) = 200 V(U,d) = 150 ================================================================= B(R) = 1000 B(S) = 1000 B(T) = 1000 B(U) = 1000

We wish to use the 1-pass join algorithm for all join operations and use pipeline to pass the results

Question:

 What is the minimum # buffers required for this execution ?            Show how you arrive at your answer.

(Use a separate sheet of paper to write down your derivation -- because I don't know how much space you will need)