CS457 Syllabus & Progress

## Decomposition and its effect on a relation

• Decomposition

• A decomposition of a relation R is

• Decomposition = a collection of relations R1, R2, ..., Rn, such that:

 Each attribute in R1, R2, ..., Rn is found in R and Every attribute of R appears in R1, R2, ..., Rn at least once

In other words:

 A decomposition will not lose any attributes from the original relation

• Fact:

• There are "good" and "bad" decompositions

• We wee soon see that:

 A "bad" decompositions will introduce extraneous information into the database !!! A "good" decompositions will preserve the content of the original relation

• The effect of a decomposition

• Consider the following content of the relation Employee1:

SSN FName LName PNumber PName Hours
111-11-1111 John Smith pj1 DBApplet 20
111-11-1111 John Smith pj2 WebServer 10
111-22-3333 Jane Doe pj1 DBApplet 5

The content of the relation Employee1 convey the following facts:

• There are 2 employees:

 (111-11-1111, John, Smith)         (111-22-3333, Jane, Doe)

• There are 2 projects:

 (pj1, DBApplet)          (pj2, WebServer)

• There are 3 "works_on" information items:

 (111-11-1111, pj1, 20), (111-11-1111, pj2, 10) (111-22-3333, pj1, 5)

• When we decompose (or break up) Employee1(SSN, FName, LName, PNumber, PName, Hours) into:

 ``` Employee1(SSN, FName, LName, PNumber, PName, Hours) into: R1 (SSN, FName, LName) R2 (PNumber, PName, Hours) ```

we must also:

 Populate the relations R1 and R2 using the data in the relation Employee1

• This is achieved by using a projection operation:

 ``` R1 = πSSN, FName, LName(Employee1) R2 = πPNumber, PName, Hours(Employee1) ```

• Original content of relation Employee1:

SSN FName LName PNumber PName Hours
111-11-1111 John Smith pj1 DBApplet 20
111-11-1111 John Smith pj2 WebServer 10
111-22-3333 Jane Doe pj1 DBApplet 5

• Content of the decomposed relations R1 and R2 :

R1 = πSSN, FName, LName(Employee1)
SSN FName LName
111-11-1111 John Smith
111-22-3333 Jane Doe
R2 = πPNumber, PName, Hours(Employee1)
PNumber PName Hours
pj1 DBApplet 20
pj2 WebServer 10
pj1 DBApplet 5

• NOTE:

 After the decomposition of Employee1 into R1 and R2, the relation Employee1 is deleted !!! (We don't keep the original relation after decomposition !!!)

• An important question to ask is:

 Can we obtain the same information content (stored in the original relation Employee1) using the decomposed relations R1 and R2 ? (If not, we will be in deep trouble)...

• Re-constructing the original content of a relation

• The reconstruction algorithm used is as follows:

 ``` if ( R1 ∩ R2 != ∅ ) { reconstruction = R1 * R2 // Natural join } else { reconstruction = R1 × R2 // Cartesian product } ```

• Example:
R1
SSN FName LName
111-11-1111 John Smith
111-22-3333 Jane Doe
× R2
PNumber PName Hours
pj1 DBApplet 20
pj2 WebServer 10
pj1 DBApplet 5
=

Content obtained by reconstruction:
SSN FName LName PNumber PName Hours
111-11-1111 John Smith pj1 DBApplet 20
111-11-1111 John Smith pj2 WebServer 10
111-11-1111 John Smith pj1 DBApplet 5
111-22-3333 Jane Doe pj1 DBApplet 20
111-22-3333 Jane Doe pj2 WebServer 10
111-22-3333 Jane Doe pj1 DBApplet 5

• NOTE:

 We performed a cartesian product R1 × R2) on R1 and R2 because they do not have any attributes in common !

Compare the content of the reconstruction to the content of the original Employee1 relation:

Original content of relation Employee1:
SSN FName LName PNumber PName Hours
111-11-1111 John Smith pj1 DBApplet 20
111-11-1111 John Smith pj2 WebServer 10
111-22-3333 Jane Doe pj1 DBApplet 5

• Observation:

• The good:

• We are able to obtain every tuple that was in the original relation :

SSN FName LName PNumber PName Hours
111-11-1111 John Smith pj1 DBApplet 20
111-11-1111 John Smith pj2 WebServer 10
111-22-3333 Jane Doe pj1 DBApplet 5

• There are extraneous tuples that were not present in the original relation

Reconstruction: (extraneous tuples in red)
SSN FName LName PNumber PName Hours
111-11-1111 John Smith pj1 DBApplet 20
111-11-1111 John Smith pj2 WebServer 10
111-11-1111 John Smith pj1 DBApplet 5
111-22-3333 Jane Doe pj1 DBApplet 20
111-22-3333 Jane Doe pj2 WebServer 10
111-22-3333 Jane Doe pj1 DBApplet 5

• Requirements of a decomposition:

• After decompositing a relation R:

 We must be able to obtain all tuples in the original relation R using the reconstruction algorithm We must not obtain extraneous tuples that were not present in the original relation R using the reconstruction algorithm

• Note:

• If we miss some tuple in the reconstruction, it means that:

 We have lose information

Clearly that is unacceptable !!!

• If we gain some (extraneous) tuple in the reconstruction, it means that:

 We have some invalid information in the relation (= database) !!!

That is also inacceptable !!!

• Example of a "good" decomposition...

• Consider: Employee1

• We decompose Employee1 into:

 ``` Employee1(SSN, FName, LName, PNumber, PName, Hours) into: R3 (SSN, FName, LName) R4 (SSN, PNumber, PName, Hours) ```

• Result of the decomposition:

Original relation: Employee1
SSN FName LName PNumber PName Hours
111-11-1111 John Smith pj1 DBApplet 20
111-11-1111 John Smith pj2 WebServer 10
111-22-3333 Jane Doe pj1 DBApplet 5

Decomposition:

R3
SSN FName LName
111-11-1111 John Smith
111-22-3333 Jane Doe
R4
SSN PNumber PName Hours
111-11-1111 pj1 DBApplet 20
111-11-1111 pj2 WebServer 10
111-22-3333 pj1 DBApplet 5

• Reconstruction:

R3
SSN FName LName
111-11-1111 John Smith
111-22-3333 Jane Doe
* R4
SSN PNumber PName Hours
111-11-1111 pj1 DBApplet 20
111-11-1111 pj2 WebServer 10
111-22-3333 pj1 DBApplet 5
=

SSN FName LName PNumber PName Hours
111-11-1111 John Smith pj1 DBApplet 20
111-11-1111 John Smith pj2 WebServer 10
111-22-3333 Jane Doe pj1 DBApplet 5

• The reconstructed content is the same as the original content !!!

I.e., we have perserve the exact content - no more and no less

• Lossless decomposition

• A decomposition of a relation R into 2 relations R1 and R2 is called lossless if and only if:

 content(R1) * content(R2)   =   content(R)

where * is changed to a cartesian product if R1 ∩ R2 = ∅

• In plain English:

• A decomposition of a relation R into 2 relations R1 and R2 is lossless iff:

 We will obtain the original set of tuples exactly (= not more and not less) by reconstruction using the decomposed relations

• Definition:

 A decomposition that is not lessless is called:    lossy

• Ensuring that a decomposition is lessless

• \$64,000 question:

 How can we guarantee that a decomposition will be lossless ???