CS457 Syllabus & Progress

## Intro to finding all keys of a relation R

• Reminder: the importance of the Key

• Recall that the "natural" ("trivial" or "beneficial") functional dependencies arise from the fact that:

 the key of a relation functionally determines all attributes in the relation.

• To determine whether a functional dependency is natural (beneficial, "harmless"), we must therefore first find all keys of the relation.

• Fact:

 The problem of finding all keys of a relation is NP-complete.

• Informally, a problem is called "NP-complete" when no known algorithm solves it exactly in a running time that grows polynomially with the problem size, and it is widely believed that no such algorithm exists.

• The running time of every known algorithm for solving an "NP-complete" problem exactly increases exponentially with the problem size in the worst case.

• In other words, it's not easy to solve large NP-complete problems.

• Some well-known NP-complete problems:

 Traveling Salesman: a salesman starts off in his city and needs to visit a set of cities by airplane. Flights connect cities, but not every pair of cities has a connecting flight. Find a route that allows him to visit every city in his destination set exactly once.

 Scheduling (e.g., class schedules): given a number of teachers who each teach some classes, and a fixed number of rooms and time slots, find an assignment of the teachers to rooms and time slots (constraints: a teacher cannot teach 2 classes at the same time, and one room cannot host 2 classes at once).

Most NP-complete problems look like they can be solved "easily", but looks are very deceptive here.

• A brute force algorithm to find all keys in a relation

• Brute force algorithm to find all keys: check every subset of attributes for the superkey property

```
Assume that R = (A1, A2, ..., An)

SuperKeys := {};

for every possible subset X ⊆ {A1, A2, ..., An} do
{
   if ( X+ == R )
      SuperKeys := SuperKeys ∪ {X};
}

Remove all non-minimal sets from SuperKeys to find all keys
```

• The algorithm is called a brute force algorithm because it just tests every possible solution.
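The pseudocode above can be sketched in Python as follows. This is only a minimal illustration, not the course's reference implementation: the function names `closure` and `find_keys`, the representation of functional dependencies as `(lhs, rhs)` pairs of sets, and the toy relation R(A, B, C) are all assumptions made for this example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the attribute closure X+ of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def find_keys(relation, fds):
    """Brute force: test every subset, then keep only the minimal superkeys."""
    all_attrs = frozenset(relation)
    superkeys = []
    # Enumerate all 2^n subsets, grouped by size.
    for size in range(len(relation) + 1):
        for subset in combinations(relation, size):
            if closure(subset, fds) == all_attrs:
                superkeys.append(frozenset(subset))
    # A key is a superkey with no proper subset that is also a superkey.
    return [k for k in superkeys if not any(other < k for other in superkeys)]

# Hypothetical toy relation R(A, B, C) with A -> B and B -> C.
toy_fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
toy_keys = find_keys("ABC", toy_fds)
print([sorted(k) for k in toy_keys])
```

In the toy relation, A alone determines B and then C, so the superkeys are (A), (A,B), (A,C) and (A,B,C), and the final filter leaves only (A) as a key.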

• The number of subsets of R = (A1, A2, ..., An) is equal to 2^n.

Example: The subsets of R = (A1, A2, A3) are:

```
1. ()             - empty set
2. (A1)
3. (A2)
4. (A3)
5. (A1, A2)
6. (A1, A3)
7. (A2, A3)
8. (A1, A2, A3)
```

• Thus, the running time of the brute force algorithm increases exponentially with the problem size (n).
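To get a feel for this growth, the subset enumeration can be sketched in Python. The helper name `all_subsets` and the sample values of n are assumptions chosen for illustration only.

```python
from itertools import combinations

def all_subsets(attributes):
    """Yield every subset of `attributes` as a tuple, smallest subsets first."""
    for size in range(len(attributes) + 1):
        yield from combinations(attributes, size)

subsets = list(all_subsets(["A1", "A2", "A3"]))
print(len(subsets))  # 8, i.e. 2^3

# Number of subsets the brute force algorithm must test for larger relations.
for n in (3, 6, 10, 20, 30):
    print(n, 2 ** n)
```

Each additional attribute doubles the number of subsets to test, so a relation with 30 attributes already has more than a billion candidate subsets.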

• Example: Find all keys - with the brute force algorithm

• Problem:

Find all keys in the following relation:

```
R(A, B, C, D, E, F)

ℱ = { A  → BC
      BD → EF
      F  → A
    }
```

• Brute force:

The subsets of R(A, B, C, D, E, F) are (2^6 = 64 in total, grouped by size):

```
()

(A) (B) (C) (D) (E) (F)

(A,B) (A,C) (A,D) (A,E) (A,F)
(B,C) (B,D) (B,E) (B,F)
(C,D) (C,E) (C,F)
(D,E) (D,F)
(E,F)

(A,B,C) (A,B,D) (A,B,E) (A,B,F) (A,C,D)
(A,C,E) (A,C,F) (A,D,E) (A,D,F) (A,E,F)
(B,C,D) (B,C,E) (B,C,F) (B,D,E) (B,D,F)
(B,E,F) (C,D,E) (C,D,F) (C,E,F) (D,E,F)

(A,B,C,D) (A,B,C,E) (A,B,C,F) (A,B,D,E) (A,B,D,F)
(A,B,E,F) (A,C,D,E) (A,C,D,F) (A,C,E,F) (A,D,E,F)
(B,C,D,E) (B,C,D,F) (B,C,E,F) (B,D,E,F) (C,D,E,F)

(A,B,C,D,E) (A,B,C,D,F) (A,B,C,E,F)
(A,B,D,E,F) (A,C,D,E,F) (B,C,D,E,F)

(A,B,C,D,E,F)
```

Compute the closure set for every subset:

 A+  = A, B, C
 B+  = B
 C+  = C
 D+  = D
 E+  = E
 F+  = F, A, B, C
 AB+ = A, B, C
 AC+ = A, C, B
 AD+ = A, D, B, C, E, F      KEY !!!
 AE+ = A, E, B, C
 AF+ = A, F, B, C
 BC+ = ...
 BD+ = ...
 BE+ = ...
 BF+ = ...
 CD+ = ...

 And so on... (no human is crazy enough to check every possible subset by hand)
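Since checking all 64 subsets by hand is impractical, the example can be handed to a short program instead. The following Python sketch is only an illustration under stated assumptions: the names `R`, `FDS` and `closure` are chosen for this example, attributes are single characters, and instead of removing non-minimal sets at the end it skips any subset that contains a key found earlier (equivalent here, because subsets are visited in order of increasing size).

```python
from itertools import combinations

R = "ABCDEF"
# The functional dependencies of the example: A -> BC, BD -> EF, F -> A.
FDS = [("A", "BC"), ("BD", "EF"), ("F", "A")]

def closure(attrs, fds=FDS):
    """Return the closure X+ of the attribute collection `attrs` as a set."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever all of lhs is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

keys = []
for size in range(1, len(R) + 1):
    for subset in combinations(R, size):
        # A subset containing an already found key is a superkey, but not minimal.
        if any(set(k) <= set(subset) for k in keys):
            continue
        if closure(subset) == set(R):
            keys.append("".join(subset))

print(keys)  # ['AD', 'BD', 'DF']
```

The output agrees with the hand computation above, where AD+ covers all attributes, and it shows that BD and DF are keys as well.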