CS457 Syllabus & Progress

## Set (or Aggregate) Functions

Set functions operates on a set of values and computes one single output value. They are used to formulate "group conditions" - conditions on a set of tuples.

• Aggregate or Set Functions

• Aggregate or Set functions are introduced to relational algebra to increase its expressive power.

• An aggregate function operates on a set of values (tuples) and computes one single value as output.

• Schematically:

• The Set Functions in Relational Algebra

• sum(): computes the sum of all values in the (numeric) set
• avg(): computes the average of all values in the (numeric) set
• max(): finds the maximum value of all values in the set
• min(): finds the minimum value of all values in the set
• any(): returns TRUE if set is not empty, otherwise (i.e., empty set), returns FALSE
• count(): returns the cardinaility (number of elements) in the set

• Forming groups using attribute values

• Set functions must be applied to a group/set of value

(The set may contain zero or one value, it is nevertheless a set )

• The SAME (set) function is usually applied to multiple sets/groups :

• For each group , the set function will produce EXACTLY one output value

(Therefore, since there are multiple groups , you will get multiple values or a group/set of values )

• Groups are formed based on common values in one or more attributes

• Example: grouping employee tuples based on their dno attribute:

• Set functions used on groups formed on attribute values

• We saw that tuples in a relation can be grouped based on one or more attribute values

• One or more set functions can now be applied to these groups

• Example:

Find the average salary for each department

We must first form groups of employee tuples based on their DNO attribute (grouped by department number)

And then compute the average Salary :

• The Result SET is:

 ``` DNO Avg +-----------+------------+ | 4 | 45000 | +-----------+------------+ | 5 | 55000 | +-----------+------------+ ```

• NOTE:

1. The result is always a RELATION (i.e., a set of tuples !!!)

2. The attributes of the result relation consist of:

• The grouping attributes !!!
• The set function

The grouping attributes are necessary, otherwise, you cannot tell much from the result, e.g.:

 ``` Avg +------------+ | 45000 | +------------+ | 55000 | +------------+ ```

You can't tell from this result what the average salary is of each department !

• Example:

Compare (find) the average salary for male and female employees

We must first form groups of employee tuples based on their "sex" attribute

And then compute the average Salary :

• The Result SET is:

 ``` sex Avg +-----------+------------+ | M | 50000 | +-----------+------------+ | F | 50000 | +-----------+------------+ ```

• Forming groups using multiple attributes

• You can form groups using more than one attributes

In that case, you will have "finer" (more smaller) groups.

• Example:

Find the average salary of males and females for each department

We must first form groups of employee tuples based on their DNO and sex attributes (grouped by department number and sex)

And then compute the average Salary :

• Group 1 is the male employees in department 4
• Group 2 is the male employees in department 5
• Group 3 is the female employees in department 5

• The Result SET is:

 ``` DNO sex Avg +--------+-----------+------------+ | 4 | M | 45000 | +--------+-----------+------------+ | 5 | M | 60000 | +--------+-----------+------------+ | 5 | F | 50000 | +--------+-----------+------------+ ```

• Notice that:

1. the set/group of female employees in department 4 is empty

2. The result set is NOT equal to:

 ``` DNO sex Avg +--------+-----------+------------+ | 4 | M | 45000 | +--------+-----------+------------+ | 4 | F | 0 | <---- NOT the result !!! +--------+-----------+------------+ | 5 | M | 60000 | +--------+-----------+------------+ | 5 | F | 50000 | +--------+-----------+------------+ ```

Keep this in mind when you do queries !!!

• Applying multiple (set) functions

• You can apply more than one (set) function on the groups

• Each set function will produce one value for each group

• So, you will get a "vector" (or array) of function values

• Example:

Find the average salary and average number of employees for males and females for each department

We must first form groups of employee tuples based on their DNO and sex attributes (grouped by department number and sex)

And then compute average Salary and count the SSN of employees:

• The Result SET is:

 ``` DNO sex Avg Count +--------+-----------+------------+------+ | 4 | M | 45000 | 2 | +--------+-----------+------------+------+ | 5 | M | 60000 | 1 | +--------+-----------+------------+------+ | 5 | F | 50000 | 1 | +--------+-----------+------------+------+ ```

• Set function: Formal Notation

• The operation where:

• a set of tuples (i.e., a relation) is first separated into different groups based on one or more attributes

• and then applying one or more set functions on each group

is denoted as follows:

• The output of the set function is a relation containing the following attributes:

• The content (i.e., tuples ) of the relation consists of:

• all distinct values of the grouping attributes attr1, attr2, ..., attrN.

• and the function values f1(a1), f2(a2), ..., fM(aM) on the corresponding group.

• Example:

• Notice that the output is indeed a set of tuples - i.e., a relation.

• Notice that there is NO tuple with attribute values (DNO=4, Sex='F') when this group (set) is empty !!!

• The attribute name for the function values is named after the function name.

• Renaming Attributes in a result relation

• Often -- for clarity reasons -- one or more attribute in the result relation are renamed

• The attribute in the result relation are given new names by sepcifying explicit names for the attributes when the result is computed

• Example:

• Applying (set) functions without any grouping attributes

• When there are no grouping attributes specified in a set function, then the set function is applied on ONE group that consists of all tuples of the relation (i.e., all tuples in the relation are in one single group).

• The most important fact to remember in this case is:

 Although the result set will consist of ONE single tuple (row) , the output of the set function is still a relation It is a relation with one single tuple in it !!!

• Example:

determent the average salary of all employees in the company and the total number of employees in the company:

• Enough Theory, show me the applications....

• That's all the theory on relational algebra.
• We will next do some examples to help you develop skills to formulate queries in relation algebra - these skills will help you formulate queries in SQL.