CS457 Syllabus & Progress

## SQL: Set Functions

• Set Functions

• Set Functions compute an aggregate value from a set of values

• The set functions available in SQL are:

 SUM( ): sums a set of values AVG( ): takes the average of a set of values MAX( ): finds the maximum value of a set of values MIN( ): finds the minimum value of a set of values COUNT( ): returns the number of elements in a set (duplicates are counted multiple times !!!) COUNT(DISTINCT ...): returns the number of distinct elements in a set ( duplicates are counted ONCE )

• Applying set function on a set of values

• Fact:

 SELECT returns a set of tuples

Therefore:

 We can therefore apply one or more set functions on the result (set) of the SELECT command.

• Syntax:

 ``` SELECT SetFunction( attribute ) ..... ```

Meaning:

 Apply the set function SetFunction on the selected set of tuples returned by the SELECT command

• Example:

 ``` SELECT salary FROM employee Result (set): SALARY ---------- 30000 40000 25000 43000 38000 25000 25000 55000 Apply set functions on the result set: SELECT SUM(salary), AVG(salary), MAX(salary), MIN(salary), COUNT(salary), COUNT( DISTINCT salary ) FROM employee Result: SUM(SALARY) AVG(SALARY) MAX(SALARY) MIN(SALARY) COUNT(SALARY) COUNT(DISTINCT salary) ----------- ----------- ----------- ----------- ------------- ---------------------- 281000 35125 55000 25000 8 6 ```

• The query returns the total salary, the average salary, the max salary, the min salary and the number of salary values of all employees.

• Note:

 COUNT( ) will count duplicate values as multiple times To count duplicate value once, use COUNT(DISTINCT ...)

• Using the result of set functions in queries

• Query:

 Find fname, lname of employees who earn more than the average salary level in the company

• Solution:

 ``` select avg(salary) from employee AVG(SALARY) ----------- 35125 select fname, lname, salary from employee where salary > 35125 FNAME LNAME SALARY ---------- ---------- ---------- Frank Wong 40000 Jack Wallace 43000 John Doe 38000 James Borg 55000 select fname, lname, salary from employee where salary > ALL (select avg(salary) from employee) FNAME LNAME SALARY ---------- ---------- ---------- Frank Wong 40000 Jack Wallace 43000 John Doe 38000 James Borg 55000 ```

Note:

• When a sub-query returns a single value, then:

 ``` > ALL (select .... ) Can be shorted to: > (select .... ) ```

• Therefore, we can also write:

 ``` select fname, lname, salary from employee where salary > (select avg(salary) from employee) ```

using the fact that when the result set of the SELECT command consists of one (1) tuple, the use of ALL (or ANY) can be omitted.

• The target set of tuples used by Set Functions

• Important note:

 The output of a set function depends on the set of selected tuples (I call this the "target" set) selected by the WHERE clase of the SELECT command

• Example:

 ``` select salary from employee where dno=4 SALARY ---------- 25000 43000 25000 select max(salary) from employee where dno=4 MAX(SALARY) ----------- 43000 select avg(salary) from employee where dno=4 AVG(SALARY) ----------- 31000 ```

• Therefore:

 How well you can use set functions will depend very much on how well you can formulate the where clause that select the correct set of tuples !!!!

• Example Queries with Set Function

• Query 1:

 Find the total salary and the average salary of the salary paid to employees in the "Research" department.

• Solution:

 ``` First, find the set of Salaries for the employees in the "Research" department: SELECT salary FROM employee, department WHERE dno = dnumber AND dname = 'Research' Then find the max and avg of the set of values: SELECT SUM(salary), AVG(salary) FROM employee, department WHERE dno = dnumber AND dname = 'Research' ```

• Query 2:

 Find the fname and lname of the employee (in any department) that earns more than the average salary in the (whole) company

• Solution:

 ``` This query finds the average salary paid in the company: SELECT AVG(salary) FROM employee This query finds the employee with salary greater than the avergae salary: SELECT fname, lname FROM employee WHERE salary > ALL (SELECT AVG(salary) FROM employee) ```

However, because:

 ``` SELECT AVG(salary) FROM employee ```

returns one single value, we can omit the ALL keyword:

 ``` SELECT fname, lname FROM employee WHERE salary > (SELECT AVG(salary) FROM employee) ```

• Query 3:

 Find the fname and lname of the employee in the 'Research' department that earns more than the average salary within the 'Research' department

• Wrong solution:

 ``` The following query finds the average salary paid to employees in the 'Research' department: SELECT AVG(salary) FROM employee, department WHERE dno = dnumber AND dname = 'Research' WRONG solution: SELECT fname, lname FROM employee WHERE salary > (SELECT AVG(salary) FROM employee, department WHERE dno = dnumber AND dname = 'Research') ```

Reason: :

• The employee whose salary is greater than the average salary of the Research department, may:

 not work for the Research department !!!

• Correct solution:

 ``` SELECT fname, lname FROM employee, department WHERE dno = dnumber /* The employee must work for */ AND dname = 'Research' /* the Research department !! */ AND salary > (SELECT AVG(salary) FROM employee, department WHERE dno = dnumber AND dname = 'Research') ```

• Renaming the selected set function attribute

• Names given to set functions attributes:

 ``` select max(salary), avg(salary), sum(salary) from employee max(salary) avg(salary) sum(salary) ----------- ------------- ------------------------------- 55000.00 33875.000000 271000.00 ```

• You can rename the selected attributes with as alias clause inside the SELECT clause:

 ``` select max(salary) Max, avg(salary) Average, sum(salary) from employee Max Average sum(salary) --------- ------------- ------------------------------- 55000.00 33875.000000 271000.00 ```