• Ekta Aggarwal

Aggregate functions and GROUP BY in SQL

There are 5 aggregate functions in SQL which help us in retrieving certain statistics about our data. They are:

count : Provides number of non-NULL values in a particular column

min: Provides minimum of the values in a particular column

max : Provides maximum of the values in a particular column

avg: Provides average or mean of non-NULL values

sum: Provides total of non-NULL values


Dataset:

For this tutorial we shall make use of employee_performance:

CREATE TABLE employee_performance(
employee_id int,
department varchar(20),
education_level varchar(20) ,
gender char(1),
no_of_trainings int,
age int,
previous_year_rating numeric(2,1),
length_of_service int,
KPIs_met char(1),
avg_training_score numeric(5,2),
promoted_or_not varchar(3))

INSERT INTO
employee_performance(employee_id,department,education_level,gender,no_of_trainings,age,previous_year_rating,length_of_service,KPIs_met,avg_training_score,promoted_or_not)
VALUES
(1001,'Marketing','Graduate','M',2,24,NULL,1,'Y',69.5,'N'),
(1002,'Analytics','Post-Graduate','M',5,32,4.5,7,'Y',89.5,'Y'),
(1003,'R&D','Graduate','F',5,44,5,10,'Y',87,'Y'),
(1004,'HR','Graduate','M',1,32,3,3,'Y',54,'N'),
(1005,'Marketing','PhD','M',0,38,4,7,'N',79.5,'N'),
(1006,'IT','Graduate','F',2,23,NULL,0,'Y',83.5,'N'),
(1007,'Analytics','Post-Graduate','M',2,28,5,4,'Y',78.5,'Y'),
(1008,'Sales','Graduate','F',5,22,NULL,0,'Y',69.5,'N'),
(1009,'Marketing','PhD','F',2,48,3,7,'Y',98,'Y'),
(10010,'Marketing','PhD','M',10,39,5,4,'Y',71,'N'),
(10011,'Finance','Post-Graduate','M',3,35,3.5,10,'N',55,'N'),
(10012,'Sales','Post-Graduate','F',2,55,4,19,'Y',87,'Y'),
(10013,'HR','Graduate','M',0,48,2.5,21,'Y',72.5,'N'),
(10014,'IT','Graduate','F',2,24,3.5,12,'Y',64,'N');

Our dataset looks as follows:

SELECT * FROM employee_performance;

Getting the statistics from our data:

Using the following SQL query we can ascertain output of various aggregate functions in SQL.

We have renamed the resultant output by providing aliases using 'as' keyword.

SELECT sum(no_of_trainings) as total_trainings_attended, 
count(employee_id) as employee_strength,
avg(avg_training_score) as avg_training_score,
min(age) as min_age,max(age) as max_age
FROM employee_performance;

COUNT and COUNT DISTINCT

COUNT function provides the number of non-NULL values in a column.

SELECT count(department)
FROM employee_performance

In our data we have 7 unique or distinct departments, thus following query results the number of unique departments.

SELECT count(DISTINCT department)
FROM employee_performance

Following query results in counting the number of non-NULL rows in the dataset:

SELECT count(1) FROM employee_performance

Aggregation using GROUP BY:

Task: Calculate various aggregate metrics only for Sales department. Since we need information only about Sales department thus using WHERE keyword we filter the rows for Sales department and then the aggregation is done.

SELECT sum(no_of_trainings) as total_trainings_attended,count(employee_id) as 
employee_strength, 
avg(avg_training_score) as avg_training_score,
min(age) as min_age,max(age) as max_age 
FROM employee_performance 
WHERE department = 'Sales';

Task: Calculate the aggregate metrics for each department. For this, we shall write a GROUP BY statement and we define the variable(s) on which grouping needs to be executed. Note: In SELECT statement you need to define department name, otherwise it won't be possible to know which row maps to which department.

SELECT department,sum(no_of_trainings) as total_trainings_attended, count(employee_id)
as employee_strength,
avg(avg_training_score) as avg_training_score,
min(age) as min_age,max(age) as max_age
FROM employee_performance
GROUP BY department;

WHERE and GROUP BY together:

Task: Calculate the summary metrics for each department, considering only female employees. To achieve this, we write a WHERE statement and filter for rows having female employees and then we define the GROUP BY clause.

SELECT department,sum(no_of_trainings) as total_trainings_attended, count(employee_id)
as employee_strengthbr>avg(avg_training_score) as avg_training_scorebr>min(age) as min_age,max(age) as max_age
FROM employee_performance
WHERE gender  = 'F'
GROUP BY department;

Grouping by multiple criteria


Task: Calculate the summary metrics for each department and education level considering only female employees. To aggregate by multiple levels, we can define the multiple variables in our GROUP BY statement.

SELECT department,education_level,sum(no_of_trainings) as total_trainings_attended, count(employee_id)
as employee_strengthbr>avg(avg_training_score) as avg_training_scorebr>min(age) as min_age,max(age) as max_age
FROM employee_performance
WHERE gender  = 'F'
GROUP BY department,education_level;


Tags: