SQL: Window Functions
SQL: Window Functions
Introduction
Firstly we excute the following SQL statements and create a table Payroll
in Postgresql:
1 | CREATE DATABASE WindowFunctions; |
Now we have a table payroll
in the database windowfunctions
:
Then we can execute the following SQL:
1 | SELECT |
And the output is:
Now we want to “combine” the result of the above SQL and SELECT * FROM Payroll
, to get the following table:
The traditional way is to use JOIN
:
1 | SELECT |
And a more simple way is to use window functions
:
1 | SELECT |
That is why we use window functions, which can simplify our SQL statements.
Basic Syntax of Window Functions
We use an OVER
clause to open the windows, and ( )
is the window, i.e., a set of rows. That is, if a function is followed by OVER()
, we can treat it as a window function:
1 | window_function() OVER() --> () is the window |
Common Window Functions
Aggregate Functions
Aggregate functions, i.e., COUNT, SUM, MIN, MAX and AVG, can be used as window functions. For example:
1 | SELECT |
One thing worth noting is that windows are affected by the WHERE
clause, that is, windows cannot work outside our base data set. For example:
1 | SELECT |
As shown, the total_employee_count
is 7 rather than 18, which means that we firstly execute the statement SELECT * FROM Payroll WHERE department_name = 'IT'
, then open the windows, then ORDER BY employee_id
.
ROW_NUMBER() & RANK() & DENSE_RANK()
ROW_NUMBER()
, RANK()
, and DENSE_RANK()
functions assign an integer to each row based on its order in its result set.
ROW_NUMBER()
Here is the the first example of ROW_NUMBER()
:
1 | SELECT |
We can break the above statement into 3 steps:
SELECT * FROM Payroll
- Open the window, i.e.,
ROW_NUMBER() OVER () AS "Base Row No"
ORDER BY employee_id
Understanding the above executing procedures, we can see another ROW_NUMBER()
example, which following the similar procedures:
1 | SELECT |
RANK() & DENSE_RANK()
For RANK()
, duplicated values are assigned the same rank, skipping the next numebr in line:
1 | 1 1 |
While for DENSE_RANK()
, duplicated values are assigned the same rank, and no values are skipped:
1 | 1 1 |
For example:
1 | SELECT |
There is one thing that we should notice, if we execute the following statement:
1 | SELECT |
As shown, we must use ORDER BY
for ranks, otherwise each row’s rank would be 1.
Statistical Functions
percent_rank(): Relative rank of the current row, i.e.,
(rank-1)/(total rows - 1)
cume_dist(): Relative rank of the current row, i.e.,
(rank)/(total rows)
ntile(num_buckets integer): Returns integer ranging from 1 to the argument value, dividing the partition as equally as possible
For example:
1 | SELECT *, |
Offset Functions
Offset functions includes FIRST_VALUE()
, LAST_VALUE()
, LEAD()
and LAG()
.
FIRST_VALUE() & LAST_VALUE()
For example:
1 | SELECT |
LEAD() & LAG()
For example:
1 | SELECT |