SQL GROUP BY

SQL GROUP BY

In this article, we use PostgreSQL to explian group by operation. Let’s firstly create two tables course and student:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
CREATE TABLE COURSE
(
ID SERIAL PRIMARY KEY,
COURSENAME VARCHAR( 20 ),
DETAILS VARCHAR( 100 )
) ;

CREATE TABLE STUDENT
(
STUDENTID SERIAL PRIMARY KEY,
NAME VARCHAR( 30 ),
COURSEID INT REFERENCES COURSE( ID )
) ;

INSERT INTO COURSE
( COURSENAME, DETAILS )
VALUES
( 'MSc', 'The MSc CS Course' ),
( 'BSc', 'The BSc CS Course' ) ;

INSERT INTO STUDENT
( NAME, COURSEID )
VALUES
( 'S1', 1 ),
( 'S2', 2 ),
( 'S3', 1 ),
( 'S4', 2 ),
( 'S5', 2 ),
( 'S6', 1 );

So, now we have two tables in PostgreSQL:

sql_group_by_1

Firstly let’s do the following query:

SELECT COURSENAME, NAME FROM STUDENT, COURSE WHERE STUDENT.COURSEID = COURSE.ID;

sql_group_by_2

As shown, we have 6 students, who take 2 courses MSc and BSc. Can we separate these 6 students according to their coursename? We can use group by:

SELECT COURSENAME, NAME FROM STUDENT, COURSE WHERE STUDENT.COURSEID = COURSE.ID GROUP BY COURSENAME;

However, this would confuse the DBMS, because it cannot do like this:

1
2
3
4
5
6
7
8
9
10
11
DBMS can only generate tuples, however, the following data are not tuples:

coursename | name
------------+------
| S1
MSc | S3
| S6
------------+------
| S2
BSc | S4
| S5

Therefore we need to modify our query like this:

1
2
-- add a aggregate function on the attribute "NAME"
SELECT COURSENAME, COUNT(NAME) FROM STUDENT, COURSE WHERE STUDENT.COURSEID = COURSE.ID GROUP BY COURSENAME;

Then the DBMS can do the following job for us:

1
2
3
4
 coursename | count 
------------+-------
MSc | 3
BSc | 3

Let’s try another query:

1
SELECT COURSEID, COURSENAME, NAME FROM STUDENT, COURSE WHERE STUDENT.COURSEID = COURSE.ID GROUP BY COURSEID;

It would give us the following result:

1
2
3
4
5
6
7
8
 courseid | coursename | name 
----------+------------+------
1 | MSc | S1
2 | BSc | S2
1 | MSc | S3
2 | BSc | S4
2 | BSc | S5
1 | MSc | S6

Now we want to use group by like this:

1
SELECT COURSEID, COURSENAME, COUNT(NAME) FROM STUDENT, COURSE WHERE STUDENT.COURSEID = COURSE.ID GROUP BY COURSEID;

Again, this would confuse the DBMS, because it cannot do this:

1
2
3
4
5
6
7
8
9
10
11
These are not tuples:

courseid | coursename | count(name)
----------+------------+------------
| MSc |
1 | MSc | 3
| MSc |
----------+------------+------------
| BSc |
2 | BSc | 3
| BSc |

To solve the problem, we have two methods:

  1. We can add a aggregate function to COURSENAME:
1
SELECT COURSEID, COUNT(COURSENAME), COUNT(NAME) FROM STUDENT, COURSE WHERE STUDENT.COURSEID = COURSE.ID GROUP BY COURSEID;

which gives us the following results:

1
2
3
4
 courseid | count(coursename) | count(name)
----------+-------------------+------------
2 | 3 | 3
1 | 3 | 3
  1. We can group by COURSEID and COURSENAME:
1
SELECT COURSEID, COURSENAME, COUNT(NAME) FROM STUDENT, COURSE WHERE STUDENT.COURSEID = COURSE.ID GROUP BY COURSEID, COURSENAME;

which gives us the following results:

1
2
3
4
 courseid | coursename | count 
----------+------------+-------
2 | BSc | 3
1 | MSc | 3

Both methods can give us tuples.