The objective of this SQL Server tutorial is to teach you how to use the GROUPING SET feature of SQL to group query resultset into multiple groups on different column combinations.
What is GROUPING SET in SQL Server?
A GROUPING SET is the extension of the GROUP BY clause in SQL. A GROUP BY clause allows us to group the resultset of a query on the basis of a column or a combination of columns. The GROUP is like a category and the resultset reflects the properties of the category. For example, we can create a group combining department and location in an employee table to find out what is the total number of employees working in a particular department in a location. Like what is the total number of IT personnel working in Bangalore. We can go further and add employee type (say developers) to department and location to find out what is the number of developers working in the IT department in Bangalore location. But we cannot do both of these in a single query using GROUP BY. We will have to write 2 separate queries for the two separate combinations or groups (i.e. (department, location) and (department, location, employee type)). The GROUP BY clause does not allow us to specify more than one group in a query.
This is where GROUPING SET comes into the picture. It extends the capability of the GROUP BY clause by allowing us to specify multiple groups in a single query. Using GROUPING SET with a single query we can generate multiple groups and analyze and compare their properties. GROUPING SET basically means a set of GROUPS.
SQL Server GROUPING SET Syntax
The basic syntax of GROUPING SET is given below.
SELECT column1, column2, aggregate(column3) FROM table GROUP BY GROUPING SETS ( (column1, column2), (column1), (column2), () );
In this syntax,
- column –a column from the table in the SELECT list.
- aggregate(column)– column on which aggregate function (i.e. SUM, COUNT, AVG etc.) is used.
- GROUP BY – SQL keyword combination to specify a column or multiple columns as a single group.
- GROUPING SET – SQL keyword combination followed by a set of groups.
- () – optional. Specifies an empty grouping set. It aggregates on all columns individually.
SQL Server GROUPING SET with Examples
Instead of beginning with GROUPING SET we will begin with GROUP BY so that we can progressively extend the logic and get a proper and clear understanding of the significance and usage of the GROUPING SET clause. Let us imagine that we are running an electronics retail chain (primarily dealing in laptops and tablets) having outlets across the country. These outlets are supplied by warehouses in the major cities and state capitals. For the North we have warehouses in Kolkata and Delhi. In the company database we have a table called warehouse_stocks which stores the stock information in the two warehouses. The table is represented below. We will use this sample table as the reference in the following examples.
Given the above table we might want to analyze the stock situation of the different materials in the inventory in the warehouses. We can do so with the help of the following queries using the GROUP BY clause which creates and reports the stock figure for the different groups.
1) Query to determine the total number of laptops and tablets from both warehouses
The below query creates a GROUP using the product_category column to determine the total number of laptops and tablets in the 2 warehouses. It uses the SUM aggregate function on the stock_quantity column for the purpose and the result is aliased as stock.
SELECT product_category, SUM (stock_quantity) stock FROM warehouse_stocks GROUP BY product_category;
It produces the following desired output.
2) Query to determine the total stock of each brand from both warehouses
The below query creates a GROUP using the product_brand column and to determine the total number of items available from each brand in the 2 warehouses. It uses the SUM aggregate function on the stock_quantity column for the purpose and the result is aliased as stock.
SELECT product_brand, SUM(stock_quantity) stock FROM warehouse_stocks GROUP BY product_brand;
It produces the following desired output.
3) Query to determine the total number of laptops and tablets available in each warehouse
The below query creates a GROUP using the warehouse_city and product_category columns and to determine the total number of laptops and items available in each warehouse. It uses the SUM aggregate function on the stock_quantity column for the purpose and the result is aliased as stock.
SELECT warehouse_city, product_category, SUM(stock_quantity) stock FROM warehouse_stocks GROUP BY warehouse_city, product_category ORDER BY 1;
It will produce the following desired output.
From the above we see that we have to issue a fresh query every time to create a new group or category to analyze the data from a different point of view. One way to get around this is to combine all the query result sets using the UNION ALL operator to get a holistic view of the stock status. The following UNION ALL query does the same. Since the UNION ALL operator requires that the resultset of all the participating queries should have the same number of columns therefore we have added a dummy column NULL in the second and third queries to make the number of columns equal in all.
SELECT warehouse_city, product_category, SUM(stock_quantity) stock FROM warehouse_stocks GROUP BY warehouse_city, product_category UNION ALL SELECT NULL, product_category, SUM(stock_quantity) stock FROM warehouse_stocks GROUP BY product_category UNION ALL SELECT NULL, product_brand, SUM(stock_quantity) stock FROM warehouse_stocks GROUP BY product_brand;
It will produce the following output which is an integrated resultset of all the above 3 queries.
But the problem with this approach is that it is ad-hoc and cumbersome and more importantly inefficient. It puts pressure on the database server since the server has to run 3 separate queries and then combine the resultset of the first 2 queries and then combine that resultset with the third query. It requires multiple reads of the table and temporary storage and multiple IO’s. To overcome these shortfalls SQL server 2008 introduced the GROUPING SET feature which allows us to specify multiple GROUPS as a set in a single query. The following SQL query does the same by specifying all the GROUPS within the GROUPING SET clause so that the result consists of all the groups and their relevant details.
SELECT warehouse_city, product_category, product_brand, SUM(stock_quantity) stock FROM warehouse_stocks GROUP BY GROUPING SETS ( (warehouse_city, product_category), (product_category), (product_brand) () );
It will produce the same integrated resultset of all the above 3 queries or the UNION ALL query with much less hassle and more technically efficiently. There is one additional row though i.e. row number 6 which is the output of the optional empty grouping set () and gives the sum the entire stock i.e. 34200. Since it is not a group it just aggregates the total stock_quantity.