SQL Server GROUPING SET

Learning Objective

The objective of this SQL Server tutorial is to teach you how to use the GROUPING SET feature of SQL to group query resultset into multiple groups on different column combinations.

What is GROUPING SET in SQL Server?

A GROUPING SET is the extension of the GROUP BY clause in SQL. A GROUP BY clause allows us to group the resultset of a query on the basis of a column or a combination of columns. The GROUP is like a category and the resultset reflects the properties of the category. For example, we can create a group combining department and location in an employee table to find out what is the total number of employees working in a particular department in a location. Like what is the total number of IT personnel working in Bangalore. We can go further and add employee type (say developers) to department and location to find out what is the number of developers working in the IT department in Bangalore location. But we cannot do both of these in a single query using GROUP BY. We will have to write 2 separate queries for the two separate combinations or groups (i.e. (department, location) and (department, location, employee type)). The GROUP BY clause does not allow us to specify more than one group in a query.

This is where GROUPING SET comes into the picture. It extends the capability of the GROUP BY clause by allowing us to specify multiple groups in a single query. Using GROUPING SET with a single query we can generate multiple groups and analyze and compare their properties. GROUPING SET basically means a set of GROUPS.

SQL Server GROUPING SET Syntax

The basic syntax of GROUPING SET is given below.

SELECT
column1,
column2,
aggregate(column3)
FROM table
GROUP BY
GROUPING SETS (
(column1, column2),
(column1),
(column2),
()
);

In this syntax,

  • column –a column from the table in the SELECT list.
  • aggregate(column)– column on which aggregate function (i.e. SUM, COUNT, AVG etc.) is used.  
  • GROUP BY – SQL keyword combination to specify a column or multiple columns as a single group.
  • GROUPING SET – SQL keyword combination followed by a set of groups.
  • () – optional. Specifies an empty grouping set. It aggregates on all columns individually.

SQL Server GROUPING SET with Examples

Instead of beginning with GROUPING SET we will begin with GROUP BY so that we can progressively extend the logic and get a proper and clear understanding of the significance and usage of the GROUPING SET clause. Let us imagine that we are running an electronics retail chain (primarily dealing in laptops and tablets) having outlets across the country. These outlets are supplied by warehouses in the major cities and state capitals. For the North we have warehouses in Kolkata and Delhi. In the company database we have a table called warehouse_stocks which stores the stock information in the two warehouses. The table is represented below. We will use this sample table as the reference in the following examples.

warehouse_cityproduct_categoryproduct_brandstock_quantity
DelhiLaptopToshiba2000
DelhiLaptopDell7000
DelhiLaptopHP2500
DelhiTabletSony3500
DelhiTabletSamsung4200
KolkataLaptopHP3000
KolkataLaptopDell4000
KolkataLaptopToshiba1000
KolkataTabletSony3000
KolkataTabletSamsung4000
Table: warehouse_stocks

Given the above table we might want to analyze the stock situation of the different materials in the inventory in the warehouses. We can do so with the help of the following queries using the GROUP BY clause which creates and reports the stock figure for the different groups.

1)  Query to determine the total number of laptops and tablets from both warehouses

The below query creates a GROUP using the product_category column to determine the total number of laptops and tablets in the 2 warehouses. It uses the SUM aggregate function on the stock_quantity column for the purpose and the result is aliased as stock.

SELECT
product_category,
SUM (stock_quantity) stock
FROM warehouse_stocks
GROUP BY product_category;

It produces the following desired output.

product_categorystock
Laptop19500
Tablet14700

2)  Query to determine the total stock of each brand from both warehouses

The below query creates a GROUP using the product_brand column and to determine the total number of items available from each brand in the 2 warehouses. It uses the SUM aggregate function on the stock_quantity column for the purpose and the result is aliased as stock.

SELECT
product_brand,
SUM(stock_quantity) stock
FROM warehouse_stocks
GROUP BY product_brand;

It produces the following desired output.

product_brandstock
Dell11000
HP5500
Samsung8200
Sony6500
Toshiba3000

3)  Query to determine the total number of laptops and tablets available in each warehouse

The below query creates a GROUP using the warehouse_city and product_category columns and to determine the total number of laptops and items available in each warehouse. It uses the SUM aggregate function on the stock_quantity column for the purpose and the result is aliased as stock.

SELECT
warehouse_city,
product_category,
SUM(stock_quantity) stock
FROM warehouse_stocks
GROUP BY warehouse_city, product_category
ORDER BY 1;

It will produce the following desired output.

warehouse_cityproduct_categorystock
DelhiLaptop11500
DelhiTablet7700
KolkataLaptop8000
KolkataTablet7000

From the above we see that we have to issue a fresh query every time to create a new group or category to analyze the data from a different point of view. One way to get around this is to combine all the query result sets using the UNION ALL operator to get a holistic view of the stock status. The following UNION ALL query does the same. Since the UNION ALL operator requires that the resultset of all the participating queries should have the same number of columns therefore we have added a dummy column NULL in the second and third queries to make the number of columns equal in all.

SELECT
warehouse_city,
product_category,
SUM(stock_quantity) stock
FROM warehouse_stocks
GROUP BY warehouse_city, product_category
UNION ALL
SELECT NULL,
product_category,
SUM(stock_quantity) stock
FROM warehouse_stocks
GROUP BY product_category
UNION ALL
SELECT
NULL,
product_brand,
SUM(stock_quantity) stock
FROM warehouse_stocks
GROUP BY product_brand;

It will produce the following output which is an integrated resultset of all the above 3 queries.

warehouse_cityproduct_categorystock
DelhiLaptop11500
KolkataLaptop8000
DelhiTablet7700
KolkataTablet7000
NULLLaptop19500
NULLTablet14700
NULLDell11000
NULLHP5500
NULLSamsung8200
NULLSony6500
NULLToshiba3000

But the problem with this approach is that it is ad-hoc and cumbersome and more importantly inefficient. It puts pressure on the database server since the server has to run 3 separate queries and then combine the resultset of the first 2 queries and then combine that resultset with the third query. It requires multiple reads of the table and temporary storage and multiple IO’s. To overcome these shortfalls SQL server 2008 introduced the GROUPING SET feature which allows us to specify multiple GROUPS as a set in a single query. The following SQL query does the same by specifying all the GROUPS within the GROUPING SET clause so that the result consists of all the groups and their relevant details.

SELECT
warehouse_city,
product_category,
product_brand,
SUM(stock_quantity) stock
FROM warehouse_stocks
GROUP BY
GROUPING SETS (
(warehouse_city, product_category),
(product_category),
(product_brand)
()
);

It will produce the same integrated resultset of all the above 3 queries or the UNION ALL query with much less hassle and more technically efficiently. There is one additional row though i.e. row number 6 which is the output of the optional empty grouping set () and gives the sum the entire stock i.e. 34200. Since it is not a group it just aggregates the total stock_quantity.

warehouse_cityproduct_categoryproduct_brandstock
NULLNULLDell11000
NULLNULLHP5500
NULLNULLSamsung8200
NULLNULLSony6500
NULLNULLToshiba3000
NULLNULLNULL34200
DelhiLaptopNULL11500
KolkataLaptopNULL8000
NULLLaptopNULL19500
DelhiTabletNULL7700
KolkataTabletNULL7000
NULLTabletNULL14700

Advertisement