SQL Server CUBE

Learning Objective

The objective of this SQL Server tutorial is to teach you how to use CUBE to aggregate different groups and the complete details about them.

What is CUBE in SQL Server?

CUBE is an extension of GROUPING SETS. It is almost the same as ROLLUP with one subtle difference. The difference is that it does an exhaustive grouping of the specified columns using all possible permutations.

For e.g.  if we specify 3 table columns or fields (X, Y, Z) in rollup it will group as below. The total number of groups created and reported will be n+1 where n is the number of columns specified in the ROLLUP clause.

(X, Y, Z), (X, Y), (X), ()

But CUBE will group as below. The total number of groups created and reported in the resultset will be 2n where n is the number of columns specified in the CUBE clause.

(X, Y, Z), (X, Z), (X, Y), (X), (Y), ()

So, we can see it produces a larger resultset containing more rows of information.

SQL Server CUBE Syntax

The basic syntax of the CUBE is given below.

SELECT
column1, column2, aggregate_function (column 3)
FROM
table
GROUP BY CUBE (column1, column2);

In this syntax,

  • column – a column from the table in the SELECT list.
  • aggregate_function (column) – column on which aggregate function (i.e. SUM, COUNT, AVG etc.) is used.  
  • GROUP BY – SQL keyword combination to specify a column or multiple columns to create a group on which the to apply the aggregate.
  • CUBE – SQL keyword combination which creates an exhaustive grouping of the specified columns and aggregates them.

SQL Server CUBE with Example

Let us try to understand when and how to use the CUBE operator in a SQL query. Suppose we have an employee table containing the information about employees in the company. The table is represented below. We will use this sample table for reference in our examples.

idnamegendersalarydepartment
1David JacksonMale5000IT
2Jim JamesonFemale6000HR
3Kate JohnsonFemale7500IT
4Will RayMale6500Marketing
5Shane MathewsFemale5500Finance
6Shed PriceMale8000Marketing
7Viktor SmithMale7200HR
8Vincent SmithsonFemale6600IT
9Janice StreepFemale5400Marketing
10Laura WellsFemale6300Finance
11Mac BullMale5700Marketing
12Patrick PattersonMale7000HR
13Julie OrbisonFemale7100IT
14Elice HemingwayFemale6800Marketing
15Wayne JohnsonMale5000Finance
Table: employee

Let us start by doing a ROLLUP on department and gender for employee salary. The following query does the same. The sum of employee salaries is aliased as ‘employee cost’.

SELECT
department, gender, sum(salary) AS 'employee cost'
FROM employee
GROUP BY
ROLLUP (department, gender);

The resultset of the above query is the following. It has 13 rows of information or records. They include the output of the following set of groups – (department, gender) with 8 records, (department) with 4 records and () 1 record which is the total cost of company of all employees of all genders and all departments i.e. 2+1=3 as mentioned in introduction above.

departmentgenderemployee cost
FinanceFemale11800
FinanceMale5000
FinanceNULL16800
HRFemale6000
HRMale14200
HRNULL20200
ITFemale21200
ITMale5000
ITNULL26200
MarketingFemale12200
MarketingMale20200
MarketingNULL32400
NULLNULL95600

Now we will apply the CUBE operator on the same query and see the difference. The following query does the same.

SELECT
department, gender, sum(salary) AS 'employee cost'
FROM employee
GROUP BY
CUBE (department, gender);

The resultset of the above query is the following. It has 15 records i.e. 2 more than the output from ROLLUP. The additional records are highlighted in pink and it is the output of the (gender) group which was missing in the ROLLUP resultset i.e. the total cost to company of the male employees and female employees separately. The total number of groups created and reported is 2n as mentioned above in the introduction i.e. (department, gender), (gender), (department) and ().

departmentgenderemployee cost
FinanceFemale11800
HRFemale6000
ITFemale21200
MarketingFemale12200
NULLFemale51200
FinanceMale5000
HRMale14200
ITMale5000
MarketingMale20200
NULLMale44400
NULLNULL95600
FinanceNULL16800
HRNULL20200
ITNULL26200
MarketingNULL32400

The output can be formatted with the help of coalesce to address gap in values i.e. NULL values. The following query does the same making the resultset more readable and comprehensible.

SELECT
coalesce (department, 'All Department') AS department,
coalesce (gender, 'Both Genders') AS gender,
sum(salary) AS 'employee cost'
FROM employee
GROUP BY
CUBE (department, gender);

The formatted output is the following. The NULL values have been replaced with meaningful values.

departmentgenderemployee cost
FinanceFemale11800
HRFemale6000
ITFemale21200
MarketingFemale12200
All DepartmentFemale51200
FinanceMale5000
HRMale14200
ITMale5000
MarketingMale20200
All DepartmentMale44400
All DepartmentBoth Genders95600
FinanceBoth Genders16800
HRBoth Genders20200
ITBoth Genders26200
MarketingBoth Genders32400

Advertisement