Enhanced aggregation
Hive offers enhanced aggregation by using the GROUPING SETS
, CUBE
, and ROLLUP
keywords.
Grouping sets
GROUPING SETS
implements advanced multiple GROUP BY
operations against the same set of data. Actually, GROUPING SETS
are a shorthand way of connecting several GROUP BY
result sets with UNION ALL
. The GROUPING SETS
keyword completes all processes in a single stage of the job, which is more efficient. A blank set ()
in the GROUPING SETS
clause calculates the overall aggregation. The following are a few examples to show the equivalence of GROUPING SETS
. For better understanding, we can say that the outer level (brace) of GROUPING SETS
defines what data UNION ALL
is to be implemented. The inner level (brace) defines what GROUP BY
data is to be implemented in each UNION ALL
.
- Grouping set with one element of column pairs:
SELECT name, start_date, count(sin_number) as sin_cnt FROM employee_hr GROUP BY name, start_date GROUPING SETS((name, start_date...