LogoLogo
SnowflakeDocumentation Home
  • Snowpark Migration Accelerator Documentation
  • General
    • Introduction
    • Getting Started
      • Download and Access
      • Installation
        • Windows Installation
        • MacOS Installation
        • Linux Installation
    • Conversion Software Terms of Use
      • Open Source Libraries
    • Release Notes
      • Old Version Release Notes
        • SC Spark Scala Release Notes
          • Known Issues
        • SC Spark Python Release Notes
          • Known Issues
    • Roadmap
  • User Guide
    • Overview
    • Before Using the SMA
      • Supported Platforms
      • Supported Filetypes
      • Code Extraction
      • Pre-Processing Considerations
    • Project Overview
      • Project Setup
      • Configuration and Settings
      • Tool Execution
    • Assessment
      • How the Assessment Works
      • Assessment Quick Start
      • Understanding the Assessment Summary
      • Readiness Scores
      • Output Reports
        • Curated Reports
        • SMA Inventories
        • Generic Inventories
        • Assessment zip file
      • Output Logs
      • Spark Reference Categories
    • Conversion
      • How the Conversion Works
      • Conversion Quick Start
      • Conversion Setup
      • Understanding the Conversion Assessment and Reporting
      • Output Code
    • Using the SMA CLI
      • Additional Parameters
  • Use Cases
    • Assessment Walkthrough
      • Walkthrough Setup
        • Notes on Code Preparation
      • Running the Tool
      • Interpreting the Assessment Output
        • Assessment Output - In Application
        • Assessment Output - Reports Folder
      • Running the SMA Again
    • Conversion Walkthrough
    • Sample Project
    • Using SMA with Docker
    • SMA CLI Walkthrough
  • Issue Analysis
    • Approach
    • Issue Code Categorization
    • Issue Codes by Source
      • General
      • Python
        • SPRKPY1000
        • SPRKPY1001
        • SPRKPY1002
        • SPRKPY1003
        • SPRKPY1004
        • SPRKPY1005
        • SPRKPY1006
        • SPRKPY1007
        • SPRKPY1008
        • SPRKPY1009
        • SPRKPY1010
        • SPRKPY1011
        • SPRKPY1012
        • SPRKPY1013
        • SPRKPY1014
        • SPRKPY1015
        • SPRKPY1016
        • SPRKPY1017
        • SPRKPY1018
        • SPRKPY1019
        • SPRKPY1020
        • SPRKPY1021
        • SPRKPY1022
        • SPRKPY1023
        • SPRKPY1024
        • SPRKPY1025
        • SPRKPY1026
        • SPRKPY1027
        • SPRKPY1028
        • SPRKPY1029
        • SPRKPY1030
        • SPRKPY1031
        • SPRKPY1032
        • SPRKPY1033
        • SPRKPY1034
        • SPRKPY1035
        • SPRKPY1036
        • SPRKPY1037
        • SPRKPY1038
        • SPRKPY1039
        • SPRKPY1040
        • SPRKPY1041
        • SPRKPY1042
        • SPRKPY1043
        • SPRKPY1044
        • SPRKPY1045
        • SPRKPY1046
        • SPRKPY1047
        • SPRKPY1048
        • SPRKPY1049
        • SPRKPY1050
        • SPRKPY1051
        • SPRKPY1052
        • SPRKPY1053
        • SPRKPY1054
        • SPRKPY1055
        • SPRKPY1056
        • SPRKPY1057
        • SPRKPY1058
        • SPRKPY1059
        • SPRKPY1060
        • SPRKPY1061
        • SPRKPY1062
        • SPRKPY1063
        • SPRKPY1064
        • SPRKPY1065
        • SPRKPY1066
        • SPRKPY1067
        • SPRKPY1068
        • SPRKPY1069
        • SPRKPY1070
        • SPRKPY1071
        • SPRKPY1072
        • SPRKPY1073
        • SPRKPY1074
        • SPRKPY1075
        • SPRKPY1076
        • SPRKPY1077
        • SPRKPY1078
        • SPRKPY1079
        • SPRKPY1080
        • SPRKPY1081
        • SPRKPY1082
        • SPRKPY1083
        • SPRKPY1084
        • SPRKPY1085
        • SPRKPY1086
        • SPRKPY1087
        • SPRKPY1088
        • SPRKPY1089
        • SPRKPY1101
      • Spark Scala
        • SPRKSCL1000
        • SPRKSCL1001
        • SPRKSCL1002
        • SPRKSCL1100
        • SPRKSCL1101
        • SPRKSCL1102
        • SPRKSCL1103
        • SPRKSCL1104
        • SPRKSCL1105
        • SPRKSCL1106
        • SPRKSCL1107
        • SPRKSCL1108
        • SPRKSCL1109
        • SPRKSCL1110
        • SPRKSCL1111
        • SPRKSCL1112
        • SPRKSCL1113
        • SPRKSCL1114
        • SPRKSCL1115
        • SPRKSCL1116
        • SPRKSCL1117
        • SPRKSCL1118
        • SPRKSCL1119
        • SPRKSCL1120
        • SPRKSCL1121
        • SPRKSCL1122
        • SPRKSCL1123
        • SPRKSCL1124
        • SPRKSCL1125
        • SPRKSCL1126
        • SPRKSCL1127
        • SPRKSCL1128
        • SPRKSCL1129
        • SPRKSCL1130
        • SPRKSCL1131
        • SPRKSCL1132
        • SPRKSCL1133
        • SPRKSCL1134
        • SPRKSCL1135
        • SPRKSCL1136
        • SPRKSCL1137
        • SPRKSCL1138
        • SPRKSCL1139
        • SPRKSCL1140
        • SPRKSCL1141
        • SPRKSCL1142
        • SPRKSCL1143
        • SPRKSCL1144
        • SPRKSCL1145
        • SPRKSCL1146
        • SPRKSCL1147
        • SPRKSCL1148
        • SPRKSCL1149
        • SPRKSCL1150
        • SPRKSCL1151
        • SPRKSCL1152
        • SPRKSCL1153
        • SPRKSCL1154
        • SPRKSCL1155
        • SPRKSCL1156
        • SPRKSCL1157
        • SPRKSCL1158
        • SPRKSCL1159
        • SPRKSCL1160
        • SPRKSCL1161
        • SPRKSCL1162
        • SPRKSCL1163
        • SPRKSCL1164
        • SPRKSCL1165
        • SPRKSCL1166
        • SPRKSCL1167
        • SPRKSCL1168
        • SPRKSCL1169
        • SPRKSCL1170
        • SPRKSCL1171
        • SPRKSCL1172
        • SPRKSCL1173
        • SPRKSCL1174
        • SPRKSCL1175
      • SQL
        • SparkSQL
          • SPRKSPSQL1001
          • SPRKSPSQL1002
          • SPRKSPSQL1003
          • SPRKSPSQL1004
          • SPRKSPSQL1005
          • SPRKSPSQL1006
        • Hive
          • SPRKHVSQL1001
          • SPRKHVSQL1002
          • SPRKHVSQL1003
          • SPRKHVSQL1004
          • SPRKHVSQL1005
          • SPRKHVSQL1006
      • Pandas
        • PNDSPY1001
        • PNDSPY1002
        • PNDSPY1003
        • PNDSPY1004
      • DBX
        • SPRKDBX1001
    • Troubleshooting the Output Code
      • Locating Issues
    • Workarounds
    • Deploying the Output Code
  • Translation Reference
    • Translation Reference Overview
    • SIT Tagging
      • SQL statements
    • SQL Embedded code
    • HiveSQL
      • Supported functions
    • Spark SQL
      • Spark SQL DDL
        • Create Table
          • Using
      • Spark SQL DML
        • Merge
        • Select
          • Distinct
          • Values
          • Join
          • Where
          • Group By
          • Union
      • Spark SQL Data Types
      • Supported functions
  • Workspace Estimator
    • Overview
    • Getting Started
  • INTERACTIVE ASSESSMENT APPLICATION
    • Overview
    • Installation Guide
  • Support
    • General Troubleshooting
      • How do I give SMA permission to the config folder?
      • Invalid Access Code error on VDI
      • How do I give SMA permission to Documents, Desktop, and Downloads folders?
    • Frequently Asked Questions (FAQ)
      • Using SMA with Jupyter Notebooks
      • How to request an access code
      • Sharing the Output with Snowflake
      • DBC files explode
    • Glossary
    • Contact Us
Powered by GitBook
On this page
  • Description
  • Syntax
  • Sample Source Patterns
  • Setup data
  • Pattern code
  • Known Issues
  • Related EWIs
  1. Translation Reference
  2. Spark SQL
  3. Spark SQL DML
  4. Select

Group By

Will the SMA ever be in your GROUP BY clause?

PreviousWhereNextUnion

Last updated 1 year ago

Description

The GROUP BY clause is used to group the rows based on a set of specified grouping expressions and compute aggregations on the group of rows based on one or more specified aggregate functions. Databricks SQL also supports advanced aggregations to do multiple aggregations for the same input record set via GROUPING SETS, CUBE, ROLLUP clauses. The grouping expressions and advanced aggregations can be mixed in the GROUP BY clause and nested in a GROUPING SETS clause. ()

Groups rows with the same group-by-item expressions and computes aggregate functions for the resulting group. A GROUP BY expression can be:

  • A column name.

  • A number referencing a position in the list.

  • A general expression.

Extensions:

, ,

()

Syntax

GROUP BY ALL

GROUP BY group_expression [, ...] [ WITH ROLLUP | WITH CUBE ]

GROUP BY { group_expression | { ROLLUP | CUBE | GROUPING SETS } ( grouping_set [, ...] ) } [, ...]

grouping_set
   { expression |
     ( [ expression [, ...] ] ) }
SELECT ...
  FROM ...
  [ ... ]
  GROUP BY groupItem [ , groupItem [ , ... ] ]
  [ ... ]
  
SELECT ...
  FROM ...
  [ ... ]
  GROUP BY ALL
  [ ... ]

groupItem ::= { <column_alias> | <position> | <expr> }

SELECT ...
FROM ...
[ ... ]
GROUP BY CUBE ( groupCube [ , groupCube [ , ... ] ] )
[ ... ]

groupCube ::= { <column_alias> | <position> | <expr> }

SELECT ...
FROM ...
[ ... ]
GROUP BY GROUPING SETS ( groupSet [ , groupSet [ , ... ] ] )
[ ... ]

groupSet ::= { <column_alias> | <position> | <expr> }

SELECT ...
FROM ...
[ ... ]
GROUP BY ROLLUP ( groupRollup [ , groupRollup [ , ... ] ] )
[ ... ]

groupRollup ::= { <column_alias> | <position> | <expr> }

Sample Source Patterns

Setup data

Databricks

CREATE TEMP VIEW dealer (id, city, car_model, quantity) AS
VALUES (100, 'Fremont', 'Honda Civic', 10),
       (100, 'Fremont', 'Honda Accord', 15),
       (100, 'Fremont', 'Honda CRV', 7),
       (200, 'Dublin', 'Honda Civic', 20),
       (200, 'Dublin', 'Honda Accord', 10),
       (200, 'Dublin', 'Honda CRV', 3),
       (300, 'San Jose', 'Honda Civic', 5),
       (300, 'San Jose', 'Honda Accord', 8);

Snowflake

CREATE TEMP TABLE dealer (id INT, city STRING, car_model STRING, quantity INT);
INSERT INTO dealer VALUES
        (100, 'Fremont', 'Honda Civic', 10),
        (100, 'Fremont', 'Honda Accord', 15),
        (100, 'Fremont', 'Honda CRV', 7),
        (200, 'Dublin', 'Honda Civic', 20),
        (200, 'Dublin', 'Honda Accord', 10),
        (200, 'Dublin', 'Honda CRV', 3),
        (300, 'San Jose', 'Honda Civic', 5),
        (300, 'San Jose', 'Honda Accord', 8);

Pattern code

Databricks

-- 1. Sum of quantity per dealership. Group by `id`.
SELECT id, sum(quantity) FROM dealer GROUP BY id ORDER BY id;

-- 2. Use column position in GROUP by clause.
SELECT id, sum(quantity) FROM dealer GROUP BY 1 ORDER BY 1;

-- 3. Multiple aggregations.
-- 3.1. Sum of quantity per dealership.
-- 3.2. Max quantity per dealership.
SELECT id, sum(quantity) AS sum, max(quantity) AS max
    FROM dealer GROUP BY id ORDER BY id;

-- 4. Count the number of distinct dealers in cities per car_model.
SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;

-- 5. Count the number of distinct dealers in cities per car_model, using GROUP BY ALL
SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY ALL;

-- 6. Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
SELECT id,
         sum(quantity) FILTER (WHERE car_model IN ('Honda Civic', 'Honda CRV')) AS `sum(quantity)`
    FROM dealer
    GROUP BY id ORDER BY id;

-- 7. Aggregations using multiple sets of grouping columns in a single statement.
-- Following performs aggregations based on four sets of grouping columns.
-- 7.1. city, car_model
-- 7.2. city
-- 7.3. car_model
-- 7.4. Empty grouping set. Returns quantities for all city and car models.
SELECT city, car_model, sum(quantity) AS sum
    FROM dealer
    GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
    ORDER BY city;

-- 8.Group by processing with `ROLLUP` clause.
-- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), ())
SELECT city, car_model, sum(quantity) AS sum
    FROM dealer
    GROUP BY city, car_model WITH ROLLUP
    ORDER BY city, car_model;

-- 9. Group by processing with `CUBE` clause.
-- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
SELECT city, car_model, sum(quantity) AS sum
    FROM dealer
    GROUP BY city, car_model WITH CUBE
    ORDER BY city, car_model;

id

sum(quantity)

100

32

200

33

300

13

id

sum(quantity)

100

32

200

33

300

13

id

sum

max

100

32

15

200

33

20

300

13

8

car_model

count

Honda Civic

3

Honda CRV

2

Honda Accord

3

car_model

count

Honda Civic

3

Honda CRV

2

Honda Accord

3

id

sum(quantity)

100

17

200

23

300

5

city
car_model
sum

NULL

Honda Civic

35

NULL

Honda Accord

33

NULL

NULL

78

NULL

Honda CRV

10

Dublin

Honda Civic

20

Dublin

NULL

33

Dublin

Honda CRV

3

Dublin

Honda Accord

10

Fremont

Honda Accord

15

Fremont

Honda Civic

10

Fremont

NULL

32

Fremont

Honda CRV

7

San Jose

Honda Accord

8

San Jose

NULL

13

San Jose

Honda Civic

5

city

car_model

sum

NULL

NULL

78

Dublin

NULL

33

Dublin

Honda Accord

10

Dublin

Honda CRV

3

Dublin

Honda Civic

20

Fremont

NULL

32

Fremont

Honda Accord

15

Fremont

Honda CRV

7

Fremont

Honda Civic

10

San Jose

NULL

13

San Jose

Honda Accord

8

San Jose

Honda Civic

5

city

car_model

sum

NULL

NULL

78

NULL

Honda Accord

33

NULL

Honda CRV

10

NULL

Honda Civic

35

Dublin

NULL

33

Dublin

Honda Accord

10

Dublin

Honda CRV

3

Dublin

Honda Civic

20

Fremont

NULL

32

Fremont

Honda Accord

15

Fremont

Honda CRV

7

Fremont

Honda Civic

10

San Jose

NULL

13

San Jose

Honda Accord

8

San Jose

Honda Civic

5

Snowflake

-- 1. Sum of quantity per dealership. Group by `id`.
SELECT id, sum(quantity) FROM dealer GROUP BY id ORDER BY id;

-- 2. Use column position in GROUP by clause.
SELECT id, sum(quantity) FROM dealer GROUP BY 1 ORDER BY 1;

-- 3. Multiple aggregations.
-- 3.1. Sum of quantity per dealership.
-- 3.2. Max quantity per dealership.
SELECT id, sum(quantity) AS sum, max(quantity) AS max
    FROM dealer GROUP BY id ORDER BY id;

-- 4. Count the number of distinct dealers in cities per car_model.
SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;

-- 5. Count the number of distinct dealers in cities per car_model, using GROUP BY ALL
SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY ALL;

-- 6. Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
SELECT 
    id,
    SUM(CASE WHEN car_model='Honda Civic' OR car_model='Honda CRV' THEN quantity ELSE NULL END) AS `sum(quantity)`
    FROM dealer
    GROUP BY id ORDER BY id;

-- 7. Aggregations using multiple sets of grouping columns in a single statement.
-- Following performs aggregations based on four sets of grouping columns.
-- 7.1. city, car_model
-- 7.2. city
-- 7.3. car_model
-- 7.4. Empty grouping set. Returns quantities for all city and car models.
SELECT city, car_model, sum(quantity) AS sum
    FROM dealer
    GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
    ORDER BY city NULLS FIRST;
 

-- 8. Group by processing with `ROLLUP` clause.
-- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), ())
SELECT city, car_model, sum(quantity) AS sum
    FROM dealer
    GROUP BY ROLLUP (city, car_model)
    ORDER BY city NULLS FIRST, car_model NULLS FIRST;

-- 9. Group by processing with `CUBE` clause.
-- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), ())
SELECT city, car_model, sum(quantity) AS sum
    FROM dealer
    GROUP BY CUBE (city, car_model)
    ORDER BY city NULLS FIRST, car_model NULLS FIRST;

id

sum(quantity)

100

32

200

33

300

13

id

sum(quantity)

100

32

200

33

300

13

id

sum

max

100

32

15

200

33

20

300

13

8

car_model

count

Honda Civic

3

Honda CRV

2

Honda Accord

3

car_model

count

Honda Civic

3

Honda CRV

2

Honda Accord

3

id

sum(quantity)

100

17

200

23

300

5

city
car_model
sum

NULL

Honda Civic

35

NULL

Honda Accord

33

NULL

NULL

78

NULL

Honda CRV

10

Dublin

Honda Civic

20

Dublin

NULL

33

Dublin

Honda CRV

3

Dublin

Honda Accord

10

Fremont

Honda Accord

15

Fremont

Honda Civic

10

Fremont

NULL

32

Fremont

Honda CRV

7

San Jose

Honda Accord

8

San Jose

NULL

13

San Jose

Honda Civic

5

city

car_model

sum

NULL

NULL

78

Dublin

NULL

33

Dublin

Honda Accord

10

Dublin

Honda CRV

3

Dublin

Honda Civic

20

Fremont

NULL

32

Fremont

Honda Accord

15

Fremont

Honda CRV

7

Fremont

Honda Civic

10

San Jose

NULL

13

San Jose

Honda Accord

8

San Jose

Honda Civic

5

city

car_model

sum

NULL

NULL

78

NULL

Honda Accord

33

NULL

Honda CRV

10

NULL

Honda Civic

35

Dublin

NULL

33

Dublin

Honda Accord

10

Dublin

Honda CRV

3

Dublin

Honda Civic

20

Fremont

NULL

32

Fremont

Honda Accord

15

Fremont

Honda CRV

7

Fremont

Honda Civic

10

San Jose

NULL

13

San Jose

Honda Accord

8

San Jose

Honda Civic

5

Known Issues

No issues were found

Related EWIs

No related EWIs

Databricks SQL Language Reference GROUP BY
SELECT
GROUP BY CUBE
GROUP BY GROUPING SETS
GROUP BY ROLLUP
Snowflake SQL Language Reference GROUP BY