Clustered By
Create table option
Description
CLUSTERED BY
is an optional clause that controls table clustering. For more information, please refer to Bucketed Sorted Tables.
Grammar Syntax
clustered_by :=
CLUSTERED BY (col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)]
INTO num_buckets BUCKETS
Sample Source
Hive
CREATE TABLE clustered_table
(
customer_id INT,
transaction_amount NUMERIC
)
CLUSTERED BY (customer_id) SORTED BY (transaction_amount ASC) INTO 32 BUCKETS;
Snowflake
CREATE TABLE clustered_table
(
customer_id INT,
transaction_amount NUMERIC
)
CLUSTER BY (customer_id);
--** MSC-ERROR - MSC-HVXXXX - THE SORTED BY CLAUSE IS NOT SUPPORTED IN SNOWFLAKE. **
--SORTED BY (transaction_amount ASC)
--** MSC-ERROR - THE BUCKET NUMBER IS AUTOMATICALLY DEFINED IN THE SNOWFLAKE CLUSTER BY. **
--INTO 32 BUCKETS
Known Issues
Clustering in SnowFlake is defined by Clause
CLUSTER BY
that not only allows the use of columns for Clustering Keys as in Hive but also column Expressions. You can find more information on Cluster Tables and Cluster Keys definition in: (Snowflake) User Guide (Clustering Keys & Clustered Tables).In Snowflake the number of buckets in which the data will be grouped is determined automatically.
Related EWIs
MSC-HVXXXX: THE BUCKET NUMBER IS AUTOMATICALLY DEFINED IN THE SNOWFLAKE CLUSTER BY.
MSC-HVXXXX: THE SORTED BY CLAUSE IS NOT SUPPORTED IN SNOWFLAKE.
Last updated
Was this helpful?