Clustered By

Create table option

Description

CLUSTERED BY is an optional clause that controls table clustering. For more information, please refer to Bucketed Sorted Tables.

Clustered by option is supported by Snowflake by Cluster By Clause.

Grammar Syntax

clustered_by :=
    CLUSTERED BY (col_name, ...) 
    [SORTED BY (col_name [ASC|DESC], ...)] 
    INTO num_buckets BUCKETS

Sample Source

Hive

CREATE TABLE clustered_table
(
  customer_id INT,
  transaction_amount NUMERIC
)
CLUSTERED BY (customer_id) SORTED BY (transaction_amount ASC) INTO 32 BUCKETS;

Snowflake

CREATE TABLE clustered_table
(
  customer_id INT,
  transaction_amount NUMERIC
)
CLUSTER BY (customer_id);
--** MSC-ERROR - MSC-HVXXXX - THE SORTED BY CLAUSE IS NOT SUPPORTED IN SNOWFLAKE. **
--SORTED BY (transaction_amount ASC)
--** MSC-ERROR - THE BUCKET NUMBER IS AUTOMATICALLY DEFINED IN THE SNOWFLAKE CLUSTER BY. **
--INTO 32 BUCKETS

Known Issues

  1. Clustering in SnowFlake is defined by Clause CLUSTER BY that not only allows the use of columns for Clustering Keys as in Hive but also column Expressions. You can find more information on Cluster Tables and Cluster Keys definition in: (Snowflake) User Guide (Clustering Keys & Clustered Tables).

  2. In Snowflake the number of buckets in which the data will be grouped is determined automatically.

  1. MSC-HVXXXX: THE BUCKET NUMBER IS AUTOMATICALLY DEFINED IN THE SNOWFLAKE CLUSTER BY.

  2. MSC-HVXXXX: THE SORTED BY CLAUSE IS NOT SUPPORTED IN SNOWFLAKE.

Last updated