SSC-FDM-GP0001
The performance of CLUSTER BY may vary compared to the performance of DISTRIBUTED BY
Description
The DISTRIBUTED BY clause in Greenplum is analogous to the CLUSTER BY clause in Snowflake. However, the performance implications may differ because of architectural differences between Greenplum and Snowflake.
DISTRIBUTED BY controls the physical distribution of data across the nodes (segments) in Greenplum's MPP architecture. CLUSTER BY in Snowflake co-locates rows with similar values in the designated columns within the same micro-partitions, which helps queries that filter or aggregate on those columns.
Understanding these mechanisms is crucial for optimizing performance in each respective platform.
Code Example
Input Code:
CREATE TABLE table1 (colum1 int, colum2 int, colum3 smallint, colum4 int )
DISTRIBUTED BY (colum1, colum2);
Output Code:
CREATE TABLE table1 (colum1 int, colum2 int, colum3 smallint, colum4 int )
--** SSC-FDM-GP0001 - THE PERFORMANCE OF THE CLUSTER BY MAY VARY COMPARED TO THE PERFORMANCE OF DISTRIBUTED BY **
CLUSTER BY (colum1, colum2)
COMMENT = '{ "origin": "sf_sc", "name": "snowconvert", "version": { "major": 0, "minor": 0, "patch": "0" }, "attributes": { "component": "greenplum", "convertedOn": "03/26/2025", "domain": "test" }}'
;
Recommendations
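Because the two clauses work differently, it can help to check how the converted clustering key behaves on real data. The statements below are a minimal sketch, assuming the converted table1 from the Output Code exists in the current schema and has been loaded. The alternative key shown (colum1 alone) is only an illustration, not a recommended setting.

```sql
-- Assumption: table1 was created from the Output Code above and contains data.
-- Report clustering depth and overlap statistics for the converted key.
SELECT SYSTEM$CLUSTERING_INFORMATION('table1', '(colum1, colum2)');

-- Illustrative only: if most queries filter on colum1, a narrower key
-- may be worth testing against the same workload.
ALTER TABLE table1 CLUSTER BY (colum1);

-- Illustrative only: on small tables the key may not pay for itself,
-- so removing it avoids background reclustering.
ALTER TABLE table1 DROP CLUSTERING KEY;
```

Compare query profiles before and after any change, since the key that suited Greenplum's data distribution is not necessarily the one that gives the best pruning in Snowflake.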
If you need more support, you can email us at [email protected]
