SSC-FDM-GP0001
The performance of the CLUSTER BY may vary compared to the performance of Distributed By
Description
The DISTRIBUTED BY
in Greenplum is analogous to CLUSTER BY
in Snowflake. However, performance implications may vary due to architectural differences between Greenplum and Snowflake.
DISTRIBUTED BY
controls the physical distribution of data across the nodes (segments) in Greenplum's MPP architecture..CLUSTER BY
in Snowflake organizes data into blocks based on designated columns, aiding in filtering and aggregation tasks.
Understanding these mechanisms is crucial for optimizing performance in each respective platform.
Code Example
Input Code:
CREATE TABLE table1 (colum1 int, colum2 int, colum3 smallint, colum4 int )
DISTRIBUTED BY (colum1, colum2);
Output Code:
CREATE TABLE table1 (colum1 int, colum2 int, colum3 smallint, colum4 int )
--** SSC-FDM-GP0001 - THE PERFORMANCE OF THE CLUSTER BY MAY VARY COMPARED TO THE PERFORMANCE OF DISTRIBUTED BY **
CLUSTER BY (colum1, colum2)
COMMENT = '{ "origin": "sf_sc", "name": "snowconvert", "version": { "major": 0, "minor": 0, "patch": "0" }, "attributes": { "component": "greenplum", "convertedOn": "03/26/2025", "domain": "test" }}'
;
Recommendations
If you need more support, you can email us at [email protected]
Last updated