SPRKPY1062

pyspark.sql.group.GroupedData.pivot without parameter "values" is not supported

Description

This issue appears when the SMA detects the usage of the pyspark.sql.group.GroupedData.pivot function without the "values" parameter (the list of values to pivot on).

At the moment, the Snowpark Python pivot function requires you to explicitly specify the list of distinct values to pivot on.
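As a minimal sketch of what passing the values explicitly can look like, the function below supplies them as a literal list. The category names ("food", "rent", "travel"), the column names, and the function name are illustrative assumptions, not taken from a real dataset.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported only for type hints; not needed at runtime
    from snowflake.snowpark import DataFrame

# Hypothetical category values, listed explicitly so no extra query is needed.
KNOWN_CATEGORIES = ["food", "rent", "travel"]


def pivot_amount_by_category(df: "DataFrame") -> "DataFrame":
    """Sum `amount` per `date`, producing one output column per known category."""
    return df.groupBy("date").pivot("category", KNOWN_CATEGORIES).sum("amount")
```

Because the values are known up front, no additional query is needed to discover them before the pivot runs.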

Scenarios

Scenario 1

When the SMA detects an expression that matches the pattern dataFrame.groupBy("columnX").pivot("columnY"), it adds an EWI message indicating that the pivot function without the "values" parameter is not supported.

In addition, it adds a list comprehension as the second argument of the pivot function. The comprehension computes the distinct values that will become columns. Keep in mind that this operation is inefficient for large datasets, so specifying the values explicitly is advisable.

Input code

df.groupBy("date").pivot("category").sum("amount")

Output code

#EWI: SPRKPY1062 => pyspark.sql.group.GroupedData.pivot without parameter 'values' is not supported. See documentation for more info.
df.groupBy("date").pivot("category", [v[0] for v in df.select("category").distinct().limit(10000).collect()]).sum("amount")

Scenario 2

When the SMA cannot match the pattern dataFrame.groupBy("columnX").pivot("columnY"), as in the input below, where groupBy is called on the result of df1.union(df2), it only adds an EWI message indicating that the pivot function without the "values" parameter is not supported.

Input code

df1.union(df2).groupBy("date").pivot("category").sum("amount")

Output code

#EWI: SPRKPY1062 => pyspark.sql.group.GroupedData.pivot without parameter 'values' is not supported. See documentation for more info.
df1.union(df2).groupBy("date").pivot("category").sum("amount")
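One possible manual fix for this case, sketched below, is to bind the intermediate DataFrame to a name so that the pivot values can be computed from it. The function name and the max_values parameter are assumptions for illustration; the 10000 default simply mirrors the limit used in Scenario 1, and the same efficiency caveat applies.

```python
def pivot_union_by_category(df1, df2, max_values=10000):
    """Union two DataFrames, then pivot `category` using values inferred at runtime."""
    # Name the intermediate result so it can be queried twice.
    combined = df1.union(df2)

    # Extra round trip: fetch the distinct pivot values (capped at max_values).
    rows = combined.select("category").distinct().limit(max_values).collect()
    values = [row[0] for row in rows]

    return combined.groupBy("date").pivot("category", values).sum("amount")
```

If the categories are known in advance, passing them as a literal list instead of computing `values` avoids the extra query.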

Recommendation

Calculating the list of distinct values to pivot on is inefficient on large datasets and can become a blocking call. Consider specifying the list of distinct values to pivot on explicitly.

If you do not want to specify the values explicitly (not advisable), you can pass the following expression as the second argument of the pivot function to infer them at runtime:

[v[0] for v in <df>.select(<column>).distinct().limit(<count>).collect()]

Replace <df> with the corresponding DataFrame, <column> with the column to pivot on, and <count> with the maximum number of distinct values to retrieve.
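If the values must be inferred at runtime, a small helper can make the cap explicit and fail loudly when it is exceeded, rather than silently dropping pivot columns. This is only a sketch: the helper name, the max_values parameter, and the choice to raise ValueError are assumptions, not part of the SMA output.

```python
def distinct_pivot_values(df, column, max_values=10000):
    """Return the distinct values of `column`, raising if there are more than `max_values`."""
    # Fetch one row past the cap so that truncation can be detected.
    rows = df.select(column).distinct().limit(max_values + 1).collect()
    if len(rows) > max_values:
        raise ValueError(
            f"Column {column!r} has more than {max_values} distinct values; "
            "pass the pivot values explicitly instead."
        )
    # Each collected row has a single field: the pivot value.
    return [row[0] for row in rows]
```

A call such as df.groupBy("date").pivot("category", distinct_pivot_values(df, "category")).sum("amount") would then use the helper in place of the inline comprehension.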

For more support, you can email us at [email protected]. If you have a contract for support with Snowflake, reach out to your sales engineer and they can direct your support needs.


