SPRKPY1062
pyspark.sql.group.GroupedData.pivot
Message: Snowpark does not support GroupedData.pivot without parameter "values".
Category: Warning
Description
This issue appears when the SMA detects the usage of the pyspark.sql.group.GroupedData.pivot function without the "values" parameter (the list of values to pivot on).
At the moment, the Snowpark Python pivot function requires you to explicitly specify the list of distinct values to pivot on.
Scenarios
Scenario 1
Input
The SMA detects an expression that matches the pattern dataFrame.groupBy("columnX").pivot("columnY") and the pivot does not have the values parameter.
Output
The SMA adds an EWI message indicating that the pivot function without the "values" parameter is not supported.
In addition, it adds a list comprehension as the second parameter of the pivot function; this comprehension computes the list of values that will be translated into columns. Keep in mind that this operation is not efficient for large datasets, so it is advisable to specify the values explicitly.
Recommended fix
For this scenario, the SMA adds a list comprehension as the second parameter of the pivot function to compute the list of values that will be translated into columns. You can instead pass an explicit list of distinct values to pivot on.
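A minimal sketch of passing the values explicitly, assuming a Snowpark DataFrame with hypothetical columns "date", "category" and "amount". The function name and the category values are illustrative only and do not come from the SMA output.

```python
# Hypothetical pivot values, known ahead of time for the "category" column.
KNOWN_CATEGORIES = ["dairy", "fruit", "bakery"]


def pivot_amount_by_category(df):
    """Pivot with an explicit value list.

    Because the values are supplied up front, no extra query is needed
    to discover the distinct values of the pivot column.
    """
    return df.group_by("date").pivot("category", KNOWN_CATEGORIES).sum("amount")
```

Only values listed in KNOWN_CATEGORIES become columns, so the list has to be kept in sync with the data.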
Scenario 2
Input
The SMA detects a pivot call without the values parameter, but the expression does not match the pattern dataFrame.groupBy("columnX").pivot("columnY").
Output
The SMA adds an EWI message indicating that the pivot function without the "values" parameter is not supported.
Recommended fix
Pass an explicit list of distinct values to pivot on as the second argument of the pivot function.
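As an illustration only, one case that might not match the pattern is a grouped object stored in a variable before pivot is called. That this is what the SMA fails to match is an assumption. The sketch below uses hypothetical column names ("category", "amount") and a hypothetical helper name, and shows the values being passed manually.

```python
def pivot_grouped(grouped, pivot_values):
    """Pivot an already-grouped Snowpark object with explicit values.

    `grouped` is assumed to be the result of an earlier df.group_by(...)
    call; `pivot_values` is any iterable of the values to turn into columns.
    """
    # Materialize the iterable so a plain list is passed to pivot.
    return grouped.pivot("category", list(pivot_values)).sum("amount")
```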
Additional recommendations
Calculating the list of distinct values to pivot on is not an efficient operation on large datasets and could become a blocking call. Please consider indicating the list of distinct values to pivot on explicitly.
If you don't want to specify the list of distinct values to pivot on explicitly (not advisable), you can pass an expression as the second argument of the pivot function that infers the values at runtime. Such an expression needs the corresponding DataFrame, the column to pivot, and the number of rows to select.
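A minimal sketch of inferring the values at runtime, assuming a Snowpark-style DataFrame that exposes select, distinct, limit and collect. The helper name distinct_pivot_values and its parameters are hypothetical, and the exact expression the SMA generates may differ.

```python
def distinct_pivot_values(df, column, max_rows):
    """Return up to `max_rows` distinct values of `column` as a plain list.

    This runs a query and blocks until the result is collected, which is
    why it can be slow on large datasets.
    """
    rows = df.select(column).distinct().limit(max_rows).collect()
    # Each collected row holds a single field: the pivot column's value.
    return [row[0] for row in rows]
```

The result could then be passed as the second argument of pivot, for example df.group_by("date").pivot("category", distinct_pivot_values(df, "category", 1000)), where the column names and the row cap are placeholders.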
For more support, you can email us at sma-support@snowflake.com or post an issue in the SMA.