SPRKPY1062
pyspark.sql.group.GroupedData.pivot without parameter "values" is not supported
Category
Warning
Description
This issue appears when the SMA detects a call to the pyspark.sql.group.GroupedData.pivot function without the "values" parameter (the list of values to pivot on).
At the moment, the Snowpark Python pivot function requires you to explicitly specify the list of distinct values to pivot on.
Scenarios
Scenario 1
When the SMA detects an expression that matches the pattern dataFrame.groupBy("columnX").pivot("columnY"), it adds an EWI message indicating that the pivot function without the "values" parameter is not supported.
In addition, it adds a list comprehension as the second parameter of the pivot function. This comprehension computes the list of values that will be translated into columns. Keep in mind that this operation is inefficient on large datasets, so it is advisable to specify the values explicitly.
Input code
Output code
Scenario 2
When the SMA cannot detect an expression that matches the pattern dataFrame.groupBy("columnX").pivot("columnY"), it only adds an EWI message indicating that the pivot function without the "values" parameter is not supported.
Input code
Output code
Recommendation
Calculating the list of distinct values to pivot on is not an efficient operation on large datasets and could become a blocking call. Please consider indicating the list of distinct values to pivot on explicitly.
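As a minimal sketch of the explicit form, the helper below passes the pivot values directly. It assumes a Snowpark-style DataFrame that exposes group_by, pivot(pivot_col, values) and sum. The function name and the column names used in the usage note are hypothetical and only serve as illustration.

```python
def pivot_with_explicit_values(df, group_col, pivot_col, values, agg_col):
    """Pivot `pivot_col` into columns using a caller-supplied list of values.

    Because `values` is given up front, no extra query is needed to
    discover the distinct values of `pivot_col`.
    """
    # Assumption: `df` behaves like a Snowpark DataFrame
    # (group_by -> pivot(pivot_col, values) -> sum).
    return df.group_by(group_col).pivot(pivot_col, values).sum(agg_col)
```

For example, with hypothetical columns "date", "category" and "amount", the call would look like pivot_with_explicit_values(df, "date", "category", ["A", "B"], "amount").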
If you prefer not to specify the list of distinct values explicitly (not advisable), you can infer the values at runtime by passing, as the second argument of the pivot function, a list comprehension over the distinct values of the pivot column.*
*Replace <df> with the corresponding DataFrame, <column> with the column to pivot, and <count> with the number of rows to select.
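One way to sketch this runtime inference is shown below. The helper's parameters df, column and count correspond to the <df>, <column> and <count> placeholders. The function name is hypothetical, and the code assumes a Snowpark-style DataFrame with select, distinct, limit and collect, whose rows can be indexed by position. It is an illustration, not necessarily the exact expression the SMA generates.

```python
def infer_pivot_values(df, column, count):
    """Return up to `count` distinct values of `column` as a plain list.

    Note: this runs a query and pulls the values to the client, which is
    the costly step on large datasets.
    """
    # Assumption: `df` behaves like a Snowpark DataFrame and each collected
    # row supports positional access (row[0] is the selected column).
    rows = df.select(column).distinct().limit(count).collect()
    return [row[0] for row in rows]
```

The result can then be passed as the second argument of pivot, for instance df.group_by("date").pivot("category", infer_pivot_values(df, "category", 10000)), where the column names and the cap of 10000 are arbitrary illustrative choices. If the column has more distinct values than the cap, the extra values are silently left out of the pivot.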
For more support, you can email us at sma-info@snowflake.com. If you have a contract for support with Snowflake, reach out to your sales engineer and they can direct your support needs.