SPRKPY1016
pyspark.sql.functions.collect_set has a workaround
This issue code has been deprecated since Spark Conversion Core Version 0.11.7
Message: pyspark.sql.functions.collect_set has a workaround
Category: Warning.
Description
This issue appears when the tool detects the usage of pyspark.sql.functions.collect_set which has a workaround.
Scenario
Input
Using collect_set to get the elements of colName without duplicates:
col = collect_set(colName)
Output
SMA adds the EWI SPRKPY1016 above the line where collect_set is used, so you can identify where a fix is needed.
#EWI: SPRKPY1016 => pyspark.sql.functions.collect_set has a workaround, see documentation for more info
col = collect_set(colName)
Recommended fix
Use the array_agg function and pass True as its second argument so that duplicate values are removed.
col = array_agg(colName, True)
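For simple calls, the replacement described above can be applied mechanically. The following is a minimal sketch, not part of SMA: the helper name apply_sprkpy1016_fix and its regular expression are hypothetical, and it assumes the collect_set argument contains no nested parentheses. More complex calls should be fixed by hand.

```python
import re

# Hypothetical helper (not an SMA feature): rewrites simple collect_set(...)
# calls to array_agg(..., True). Assumes the argument has no nested parentheses.
_COLLECT_SET_CALL = re.compile(r"\bcollect_set\(\s*([^()]*?)\s*\)")


def apply_sprkpy1016_fix(source: str) -> str:
    """Replace collect_set(arg) with array_agg(arg, True) in source text."""
    return _COLLECT_SET_CALL.sub(r"array_agg(\1, True)", source)


line = "col = collect_set(colName)"
fixed = apply_sprkpy1016_fix(line)
print(fixed)
```

The EWI comment line is left unchanged because it mentions collect_set without a call. The helper only rewrites the call itself, so the import of array_agg still has to be added separately.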
Additional recommendation
For more support, you can email us at sma-support@snowflake.com or post an issue in the SMA.