SPRKPY1016

pyspark.sql.functions.collect_set has a workaround

This issue code has been deprecated since Spark Conversion Core Version 0.11.7

Message: pyspark.sql.functions.collect_set has a workaround

Category: Warning.

Description

This issue appears when the tool detects usage of pyspark.sql.functions.collect_set, which has a workaround.

Scenario

Input

Using collect_set to get the elements of colName without duplicates:

col = collect_set(colName)

Output

SMA returns the EWI SPRKPY1016 over the line where collect_set is used, so you can use it to identify where to apply the fix.

#EWI: SPRKPY1016 => pyspark.sql.functions.collect_set has a workaround, see documentation for more info
col = collect_set(colName)
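If a migrated project contains many of these markers, a small script can list where they occur. The sketch below is hypothetical (find_flagged_lines and EWI_PATTERN are illustrative names, not part of SMA). It assumes the comment format shown above and that SMA places the EWI comment on the line immediately before the flagged code.

```python
import re

# Matches the EWI comment that SMA places above flagged code
# (assumed format: "#EWI: SPRKPY1016 => ...").
EWI_PATTERN = re.compile(r"^\s*#EWI:\s*SPRKPY1016\b")


def find_flagged_lines(source):
    """Return 1-based line numbers of code lines preceded by an SPRKPY1016 comment."""
    lines = source.splitlines()
    flagged = []
    for i, line in enumerate(lines):
        # i is 0-based; the flagged code is on the next line, i.e. 1-based i + 2.
        if EWI_PATTERN.match(line) and i + 1 < len(lines):
            flagged.append(i + 2)
    return flagged


migrated = (
    "#EWI: SPRKPY1016 => pyspark.sql.functions.collect_set has a workaround, "
    "see documentation for more info\n"
    "col = collect_set(colName)\n"
)
print(find_flagged_lines(migrated))
```

Run against a whole file's contents, the returned numbers point at each collect_set call that still needs the recommended fix.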

Recommended fix

Use the Snowpark function array_agg and pass True as its second argument (is_distinct), so that duplicate values are removed as they are with collect_set.

from snowflake.snowpark.functions import array_agg
col = array_agg(colName, True)
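Running Snowpark code requires a Snowflake session, so the following plain-Python sketch models only the aggregation semantics: with is_distinct=True, each group keeps one copy of each value (like collect_set), and with the default is_distinct=False, duplicates are kept. The function name aggregate and the sample rows are hypothetical, and this is not how Spark or Snowflake implement these functions. Neither engine guarantees element order, so the model's first-occurrence ordering is only for readability.

```python
from collections import defaultdict


def aggregate(rows, key, value, is_distinct):
    """Toy model: group rows by `key` and collect `value` into a list per group.

    is_distinct=True mimics collect_set / array_agg(col, True);
    is_distinct=False mimics array_agg(col) with its default.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    result = {}
    for k, values in groups.items():
        if is_distinct:
            # Drop repeats, keeping the first occurrence of each value.
            values = list(dict.fromkeys(values))
        result[k] = values
    return result


rows = [
    {"dept": "eng", "name": "ana"},
    {"dept": "eng", "name": "ana"},
    {"dept": "eng", "name": "bo"},
    {"dept": "ops", "name": "cy"},
]
print(aggregate(rows, "dept", "name", is_distinct=True))   # {'eng': ['ana', 'bo'], 'ops': ['cy']}
print(aggregate(rows, "dept", "name", is_distinct=False))  # {'eng': ['ana', 'ana', 'bo'], 'ops': ['cy']}
```

Omitting the True argument is the most likely mistake when applying this fix, because the output then silently keeps duplicates.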

Additional recommendation
