SPRKPY1042

pyspark.sql.functions.posexplode has a workaround

Description

This issue appears when the tool detects a use of pyspark.sql.functions.posexplode, which has a workaround.

Input code:

df.select(posexplode(colList))
df.select(posexplode(colDict))

Output code:

#EWI: SPRKPY1042 => pyspark.sql.functions.posexplode has a workaround, see documentation for more info
df.select(posexplode(colList))
#EWI: SPRKPY1042 => pyspark.sql.functions.posexplode has a workaround, see documentation for more info
df.select(posexplode(colDict))

Scenarios

posexplode(col: ColumnOrName) -> pyspark.sql.column.Column

When the column contains a list of values

Action: Use functions.row_number to get the position and Session.flatten with the name of the field to get the value for lists, or the key/value for dictionaries. Example:

df.select(row_number().as_("pos"), flatten(colList)["value"].as_("col"))

When the column contains a map/dictionary (keys/values)

Action: Use snowflake.snowpark.Session.flatten with the name of the field to get the keys and values for dictionaries. Example:

df.select(row_number().as_("pos"), flatten(colDict)["key"], flatten(colDict)["value"])
# or
flattened = flatten(colDict)
df.select(row_number().as_("pos"), flattened["key"], flattened["value"])

Note: row_number is not a full equivalent, because its numbering starts at 1, whereas the Spark function's positions start at 0.
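
The difference in starting position can be illustrated in plain Python. This is a minimal sketch, not Snowpark code: posexplode_rows, row_number_rows, and to_zero_based are hypothetical helpers written only for this illustration, and they assume the column value is an ordinary Python list or dictionary.

```python
from collections.abc import Mapping


def posexplode_rows(value):
    """Yield rows shaped like Spark's posexplode output, with zero-based positions.

    Lists produce (pos, col); dictionaries produce (pos, key, value).
    """
    if isinstance(value, Mapping):
        for pos, (key, val) in enumerate(value.items()):
            yield (pos, key, val)
    else:
        for pos, item in enumerate(value):
            yield (pos, item)


def row_number_rows(value):
    """Yield the same rows with one-based positions, as a row_number-style counter gives."""
    for pos, *rest in posexplode_rows(value):
        yield (pos + 1, *rest)


def to_zero_based(rows):
    """Shift one-based positions back by one so they match posexplode."""
    return [(pos - 1, *rest) for pos, *rest in rows]


# Hypothetical sample values standing in for colList and colDict.
col_list = ["a", "b", "c"]
col_dict = {"x": 10, "y": 20}

spark_like = list(posexplode_rows(col_list))
workaround_like = list(row_number_rows(col_list))
dict_rows = list(posexplode_rows(col_dict))
```

If downstream code depends on the zero-based positions, the position column from the workaround would need the same kind of shift that to_zero_based applies here.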

Recommendation

  • For more support, you can email us at snowconvert-info@snowflake.com. If you have a contract for support with Snowflake, reach out to your sales engineer and they can direct your support needs.
