SPRKPY1043

pyspark.sql.functions.posexplode_outer has a workaround

Description

This issue appears when the tool detects the use of pyspark.sql.functions.posexplode_outer, which has a workaround.

Input code:

df.select(posexplode_outer(colList)) 
df.select(posexplode_outer(colDict))

Output code:

#EWI: SPRKPY1043 => pyspark.sql.functions.posexplode_outer has a workaround, see documentation for more info
df.select(posexplode_outer(colList))
#EWI: SPRKPY1043 => pyspark.sql.functions.posexplode_outer has a workaround, see documentation for more info
df.select(posexplode_outer(colDict))

Scenarios

posexplode_outer(col: ColumnOrName) -> pyspark.sql.column.Column

When the column contains a list of values

Action: You can use functions.row_number to get the position and Session.flatten with the name of the field to get the value for lists, or the key/value for dictionaries. Example:

df.select(row_number().as_("pos"), flatten(colList, outer=True)["value"].as_("col"))

When the column contains a map/dictionary (keys/values)

Action: You can use snowflake.snowpark.Session.flatten with the name of the field to get the keys/values for dictionaries. Example:

df.select(row_number().as_("pos"), flatten(colDict, outer=True)["key"], flatten(colDict, outer=True)["value"])
# or
flattened = flatten(colDict, outer=True)
df.select(row_number().as_("pos"), flattened["key"], flattened["value"])

Note: Using row_number is not a full equivalent, because it starts at 1 (not at 0, as the Spark function does), and it will not return null when the keys/values are null.
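To make the difference described in the note concrete, here is a minimal pure-Python sketch of the semantics involved. It does not call PySpark or Snowpark. The helper names posexplode_outer_model and row_number_model are hypothetical and exist only for illustration. The sketch assumes the PySpark behavior of zero-based positions and a single all-null row for a null or empty input.

```python
from typing import Any, Iterator, List, Tuple


def posexplode_outer_model(value: Any) -> Iterator[Tuple[Any, ...]]:
    """Hypothetical pure-Python model of posexplode_outer for one cell.

    Not part of PySpark or Snowpark. It only mirrors the assumed behavior:
    zero-based positions, and one all-null row for null or empty input.
    """
    if isinstance(value, dict):
        if not value:
            yield (None, None, None)  # empty map -> (pos, key, value) all null
            return
        for pos, (key, val) in enumerate(value.items()):
            yield (pos, key, val)
        return
    if not value:
        yield (None, None)  # null or empty list -> (pos, col) both null
        return
    for pos, item in enumerate(value):
        yield (pos, item)


def row_number_model(n_rows: int) -> List[int]:
    """Positions that a 1-based row_number would assign to n_rows rows."""
    return list(range(1, n_rows + 1))


spark_like = list(posexplode_outer_model(["a", "b"]))
one_based = row_number_model(len(spark_like))
# Subtracting 1 aligns the row_number-style positions with the Spark positions.
zero_based = [n - 1 for n in one_based]
```

If the migrated code depends on zero-based positions, subtracting 1 from the row_number result, as in the last line of the sketch, is one way to align the values. The null handling still has to be checked separately.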

Recommendation

  • For more support, you can email us at snowconvert-info@snowflake.com. If you have a contract for support with Snowflake, reach out to your sales engineer and they can direct your support needs.
