SPRKPY1069

pyspark.sql.DataFrameWriter.parquet

Message: If partitionBy parameter is a list, Snowpark will throw an error.

Category: Warning

Description

The tool adds this EWI whenever it finds a call to the pyspark.sql.readwriter.DataFrameWriter.parquet method that passes the partitionBy parameter.

This is because in Snowpark, DataFrameWriter.parquet accepts only a ColumnOrSqlExpr as the partition_by parameter. A list raises an error.
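As a rough illustration of the rule above, the following sketch flags values that would need attention before migration. The helper name needs_partition_fix is hypothetical and is not part of PySpark, Snowpark, or the tool. Treating tuples the same way as lists is an assumption made here for safety.

```python
def needs_partition_fix(partition_by):
    """Return True when the value is a list-like sequence of columns.

    Hypothetical helper for illustration only. Tuples are treated like
    lists as a conservative assumption; a single column name (str) or
    None passes through as not needing a fix.
    """
    return isinstance(partition_by, (list, tuple))


print(needs_partition_fix("age"))            # single column name
print(needs_partition_fix(["age", "name"]))  # list of column names
```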

Scenarios

Scenario 1

Input code:

For this scenario the partitionBy parameter is not a list.

df = spark.createDataFrame([(25, "Alice", "150"), (30, "Bob", "350")], schema=["age", "name", "value"])

df.write.parquet(file_path, partitionBy="age")

Output code:

The tool adds the EWI SPRKPY1069 to let you know that Snowpark throws an error if the parameter is a list.

df = spark.createDataFrame([(25, "Alice", "150"), (30, "Bob", "350")], schema=["age", "name", "value"])

#EWI: SPRKPY1069 => If partitionBy parameter is a list, Snowpark will throw an error.
df.write.parquet(file_path, partition_by = "age", format_type_options = dict(compression = "None"))

Recommended fix

No fix is needed for this scenario: the parameter is not a list, and the tool adds this EWI to every such call only as a precaution. Remember that Snowpark only accepts cloud locations accessed through a Snowflake stage.

Scenario 2

Input code:

For this scenario the partitionBy parameter is a list.

Output code:

The tool adds the EWI SPRKPY1069 to let you know that Snowpark throws an error if the parameter is a list.

Recommended fix

If the value of the parameter is a list, replace it with a single ColumnOrSqlExpr.
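One possible way to apply this fix is to combine the listed columns into a single SQL string expression and wrap it with sql_expr from snowflake.snowpark.functions. The sketch below only builds the expression text, so it runs without a Snowflake session. The helper name partition_columns_to_sql and the "/" separator are assumptions for illustration, not something the tool generates. Non-string columns such as age may also need an explicit cast to a string, depending on your data.

```python
def partition_columns_to_sql(columns, separator="/"):
    """Join column names into one SQL concatenation expression.

    Hypothetical helper for illustration only. A plain string is assumed
    to already be a single column or SQL expression and is returned as is.
    """
    if isinstance(columns, str):
        return columns
    if not columns:
        raise ValueError("partitionBy list is empty")
    # Use SQL's || operator, optionally inserting a literal separator.
    glue = f" || '{separator}' || " if separator else " || "
    return glue.join(columns)


expr = partition_columns_to_sql(["age", "name"])
print(expr)

# Assumed usage in the migrated code (requires Snowpark and a session):
# from snowflake.snowpark.functions import sql_expr
# df.write.parquet(file_path, partition_by=sql_expr(expr),
#                  format_type_options=dict(compression="None"))
```

Passing separator="" produces a bare concatenation of the columns. Keeping a separator is a design choice here so that each column value stays distinguishable in the resulting partition path.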

Additional recommendations
