SPRKPY1081
pyspark.sql.readwriter.DataFrameWriter.partitionBy
This issue code has been deprecated since Spark Conversion Core 4.12.0
Message: pyspark.sql.readwriter.DataFrameWriter.partitionBy has a workaround.
Category: Warning
Description
The pyspark.sql.readwriter.DataFrameWriter.partitionBy function is not supported. The workaround is to use Snowpark's copy_into_location method instead. See the Recommended fix section for more info.
Scenario
Input
This code creates a separate directory for each unique value in the FIRST_NAME column. The data is the same, but it is stored in different directories based on that column's value.
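A minimal sketch of the kind of PySpark code this refers to (the sample data, column names, and output path are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

data = [("John", "Berry"), ("Dan", "McGrath"), ("Marta", "Cavalcanti")]
df = spark.createDataFrame(data, schema=["FIRST_NAME", "LAST_NAME"])

# partitionBy creates one subdirectory per distinct FIRST_NAME value,
# e.g. /home/data/FIRST_NAME=John/, /home/data/FIRST_NAME=Dan/, ...
df.write.partitionBy("FIRST_NAME").csv("/home/data", header=True)
```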
Output code
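The SMA adds a warning comment above the unsupported call. A hedged sketch of what the output might look like (the exact comment text and surrounding conversion depend on the tool version):

```python
data = [("John", "Berry"), ("Dan", "McGrath"), ("Marta", "Cavalcanti")]
df = spark.createDataFrame(data, schema=["FIRST_NAME", "LAST_NAME"])

#EWI: SPRKPY1081 => pyspark.sql.readwriter.DataFrameWriter.partitionBy has a workaround, see documentation for more info
df.write.partitionBy("FIRST_NAME").csv("/home/data", header=True)
```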
Recommended fix
In Snowpark, copy_into_location has a partition_by parameter that you can use instead of the partitionBy function, but it requires some manual adjustments, as shown in the following example:
Spark code:
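A hedged sketch of the Spark version (the data, column names, and output path are assumptions):

```python
df = spark.createDataFrame(
    [("John", "Berry"), ("Dan", "McGrath"), ("Marta", "Cavalcanti")],
    schema=["FIRST_NAME", "LAST_NAME"],
)

# Writes one directory per distinct FIRST_NAME value
df.write.partitionBy("FIRST_NAME").csv("/home/data", header=True)
```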
Snowpark code manually adjusted:
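A possible manual rewrite using copy_into_location; the stage path and file format options are illustrative, and an existing Snowpark session is assumed:

```python
from snowflake.snowpark.functions import col

df = session.create_dataframe(
    [("John", "Berry"), ("Dan", "McGrath"), ("Marta", "Cavalcanti")],
    schema=["FIRST_NAME", "LAST_NAME"],
)

# copy_into_location writes to a Snowflake stage; partition_by replaces partitionBy
df.write.copy_into_location(
    "@my_stage/data/",              # illustrative stage location
    partition_by=col("FIRST_NAME"),
    file_format_type="csv",
    header=True,
    overwrite=True,
)
```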
copy_into_location has the following parameters:
location: The Snowpark location only accepts cloud locations that use a Snowflake stage.
partition_by: It can be a column name or a SQL expression, so you will need to convert it to a Column or a SQL expression using col or sql_expr, as shown in the sketch below.
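For example, assuming the same illustrative DataFrame and stage, partition_by accepts either form:

```python
from snowflake.snowpark.functions import col, sql_expr

# Partition by a single column
df.write.copy_into_location(
    "@my_stage/by_first_name/",
    partition_by=col("FIRST_NAME"),
    file_format_type="csv",
)

# Partition by a SQL expression
df.write.copy_into_location(
    "@my_stage/by_full_name/",
    partition_by=sql_expr("FIRST_NAME || '_' || LAST_NAME"),
    file_format_type="csv",
)
```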
Additional recommendations
For more support, you can email us at sma-support@snowflake.com or post an issue in the SMA.