SPRKPY1088

pyspark.sql.readwriter.DataFrameWriter.option

Message: The pyspark.sql.readwriter.DataFrameWriter.option values in Snowpark may be different, so required validation might be needed.

Category: Warning

Description

The pyspark.sql.readwriter.DataFrameWriter.option values in Snowpark may be different, so validation might be needed to ensure that the behavior is correct.

Scenarios

There are several scenarios, depending on whether the option is supported and on the format used to write the file.

Scenario 1

Input

Below is an example that uses the option method to add a sep option, which is currently supported.

df = spark.createDataFrame([(100, "myVal")], ["ID", "Value"])

df.write.option("sep", ",").csv("some_path")

Output

The tool adds the EWI SPRKPY1088, indicating that validation is required.

df = spark.createDataFrame([(100, "myVal")], ["ID", "Value"])
#EWI: SPRKPY1088 => The pyspark.sql.readwriter.DataFrameWriter.option values in Snowpark may be different, so required validation might be needed.
df.write.option("sep", ",").csv("some_path")

Recommended fix

The Snowpark API supports this parameter, so the only required action is to check the behavior after the migration. Please refer to the Equivalences table in the Additional recommendations section to see the supported parameters.

df = spark.createDataFrame([(100, "myVal")], ["ID", "Value"])
#EWI: SPRKPY1088 => The pyspark.sql.readwriter.DataFrameWriter.option values in Snowpark may be different, so required validation might be needed.
df.write.option("sep", ",").csv("some_path")

Scenario 2

Input

This scenario shows the usage of the option method, but adds a header option, which is not supported.

df = spark.createDataFrame([(100, "myVal")], ["ID", "Value"])

df.write.option("header", True).csv("some_path")

Output

The tool adds the EWI SPRKPY1088, indicating that validation is required.

df = spark.createDataFrame([(100, "myVal")], ["ID", "Value"])
#EWI: SPRKPY1088 => The pyspark.sql.readwriter.DataFrameWriter.option values in Snowpark may be different, so required validation might be needed.
df.write.option("header", True).csv("some_path")

Recommended fix

For this scenario, it is recommended to evaluate the Snowpark format type options to see whether one can be adapted to your needs, and to check the behavior after the change.

df = spark.createDataFrame([(100, "myVal")], ["ID", "Value"])
#EWI: SPRKPY1088 => The pyspark.sql.readwriter.DataFrameWriter.option values in Snowpark may be different, so required validation might be needed.
df.write.csv("some_path")

Scenario 3

Input

This scenario adds the sep option, which is supported, but uses the json method to write the file.

  • Note: this scenario also applies for PARQUET.

df = spark.createDataFrame([(100, "myVal")], ["ID", "Value"])

df.write.option("sep", ",").json("some_path")

Output

The tool adds the EWI SPRKPY1088, indicating that validation is required.

df = spark.createDataFrame([(100, "myVal")], ["ID", "Value"])
#EWI: SPRKPY1088 => The pyspark.sql.readwriter.DataFrameWriter.option values in Snowpark may be different, so required validation might be needed.
df.write.option("sep", ",").json("some_path")

Recommended fix

The JSON file format does not support the sep parameter, so it is recommended to evaluate the Snowpark format type options to see whether one can be adapted to your needs, and to check the behavior after the change.

df = spark.createDataFrame([(100, "myVal")], ["ID", "Value"])
#EWI: SPRKPY1088 => The pyspark.sql.readwriter.DataFrameWriter.option values in Snowpark may be different, so required validation might be needed.
df.write.json("some_path")
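The format restriction above can be sketched as a small pre-migration check. The option names and the CSV-only mapping below are taken from the Equivalences table in the Additional recommendations section; the helper function itself is a hypothetical illustration and is not part of the Snowpark API.

```python
# Hypothetical pre-migration helper: checks whether a PySpark writer
# option has a Snowflake equivalent for a given output format.
# The mapping reflects the Equivalences table in this document;
# every mapped option is currently CSV-only.
PYSPARK_TO_SNOWFLAKE = {
    "sep": ("FIELD_DELIMITER", {"csv"}),
    "linesep": ("RECORD_DELIMITER", {"csv"}),
    "quote": ("FIELD_OPTIONALLY_ENCLOSED_BY", {"csv"}),
    "nullvalue": ("NULL_IF", {"csv"}),
    "dateformat": ("DATE_FORMAT", {"csv"}),
    "timestampformat": ("TIMESTAMP_FORMAT", {"csv"}),
}

def option_supported(option: str, file_format: str) -> bool:
    """Return True if the PySpark option maps to a Snowflake
    format type option for the given file format."""
    entry = PYSPARK_TO_SNOWFLAKE.get(option.lower())
    return entry is not None and file_format.lower() in entry[1]
```

With this sketch, option_supported("sep", "csv") is True, while option_supported("sep", "json") and option_supported("header", "csv") are both False, matching Scenarios 2 and 3.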

Additional recommendations

  • Since some parameters are not supported, it is recommended to check the equivalences table below and to verify the behavior after the transformation.

  • Equivalences table:

| PySpark Option  | Snowflake Option             | Supported File Formats | Description                                                                           |
| --------------- | ---------------------------- | ---------------------- | ------------------------------------------------------------------------------------- |
| SEP             | FIELD_DELIMITER              | CSV                    | One or more single-byte or multibyte characters that separate fields in an input file. |
| LINESEP         | RECORD_DELIMITER             | CSV                    | One or more characters that separate records in an input file.                         |
| QUOTE           | FIELD_OPTIONALLY_ENCLOSED_BY | CSV                    | Character used to enclose strings.                                                     |
| NULLVALUE       | NULL_IF                      | CSV                    | String used to convert to and from SQL NULL.                                           |
| DATEFORMAT      | DATE_FORMAT                  | CSV                    | String that defines the format of date values in the data files to be loaded.          |
| TIMESTAMPFORMAT | TIMESTAMP_FORMAT             | CSV                    | String that defines the format of timestamp values in the data files to be loaded.     |

If the parameter used is not in the list, the API throws an error.
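As a sketch of that behavior, the helper below translates PySpark writer option names to their Snowflake equivalents using the table above and raises an error for any option not in the list. The function name is hypothetical and only illustrates the mapping; it is not part of the Snowpark API.

```python
# Hypothetical translation helper based on the equivalences table above.
# Options outside the table raise an error, mirroring the API behavior
# described in this document.
EQUIVALENCES = {
    "sep": "FIELD_DELIMITER",
    "linesep": "RECORD_DELIMITER",
    "quote": "FIELD_OPTIONALLY_ENCLOSED_BY",
    "nullvalue": "NULL_IF",
    "dateformat": "DATE_FORMAT",
    "timestampformat": "TIMESTAMP_FORMAT",
}

def translate_options(options: dict) -> dict:
    """Map PySpark CSV writer options to Snowflake format type options."""
    translated = {}
    for name, value in options.items():
        snowflake_name = EQUIVALENCES.get(name.lower())
        if snowflake_name is None:
            raise ValueError(f"Unsupported option: {name}")
        translated[snowflake_name] = value
    return translated
```

For example, translating {"sep": ",", "dateFormat": "YYYY-MM-DD"} yields {"FIELD_DELIMITER": ",", "DATE_FORMAT": "YYYY-MM-DD"}, while an unsupported option such as header raises an error.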
