SPRKPY1076

Parameters in pyspark.sql.readwriter.DataFrameReader

Message: Parameters in pyspark.sql.readwriter.DataFrameReader methods are not supported. This applies to CSV, JSON and PARQUET methods.

Category: Warning.

Description

For the CSV, JSON and PARQUET methods of the pyspark.sql.readwriter.DataFrameReader object, the tool analyzes the parameters and applies a transformation according to each case:

  • All the parameters match their equivalent name in Snowpark: the tool transforms each parameter into a .option() call. In this case, this EWI is not added.

  • Some parameters do not match an equivalent in Snowpark: the tool adds this EWI with the information for those parameters and removes them from the method call.
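
The two cases above can be sketched as a simple partition of the parameters. This is only an illustration of the decision described here, not the tool's implementation; the function name `split_parameters` is hypothetical, and the dictionary holds a subset of the CSV equivalences listed on this page.

```python
# Hypothetical subset of the CSV equivalences listed on this page.
CSV_EQUIVALENCES = {
    "sep": "FIELD_DELIMITER",
    "header": "PARSE_HEADER",
    "inferSchema": "INFER_SCHEMA",
}


def split_parameters(params, equivalences):
    """Return (options, unsupported) for a dict of Spark reader parameters."""
    options = {}      # Snowpark option name -> value, one .option() call each
    unsupported = []  # Spark parameter names with no listed equivalent
    for name, value in params.items():
        if name in equivalences:
            options[equivalences[name]] = value
        else:
            unsupported.append(name)
    return options, unsupported


# "encoding" is a PySpark csv() parameter that is not in the list above,
# so it falls into the second case and would be reported by this EWI.
options, unsupported = split_parameters(
    {"sep": ",", "header": True, "encoding": "UTF-8"}, CSV_EQUIVALENCES
)
print(options)      # {'FIELD_DELIMITER': ',', 'PARSE_HEADER': True}
print(unsupported)  # ['encoding']
```

An empty `unsupported` list corresponds to the first case, where no EWI is added.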

List of equivalences:

  • Equivalences for CSV:

    Spark keys         Snowpark Equivalences
    ---------------    ----------------------------
    sep                FIELD_DELIMITER
    header             PARSE_HEADER
    lineSep            RECORD_DELIMITER
    pathGlobFilter     PATTERN
    quote              FIELD_OPTIONALLY_ENCLOSED_BY
    nullValue          NULL_IF
    dateFormat         DATE_FORMAT
    timestampFormat    TIMESTAMP_FORMAT
    inferSchema        INFER_SCHEMA
    delimiter          FIELD_DELIMITER

  • Equivalences for JSON:

    Spark keys         Snowpark Equivalences
    ---------------    ---------------------
    dateFormat         DATE_FORMAT
    timestampFormat    TIMESTAMP_FORMAT
    pathGlobFilter     PATTERN

  • Equivalences for PARQUET:

    Spark keys         Snowpark Equivalences
    ---------------    ---------------------
    pathGlobFilter     PATTERN
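
For quick lookups, the three lists above can be held in one nested dictionary. The sketch below simply transcribes them; the names `EQUIVALENCES` and `snowpark_option` are hypothetical and are not part of the tool or of Snowpark.

```python
# Equivalences transcribed from the lists above, keyed by reader method.
EQUIVALENCES = {
    "csv": {
        "sep": "FIELD_DELIMITER",
        "header": "PARSE_HEADER",
        "lineSep": "RECORD_DELIMITER",
        "pathGlobFilter": "PATTERN",
        "quote": "FIELD_OPTIONALLY_ENCLOSED_BY",
        "nullValue": "NULL_IF",
        "dateFormat": "DATE_FORMAT",
        "timestampFormat": "TIMESTAMP_FORMAT",
        "inferSchema": "INFER_SCHEMA",
        "delimiter": "FIELD_DELIMITER",
    },
    "json": {
        "dateFormat": "DATE_FORMAT",
        "timestampFormat": "TIMESTAMP_FORMAT",
        "pathGlobFilter": "PATTERN",
    },
    "parquet": {
        "pathGlobFilter": "PATTERN",
    },
}


def snowpark_option(method, spark_key):
    """Return the listed Snowpark option name, or None when there is none."""
    return EQUIVALENCES.get(method.lower(), {}).get(spark_key)


print(snowpark_option("csv", "delimiter"))           # FIELD_DELIMITER
print(snowpark_option("json", "sep"))                # None
print(snowpark_option("PARQUET", "pathGlobFilter"))  # PATTERN
```

Note that the same Spark key can be listed for one method and not for another: `sep` has an equivalent for CSV but none for JSON.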

Scenarios

Scenario 1

Input

For CSV, here are some examples:

Output

In the converted code, the parameters are added as individual options to the csv function.
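
As a rough illustration of that shape, the sketch below builds the text of a reader call with one .option() per parameter. It is not the tool's actual output: the helper `render_reader_call`, the variable name `spark`, and the path `data.csv` are assumptions made for the example, and both parameters used here have a listed CSV equivalent.

```python
# Hypothetical subset of the CSV equivalences listed on this page.
CSV_EQUIVALENCES = {
    "sep": "FIELD_DELIMITER",
    "header": "PARSE_HEADER",
    "nullValue": "NULL_IF",
}


def render_reader_call(method, path, options):
    """Build the source text of a reader call with one .option() per entry."""
    chain = "".join(
        f".option({key!r}, {value!r})" for key, value in options.items()
    )
    return f"spark.read{chain}.{method}({path!r})"


# Parameters as they might appear in a call such as
# spark.read.csv("data.csv", sep=";", header=True)
spark_params = {"sep": ";", "header": True}
options = {CSV_EQUIVALENCES[name]: value for name, value in spark_params.items()}

converted = render_reader_call("csv", "data.csv", options)
print(converted)
# spark.read.option('FIELD_DELIMITER', ';').option('PARSE_HEADER', True).csv('data.csv')
```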

Scenario 2

Input

For JSON, here are some examples:

Output

In the converted code, the parameters are added as individual options to the json function.
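
The sketch below illustrates the mixed case for JSON, where one parameter has a listed equivalent and another does not. It is a hedged approximation rather than the tool's real output: `convert_json_call`, the variable name `spark`, and the path `events.json` are hypothetical. `multiLine` is a PySpark json() parameter that does not appear in the JSON equivalences above.

```python
# JSON equivalences as listed on this page.
JSON_EQUIVALENCES = {
    "dateFormat": "DATE_FORMAT",
    "timestampFormat": "TIMESTAMP_FORMAT",
    "pathGlobFilter": "PATTERN",
}


def convert_json_call(path, **params):
    """Return (converted call text, names of parameters that were removed)."""
    kept = [
        (JSON_EQUIVALENCES[name], value)
        for name, value in params.items()
        if name in JSON_EQUIVALENCES
    ]
    removed = [name for name in params if name not in JSON_EQUIVALENCES]
    chain = "".join(f".option({key!r}, {value!r})" for key, value in kept)
    return f"spark.read{chain}.json({path!r})", removed


call, removed = convert_json_call(
    "events.json", dateFormat="yyyy-MM-dd", multiLine=True
)
print(call)     # spark.read.option('DATE_FORMAT', 'yyyy-MM-dd').json('events.json')
print(removed)  # ['multiLine']
```

The value is carried over unchanged in this sketch; whether a given format string has the same meaning in Snowpark should be reviewed separately.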

Scenario 3

Input

For PARQUET, here are some examples:

Output

In the converted code, the parameters are added as individual options to the parquet function.

Additional recommendations
