SPRKPY1082
pyspark.sql.readwriter.DataFrameReader.load
Message: The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
Category: Warning
This function is not supported. The workaround is to use the format-specific Snowpark DataFrameReader methods instead.
The Spark signature for this method, DataFrameReader.load(path, format, schema, **options), does not exist in Snowpark. Therefore, any usage of the load function produces an EWI in the output code.
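The workaround named in the message can be sketched as a small dispatcher that routes a load-style call to the matching format-specific reader method. This is only an illustration: the helper name load_from_stage is hypothetical, and the reader argument is assumed to be a Snowpark DataFrameReader (for example session.read).

```python
# Formats listed in the EWI message as having a format-specific reader method.
SUPPORTED_FORMATS = ("avro", "csv", "json", "orc", "parquet")


def load_from_stage(reader, stage_path, file_format):
    """Hypothetical stand-in for load(path, format).

    `reader` is assumed to be a Snowpark DataFrameReader and `stage_path`
    a stage location, as the message requires.
    """
    fmt = file_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"No format-specific reader method for {file_format!r}")
    # Equivalent to calling reader.csv(stage_path), reader.json(stage_path), etc.
    return getattr(reader, fmt)(stage_path)
```

A call such as load_from_stage(session.read, "@my_stage/file.csv", "csv") would then stand in for a load call with format="csv"; the stage name here is a placeholder.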
Input
Below is an example that tries to load data from a CSV source.
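A minimal sketch of such input code follows. The file path, column names, and option values are illustrative assumptions, and spark is assumed to be an existing SparkSession; the schema is given as a DDL string to keep the sketch free of imports.

```python
def load_csv_variants(spark, path_csv_file="/path/to/file.csv"):
    """Call DataFrameReader.load in the shapes that trigger SPRKPY1082.

    `spark` is assumed to be a SparkSession; the path and the schema
    are hypothetical placeholders.
    """
    schema_ddl = "Name STRING, Superhero STRING"
    return [
        # format passed positionally
        spark.read.load(path_csv_file, "csv"),
        # format passed by keyword
        spark.read.load(path_csv_file, format="csv"),
        # with an explicit schema
        spark.read.load(path_csv_file, format="csv", schema=schema_ddl),
        # with reader options passed as extra keyword arguments
        spark.read.load(
            path_csv_file,
            format="csv",
            schema=schema_ddl,
            lineSep="\n",
            dateFormat="yyyy/MM/dd",
        ),
    ]
```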
Output
The SMA adds the EWI SPRKPY1082 to let you know that this function is not supported by Snowpark, but it has a workaround.
Recommended fix
Fixing path and format parameters:
Replace the load method with the csv method.
Below is an example that creates a temporary stage and puts the file into it, then calls the csv method.
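One way this fix could look is sketched here. The stage name TEMP_LOAD_STAGE, the helper name, and the local path are assumptions made for illustration; session is assumed to be an active Snowpark Session.

```python
import os


def stage_and_read_csv(session, local_file, stage_name="TEMP_LOAD_STAGE"):
    """Upload a local CSV file to a temporary stage and read it from there.

    `session` is assumed to be a Snowpark Session; `stage_name` is a
    hypothetical placeholder.
    """
    # Create the temporary stage that will hold the file.
    session.sql(f"CREATE TEMPORARY STAGE IF NOT EXISTS {stage_name}").collect()
    # Upload without compression so the staged file keeps its original name.
    session.file.put(local_file, f"@{stage_name}", auto_compress=False)
    # The path given to the csv method is now a stage location.
    stage_file_path = f"@{stage_name}/{os.path.basename(local_file)}"
    return session.read.csv(stage_file_path)
```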
Fixing schema parameter:
Fixing options parameter:
Below is an example that creates a dictionary with RECORD_DELIMITER and DATE_FORMAT, and calls the options method with that dictionary.
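The schema and options fixes for CSV might be combined as in the following sketch. The option values are illustrative assumptions, the helper name is hypothetical, and schema is assumed to be a Snowpark StructType built by the caller.

```python
def read_csv_with_options(session, stage_file_path, schema=None):
    """Read a staged CSV file with Snowpark-style options.

    `session` is assumed to be a Snowpark Session. The option values
    below are placeholders, not values taken from the original page.
    """
    # Snowpark names for Spark's lineSep and dateFormat options.
    snowpark_options = {"RECORD_DELIMITER": "\n", "DATE_FORMAT": "YYYY/MM/DD"}
    reader = session.read
    if schema is not None:
        # The schema is set with the schema method instead of a load argument.
        reader = reader.schema(schema)
    return reader.options(snowpark_options).csv(stage_file_path)
```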
Input
Below is an example that tries to load data from a JSON source.
Output
The SMA adds the EWI SPRKPY1082 to let you know that this function is not supported by Snowpark, but it has a workaround.
Recommended fix
Fixing path and format parameters:
Replace the load method with the json method.
Below is an example that creates a temporary stage and puts the file into it, then calls the json method.
Fixing schema parameter:
Fixing options parameter:
Below is an example that creates a dictionary with DATE_FORMAT and TIMESTAMP_FORMAT, and calls the options method with that dictionary.
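For JSON, the options fix could be sketched as below. The format strings are illustrative assumptions and the helper name is hypothetical; the file is assumed to be in a stage already, and session is assumed to be a Snowpark Session.

```python
def read_json_with_formats(
    session,
    stage_file_path,
    date_format="YYYY/MM/DD",
    timestamp_format="YYYY-MM-DD HH24:MI:SS",
):
    """Read a staged JSON file with date and timestamp formats set.

    The default format strings are placeholders for whatever the
    Spark code passed as dateFormat and timestampFormat.
    """
    snowpark_options = {
        "DATE_FORMAT": date_format,
        "TIMESTAMP_FORMAT": timestamp_format,
    }
    return session.read.options(snowpark_options).json(stage_file_path)
```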
Input
Below is an example that tries to load data from a Parquet source.
Output
The SMA adds the EWI SPRKPY1082 to let you know that this function is not supported by Snowpark, but it has a workaround.
Recommended fix
Fixing path and format parameters:
Replace the load method with the parquet method.
Below is an example that creates a temporary stage and puts the file into it, then calls the parquet method.
Fixing schema parameter:
Fixing options parameter:
Below is an example that creates a dictionary with PATTERN, and calls the options method with that dictionary.
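A sketch of the PATTERN fix for Parquet follows. Note that Spark's pathGlobFilter takes a glob, while Snowflake's PATTERN takes a regular expression, so the value usually needs rewriting rather than copying. The default pattern and the helper name are assumptions made for illustration.

```python
def read_parquet_matching(session, stage_dir, pattern=r".*[.]parquet"):
    """Read only the staged files whose names match `pattern`.

    `session` is assumed to be a Snowpark Session. The default pattern
    is a hypothetical regex rewrite of the Spark glob "*.parquet".
    """
    snowpark_options = {"PATTERN": pattern}
    return session.read.options(snowpark_options).parquet(stage_dir)
```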
Take into account that the options in Spark and Snowpark are not the same, but they can be mapped:

| Spark Option | Possible Value | Snowpark Equivalent | Description |
| --- | --- | --- | --- |
| header | True or False | SKIP_HEADER = 1 / SKIP_HEADER = 0 | To use the first line of a file as names of columns. |
| delimiter | Any single/multi-character field separator | FIELD_DELIMITER | To specify single or multiple characters as a separator for each column/field. |
| sep | Any single-character field separator | FIELD_DELIMITER | To specify a single character as a separator for each column/field. |
| encoding | UTF-8, UTF-16, etc. | ENCODING | To decode the CSV files by the given encoding type. Default encoding is UTF-8. |
| lineSep | Any single-character line separator | RECORD_DELIMITER | To define the line separator that should be used for file parsing. |
| pathGlobFilter | File pattern | PATTERN | To define a pattern to read only files with filenames matching the pattern. |
| recursiveFileLookup | True or False | N/A | To recursively scan a directory to read files. Default value of this option is False. |
| quote | Single character to be quoted | FIELD_OPTIONALLY_ENCLOSED_BY | To quote fields/columns whose values can contain the delimiter/separator. When used with the quoteAll option, this character quotes all fields. Default value of this option is the double quote ("). |
| nullValue | String to replace null | NULL_IF | To replace null values with the string while reading and writing a DataFrame. |
| dateFormat | Valid date format | DATE_FORMAT | To define a string that indicates a date format. Default format is yyyy-MM-dd. |
| timestampFormat | Valid timestamp format | TIMESTAMP_FORMAT | To define a string that indicates a timestamp format. Default format is yyyy-MM-dd'T'HH:mm:ss. |
| escape | Any single character | ESCAPE | To set a single character as the escape character, overriding the default escape character (\). |
| inferSchema | True or False | INFER_SCHEMA | Automatically detects the file schema. |
| mergeSchema | True or False | N/A | Not needed in Snowflake, as this happens whenever infer_schema determines the Parquet file structure. |
For the modifiedBefore / modifiedAfter options, you can achieve the same result in Snowflake by using the metadata columns and then adding a filter like: df.filter(METADATA_FILE_LAST_MODIFIED > 'some_date').
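The option mapping can also be expressed as a lookup, which may help when many load calls need converting. This is a sketch only: the helper name is hypothetical, and option values are passed through unchanged, so values whose syntax differs between the two systems (date and timestamp format strings, or glob versus regex patterns) still need manual review.

```python
# Spark option name -> Snowpark option name, taken from the mapping table.
SPARK_TO_SNOWPARK = {
    "delimiter": "FIELD_DELIMITER",
    "sep": "FIELD_DELIMITER",
    "encoding": "ENCODING",
    "lineSep": "RECORD_DELIMITER",
    "pathGlobFilter": "PATTERN",
    "quote": "FIELD_OPTIONALLY_ENCLOSED_BY",
    "nullValue": "NULL_IF",
    "dateFormat": "DATE_FORMAT",
    "timestampFormat": "TIMESTAMP_FORMAT",
    "escape": "ESCAPE",
    "inferSchema": "INFER_SCHEMA",
}


def to_snowpark_options(spark_options):
    """Rename Spark reader options to their Snowpark equivalents.

    Returns (converted, skipped): `skipped` lists option names with no
    equivalent in the table (for example recursiveFileLookup, mergeSchema).
    """
    converted, skipped = {}, []
    for name, value in spark_options.items():
        if name == "header":
            # header=True maps to SKIP_HEADER = 1, header=False to SKIP_HEADER = 0.
            converted["SKIP_HEADER"] = 1 if str(value).lower() == "true" else 0
        elif name in SPARK_TO_SNOWPARK:
            converted[SPARK_TO_SNOWPARK[name]] = value
        else:
            skipped.append(name)
    return converted, skipped
```

The converted dictionary can then be passed to the options method of the reader, while the skipped names point at options that need a different approach.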
For all three formats (CSV, JSON, and Parquet), the workaround is to use the Snowpark DataFrameReader methods instead.
The first parameter, path, must be in a stage to be equivalent in Snowpark.
The schema can be set by using the schema function.
The options in Spark and Snowpark are not the same; the option mapping table lists all the equivalences. In the examples for each format:
CSV: lineSep and dateFormat are replaced with RECORD_DELIMITER and DATE_FORMAT.
JSON: dateFormat and timestampFormat are replaced with DATE_FORMAT and TIMESTAMP_FORMAT.
Parquet: pathGlobFilter is replaced with PATTERN.
For more support, you can email us or post an issue.