Message: The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
The Spark signature of this method, DataFrameReader.load(path, format, schema, **options), does not exist in Snowpark. Therefore, every use of the load function gets an EWI in the output code.
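Every load call is flagged, so one option is a small wrapper that forwards the original arguments to the format-specific reader method. The sketch below is hypothetical: load_via_format_method is not part of Snowpark or the SMA. It assumes three things:

- session is a Snowpark Session.
- path is already a stage location (a string starting with @).
- The options already use Snowpark names.

```python
from typing import Any, Mapping, Optional

# Format-specific DataFrameReader methods listed in the EWI message.
SUPPORTED_FORMATS = {"avro", "csv", "json", "orc", "parquet"}


def load_via_format_method(
    session: Any,
    path: str,
    file_format: str,
    schema: Optional[Any] = None,
    options: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Hypothetical replacement for Spark's DataFrameReader.load(path, format, schema, **options)."""
    fmt = file_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {file_format!r}")
    if not path.startswith("@"):
        # Snowpark reads files from stages, so local or HDFS-style paths are rejected.
        raise ValueError(f"Expected a stage location such as '@my_stage/file.csv', got {path!r}")

    reader = session.read
    if schema is not None:
        reader = reader.schema(schema)
    if options:
        reader = reader.options(dict(options))
    # Dispatch to reader.avro / .csv / .json / .orc / .parquet.
    return getattr(reader, fmt)(path)
```

For example, my_session.read.load(path_csv_file, "csv", schema=schemaParam).show() could become load_via_format_method(my_session, "@my_stage/path/to/file.csv", "csv", schema=schemaParam).show(). Here @my_stage is a placeholder stage name.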
Scenario 1
Input
Below is an example that tries to load data from a CSV source.
The SMA adds the EWI SPRKPY1082 to let you know that this function is not supported by Snowpark, but it has a workaround.
```python
path_csv_file = "/path/to/file.csv"

schemaParam = StructType([
        StructField("Name", StringType(), True),
        StructField("Superhero", StringType(), True)
    ])

#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_csv_file, "csv").show()
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_csv_file, "csv", schema=schemaParam).show()
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_csv_file, "csv", schema=schemaParam, lineSep="\r\n", dateFormat="YYYY/MM/DD").show()
```
Spark and Snowpark options are not the same. In this case, lineSep and dateFormat are replaced with RECORD_DELIMITER and DATE_FORMAT. The Additional recommendations section has a table with all the equivalences.
Below is an example that creates a dictionary with RECORD_DELIMITER and DATE_FORMAT, and calls the options method with that dictionary.
Scenario 2
Input
Below is an example that tries to load data from a JSON source.
The SMA adds the EWI SPRKPY1082 to let you know that this function is not supported by Snowpark, but it has a workaround.
```python
path_json_file = "/path/to/file.json"

schemaParam = StructType([
        StructField("Name", StringType(), True),
        StructField("Superhero", StringType(), True)
    ])

#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_json_file, "json").show()
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_json_file, "json", schema=schemaParam).show()
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_json_file, "json", schema=schemaParam, dateFormat="YYYY/MM/DD", timestampFormat="YYYY-MM-DD HH24:MI:SS.FF3").show()
```
Spark and Snowpark options are not the same. In this case, dateFormat and timestampFormat are replaced with DATE_FORMAT and TIMESTAMP_FORMAT. The Additional recommendations section has a table with all the equivalences.
Below is an example that creates a dictionary with DATE_FORMAT and TIMESTAMP_FORMAT, and calls the options method with that dictionary.
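The following is a minimal sketch of the JSON fix. It assumes my_session is a Snowpark Session and that the file sits on a stage. The stage name @my_stage and the function name read_json are hypothetical.

This sketch leaves out the schema argument. Check the Snowpark DataFrameReader documentation for how your Snowpark version handles a user schema for JSON files.

```python
# Hypothetical stage location for the JSON file.
path_json_file = "@my_stage/path/to/file.json"

# Snowpark names for Spark's dateFormat and timestampFormat options.
# The values use Snowflake format elements (YYYY, MM, DD, HH24, MI, SS, FF3).
optionsParam = {
    "DATE_FORMAT": "YYYY/MM/DD",
    "TIMESTAMP_FORMAT": "YYYY-MM-DD HH24:MI:SS.FF3",
}


def read_json(my_session):
    """my_session is assumed to be a snowflake.snowpark.Session."""
    # load(path, "json", dateFormat=..., timestampFormat=...) becomes
    # options(...).json(path)
    return my_session.read.options(optionsParam).json(path_json_file)
```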
Scenario 3
Input
Below is an example that tries to load data from a Parquet source.
The SMA adds the EWI SPRKPY1082 to let you know that this function is not supported by Snowpark, but it has a workaround.
```python
path_parquet_file = "/path/to/file.parquet"

schemaParam = StructType([
        StructField("Name", StringType(), True),
        StructField("Superhero", StringType(), True)
    ])

#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_parquet_file, "parquet").show()
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_parquet_file, "parquet", schema=schemaParam).show()
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_parquet_file, "parquet", schema=schemaParam, pathGlobFilter="*.parquet").show()
```
Spark and Snowpark options are not the same. In this case, pathGlobFilter is replaced with PATTERN. Note that pathGlobFilter takes a glob pattern, while PATTERN takes a regular expression. The Additional recommendations section has a table with all the equivalences.
Below is an example that creates a dictionary with PATTERN, and calls the options method with that dictionary.
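The following is a minimal sketch of the Parquet fix. It assumes my_session is a Snowpark Session. The stage directory and the function name read_parquet are hypothetical. The path points at a directory because a file-name filter only matters when several files are read. The glob *.parquet is rewritten as a regular expression, because Snowflake's PATTERN is a regex.

```python
# Hypothetical stage directory holding the Parquet files.
path_parquet_dir = "@my_stage/path/to/"

# Spark's pathGlobFilter="*.parquet" is a glob pattern. Snowflake's PATTERN is a
# regular expression, so the equivalent filter is ".*\.parquet".
optionsParam = {"PATTERN": r".*\.parquet"}


def read_parquet(my_session):
    """my_session is assumed to be a snowflake.snowpark.Session."""
    # load(path, "parquet", pathGlobFilter=...) becomes options(...).parquet(path)
    return my_session.read.options(optionsParam).parquet(path_parquet_dir)
```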
Additional recommendations
Spark and Snowpark options are not the same, but they can be mapped as follows:
| Spark option | Possible values | Snowpark equivalent | Description |
|---|---|---|---|
| header | True or False | SKIP_HEADER = 1 / SKIP_HEADER = 0 | Uses the first line of a file as the column names. |
| delimiter | Any single- or multi-character field separator | FIELD_DELIMITER | Specifies one or more characters as the separator for each column/field. |
| sep | Any single-character field separator | FIELD_DELIMITER | Specifies a single character as the separator for each column/field. |
| encoding | UTF-8, UTF-16, etc. | ENCODING | Decodes the CSV files with the given encoding type. The default encoding is UTF-8. |
| lineSep | Any single-character line separator | RECORD_DELIMITER | Defines the line separator used for file parsing. |
| pathGlobFilter | File pattern | PATTERN | Reads only the files whose names match the pattern. Spark expects a glob pattern, while PATTERN expects a regular expression. |
| recursiveFileLookup | True or False | N/A | Recursively scans a directory to read files. The default value is False. |
| quote | Single character used for quoting | FIELD_OPTIONALLY_ENCLOSED_BY | Quotes fields/columns whose values can contain the delimiter/separator. Used with the quoteAll option, it quotes all fields. The default value is a double quote ("). |
| nullValue | String that represents null | NULL_IF | Sets the string that represents a null value when reading and writing the DataFrame. |
| dateFormat | Valid date format | DATE_FORMAT | Defines the string that indicates a date format. The default format is yyyy-MM-dd. |
| timestampFormat | Valid timestamp format | TIMESTAMP_FORMAT | Defines the string that indicates a timestamp format. The default format is yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]. |
| escape | Any single character | ESCAPE | Sets a single character as the escape character, overriding the default escape character (backslash). |
| inferSchema | True or False | INFER_SCHEMA | Automatically detects the file schema. |
| mergeSchema | True or False | N/A | Not needed in Snowflake, because this happens whenever INFER_SCHEMA determines the Parquet file structure. |
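The table can also be applied mechanically when many calls need migrating. The following helper is hypothetical: translate_spark_options is not part of Snowpark or the SMA. It renames options that have a Snowpark equivalent and converts the header and inferSchema values. It copies all other values unchanged. For that reason it reports two kinds of options for manual review:

- Options it could not translate.
- Options whose values use a different syntax in Snowflake: date and timestamp patterns, glob versus regular expression, and NULL_IF, which takes a list of strings.

```python
from typing import Any, Dict, List, Tuple

# Spark option (lowercased) -> Snowpark option, following the table above.
SPARK_TO_SNOWPARK = {
    "delimiter": "FIELD_DELIMITER",
    "sep": "FIELD_DELIMITER",
    "encoding": "ENCODING",
    "linesep": "RECORD_DELIMITER",
    "pathglobfilter": "PATTERN",
    "quote": "FIELD_OPTIONALLY_ENCLOSED_BY",
    "nullvalue": "NULL_IF",
    "dateformat": "DATE_FORMAT",
    "timestampformat": "TIMESTAMP_FORMAT",
    "escape": "ESCAPE",
}

# Options renamed above whose values are copied as-is but use a different
# syntax in Snowflake, so the copied value must be checked by hand.
VALUE_SYNTAX_DIFFERS = {"dateformat", "timestampformat", "pathglobfilter", "nullvalue"}


def _as_bool(value: Any) -> bool:
    # Spark accepts both booleans and strings such as "true" / "False".
    return value if isinstance(value, bool) else str(value).strip().lower() == "true"


def translate_spark_options(spark_options: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (snowpark_options, spark_option_names_needing_review)."""
    translated: Dict[str, Any] = {}
    needs_review: List[str] = []
    for name, value in spark_options.items():
        key = name.lower()
        if key == "header":
            translated["SKIP_HEADER"] = 1 if _as_bool(value) else 0
        elif key == "inferschema":
            translated["INFER_SCHEMA"] = _as_bool(value)
        elif key in SPARK_TO_SNOWPARK:
            translated[SPARK_TO_SNOWPARK[key]] = value
            if key in VALUE_SYNTAX_DIFFERS:
                needs_review.append(name)
        else:
            # N/A options (recursiveFileLookup, mergeSchema) and unknown options.
            needs_review.append(name)
    return translated, needs_review
```

For example, translate_spark_options({"header": "true", "sep": "|"}) returns ({"SKIP_HEADER": 1, "FIELD_DELIMITER": "|"}, []).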
For the modifiedBefore / modifiedAfter options, you can achieve the same result in Snowflake by selecting the METADATA$FILE_LAST_MODIFIED metadata column and then adding a filter such as df.filter(col("METADATA$FILE_LAST_MODIFIED") > "some_date").