Message: The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
The Spark signature for this method, DataFrameReader.load(path, format, schema, **options), does not exist in Snowpark. Therefore, every use of the load function produces an EWI in the output code.
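The workaround named in the message can be sketched as a small dispatcher that picks the format-specific reader method from the Spark format string. This is a minimal sketch, not the SMA's own conversion: the helper names are hypothetical, `session` is assumed to be a snowflake.snowpark.Session, and `stage_path` is assumed to already point to a stage location.

```python
# Formats listed in the EWI message as having a format-specific reader method.
SUPPORTED_FORMATS = ("avro", "csv", "json", "orc", "parquet")


def reader_method_name(fmt: str) -> str:
    """Map a Spark format string to the matching reader method name."""
    name = fmt.strip().lower()
    if name not in SUPPORTED_FORMATS:
        raise ValueError(f"No format-specific reader method for {fmt!r}")
    return name


def load_from_stage(session, stage_path: str, fmt: str):
    """Replace read.load(path, fmt) with read.<fmt>(stage_path).

    Assumption: `session` is a snowflake.snowpark.Session (hypothetical
    parameter name) and `stage_path` is a stage location.
    """
    reader_method = getattr(session.read, reader_method_name(fmt))
    return reader_method(stage_path)
```

For example, `load_from_stage(my_session, "@my_stage/file.parquet", "parquet")` would call `my_session.read.parquet(...)`, where `my_session` and `@my_stage` are placeholder names. Schema and option handling are left out of this sketch.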
Scenario 1
The example below loads data from a CSV source with DataFrameReader.load. In the output code, the SMA marks each load call with the EWI SPRKPY1082 to let you know that this function is not supported by Snowpark, but that it has a workaround.
path_csv_file = "/path/to/file.csv"
schemaParam = StructType([
    StructField("Name", StringType(), True),
    StructField("Superhero", StringType(), True)
])

# my_session is an assumed name for the active session object.
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_csv_file, "csv").show()
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_csv_file, "csv", schema=schemaParam).show()
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_csv_file, "csv", schema=schemaParam, lineSep="\r\n", dateFormat="YYYY/MM/DD").show()
Spark and Snowpark do not use the same option names. In this case, lineSep and dateFormat are replaced with RECORD_DELIMITER and DATE_FORMAT; the Additional recommendations section has a table with all the equivalences.
As a workaround, create a dictionary with RECORD_DELIMITER and DATE_FORMAT, pass it to the options method, and read the file with the csv method. The path must be a stage location.
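A minimal sketch of that CSV workaround follows. The function name, the `session` parameter (assumed to be a snowflake.snowpark.Session), and the stage path in the docstring are illustrative assumptions; the option values are the ones from the Spark call in this scenario.

```python
# Snowpark option names that replace lineSep and dateFormat from the Spark call.
optionsParam = {"RECORD_DELIMITER": "\r\n", "DATE_FORMAT": "YYYY/MM/DD"}


def read_csv_from_stage(session, stage_file_path, schema):
    """Read a staged CSV file with the format-specific csv method.

    Assumptions: `session` is a snowflake.snowpark.Session and
    `stage_file_path` is a stage location such as "@my_stage/file.csv"
    (hypothetical stage name).
    """
    return session.read.schema(schema).options(optionsParam).csv(stage_file_path)
```

With a live session, `read_csv_from_stage(my_session, "@my_stage/file.csv", schemaParam).show()` would stand in for the third load call of this scenario.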
Scenario 2
The example below loads data from a JSON source. The SMA adds the EWI SPRKPY1082 to let you know that this function is not supported by Snowpark, but that it has a workaround.
path_json_file = "/path/to/file.json"
schemaParam = StructType([
    StructField("Name", StringType(), True),
    StructField("Superhero", StringType(), True)
])

# my_session is an assumed name for the active session object.
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_json_file, "json").show()
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_json_file, "json", schema=schemaParam).show()
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_json_file, "json", schema=schemaParam, dateFormat="YYYY/MM/DD", timestampFormat="YYYY-MM-DD HH24:MI:SS.FF3").show()
Spark and Snowpark do not use the same option names. In this case, dateFormat and timestampFormat are replaced with DATE_FORMAT and TIMESTAMP_FORMAT; the Additional recommendations section has a table with all the equivalences.
As a workaround, create a dictionary with DATE_FORMAT and TIMESTAMP_FORMAT, pass it to the options method, and read the file with the json method. The path must be a stage location.
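The JSON workaround can be sketched the same way. The function name and the `session` parameter (assumed to be a snowflake.snowpark.Session) are hypothetical, and the schema argument of the original call is deliberately not handled in this sketch.

```python
# Snowpark option names that replace dateFormat and timestampFormat.
optionsParam = {
    "DATE_FORMAT": "YYYY/MM/DD",
    "TIMESTAMP_FORMAT": "YYYY-MM-DD HH24:MI:SS.FF3",
}


def read_json_from_stage(session, stage_file_path):
    """Read a staged JSON file with the format-specific json method.

    Assumptions: `session` is a snowflake.snowpark.Session and
    `stage_file_path` is a stage location (for example a hypothetical
    "@my_stage/file.json").
    """
    return session.read.options(optionsParam).json(stage_file_path)
```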
Scenario 3
The example below loads data from a Parquet source. The SMA adds the EWI SPRKPY1082 to let you know that this function is not supported by Snowpark, but that it has a workaround.
path_parquet_file = "/path/to/file.parquet"
schemaParam = StructType([
    StructField("Name", StringType(), True),
    StructField("Superhero", StringType(), True)
])

# my_session is an assumed name for the active session object.
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_parquet_file, "parquet").show()
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_parquet_file, "parquet", schema=schemaParam).show()
#EWI: SPRKPY1082 => The pyspark.sql.readwriter.DataFrameReader.load function is not supported. A workaround is to use Snowpark DataFrameReader format specific method instead (avro csv, json, orc, parquet). The path parameter should be a stage location.
my_session.read.load(path_parquet_file, "parquet", schema=schemaParam, pathGlobFilter="*.parquet").show()
Spark and Snowpark do not use the same option names. In this case, pathGlobFilter is replaced with PATTERN; the Additional recommendations section has a table with all the equivalences.
As a workaround, create a dictionary with PATTERN, pass it to the options method, and read the files with the parquet method. The path must be a stage location.
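One point worth a sketch here: pathGlobFilter takes a glob, while Snowflake's PATTERN is a regular expression, so the value usually needs converting rather than copying. The helper below is a hypothetical, deliberately small conversion that handles only `*` and `?`; the function names and the `session` parameter (assumed to be a snowflake.snowpark.Session) are assumptions.

```python
import re


def glob_to_pattern(glob: str) -> str:
    """Convert a simple glob (only * and ?) into a regular expression."""
    escaped = re.escape(glob)  # escapes ".", "*", "?" and other specials
    return escaped.replace(r"\*", ".*").replace(r"\?", ".")


def read_parquet_from_stage(session, stage_location, glob="*.parquet"):
    """Read staged Parquet files whose names match the converted glob.

    Assumptions: `session` is a snowflake.snowpark.Session and
    `stage_location` is a stage location (for example a hypothetical
    "@my_stage/data/").
    """
    optionsParam = {"PATTERN": glob_to_pattern(glob)}
    return session.read.options(optionsParam).parquet(stage_location)
```

Here `glob_to_pattern("*.parquet")` gives `.*\.parquet`. Because Snowflake matches PATTERN against the file path within the stage, a leading `.*` is usually what you want; bracket expressions such as `[abc]` are not handled by this sketch.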
Additional recommendations
The options in Spark and Snowpark are not the same, but they can be mapped:
| Spark option | Possible value | Snowpark equivalent | Description |
| --- | --- | --- | --- |
| header | True or False | SKIP_HEADER = 1 / SKIP_HEADER = 0 | To use the first line of a file as the names of the columns. |
| delimiter | Any single/multi-character field separator | FIELD_DELIMITER | To specify one or more characters as the separator for each column/field. |
| sep | Any single-character field separator | FIELD_DELIMITER | To specify a single character as the separator for each column/field. |
| encoding | UTF-8, UTF-16, etc. | ENCODING | To decode the CSV files with the given encoding type. The default encoding is UTF-8. |
| lineSep | Any single-character line separator | RECORD_DELIMITER | To define the line separator used for file parsing. |
| pathGlobFilter | File pattern | PATTERN | To read only files whose names match the pattern. |
| recursiveFileLookup | True or False | N/A | To recursively scan a directory to read files. The default value is False. |
| quote | Single character to be quoted | FIELD_OPTIONALLY_ENCLOSED_BY | To quote fields/columns whose value can contain the delimiter/separator. When used with the quoteAll option, this character quotes all fields. The default value is the double quote ("). |
| nullValue | String to replace null | NULL_IF | To replace null values with the string while reading and writing a DataFrame. |
| dateFormat | Valid date format | DATE_FORMAT | To define a string that indicates a date format. The default format is yyyy-MM-dd. |
| timestampFormat | Valid timestamp format | TIMESTAMP_FORMAT | To define a string that indicates a timestamp format. The default format is yyyy-MM-dd'T'HH:mm:ss. |
| escape | Any single character | ESCAPE | To set a single character as the escape character, overriding the default escape character (\). |
| inferSchema | True or False | INFER_SCHEMA | Automatically detects the file schema. |
| mergeSchema | True or False | N/A | Not needed in Snowflake, because this happens whenever infer_schema determines the Parquet file structure. |
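The mapping can also be applied programmatically. The sketch below is a hypothetical helper, not part of the SMA; it covers only the four option pairs named in the scenarios above and returns anything else separately so it can be reviewed by hand.

```python
# Spark option name -> Snowpark option name, limited to the pairs
# named in the scenarios above; extend as needed.
SPARK_TO_SNOWPARK = {
    "lineSep": "RECORD_DELIMITER",
    "dateFormat": "DATE_FORMAT",
    "timestampFormat": "TIMESTAMP_FORMAT",
    "pathGlobFilter": "PATTERN",
}


def translate_options(spark_options: dict) -> tuple:
    """Split Spark options into (translated, unmapped) dictionaries.

    Values are copied unchanged; some may still need converting
    (for instance, a glob is not a regular expression).
    """
    translated, unmapped = {}, {}
    for name, value in spark_options.items():
        target = SPARK_TO_SNOWPARK.get(name)
        if target is None:
            unmapped[name] = value  # no known equivalent: review manually
        else:
            translated[target] = value
    return translated, unmapped
```

For example, `translate_options({"lineSep": "\r\n", "recursiveFileLookup": True})` puts RECORD_DELIMITER in the first dictionary and leaves recursiveFileLookup in the second. The first dictionary can then be passed to the options method of the reader.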
For the modifiedBefore and modifiedAfter options, you can achieve the same result in Snowflake by reading the metadata columns and then adding a filter such as df.filter(METADATA_FILE_LAST_MODIFIED > 'some_date').
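A rough sketch of that filter for the modifiedAfter case is shown below. It assumes a Snowpark version that provides DataFrameReader.with_metadata and the METADATA_FILE_LAST_MODIFIED column in snowflake.snowpark.column; the function name and parameters are hypothetical, and the imports are kept inside the function so the module loads even where Snowpark is not installed.

```python
from datetime import datetime


def read_files_modified_after(session, stage_location, cutoff: datetime):
    """Read staged Parquet files and keep rows from files modified after `cutoff`.

    Assumptions: `session` is a snowflake.snowpark.Session, `stage_location`
    is a stage location, and the installed Snowpark version supports
    with_metadata and the metadata column imported below.
    """
    # Lazy imports: only needed when the function is actually called.
    from snowflake.snowpark.column import METADATA_FILE_LAST_MODIFIED
    from snowflake.snowpark.functions import lit

    df = session.read.with_metadata(METADATA_FILE_LAST_MODIFIED).parquet(stage_location)
    return df.filter(METADATA_FILE_LAST_MODIFIED > lit(cutoff))
```

For modifiedBefore, the same sketch applies with the comparison reversed.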