Spark Reference Categories
Categories of references to the Spark API
SnowConvert for Spark divides Spark elements into several categories based on the kind of mapping that exists from Spark to Snowpark. Below is a summary of each category that SnowConvert outputs to describe the translation of each Spark reference, along with a description, an example, whether the tool can automatically convert the reference (Tool Supported), and whether an equivalent exists in Snowpark (Snowpark Supported).
The following sections detail what each status means with some examples.
Direct
Direct translation. The same function exists in PySpark and Snowpark with no change needed.
Snowpark Supported: TRUE
Tool Supported: TRUE
Spark Example:
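For illustration, `col` is assumed here as a typical direct mapping; it exists with the same name and usage in both libraries.

```python
from pyspark.sql.functions import col

# Assumes an existing SparkSession named `spark`.
df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])
df.select(col("name")).show()
```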
Snowpark Example:
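A possible Snowpark equivalent, unchanged except for the imports and session/DataFrame creation:

```python
from snowflake.snowpark.functions import col

# Assumes an existing Snowpark Session named `session`.
df = session.create_dataframe([(1, "Alice"), (2, "Bob")], schema=["id", "name"])
df.select(col("name")).show()
```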
Rename
The function from PySpark exists in Snowpark, but it needs to be renamed.
Snowpark Supported: TRUE
Tool Supported: TRUE
Spark Example:
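As an assumed illustration of a rename (not taken from the tool's mapping tables), PySpark's `size` returns the length of an array column:

```python
from pyspark.sql.functions import col, size

# Assumes an existing SparkSession named `spark`.
df = spark.createDataFrame([([1, 2, 3],)], ["items"])
df.select(size(col("items"))).show()
```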
Snowpark Example:
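In Snowpark the same operation is available under a different name, `array_size`:

```python
from snowflake.snowpark.functions import array_size, col

# Assumes an existing Snowpark Session named `session`.
df = session.create_dataframe([([1, 2, 3],)], schema=["items"])
df.select(array_size(col("items"))).show()
```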
Helper
Note: The Python extensions library has been deprecated as of Spark Conversion Core V2.40.0. No Spark elements from Python will be categorized as extensions from this version forward. Spark Scala will continue to support the helper classes in the Snowpark extensions library.
The function from Spark has a small difference in Snowpark that can be addressed by creating a function with an equivalent signature in an extensions file that resolves the difference. In other words, a "helper" function is created in an extensions library and is called in each file where it is needed.
You can find more information about the Snowpark extensions library in the extensions Git repository: https://github.com/Snowflake-Labs/snowpark-extensions.
Examples of this are "fixed" additional parameters, a changed parameter order, etc.
Snowpark Supported: TRUE
Tool Supported: TRUE
Spark Example:
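As an assumed illustration (the real helper names live in the snowpark-extensions repository linked above), consider PySpark's `instr(str, substr)`:

```python
from pyspark.sql.functions import col, instr

# Assumes an existing SparkSession named `spark`.
df = spark.createDataFrame([("hello world",)], ["s"])
df.select(instr(col("s"), "world")).show()
```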
Snowpark Example:
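A minimal sketch of what such a helper could look like: Snowflake's CHARINDEX takes its arguments in the opposite order, so a hypothetical `instr` helper keeps the original call sites unchanged.

```python
from snowflake.snowpark import Column
from snowflake.snowpark.functions import charindex, col, lit


def instr(source: Column, substr: str) -> Column:
    # Hypothetical helper: PySpark uses instr(str, substr), while Snowflake's
    # CHARINDEX expects (target, source), so the argument order is swapped here.
    return charindex(lit(substr), source)


# Assumes an existing Snowpark Session named `session`.
df = session.create_dataframe([("hello world",)], schema=["s"])
df.select(instr(col("s"), "world")).show()
```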
Transformation
The function is completely recreated as a functionally equivalent construct in Snowpark, but it does not resemble the original function. This can involve calling several functions or adding multiple lines of code.
Snowpark Supported: TRUE
Tool Supported: TRUE
Spark Example:
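Session creation is used here as an assumed illustration of a transformation: a single PySpark call chain becomes several lines of Snowpark code.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("my_app").getOrCreate()
```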
Snowpark Example:
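A possible Snowpark rewrite; `connection_parameters` and its placeholder values are assumptions that must be replaced with real connection details.

```python
from snowflake.snowpark import Session

# Placeholder connection details; replace with real values or load them from a config.
connection_parameters = {
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()
```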
WorkAround
This category is employed when the tool cannot convert the PySpark element, but there is a known manual workaround to complete the conversion (the workaround is published in the tool documentation).
Snowpark Supported: TRUE
Tool Supported: FALSE
Spark Example:
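As an assumed illustration, PySpark's `broadcast` join hint has no Snowpark counterpart, but the join itself can still be expressed:

```python
from pyspark.sql.functions import broadcast

# Assumes existing DataFrames `df_large` and `df_small` sharing an `id` column.
result = df_large.join(broadcast(df_small), "id")
```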
Snowpark Example:
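A possible manual workaround is to drop the hint and let Snowflake choose the join strategy:

```python
# Assumes existing Snowpark DataFrames `df_large` and `df_small` sharing an `id` column.
result = df_large.join(df_small, "id")
```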
NotSupported
This category is employed when the tool cannot convert the PySpark element because there's no applicable equivalent in Snowflake.
Snowpark Supported: FALSE
Tool Supported: FALSE
Spark Example:
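As an assumed illustration, the RDD API has no counterpart in Snowflake:

```python
# Assumes an existing SparkSession named `spark`.
rdd = spark.sparkContext.parallelize([1, 2, 3])
squares = rdd.map(lambda x: x * x).collect()
```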
Snowpark Example: N/A
NotDefined
This category is employed when the tool detects the usage of a PySpark element but cannot convert it because the element is not in the tool's conversion database.
Snowpark Supported: FALSE
Tool Supported: FALSE
Spark Example: N/A
Snowpark Example: N/A
The assessment output assigns one of these categories to each identified reference to the Spark API.