How the Conversion Works
What is converted? How?
The Snowpark Migration Accelerator (SMA) generates a comprehensive assessment, but it can also convert some elements of a source codebase into equivalents that are compatible with a target codebase. The SMA does this in exactly the same way that it builds the initial assessment, with just one added step.
Conversion in the SMA
In both assessment and conversion mode, the SMA:
Scans all files in a directory
Identifies the code files
Parses the code files (based on the language of the source code)
Builds an Abstract Syntax Tree (AST)
Populates a Symbol Table
Categorizes and Reports on Errors
Generates the output reports
All of these steps happen again when the SMA is run in conversion mode, even if it was already run in assessment mode. Conversion mode then adds one final step:
Pretty print the output code from the AST
The AST is a semantic model representing the functionality of the source codebase. Where that functionality exists in both the source and the target, the SMA can print output code that is the functional equivalent of what existed in the source. This last step is performed only during a conversion execution of the SMA.
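To make the parse → AST → pretty-print pipeline concrete, here is a toy sketch using Python's standard-library ast module. This is purely illustrative and is not how the SMA is actually implemented; the single hard-coded mapping (pyspark.sql's SparkSession to snowflake.snowpark's Session) is a hypothetical stand-in for the SMA's much larger symbol tables.

```python
import ast

# Toy illustration of the pipeline: parse source code into an AST,
# rewrite references from the source API to the target API, then
# pretty-print the transformed tree back into code.
SOURCE = "from pyspark.sql import SparkSession\n"

class ImportRewriter(ast.NodeTransformer):
    def visit_ImportFrom(self, node: ast.ImportFrom) -> ast.ImportFrom:
        # Map one known Spark import to its Snowpark counterpart.
        if node.module == "pyspark.sql":
            names = [
                ast.alias(name="Session", asname=None)
                if a.name == "SparkSession" else a
                for a in node.names
            ]
            return ast.ImportFrom(
                module="snowflake.snowpark", names=names, level=0
            )
        return node

tree = ast.parse(SOURCE)             # build the AST
tree = ImportRewriter().visit(tree)  # rewrite source references
ast.fix_missing_locations(tree)
print(ast.unparse(tree))             # pretty-print the output code
# → from snowflake.snowpark import Session
```

A real converter also consults a symbol table so that it rewrites only names that genuinely resolve to the source API, rather than matching on text alone.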
Types of Conversion in the SMA
The SMA currently performs only the following conversions:
References from the Spark API to the Snowpark API in Python or Scala code files
SQL Elements from Spark SQL or HiveQL to Snowflake SQL
Let's look at an example of the first one in both Scala and Python.
Examples of Conversion of References to the Spark API to the Snowpark API
Example of Spark Scala to Snowpark
When you select Scala as the source language, the SMA converts references to the Spark API in Scala code to references to the Snowpark API. Here's an example of the conversion of a simple Spark Application. This application reads, filters, joins, calculates an average, and shows results from a given dataset.
Apache Spark Scala Code
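The following is a representative sketch of such an application. It is illustrative only: the file names, stage paths, and column names are hypothetical, not taken from SMA documentation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

object AverageSalaryJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("AverageSalaryJob").getOrCreate()

    // Read two datasets (hypothetical file paths)
    val employees = spark.read.option("header", "true").csv("employees.csv")
    val departments = spark.read.option("header", "true").csv("departments.csv")

    // Filter, join, aggregate an average, and show the results
    val result = employees
      .filter(employees("salary") > 50000)
      .join(departments, employees("dept_id") === departments("id"))
      .groupBy(departments("name"))
      .agg(avg(employees("salary")).alias("avg_salary"))

    result.show()
  }
}
```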
The Converted Snowflake Code:
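An illustrative sketch of what the converted Snowpark Scala code could look like (the connection properties file and stage path are hypothetical placeholders):

```scala
import com.snowflake.snowpark.Session
import com.snowflake.snowpark.functions.avg

object AverageSalaryJob {
  def main(args: Array[String]): Unit = {
    // A Snowpark Session replaces the SparkSession; connection details
    // come from a properties file (hypothetical name)
    val session = Session.builder.configFile("connection.properties").create

    // Read from a Snowflake stage (hypothetical stage path)
    val employees = session.read.option("header", "true").csv("@mystage/employees.csv")
    val departments = session.read.option("header", "true").csv("@mystage/departments.csv")

    // The DataFrame operations are unchanged
    val result = employees
      .filter(employees("salary") > 50000)
      .join(departments, employees("dept_id") === departments("id"))
      .groupBy(departments("name"))
      .agg(avg(employees("salary")).alias("avg_salary"))

    result.show()
  }
}
```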
In this example, most of the structure of the Scala code is the same, but the references to the Spark API have been changed to references to the Snowpark API.
Example of PySpark to Snowpark
When you select Python as the source language, the SMA converts references to the Spark API in Python code to references to the Snowpark API. Here is a script that uses several PySpark functions:
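The following is an illustrative sketch of such a script (file paths and column names are hypothetical, not taken from SMA documentation):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col

spark = SparkSession.builder.appName("AverageSalaryJob").getOrCreate()

# Read two datasets (hypothetical file paths)
employees = spark.read.option("header", "true").csv("employees.csv")
departments = spark.read.option("header", "true").csv("departments.csv")

# Filter, join, aggregate an average, and show the results
result = (
    employees.filter(col("salary") > 50000)
    .join(departments, employees["dept_id"] == departments["id"])
    .groupBy(departments["name"])
    .agg(avg(col("salary")).alias("avg_salary"))
)
result.show()
```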
The Converted Snowflake Code:
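An illustrative sketch of what the converted Snowpark Python code could look like (the connection parameters and stage path are hypothetical placeholders):

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import avg, col

# A Snowpark Session replaces the SparkSession; connection_parameters
# is a hypothetical dict of Snowflake account credentials
session = Session.builder.configs(connection_parameters).create()

# Read from a Snowflake stage (hypothetical stage path)
employees = session.read.option("header", "true").csv("@mystage/employees.csv")
departments = session.read.option("header", "true").csv("@mystage/departments.csv")

# The DataFrame operations are unchanged
result = (
    employees.filter(col("salary") > 50000)
    .join(departments, employees["dept_id"] == departments["id"])
    .groupBy(departments["name"])
    .agg(avg(col("salary")).alias("avg_salary"))
)
result.show()
```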
In this example, most of the structure of the Python code is the same, but the references to the Spark API have been changed to references to the Snowpark API.
This is what you can expect from conversion with the SMA.