Walkthrough Setup
Everything you need for this walkthrough
This walkthrough takes a hands-on approach: it guides you through using the Snowpark Migration Accelerator (SMA) to assess real codebases and evaluate the results. The goal is for you to get direct experience running the SMA on actual code so that you understand the tool's capabilities.
To follow along, you will need:
access to a computer with the Snowpark Migration Accelerator installed
access to the suggested sample codebase(s) from the same machine
No other materials are necessary - just the SMA, the code samples, and your computer. Let's look at where to access the tool and the code samples.
The Snowpark Migration Accelerator (SMA) is a tool that helps speed up the migration of legacy PySpark and Spark Scala code to Snowflake. The SMA identifies references to the Spark API in any Python or Scala code, analyzes them, and converts them to the Snowpark API. Exploring the full conversion capabilities of the SMA is out of scope for this walkthrough; we will focus on using the SMA to analyze some sample Spark codebases and on understanding how the results can be applied in migration scenarios.
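To make the kind of change involved concrete, here is a minimal, hypothetical sketch of a Spark API reference and its Snowpark equivalent. The table name, filter expression, and connection parameters are illustrative and are not taken from the sample codebases:

```python
# Before (PySpark): a typical Spark API reference the SMA would detect.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.table("employees").filter("salary > 50000")

# After (Snowpark for Python): the equivalent Snowpark API call.
from snowflake.snowpark import Session

connection_parameters = {            # placeholder values; use your own
    "account": "<your_account>",
    "user": "<your_user>",
    "password": "<your_password>",
}
session = Session.builder.configs(connection_parameters).create()
df = session.table("employees").filter("salary > 50000")
```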
When building the initial assessment, the SMA parses the source codebase and builds a complete semantic model of the functionality present in the source. From this model it generates a variety of reports, including the detailed assessment report that will be reviewed in this walkthrough. This reporting contains migration readiness insights that can help you understand how “ready” a workload is for Snowpark, which will be shown in more detail later in this walkthrough.
To build an assessment in the SMA, you only need the application itself. While following this walkthrough end to end is suggested, it is not required in order to use the SMA, and no access code is necessary to build an assessment. Review the download section in this documentation to get the SMA. Once you've downloaded it, the next step is to install it on your local machine. Installation instructions are also available in this documentation.
This walkthrough will focus on Python code as an input. We recommend two sample codebases that are publicly available in third-party Git repositories; since they were not curated by us, they reflect code you might encounter in the wild. You can find the codebases below:
Spark Data Engineering Examples:
Spark ML Examples:
We will review the output of running the SMA on these two codebases throughout the rest of this walkthrough.
To run the SMA on these codebases, you will have to download them from their repositories. We recommend extracting the contents to separate folders on your local machine, as shown in this image:
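For reference, the resulting layout might look something like this (the folder names here are hypothetical):

```
sma-walkthrough/
├── spark-data-engineering-examples/
└── spark-ml-examples/
```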
These codebases will demonstrate how the SMA calculates a readiness score based on the references to the Spark API found in each codebase, and how to validate that readiness score. One codebase will receive a high score, indicating that it is already largely compatible with Snowpark. The other will get a low score, showing that more investigation is required to fully understand the workload's use case. Neither case guarantees that the workload can be migrated with ease; there will be additional considerations regardless of the score.
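As a rough illustration of the idea behind such a score, a readiness percentage is commonly computed as the share of detected Spark API references that have a known Snowpark equivalent. The sketch below uses made-up counts and is not the SMA's exact formula:

```python
# Simplified illustration of a readiness-style calculation.
# The counts are made up; the SMA derives its own numbers from
# the semantic model it builds of your codebase.
supported_references = 180   # Spark API calls with a Snowpark mapping
total_references = 200       # all detected Spark API calls

readiness = supported_references / total_references * 100
print(f"Readiness score: {readiness:.1f}%")  # -> Readiness score: 90.0%
```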
In the unzipped directories, there will be a variety of file types, but only supported source file types will be analyzed for references to the Spark API and other third-party APIs.
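If you want a quick look at which files in the extracted folders are likely candidates for analysis, a short script like the following can help. The extension list is an assumption for this Python-focused walkthrough, and the folder name is hypothetical; consult this documentation for the SMA's actual supported file types:

```python
# Hypothetical pre-check: list files a Python-focused analysis would cover.
# The extension set is an assumption, not the SMA's official list.
from pathlib import Path

ANALYZED_EXTENSIONS = {".py", ".ipynb"}

root = Path("spark-data-engineering-examples")  # hypothetical folder name
for path in sorted(root.rglob("*")):
    if path.suffix in ANALYZED_EXTENSIONS:
        print(path)
```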
To get help installing the application or getting access to the code, reach out to the support resources listed in this documentation.
If you've downloaded the codebases and have them unzipped into separate directories as shown above, you can move on to the next section, or review the steps above if anything is missing.