Walkthrough Setup
Everything you need for this walkthrough
This walkthrough takes a hands-on approach: it guides you through using the Snowpark Migration Accelerator (SMA) to assess real codebases and evaluate the results. The goal is for you to get direct experience running the SMA on actual code so that you understand the tool's capabilities.
To follow along, you will need:
access to a computer with the Snowpark Migration Accelerator installed
access to the suggested sample codebase(s) from the same machine
No other materials are necessary - just the SMA, the code samples, and your computer. Let's look at where to access the tool and the code samples.
The Snowpark Migration Accelerator (SMA) is a tool that helps speed up the migration of legacy PySpark and Spark Scala code to Snowflake. The SMA identifies references to the Spark API in any Python or Scala code, analyzes them, and converts them to the Snowpark API. Exploring the full conversion capabilities of the SMA is out of scope for this walkthrough; we will focus on using the SMA to analyze some sample Spark codebases and on understanding how the results can be applied in migration scenarios.
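To make the kind of change involved concrete, here is a minimal, hypothetical sketch of a Spark API reference and its Snowpark equivalent. The table name, filter expression, and connection parameters are illustrative and are not taken from the sample codebases:

```python
# Before (PySpark): a typical Spark API reference the SMA would detect.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.table("employees").filter("salary > 50000")

# After (Snowpark for Python): the equivalent Snowpark API call.
from snowflake.snowpark import Session

connection_parameters = {            # placeholder values; use your own
    "account": "<your_account>",
    "user": "<your_user>",
    "password": "<your_password>",
}
session = Session.builder.configs(connection_parameters).create()
df = session.table("employees").filter("salary > 50000")
```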
When building the initial assessment, the SMA parses the source codebase and builds a complete semantic model of the functionality present in the source. From this model it generates a variety of reports, including the detailed assessment report that will be reviewed in this walkthrough. This reporting contains migration readiness insights that can help you understand how “ready” a workload is for Snowpark, which will be shown in more detail later in this walkthrough.
To build an assessment in the SMA, you only need the application itself. While following this walkthrough end to end is suggested, it is not required in order to use the SMA, and no access code is necessary to build an assessment. Review the download section in this documentation to get the SMA. Once you've downloaded it, the next step is to install it on your local machine. Installation instructions are also available in this documentation.
This walkthrough will focus on Python code as an input. We recommend two sample codebases that are publicly available in third-party Git repositories; since they were not curated by us, they reflect code you might encounter in the wild. You can find the codebases below:
Spark Data Engineering Examples:
Spark ML Examples:
We will review the output of running the SMA on these two codebases throughout the rest of this walkthrough.
To run the SMA on these codebases, you will have to download them from their repositories. We recommend extracting the contents to separate folders on your local machine, as shown in this image:
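For reference, the resulting layout might look something like this (the folder names here are hypothetical):

```
sma-walkthrough/
├── spark-data-engineering-examples/
└── spark-ml-examples/
```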
These codebases will demonstrate how the SMA calculates a readiness score based on the references to the Spark API found in each codebase, and how to validate that readiness score. One codebase will receive a high score, indicating that it is already largely compatible with Snowpark. The other will get a low score, showing that more investigation is required to fully understand the workload's use case. Neither case guarantees that the workload can be migrated with ease; there will be additional considerations regardless of the score.
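As a rough illustration of the idea behind such a score, a readiness percentage is commonly computed as the share of detected Spark API references that have a known Snowpark equivalent. The sketch below uses made-up counts and is not the SMA's exact formula:

```python
# Simplified illustration of a readiness-style calculation.
# The counts are made up; the SMA derives its own numbers from
# the semantic model it builds of your codebase.
supported_references = 180   # Spark API calls with a Snowpark mapping
total_references = 200       # all detected Spark API calls

readiness = supported_references / total_references * 100
print(f"Readiness score: {readiness:.1f}%")  # -> Readiness score: 90.0%
```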
In the unzipped directories, there will be a variety of file types, but only supported source file types will be analyzed for references to the Spark API and other third-party APIs.
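If you want a quick look at which files in the extracted folders are likely candidates for analysis, a short script like the following can help. The extension list is an assumption for this Python-focused walkthrough, and the folder name is hypothetical; consult this documentation for the SMA's actual supported file types:

```python
# Hypothetical pre-check: list files a Python-focused analysis would cover.
# The extension set is an assumption, not the SMA's official list.
from pathlib import Path

ANALYZED_EXTENSIONS = {".py", ".ipynb"}

root = Path("spark-data-engineering-examples")  # hypothetical folder name
for path in sorted(root.rglob("*")):
    if path.suffix in ANALYZED_EXTENSIONS:
        print(path)
```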
To get help installing the application or getting access to the code, reach out to the support resources listed in this documentation.
If you've downloaded the codebases and have them unzipped into separate directories as shown above, you can move on to the next section, or review the steps above if anything is missing.