Lab Setup

Setup and configuration

Materials

This walkthrough takes a hands-on approach to using the Snowpark Migration Accelerator (SMA) to assess real codebases and evaluate the results. The goal is to give you direct experience running the SMA on real code so that you understand the tool's capabilities.

To follow along, you will need:

a computer with the Snowpark Migration Accelerator installed
a sample codebase to analyze

No other materials are necessary - just the SMA, the code samples, and your computer. Let's look at where to access the tool and the code samples.

SMA Application

The Snowpark Migration Accelerator (SMA) is a tool that helps speed up the migration of legacy PySpark and Spark Scala code to Snowflake. The SMA finds references to the Spark API in any Python or Scala code, analyzes them, and converts them to the Snowpark API. Exploring the full conversion capabilities of the SMA is out of scope for this walkthrough. We will focus on using the SMA to analyze some sample Spark codebases and on understanding how the analysis results can be applied in migration scenarios.

In assessment mode, the SMA parses the source codebase and builds a complete semantic model of the functionality present in the source. From this model, the SMA generates a variety of reports, including the detailed assessment report reviewed in this lab. These reports contain migration readiness insights that help you understand how “ready” a workload is for Snowpark. Which reports are generated depends on who is using the tool and on the resulting “readiness score”. This is shown in more detail later in this lab.

Download and Installation

To run the Snowpark Migration Accelerator in assessment mode, you only need the application itself. No access code or training is currently required. To download the SMA, review the Download and Access section of this documentation.

After downloading the installer, install the application on your local machine. Installation instructions are also available in this documentation.

Sample Codebase

For this lab, we recommend using two sample codebases in Python. So that you can see we have not curated these codebases, both come from third-party Git repositories (thanks to their maintainers for making them publicly available!). You can find the codebases below:

Good candidate for migration: https://github.com/spark-examples/pyspark-examples
Bad candidate for migration: https://github.com/apache/spark/tree/master/examples/src/main/python

To run the SMA on these codebases, download each one as a zip file. We recommend extracting the contents to separate folders on your local machine, one folder per codebase, as shown in this image:

These codebases will demonstrate how the SMA's readiness score is calculated from the references to the Spark API found in each codebase, and how to validate that readiness score. One codebase will receive a high score, indicating that it is a good candidate for migration to Snowpark. The other will receive a low score, indicating that more investigation is required to fully understand the workload's use case. Neither score guarantees that the workload can be migrated with ease; there will be additional considerations regardless of the score. Both the high-readiness and low-readiness examples will be evaluated to highlight this point.
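To get a rough feel for what "references to the Spark API" means before running the tool, you could count the pyspark imports in each extracted folder yourself. The sketch below is only an illustration: it does not reproduce the SMA's analysis or its readiness score, the function names are hypothetical, and it looks only at import statements using Python's standard ast module.

```python
import ast
from collections import Counter
from pathlib import Path


def spark_imports_in_source(source: str) -> Counter:
    """Count imported module names that belong to the pyspark package."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from X import ..." only; relative imports are ignored.
            names = [node.module]
        else:
            continue
        for name in names:
            if name == "pyspark" or name.startswith("pyspark."):
                counts[name] += 1
    return counts


def spark_imports_in_folder(folder: Path) -> Counter:
    """Aggregate pyspark import counts over every .py file under folder."""
    total = Counter()
    for path in folder.rglob("*.py"):
        try:
            total += spark_imports_in_source(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError, ValueError):
            # Skip files the standard parser cannot read (for example Python 2 code).
            continue
    return total
```

Calling spark_imports_in_folder on each extracted codebase returns a Counter keyed by module name (for example pyspark.sql). Treat the numbers only as a sanity check to compare against the SMA's reports, not as a substitute for them.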

We will review the output of running the SMA on these two codebases throughout this lab.
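If you prefer to script the extraction step described earlier in this section, a minimal sketch using Python's standard zipfile module might look like the following. The function name, the archive file names, and the folder locations are all assumptions for illustration; use whatever names your downloads actually have.

```python
import zipfile
from pathlib import Path


def extract_codebase(zip_path: Path, dest_root: Path) -> Path:
    """Extract one downloaded archive into its own folder under dest_root."""
    target = dest_root / zip_path.stem  # folder named after the archive file
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# Example usage (hypothetical file names and locations, adjust to your machine):
#
#   downloads = Path.home() / "Downloads"
#   lab_root = Path.home() / "sma_lab"
#   for name in ("pyspark-examples-master.zip", "spark-master.zip"):
#       print(extract_codebase(downloads / name, lab_root))
```

Each archive ends up in its own folder, which keeps the two codebases separate when you point the SMA at them. Note that GitHub's zip download for the Apache Spark repository contains the whole repository, so the Python examples sit under examples/src/main/python inside the extracted folder.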

Support

For help installing the application or accessing the code, contact sma-support@snowflake.com.
