Glossary

Every term you ever wanted to know

The Snowpark Migration Accelerator may use some language that is not a standard part of your lexicon. Here's a page where you can learn more.

Snowpark Migration Accelerator (SMA)

The software described in this documentation. It securely and automatically assesses and converts references to the Spark API written in Scala or Python into their equivalents in Snowflake's Snowpark.

The SMA has also been known as SnowConvert and SnowConvert for Spark. Note that SnowConvert is still available as a SQL conversion tool.

Readiness Score

The Readiness Score is the primary indicator of conversion readiness produced by the Snowpark Migration Accelerator. It is calculated as the number of references to the Spark API that can be converted to the Snowpark API divided by the total number of references to the Spark API. Both of these values are present in the Spark API Summary section of the assessment report (3413 / 3748 in this example). The higher the score, the better prepared the workload is to work with the Snowpark API.

Note that this score is based only on references to the Spark API; it does not take into account other factors, such as third-party libraries present in the codebase. As a result, the score can be misleading on its own and should be treated as a starting point, not a final verdict.
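The calculation above is a simple ratio. A minimal sketch, using the 3413 / 3748 figures from the report example mentioned above:

```python
def readiness_score(convertible_refs: int, total_refs: int) -> float:
    """Share of Spark API references that can be converted to Snowpark,
    expressed as a percentage."""
    if total_refs == 0:
        return 0.0
    return round(convertible_refs / total_refs * 100, 2)

# Figures from the Spark API Summary example:
print(readiness_score(3413, 3748))  # → 91.06
```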

Spark Reference Categories

The Snowpark Migration Accelerator divides Spark API elements into several categories based on the kind of mapping that exists from Spark to Snowpark. This page summarizes each category the tool uses to describe the translation of a Spark reference, along with a description, an example, whether the tool can convert the reference automatically (Tool Supported), and whether an equivalent exists in Snowpark.

SnowConvert Qualification Tool

The version of SnowConvert for Spark that runs in assessment mode. This software precisely and automatically identifies all usages of the Apache Spark API in Python code.

File Inventory

An inventory of all files present in the input directory of the tool. This covers every file, not only those containing code. The inventory provides a breakdown by file type that includes the source technology, lines of code, comment lines, and size of each source file.
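As a rough sketch of what such an inventory collects, a per-extension breakdown might look like the following. This is a minimal illustration, not the tool's actual implementation; the real inventory also counts code and comment lines per source technology.

```python
import os
from collections import defaultdict

def file_inventory(root: str) -> dict:
    """Group the files under `root` by extension, tracking how many files
    there are and their total size in bytes."""
    inventory = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1] or "(no extension)"
            path = os.path.join(dirpath, name)
            inventory[ext]["files"] += 1
            inventory[ext]["bytes"] += os.path.getsize(path)
    return dict(inventory)
```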

Keyword Counts

A count of all keywords present, broken out by technology. For example, if a .py file contains PySpark statements, the inventory tracks each PySpark keyword found in it. You get a count of how many occurrences of each keyword appear per file type.
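A simplified sketch of this kind of keyword count follows. The keyword list here is illustrative only; the actual tool uses its own per-technology keyword lists.

```python
import re
from collections import Counter

# Illustrative keywords; the real tool maintains its own lists per technology.
PYSPARK_KEYWORDS = ["SparkSession", "createDataFrame", "withColumn", "groupBy"]

def keyword_counts(source: str) -> Counter:
    """Count whole-word occurrences of each known keyword in a source string."""
    counts = Counter()
    for kw in PYSPARK_KEYWORDS:
        counts[kw] = len(re.findall(rf"\b{re.escape(kw)}\b", source))
    return counts

sample = "df = spark.createDataFrame(rows).withColumn('x', f).withColumn('y', g)"
print(keyword_counts(sample))
```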

Spark Reference Inventory

Finally, you will get an inventory of every reference to the Spark API present in Python code.
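One way to collect such references from Python source is to walk its abstract syntax tree. The sketch below is only an illustration, under the simplifying assumption that Spark references appear as `pyspark` imports; it is not the tool's actual analysis.

```python
import ast

def spark_import_references(source: str) -> list:
    """Return the pyspark modules and names imported by a Python source
    string — a minimal example of building a reference inventory from an AST."""
    refs = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            refs += [a.name for a in node.names if a.name.startswith("pyspark")]
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.startswith("pyspark"):
                refs += [f"{node.module}.{a.name}" for a in node.names]
    return refs

src = "from pyspark.sql import SparkSession\nimport pyspark.sql.functions"
print(spark_import_references(src))
# → ['pyspark.sql.SparkSession', 'pyspark.sql.functions']
```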

Readiness Score

These Spark API references form the basis for assessing the level of conversion that can be applied to a given codebase.

Conversion Score

This score is calculated by dividing the number of Spark API usages that were converted automatically by the total number of Spark API references found.
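Like the Readiness Score, this is a simple ratio. A minimal sketch; the figures below are hypothetical, not taken from a real report:

```python
def conversion_score(converted_refs: int, total_refs: int) -> float:
    """Percentage of Spark API references the tool converted automatically."""
    if total_refs == 0:
        return 0.0
    return round(converted_refs / total_refs * 100, 2)

# Hypothetical workload: 3200 of 3748 references converted automatically.
print(conversion_score(3200, 3748))  # → 85.38
```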

Conversion/Transformation Rule

Rules that allow SnowConvert to convert a portion of source code into the expected target code.

Parse

An initial step in which SnowConvert reads the source code and builds an internal data structure on which conversion rules are then applied.
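In Python terms, this is analogous to what the standard library's `ast` module does when it turns source text into a tree of nodes. The snippet below is purely an analogy, not SMA's internal parser:

```python
import ast

# Parsing turns source text into a tree of nodes that tools can traverse
# and apply rules to.
tree = ast.parse("df.select('a')")
print(ast.dump(tree, indent=2))
```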
