Readiness Scores

How ready is your codebase for Snowpark? There's more than one way to tell.

The Snowpark Migration Accelerator (SMA) generates a large amount of assessment information when it runs. To help interpret it, the SMA produces a set of Readiness Scores: simple metrics that show how "ready" a codebase is for Snowflake. Think of each score as a compatibility metric. The higher the Readiness Score, the more of the codebase is compatible with Snowflake after running the SMA alone.

The SMA currently outputs the following Readiness Scores:

Spark API Readiness Score
Snowpark Connect Readiness Score
Third-Party API Readiness Score
SQL Readiness Score

Note that the Readiness Scores are not effort estimates. A high readiness score does not mean that the remaining manual work is small. It means only that, by that measure of readiness, the majority of the codebase is already compatible with Snowflake. The part that is not compatible may still require significant effort. Use the full output of the assessment to determine how much effort is required to complete a migration. Feel free to reach out to us if you would like more support on building a migration plan and effort estimate.

Levels

Each score reports a red, yellow, or green status. Think of these statuses as a stoplight. For each color:

Red - Stop and solve. There is an issue that has a significant impact on the migration or the tool’s ability to report on this codebase. Stop at this score, read the action steps, and act immediately to resolve.
Yellow - Proceed with caution. There is an issue that might have a significant impact on the migration. Read the action steps, ensure you fully understand the impact of the yellow result, and move on to the next score when ready.
Green - Keep going. The tool did not detect a significant blocker for the migration of this codebase. This does not mean that the code is immediately ready to migrate. Read the action steps and proceed.

How to Interpret the Scores

Each score will give you a number, a red/yellow/green status (described above), and a suggestion on what to do next. It is strongly advised to:

Go through the scores in order - once you hit a red score, attempt to understand that problem immediately
Read all of the suggested next steps for each score - regardless of the result (even the green results), that’s where you’ll find action items.
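The "go through the scores in order" advice can be sketched in a few lines of Python. This is only an illustration: the ScoreResult type, the first_blocker helper, and the numbers and statuses in the sample list are all hypothetical and are not produced by the SMA.

```python
from typing import NamedTuple, Optional, Sequence


class ScoreResult(NamedTuple):
    """Hypothetical container for one readiness score (not an SMA type)."""
    name: str
    percent: float
    status: str  # "red", "yellow", or "green"


def first_blocker(results: Sequence[ScoreResult]) -> Optional[ScoreResult]:
    """Walk the scores in report order and return the first red one, if any."""
    for result in results:
        if result.status == "red":
            return result
    return None


# Invented values, listed in the order the scores appear on this page.
results = [
    ScoreResult("Spark API Readiness Score", 91.0, "green"),
    ScoreResult("Third-Party API Readiness Score", 20.0, "red"),
    ScoreResult("SQL Readiness Score", 75.0, "yellow"),
]

blocker = first_blocker(results)
if blocker is not None:
    print(f"Stop and resolve: {blocker.name} ({blocker.percent:.0f}%)")
```

Even when no red score is found, the suggested next steps for every score are still worth reading, as noted above.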

Let's take a closer look at each of the readiness scores that are available today.

Spark API Readiness Score

The Spark API Readiness Score is the primary indicator of readiness that the Snowpark Migration Accelerator produces. However, this readiness is based only on references to the Spark API, not on any other factors or third-party libraries that may be present in the codebase.

When the SMA runs, it identifies all references to the Spark API, such as import statements and function calls. These references are logged in the Spark API Usages Inventory available in the local output. Each reference, depending on how it is used in the codebase, is logged as either supported or not supported based on the Spark Reference Categories used by the tool. To calculate this readiness score, the tool takes all of the supported references and divides them by the total references found in the codebase:

Spark API Readiness Score = (supported Spark API references / total Spark API references) × 100

This result will be shown as a percentage. The higher the number, the more references to the Spark API are supported in Snowflake. You can find this score in both the output detailed report and the assessment summary in the application.
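As a rough illustration of that division, here is a minimal Python sketch. The readiness_score function is a hypothetical helper, not part of the SMA, and the counts in the example are invented. Returning None when no references are found is a choice made for this sketch; the source does not say how the SMA reports that case.

```python
from typing import Optional


def readiness_score(supported: int, total: int) -> Optional[float]:
    """Return supported / total as a percentage.

    Hypothetical helper: returns None when there are no references at all,
    which is an assumption of this sketch rather than documented SMA behavior.
    """
    if supported < 0 or total < 0 or supported > total:
        raise ValueError("expected 0 <= supported <= total")
    if total == 0:
        return None
    return 100.0 * supported / total


# Invented counts: 450 supported references out of 500 found.
print(readiness_score(450, 500))  # 90.0
```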

Note that this is the original Readiness Score that was produced by the SMA. In versions of the SMA where only a single Readiness Score is produced, that Readiness Score is actually the Spark API Readiness Score.

Spark API Readiness Levels

Depending on the result of the above calculation, the score can be categorized as green, yellow, or red. In the application and the output report, you will see the corresponding guidance issued based on the calculated result.

For the Spark API Readiness Score, one of the following levels will be produced:

Green - The majority of references to the Spark API are supported. This indicates a workload that is a good candidate for a migration. If the other indicators are green, it might be time to run a quick Proof of Concept.
Yellow - There are enough references that are not supported to indicate some significant effort to migrate this workload. The next step would be to inventory the items that are not supported, and determine the effort required to convert those.
Red - There are a significant number of not-supported references. This workload may not be the best candidate for migration. A good next step would be to inventory the elements that are not supported, and determine if there is a common thread. If there are elements that may require significant re-architecture, include that planning as part of the migration process. If you need help, feel free to reach out to [email protected].

Snowpark Connect Readiness Score

The Snowpark Connect Readiness Score measures the percentage of Spark API references in your codebase that are supported by Snowpark Connect. This score provides an assessment of your existing Spark API code's readiness for execution within the Snowpark Connect environment.

How It's Calculated

During its execution, the SMA scans your codebase to identify all references to the Spark API. Examples of such references include import statements, function calls, and class instantiations. All discovered references are then logged in the Spark API Usages Inventory. This inventory is generated as a file in your local output directory. For every reference listed in the inventory, the SMA populates the IsSnowparkConnectSupported column, setting it to True if the API usage is supported by Snowpark Connect, or False if it is not.

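To calculate the readiness score, the SMA takes all of the supported references and divides them by the total references found in the codebase:

Snowpark Connect Readiness Score = (Spark API references supported by Snowpark Connect / total Spark API references) × 100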

For example, if your codebase has 100 Spark API references and 90 of them are supported by Snowpark Connect, your Snowpark Connect Readiness Score would be 90%.

A higher percentage for the Snowpark Connect Readiness Score indicates a greater degree of compatibility with Snowpark Connect, suggesting that a larger portion of your Spark code aligns with functionalities supported by Snowpark Connect.
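The same calculation can be reproduced from the inventory itself. The sketch below assumes the Spark API Usages Inventory can be read as CSV with an IsSnowparkConnectSupported column holding True or False, as described above. The Element column, the hypothetical.api_N entries, and the snowpark_connect_score helper are made up for illustration, and the flags are invented.

```python
import csv
import io

# Hypothetical stand-in for the inventory file: 10 rows, 9 marked True.
# Only the IsSnowparkConnectSupported column name comes from the text above.
SAMPLE_INVENTORY = "Element,IsSnowparkConnectSupported\n" + "\n".join(
    f"hypothetical.api_{i},{i != 0}" for i in range(10)
)


def snowpark_connect_score(inventory_file):
    """Percentage of inventory rows flagged as supported by Snowpark Connect."""
    rows = list(csv.DictReader(inventory_file))
    if not rows:
        return None  # sketch-only choice for an empty inventory
    supported = sum(
        1
        for row in rows
        if row["IsSnowparkConnectSupported"].strip().lower() == "true"
    )
    return 100.0 * supported / len(rows)


score = snowpark_connect_score(io.StringIO(SAMPLE_INVENTORY))
print(f"Snowpark Connect Readiness Score: {score:.0f}%")  # 90%
```

To run this against a real inventory, pass an open file handle for the file in your local output directory instead of the in-memory sample.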

Readiness Levels

The compatibility analysis yields a readiness score, which is categorized into one of three distinct levels: Green, Yellow, or Red. Both the application's assessment summary and the generated detailed report will display this readiness level, accompanied by specific guidance tailored to the findings:

Green - This workload is highly compatible with Snowpark Connect because the majority of references to the Spark API are supported without any change needed. Files that are fully compatible can be run immediately, though some files will still require issue resolution.
Yellow - Some elements of the Spark API in this workload are not supported by or are incompatible with Snowpark Connect. This workload can be migrated, but there are elements that may require rearchitecture.
Red - There are a significant number of not-supported references. While migration is still possible, this workload may not be the best candidate for Snowpark Connect. Consider converting this workload to the Snowpark API by taking a look at the Snowpark API Readiness Score. If you need help, feel free to reach out to [email protected].

Third-Party API Readiness Score

The Third-Party API Readiness Score represents the percentage of imported libraries that are categorized as supported in Snowflake. First, let's define what "Third Party" means in this context:

Third Party Library - any package or library that is not produced, created, or managed by Snowflake (or Snowpark in Snowflake).

This readiness score represents how many third party libraries or packages are already supported in Snowflake. In Python, this could be supported by being included in Snowpark via the Anaconda collection of packages. In Scala or Java, it could be references to any package that is not already a part of Snowpark.

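To calculate this readiness score, the tool takes all of the supported import statements to a third party library or package and divides them by the total import statements to a third party library or package:

Third-Party API Readiness Score = (third-party library calls that are supported by Snowpark / total of identified third-party library calls) × 100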

Some notes about this readiness score:

Third-party library calls that are supported by Snowpark: This refers to all of the libraries that Snowpark supports (org.apache.spark is considered supported).
Total of identified third-party library calls: This refers to the total number of third-party library calls identified. It takes into account all Spark and non-Spark libraries, both supported and not supported.
The calculation takes into account only imports where the origin column (see Import Usages Inventory) is "ThirdPartyLib". Internal dependencies or import statements to other elements created/included in the codebase are not a part of this calculation.
This is not a count of DISTINCT or UNIQUE calls to a library. It is the total number. If there are 100 calls in a codebase, with 80 of those to one unsupported library and 20 to another, supported library, the score will be 20%, even though 50% of the unique libraries are supported. The goal is to show how much of the code references these imports, not how many unique references exist.
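The difference between counting every call and counting unique libraries can be checked with a short Python sketch of the 100-call example. The library names lib_a and lib_b, the my_project.utils import, and the "Internal" origin value are hypothetical. Only the "ThirdPartyLib" origin value comes from the notes above.

```python
# Hypothetical set of libraries treated as supported for this example.
SUPPORTED = {"lib_b"}

# (library, origin) pairs standing in for rows of the Import Usages Inventory.
imports = (
    [("lib_a", "ThirdPartyLib")] * 80        # unsupported library
    + [("lib_b", "ThirdPartyLib")] * 20      # supported library
    + [("my_project.utils", "Internal")] * 15  # ignored: not third party
)

# Only imports whose origin is "ThirdPartyLib" take part in the calculation.
third_party = [name for name, origin in imports if origin == "ThirdPartyLib"]

# Score as described above: every call counts, not just unique libraries.
total_score = 100.0 * sum(name in SUPPORTED for name in third_party) / len(third_party)

# For contrast only: the share of unique libraries that are supported.
unique_libs = set(third_party)
unique_share = 100.0 * len(unique_libs & SUPPORTED) / len(unique_libs)

print(f"Score over all calls: {total_score:.0f}%")       # 20%
print(f"Share of unique libraries: {unique_share:.0f}%")  # 50%
```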

Third-Party API Readiness Levels

For the Third-Party API Readiness Score, one of the following levels will be produced:

Green - All of the import calls are supported in Snowflake. There should be no or minimal additional effort required to get these libraries to work with this codebase in Snowflake.
Yellow - There is at least 1 package or library referenced in this codebase that is not already supported in Snowpark. There are quite a few ways to include third party packages that are not already supported. Identify which ones are not supported using the Import Usages Inventory generated by the SMA, and determine how they are implemented in the source codebase. Make a plan to re-architect or include those in Snowflake.
Red - There are a significant number of packages or libraries referenced in this codebase that are not already supported in Snowpark. This could still refer to a single library that is used over a large amount of this codebase or it could be a large number of libraries that are used across the codebase. Either way, a significant assessment should be done on these import statements to determine the impact of them on this codebase. Luckily, Snowflake is ready to support you in this process. Regardless of whether you need guidance along the way or need some packages supported in Snowflake, feel free to reach out to [email protected].

SQL Readiness Score

The SQL Readiness Score measures how many of the identified SQL Elements present in a codebase can be converted to Snowflake SQL. It is shown as the percentage of SQL elements that are ready for Snowflake after using the SMA, so a higher number is better.

To calculate this readiness score, the tool takes all of the supported SQL elements and divides them by the total SQL elements identified:

SQL Readiness Score = (supported SQL elements / total SQL elements identified) × 100

SQL Readiness Score Levels

For the SQL Readiness Score, one of the following levels will be produced:

Green - The majority of SQL present in this codebase is either natively supported in Snowflake or can be converted automatically by the SMA. No conversion is 100%, so there will still be SQL to convert, but this workload is nearly ready for Snowflake.
Yellow - There is enough SQL that is not supported to indicate that some effort will be needed to get this codebase ready for Snowflake. Take a look at the list of unsupported elements in the SQL Element Inventory and use the EWIs in the issues output to build a plan forward. It is possible that only minor adjustments are needed, or that rearchitecture may be necessary on parts of this workload.
Red - There is a significant amount of SQL that is not supported in this codebase. This may indicate that this codebase will require some rearchitecture to ensure that it works with Snowflake. A good next step would be to take a look at the list of unsupported elements in the SQL Element Inventory and use the EWIs in the issues output to build a plan forward. If you need help, feel free to reach out to [email protected].

As with all readiness scores, only looking at a single one to determine the overall readiness of this workload can be misleading. There are other factors that you will want to take into account when migrating. The readiness scores should be used as a starting point. If there is a measure of readiness that you think is not well reported by the tool, let us know! The SMA team is constantly improving the readiness metrics available.

