Running the SMA Again

Execution of the second workload in the walkthrough

But what if the codebase is not a good candidate? Now that you’ve run the tool on a good workload, let’s run it on one that does not fare as well.

Running on the Second Codebase

To rerun the tool, you have a couple of options:

  • Close the Snowpark Migration Accelerator (SMA) and reopen it. You can open the existing project you created earlier or start a new one.

  • Select “RETRY ASSESSMENT” at the bottom of the application, as shown here.

    Retry Assessment

For this lab, choose the first option. Close the SMA, reopen it, and go back through the same steps as before (from Running the Tool earlier in this walkthrough). This time, choose the directory containing the “needs_more_analysis…” codebase as the input folder.

Proceed through the same steps until you reach the “Analysis Complete” screen. This time, you should see a different message in the results panel:

This workload needs further analysis because the Readiness Score is less than 60%. Does this mean the workload is not a good candidate? Not necessarily. As in the previous example, we should consider a few other details.

Considerations

Recall that when we had a good result, we looked at the Readiness Score, size, and third-party imports. Here are a few things to consider when we get a “needs more analysis” result.

Code That Could Not Be Analyzed

If the tool encountered many parsing errors (meaning it could not recognize the input code), the Readiness Score will be significantly affected. This could indicate that many coding patterns have never been seen before, but that is unlikely. More likely, there is a problem with the exported code, or the code does not run in the source platform.

The tool will indicate this in several ways. The quickest is to look at the margin of error in the summary on the report's first page.

Margin of Error

If this number is high (above 5%), first verify that the source code runs in the source platform. Then reach out to the Snowpark Migration Accelerator team to find out what could be causing those parsing errors.

You can also find this information in the Snowpark Migration Accelerator Issue Summary at the end of the report. A high count of error code SPRKPY1001 (here, “high” means more than 5% of the total file count) tells you that there is code the tool cannot parse. Verify that the code runs in the source platform, and contact the Snowpark Migration Accelerator team if you believe the code is valid.

Unsupported Spark Libraries

If the score is low, the codebase contains functions that are not yet supported in Snowpark. Some of these fall into specific categories that call for more analysis, particularly Spark ML, MLlib, and streaming. A high number of references in these categories tells you a lot about the use case: the workload relies on ML libraries and streaming, which are not yet well supported in Snowpark.

Size

You could have a poor score but a low count of references. If you are not paying attention to the file count and the amount of code in those files, this can give a false impression. For example, you could receive a very low score of 20% even though the codebase contains only five references and 100 lines of code. That is a tiny use case and can be migrated manually in short order.

Extending that example, you could have a large workload (say, more than 100,000 lines of code) but only a small number of Spark references. In this case, you may have a lot of code that does not need to be converted, or the workload may rely on custom libraries created by the customer. This would require further analysis.

In this example, there is nothing unusual about the size. We have 150 files (most of which reference the Spark API) and fewer than 1,000 lines of code.

Summary

For this example, we have a low Readiness Score, few difficult third-party libraries, and no size discrepancies or parsing errors. The low score is caused by a high number of references to Spark’s ml, mllib, and streaming libraries. In this scenario, you would want to reach out to sma-support@snowflake.com or post in the Snowflake Community forums on Spark migration to better understand the complexities present in your workload.
