Understanding the Assessment Summary

Understanding the assessment and what comes next

When an assessment is run, an initial summary report is shown in the application. Let's review it now. This is called the assessment summary report. View the report by clicking on View Results after running the assessment:

This will take you to the assessment report. Note, however, that all the information shown here comes from the inventory files generated in the Output Reports folder when the SMA is run; this page is simply a summary of that information. For a more in-depth summary, open the Detailed Report document in the output directory.

Each section of the Assessment Results viewable in the application is detailed below.

Standard Assessment Summary

When you enter the summary, it will look something like this:

Note that there are two options at the top: Assessment Report and Code Compare. This page will focus on the report. You can learn more about the code compare feature by visiting that section in this documentation.

At the top of the report, you will find a drop-down menu with the date of your execution displayed. If you have run the accelerator multiple times while the same project is open, you will see multiple options in the menu, one for each execution from the open project. You will also find the execution ID and source language for that particular execution.

On to the first section, the Spark API Summary.

Spark API Summary

This includes several items, but the key metric is the Readiness Score.

Let's take a look at each section:

  1. Readiness Score - Before we get too far, let's define what this Readiness Score is and how it is calculated. This is the Spark API Readiness Score, the primary indicator of readiness that the Snowpark Migration Accelerator produces. It is simply a measure of how many references to the Spark API can be converted to the Snowpark API divided by the total number of references to the Spark API. Both of these values are present in the Spark API Summary section (3413 / 3748 in this example; a worked example of the calculation follows this list). The higher this value, the better prepared the workload is to work with the Snowpark API. However, this score is based only on references to the Spark API, not on any other factors or third-party libraries that may be present in the codebase, so it can be misleading on its own. Use the Readiness Score as a starting point, and take other factors, such as the presence of third-party libraries, into account. Note that the Readiness Score can also be found on the first page of the detailed report generated by the tool.

  2. Identified Usages - This is the count of references to the Spark API that were found in a workload. A reference is any function, element, or import statement that refers to the Spark API.

  3. Usages Ready for Conversion - This is the count of references to the Spark API that can be converted to the Snowpark API.

  4. Explanation of the Readiness Score - An explanation of how to interpret the readiness score is included in every assessment summary screen.
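To make the arithmetic concrete, here is a minimal sketch of the Readiness Score calculation using the two values from the Spark API Summary above. The variable names are illustrative only and are not part of the SMA itself.

```python
# Readiness Score = usages ready for conversion / identified usages.
# The figures below are the ones shown in the example summary above.
usages_ready_for_conversion = 3413   # Spark API references convertible to Snowpark
identified_usages = 3748             # all Spark API references found

readiness_score = usages_ready_for_conversion / identified_usages * 100
print(f"Readiness Score: {readiness_score:.2f}%")  # -> Readiness Score: 91.06%
```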

Below the Spark API Summary is the Spark API Usages section.

Spark API Usages

In this section, there are three tabs: Overall Usage Classification, Spark API Usage Categorization, and Spark API Usages By Status. Each one will be reviewed below.

Overall Usage Classification

This tab has a table with three rows: supported usages, unsupported usages, and total usages.

More detail is given here:

  1. Usages Count - The total count of references to the Spark API that were found. The first two rows split this count into supported and unsupported usages, and the total is given in the bottom row.

  2. Files with at least 1 usage - The count of files that have at least 1 reference to the Spark API. If this count is less than the total number of files, then some files contain no references to the Spark API.

  3. Percentage of All Files - The percentage of all files that have at least 1 usage. It is calculated as the count of code files with at least 1 usage (the column to the left of this one) divided by the count of all code files, as illustrated in the sketch below.
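As a hypothetical illustration of that calculation (the counts below are invented, not taken from any real report):

```python
# "Percentage of All Files" = files with at least 1 usage / all code files.
files_with_usages = 98    # code files containing at least 1 Spark API reference
total_code_files = 103    # all code files in the scanned codebase

percentage_of_all_files = files_with_usages / total_code_files * 100
print(f"{percentage_of_all_files:.2f}% of code files contain Spark API usages")  # -> 95.15%
```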

Spark API Usage Categorization

This tab shows which categories of Spark references were found in the scanned codebase. The Readiness Score (the same score presented at the top of this page) is given again, but this time broken down by category.

You can see the categorizations that are available in the section on Spark Reference Categories.

Spark API Usages By Status

The last tab will show another categorical breakdown, but this time by mapping status.

There are 7 primary mapping statuses. They are also detailed in the section on Spark Reference Categories.

Import Calls

The section on import calls shows the most commonly imported third-party libraries in the codebase. This section should NOT include calls to the Spark API, as those are counted in the Spark API sections above.

In this table, you can find:

  • 5 rows with

    • the top 3 most commonly used import libraries

    • an "Other" row the will sum all the remaining packages

    • a "Total" row that sums all the imported libraries

  • Supported in Snowpark column that says whether a given third-party library appears in Snowflake's list of supported packages in Snowpark.

  • Count of instances of an import of that library. Note that this is not a count of files that library was imported in, but a count of all import statements to that library.

  • Percentage of all files with an import call. This percentage is based on files, not on the import count, which can be confusing. For example, in the above table, sys is imported 29 times in 28.16% of files. This makes sense given that it's likely only imported once per file. In the "Other" category, however, there are 56 imports that are not listed in this table, and those imports are present in 100% of the files. To see exactly which imports are recorded in each file, you can use the ImportUsagesInventory.csv file in the Output Reports, as sketched below.
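Here is a minimal sketch of inspecting that inventory yourself. ImportUsagesInventory.csv is named above, but the column names used here ("PackageName" and "FileId") are assumptions and may differ in your version of the tool, so check the CSV header first.

```python
import csv
from collections import Counter

import_counts = Counter()   # import statements per library
library_files = {}          # distinct files importing each library

with open("ImportUsagesInventory.csv", newline="") as f:
    for row in csv.DictReader(f):
        library = row["PackageName"]   # assumed column name
        import_counts[library] += 1
        library_files.setdefault(library, set()).add(row["FileId"])  # assumed column name

for library, count in import_counts.most_common(3):
    print(f"{library}: {count} imports across {len(library_files[library])} files")
```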

File Summary

The file summary has several tables in it. These tables show file-level metrics based on file type and size. This helps to better understand the volume of code present in the codebase and can give a rough estimate of the volume of work to be done.

Note that the Snowpark Migration Accelerator scans all files present in your source codebase, not just code files. Most of this information comes from the files.csv file output by the SMA.

Note that the File Summary has several sections. Let's walk through them all.

File Type Summary

The File Type Summary shows what file extensions were present in the scanned codebase.

The shown extensions are related to the type of code files that can be scanned by the SMA. For each extension, the following is shown:

  • Lines of Code - count of all lines of code in all files with that extension. Note that these are "code lines" and do not include "comment" or "blank" lines.

  • File Count - count of all files with that extension.

  • Percentage of Total Files - This shows the percent of the total represented by that particular extension.

One of the simplest use cases for this table is to determine whether a workload has a lot of script files or notebook files, or whether it is dominated by SQL files as opposed to other code files.
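If you want to sanity-check the line counts, a rough approximation of "code lines" (skipping blank lines and full-line comments) can be computed for Python files as sketched below. The SMA's exact counting rules are not documented here, and the folder path is illustrative, so expect small differences.

```python
from pathlib import Path

code_lines = 0
file_count = 0

for path in Path("my_codebase").rglob("*.py"):   # illustrative path and extension
    file_count += 1
    for line in path.read_text(errors="ignore").splitlines():
        stripped = line.strip()
        # Count only "code lines": skip blanks and full-line '#' comments.
        if stripped and not stripped.startswith("#"):
            code_lines += 1

print(f"{code_lines} code lines across {file_count} .py files")
```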

Notebook Sizing by Language

If you have any notebooks in your codebase, the tool will give a "t-shirt" sizing for them based on the lines of code present in the notebook.

These sizings are divided by language based on the presumed primary language of each notebook.

Notebook Stats By Language

This table shows a quick count of lines of code and cells in all notebooks based on the language.

Note that these are also divided by language based on the presumed primary language of each notebook.

Code File Content

Note that if you are running the SMA with Python as your source, this tab will say "Python File Content", and if you are running the SMA with Scala as your source, this tab will say "Scala File Content".

This section refers to the count of files that have references to the Spark API. The first row will be "Spark Usages", and it will indicate the count of files that have references to the Spark API and the percentage that this set of files represents out of all code files present in the scanned codebase.

This is a useful indicator of the percentage of files that may not reference the Spark API. If the percentage shown is low, a large share of the code files contain no Spark references, which may indicate that there is less to migrate than originally thought.
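One way to see exactly which files those are is to compare the file inventory against the Spark usages inventory in the Output Reports folder. In this sketch, the inventory name "SparkUsagesInventory.csv" and the column names ("Path", "FileId") are assumptions; verify them against your own output before relying on this.

```python
import csv

def file_ids(csv_path, column):
    # Collect the distinct file identifiers from one inventory CSV.
    with open(csv_path, newline="") as f:
        return {row[column] for row in csv.DictReader(f)}

all_files = file_ids("files.csv", "Path")                     # files.csv is mentioned above
spark_files = file_ids("SparkUsagesInventory.csv", "FileId")  # assumed inventory name

without_spark = all_files - spark_files
print(f"{len(without_spark)} of {len(all_files)} files have no Spark API references")
```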

Code File Sizing

Note that if you are running the SMA with Python as your source, this tab will say "Python File Sizing", and if you are running the SMA with Scala as your source, this tab will say "Scala File Sizing".

The sizing refers to the "t-shirt" sizing for code files in this codebase. The description for each size is given in the "Size" column. There is also a percentage breakdown representing each sizing category's percentage of all Python files.

This information is helpful to understand the size distribution of the files in this codebase. A larger percentage of small files could indicate simpler, less complex workloads.

Issues Summary

The issues summary is essential for understanding the conversion issues if you move from the assessment to the conversion phase. The issue output shows the EWIs (Errors, Warnings, and Issues) from the scanned codebase. These issues are covered in detail in the Issue Analysis section of the documentation.

In the top part of the issues summary, there is a table that totals the different issues that are present.

There are two rows in this table:

  • Number of issues is the total count of issue codes present in each category.

  • Number of unique issues is the count of unique issue codes present in each category.

The issues are broken up into three categories:

  • Warnings are issues that do not necessarily need any resolution, but should be kept in mind as you move into testing the output code. A warning usually flags something that may behave slightly differently in some edge cases between the source and target platforms, or lets you know that something has changed and looks different from what you would have seen in the source platform.

  • Conversion issues are elements that did not convert or that require additional input to run in the target platform.

  • Parsing issues are code elements that the tool was not able to interpret (it could not parse the code). These issues are critical and should be resolved immediately. In most cases, they represent code that does not compile in the source platform or that is otherwise incorrect because of the way it was extracted. In some cases, this could be a grammar pattern that the SMA does not yet recognize. If you think the source code is syntactically correct but are still receiving a parsing error, you should report an issue in the application. Be sure to include the section of source code that is not being parsed.

The counts of each of these are summarized in the table.

Below this table is a list of each unique issue code.

This gives you the Issue Code, a description of the code, the count of how many times that code occurred, and the severity level (based on the Warning, Conversion Error, or Parsing Error categories shown above). Each issue code is hyperlinked to a page in this documentation that describes the code, provides an example, and gives a suggested resolution. (For example, clicking on the top code shown in the image above will take you to the page on issue code SPRKPY1002.)

Only the top 5 issues are shown by default, but you can expand the list to include all issues by choosing the SHOW ALL ISSUES button below the table. You can also search for a specific issue by using the search bar above the table.

These issues are most helpful in conversion, but even in assessment mode it is critical to understand the conversion work that remains to be done in any execution. As with all of these reports, you can find the full list of every issue, along with its location, in the issue inventory created in the Reports folder. A sketch of how to summarize that inventory follows.
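As a starting point, here is a sketch of reproducing the issues summary from that inventory. The file name "Issues.csv" and the column names ("Code", "Severity") are assumptions; check the actual inventory in your Reports folder.

```python
import csv
from collections import Counter

code_counts = Counter()
severity_counts = Counter()

with open("Issues.csv", newline="") as f:           # assumed inventory name
    for row in csv.DictReader(f):
        code_counts[row["Code"]] += 1               # one row per issue occurrence
        severity_counts[row["Severity"]] += 1

print("Number of issues:", sum(code_counts.values()))
print("Number of unique issues:", len(code_counts))
for severity, count in severity_counts.most_common():
    print(f"{severity}: {count}")
```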

Execution Summary

The execution summary has details on the execution that was just completed. It shows the user and folder information that was entered on the Project Creation screen, as well as the version numbers for the SMA and the Snowpark API.

Appendixes

The appendixes have additional reference information that might be relevant to understanding the output of the SMA tool.

This information may change over time, but it will always be reference information generic to using the SMA as opposed to information specific to the scanned codebase.


This is what the majority of users will see when they run the SMA. If you are using an older version, it's possible you would see the Abbreviated Assessment Summary, shown below.

Abbreviated Assessment Summary [Deprecated]

If you have a low readiness score, your summary could look like this:

In this summary, you will find the following information:

  • Execution Date: This will show you the date and time when your execution took place. You can also choose from any execution that was run with this project information.

  • Result: This will indicate whether your workload is a good candidate for migration or needs further analysis based on the readiness score. Note that the readiness score does not tell the user that a workload is ready for Snowpark. It is merely the first indicator of whether a workload is a good candidate for migration.

  • Input Folder: The input directory that was scanned.

  • Output Folder: The output directory where the accelerator placed the output reports (and for conversion, code files).

  • Total Files: The total number of files that were scanned.

  • Execution Time: The time it took for the tool to run on the source codebase.

  • Identified Spark References: The count of all references to the Spark API present in the source codebase.

  • Count of Python (or Scala) Files: The total number of code files that were identified as the source technology (language).


Next Steps

  • Retry Assessment - After you execute an assessment, on the Assessment Results page you can select the Retry Assessment button to run the assessment again. After examining the output, it is not uncommon to make some changes in the source code and run the assessment again.

  • View Log Folder: This will take you to the log folder. This folder is created when you run the assessment, and a complete set of logs (several text files describing the status of the execution) is deposited here. You can view the logs in any text editor. The logs are most valuable when you need to troubleshoot a failed execution. If there was a failure, the tool may ask you to send the logs; the logs in this folder are what will be sent.

  • View Reports: This will take you to the reports folder. Like the logs folder, this one is created when you run the assessment. The reports folder contains the output of the assessment, including the detailed report, the Spark reference inventory, and other inventories built from the source codebase. Each report is detailed in this documentation.

  • Continue to Conversion: Conversion may seem like the logical next step, but before you move on to conversion, make sure you've reviewed the assessment output. To run a conversion, you will need an access code. You can learn more about this in the conversion section of this documentation.

The output reporting generated each time the tool executes will be detailed on the next few pages.
