Curated Reports

Reporting to guide you on the road to a successful migration

The Snowpark Migration Accelerator (SMA) builds several reports that combine information from the detailed data that is generated by the assessment. Those reports are listed here.

The detailed spreadsheets with the inventoried elements from the assessment are listed on the next couple of pages.

Detailed Report

The detailed report comes in two formats:

  • Word Document File (DetailedReport.docx)

  • HTML File (DetailedReport.html)

Note: This page covers all sections of the detailed report as it appears in the Word document file. The HTML file contains exactly the same information.

The detailed report is the primary report generated by the SMA. This report contains multiple sections.

Below are the sections of the assessment report with descriptions of each section:

Assessment Summary

The assessment summary is the first page in the detailed report. It describes the SMA and gives you a summary of the references to the Spark API (with the readiness score) as well as a summary of general information about your execution of the tool. It should look something like this:

In the assessment summary, there are a couple of subsections:

  • Spark API Summary: This gives the count of references (usages) to the Spark API, the count that are ready for conversion, and the Readiness Score. The readiness score is [usages ready for conversion] / [identified usages]. Note that the readiness score can be misleading, so a description appears to the right under Interpreting the Snowpark Readiness Score and Understanding the Snowpark Readiness Score.

  • Execution Summary: The execution summary shows the name, organization, and email that you entered on the project creation page, along with a unique ID for this execution of the SMA (which will appear often in the inventories section), a timestamp, and version information for both the SMA and the Snowpark API.
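The readiness score formula above can be sketched in Python. This is a minimal illustration only; the function name and the sample numbers are invented for the example, not taken from a real assessment:

```python
def readiness_score(ready_usages: int, identified_usages: int) -> float:
    """Readiness score: [usages ready for conversion] / [identified usages],
    expressed as a percentage (0.0 when nothing was identified)."""
    if identified_usages == 0:
        return 0.0
    return round(100 * ready_usages / identified_usages, 2)

# Example: 3120 of 3500 identified Spark API usages are ready for conversion.
print(readiness_score(3120, 3500))  # → 89.14
```

A high score means most detected usages have a known automated conversion path; it says nothing about code the tool did not recognize.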

File Summary

The file summary starts on the next page. Depending on the number of unique filetypes present in this execution of the tool, this section may span multiple pages in the report.

Note that most of this information is also available in the assessment summary presented in the application itself. To summarize:

  • File Type Summary: Gives the count of each file extension that was recognized, how many lines of code were in those files total, and the percentage of all files that match that file extension.

  • Notebook Sizing by Language: This gives a "t-shirt" sizing for each notebook file based on the lines of code present in that file. The notebook "type" (Python, Scala, or SQL) is determined by the count of cells in each of those languages. The sizing is determined as follows:

    • XS - less than 50 lines of code

    • S - between 50 and 200 lines of code

    • M - between 200 and 500 lines of code

    • L - between 500 and 1000 lines of code

    • XL - greater than 1000 lines of code

  • Notebook Stats by Language: A count of the lines of code and cells belonging to each technology across all notebooks scanned.
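The sizing buckets above can be sketched as a small Python function. This is a sketch only; how the tool assigns the exact boundary values (50, 200, 500, 1000 lines) is an assumption, since the report does not state which bucket a boundary falls into:

```python
def notebook_size(lines_of_code: int) -> str:
    """Map a notebook's total lines of code to the report's t-shirt sizes.
    Boundary handling (e.g. exactly 200 lines -> "M") is assumed."""
    if lines_of_code < 50:
        return "XS"
    if lines_of_code < 200:
        return "S"
    if lines_of_code < 500:
        return "M"
    if lines_of_code < 1000:
        return "L"
    return "XL"

print([notebook_size(n) for n in (30, 150, 450, 900, 2000)])
# → ['XS', 'S', 'M', 'L', 'XL']
```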

There are two more tables of information that will often appear on the next page:

  • Code File Content: This gives the API (listed as Technology) that the tool is looking for, the count of files with references to that API, and the percentage of all Python files with references to that API. A couple of notes here:

    • This will give the source language (Python or Scala) in the header for this metric (so, Python File Content or Scala File Content).

    • There is no "Total" row. The percentages are not meant to add up to 100%.

  • Code File Sizing: This gives the "t-shirt" sizing for the code files present in this execution of the SMA, just as was done for notebooks previously. The size buckets are given in the table along with the count of files in each bucket and the percentage of all code files that fall into that bucket. A couple of notes here:

    • This will give the source language (Python or Scala) in the header for this metric (so, Python File Sizing or Scala File Sizing).

    • There is no "Total" row, but the percentages should still add up to 100%.

Spark API Usage Summary

The Spark API Usage Summary is a deeper dive into what makes up the readiness score presented on the first page of the document. There are three tables in this section. The first is a summary of what is supported or not-supported, the second breaks the readiness score down by Spark API category, and the third breaks it down by Mapping Status.

In each, the concept of supported and unsupported usages (references to the Spark API) will be shared. To be clear on the definition of supported vs. unsupported:

  • Supported: The SMA knows of a conversion or workaround that can take the listed API element to the Snowpark API.

  • Unsupported: The SMA does not know of a conversion or workaround that can take the listed API element to the Snowpark API, or it does not recognize the element. This does not mean that there is no conversion path forward. It simply means that the conversion cannot be automated.

Spark API Supported Summary

The first table gives the count of supported vs. unsupported usages in this codebase. The usages count is the count of Spark API references in the whole workload, the file count is the number of files containing those usages, and the percentage is the share of all Python files with a Spark reference. It's important to note that:

  • The file count is specific to supported vs. unsupported usages. A single file may have both supported and unsupported usages. It would be counted for each row.

  • The total is not a sum of the first two rows. The total is how many files have references to the Spark API.

  • The percentage is specific to that particular row. There is no "total" for the percentage. It is not meant to add up to 100%.
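To illustrate why the total file count is not the sum of the two rows, here is a hypothetical sketch. The file names, usages, and the `total_python_files` figure are all invented for the example:

```python
# Hypothetical per-file inventory: file -> list of (usage, supported?) pairs.
inventory = {
    "etl.py":   [("DataFrame.withColumn", True), ("RDD.mapPartitions", False)],
    "load.py":  [("SparkSession.read", True)],
    "train.py": [("SparkContext.parallelize", False)],
}
total_python_files = 10  # assumed count of all Python files in the workload

rows = {}
for label, flag in (("Supported", True), ("Unsupported", False)):
    usages = sum(sum(1 for _, s in refs if s is flag) for refs in inventory.values())
    files = sum(any(s is flag for _, s in refs) for refs in inventory.values())
    rows[label] = (usages, files, round(100 * files / total_python_files, 2))

# "Total" counts distinct files with any Spark reference; etl.py has both
# a supported and an unsupported usage, so it appears in both rows above.
total_files = len(inventory)
print(rows, total_files)
# rows -> {'Supported': (2, 2, 20.0), 'Unsupported': (2, 2, 20.0)}; total_files -> 3
```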

Spark API Usage Summary

The second table breaks this out by category of the Spark API. The count of references in each category and the readiness score for each category is given here.

The final reported value (shown in green in the image above) is the Spark API Readiness Score. This should be the same value as reported on the first page.

Spark API Usages by Support Category

The final table is shown by status category:

This breaks the count of references to the Spark API (usages) out by the mapping status or category that the tool defines. These are listed and described on the Spark Reference Categories page of this documentation.

Pandas API Usage Summary

Note that the Pandas API Usage Summary is only available for Python executions of the SMA.

Much like the Spark API Usage Summary shown above, the Pandas API Usage Summary lists the references to the Pandas API.

These are currently not summarized by category. This will list each unique usage (up to a limit), the count of that usage, the count of files with that usage, and the percentage of all Python files with references to the Pandas API.

References to the Pandas API can be helpful to understand as you set up your DataFrames in Snowpark.

Code Import Calls

This section will show anything imported into a file in the codebase. This could be third party libraries or other elements imported into any file in the codebase. This table should exclude imports from other files in the workload.

The table shows imported packages, whether each package is supported in the Anaconda distribution in Snowpark, the count of how many times it is imported (likely correlated to the number of files using that import), and the percent of all files with that import. It's important to note that the percent column shows a total value of 100%, but the percent values above it do not necessarily add up to 100%, since multiple imports will often occur in the same files.
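A small sketch of why those percentages behave that way (the file names and imported packages here are hypothetical):

```python
# Each file can import several packages, so per-package percentages
# (share of files containing the import) can sum well past 100%.
files = {
    "a.py": {"pandas", "numpy"},
    "b.py": {"pandas"},
    "c.py": {"numpy", "requests"},
}
n_files = len(files)

import_stats = {}
for pkg in sorted({p for imports in files.values() for p in imports}):
    count = sum(pkg in imports for imports in files.values())
    import_stats[pkg] = (count, round(100 * count / n_files, 1))

print(import_stats)
# → {'numpy': (2, 66.7), 'pandas': (2, 66.7), 'requests': (1, 33.3)}
```

Here pandas and numpy each appear in two of three files, so their percentages alone already exceed 100%.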

Snowpark Migration Accelerator (SMA) Issue Summary

The SMA generates an issue each time it needs to report a warning, conversion error, or parsing error in the scanned codebase. Working through these issues is the basis for completing a successful migration using the Snowpark Migration Accelerator.

For more detailed information on the issues and analyzing the issues, review the issue analysis section of this documentation.

In this summary, each issue will be listed along with the issue code (including a link to the documentation site with more information on each issue), the count of how many times that issue occurs in a workload, and the severity level.

The severity levels (Warning, Conversion Error, and Parsing Error) are described above, and a summary organized by severity level is also given.

As general advice, parsing errors should be resolved immediately, conversion errors should be resolved programmatically, and warnings should be noted and watched as the migration moves forward.

Appendixes

There is currently only one appendix. This shows a description of each mapping status category.


This is the full detailed report. All of the information in the report comes from the inventory files generated by the SMA.

Looking for more information in the detailed report? Reach out to the SMA team at sma-support@snowflake.com.

Summary Report

Like the detailed report, the summary report comes in two formats:

  • Word Document File (SummaryReport.docx)

  • HTML File (SummaryReport.html)

The summary report is easily described: it is a one-page subset of the detailed report. In fact, it is exactly the same first page from the detailed report. To better understand what is available in the Summary Report, review the Assessment Summary section of the Detailed Report.


These are the output reports generated by the SMA. Next up are the detailed spreadsheets available in the output.
