Curated Reports
Reporting to guide you on the road to a successful migration
The Snowpark Migration Accelerator (SMA) builds several reports that consolidate the detailed data generated by the assessment. Those reports are described here.
The detailed spreadsheets containing the inventoried elements from the assessment are covered on the next couple of pages.
The DetailedReport.html has been deprecated since Spark Conversion Core V2.43.0.
Note: This page covers all sections of the detailed report as it appears in the document file.
The detailed report is the primary report generated by the SMA. This report contains multiple sections.
Below are the sections of the assessment report with descriptions of each section:
The first page in the detailed report has a brief description of the SMA tool.
This page has one subsection:
Execution Summary: The execution summary shows the organization name and email address that you entered on the project creation page, along with a unique ID for this execution of the SMA (this ID appears frequently in the inventories section), a timestamp, and version information for both the SMA and the Snowpark API.
The next page begins the readiness scores summary, which provides the readiness scores for the Spark API and third-party libraries along with detailed information on how to interpret them. These scores are simple metrics that show how "ready" a codebase is for Snowflake.
This section also provides more information for each readiness score:
Spark API: Gives the count of references (usages) to the Spark API and the count of those that are ready for conversion. The readiness score is [usages ready for conversion] / [identified usages].
Third-Party Libraries: Gives the percentage of imported third-party libraries that are categorized as supported in Snowflake. The readiness score is [third-party imports supported in Snowflake] / [all third-party imports].
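Both scores are simple ratios. The following is a minimal sketch (not SMA source code) of how they are derived from the inventoried counts; the counts shown are hypothetical:

```python
# Minimal sketch of the two readiness-score ratios described above.
# This is illustrative only; the SMA computes these internally.

def readiness_score(supported: int, total: int) -> float:
    """Return a readiness percentage; 0.0 when nothing was identified."""
    return 0.0 if total == 0 else 100.0 * supported / total

# Hypothetical counts taken from an assessment:
spark_ready, spark_total = 3120, 3500    # usages ready for conversion / identified usages
libs_supported, libs_total = 42, 60      # supported third-party imports / all third-party imports

print(f"Spark API readiness:           {readiness_score(spark_ready, spark_total):.2f}%")
print(f"Third-party library readiness: {readiness_score(libs_supported, libs_total):.2f}%")
```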
The next page begins the file summary. Depending on the number of unique file types present in this execution of the tool, this section may span multiple pages of the report.
Note that most of this information is also available in the assessment summary presented in the application itself.
File Type Summary: Gives the count of files for each technology that was recognized, the total lines of code in those files, and the percentage of all files that belong to that technology.
File Extension Summary: Gives the count of each file extension that was recognized, the total lines of code in those files, and the percentage of all files with that extension.
Code File Sizing: Gives the "t-shirt" sizing for the code files present in this execution of the SMA. The size buckets are given in the table along with the count of files in each bucket and the percentage of all code files that fall into it.
Notebook Stats by Language: A count of the lines of code and cells belonging to a certain technology for all notebooks scanned.
Notebook Sizing by Language: Gives a "t-shirt" size for each notebook file based on the lines of code present in that file. The notebook "type" (Python, Scala, or SQL) is determined by the count of cells written in each of those languages. The sizing is determined as follows (a small sketch of this bucketing appears after the list):
XS - less than 50 lines of code
S - between 50 and 200 lines of code
M - between 200 and 500 lines of code
L - between 500 and 1000 lines of code
XL - greater than 1000 lines of code
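A minimal sketch of this bucketing is shown below. The exact boundary handling (whether a file of exactly 50, 200, 500, or 1000 lines falls into the smaller or larger bucket) is an assumption, since the report defines only the ranges:

```python
# Minimal sketch of the notebook "t-shirt" sizing buckets listed above.
# Boundary handling at exactly 50/200/500/1000 lines is an assumption.

def tshirt_size(lines_of_code: int) -> str:
    if lines_of_code < 50:
        return "XS"
    if lines_of_code < 200:
        return "S"
    if lines_of_code < 500:
        return "M"
    if lines_of_code < 1000:
        return "L"
    return "XL"

for loc in (10, 150, 450, 800, 2500):
    print(loc, "->", tshirt_size(loc))
```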
The Spark API Summary takes a deeper dive into what makes up the readiness score presented in the Readiness Scores section. There are four tables in this section: the first summarizes which files have Spark API references, the second summarizes what is supported or unsupported, the third breaks the readiness score down by Spark API category, and the fourth breaks it down by mapping status.
Each table uses the concept of supported and unsupported usages (references to the Spark API). To be clear on the definitions:
Supported: The SMA knows of a conversion or workaround that can take the listed API element to the Snowpark API.
Unsupported: The SMA does not know of a conversion or workaround that can take the listed API element to the Snowpark API, or it does not recognize the element. This does not mean that there is no conversion path forward; it simply means that the conversion cannot be automated.
Files with Spark Usages: This table breaks down the files containing Spark API references by technology, giving you an idea of how many Spark references are in the whole workload.
Files with Spark Usages by Support Status: This table counts the supported vs. unsupported usages in the source codebase, ordered by technology.
Spark API Usage Summaries: This table breaks the usages down by category of the Spark API. Each category lists the count of supported and unsupported usages for Python and Scala. The final reported value is the Spark API readiness score, which should match the value reported in the Readiness Scores section.
Spark API Usage by Support Category: This breaks the count of references to the Spark API (usages) out by the mapping status or category that the tool defines. These are listed and described on the Spark Reference Categories page of this documentation.
Note that the Pandas API Usage Summary appears only for executions that include Python files.
Much like the Spark API Summary shown above, the Pandas API Summary lists the references to the Pandas API.
Files with Pandas Usages: This table breaks down the files containing Pandas references by technology, giving you an idea of how many Pandas references are in the whole workload.
Pandas API Usage Summary: This table is ordered by the Pandas API element found in the source codebase and gives you an idea of how many times each specific element is used.
This section shows anything imported into a file in the codebase. This could be third-party libraries or other elements imported into any file in the codebase. This table excludes imports from other files within the workload.
The table shows each imported package, whether that package is supported in the Anaconda distribution available in Snowpark, the count of how many times it is imported (likely correlated with the number of files using that import), and the percentage of all files with that import. Note that the percent column shows a total of 100%, but the individual percent values above it do not necessarily add up to 100%: multiple imports often occur in the same files, as illustrated below.
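The following minimal sketch shows why the per-import percentages need not sum to 100%: the same file can appear in several rows. The file-to-imports mapping here is hypothetical:

```python
# Minimal sketch: per-package "% of files" values can exceed 100% in
# total because one file may import several packages.

from collections import Counter

# Hypothetical mapping of files to the packages they import:
files_to_imports = {
    "etl.py":    {"pandas", "numpy"},
    "report.py": {"pandas"},
    "train.py":  {"numpy", "sklearn"},
}

total_files = len(files_to_imports)
counts = Counter(pkg for imports in files_to_imports.values() for pkg in imports)

for pkg, n in counts.most_common():
    print(f"{pkg:<8} imported by {n} file(s) = {100 * n / total_files:.0f}% of files")

# pandas and numpy each appear in 2 of 3 files (67% each), so the
# column sums to more than 100% even though each row is correct.
```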
SQL Usages by File Type: This table breaks down the SQL usages by technology, giving you an idea of how many SQL files or SQL cells were identified in the whole workload.
SQL Usages by Support Status: This table is ordered by whether or not an equivalent exists in Snowflake.
The SMA generates an issue each time it needs to report a warning, conversion error, or parsing error for the scanned codebase. Working through these issues is the basis for completing a successful migration using the Snowpark Migration Accelerator.
For more detailed information on the issues and how to analyze them, review the issue analysis section of this documentation.
In this summary, each issue is listed along with the issue code (including a link to the documentation site with more information on each issue), the count of how many times that issue occurs in the workload, and the severity level.
The severity levels (Warning, Conversion Error, and Parsing Error) are described, along with a summary organized by severity level.
As general advice, parsing errors should be resolved immediately, conversion errors should be resolved programmatically, and warnings should be noted and watched as the migration moves forward.
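If you want to triage issues yourself, they can be tallied by severity from the issue inventory in the output folder. This is a minimal sketch under assumptions: the file name "Issues.csv" and the column name "SEVERITY" are hypothetical placeholders, so check your own output for the actual issue inventory spreadsheet and its headers:

```python
# Minimal sketch of tallying SMA issues by severity level.
# "Issues.csv" and the "SEVERITY" column are assumed names; adjust them
# to match the issue inventory in your SMA output folder.

import csv
from collections import Counter

by_severity = Counter()
with open("Issues.csv", newline="", encoding="utf-8") as fh:
    for row in csv.DictReader(fh):
        by_severity[row["SEVERITY"]] += 1

# Parsing errors first, per the triage advice above.
for severity in ("Parsing Error", "Conversion Error", "Warning"):
    print(f"{severity:<17} {by_severity.get(severity, 0)}")
```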
Appendixes
There is currently only one appendix. This shows a description of each mapping status category.
This is the full detailed report. All of the information in the report comes from the inventory files generated by the SMA.
Looking for more information in the detailed report? Reach out to the SMA team at sma-support@snowflake.com.
The Summary Report has been deprecated since Spark Conversion Core V2.43.0.
These are the output reports generated by the SMA. Next up are the detailed spreadsheets available in the output.