SMA Inventories

Data for Decision Making


The Snowpark Migration Accelerator (SMA) generates a large amount of data when it is run on a codebase. That data is used to create the summary reporting present in both the assessment summary and the curated reports output by the tool. The raw data itself is also made available as a series of inventories (spreadsheets) in the Reports folder when the tool is run.

The volume of data in each inventory can be overwhelming, but understanding this information can unlock additional insight into the condition of both the original workload and the converted workload. Each output file is listed below by name, along with a description of every column.
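Since each inventory is a plain CSV file, it is straightforward to explore programmatically. The sketch below is a non-authoritative example (the Reports path is a placeholder, and the pandas library is assumed to be installed) that loads every inventory into a dictionary of DataFrames:

```python
import glob
import os

import pandas as pd

# Placeholder path: point this at the Reports folder of your SMA output.
reports_dir = "path/to/SMA_output/Reports"

# Load every inventory spreadsheet into a dict keyed by file name.
inventories = {
    os.path.basename(path): pd.read_csv(path)
    for path in glob.glob(os.path.join(reports_dir, "*.csv"))
}

for name, df in inventories.items():
    print(f"{name}: {len(df)} rows, columns: {list(df.columns)}")
```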

Some of these inventories are also shared via telemetry. More information can be found in the telemetry section of this documentation.

Assessment Report Details

The AssessmentReport.json file contains the information that is shown in both the Detailed Report and the Assessment Summary in the application. This information exists specifically to populate those reports, and much of it also appears in the other spreadsheets.

DBX Elements Inventory

The DbxElementsInventory.csv lists the DBX elements found inside notebooks; a short example of summarizing it follows the column list.

  • Element: The DBX element name.

  • ProjectId: Name of the project (root directory the tool was run on)

  • FileId: File where the element was found and the relative path to that file.

  • Count: The number of times that element shows up in a single line.

  • Category: The element category.

  • Alias: The alias of the element (applies only to import elements).

  • Kind: A category for each element. These could include Function or Magic.

  • Line: The line number in the source files where the element was found.

  • PackageName: The name of the package where the element was found.

  • Supported: Whether this reference is “supported” or not. Values: True/False.

  • Automated: Whether or not the tool can automatically convert it. Values: True/False.

  • Status: The categorization of each element. The options are Rename, Direct, Helper, Transformation, WorkAround, NotSupported, NotDefined.

  • Statement: The code where the element was used. [NOTE: This column is not sent via telemetry.]

  • SessionId: Unique identifier for each run of the tool.

  • SnowConvertCoreVersion: The version number for the core code process of the tool.

  • SnowparkVersion: The version of Snowpark API available for the specified technology and run of the tool.

  • CellId: If this element was found in a notebook file, the numbered location of the cell where this element was in the file.

  • ExecutionId: The unique identifier for this execution of the SMA.
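As an illustration, here is a hedged sketch of summarizing this inventory with pandas, using only the columns documented above:

```python
import pandas as pd

# Summarize DBX elements by category and conversion status.
dbx = pd.read_csv("Reports/DbxElementsInventory.csv")
print(dbx.groupby(["Category", "Status"])["Count"].sum())

# Elements that are neither supported nor automatically convertible
# will need manual attention.
manual = dbx[~dbx["Supported"] & ~dbx["Automated"]]
print(manual[["Element", "FileId", "Line"]])
```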

Execution Flow Inventory

The ExecutionFlowInventory.csv lists the relationships between the workload's different scopes, based on the function calls found. The main purpose of this inventory is to serve as the basis for identifying entry points, as the sketch after the column list shows.

  • Caller: The full name of the scope where the call was found.

  • CallerType: The type of the scope where the call was found. This can be: Function, Class, or Module.

  • Invoked: The full name of the element that was called.

  • InvokedType: The type of the element. This can be: Function or Class.

  • FileId: The relative path of the file. (Starting from the input folder the user chose in the SMA tool)

  • CellId: The cell number where the call was found inside a notebook file, if applicable.

  • Line: The line number where the call was found.

  • Column: The column number where the call was found.

  • ExecutionId: The execution id.
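A minimal sketch of the entry-point idea, assuming the column names above: scopes that appear as callers but are never themselves invoked are candidate entry points.

```python
import pandas as pd

# Scopes that call other code but are never invoked themselves are
# candidate entry points for the workload.
flow = pd.read_csv("Reports/ExecutionFlowInventory.csv")

entry_points = sorted(set(flow["Caller"]) - set(flow["Invoked"]))
print("Candidate entry points:", entry_points)
```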

Checkpoints Inventory

The Checkpoints.csv file lists the checkpoints generated for the user's workload. These checkpoints can be used with the Checkpoints feature of the Snowflake Extension.

  • Name: The checkpoint name (using the format described before).

  • FileId: the relative path of the file (starting from the input folder the user chose in the SMA tool).

  • CellId: the cell number where the DataFrame operation was found inside a notebook file.

  • Line: line number where the DataFrame operation was found.

  • Column: the column number where the DataFrame operation was found.

  • Type: the use case of the checkpoints (Collection or Validation).

  • DataFrameName: The name of the DataFrame.

  • Location: The assignment number of the DataFrame name.

  • Enabled: Indicates whether the checkpoint is enabled (True or False).

  • Mode: The mode number of the collection (Schema [1] or DataFrame [2]).

  • Sample: The sample of the DataFrame.

  • EntryPoint: The entry point that guides the flow to execute the checkpoint.

  • ExecutionId: the execution id.

DataFrames Inventory

The DataFramesInventory.csv lists the DataFrame assignments found in the codebase; these are used to generate checkpoints for the user's workload.

  • FullName: The full name of the DataFrame.

  • Name: The simple name of the variable of the DataFrame.

  • FileId: The relative path of the file (starting from the input folder the user chose in the SMA tool).

  • CellId: The cell number where the DataFrame operation was found inside a notebook file.

  • Line: The line number where the DataFrame operation was found.

  • Column: The column number where the DataFrame operation was found.

  • AssignmentNumber: The number of assignments for this particular identifier (not symbol) in the file.

  • RelevantFunction: The relevant function that caused this DataFrame to be collected.

  • RelatedDataFrames: The fully qualified names of the DataFrame(s) involved in the operation (separated by semicolons).

  • EntryPoints: Empty during this phase; it is filled in a later phase.

  • ExecutionId: the execution id.

Artifact Dependency Inventory

The ArtifactDependencyInventory.csv lists the artifact dependencies of each file analyzed by the SMA. This inventory allows the user to determine which artifacts are needed for the file to work properly in Snowflake; an example of filtering it appears after the column list.

The following are considered artifacts: third-party libraries, SQL entities, sources of read or write operations, and other source code files in the workload.

  • ExecutionId: the identifier of the execution.

  • FileId: the identifier of the source code file.

  • Dependency: the artifact dependency that the current file has.

  • Type: the type of the artifact dependency.

    • UserCodeFile: source code or notebook.

    • IOSources: a resource required for an input or output operation.

    • ThirdPartyLibraries: a third-party library.

    • UnknownLibraries: a library whose origin was not determined by SMA.

    • SQLObjects: an SQL entity, such as a table or view.

  • Success: If the artifact needs any intervention, it shows FALSE; otherwise, it shows TRUE.

  • Status_Detail: the status of the artifact dependency, based on the type.

    • UserCodeFile:

      • Parsed: the file was parsed successfully.

      • NotParsed: the file parsing failed.

    • IOSources:

      • Exists: the resource of the operation is in the workload.

      • DoesNotExists: the resource of the operation is not present in the input.

    • ThirdPartyLibraries:

      • Supported: the library is supported by Snowpark Anaconda.

      • NotSupported: the library is not supported by Snowpark Anaconda.

    • UnknownLibraries:

      • NotSupported: the origin of the library could not be determined by the SMA.

    • SQLObjects:

      • DoesNotExists: the embedded statement that creates the entity is not in the input source code.

      • Exists: the embedded statement that creates the entity is in the input source code.

  • Arguments: extra data about the artifact dependency, based on the type.

  • Location: the collection of cell ID and line number where the artifact dependency is being used in the source code file.
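For example, this sketch (assuming the documented columns) filters the inventory down to the dependencies that still need intervention:

```python
import pandas as pd

# Dependencies with Success == FALSE need intervention before the
# file will work properly in Snowflake.
deps = pd.read_csv("Reports/ArtifactDependencyInventory.csv")
needs_work = deps[~deps["Success"]]

print(needs_work.groupby(["Type", "Status_Detail"]).size())
print(needs_work[["FileId", "Dependency", "Status_Detail"]])
```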

Files Inventory

The files.csv file contains an inventory of each file present in that execution of the tool, including its filetype and size. A brief example of summarizing it follows the column list.

  • Path: the filepath for each file, relative to the root directory. (For example, if a file is in the root folder, only the filename will be recorded.)

  • Technology: source language scanner (Python or Scala)

  • FileKind: whether the file contains source code or is another kind of file (like a text file or log)

  • BinaryKind: whether the file is readable text or a binary file

  • Bytes: size of the file in bytes.

  • SupportedStatus: files are neither supported nor unsupported, so this column always reports "DoesNotApply"
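A short sketch of sizing the scanned codebase from this file:

```python
import pandas as pd

# Break the scanned files down by technology, with counts and total size.
files = pd.read_csv("Reports/files.csv")

print(
    files.groupby("Technology").agg(
        file_count=("Path", "count"),
        total_bytes=("Bytes", "sum"),
    )
)
```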

Import Usages Inventory

The ImportUsagesInventory.csv has all the import references in the codebase. An import is any external library that is imported at any point in a file. An example of ranking unsupported imports follows the column list.

  • Element: the unique name for the imported element.

  • ProjectId: name of the project (root directory the tool was run on)

  • FileId: file where the import was found and the relative path to that file.

  • Count: the number of times that element shows up in a single line.

  • Alias: the alias of the element (if any).

  • Kind: null/empty value because all elements are imports.

  • Line: the line number in the source files where the element was found.

  • PackageName: the name of the package where the element was found.

  • Supported: Whether this reference is “supported” or not. Values: True/False.

  • Automated: null/empty. This column is deprecated.

  • Status: value Invalid. This column is deprecated.

  • Statement: the code where the element was used. [NOTE: This column is not sent via telemetry.]

  • SessionId: Unique identifier for each run of the tool.

  • SnowConvertCoreVersion: the version number for the core code process of the tool

  • SnowparkVersion: the version of snowpark API available for the specified technology and run of the tool.

  • ElementPackage: the package name where the imported element is declared (when available).

  • CellId: if this element was found in a notebook file, the numbered location of the cell where this element was in the file.

  • ExecutionId: the unique identifier for this execution of the SMA.

  • Origin: category of the import reference. Possible values are BuiltIn, ThirdPartyLib, or blank.

  • FullName: the full path of the current element.
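As an example, unsupported imports can be ranked by frequency to see which libraries are most likely to need a replacement in Snowpark (a hedged sketch using the documented columns):

```python
import pandas as pd

# Rank unsupported imports by total occurrences.
imports = pd.read_csv("Reports/ImportUsagesInventory.csv")
unsupported = imports[~imports["Supported"]]

top = unsupported.groupby("Element")["Count"].sum()
print(top.sort_values(ascending=False).head(15))
```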

Input Files Inventory

Similar to the files inventory, the InputFilesInventory.csv has a list of every file by filetype and size.

  • Element: filename (same as FileId)

  • ProjectId: name of the project (root directory the tool was run on)

  • FileId: the file and the relative path to that file.

  • Count: count of files with that filename

  • SessionId: Unique identifier for each session of the tool.

  • Extension: the file’s extension

  • Technology: the source file’s technology based on extension

  • Bytes: size of the file in bytes

  • CharacterLength: count of characters in the file

  • LinesOfCode: lines of code in the file

  • ParsingResult: “Successful” if the file was fully parsed, “Error” if it was not.

Input and Output Files Inventory

The IOFilesInventory.csv lists all external elements that are read from or written to in the codebase; a short example of cross-tabulating it follows the column list.

  • Element: the file, variable, or other element being read or written

  • ProjectId: name of the project (root directory the tool was run on)

  • FileId: file where the element was found and the relative path to that file.

  • Count: count of files with that filename

  • isLiteral: whether the read/write location was specified as a literal

  • Format: the format of the element (such as csv, json, etc.), if the SMA can determine it

  • FormatType: a more specific classification of the format above, when one can be determined

  • Mode: value will be Read or Write depending on whether there is a reader or writer

  • Supported: Whether this operation is supported in Snowpark

  • Line: the line in the file where the read or write occurs

  • SessionId: Unique identifier for each session of the tool

  • OptionalSettings: any parameters defined on the element are listed here

  • CellId: cell id where that element was in that FileId (if in a notebook, null otherwise)

  • ExecutionId: Unique identifier for each run of the tool
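For instance, a cross-tabulation of Mode against Format shows which external sources and sinks the workload touches (a hedged sketch):

```python
import pandas as pd

# Which formats does the workload read and write, and are they supported?
io = pd.read_csv("Reports/IOFilesInventory.csv")

print(pd.crosstab(io["Format"], io["Mode"]))
print(io.loc[~io["Supported"], ["Element", "FileId", "Line"]])
```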

Issue Inventory

The Issues.csv lists every conversion issue found in the codebase. A description, the exact location of the issue in the file, and a code associated with that issue are reported in this document. You can find out more about each issue code in the issue analysis section of this documentation. A short example of tallying issues follows the column list.

  • Code: the unique code for the issue reported by the tool.

  • Description: the text describing the issue, including the name of the Spark reference when applicable.

  • Category: the classification of each issue. The options are Warning, Conversion Error, Parser Error, Helper, Transformation, WorkAround, NotSupported, and NotDefined.

  • NodeType: the name associated with the syntax node where the issue was found.

  • FileId: file where the issue was found and the relative path to that file.

  • ProjectId: name of the project (root directory the tool was run on)

  • Line: the line number in the source file where the issue was found.

  • Column: the column position in the source file where the issue was found.
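A short sketch tallying the issues by category and code:

```python
import pandas as pd

# Tally issues to see where remediation effort will concentrate.
issues = pd.read_csv("Reports/Issues.csv")

print(issues["Category"].value_counts())
print(issues["Code"].value_counts().head(10))
```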

Joins Inventory

The JoinsInventory.csv has an inventory of all DataFrame joins in the codebase.

  • Element: the line number where the join begins (and where it ends, if the join spans multiple lines)

  • ProjectId: name of the project (root directory the tool was run on)

  • FileId: file where the join was found and the relative path to that file.

  • Count: count of files with that filename

  • isSelfJoin: TRUE if the join is a self join, FALSE if not

  • HasLeftAlias: TRUE if the join has a left alias, FALSE if not

  • HasRightAlias: TRUE if the join has a right alias, FALSE if not

  • Line: line number where the join begins

  • SessionId: Unique identifier for each session of the tool

  • CellId: cell id where that element was in that FileId (if in a notebook, null otherwise)

  • ExecutionId: Unique identifier for each run of the tool

Notebook Cells Inventory

The NotebookCellsInventory.csv provides an inventory of all cells in each notebook, along with the source code and lines of code for each cell.

  • Element: source language (Python, Scala, or SQL)

  • ProjectId: name of the project (root directory the tool was run on)

  • FileId: file where the element was found and the relative path to that file.

  • Count: count of files with that filename

  • CellId: cell id where that element was in that FileId (if in a notebook, null otherwise)

  • Arguments: null (this field will be empty)

  • LOC: lines of code in that cell

  • Size: count of characters in that cell

  • SupportedStatus: TRUE unless there are any unsupported elements in that cell, in which case FALSE

  • ParsingResult: “Successful” if the cell was fully parsed, “Error” if it was not parsed.

Notebook Size Inventory

The NotebookSizeInventory.csv records the lines of code of each source language present in notebook files.

  • Element: filename (for this spreadsheet, it is the same as the FileId)

  • ProjectId: name of the project (root directory the tool was run on)

  • FileId: the file and the relative path to that file.

  • Count: count of files with that filename

  • PythonLOC: Python lines of code present in notebook cells (will be 0 for non-notebook files)

  • ScalaLOC: Scala lines of code present in notebook cells (will be 0 for non-notebook files)

  • SqlLOC: SQL lines of code present in notebook cells (will be 0 for non-notebook files)

  • Line: null (this field will be empty)

  • SessionId: Unique identifier for each session of the tool.

  • ExecutionId: Unique identifier for each run of the tool.

Pandas Usages Inventory

[Python Only] The PandasUsagesInventory.csv lists every reference to the Pandas API present in the scanned codebase.

  • Element: the unique name for the actual Pandas reference.

  • ProjectId: name of the project (root directory the tool was run on)

  • FileId: file where the Pandas reference was found and the relative path to that file.

  • Count: the number of times that element shows up in a single line.

  • Alias: the alias of the element (applies only to import elements).

  • Kind: a category for each element. These could include Class, Variable, Function, Import and others.

  • Line: the line number in the source files where the element was found.

  • PackageName: the name of the package where the element was found.

  • Supported: Whether this reference is “supported” or not. Values: True/False.

  • Automated: Whether or not the tool can automatically convert it. Values: True/False.

  • Status: the categorization of each element. The options are Rename, Direct, Helper, Transformation, WorkAround, NotSupported, NotDefined.

  • Statement: how that element was used. [NOTE: This column is not sent via telemetry.]

  • SessionId: Unique identifier for each run of the tool.

  • SnowConvertCoreVersion: the version number for the core code process of the tool

  • SnowparkVersion: the version of Snowpark API available for the specified technology and run of the tool.

  • PandasVersion: version number of the pandas API that was used to identify elements in this codebase

  • CellId: cell id where that element was in that FileId (if in a notebook, null otherwise)

  • ExecutionId: Unique identifier for each run of the tool.

Spark Usages Inventory

The SparkUsagesInventory.csv shows the exact location and usage of each reference to the Spark API. This information is used to build the Readiness Score; a sketch of that calculation follows the column list.

  • Element: the unique name for the actual Spark reference.

  • ProjectId: name of the project (root directory the tool was run on)

  • FileId: file where the spark reference was found and the relative path to that file.

  • Count: the number of times that element shows up in a single line.

  • Alias: the alias of the element (applies only to import elements).

  • Kind: a category for each element. These could include Class, Variable, Function, Import and others.

  • Line: the line number in the source files where the element was found.

  • PackageName: the name of the package where the element was found.

  • Supported: Whether this reference is “supported” or not. Values: True/False.

  • Automated: Whether or not the tool can automatically convert it. Values: True/False.

  • Status: the categorization of each element. The options are Rename, Direct, Helper, Transformation, WorkAround, NotSupported, NotDefined.

  • Statement: the code where the element was used. [NOTE: This column is not sent via telemetry.]

  • SessionId: Unique identifier for each run of the tool.

  • SnowConvertCoreVersion: the version number for the core code process of the tool

  • SnowparkVersion: the version of Snowpark API available for the specified technology and run of the tool.

  • CellId: if this element was found in a notebook file, the numbered location of the cell where this element was in the file.

  • ExecutionId: the unique identifier for this execution of the SMA.
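As a rough, non-authoritative sketch of how a readiness ratio can be derived from this inventory (the official calculation is described on the Readiness Scores page), supported usages are divided by total usages:

```python
import pandas as pd

# A rough readiness ratio: supported Spark API usages over all usages.
# See the Readiness Scores page for the official calculation.
spark = pd.read_csv("Reports/SparkUsagesInventory.csv")

total = spark["Count"].sum()
supported = spark.loc[spark["Supported"], "Count"].sum()
print(f"Spark API readiness: {supported}/{total} = {supported / total:.1%}")
```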

SQL Statements Inventory

The SqlStatementsInventory.csv has a count of the SQL keywords present in SQL elements found in the Spark code. An example of aggregating these keywords follows the column list.

  • Element: name for the code element where the SQL was found

  • ProjectId: name of the project (root directory the tool was run on)

  • FileId: file where the spark reference was found and the relative path to that file.

  • Count: the number of times that element shows up in a single line.

  • InterpolationCount: count of other elements inserted into the element

  • Keywords: a dictionary of the keywords and count of each

  • Size: character count for each sql statement

  • LiteralCount: count of strings in this element

  • NonLiteralCount: count of SQL components of the element not contained in a literal

  • Line: the line number where that element occurs

  • SessionId: Unique identifier for each session of the tool.

  • CellId: cell id where that element was in that FileId (if in a notebook, null otherwise)

  • ExecutionId: Unique identifier for each run of the tool.
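Since the Keywords column holds a per-statement dictionary, aggregating it requires parsing. The sketch below assumes the dictionary is serialized as a Python-style literal, which may need adjusting for your output:

```python
import ast
from collections import Counter

import pandas as pd

# Aggregate keyword counts across all SQL statements. The Keywords
# column is assumed to be a serialized dict literal; adjust the parsing
# if your output encodes it differently.
sql = pd.read_csv("Reports/SqlStatementsInventory.csv")

totals = Counter()
for raw in sql["Keywords"].dropna():
    totals.update(ast.literal_eval(raw))
print(totals.most_common(10))
```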

SQL Elements Inventory

The SQLElementsInventory.csv has a count of the SQL syntax elements present in the source code.

  • Element: Name for the code element where the SQL was found (e.g., SqlFromClause, SqlSelect, SqlSelectBody, SqlSignedNumericLiteral).

  • ProjectId: Name of the project (root directory the tool was run on).

  • FileId: File where the SQL reference was found and the relative path to that file.

  • Count: The number of times that element shows up in a single line.

  • NotebookCellId: The notebook cell ID.

  • Line: The line number where that element occurs.

  • Column: The column number where that element occurs.

  • SessionId: Unique identifier for each session of the tool.

  • ExecutionId: Unique identifier for each run of the tool.

  • SqlFlavor: The SQL flavor being used (e.g., Spark SQL, Hive SQL).

  • RootFullName: The fully qualified name of the root element in the code.

  • RootLine: The line number where the root element is located.

  • RootColumn: The column number where the root element is located.

  • TopLevelFullName: The fully qualified name of the top-level SQL statement or code block.

  • TopLevelLine: The line number where the top-level statement is located.

  • TopLevelColumn: The column number where the top-level statement is located.

  • ConversionStatus: The status of the SQL conversion (e.g., Success, Failed).

  • Category: The category of the SQL element (e.g., DDL, DML, DQL, DCL, TCL).

  • EWI: The EWI (Error Warning Information) code associated with the SQL element.

  • ObjectReference: The reference name of the object involved in the SQL (e.g., table, view).

SQL Embedded Usage Inventory

The SqlEmbeddedUsageInventory.csv lists the SQL usages embedded in the source code.

  • Element: Name for the code element where the SQL was found (e.g., SqlFromClause, SqlSelect, SqlSelectBody, SqlSignedNumericLiteral).

  • ProjectId: Name of the project (root directory the tool was run on).

  • FileId: File where the SQL reference was found and the relative path to that file.

  • Count: The number of times that element shows up in a single line.

  • ExecutionId: Unique identifier for each run of the tool.

  • LibraryName: Name of the library being used.

  • HasLiteral: Indicates whether the element contains literals.

  • HasVariable: Indicates whether the element contains variables.

  • HasFunction: Indicates whether the element contains functions.

  • ParsingStatus: Indicates the parsing status (e.g., Success, Failed, Partial).

  • HasInterpolation: Indicates whether the element contains interpolations.

  • CellId: The notebook cell ID.

  • Line: The line number where that element occurs.

  • Column: The column number where that element occurs.

Third Party Usages Inventory

The ThirdPartyUsagesInventory.csv lists every reference to third-party APIs present in the scanned codebase.

  • Element: the unique name for the third-party reference.

  • ProjectId: name of the project (root directory the tool was run on)

  • FileId: file where the third-party reference was found and the relative path to that file.

  • Count: the number of times that element shows up in a single line.

  • Alias: the alias of the element (if any).

  • Kind: categorization of the element such as variable, type, function, or class.

  • Line: the line number in the source files where the element was found.

  • PackageName: package name for the element (concatenation of ProjectId and FileId in Python).

  • Statement: the code where the element was used. [NOTE: This column is not sent via telemetry.]

  • SessionId: Unique identifier for each session of the tool.

  • CellId: cell id where that element was in that FileId (if in a notebook, null otherwise)

  • ExecutionId: Unique identifier for each execution of the tool.

Packages Inventory

The packagesInventory.csv lists each package found in the codebase.

  • Element: the name of the package.

  • ProjectId: name of the project (root directory the tool was run on)

  • FileId: file where package was found and the relative path to that file.

  • Count: the number of times that element shows up in a single line.

Tool Execution Summary

The tool_execution.csv has some basic information about this run of the SMA tool; an example of reading it follows the column list.

  • ExecutionId: Unique identifier for each run of the tool.

  • ToolName: the name of the tool. Values: PythonSnowConvert or SparkSnowConvert (the Scala tool).

  • Tool_Version: the version number of the tool.

  • AssemblyName: the name of the code processor (essentially, a longer version of the ToolName)

  • LogFile: whether a log file was sent on an exception/failure

  • FinalResult: where the tool stopped if there was an exception/failure

  • ExceptionReport: if an exception report was sent on an exception/failure

  • StartTime: The timestamp for when the tool started executing.

  • EndTime: The timestamp for when the tool stopped executing.

  • SystemName: The serial number of the machine where the tool was executing (this is only used for troubleshooting and license validation purposes).
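For example, the run's duration can be computed from the recorded timestamps (pandas infers the timestamp format in this sketch):

```python
import pandas as pd

# Compute the run duration from the recorded start and end timestamps.
run = pd.read_csv("Reports/tool_execution.csv").iloc[0]

start = pd.to_datetime(run["StartTime"])
end = pd.to_datetime(run["EndTime"])
print(f"Execution {run['ExecutionId']} took {end - start}")
```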
