Assessment Report
Last updated
Last updated
The purpose of this document is to provide guidance for users to understand the summary results from the SnowConvert conversion tools. It will guide thru the different metrics returned and how these metrics can be used to determine the automation level achieved, and the amount of manual effort needed to make the output code into functionally equivalent Snowflake code.
Most of the concepts presented in this document are already explained on the Report's main page. But here is some other helpful information about the most important information of the image above.
Total Parsing Errors: The number of times that the conversion tool found text fragments that could not be recognized as syntactically correct elements for the source language under conversion. A parsing error could have a little or large impact. It is important to determine the number of LOC affected by parsing errors and how much they represent of the total workload. Sometimes parsing errors can occur due to encoding issues or because the workload needs some preparation.
Code Conversion Rate: The conversion rate is the percentage of the total source code that was successfully converted by SnowConvert into functionally equivalent Snowflake code. Every time that the tool identifies not supported elements, i.e, fragments in the input source code that were not converted into Snowflake, this will affect the conversion rate. Check the conversion rate section, for a deeper explanation.
Identified Objects: The count of all the Top Level DDL Objects ( Table, View, Procedure, etc.. ) that the SnowConvert identified. If there were a parsing error on an object, it wouldn't be an Identified Object. Example: The first objects from line #1 to line #6. There is evidently a parsing error, so the SnowConvert cannot identify that as an object.
As mentioned before, when an element is marked as not supported (due to parsing errors or because there is no support for it in Snowflake) the conversion rate will be punished. How much of the conversion rate is punished for each not-supported element depends on the unit of code selected, two units are available: characters or lines.
When characters of code are selected, the total amount of characters in the input source will represent the overall units to convert. So, if there are 100 characters total and there is only one not-supported element with 10 characters, the conversion rate will be 90%. The conversion rate using characters is more precise because only the characters belonging to the not-supported elements are punished but, it is harder to manually calculate and understand.
When lines of code are chosen (default option), the number of lines of code in the input source code will represent the overall units to convert, and lines containing not-supported elements will be entirely considered as not-supported units of code. So, if the same input code with those 100 characters is split into 5 lines of code, and the not-supported element is in just one line, then the conversion rate will be 80%; the entire line containing the not-supported element is considered not supported as well. The conversion rate using lines is easier to follow however, it is less accurate because entire lines of code containing not-supported elements are punished (even if there are other supported elements in that same line).
The next example shows how the conversion rate is calculated using both metrics.
Input source code
The above code has exactly 100 code characters because whitespaces and line breaks are not considered code characters. The comment above Table1
belongs to the table and is part of those 100 characters. This is the output code that SnowConvert generates for this input.
Output source code
The second column of the table has a parsing error and therefore this is a not-supported element. Let's take a look a how the conversion rate is punished using the two units available.
Here is the breakdown of the characters in the input code
Total amount of code characters: 100
Code characters in not-supported elements: 10
Result: 90.00%
Notice there are 11 characters in the 4th line but only 10 are marked as not supported. This is because of how the parsing recovery mechanism works. When the parser encounters an error, it will consider all the following characters, until the next delimiter character, in this case the comma (','), as part of the error. That means that the amount of not-supported characters in any input code can greatly depend on the type of parsing errors. In some cases, the parser will be able to recover close to where the actual error is, but sadly in other cases, a lot of code can be swallowed by the error.
The conversion rate using lines of code as units is much simpler to calculate.
Total amount of lines of code: 5
Lines of code with not-supported elements: 1
Result: 80%
When using lines of code as the unit, the conversion rate will greatly depend on how the input code is formatted. For example, the following two code samples are equivalent but in the first case all the code is put into the same line and in the second case the code is split into 5 lines of code
Notice that the second column that is being referenced in the SELECT has an error because it starts with an invalid character. In the first case, since the whole code is in the same line, the conversion rate will be 0%. But in the second case, since the code is split, only one line of code is punished and therefore the conversion rate will be 80%.
There could be differences in the conversion results on the same migration between different operative systems.
This occurs because most of the time, Microsoft Windows uses CRLF line-breaking in their files. This format uses the characters \r\n
, but UNIX OS only \n
(LF).
Due to that format difference, when our code processor is reading the input files, it will count the CRLF format as two characters and just one in LF files. These counting differences generate different results in the conversion rates, specifically, in string expressions present in your code.
To avoid this problem, you can use Visual Studio Code or similar tools to change the line-breaking format.