SnowConvert for Spark - Qualification Tool

Ready to get a better understanding of the PySpark code you have? SnowConvert is here to help.
Welcome to the SnowConvert for Spark – Qualification Tool. Let this be your guide to a successful analysis with the qualification tool.
In this guide, we will work through the following items:

Qualification Process

This tool is designed to give you inventory information about the Spark references and keywords present in your codebase. To get the most out of this tool, you should use it as part of a qualification process.
  • Identify workloads that may have Spark code
  • Analyze these workloads by running them through this tool
  • Work with Snowflake to determine how ready this codebase is for migration.
For more information on a detailed qualification process, please reach out to [email protected].

Accessing the Tool

If you’re ready to start using the SnowConvert for Spark Qualification Tool, you will need to download it onto your local machine or into a container (SaaS version coming soon!). Follow the steps below:
  1. Choose the SnowConvert for Spark (Scala or Python) download link.
And that's it!
Currently, the tool is specific to the source language the user specifies. If a user selects Scala, SnowConvert for Spark Scala will download in qualification mode. If a user selects Python (PySpark), SnowConvert for PySpark will download in qualification mode. Note that ANY files can be run through the tool, not just Scala or Python files. The output reports may change as a result. Specifically, for the Spark Reference Inventory to generate, you will need a file with the following extensions:
  1. For SnowConvert for Spark Scala -> *.scala file extensions
  2. For SnowConvert for PySpark -> *.py or *.ipynb file extensions
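As a rough sketch of how you might pre-check a codebase against these extension rules before running the tool (the helper function and its names are hypothetical, not part of SnowConvert):

```python
from pathlib import Path

# Extensions that feed the Spark Reference Inventory, per source language.
# (Hypothetical helper; the tool performs this filtering internally.)
SCALA_EXTENSIONS = {".scala"}
PYSPARK_EXTENSIONS = {".py", ".ipynb"}

def find_analyzable_files(root: str, language: str) -> list[str]:
    """Return files under `root` that the chosen tool flavor can fully analyze."""
    extensions = SCALA_EXTENSIONS if language == "scala" else PYSPARK_EXTENSIONS
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
```

Running this over your candidate input folder gives a quick sense of how many files will contribute to the Spark Reference Inventory before you start an analysis.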
You will also have to choose an OS. The Qualification Tool works with both Windows and Mac.
  1. Click on the Download button, and your download will begin immediately.
  2. When the form is submitted, your license information will be generated and emailed to you at the email address you specified in the form. Your license email will look something like this:
    Note that the email sent to you will reference the source code language you specified in the form (Scala or Python). It will also show “Execution Mode: Assessment” in the license properties; this applies to any license issued for the qualification tool. Any version of SnowConvert run for analysis purposes rather than actual conversion runs in this "assessment" mode.


SnowConvert can be installed on either of the following operating systems:
  • Windows
  • Mac
Follow the steps below to install the tool for your OS.

Windows Installation

  1. Click on the downloaded .exe file.
  2. On the initial setup screen, you will have to agree to the license agreement.
    (Note that you can read the entire End User License Agreement for SnowConvert on this documentation site.)
  3. The installer sets up all the required files on your machine.
  4. Once the installation finishes, the tool is ready to use. You can click "Finish" to start using SnowConvert.

Mac Installation

  1. Click on the downloaded .dmg file.
  2. Double-click on the Mobilize.Net SnowConvert logo.

Initial Launch and Licensing

When you first use the tool, it will prompt you for a license. This license should have been emailed to you after you downloaded the tool. (For more information on licensing, review the licensing section of this documentation.)
To activate your license, follow these steps:
  1. Run the downloaded Mobilize.Net SnowConvert executable program file.
  2. When you launch SnowConvert for the first time, you will be prompted for the license key.
  3. Enter your license key, then click the Activate button.
Note that the assessment license for qualification can be used to run an unlimited quantity of code through the tool.

Configuration and Settings

The following screen appears when you start the Spark Qualification Tool. (Some versions may vary slightly.)
The following are the configuration and settings options:


Notifications

Each time you start the Spark Qualification Tool, it might show you one of several notifications in the top right corner of the screen. The most common types of notifications are listed below.

Update Available

We are constantly working to improve the Spark Qualification Tool. If there is an available update, the system notifies you to download the latest version.
Follow these steps to update to the latest version:
  • If an update is available, an update button appears in the notification message. Press that button to download the newest version of the Spark Qualification Tool.
  • The download will start automatically. New license information is not required.
  • Run the installer once the download is complete. To learn more about running the installer, go to the Installation section of this documentation.
  • Once the installation is complete, you can launch the updated version of the tool. When you launch this version of the Spark Qualification Tool, another notification will show in the top right of your screen indicating that an update took place. In this notification, you will have the option to view the release notes. If you select “View Release Notes”, SnowConvert will open your default browser and direct you to the Release Notes section of SnowConvert’s documentation site.

Help Menu

The help menu can be found in the top left corner of the Qualification Tool screen.
When you choose the help menu, you will get the following options:

User Guide

By clicking on User Guide, SnowConvert will take you to the documentation site for the Spark Qualification Tool in your default browser.

License Information

The License Information option provides your license status and allows you to change or update your current license. It also lets you know whom the license is registered to, the execution mode for your license (assessment is the only option for the qualification tool), and when the license expires.

Check for updates

The Spark Qualification Tool should check for updates every time it starts up. You can manually select the Check for Updates option to check for updates and validate the current version. If an update is available, you will be given the option to download it. If no updates are available, you will get a brief message indicating that "you're up to date!"

Terms and Conditions

This option takes you to the Terms and Conditions page of the Snowflake website. If you're looking for the terms and conditions of use for this product, this is where you'll find them.

About SnowConvert

The final option provides basic information about the Spark Qualification Tool version you are currently running.
Alright... that’s a summary of the settings. Let's move on and take a look at using the tool and the Qualification analysis.

Using the Tool

Two notes before you start running code through the tool:
  • Keep in mind that this tool is looking for Spark code. You can run any code through the tool, but to get the most out of it, place your Spark Scala or PySpark code in the directory you point the tool to. Consider also that the tool will be reading the code, so if there are multiple encodings present or any security on those files, the tool will not be able to process them.
  • SnowConvert for Spark collects information on your execution of the tool. The output reports (the file inventory, keyword inventory, and spark reference inventory) are what the tool collects and sends back to Snowflake. You can review these in the output folder. If you have any concerns about Snowflake's telemetry collection, please reach out to [email protected].
After all of that, how do you use the Spark Qualification Tool? Follow the steps below. This will be your guide to the inputs you need to provide to the Spark Qualification Tool and the processes the tool goes through internally.

Declaring the input and output folder

Choosing an input and output directory are the first steps to running the tool. A welcome screen is displayed when you first launch the SnowConvert for Spark Qualification Tool.
Follow the steps below:
  1. Click "Let's Begin" in the middle of the screen.
  2. On the following screen, select the input folder. This can be any folder with files that you want to run through the tool. What code files can you run through the tool? You can run any files, but for the Spark Reference Inventory to generate, you will need files with the following extensions:
    • For SnowConvert for Spark Scala -> *.scala file extensions
    • For SnowConvert for PySpark -> *.py or *.ipynb file extensions
    The Spark Qualification Tool will work better if you include as much of the source code as possible in the input folder. You can either click the “Browse” button or type the path manually.
  3. Select the output folder. The output folder is where the Spark Qualification Tool places the logs and output from the qualification process.
NOTE: For both the input and output folders, the program validates the directories you've selected. The validation checks (1) which files in the input directory can be analyzed, (2) whether the output directory already has files in it (if so, you will get a message asking if you’d still like to proceed with the analysis when you hit “Start Analysis”), and (3) whether both the input and output paths are valid.
Once you have declared both an input and output folder, you can select Start Analysis.

Start the Analysis

When you begin your analysis, the tool will work through two different states that will be shown on the screen: Rapid Scanner and the SparkSnowCodeProcessor (this name may vary based on source code language).
Those states are described below:
  • Rapid Scanner: The files are scanned to build a file inventory and a keyword count (both are described later in the Evaluating the Output section).
  • SparkSnowCodeProcessor: This analysis process produces metadata about the loaded files, including the size and quantity of the files reported on the screen. While the identification process is taking place, the Qualification Tool builds a complete Abstract Syntax Tree (AST) and symbol table to properly analyze each Spark reference present in the source codebase.
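To make the AST idea concrete, here is a toy illustration of counting call references in a PySpark snippet using Python's own ast module. This is not the tool's actual implementation; the sample source and the helper name are invented for this sketch:

```python
import ast
from collections import Counter

# A small invented PySpark snippet to walk; real input would be your codebase.
SOURCE = """
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.read.csv("data.csv")
df2 = df.filter(df.value > 0).groupBy("key").count()
"""

def count_call_names(source: str) -> Counter:
    """Count the names of all function/method calls found in the AST."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                counts[func.attr] += 1   # method calls like df.filter(...)
            elif isinstance(func, ast.Name):
                counts[func.id] += 1     # plain calls like foo(...)
    return counts
```

A real analyzer additionally needs a symbol table to know that `df` is a DataFrame before attributing `filter` to the Spark API, which is exactly why the tool builds one.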
During the analysis, the section depicting each stage on the screen will change color to indicate the process currently taking place. The status indicators are:
  • Loading - The loading icon shows that the state is executing.
  • Success - The check icon shows that the state was successfully executed.
  • Error - A red stage with an X icon indicates that something went wrong while executing that specific task.

Qualification Summary

Once the qualification has finished, click the Next button.
The following screen will display a summary of statistics about the analysis completed.
The summary will provide a high-level view of the analysis, including the following:
  • Total files: The number of files that the Spark Qualification Tool processed during this analysis.
  • Total bytes: The size (in bytes) of all the input files.
  • Total issues: The number of warnings and errors generated by the Spark Qualification Tool while analyzing this set of files. Learn more about issues and errors by visiting our page on Issues and Troubleshooting.
  • Analysis Time: The actual time it took to analyze these files.
  • Total units: The total number of lines of code (LOC) in the source code files.
After this basic summary, there are a few options for examining the output:
  • View Logs: Clicking on View Logs takes you to the folder containing the log files. The logs are text files that contain time-stamped entries that illustrate the process that Spark Qualification Tool has gone through. If there is a critical error, the logs are the best place to troubleshoot what went wrong, and exactly when it went wrong.
  • View Reports: The View Report button takes you directly to the report folder containing the analyzed results.
  • New execution: This button closes and restarts the Spark Qualification Tool. The tool needs to reset between runs, so this is required if you want to re-run an analysis, even on the same workload. You can also start over by closing the Spark Qualification Tool and restarting it from your desktop.

Errors and Troubleshooting

This section describes errors that may show up when running the Spark Qualification Tool application. It does not cover errors generated by the tool when parsing the source code.
The most common types of errors are listed below.

Input and output path errors

When declaring the input and output folders for the analysis, you may receive an error message below the path field if there's a problem with the chosen input/output folder:
The following list contains the possible errors that could be shown:
  • Please enter an input path If you type an invalid file path or don't browse to a folder for either the input or output directory, you get a short error in the window asking you to "Please enter an input/output path."
  • Input folder must contain the correct extensions If you choose a folder that doesn't have any files that the Spark Qualification Tool can analyze, you get the following error message: "There are no files with a known extension in that path." To resolve this error, choose a directory containing files with a known extension.
  • The path must have a valid format (can't contain / : * ? " < > |) Choosing a path containing an invalid character produces the following error message: "The path has incorrect syntax." Choose a folder with a valid path that doesn't contain invalid characters. Invalid characters include: / : * ? " < > |
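The character rule above can be sketched as a quick pre-check. This is a hypothetical helper applied to a single folder name rather than a full path, since separators and drive colons are legal inside a complete path; the tool performs its own validation internally:

```python
# Characters the error message above flags as invalid in a folder name.
INVALID_CHARS = set('/:*?"<>|')

def folder_name_is_valid(name: str) -> bool:
    """Return True if a single folder name contains none of the invalid characters."""
    return not (set(name) & INVALID_CHARS)
```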

Output folder isn’t empty

A warning will appear when you execute the Spark Qualification Tool and the output folder is not empty. It does not come up when you enter an input/output folder name, but rather when you click “Start Analysis.” The warning looks like this:
Clicking the No, Cancel button will not proceed with the analysis; you can select another folder for the output. If you choose Yes, Proceed, the tool will start the analysis with the risk of potentially overwriting certain files. If the Spark Qualification Tool writes to a filename that is already present in the output folder, the existing file will be overwritten.
NOTE: Folders located in network paths are not supported. The folder must be local to the machine running the Spark Qualification Tool.

System Errors

If the system crashes, a "Something went wrong" message displays. If the tool encounters an error during the qualification process, an error will be generated, and you will not be able to go to the next screen.
You can try the analysis again or reach out to [email protected] for additional support. You also have the option to send an error report email to us.

Evaluating the Output

The qualification tool generates several output reports as well as log files. All of this output is organized into the following folders in the tool's output directory:
Let’s review each of these folders:


Inventories Folder

The inventories folder contains several .pam files. These are text files containing information output by the tool; they also hold the information used to build the output reports. Currently, the tool outputs the following .pam files:
  • files.pam This file has information related to the files and filetypes included in the analyzed codebase.
  • line_counts.pam This file connects the file counts with lines of code present in each file.
  • PreConversionAssessmentInventory.pam This contains the spark references. The information here is used to build the Spark References Inventory in the Reports folder.
  • tool_execution.pam Basic information about the run of the tool, including the tool version and status.
  • word_counts.pam This is related to the identification and count of keywords associated with a given technology. This information is used to build the keyword count inventory.


Logs Folder

The log files contain the details of the tool’s execution. Any processes, validations, or tasks run by the tool will be printed to the log files. There are three log files likely to be produced when running the Qualification Tool:
  • Generic Infrastructure Controller Log This log covers the tool's execution infrastructure, such as license validation and session information.
  • Generic Scanner Log This simply logs a session ID and an execution ID.
  • Python [or Scala] Converter Log This log records information about the tool's run, such as the processes and tasks executed by the qualification tool.


Reports Folder

This is the key result of running SnowConvert in assessment mode. The report files will be generated into this folder, which is accessible directly from the final screen by clicking the “View Reports” button in the middle of the screen:
The following reports will be generated in the reports folder:
  • FilesInventory.csv The files inventory contains the following information about each file scanned:
    • Filename – the name of the file.
    • File Extension – the extension of the file.
    • Technology – the technology is the source language present in that file. For files under consideration by the qualification tool, Python and Scala would be examples of the technology value. If a source file is not a technology the tool recognizes, it will be classified as Other.
    • Status – if the file was successfully processed, the status value will be “OK”. Any other value indicates the file was not processed correctly.
    • isBinary – if the file is a binary file, this will read TRUE or FALSE. Binary files cannot be read by the tool. If the technology is classified as Other, the column will read as Unknown.
    • Bytes – the size of the file in bytes.
    • ContentType – For code files, this will read Code. If the technology is classified as Other, this category will also report Other.
    • ContentLines – The total lines of code present in the file.
    • CommentLines – The total lines of comment code present in the file.
    • BlankLines – The count of blank lines present in the file.
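Since FilesInventory.csv is a plain CSV, it is straightforward to aggregate it yourself. The sketch below totals content lines per technology; the sample rows and the exact header spellings are assumptions based on the field list above, so adjust them to match your actual report:

```python
import csv
import io

# Hypothetical sample rows mimicking the FilesInventory.csv schema described above.
SAMPLE = """\
Filename,File Extension,Technology,Status,isBinary,Bytes,ContentType,ContentLines,CommentLines,BlankLines
etl_job.py,.py,Python,OK,FALSE,2048,Code,120,30,15
readme.md,.md,Other,OK,Unknown,512,Other,0,0,0
loader.scala,.scala,Scala,OK,FALSE,4096,Code,200,40,20
"""

def lines_per_technology(csv_text: str) -> dict[str, int]:
    """Sum ContentLines for each Technology in a FilesInventory-style CSV."""
    totals: dict[str, int] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Technology"]] = totals.get(row["Technology"], 0) + int(row["ContentLines"])
    return totals
```

A roll-up like this is a quick way to see how much of the codebase is actually Spark Scala or PySpark versus files classified as Other.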
  • KeywordCounts.csv The keyword counts are counts of recognized keywords present in each file. The fields present in this csv:
    • File – the filename.
    • Technology – the technology is the source language present in that file. For files under consideration by the qualification tool, Python and Scala would be examples of the technology value. If a source file is not a technology the tool recognizes, it will be classified as Other.
    • Keyword – the actual keyword present in the file that is recognized by the tool. The tool does not tokenize every word in the file.
    • Count – The count of that specific keyword in that file.
  • SparkReferenceInventory.csv This is the report containing all the references to the Spark API present in the scanned codebase. The fields present in this csv:
    • Element – This is the specific element of the spark API that was counted.
    • ProjectID – The root folder name. However, in some versions of the tool, this will be blank.
    • FileID – The file where this reference occurred
    • Count – The count of the element in the source file
    • ClassName – The class where this element exists. This is more relevant to Scala code than Python
    • Kind – The category that this element belongs to. This could be classified as a module, a class, or a function
    • Line – The line number where this element occurs in the source file
    • PackageName – The name of the package where this element exists
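Likewise, you can roll up SparkReferenceInventory.csv to see which Spark API elements dominate your codebase. The sample rows and header spellings below are assumptions based on the field list above; check them against your generated report:

```python
import csv
import io
from collections import Counter

# Hypothetical sample mimicking the SparkReferenceInventory.csv fields above.
SAMPLE_REFS = """\
Element,ProjectID,FileID,Count,ClassName,Kind,Line,PackageName
pyspark.sql.SparkSession,demo,etl_job.py,1,,class,3,pyspark.sql
pyspark.sql.DataFrame.join,demo,etl_job.py,4,,function,27,pyspark.sql
pyspark.sql.DataFrame.join,demo,loader.py,2,,function,11,pyspark.sql
"""

def top_spark_references(csv_text: str) -> list[tuple[str, int]]:
    """Total usage count per Spark API element across all files, most used first."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["Element"]] += int(row["Count"])
    return counts.most_common()
```

The most frequent elements are a good starting point when discussing migration readiness with Snowflake, since heavily used APIs have the biggest impact on the readiness score described below.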

Next Steps

The information in the Spark Reference Inventory can be used to build an automation or readiness score. This score will allow you to understand how much of your workload can be automatically migrated to Snowflake using the Snowpark API. This readiness score is available upon request. Please reach out to [email protected] or contact Pablo Navarro at [email protected] for more information. You may need to submit your output Spark Reference Inventory report to build the readiness score, but we will contact you to follow up.
And that will wrap up how we use the SnowConvert for Spark as a Qualification Tool.