Code Extraction
How do you get the code?
The Snowpark Migration Accelerator (SMA) takes the files in a directory as input. The directory can contain any number of files with any extension. Every file is counted in the file inventory, but only files with specific extensions are scanned for references to the Spark API.
So how do you populate this directory?
If you already have code files, this is straightforward: gather all of the code files relevant to your codebase and place them in a single directory.
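As a rough sketch of what gathering those files could look like, the following Python snippet copies files with a handful of common Spark-related extensions into one input directory. The paths and the extension list are illustrative assumptions, not the SMA's official scan list.

```python
# Minimal sketch: gather code files into a single input directory for the SMA.
# The extension list below is illustrative, not the SMA's official scan list.
import shutil
from pathlib import Path

SOURCE_ROOT = Path("~/my_spark_project").expanduser()    # assumed location of your codebase
INPUT_DIR = Path("~/sma_input").expanduser()             # directory you will point the SMA at
EXTENSIONS = {".py", ".scala", ".ipynb", ".sql", ".dbc"}  # illustrative extensions only

INPUT_DIR.mkdir(parents=True, exist_ok=True)

for path in SOURCE_ROOT.rglob("*"):
    if path.is_file() and path.suffix.lower() in EXTENSIONS:
        # Preserve the relative layout so files with the same name do not collide.
        target = INPUT_DIR / path.relative_to(SOURCE_ROOT)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
```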
If you have notebooks as part of an existing environment (such as Databricks), you may benefit from an extraction script.
Extraction Scripts
Snowflake maintains extraction scripts that are publicly available on the Snowflake Labs GitHub page. For Spark workloads, the following platforms are currently supported.
Databricks
If you have Jupyter (.ipynb) or Databricks (.dbc) notebooks that run in Databricks, you do not need to perform any extraction. Place them in the input directory, and the SMA will analyze them. For more details on exporting Databricks notebook files, refer to the Databricks documentation at this link: https://docs.databricks.com/en/notebooks/notebook-export-import.html#export-notebooks.
Alternatively, you can follow the instructions and use the scripts posted here: https://github.com/Snowflake-Labs/SC.DDLExportScripts/tree/main/Databricks.
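If you would rather script the export yourself, the sketch below uses the Databricks Workspace REST API export endpoint to download a workspace folder as a .dbc archive and place it in the SMA input directory. The workspace URL, access token, and paths are placeholders, not real values; adjust them to your environment.

```python
# Minimal sketch: export a Databricks workspace folder as a .dbc archive using
# the Workspace REST API, then place it in the SMA input directory.
# The workspace URL, token, and paths below are placeholders.
import base64
from pathlib import Path

import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                 # placeholder
NOTEBOOK_PATH = "/Users/someone@example.com/spark_jobs"           # workspace folder to export
OUTPUT_FILE = Path("~/sma_input/spark_jobs.dbc").expanduser()

response = requests.get(
    f"{WORKSPACE_URL}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"path": NOTEBOOK_PATH, "format": "DBC"},
    timeout=60,
)
response.raise_for_status()

# The API returns the exported archive as base64-encoded content.
OUTPUT_FILE.parent.mkdir(parents=True, exist_ok=True)
OUTPUT_FILE.write_bytes(base64.b64decode(response.json()["content"]))
```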
More to come on extraction!