Supported Filetypes
What will the SMA accept as an input?
The Snowpark Migration Accelerator (SMA) scans the files in the source directory specified on the Project Creation page, with a few exclusions depending on the file type. The SMA's output includes a basic file summary that reports the count of each file extension found.
However, only specific file extensions are searched for references to the Spark API, SQL statements, and any other factor that contributes to one of the Readiness Scores generated by the tool. These files can be code files or notebooks, and they may be located in any directory or subdirectory of the source folder.
Code Files
The following file types are scanned for references to the Spark API and other third-party APIs:
*.scala
*.py
*.python
SQL elements can also be identified in certain files. The following file types are searched for SQL statements written in Spark SQL or HiveQL:
*.sql
*.hql
Notebooks
Both the Spark Scala and PySpark parsers in the SMA search Jupyter Notebook files and exported Databricks files when they are included in the source code directory.
*.ipynb
*.dbc
These notebook files are searched for references to the Spark API, other third-party APIs, and SQL statements based on the cell type of each cell in the notebook. Note that notebooks can contain any combination of SQL, Python, and Scala cells. Each cell type is inventoried in the output of the SMA.
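Because a Jupyter notebook (.ipynb) is a JSON document whose cells each declare a `cell_type`, a per-notebook cell inventory like the one the SMA reports can be sketched in a few lines. This is an illustrative sketch only, not the SMA's actual implementation; the `inventory_cells` name and the sample notebook are invented for the example.

```python
import json
from collections import Counter

def inventory_cells(ipynb_path):
    """Count the cells in a notebook by cell type (e.g. code vs. markdown).

    Illustrative only -- the SMA performs a much deeper analysis, but the
    cell_type field is how a notebook's mix of languages is discovered.
    """
    with open(ipynb_path, encoding="utf-8") as f:
        nb = json.load(f)
    return Counter(cell["cell_type"] for cell in nb.get("cells", []))

# Demo: write a minimal two-cell notebook and inventory it.
sample = {
    "cells": [
        {"cell_type": "code", "source": ["df.show()"]},
        {"cell_type": "markdown", "source": ["# Notes"]},
    ]
}
with open("sample.ipynb", "w", encoding="utf-8") as f:
    json.dump(sample, f)

print(inventory_cells("sample.ipynb"))
```

A real notebook may also use cell magics (such as `%sql` in a Python notebook) to switch languages within a cell, which is why a tool scanning notebooks cannot rely on the file extension alone.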
Excluded Files and Folders
Some files and folders are excluded from scanning, mostly project-specific configuration files and folders.
Folders excluded from scanning:
Pip
Dist
venv
site-packages
Files excluded from scanning:
input.wsp
.DS_Store
build.gradle
build.sbt
pom.xml
storage.lck