Supported Filetypes
What will the SMA accept as an input?
The Snowpark Migration Accelerator (SMA) scans the files in the source directory specified on the Project Creation page, with a few exceptions depending on the file type. The SMA's output includes a basic file summary that reports the count of each file extension found.
However, only files with specific extensions are searched for references to the Spark API, SQL statements, and anything else that contributes to one of the reports generated by the tool. These files can be code files or notebooks, and they can reside in any directory or subdirectory of the source folder.
The following file types are scanned for references to the Spark API and other third-party APIs:
*.scala
*.py
*.python
SQL elements can also be identified in certain files. The following file types are searched for SQL statements written in Spark SQL or HiveQL:
*.sql
*.hql
Both the Spark Scala and PySpark parsers in the SMA will search Jupyter Notebook files and exported Databricks files when they are included in the source code directory:
*.ipynb
*.dbc
These notebook files are searched for references to the Spark API, other third-party APIs, and SQL statements based on the cell type of each cell in the notebook. Note that a notebook can contain any combination of SQL, Python, and Scala cells; each cell is analyzed according to its type and reflected in the output of the SMA.
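To illustrate the per-cell analysis described above, here is a minimal sketch of how a tool could classify cells in a `.ipynb` file by type. This is an illustrative example only, not the SMA's actual implementation; the `%%sql` magic check is an assumption about how SQL cells are commonly marked.

```python
import json

def classify_cells(ipynb_path):
    """Classify each code cell in a Jupyter notebook.

    Illustrative sketch only (not the SMA's real logic): code cells that
    begin with a %%sql cell magic are treated as SQL, and all other code
    cells are assigned the notebook kernel's language. Markdown cells
    are skipped.
    """
    with open(ipynb_path, encoding="utf-8") as f:
        nb = json.load(f)
    # Default to "python" if the notebook metadata omits the kernel language.
    kernel = nb.get("metadata", {}).get("kernelspec", {}).get("language", "python")
    results = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = "".join(cell.get("source", []))
        if source.lstrip().startswith("%%sql"):
            results.append(("sql", source))
        else:
            results.append((kernel, source))
    return results
```

A real analyzer would also need to handle exported Databricks `.dbc` archives, which bundle notebooks in a different container format.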
Some files and folders are excluded from scanning, mostly project-specific configuration files and folders:
Pip
Dist
venv
site-packages
input.wsp
.DS_Store
build.gradle
build.sbt
pom.xml
storage.lck
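Putting the extension lists and exclusions together, the file selection can be sketched as a directory walk that skips the excluded names and keeps only the scanned extensions. This is a hedged sketch of the filtering behavior described on this page, not the SMA's actual code.

```python
import os

# Extensions scanned for Spark API references, SQL, and notebooks,
# per the lists above.
SCANNED_EXTENSIONS = {".scala", ".py", ".python", ".sql", ".hql", ".ipynb", ".dbc"}

# Folders and files excluded from scanning, per the list above.
EXCLUDED_DIRS = {"Pip", "Dist", "venv", "site-packages"}
EXCLUDED_FILES = {"input.wsp", ".DS_Store", "build.gradle", "build.sbt",
                  "pom.xml", "storage.lck"}

def collect_scannable_files(root):
    """Yield the paths a scan of `root` would include.

    Illustrative only: walks every subdirectory, prunes excluded folders,
    skips excluded filenames, and keeps files whose extension is in the
    scanned set.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            if name in EXCLUDED_FILES:
                continue
            if os.path.splitext(name)[1].lower() in SCANNED_EXTENSIONS:
                yield os.path.join(dirpath, name)
```

Note that pruning `dirnames` in place is what stops `os.walk` from descending into excluded folders such as `venv` or `site-packages`.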