Supported Filetypes

What will the SMA accept as an input?

The Snowpark Migration Accelerator (SMA) will scan the files in the specified source directory on the Project Creation page, with a few exclusions depending on the file type. The SMA's output includes a basic file summary that reports the number of each file extension found.

However, only specific file extensions are searched for references to the Spark API, SQL statements, and other factors that contribute to the Readiness Scores generated by the tool. These can be code files or notebooks, and the SMA will analyze them in any directory or subdirectory of the source.

Code Files

The following file types are scanned for references to the Spark API and other third-party APIs:

  • *.scala

  • *.py

  • *.python

SQL elements can also be identified in certain files. The following file types will be searched for SQL statements written in Spark SQL or HiveQL:

  • *.sql

  • *.hql

Notebooks

Both the Spark Scala and PySpark parsers in the SMA will search Jupyter Notebook files and exported Databricks files when included in the source code directory.

  • *.ipynb

  • *.dbc

These notebook files will be searched for references to the Spark API, any other third party API, and SQL Statements based on the cell type of each cell in the notebook. Note that notebooks can contain any combination of SQL, Python, or Scala cells. Each cell type will be inventoried in the output of the SMA.
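The cell-type inventory described above can be illustrated with a short sketch. This is not the SMA's implementation, just a hypothetical example of counting cells by type in a Jupyter Notebook's JSON, which is how a mixed SQL/Python/Scala notebook would be broken down:

```python
import json

def count_cell_types(notebook_json: str) -> dict:
    """Count the cells in a Jupyter notebook, grouped by cell type."""
    nb = json.loads(notebook_json)
    counts = {}
    for cell in nb.get("cells", []):
        counts[cell["cell_type"]] = counts.get(cell["cell_type"], 0) + 1
    return counts

# A tiny hand-made notebook with two code cells and one markdown cell.
sample = json.dumps({
    "cells": [
        {"cell_type": "code", "source": ["df = spark.read.table('t')"]},
        {"cell_type": "markdown", "source": ["# Notes"]},
        {"cell_type": "code", "source": ["df.show()"]},
    ]
})

print(count_cell_types(sample))  # {'code': 2, 'markdown': 1}
```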

Excluded Files and Folders

Some files and folders are excluded from scanning, mostly project-specific configuration files and folders.

Folders excluded from scanning:

  • Pip

  • Dist

  • venv

  • site-packages

Files excluded from scanning:

  • input.wsp

  • .DS_Store

  • build.gradle

  • build.sbt

  • pom.xml

  • storage.lck
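Taken together, the inclusion and exclusion rules above amount to a simple filter over file paths. The sketch below is a hypothetical illustration of those rules, not the SMA's actual selection logic:

```python
from pathlib import Path

# Extensions the SMA scans, per the lists above.
SCANNED_EXTENSIONS = {".scala", ".py", ".python", ".sql", ".hql", ".ipynb", ".dbc"}
# Excluded folder and file names, per the lists above.
EXCLUDED_FOLDERS = {"Pip", "Dist", "venv", "site-packages"}
EXCLUDED_FILES = {"input.wsp", ".DS_Store", "build.gradle",
                  "build.sbt", "pom.xml", "storage.lck"}

def is_scanned(path: Path) -> bool:
    """Return True if a file would be scanned under the rules above."""
    # Skip files inside any excluded folder.
    if any(part in EXCLUDED_FOLDERS for part in path.parts[:-1]):
        return False
    # Skip specifically excluded file names.
    if path.name in EXCLUDED_FILES:
        return False
    # Otherwise, scan only the supported extensions.
    return path.suffix.lower() in SCANNED_EXTENSIONS

print(is_scanned(Path("src/jobs/etl.py")))     # True
print(is_scanned(Path("venv/lib/helper.py")))  # False: excluded folder
print(is_scanned(Path("project/pom.xml")))     # False: excluded file
```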