Release Notes
Release Notes for the Snowpark Migration Accelerator (SMA)
Note that the release notes below are organized by release date. Version numbers for both the application and the conversion core will appear below.
November 14th, 2024
Application & CLI Version 2.3.1
Included SMA Core Versions
Snowpark Conversion Core 4.12.0
Desktop App
Fixed
Fix case-sensitive issues in --sql options.
Removed
Remove platform name from show-ac message.
Snowpark Conversion Core 4.12.0
Added
Added support for Snowpark Python 1.23.0 and 1.24.0.
Added a new EWI for the pyspark.sql.dataframe.DataFrame.writeTo function. All the usages of this function will now have the EWI SPRKPY1087.
Changed
Updated the documentation of the Scala EWIs from SPRKSCL1137 to SPRKSCL1156 to align with a standardized format, ensuring consistency and clarity across all the EWIs.
Updated the documentation of the Scala EWIs from SPRKSCL1117 to SPRKSCL1136 to align with a standardized format, ensuring consistency and clarity across all the EWIs.
Updated the message that is shown for the following EWIs:
Updated the documentation of the Scala EWIs from SPRKSCL1100 to SPRKSCL1105, from SPRKSCL1108 to SPRKSCL1116, and from SPRKSCL1157 to SPRKSCL1175 to align with a standardized format, ensuring consistency and clarity across all the EWIs.
Updated the mapping status of the following PySpark elements from NotSupported to Direct with EWI:
pyspark.sql.readwriter.DataFrameWriter.option => snowflake.snowpark.DataFrameWriter.option: All the usages of this function now have the EWI SPRKPY1088.
pyspark.sql.readwriter.DataFrameWriter.options => snowflake.snowpark.DataFrameWriter.options: All the usages of this function now have the EWI SPRKPY1089.
Updated the mapping status of the following PySpark elements from Workaround to Rename:
pyspark.sql.readwriter.DataFrameWriter.partitionBy => snowflake.snowpark.DataFrameWriter.partition_by
Updated EWI documentation: SPRKSCL1000, SPRKSCL1001, SPRKSCL1002, SPRKSCL1100, SPRKSCL1101, SPRKSCL1102, SPRKSCL1103, SPRKSCL1104, SPRKSCL1105.
Removed
Removed the pyspark.sql.dataframe.DataFrameStatFunctions.writeTo element from the conversion status, since this element does not exist.
Deprecated
Deprecated the following EWI codes:
October 30th, 2024
Application & CLI Version 2.3.0
Snowpark Conversion Core 4.11.0
Snowpark Conversion Core 4.11.0
Added
Added a new column called Url to the Issues.csv file, which redirects to the corresponding EWI documentation.
Added new EWIs for the following Spark elements:
[SPRKPY1082] pyspark.sql.readwriter.DataFrameReader.load
[SPRKPY1083] pyspark.sql.readwriter.DataFrameWriter.save
[SPRKPY1084] pyspark.sql.readwriter.DataFrameWriter.option
[SPRKPY1085] pyspark.ml.feature.VectorAssembler
[SPRKPY1086] pyspark.ml.linalg.VectorUDT
Added 38 new Pandas elements:
pandas.core.frame.DataFrame.select
pandas.core.frame.DataFrame.str
pandas.core.frame.DataFrame.str.replace
pandas.core.frame.DataFrame.str.upper
pandas.core.frame.DataFrame.to_list
pandas.core.frame.DataFrame.tolist
pandas.core.frame.DataFrame.unique
pandas.core.frame.DataFrame.values.tolist
pandas.core.frame.DataFrame.withColumn
pandas.core.groupby.generic._SeriesGroupByScalar
pandas.core.groupby.generic._SeriesGroupByScalar[S1].agg
pandas.core.groupby.generic._SeriesGroupByScalar[S1].aggregate
pandas.core.indexes.datetimes.DatetimeIndex.year
pandas.core.series.Series.columns
pandas.core.tools.datetimes.to_datetime.date
pandas.core.tools.datetimes.to_datetime.dt.strftime
pandas.core.tools.datetimes.to_datetime.strftime
pandas.io.parsers.readers.TextFileReader.apply
pandas.io.parsers.readers.TextFileReader.astype
pandas.io.parsers.readers.TextFileReader.columns
pandas.io.parsers.readers.TextFileReader.copy
pandas.io.parsers.readers.TextFileReader.drop
pandas.io.parsers.readers.TextFileReader.drop_duplicates
pandas.io.parsers.readers.TextFileReader.fillna
pandas.io.parsers.readers.TextFileReader.groupby
pandas.io.parsers.readers.TextFileReader.head
pandas.io.parsers.readers.TextFileReader.iloc
pandas.io.parsers.readers.TextFileReader.isin
pandas.io.parsers.readers.TextFileReader.iterrows
pandas.io.parsers.readers.TextFileReader.loc
pandas.io.parsers.readers.TextFileReader.merge
pandas.io.parsers.readers.TextFileReader.rename
pandas.io.parsers.readers.TextFileReader.shape
pandas.io.parsers.readers.TextFileReader.to_csv
pandas.io.parsers.readers.TextFileReader.to_excel
pandas.io.parsers.readers.TextFileReader.unique
pandas.io.parsers.readers.TextFileReader.values
pandas.tseries.offsets
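The new Url column described above links each reported issue to its EWI documentation page. As a minimal sketch of how a consumer might use it, the snippet below reads an Issues.csv-style file; the extra headers and the URLs themselves are illustrative placeholders, not the SMA's actual schema.

```python
import csv
import io

# Hypothetical Issues.csv excerpt: only the Url column is documented in the
# release note; the other headers and URLs are illustrative placeholders.
issues_csv = io.StringIO(
    "Code,Description,Url\n"
    "SPRKPY1082,DataFrameReader.load usage,https://docs.example.com/ewi/SPRKPY1082\n"
    "SPRKPY1085,VectorAssembler usage,https://docs.example.com/ewi/SPRKPY1085\n"
)

# Map each issue code to the documentation page it now links to.
ewi_links = {row["Code"]: row["Url"] for row in csv.DictReader(issues_csv)}
print(ewi_links["SPRKPY1082"])
```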
October 24th, 2024
Application Version 2.2.3
Included SMA Core Versions
Snowpark Conversion Core 4.10.0
Desktop App
Fixed
Fixed a bug that caused the SMA to show the label SnowConvert instead of Snowpark Migration Accelerator in the menu bar of the Windows version.
Fixed a bug that caused the SMA to crash when it did not have read and write permissions to the .config directory on macOS and the AppData directory on Windows.
Command Line Interface
Changed
Renamed the CLI executable from snowct to sma.
Removed the source language argument, so you no longer need to specify whether you are running a Python or Scala assessment / conversion.
Expanded the command line arguments supported by the CLI by adding the following new arguments:
--enableJupyter | -j: Flag to indicate whether the conversion of Databricks notebooks to Jupyter is enabled.
--sql | -f: Database engine syntax to be used when a SQL command is detected.
--customerEmail | -e: Configure the customer email.
--customerCompany | -c: Configure the customer company.
--projectName | -p: Configure the customer project.
Updated some texts to reflect the correct name of the application, ensuring consistency and clarity in all the messages.
Updated the terms of use of the application.
Updated and expanded the documentation of the CLI to reflect the latest features, enhancements, and changes.
Updated the text that is shown before proceeding with the execution of the SMA to improve clarity.
Updated the CLI to accept “Yes” as a valid argument when prompting for user confirmation.
Allowed the CLI to continue the execution without waiting for user interaction by specifying the -y or --yes argument.
Updated the help information of the --sql argument to show the values that this argument expects.
Snowpark Conversion Core Version 4.10.0
Added
Added a new EWI for the pyspark.sql.readwriter.DataFrameWriter.partitionBy function. All the usages of this function will now have the EWI SPRKPY1081.
Added a new column called Technology to the ImportUsagesInventory.csv file.
Changed
Updated the Third-Party Libraries readiness score to also take into account the Unknown libraries.
Updated the AssessmentFiles.zip file to include .json files instead of .pam files.
Improved the CSV-to-JSON conversion mechanism to make the processing of inventories more performant.
Improved the documentation of the following EWIs:
Updated the mapping status of the following Spark Scala elements from Direct to Rename:
org.apache.spark.sql.functions.shiftLeft => com.snowflake.snowpark.functions.shiftleft
org.apache.spark.sql.functions.shiftRight => com.snowflake.snowpark.functions.shiftright
Updated the mapping status of the following Spark Scala elements from Not Supported to Direct:
org.apache.spark.sql.functions.shiftleft => com.snowflake.snowpark.functions.shiftleft
org.apache.spark.sql.functions.shiftright => com.snowflake.snowpark.functions.shiftright
Fixed
Fixed a bug that caused the SMA to incorrectly populate the Origin column of the ImportUsagesInventory.csv file.
Fixed a bug that caused the SMA to not classify imports of the io, json, logging, and unittest libraries as Python built-in imports in the ImportUsagesInventory.csv file and in the DetailedReport.docx file.
October 11th, 2024
Application Version 2.2.2
Feature Updates include:
Snowpark Conversion Core 4.8.0
Snowpark Conversion Core Version 4.8.0
Added
Added EwiCatalog.csv and .md files to reorganize the documentation.
Added the mapping status of pyspark.sql.functions.ln as Direct.
Added a transformation for pyspark.context.SparkContext.getOrCreate. Please check the EWI SPRKPY1080 for further details.
Added an improvement to the SymbolTable to infer the types of function parameters.
The SymbolTable now supports static methods and does not assume the first parameter will be self for them.
Changed
Updated the mapping status of pyspark.sql.functions.array_remove from NotSupported to Direct.
Fixed
Fixed the Code File Sizing table in the Detailed Report to exclude .sql and .hql files, and added the Extra Large row to the table.
Fixed the missing update_query_tag when SparkSession is defined in multiple lines in Python.
Fixed the missing update_query_tag when SparkSession is defined in multiple lines in Scala.
Fixed the missing EWI SPRKHVSQL1001 on some SQL statements with parsing errors.
Fixed the handling of new-line values inside string literals so they are kept.
Fixed the Total Lines of Code shown in the File Type Summary table.
Fixed the Parsing Score being shown as 0 when files were recognized successfully.
Fixed the LOC count in the cell inventory for Databricks Magic SQL cells.
September 26th, 2024
Application Version 2.2.0
Feature Updates include:
Snowpark Conversion Core 4.6.0
Snowpark Conversion Core Version 4.6.0
Added
Added a transformation for pyspark.sql.readwriter.DataFrameReader.parquet.
Added a transformation for pyspark.sql.readwriter.DataFrameReader.option when it is a Parquet method.
Changed
Updated the mapping status of:
pyspark.sql.types.StructType.fields from NotSupported to Direct.
pyspark.sql.types.StructType.names from NotSupported to Direct.
pyspark.context.SparkContext.setLogLevel from Workaround to Transformation. More detail can be found in EWIs SPRKPY1078 and SPRKPY1079.
org.apache.spark.sql.functions.round from Workaround to Direct.
org.apache.spark.sql.functions.udf from NotDefined to Transformation. More detail can be found in EWIs SPRKSCL1174 and SPRKSCL1175.
Updated the mapping status of the following Spark elements from DirectHelper to Direct:
org.apache.spark.sql.functions.hex
org.apache.spark.sql.functions.unhex
org.apache.spark.sql.functions.shiftleft
org.apache.spark.sql.functions.shiftright
org.apache.spark.sql.functions.reverse
org.apache.spark.sql.functions.isnull
org.apache.spark.sql.functions.unix_timestamp
org.apache.spark.sql.functions.randn
org.apache.spark.sql.functions.signum
org.apache.spark.sql.functions.sign
org.apache.spark.sql.functions.collect_list
org.apache.spark.sql.functions.log10
org.apache.spark.sql.functions.log1p
org.apache.spark.sql.functions.base64
org.apache.spark.sql.functions.unbase64
org.apache.spark.sql.functions.regexp_extract
org.apache.spark.sql.functions.expr
org.apache.spark.sql.functions.date_format
org.apache.spark.sql.functions.desc
org.apache.spark.sql.functions.asc
org.apache.spark.sql.functions.size
org.apache.spark.sql.functions.locate
org.apache.spark.sql.functions.ntile
Fixed
Fixed the value shown in the Percentage of Total Pandas API.
Fixed the Total percentage in the ImportCalls table of the DetailedReport.
Deprecated
Deprecated the following EWI code:
September 12th, 2024
Application Version 2.1.7
Feature Updates include:
Snowpark Conversion Core 4.5.7
Snowpark Conversion Core 4.5.2
Snowpark Conversion Core Version 4.5.7
Hotfixed
Fixed the Total row added to the Spark Usages Summaries when there are no usages.
Bumped the Python Assembly to Version 1.3.111.
Added parsing of trailing commas in multiline arguments.
Snowpark Conversion Core Version 4.5.2
Added
Added a transformation for pyspark.sql.readwriter.DataFrameReader.option:
When the chain is from a CSV method call.
When the chain is from a JSON method call.
Added a transformation for pyspark.sql.readwriter.DataFrameReader.json.
Changed
Executed the SMA on SQL strings passed to Python/Scala functions:
Created an AST in Scala/Python to emit a temporary SQL unit.
Created the SqlEmbeddedUsages.csv inventory.
Deprecated the SqlStatementsInventory.csv and SqlExtractionInventory.csv inventories.
Integrated an EWI when the SQL literal could not be processed.
Created a new task to process SQL-embedded code.
Collected the info for the SqlEmbeddedUsages.csv inventory in Python.
Replaced the transformed SQL code with a literal in Python.
Updated the test cases after the implementation.
Created tables and views for telemetry in the SqlEmbeddedUsages inventory.
Collected the info for the SqlEmbeddedUsages.csv report in Scala.
Replaced the transformed SQL code with a literal in Scala.
Checked the line number order for the embedded SQL reporting.
Filled the SqlFunctionsInfo.csv with the SQL functions documented for SparkSQL and HiveSQL.
Updated the mapping status for:
org.apache.spark.sql.SparkSession.sparkContext from NotSupported to Transformation.
org.apache.spark.sql.Builder.config from NotSupported to Transformation. With this new mapping status, the SMA will remove all the usages of this function from the source code.
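The embedded-SQL processing described above targets SQL string literals passed to functions such as spark.sql. As a minimal sketch of the kind of usage the SqlEmbeddedUsages.csv inventory records, the regex heuristic below is an illustration only, not the SMA's real parser-based detection.

```python
import re

# Hypothetical sketch: find SQL string literals passed to spark.sql(...) in
# Python source text. The regex is an assumed, simplified heuristic.
SQL_CALL = re.compile(r"""spark\.sql\(\s*["']([^"']+)["']\s*\)""")

source = (
    'df = spark.sql("SELECT id, name FROM users WHERE active = 1")\n'
    'n = spark.sql("SELECT COUNT(*) FROM orders")\n'
)

# Each match is an embedded SQL statement that would be inventoried.
embedded_sql = SQL_CALL.findall(source)
for stmt in embedded_sql:
    print(stmt)
```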
September 5th, 2024
Application Version 2.1.6
Hotfix change for Snowpark Engines Core version 4.5.1
Spark Conversion Core Version 4.5.1
Hotfix
Added a mechanism to convert the temporary Databricks notebooks generated by the SMA into exported Databricks notebooks.
August 29th, 2024
Application Version 2.1.5
Feature Updates include:
Updated Spark Conversion Core: 4.3.2
Spark Conversion Core Version 4.3.2
Added
Added a mechanism (via decoration) to get the line and the column of the elements identified in notebook cells.
Added an EWI for pyspark.sql.functions.from_json.
Added a transformation for pyspark.sql.readwriter.DataFrameReader.csv.
Enabled the query tag mechanism for Scala files.
Added the Code Analysis Score and additional links to the Detailed Report.
Added a column called OriginFilePath to InputFilesInventory.csv
Changed
Updated the mapping status of pyspark.sql.functions.from_json from Not Supported to Transformation.
Updated the mapping status of the following Spark elements from Workaround to Direct:
org.apache.spark.sql.functions.countDistinct
org.apache.spark.sql.functions.max
org.apache.spark.sql.functions.min
org.apache.spark.sql.functions.mean
Deprecated
Deprecated the following EWI codes:
SPRKSCL1135
SPRKSCL1136
SPRKSCL1153
SPRKSCL1155
Fixed
Fixed a bug that caused an incorrect calculation of the Spark API score.
Fixed an error that prevented empty or commented SQL files from being copied to the output folder.
Fixed a bug in the DetailedReport where the notebook stats for LOC and cell count were not accurate.
August 14th, 2024
Application Version 2.1.2
Feature Updates include:
Updated Spark Conversion Core: 4.2.0
Spark Conversion Core Version 4.2.0
Added
Added a technology column to the SparkUsagesInventory.
Added an EWI for SQL elements that are not defined.
Added the SqlFunctions inventory.
Collected the info for the SqlFunctions inventory.
Changed
The engine now processes and prints partially parsed Python files instead of leaving the original file without modifications.
Python notebook cells that have parsing errors will also be processed and printed.
Fixed
Fixed pandas.core.indexes.datetimes.DatetimeIndex.strftime being reported wrongly.
Fixed a mismatch between the SQL readiness score and the SQL Usages by Support Status.
Fixed a bug that caused the SMA to report pandas.core.series.Series.empty with an incorrect mapping status.
Fixed a mismatch where the Spark API Usages Ready for Conversion in DetailedReport.docx differed from the UsagesReadyForConversion row in Assessment.json.
August 8th, 2024
Application Version 2.1.1
Feature Updates include:
Updated Spark Conversion Core: 4.1.0
Spark Conversion Core Version 4.1.0
Added
Added the following information to the AssessmentReport.json file:
The third-party libraries readiness score.
The number of third-party library calls that were identified.
The number of third-party library calls that are supported in Snowpark.
The color code associated with the third-party readiness score, the Spark API readiness score, and the SQL readiness score.
Transformed SqlSimpleDataType in Spark create tables.
Added the mapping of pyspark.sql.functions.get as direct.
Added the mapping of pyspark.sql.functions.to_varchar as direct.
As part of the changes after unification, the tool now generates an execution info file in the Engine.
Added a replacer for pyspark.sql.SparkSession.builder.appName.
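The AssessmentReport.json fields listed above can be sketched as follows; the key names, formula, and color thresholds are all assumptions for illustration, not the SMA's published schema or buckets.

```python
# Hypothetical sketch of the third-party section of AssessmentReport.json.
# Key names, formula, and color thresholds are assumed, for illustration only.
def third_party_section(identified: int, supported: int) -> dict:
    score = 100.0 * supported / identified if identified else 0.0
    # Assumed traffic-light buckets for the color code.
    color = "GREEN" if score >= 80 else "YELLOW" if score >= 60 else "RED"
    return {
        "thirdPartyCallsIdentified": identified,
        "thirdPartyCallsSupported": supported,
        "thirdPartyReadinessScore": round(score, 2),
        "thirdPartyReadinessColor": color,
    }

section = third_party_section(identified=40, supported=30)
print(section)
```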
Changed
Updated the mapping status for the following Spark elements from Not Supported to Direct mapping:
pyspark.sql.functions.sign
pyspark.sql.functions.signum
Changed the Notebook Cells Inventory report to indicate the kind of content for every cell in the Element column.
Added a SCALA_READINESS_SCORE column that reports the readiness score as related only to references to the Spark API in Scala files.
Partial support to transform table properties in ALTER TABLE and ALTER VIEW.
Updated the conversion status of the SqlSimpleDataType node from Pending to Transformation in Spark create tables.
Updated the version of the Snowpark Scala API supported by the SMA from 1.7.0 to 1.12.1. Updated the mapping status of:
org.apache.spark.sql.SparkSession.getOrCreate from Rename to Direct.
org.apache.spark.sql.functions.sum from Workaround to Direct.
Updated the version of the Snowpark Python API supported by the SMA from 1.15.0 to 1.20.0. Updated the mapping status of:
pyspark.sql.functions.arrays_zip from Not Supported to Direct.
Updated the mapping status for the following Pandas elements:
Direct mappings:
pandas.core.frame.DataFrame.any
pandas.core.frame.DataFrame.applymap
Updated the mapping status for the following Pandas elements:
From Not Supported to Direct mapping:
pandas.core.frame.DataFrame.groupby
pandas.core.frame.DataFrame.index
pandas.core.frame.DataFrame.T
pandas.core.frame.DataFrame.to_dict
From Not Supported to Rename mapping:
pandas.core.frame.DataFrame.map
Updated the mapping status for the following Pandas elements:
Direct mappings:
pandas.core.frame.DataFrame.where
pandas.core.groupby.generic.SeriesGroupBy.agg
pandas.core.groupby.generic.SeriesGroupBy.aggregate
pandas.core.groupby.generic.DataFrameGroupBy.agg
pandas.core.groupby.generic.DataFrameGroupBy.aggregate
pandas.core.groupby.generic.DataFrameGroupBy.apply
Not Supported mappings:
pandas.core.frame.DataFrame.to_parquet
pandas.core.generic.NDFrame.to_csv
pandas.core.generic.NDFrame.to_excel
pandas.core.generic.NDFrame.to_sql
Updated the mapping status for the following Pandas elements:
Direct mappings:
pandas.core.series.Series.empty
pandas.core.series.Series.apply
pandas.core.reshape.tile.qcut
Direct mappings with EWI:
pandas.core.series.Series.fillna
pandas.core.series.Series.astype
pandas.core.reshape.melt.melt
pandas.core.reshape.tile.cut
pandas.core.reshape.pivot.pivot_table
Updated the mapping status for the following Pandas elements:
Direct mappings:
pandas.core.series.Series.dt
pandas.core.series.Series.groupby
pandas.core.series.Series.loc
pandas.core.series.Series.shape
pandas.core.tools.datetimes.to_datetime
pandas.io.excel._base.ExcelFile
Not Supported mappings:
pandas.core.series.Series.dt.strftime
Updated the mapping status for the following Pandas elements:
From Not Supported to Direct mapping:
pandas.io.parquet.read_parquet
pandas.io.parsers.readers.read_csv
Updated the mapping status for the following Pandas elements:
From Not Supported to Direct mapping:
pandas.io.pickle.read_pickle
pandas.io.sql.read_sql
pandas.io.sql.read_sql_query
Updated the description of Understanding the SQL Readiness Score.
Updated PyProgramCollector to collect the packages and populate the current packages inventory with data from Python source code.
Updated the mapping status of pyspark.sql.SparkSession.builder.appName from Rename to Transformation.
Removed the following Scala integration tests:
AssesmentReportTest_AssessmentMode.ValidateReports_AssessmentMode
AssessmentReportTest_PythonAndScala_Files.ValidateReports_PythonAndScala
AssessmentReportTestWithoutSparkUsages.ValidateReports_WithoutSparkUsages
Updated the mapping status of pandas.core.generic.NDFrame.shape from Not Supported to Direct.
Updated the mapping status of pandas.core.series from Not Supported to Direct.
Deprecated
Deprecated the EWI code SPRKSCL1160 since org.apache.spark.sql.functions.sum is now a direct mapping.
Fixed
Fixed a bug caused by not supporting Custom Magics without arguments in Jupyter Notebook cells.
Fixed incorrect generation of EWIs in the issues.csv report when parsing errors occur.
Fixed a bug that caused the SMA to not process Databricks exported notebooks as Databricks notebooks.
Fixed a stack overflow error while processing clashing type names of declarations created inside package objects.
Fixed the processing of complex lambda type names involving generics, e.g., def func[X,Y](f: (Map[Option[X], Y] => Map[Y, X]))...
Fixed a bug that caused the SMA to add a PySpark EWI code instead of a Pandas EWI code to the Pandas elements that are not yet recognized.
Fixed a typo in the detailed report template: renaming a column from "Percentage of all Python Files" to "Percentage of all files".
Fixed a bug where pandas.core.series.Series.shape was wrongly reported.
July 19th, 2024
Application Version 2.1.0
Feature Updates include:
Updated Spark (Scala and Python) Conversion Core: 3.2.0
Spark Conversion Core Version 3.2.0
Changed
New Readiness Score for SQL in the results screen
Settings were added to the desktop application to enable or disable Pandas to Snowpark Pandas API conversion
July 11th, 2024
Application Version 2.0.2
Feature Updates include:
Updated Spark (Scala and Python) Conversion Core: 3.0.0
Spark Conversion Core Version 3.0.0
Breaking Changes
This new version includes major changes:
New download links; previous versions won't be auto-updated.
A single access code is required for Python, Scala, and SparkSQL. Previous access codes for Python will continue working; however, Scala ones won't work anymore, so you need to request a new access code.
No need to select a language to analyze.
After executing the tool, you won't receive the Snowpark Qualification Report email, as the report information is available locally to the user.
Removed
Unified the Python/Scala conversion tools.
Removed Select Source from the Inquire form.
Removed Select Source from the New Project / See Sample Project screens.
Removed table generation from SMA.Desktop.
Changed
Unified the Python/Scala conversion tools.
Updated to remove the Python and Scala Conversion Core versions and have just an Engine Conversion Core version.
Updated the results screen.
The Access Code toast now has information related to the source language.
The Summary Report screen now has references to the source language.
Fixed the input folder path validation displaying the wrong message.
Deprecated Scala licenses so that the tool only uses Python access codes.
June 27, 2024
Application Version 1.3.1
Feature Updates include:
Updated Spark (Scala and Python) Conversion Core: 2.48.0
Spark Conversion Core Version 2.48.0
Added
Improved the parsing recovery mechanism for Scala files and Scala notebook cells to orphan fewer lines of code
Added support for the HADOOP shell command related to HiveQL
Added support for the HDFS shell command related to HiveQL
Added support for TBLPROPERTIES in ALTER VIEW statements
Updated the conversion status for SQL nodes in HiveQL that don't need conversion
Updated the conversion status for SQL nodes in SparkSQL that don't need conversion
The SQL nodes without a migration status were updated to PENDING
Improved the Jupyter parser to support the filename and the package name as parameters
Fixed
Fixed a bug that caused the SMA to not show the readiness score even though there were uses of the Spark API
Fixed a bug that caused the EWI SPRKSCL1000 to show a wrong description in the issue list table of the detailed report
Fixed the parsing of Comment clauses in SQL statements with new lines
Fixed the parsing of statements after a Lateral View clause in HiveQL
June 13, 2024
Application Version 1.3.0
Feature Updates include:
Updated Spark (Scala and Python) Conversion Core: 2.47.0
Spark Conversion Core Version 2.47.0
Added
Added a transformation for Hive table comments.
Added a transformation for adding or replacing the comment on Create View, Create Table, and Create Function.
Added a tag to comments for CREATE FUNCTION nodes.
Removed the generation of the conversion_rates.csv, files.csv, and parse_errors.csv inventories.
Fixed
Fixed DotNames that start with numbers (such as in this example: select * from id.12id.12_id3).
Parsed and refactored the Comment clause in Create View.
Fixed missing columns in empty inventories.
May 30, 2024
Application Version 1.2.5
Feature Updates include:
Updated Spark (Scala and Python) Conversion Core: 2.46.0
Spark Conversion Core Version 2.46.0
Added
Added a parsing score indicator that shows the percentage of all the files that were successfully parsed.
Added SPRKPY1074 EWI for mixed indentation errors.
Updates to the Detailed Report
Updated the look and feel of the report for both Python and Scala.
Added a Total row for Code File Sizing table in the detailed report.
Added files with Pandas usages table and Pandas API usages summaries table.
Added the new File Type Summary table
Added a new table called Files with Spark Usages.
Added a new table called Files with Spark Usages by Support Status.
Added SQL usages by file type table.
Added SQL usages by status table.
Transposed the Notebook stats by language table
Updated the detailed docx report to classify the readiness scores with N/A values as a green result
Reindexed the order of tables in the detailed report
Updated the conversion status for SQL nodes in HiveSql and SparkSql that don't need conversion
Updates to SQL parsing support
Identify and register mixed indentation errors.
Parse IS as a binary operator
Support RLike as a binary operator
Support DotNames that start with numbers
Parse the Lateral View clause
Parse Parquet as a name in the Using table option
Parse IF as a function name
Parse query parameters as expressions in SparkSQL.
Parse IMAGE as an alias
Parse the modulo (%) operator
Parse ALL as an alias
Parse SQL notebook cells with %% in magic commands
Added a core library mapping table to support the third-party library analysis
Added ConversionStatusLibraries.csv
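The mixed-indentation errors flagged by the new SPRKPY1074 EWI can be sketched with a small detector; the heuristic below is an assumption for illustration, not the SMA's actual check.

```python
# Hypothetical sketch of detecting lines whose indentation mixes tabs and
# spaces, the kind of error the new SPRKPY1074 EWI reports (assumed heuristic).
def mixed_indentation_lines(source: str) -> list:
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace of the line.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            flagged.append(lineno)
    return flagged

code = "def f():\n    x = 1\n\t y = 2\n"
flagged = mixed_indentation_lines(code)  # line 3 mixes a tab with a space
```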
Fixed
Commented out the remaining semicolon in top-level statements in HiveQL.
Fixed the parsing of Lateral View with multiple AsClauses.
Fixed the Lateral View parsing order.
May 16, 2024
Application Version 1.2.4
Feature Updates include:
Updated Spark (Scala and Python) Conversion Core: 2.45.1
Spark Conversion Core Version 2.45.1
Added
Argument/parameter information for Python is now listed in the usages inventories
Added mappings:
General PySpark
pyspark.sql.functions.map_from_arrays
pyspark.sql.dataframe.DataFrame.toPandas
ML related Spark mappings for:
pyspark.ml
pyspark.ml.classification
pyspark.ml.clustering
pyspark.ml.feature
pyspark.ml.regression
pyspark.ml.feature StringIndexer
pyspark.ml.clustering KMeans
pyspark.ml.feature OneHotEncoder
pyspark.ml.feature MinMaxScaler
pyspark.ml.regression LinearRegression
pyspark.ml.feature StandardScaler
pyspark.ml.classification RandomForestClassifier
pyspark.ml.classification LogisticRegression
pyspark.ml.feature PCA
pyspark.ml.classification GBTClassifier
pyspark.ml.classification DecisionTreeClassifier
pyspark.ml.classification LinearSVC
pyspark.ml.feature RobustScaler
pyspark.ml.feature Binarizer
pyspark.ml.feature MaxAbsScaler
pyspark.ml.feature Normalizer
Pandas API mappings to the new Snowpark implementation of Pandas have begun. These will not be converted, but will now be reported in the Pandas Usages Inventory. 82 mappings for the Pandas API were added. All are direct mappings with the exception of the first one:
pandas.core.series.Series.transpose [rename]
pandas
pandas.core.frame.DataFrame
pandas.core.frame.DataFrame.abs
pandas.core.frame.DataFrame.add_suffix
pandas.core.frame.DataFrame.axes
pandas.core.frame.DataFrame.columns
pandas.core.frame.DataFrame.copy
pandas.core.frame.DataFrame.cummax
pandas.core.frame.DataFrame.cummin
pandas.core.frame.DataFrame.describe
pandas.core.frame.DataFrame.diff
pandas.core.frame.DataFrame.drop
pandas.core.frame.DataFrame.drop_duplicates
pandas.core.frame.DataFrame.dtypes
pandas.core.frame.DataFrame.duplicated
pandas.core.frame.DataFrame.empty
pandas.core.frame.DataFrame.first
pandas.core.frame.DataFrame.first_valid_index
pandas.core.frame.DataFrame.head
pandas.core.frame.DataFrame.iloc
pandas.core.frame.DataFrame.isin
pandas.core.frame.DataFrame.isna
pandas.core.frame.DataFrame.isnull
pandas.core.frame.DataFrame.iterrows
pandas.core.frame.DataFrame.itertuples
pandas.core.frame.DataFrame.keys
pandas.core.frame.DataFrame.last
pandas.core.frame.DataFrame.last_valid_index
pandas.core.frame.DataFrame.max
pandas.core.frame.DataFrame.mean
pandas.core.frame.DataFrame.median
pandas.core.frame.DataFrame.min
pandas.core.frame.DataFrame.ndim
pandas.core.frame.DataFrame.notna
pandas.core.frame.DataFrame.notnull
pandas.core.frame.DataFrame.rename_axis
pandas.core.frame.DataFrame.reset_index
pandas.core.frame.DataFrame.select_dtypes
pandas.core.frame.DataFrame.set_axis
pandas.core.frame.DataFrame.set_index
pandas.core.frame.DataFrame.shape
pandas.core.frame.DataFrame.size
pandas.core.frame.DataFrame.squeeze
pandas.core.frame.DataFrame.sum
pandas.core.frame.DataFrame.tail
pandas.core.frame.DataFrame.take
pandas.core.frame.DataFrame.update
pandas.core.frame.DataFrame.value_counts
pandas.core.frame.DataFrame.values
pandas.core.groupby.generic.DataFrameGroupBy.count
pandas.core.groupby.generic.DataFrameGroupBy.max
pandas.core.groupby.generic.DataFrameGroupBy.sum
pandas.core.series.Series.abs
pandas.core.series.Series.add_prefix
pandas.core.series.Series.add_suffix
pandas.core.series.Series.array
pandas.core.series.Series.axes
pandas.core.series.Series.cummax
pandas.core.series.Series.cummin
pandas.core.series.Series.describe
pandas.core.series.Series.diff
pandas.core.series.Series.dtype
pandas.core.series.Series.dtypes
pandas.core.series.Series.first_valid_index
pandas.core.series.Series.hasnans
pandas.core.series.Series.idxmax
pandas.core.series.Series.idxmin
pandas.core.series.Series.keys
pandas.core.series.Series.last
pandas.core.series.Series.last_valid_index
pandas.core.series.Series.median
pandas.core.series.Series.notna
pandas.core.series.Series.rename_axis
pandas.core.series.Series.set_axis
pandas.core.series.Series.squeeze
pandas.core.series.Series.T
pandas.core.series.Series.tail
pandas.core.series.Series.take
pandas.core.series.Series.to_list
pandas.core.series.Series.to_numpy
pandas.core.series.Series.update
Updated Mappings:
Added transformation for csv, json, and parquet functions including:
pyspark.sql.readwriter.DataFrameWriter.json
pyspark.sql.readwriter.DataFrameWriter.csv
pyspark.sql.readwriter.DataFrameWriter.parquet
Updated the mapping for pyspark.rdd.RDD.getNumPartitions to transformation
Updated the mapping for pyspark.storagelevel.StorageLevel to transformation
Added end-to-end test infrastructure and input/output validations
Changed the import statement transformation: unsupported imports are removed and EWI messages are not generated in the code
Updated the conversion status for SQL nodes in Hive that don't need conversion (multiple expressions - part 02)
Updated the SqlElementsInfo.csv with newly identified elements
Updated the Replacer and SqlElementsInfo items to include Transformation
Enabled decorations in transformation to comment out unsupported nodes
Fixed the groupBy function in the source code of org.apache.spark.sql.DataFrame to place it correctly in the symbol table
Added toPandas as pyspark in the ThirdPartyLibs
Fixed
Fixed some scenarios where EWI comments were not being added to the output code
Fixed the processing of empty source cells present in Jupyter Notebooks
Fixed the parsing error message not being added to the output code
Fixed an issue with pyspark.sql.functions.udf requiring the return_type parameter
May 2, 2024
Application Version 1.2.2
Feature Updates include:
Updated Spark (Scala and Python) Conversion Core: 2.44.0
Spark Conversion Core Version 2.44.0
Added
Argument information available in in Python usages inventory
Updated conversion Status for SQL nodes in Hive that don't need conversion
Operators - numeric expressions
Function expressions
Multiple expressions
Name expressions and literals
Parsing improvments in SparkSQL:
DESCRIBE TABLE Clause
REFRESH Clause
Add the groupBy parameters in the analysis of org.apache.spark.sql.DataFrame
Improved the logging mechanism to indicate if the logs are only written when errors happened or if all messages were logged (introduced the DebugLogger to log all messages)
Updated the default value of Scala parser timeout from 150ms to 300ms
Updated SqlElementsInfo.csv entries to Direct status
Changed order in the SqlElementsInfo.csv
Updated the parsing error message shown when a SQL statement is not parsed
Statements without recovery are now added to Issues.csv
Changed SqlElements mapping status to Direct and Partial
Updated the fully qualified names for the following Spark elements in the conversion status file:
pyspark.sql.streaming.readwriter.DataStreamReader
pyspark.sql.streaming.readwriter.DataStreamWriter
pyspark.sql.streaming.query.StreamingQuery
Added the following Spark elements to the conversion status file as **NotSupported**:
pyspark.sql.streaming.readwriter.DataStreamReader.format
pyspark.sql.streaming.readwriter.DataStreamReader.table
pyspark.sql.streaming.readwriter.DataStreamWriter.partitionBy
pyspark.sql.streaming.query.StreamingQuery.awaitTermination
Removed the generation of the SummaryReport.docx, SummaryReport.html, and DetailedReport.html report files. Only the DetailedReport.docx will be generated.
Fixed
Fixed the issue of the SMA tool not detecting Python cells (%magic) in .scala notebooks
Fixed EWI comments not being added to the output code
Fixed processing of empty source cells present in Jupyter notebooks
Fixed categorization of Spark identified usages and data display in Spark API usage summary table.
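The magic-cell fix above concerns notebooks whose primary language is Scala but that contain Python cells marked with a magic command. A minimal sketch of how such cells might be detected (illustrative only; the SMA's actual detection logic is not documented here):

```python
# Illustrative sketch: detect a Python magic cell inside a .scala notebook
# by inspecting the first non-blank line for a %python / %%python marker.
def is_python_magic_cell(cell_source: str) -> bool:
    stripped = cell_source.lstrip()
    if not stripped:
        return False
    first_line = stripped.splitlines()[0]
    return first_line.startswith("%python") or first_line.startswith("%%python")

print(is_python_magic_cell("%python\nprint('hello')"))  # True
print(is_python_magic_cell("val x = 1"))                # False
```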
April 19, 2024
Application Version 1.0.4
Feature Updates include:
Updated Spark (Scala and Python) Conversion Core: 2.42.1
Spark Conversion Core Version 2.42.1
Added
ThirdPartyLibrary to Report Additional Third Party Library Indicator.
Added Transform for Hive Set Statement.
Removed warning related to Unsupported .hql files in Symbol Table Loader for Python.
Added Transform for Hive Drop Table Statement.
Added ConversionBeginTaskBase and refactored tasks.
Added Transform for session.read("query", qry) to session.sql(qry).
Added handling for ImplicitImports node from JsonObjects.
Updated the parsing errors mechanism to avoid commenting out files with parsing errors.
Updated reporting mechanism to generate empty SQL reports when no SQL is found.
Updated the status conversion for the nodes (Create statements) that do not need conversion for Hive Inventory.
Updated the status conversion for the nodes that do not need conversion for Hive Inventory.
Changed EWI SPRKHVSQL1004 to indicate 'Information from underlying data files cannot be recovered' instead of 'Purge removed from DROP TABLE statement', and changed the DROP TABLE transformation to add EWI SPRKHVSQL1004 when the PURGE statement is not present.
Collapsed SqlNames and SqlJoins in the SQL Usages Inventory.
Updated several SQL statements with status and transformations:
Nodes related to MERGE.
Nodes with INSERT, ALTER, DROP TABLE, and CTEs.
Nodes with CREATE statements for tables, functions, and views.
Direct transformations for SqlSelect and related nodes.
Added support for DBC implicit imports.
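The session.read("query", qry) to session.sql(qry) transform listed above can be sketched as a simple rewrite. A regex-based version is shown purely for illustration; the actual converter operates on a syntax tree, not on text:

```python
import re

# Illustrative sketch of the session.read("query", qry) -> session.sql(qry)
# rewrite; a real converter works on a parsed syntax tree, not regex.
def rewrite_read_query(line: str) -> str:
    return re.sub(
        r'session\.read\(\s*"query"\s*,\s*([^)]+)\)',
        r'session.sql(\1)',
        line,
    )

print(rewrite_read_query('df = session.read("query", qry)'))
# df = session.sql(qry)
```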
Fixed
Updated the parsing errors mechanism to avoid commenting out notebook cells with parsing errors.
Updated the CallFunction parse rule to check for a backslash versus a newline, avoiding a parsing error when a return statement has an identifier and the next statement is a deconstructed tuple assignment.
Fixed an issue that caused the Import Calls section of the reports to calculate incorrect percentage values.
Fixed issue related to not generating the detailed report.
Fixed EWI SPRKHVSQL1004 not being added to DROP TABLE transformation.
Fixed a parsing error involving a return statement with an identifier followed by a deconstructed tuple assignment.
Fixed an issue that caused the Issues.csv and the notifications.pam files to not show the line, column, and file id of the files with parsing errors.
Fixed the text about ranges of readiness score.
March 19, 2024
Application Version 1.0.4
Feature Updates include:
Updated Spark (Scala and Python) Conversion Core: 2.40.1
Spark Conversion Core 2.40.1
Added
Parsing support for HiveQL including support for HiveSql files (.hql)
Removed the import for
snowpark-extensions
in Python
Logo updated in the Detailed Report
Ignored files are now noted in the Detailed Report
SQL elements calculator and SQL elements table added to the detailed report
Added transformation for
WHEN NOT MATCHED BY SOURCE
when multiple match conditions exist
Site-packages, pip, dist, venv, and hidden directories now excluded from processing by the SMA
Renamed Supported to IsSnowparkAnacondaSupported in the Import Usages spreadsheet
Added SQL elements to the SqlElementsInfo.csv catalog
Added a new column named Flavor to the SqlElementsInfo.csv inventory to distinguish between SparkSQL and HiveQL
Added parsing errors for SQL code to the Issues.csv file
New EWIs added for
org.apache.spark.sql.functions.split
related parameter errors
36 additional RDD elements added to the core mapping table (currently listed as unsupported)
Transformation and conversion support for:
org.apache.spark.sql.types.StructField
org.apache.spark.sql.functions.translate
org.apache.spark.sql.Builder.enableHiveSupport
pyspark.sql.functions.split
org.apache.spark.sql.functions.split
Adjusted the replacer for
pyspark.sql.functions.unix_timestamp
Fixed
Modified the source concatenation process to ensure that magic commands are kept distinct. Now, strings are concatenated continuously until a magic command is encountered, at which point each magic command is handled separately.
Removed newlines when printing single-line SQL
Path for the generation of assessment zip files has been corrected
Corrected unnecessary imports of
org.apache.spark.sql.Dataset
Conversion now removes Apache Spark imports that remain after migration
March 18, 2024
Application Version 1.0.0
Feature Updates include:
New Snowpark Migration Accelerator logo.
Improved Assessment reports.
Updated Spark (Scala and Python) Conversion Core: 2.33.0
Spark Conversion Core 2.33.0
Added
Added additional inventory elements to the core mapping tables (currently, listed as not supported):
Not supported Pandas cases in the pandas mappings
Added ML, Streaming, and Blank not supported cases
Updated custom EWIs for Micro-partition, clustering, and streaming cases
February 12, 2024
Application Version 0.38.0
Feature Updates include:
Automatic license provisioning: you can now request a new SMA license directly from the app and receive it by email.
Updated Spark (Scala and Python) Conversion Core: 2.29.0
Spark Conversion Core 2.29.0
Added
Added SQL elements inventory
Reports are no longer filtered by readiness score or Snowflake user
Group Import Call Summary table in Assessment Report by package
Added support for Snowpark API versions:
Snowpark API version 1.10.0 on Python
Snowpark API version 1.9.0 on Python
Snowpark API version 1.8.0 on Python
Added/Updated mappings for:
Pyspark
pyspark.sql.functions.pandas_udf
pyspark.sql.group.GroupedData.pivot
pyspark.sql.functions.unix_timestamp
Scala
Multiple scenarios of
contains
functions, including
org.apache.spark.sql.Column.contains(scala.Any)
org.apache.spark.sql.types.StructField.name
org.apache.spark.sql.types.StructField.fields
org.apache.spark.sql.function.array_agg
Collection of Pandas data:
Created Inventory for Pandas Usages
Supported Pandas at ConversionStatus
Added Pandas Information in reports
Generates assessment zip file
Support for parsing of an empty interpolation scenario (${})
Updated examples of the DetailedReport template in Appendix A for Python and Scala
Avoid adding hardcoded credentials to SparkConf transformation
Add JSON inventory conversion logic to code processor
Fixed
Fixed inconsistencies in the notebook sizing by language table
Fixed issue with try/except in sprocs creation
Exclude internal imports in Assessment Report and add origin to import inventory
Improve EWI message for parsing errors
Fixed an error with missing .map files in Scala
Fixed missing file type summary for other code extensions
Fixed parsing errors for methods named 'match'.
Fixed an error that omitted some files in the File Sizing table
Removed useless statements after removal of not required functions
Fix replacer to remove unsupported clearCache function
Fix parsing for *args and **kwargs with backslash
Fix scenario where the alias of a column with brackets was removed during transformation due to bad resolution
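The *args and **kwargs fix above concerns explicit backslash line continuations in function signatures. A contrived but valid example of the construct whose parsing was fixed (the function itself is hypothetical):

```python
# A contrived but valid example of the construct whose parsing was fixed:
# an explicit backslash continuation in a signature mixing *args and **kwargs.
def combine(*args, \
            **kwargs):
    return len(args) + len(kwargs)

print(combine(1, 2, x=3))  # 3
```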
November 27, 2023
The tool's name has changed from SnowConvert for Spark to the Snowpark Migration Accelerator (SMA).
Application Version 0.33.1
Feature Updates include:
Name Change: SnowConvert for Spark -> Snowpark Migration Accelerator (SMA)
Updated Spark (Scala and Python) Conversion Core: 2.20.0
Trial Mode Enabled
Code Compare
See the Code Compare section of the documentation.
Updated assessment report in the UI
Walk through the updated assessment report in the application.
Updated support email available: sma-support@snowflake.com
Spark Conversion Core 2.20.0
Added
Add support to convert from Databricks to Jupyter (.dbc -> .ipynb)
Add line number of the error when there is a parsing error
Add company written by the user to the execution info in the assessment summary
Add mappings for:
org.apache.spark.sql.Column.contains(scala.Any)
Add needed data to display detailed report info in the Desktop tool reports
Updates to the assessment JSON file to accommodate the detailed assessment report
DataFrames saved as tables using a Hive format are now converted to not be specific to Hive
Add automated generation of stored procedures for Spark entry points
Add preprocess step in Python files to identify combination of spaces and tabs, and normalize them with spaces to prevent parsing errors
Inventories uploaded to telemetry even if the tool crashes
Adjust new tool name (Snowpark Migration Accelerator) in DOCX and HTML reports to accommodate the rebranding
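The space/tab preprocess step above can be sketched as a small normalization pass: replace tabs in leading whitespace with spaces so that mixed indentation parses consistently. This is a minimal sketch under the assumption of a four-space tab width; the SMA's actual normalization rule may differ:

```python
# Minimal sketch of the indentation preprocess: expand leading tabs to spaces
# so files mixing tabs and spaces parse without errors. Tab width of 4 is an
# assumption, not the SMA's documented behavior.
def normalize_indentation(source: str, tab_width: int = 4) -> str:
    out_lines = []
    for line in source.splitlines():
        stripped = line.lstrip("\t ")
        indent = line[: len(line) - len(stripped)]
        out_lines.append(indent.replace("\t", " " * tab_width) + stripped)
    return "\n".join(out_lines)

mixed = "def f():\n\tif True:\n\t    return 1"
print(normalize_indentation(mixed))
```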
Fixed
Fix Import call summary table in the report not matching the total value
Fix timeout issue in application for StructType with multiple fields
Fix indentation scenarios that do not require normalization in Scala
Fix 'Load Symbol Table' crash when the base class is not defined
Fix an issue causing the 'Python File Sizing' and 'Scala File Sizing' tables in the reports to display wrong values
Fix tool getting stuck when processing SQL files in Scala
November 09, 2023
Application Version 0.27.5
Updates include:
Updated Spark (Scala and Python) Conversion Core: 2.16.0
Update to the license request mechanism inside the application.
Spark Conversion Core 2.16.0
Updates include:
Add support for DataFrame alias at joins for Spark Scala.
Import Call Summary table in Assessment Report truncated and ordered.
Turn off by default the condensed file ID feature.
November 02, 2023
Application Version 0.26.0
Updates include:
Updated Spark (Scala and Python) Conversion Core: 2.14.0
The logger mechanism has been updated.
October 25, 2023
Application Version 0.25.11
Updates include:
Updated Spark (Scala and Python) Conversion Core: 2.14.0
Improved crash report flow
Fixes in Code Compare component
The button “View Reports” was changed to open the expected folder
Spark Conversion Core 2.14.0
Updates include:
Add condensed ID for filenames and use it in the log.
Refactor output folder hierarchy of the TrialMode.
Generate Reports locally in Assessment mode when the score hits 90 or higher.
Generate Reports locally in Assessment mode when it's a Snowflake user.
Create inventories as .csv files.
Move inventories to the Reports folder.
October 19, 2023
Version 0.25.6 (Oct 19, 2023)
Included SnowConvert Core Versions
Fixes
Inconsistencies with Spark-supported file extensions
CLI Terms and Conditions and Show Access Code options
Visual fixes
Features
SnowConvert Client separation
Version 0.24.0 (Oct 04, 2023)
Included SnowConvert Core Versions
Fixes
Conversion settings persistence in project files.
Inconsistencies in SQL Assessment and Conversion reports were fixed.
Features
Feature Flags for CLI
Version 0.20.3 (Sept 14, 2023)
Included SnowConvert Core Versions
SQL Conversion Core: 22.2.63
Oracle
Teradata
SQLServer
Scala Conversion Core 2.6.0
Python Conversion Core 2.6.0
Features
Analyzing sub-folders and Converting sub-folders are now available.
Include the Disable topological level reorder flag as part of the Teradata conversion settings.
Fixes
Conversion finished successfully but reported a crashed status.
SQL Server schema was set to PUBLIC automatically.
Missing generic scanner files on Spark/Python assessment.
Updated EULA.
Version 0.19.7 (Sept 7, 2023)
Included SnowConvert Core Versions
SQL Conversion Core: 22.2.48
Oracle
Teradata
SQLServer
Scala Conversion Core 2.5.0
Python Conversion Core 2.5.0
Version 0.19.1 (Sept 4, 2023)
Included SnowConvert Core Versions
SQL Conversion Core: 22.2.30
Oracle
Teradata
SQLServer
Scala Conversion Core 2.4.0
Python Conversion Core 2.4.0
Fixes
Changed default Conversion Rate on Reports to Lines of Code Conversion Rate.
Fixed issues with the list of Recently opened projects.
Fixed issue when trying to open an invalid .snowct file
Version 0.17.0 (Aug 24, 2023)
Included SnowConvert Core Versions
SQL Conversion Core: 22.2.9
Oracle
Teradata
SQLServer
Scala Conversion Core 2.3.31
Python Conversion Core 2.3.31
Fixes
Assessment Conversion settings on the correct platforms.
Input Folder validations.
Creating a project with an existing name in the input folder blocked the application.
Version 0.16.1 (Aug 21, 2023)
Included SnowConvert Core Versions
SQL Conversion Core: 22.0.47
Oracle
Teradata
SQLServer
Scala Conversion Core 2.3.31
Python Conversion Core 2.3.31
Fixes
A unified CLI version is now available.
Fix displayed data on SQL Conversion reports.
Open recent project issues when starting a new project.
Assessment settings.
Version 0.15.2 (Aug 17, 2023)
Included SnowConvert Core Versions
SQL Conversion Core: 22.0.47
Oracle
Teradata
SQLServer
Scala Conversion Core 2.3.31
Python Conversion Core 2.3.31
Fixes
An auto-update issue with the x64 version for macOS. (Requires manual reinstallation).
Fix links displayed in report pages.
Minor updates in texts and labels.
Version 0.14.5 (Aug 10, 2023)
Included SnowConvert Core Versions
SQL Conversion Core: 22.0.32
Oracle
Teradata
SQLServer
Scala Conversion Core 2.3.31
Python Conversion Core 2.3.31
Hotfix change for Snowpark Engines.
Version 0.14.1 (Aug 9, 2023)
Included SnowConvert Core Versions
SQL Conversion Core: 22.0.32
Oracle
Teradata
SQLServer
Scala Conversion Core 2.3.22
Python Conversion Core 2.3.22
Fixes
Fixed visual bugs on reports.
Changes on the Request an Access Code page
Rename the access-code field on the .snowct files.
Don't create empty output folders.
Version 0.13.1 (Aug 3, 2023)
Included SnowConvert Core Versions
SQL Conversion Core: 22.0.17
Oracle
Teradata
SQLServer
Scala Conversion Core 2.3.22
Python Conversion Core 2.3.22
Fixes
Improvements in Assessment and Conversion Reports
Updates in the reports layouts.
Collapsible sections.
Order in Card Components.
Version 0.11.7 (July 27, 2023)
Included SnowConvert Core Versions
Fixes
Fixed Conversion Rate by LoC.
Added % to SQL LoC Conversion Rate
Output path validation was added in the report viewer.
Telemetry can be disabled once a valid license is selected.
Version 0.11.3 (July 19, 2023)
Included SnowConvert Core Versions
Fixes
Conversion settings reset after changing the current step.
Minor visual improvements.
Wording changes.
Version 0.9.2 (July 12, 2023)
Included SnowConvert Core Versions
Fixes
Included preview header.
Minor visual improvements.
Version 0.8.2 (July 10, 2023)
Included SnowConvert Core Versions
Fixes
Reset the timer on the progress bar in alerts.
Fixed styles on displayed alert notifications.
Added preview banner on application header.
Improved exception handling mechanism.
Version 0.7.6 (July 03, 2023)
Included SnowConvert Core Versions
Fixes
Updated notarization tool.
Fix the conversion rate issue when using conversion settings.
Fix the open new project flow after opening an old project.
Remove the .mobilize folder from outputs.
Improve alerts and notifications.
Windows certificate naming issue. (Requires manual reinstallation).
Version 0.6.1 (June 23, 2023)
Included SnowConvert Core Versions
Fixes
Sign Windows binaries with Snowflake certificates.
Fixed issue when creating a new project after opening an existing one.
Minor styling and wording improvements.
Version 0.4.1 (June 21, 2023)
Included SnowConvert Core Versions
Fixes
The report information did not display the correct information.
Keep the conversion failed status when reopening the project.
Update texts and documentation links.
Version 0.3.0 (June 16, 2023)
Included SnowConvert Core Versions
Fixes
Added tool version in error logs.
Included custom installation wizard for Windows version.
Assessment report tables not processing numbers with commas.
The code signing certificate was changed. This affects the OTA update; manual installation of this version is required.
Version 0.2.9 (June 15, 2023)
Included SnowConvert Core Versions
Fixes
Missing information in telemetry reports
Fix the auto-saving issue with .snowct project files.
Telemetry enabled for conversion flows.
Error is shown when trying to convert without supported files.