Output code setup
Before running the migrated PySpark source code, there are a couple of things to consider.
The Snowpark and Snowpark Extensions libraries must be referenced from the migrated project.
Snowpark Extensions is a support library that extends the standard Snowpark library with functionality that is present in PySpark but not currently supported by Snowpark. Its goal is to ease the conversion of projects from PySpark to Snowpark.
To reference the Snowpark and Snowpark Extensions libraries from the migrated code, install both with pip:
pip install snowpark-extensions
pip install snowflake-snowpark-python
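After installing, it can be useful to confirm that both distributions are visible to the interpreter that will run the migrated code. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` is hypothetical, and the distribution names are simply the ones used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands above.
REQUIRED = ("snowflake-snowpark-python", "snowpark-extensions")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Running the script in the project's virtual environment prints one line per library; a "NOT INSTALLED" entry suggests that the corresponding pip command still needs to be run in that environment.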
The tool adds the following import to each file that uses PySpark:
import snowpark_extensions
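If you want to double-check that a given migrated file actually carries this import, a small script can inspect its source. This is only a sketch: the helper `imports_module` and the sample strings are hypothetical, and the check covers plain `import x` and `from x import ...` statements, not submodule imports.

```python
import ast


def imports_module(source: str, module: str = "snowpark_extensions") -> bool:
    """Return True if the source text imports the given top-level module."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Matches "import snowpark_extensions" (exact module name only).
            if any(alias.name == module for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # Matches "from snowpark_extensions import ...".
            if node.module == module:
                return True
    return False


# Hypothetical file contents used only for illustration.
migrated = "import snowpark_extensions\nimport snowflake.snowpark.functions as F\n"
unmigrated = "import pyspark.sql.functions as F\n"

print(imports_module(migrated))
print(imports_module(unmigrated))
```

In practice the `source` argument would come from reading each migrated `.py` file, for example with `pathlib.Path(path).read_text()`.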
In the following example, the create_map function is supported by PySpark but not by Snowpark. The migrated code still works because create_map is one of the functions included in Snowpark Extensions. The first snippet is the original PySpark code and the second is the migrated Snowpark code.
# Original PySpark code; df is an existing DataFrame with 'name' and 'age' columns
import pyspark.sql.functions as F
df.select(F.create_map('name', 'age').alias("map")).collect()
# Migrated Snowpark code; df is an existing Snowpark DataFrame with the same columns
import snowpark_extensions
import snowflake.snowpark.functions as F
df.select(F.create_map('name', 'age').alias("map")).collect()
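Note that the migrated code never uses a name from snowpark_extensions directly; the import alone is what makes the call work. One plausible mechanism, stated here as an assumption rather than a description of the library's internals, is that the extension attaches the missing functions to the base library's modules when it is imported. The toy sketch below illustrates that idea with a stand-in module; `base_functions` and `install_extensions` are hypothetical names and have nothing to do with the real Snowpark API.

```python
import types

# Stand-in for a base library module that lacks create_map (hypothetical).
base_functions = types.ModuleType("base_functions")
base_functions.col = lambda name: f"col({name})"


def install_extensions(module):
    """Attach a toy create_map helper to the module if it does not have one."""
    if not hasattr(module, "create_map"):
        def create_map(*cols):
            # Toy behaviour: describe the map expression as a string.
            return "map(" + ", ".join(cols) + ")"
        module.create_map = create_map


# Before the extension is applied, the function does not exist on the module.
has_before = hasattr(base_functions, "create_map")

# Applying the extension plays the role of "import snowpark_extensions".
install_extensions(base_functions)
expr = base_functions.create_map("name", "age")

print(has_before)
print(expr)
```

Under this assumption, the order of statements matters: the extension import has to run before the first call to a function it provides, which is consistent with the tool placing the import at the top of each migrated file.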