Output code setup

Before running the migrated PySpark source code, there are a couple of things to consider.

Install the Snowpark and Snowpark Extensions libraries

The migrated project must reference both the Snowpark and Snowpark Extensions libraries.

Snowpark Extensions

Snowpark Extensions is a support library that extends the standard Snowpark library with functionality that is present in PySpark but not currently supported by Snowpark. The goal of this library is to ease the conversion of projects from PySpark to Snowpark.

Here are the steps to reference the Snowpark and Snowpark Extensions libraries from the migrated code.

Step 1 - Install the Snowpark library

pip install snowflake-snowpark-python

Step 2 - Install the Snowpark Extensions library

pip install snowpark-extensions
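
To confirm that both installs succeeded before running migrated code, a small check along the following lines may help. This is a minimal sketch using only the Python standard library; the helper name installed_versions is hypothetical, and the distribution names are the ones used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands in Steps 1 and 2.
REQUIRED_PACKAGES = ("snowflake-snowpark-python", "snowpark-extensions")


def installed_versions(packages=REQUIRED_PACKAGES):
    """Return {distribution name: installed version, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Running the script in the same environment as the migrated project prints one line per library, so a missing install shows up before any migrated file is executed.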

Step 3 - Add the Snowpark Extensions import statement

The tool adds this import to each file that uses PySpark:
import snowpark_extensions
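
If migrated files are later edited by hand, the import can be lost. The following is a minimal sketch of a check for that case, not part of the migration tool: the helper name needs_extensions_import is hypothetical, and it only inspects plain import statements with dotted module names (for example, it would not detect from snowflake import snowpark).

```python
import ast


def needs_extensions_import(source: str) -> bool:
    """Return True if `source` imports snowflake.snowpark but not snowpark_extensions."""
    imported = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            imported.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imported.add(node.module)

    uses_snowpark = any(
        name == "snowflake.snowpark" or name.startswith("snowflake.snowpark.")
        for name in imported
    )
    has_extensions = any(
        name == "snowpark_extensions" or name.startswith("snowpark_extensions.")
        for name in imported
    )
    return uses_snowpark and not has_extensions


migrated = "import snowflake.snowpark.functions as F\n"
print(needs_extensions_import(migrated))  # True
```

The function works on source text rather than file paths, so it can be applied to each file's contents in whatever way the project is organized.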

Code example

In the following code, the create_map function is supported by PySpark but not supported by Snowpark. The migrated code still works because create_map is one of the functions included in Snowpark Extensions.

Input code

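import pyspark.sql.functions as F
# df is an existing DataFrame with 'name' and 'age' columns
df.select(F.create_map('name', 'age').alias("map")).collect()

Output code

import snowpark_extensions
import snowflake.snowpark.functions as F
# df is an existing Snowpark DataFrame with 'name' and 'age' columns
df.select(F.create_map('name', 'age').alias("map")).collect()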
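
To make concrete what the example computes, the sketch below models create_map for a single row in plain Python: the column arguments alternate between key column and value column, and each row yields a map from the key column's value to the value column's value. It is an illustration only, not the Snowpark Extensions implementation; the helper name create_map_row and the sample rows are made up.

```python
def create_map_row(row: dict, *column_names: str) -> dict:
    """Model create_map for one row: columns alternate key, value, key, value, ..."""
    if len(column_names) % 2 != 0:
        raise ValueError("create_map expects an even number of columns")
    keys = column_names[0::2]    # columns whose values become map keys
    values = column_names[1::2]  # columns whose values become map values
    return {row[k]: row[v] for k, v in zip(keys, values)}


# Hypothetical sample data standing in for a DataFrame with 'name' and 'age' columns.
rows = [{"name": "Alice", "age": 2}, {"name": "Bob", "age": 5}]
maps = [create_map_row(r, "name", "age") for r in rows]
print(maps)  # [{'Alice': 2}, {'Bob': 5}]
```

Each dictionary corresponds to the "map" column of one collected row in the input and output code above.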