
Deploying the Output Code

Running the output code may require some setup

How you run the output code from the SMA depends on your local environment. The recommendations below are organized by source language.

Spark Scala

Before running migrated Spark Scala source code, there are a couple of things to consider.

Add snowpark and snowpark extensions library reference

The migrated project must reference the Snowpark and Snowpark Extensions libraries.

Snowpark Extensions

Snowpark Extensions is a support library that extends the standard Snowpark library with functionality that is present in Apache Spark but not currently supported by Snowpark. Its goal is to ease the conversion of projects from Apache Spark to Snowpark.

Here are the steps to reference snowpark and snowpark extensions libraries from the migrated code.

Step 1 - Add snowpark and snowpark extensions library references to the project configuration file

The tool will try to add these dependencies to the project configuration file. Once the references have been added, the build tool takes care of resolving them.

Based on the extension of the project configuration file, the tool adds the references as follows:

build.gradle

dependencies {
    implementation 'com.snowflake:snowpark:1.6.2'
    implementation 'net.mobilize.snowpark-extensions:snowparkextensions:0.0.9'
    ...
}

build.sbt

...
libraryDependencies += "com.snowflake" % "snowpark" % "1.6.2"
libraryDependencies += "net.mobilize.snowpark-extensions" % "snowparkextensions" % "0.0.9"
...

pom.xml

<dependencies>
    <dependency>
        <groupId>com.snowflake</groupId>
        <artifactId>snowpark</artifactId>
        <version>1.6.2</version>
    </dependency>
    <dependency>
        <groupId>net.mobilize.snowpark-extensions</groupId>
        <artifactId>snowparkextensions</artifactId>
        <version>0.0.9</version>
    </dependency>
    ...
</dependencies>
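
Because the tool only tries to add these references, it can be worth confirming that they are actually present before building. The following is a minimal sketch, not part of the SMA: the helper name `missing_references` is hypothetical, and the artifact names and versions are simply the ones shown in the snippets above. It tokenizes the build file text and checks that each artifact name is followed closely by its version.

```python
import re

# Artifact names and versions copied from the build-file snippets above.
REQUIRED = {"snowpark": "1.6.2", "snowparkextensions": "0.0.9"}


def missing_references(build_file_text):
    """Return the artifacts that are not declared with the expected version.

    Hypothetical helper: works on the raw text of build.gradle, build.sbt,
    or pom.xml by splitting it into tokens and looking for the artifact
    name followed, within the next three tokens, by its version.
    """
    tokens = re.findall(r"[A-Za-z0-9._-]+", build_file_text)
    missing = []
    for artifact, version in REQUIRED.items():
        found = any(
            tok == artifact and version in tokens[i + 1 : i + 4]
            for i, tok in enumerate(tokens)
        )
        if not found:
            missing.append(artifact)
    return missing


sbt_text = '''
libraryDependencies += "com.snowflake" % "snowpark" % "1.6.2"
'''
print(missing_references(sbt_text))  # ['snowparkextensions']
```

An empty list means both references were found; any name in the list would need to be added to the project configuration file by hand.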

Step 2 - Add snowpark extensions library import statements

The tool includes these two import statements in all output .scala files.

import com.snowflake.snowpark_extensions.Extensions._
import com.snowflake.snowpark_extensions.Extensions.functions._

Code example

In the following code, hex and isin are supported by Spark but not by Snowpark. The code still works because both are included in the extensions library.

Input code

package com.mobilize.spark

import org.apache.spark.sql._
import org.apache.spark.sql.functions._ // provides col and hex

object Main {

   def main(args: Array[String]) : Unit = {

      var languageArray = Array("Java");

      var languageHex = hex(col("language"));

      col("language").isin(languageArray:_*);
   }

}

Output code

package com.mobilize.spark

import com.snowflake.snowpark._
import com.snowflake.snowpark_extensions.Extensions._
import com.snowflake.snowpark_extensions.Extensions.functions._

object Main {

   def main(args: Array[String]) : Unit = {

      var languageArray = Array("Java");
      
      // hex does not exist in Snowpark. It is an extension.
      var languageHex = hex(col("language"));
      
      // isin does not exist in Snowpark. It is an extension.
      col("language").isin(languageArray :_*)

   }

}
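
For readers unfamiliar with the two functions, the sketch below imitates their behavior on plain Python values. It is an illustration only, not the Snowpark Extensions implementation: the helper names are hypothetical, and it assumes Spark's behavior of returning the uppercase hexadecimal digits of a string's bytes.

```python
def spark_like_hex(value: str) -> str:
    """Hypothetical stand-in for hex on a string column value."""
    return value.encode("utf-8").hex().upper()


def is_in(value, candidates) -> bool:
    """Hypothetical stand-in for Column.isin: plain membership test."""
    return value in candidates


language_array = ["Java"]

print(spark_like_hex("Java"))          # 4A617661
print(is_in("Java", language_array))   # True
print(is_in("Scala", language_array))  # False
```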

PySpark

Before running migrated PySpark source code, there are a couple of things to consider.

Install snowpark and snowpark extensions libraries

The Snowpark and Snowpark Extensions libraries must be installed in the environment that runs the migrated project.

Snowpark Extensions

Snowpark Extensions is a support library that extends the standard Snowpark library with functionality that is present in PySpark but not currently supported by Snowpark. Its goal is to ease the conversion of projects from PySpark to Snowpark.

Here are the steps to reference snowpark and snowpark extensions libraries from the migrated code.

Step 1 - Install snowpark library

pip install snowflake-snowpark-python

Step 2 - Install snowpark extensions library

pip install snowpark-extensions
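
After both installs, a quick check that the packages are visible to the interpreter can save a confusing import error later. This is a minimal sketch using the standard library's `importlib.metadata`. The function name `installed_versions` is hypothetical, and the distribution names are the ones used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in the pip commands above.
PACKAGES = ["snowflake-snowpark-python", "snowpark-extensions"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Run it with the same Python interpreter that will execute the migrated code, since each virtual environment has its own set of installed packages.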

Step 3 - Add snowpark extensions library import statements

The tool includes this import in each file that uses pyspark.

import snowpark_extensions

Code example

In the following code, the create_map function is supported by PySpark but not by Snowpark. The code still works because create_map is one of the functions included in snowpark extensions.

Input code

import pyspark.sql.functions as F
# df is an existing DataFrame with 'name' and 'age' columns
df.select(F.create_map('name', 'age').alias("map")).collect()

Output code

import snowpark_extensions
import snowflake.snowpark.functions as F
# df is the equivalent Snowpark DataFrame
df.select(F.create_map('name', 'age').alias("map")).collect()
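
To make the expected result concrete without a Spark or Snowflake session, the sketch below mimics what create_map('name', 'age') produces for each row: a single-entry map from the name value to the age value. It is a plain-Python illustration with hypothetical sample rows and a hypothetical helper name, not the library's implementation.

```python
def create_map_rows(rows, key_col, value_col):
    """Hypothetical stand-in: build one {key: value} map per input row."""
    return [{row[key_col]: row[value_col]} for row in rows]


# Hypothetical sample data standing in for the DataFrame's rows.
rows = [{"name": "Alice", "age": 2}, {"name": "Bob", "age": 5}]

print(create_map_rows(rows, "name", "age"))  # [{'Alice': 2}, {'Bob': 5}]
```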