Introduction

Welcome to Snowflake SnowConvert for Spark Scala. Let us be your guide on the road from Apache Spark to Snowflake.

What is SnowConvert for Spark Scala?

SnowConvert is not a find-and-replace or regex matching tool. SnowConvert is software that understands your source code (Scala) by parsing and building a semantic model of your code's behavior. For Spark, SnowConvert identifies the usages of the Spark API, inventories them, and ultimately converts them to their functional equivalent in Snowpark.
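To make that concrete, here is a hypothetical sketch (illustrative only, not actual SnowConvert output; the object and method names are assumptions) of the kind of one-to-one mapping SnowConvert applies, with the original Spark API reference kept as a comment next to each converted Snowpark reference:

// Hypothetical mapping sketch: each trailing comment shows the Spark API
// reference that the Snowpark code replaces.
import com.snowflake.snowpark.Session                     // was: org.apache.spark.sql.SparkSession
import com.snowflake.snowpark.functions.{array_agg, col}  // was: org.apache.spark.sql.functions.collect_list

object MappingSketch {
  def collectSalaries(session: Session): Unit = {
    val employees = session.read.csv("path/data/employees.csv") // same call shape as in Spark
    employees.select(array_agg(col("Salary"))).show()           // was: collect_list("Salary")
  }
}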

SnowConvert Terminology

Here are a few terms and their definitions so you will know what they mean when they appear throughout this documentation:
  • SnowConvert: Software that securely and automatically converts your Spark project written in Scala into the equivalent in Snowflake's Snowpark.
  • Conversion/Transformation Rules: Rules that allow SnowConvert to convert a portion of the source code into the expected target code.
  • Parse: An initial process done by SnowConvert to understand the source code and build up an internal data structure to process conversion rules.
  • SnowConvert Qualification Tool: The version of SnowConvert for Spark that runs in assessment mode. Ultimately, this is software that identifies, precisely and automatically, all Apache Spark Scala usages in a codebase.

In the following few pages, you'll learn more about the qualification and conversion capabilities of SnowConvert for Spark Scala. If you're ready to start, visit the Getting Started page in this documentation. For more information about SnowConvert in general, visit our SnowConvert for Spark Scala information page.

Code Conversion

Spark to Snowpark

SnowConvert for Spark Scala converts references to the Spark API in Scala code into references to version 1.6.2 of the Snowpark API. Let's take a look at how this works.

Example of Spark to Snowpark

Here's an example of the conversion of a simple Spark application. The application reads two datasets, filters and joins them, calculates an average salary per job, and shows the results.
Apache Spark Scala Code:

import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def avgJobSalary(session: SparkSession, dept: String) {
    val employees = session.read.csv("path/data/employees.csv")
    val jobs = session.read.csv("path/data/jobs.csv")
    val jobsAvgSalary = employees.
      filter($"Department" === dept).
      join(jobs).
      groupBy("JobName").
      avg("Salary")
    // Salaries in department
    jobsAvgSalary.select(collect_list("Salary")).show()
    // avg Salary
    jobsAvgSalary.show()
  }
}
The Converted Snowflake Code:

import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._
import com.snowflake.snowpark.Session

object SimpleApp {
  def avgJobSalary(session: Session, dept: String) {
    val employees = session.read.csv("path/data/employees.csv")
    val jobs = session.read.csv("path/data/jobs.csv")
    val jobsAvgSalary = employees.
      filter($"Department" === dept).
      join(jobs).
      groupBy("JobName").
      avg("Salary")
    // Salaries in department
    jobsAvgSalary.select(array_agg("Salary")).show()
    // avg Salary
    jobsAvgSalary.show()
  }
}
In this example, most of the structure of the Scala code stays the same, but the references to the Spark API have been changed to their Snowpark equivalents: the imports and the SparkSession type now point to Snowpark, and collect_list becomes array_agg. To view our translation reference, please reach out to [email protected].
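After conversion, the application targets a Snowflake connection rather than a Spark cluster. As a minimal, hypothetical sketch (not produced by SnowConvert), the converted SimpleApp.avgJobSalary could be invoked like this, assuming a Snowpark connection properties file named connection.properties:

import com.snowflake.snowpark.Session

object RunSimpleApp {
  def main(args: Array[String]): Unit = {
    // Build a Snowpark session from a local properties file; the file name and
    // its contents (account, user, role, warehouse, etc.) are assumptions.
    val session = Session.builder.configFile("connection.properties").create

    // Run the converted routine for a sample department.
    SimpleApp.avgJobSalary(session, "Sales")
  }
}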

Workload Assessment and Qualification

If you're not ready to convert but want to better understand your existing Spark workload and how much of it can be migrated to Snowflake, you can use SnowConvert for Spark Scala in Qualification Mode. This gives you an inventory of the existing references to the Spark API in your Scala code, and it tells you how much of that code can be migrated automatically to Snowpark in Snowflake.