Migration Lab
A complete walkthrough
[Note that this lab is also part of the Snowflake SQL Server Migration quickstart.]
Moving the logic and data in a data warehouse is essential to getting an operational database running on a new platform. But to take full advantage of the new platform, any pipelines moving data into or out of it need to be repointed or replatformed as well. This can be challenging because most organizations run a wide variety of pipelines. This hands-on lab (HoL) will focus on just one, for which Snowflake can provide some acceleration, though note that new ETL and pipeline accelerators are constantly being developed.
Let’s talk about the pipeline and the notebook we are moving in this scenario. As a reminder, this is a SQL Server database migration, but scoped to a proof of concept (POC). AdventureWorks has moved a small data mart from SQL Server to Snowflake, and has included a basic pipeline script and a reporting notebook as part of this POC. Here is a summary of each artifact:
The pipeline script is written in Python using Spark. It reads a file that an older point-of-sale (POS) system drops into a local directory, and it is run at regular intervals by an orchestration tool (something like Airflow, but the orchestration is not part of the POC, so we’re not 100% sure what it is).
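To make the scenario concrete, here is a minimal sketch of the kind of PySpark ingestion script described above. The file paths, schema, table names, and connection details are hypothetical placeholders, not the actual POC code:

```python
# Hypothetical sketch of the POS ingestion pattern described above.
# Paths, column names, and credentials are assumptions, not the real script.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pos_ingest").getOrCreate()

# Read the latest POS extract dropped into a local directory by the legacy system.
pos_df = spark.read.csv("/data/pos/incoming/*.csv", header=True, inferSchema=True)

# Light cleanup before landing the data in a staging table.
pos_df = pos_df.dropDuplicates().na.drop(subset=["transaction_id"])

# Write to SQL Server over JDBC (requires the SQL Server JDBC driver
# on the Spark classpath; connection details are placeholders).
(pos_df.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://<host>:1433;databaseName=AdventureWorks")
    .option("dbtable", "staging.pos_transactions")
    .option("user", "<user>")
    .option("password", "<password>")
    .mode("append")
    .save())
```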
The second artifact is a reporting notebook that reads from the existing SQL Server database and reports on a few summary metrics.
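A hedged sketch of that pattern follows: read a table from SQL Server over JDBC and compute a couple of summary metrics. The table and column names below come from the standard AdventureWorks schema, but the actual notebook's metrics are assumptions:

```python
# Hypothetical sketch of the reporting notebook's pattern.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sales_report").getOrCreate()

# Read the sales table from SQL Server over JDBC (placeholders for credentials).
sales = (spark.read
    .format("jdbc")
    .option("url", "jdbc:sqlserver://<host>:1433;databaseName=AdventureWorks")
    .option("dbtable", "Sales.SalesOrderHeader")
    .option("user", "<user>")
    .option("password", "<password>")
    .load())

# Example summary metric: total and average order value per year.
summary = (sales
    .groupBy(F.year("OrderDate").alias("order_year"))
    .agg(F.sum("TotalDue").alias("total_sales"),
         F.avg("TotalDue").alias("avg_order_value"))
    .orderBy("order_year"))

summary.show()
```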
Neither of these is especially complex, but both are just the tip of the iceberg: there are hundreds more pipeline scripts and notebooks tied to other data marts. This POC will move only these two.
Both of these use Spark and access the SQL Server database, so our goal is essentially to move the operations in Spark into Snowpark. Let’s see how we would do this using the Snowpark Migration Accelerator (SMA). The SMA is a sister tool to SnowConvert and is built on the same foundation. We are going to walk through many steps (most of which will be similar to what we did with SnowConvert), but note that we are still essentially working through the same assessment -> conversion -> validation flow that we have already walked through.
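To preview where we are headed, here is a hedged sketch of what the notebook’s aggregation could look like once it targets Snowpark instead of Spark. The connection parameters and object names are placeholders, and in practice the SMA produces the actual translation; the point is how closely the Snowpark DataFrame API mirrors the Spark one:

```python
# A minimal Snowpark sketch of the same aggregation, assuming the data mart
# has already been migrated to Snowflake. All connection values are placeholders.
from snowflake.snowpark import Session
from snowflake.snowpark import functions as F

connection_parameters = {
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "ADVENTUREWORKS",
    "schema": "SALES",
}
session = Session.builder.configs(connection_parameters).create()

# The migrated table now lives in Snowflake, so no JDBC read is needed.
sales = session.table("SALESORDERHEADER")

summary = (sales
    .group_by(F.year("ORDERDATE").alias("ORDER_YEAR"))
    .agg(F.sum("TOTALDUE").alias("TOTAL_SALES"),
         F.avg("TOTALDUE").alias("AVG_ORDER_VALUE"))
    .order_by("ORDER_YEAR"))

summary.show()
```

Note how little changes beyond the session setup: `groupBy` becomes `group_by`, and the JDBC read becomes a direct `session.table()` call against the migrated data.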
This lab uses the Snowpark Migration Accelerator and the Snowflake VS Code Extension. But to make the most of it, you will also need to be able to run Python with PySpark. The simplest way to get started is to create an environment with . That environment will have most of the packages needed to run the code in this lab.
You will still need to make the following resources available:
- Python Libraries
- VS Code Extensions
- Other
Having said all of this, you can still run this lab with just a Snowflake account, the SMA, and the Snowflake VS Code Extension. You will not be able to run everything (particularly the source code), but you will be able to use all of the converted elements in Snowflake.
Now let’s get started by assessing what we have.