LogoLogo
SnowflakeDocumentation Home
  • Snowpark Migration Accelerator Documentation
  • General
    • Introduction
    • Getting Started
      • Download and Access
      • Installation
        • Windows Installation
        • MacOS Installation
        • Linux Installation
    • Conversion Software Terms of Use
      • Open Source Libraries
    • Release Notes
      • Old Version Release Notes
        • SC Spark Scala Release Notes
          • Known Issues
        • SC Spark Python Release Notes
          • Known Issues
    • Roadmap
  • User Guide
    • Overview
    • Before Using the SMA
      • Supported Platforms
      • Supported Filetypes
      • Code Extraction
      • Pre-Processing Considerations
    • Project Overview
      • Project Setup
      • Configuration and Settings
      • Tool Execution
    • Assessment
      • How the Assessment Works
      • Assessment Quick Start
      • Understanding the Assessment Summary
      • Readiness Scores
      • Output Reports
        • Curated Reports
        • SMA Inventories
        • Generic Inventories
        • Assessment zip file
      • Output Logs
      • Spark Reference Categories
    • Conversion
      • How the Conversion Works
      • Conversion Quick Start
      • Conversion Setup
      • Understanding the Conversion Assessment and Reporting
      • Output Code
    • Using the SMA CLI
      • Additional Parameters
  • Use Cases
    • Assessment Walkthrough
      • Walkthrough Setup
        • Notes on Code Preparation
      • Running the Tool
      • Interpreting the Assessment Output
        • Assessment Output - In Application
        • Assessment Output - Reports Folder
      • Running the SMA Again
    • Conversion Walkthrough
    • Sample Project
    • Using SMA with Docker
    • SMA CLI Walkthrough
  • Issue Analysis
    • Approach
    • Issue Code Categorization
    • Issue Codes by Source
      • General
      • Python
        • SPRKPY1000
        • SPRKPY1001
        • SPRKPY1002
        • SPRKPY1003
        • SPRKPY1004
        • SPRKPY1005
        • SPRKPY1006
        • SPRKPY1007
        • SPRKPY1008
        • SPRKPY1009
        • SPRKPY1010
        • SPRKPY1011
        • SPRKPY1012
        • SPRKPY1013
        • SPRKPY1014
        • SPRKPY1015
        • SPRKPY1016
        • SPRKPY1017
        • SPRKPY1018
        • SPRKPY1019
        • SPRKPY1020
        • SPRKPY1021
        • SPRKPY1022
        • SPRKPY1023
        • SPRKPY1024
        • SPRKPY1025
        • SPRKPY1026
        • SPRKPY1027
        • SPRKPY1028
        • SPRKPY1029
        • SPRKPY1030
        • SPRKPY1031
        • SPRKPY1032
        • SPRKPY1033
        • SPRKPY1034
        • SPRKPY1035
        • SPRKPY1036
        • SPRKPY1037
        • SPRKPY1038
        • SPRKPY1039
        • SPRKPY1040
        • SPRKPY1041
        • SPRKPY1042
        • SPRKPY1043
        • SPRKPY1044
        • SPRKPY1045
        • SPRKPY1046
        • SPRKPY1047
        • SPRKPY1048
        • SPRKPY1049
        • SPRKPY1050
        • SPRKPY1051
        • SPRKPY1052
        • SPRKPY1053
        • SPRKPY1054
        • SPRKPY1055
        • SPRKPY1056
        • SPRKPY1057
        • SPRKPY1058
        • SPRKPY1059
        • SPRKPY1060
        • SPRKPY1061
        • SPRKPY1062
        • SPRKPY1063
        • SPRKPY1064
        • SPRKPY1065
        • SPRKPY1066
        • SPRKPY1067
        • SPRKPY1068
        • SPRKPY1069
        • SPRKPY1070
        • SPRKPY1071
        • SPRKPY1072
        • SPRKPY1073
        • SPRKPY1074
        • SPRKPY1075
        • SPRKPY1076
        • SPRKPY1077
        • SPRKPY1078
        • SPRKPY1079
        • SPRKPY1080
        • SPRKPY1081
        • SPRKPY1082
        • SPRKPY1083
        • SPRKPY1084
        • SPRKPY1085
        • SPRKPY1086
        • SPRKPY1087
        • SPRKPY1088
        • SPRKPY1089
        • SPRKPY1101
      • Spark Scala
        • SPRKSCL1000
        • SPRKSCL1001
        • SPRKSCL1002
        • SPRKSCL1100
        • SPRKSCL1101
        • SPRKSCL1102
        • SPRKSCL1103
        • SPRKSCL1104
        • SPRKSCL1105
        • SPRKSCL1106
        • SPRKSCL1107
        • SPRKSCL1108
        • SPRKSCL1109
        • SPRKSCL1110
        • SPRKSCL1111
        • SPRKSCL1112
        • SPRKSCL1113
        • SPRKSCL1114
        • SPRKSCL1115
        • SPRKSCL1116
        • SPRKSCL1117
        • SPRKSCL1118
        • SPRKSCL1119
        • SPRKSCL1120
        • SPRKSCL1121
        • SPRKSCL1122
        • SPRKSCL1123
        • SPRKSCL1124
        • SPRKSCL1125
        • SPRKSCL1126
        • SPRKSCL1127
        • SPRKSCL1128
        • SPRKSCL1129
        • SPRKSCL1130
        • SPRKSCL1131
        • SPRKSCL1132
        • SPRKSCL1133
        • SPRKSCL1134
        • SPRKSCL1135
        • SPRKSCL1136
        • SPRKSCL1137
        • SPRKSCL1138
        • SPRKSCL1139
        • SPRKSCL1140
        • SPRKSCL1141
        • SPRKSCL1142
        • SPRKSCL1143
        • SPRKSCL1144
        • SPRKSCL1145
        • SPRKSCL1146
        • SPRKSCL1147
        • SPRKSCL1148
        • SPRKSCL1149
        • SPRKSCL1150
        • SPRKSCL1151
        • SPRKSCL1152
        • SPRKSCL1153
        • SPRKSCL1154
        • SPRKSCL1155
        • SPRKSCL1156
        • SPRKSCL1157
        • SPRKSCL1158
        • SPRKSCL1159
        • SPRKSCL1160
        • SPRKSCL1161
        • SPRKSCL1162
        • SPRKSCL1163
        • SPRKSCL1164
        • SPRKSCL1165
        • SPRKSCL1166
        • SPRKSCL1167
        • SPRKSCL1168
        • SPRKSCL1169
        • SPRKSCL1170
        • SPRKSCL1171
        • SPRKSCL1172
        • SPRKSCL1173
        • SPRKSCL1174
        • SPRKSCL1175
      • SQL
        • SparkSQL
          • SPRKSPSQL1001
          • SPRKSPSQL1002
          • SPRKSPSQL1003
          • SPRKSPSQL1004
          • SPRKSPSQL1005
          • SPRKSPSQL1006
        • Hive
          • SPRKHVSQL1001
          • SPRKHVSQL1002
          • SPRKHVSQL1003
          • SPRKHVSQL1004
          • SPRKHVSQL1005
          • SPRKHVSQL1006
      • Pandas
        • PNDSPY1001
        • PNDSPY1002
        • PNDSPY1003
        • PNDSPY1004
      • DBX
        • SPRKDBX1001
    • Troubleshooting the Output Code
      • Locating Issues
    • Workarounds
    • Deploying the Output Code
  • Translation Reference
    • Translation Reference Overview
    • SIT Tagging
      • SQL statements
    • SQL Embedded code
    • HiveSQL
      • Supported functions
    • Spark SQL
      • Spark SQL DDL
        • Create Table
          • Using
      • Spark SQL DML
        • Merge
        • Select
          • Distinct
          • Values
          • Join
          • Where
          • Group By
          • Union
      • Spark SQL Data Types
      • Supported functions
  • Workspace Estimator
    • Overview
    • Getting Started
  • INTERACTIVE ASSESSMENT APPLICATION
    • Overview
    • Installation Guide
  • Support
    • General Troubleshooting
      • How do I give SMA permission to the config folder?
      • Invalid Access Code error on VDI
      • How do I give SMA permission to Documents, Desktop, and Downloads folders?
    • Frequently Asked Questions (FAQ)
      • Using SMA with Jupyter Notebooks
      • How to request an access code
      • Sharing the Output with Snowflake
      • DBC files explode
    • Glossary
    • Contact Us
Powered by GitBook
On this page
  • Description
  • Sample Source Patterns
  • Sample auxiliary data
  • MERGE Statement - Insert and Update Case
  • MERGE Statement - Delete Case
  • MERGE Statement - WHEN NOT MATCHED BY SOURCE
  • Known issues
  • Related EWIs
  1. Translation Reference
  2. Spark SQL
  3. Spark SQL DML

Merge

Is this about the zipper merge in traffic?

PreviousSpark SQL DMLNextSelect

Last updated 1 year ago

Description

The MERGE statement is used to select rows from one or more sources to update or insert into a target table. You can specify conditions to determine whether to update or insert into the target table. This statement is a convenient way to combine multiple operations. Allows you to avoid multiple INSERT, UPDATE, and DELETE DML statements. MERGE is a deterministic statement.

In Spark, the MERGE syntax is (For more information check ):

MERGE INTO target_table_name [target_alias]
   USING source_table_reference [source_alias]
   ON merge_condition
   { WHEN MATCHED [ AND matched_condition ] THEN matched_action |
     WHEN NOT MATCHED [BY TARGET] [ AND not_matched_condition ] THEN not_matched_action |
     WHEN NOT MATCHED BY SOURCE [ AND not_matched_by_source_condition ] THEN not_matched_by_source_action } [...]

matched_action
 { DELETE |
   UPDATE SET * |
   UPDATE SET { column = { expr | DEFAULT } } [, ...] }

not_matched_action
 { INSERT * |
   INSERT (column1 [, ...] ) VALUES ( expr | DEFAULT ] [, ...] )

not_matched_by_source_action
 { DELETE |
   UPDATE SET { column = { expr | DEFAULT } } [, ...] }
MERGE INTO <target_table> USING <source> ON <join_expr> { matchedClause | notMatchedClause } [ ... ]

matchedClause ::=
  WHEN MATCHED [ AND <case_predicate> ] THEN { UPDATE SET <col_name> = <expr> [ , <col_name2> = <expr2> ... ] | DELETE } [ ... ]
  
notMatchedClause ::=
   WHEN NOT MATCHED [ AND <case_predicate> ] THEN INSERT [ ( <col_name> [ , ... ] ) ] VALUES ( <expr> [ , ... ] )

The main difference is that Snowflake does not have an equivalence for the WHEN NOT MATCHED BY SOURCE clause, one workaround is needed for its equivalence.

Sample Source Patterns

Sample auxiliary data

This code was executed for a better understanding of the examples:

CREATE OR REPLACE people_source ( 
  person_id  INTEGER NOT NULL PRIMARY KEY, 
  first_name STRING NOT NULL, 
  last_name  STRING NOT NULL, 
  title      STRING NOT NULL,
);

CREATE OR REPLACE people_target ( 
  person_id  INTEGER NOT NULL PRIMARY KEY, 
  first_name STRING NOT NULL, 
  last_name  STRING NOT NULL, 
  title      STRING NOT NULL DEFAULT 'NONE'
);

INSERT INTO people_target VALUES (1, 'John', 'Smith', 'Mr');
INSERT INTO people_target VALUES (2, 'alice', 'jones', 'Mrs');
INSERT INTO people_source VALUES (2, 'Alice', 'Jones', 'Mrs.');
INSERT INTO people_source VALUES (3, 'Jane', 'Doe', 'Miss');
INSERT INTO people_source VALUES (4, 'Dave', 'Brown', 'Mr');
CREATE OR REPLACE TABLE people_source (
    person_id  INTEGER NOT NULL PRIMARY KEY,
    first_name VARCHAR(20) NOT NULL,
    last_name VARCHAR(20) NOT NULL,
    title VARCHAR(10) NOT NULL
);

CREATE OR REPLACE TABLE people_target (
    person_id  INTEGER NOT NULL PRIMARY KEY,
    first_name VARCHAR(20) NOT NULL,
    last_name VARCHAR(20) NOT NULL,
    title VARCHAR(10) NOT NULL DEFAULT 'NONE'
);

INSERT INTO people_target VALUES (1, 'John', 'Smith', 'Mr');
INSERT INTO people_target VALUES (2, 'alice', 'jones', 'Mrs');
INSERT INTO people_source VALUES (2, 'Alice', 'Jones', 'Mrs.');
INSERT INTO people_source VALUES (3, 'Jane', 'Doe', 'Miss');
INSERT INTO people_source VALUES (4, 'Dave', 'Brown', 'Mr');

MERGE Statement - Insert and Update Case

Spark

MERGE INTO people_target pt 
USING people_source ps 
ON    (pt.person_id = ps.person_id) 
WHEN MATCHED THEN UPDATE 
  SET pt.first_name = ps.first_name, 
      pt.last_name = ps.last_name, 
      pt.title = DEFAULT 
WHEN NOT MATCHED THEN INSERT 
  (pt.person_id, pt.first_name, pt.last_name, pt.title) 
  VALUES (ps.person_id, ps.first_name, ps.last_name, ps.title);

SELECT * FROM people_target;
PERSON_ID|FIRST_NAME|LAST_NAME|TITLE|
---------+----------+---------+-----+
        1|John      |Smith    |Mr   |
        2|Alice     |Jones    |NONE |
        3|Jane      |Doe      |Miss |
        4|Dave      |Brown    |Mr   |

Snowflake

MERGE INTO people_target2 pt 
USING people_source ps 
ON    (pt.person_id = ps.person_id) 
WHEN MATCHED THEN UPDATE 
  SET pt.first_name = ps.first_name, 
      pt.last_name = ps.last_name, 
      pt.title = DEFAULT 
WHEN NOT MATCHED THEN INSERT 
  (pt.person_id, pt.first_name, pt.last_name, pt.title) 
  VALUES (ps.person_id, ps.first_name, ps.last_name, ps.title);

SELECT * FROM PUBLIC.people_target ORDER BY person_id;
PERSON_ID|FIRST_NAME|LAST_NAME|TITLE|
---------+----------+---------+-----+
        1|John      |Smith    |Mr   |
        2|Alice     |Jones    |NONE |
        3|Jane      |Doe      |Miss |
        4|Dave      |Brown    |Mr   |

The INSERT and UPDATE actions are equivalently functional in Snowflake. Likewise in both languages, you can specify DEFAULT as expr to explicitly update the column to its default value.

Spark also has the option to insert and update without specifying the affected columns. In this case, the action is applied to all columns. This action assumes that the source table has the same columns as the destination table; otherwise, the query will return a parsing error.

UPDATE SET *
-- This is equivalent to UPDATE SET col1 = source.col1 [, col2 = source.col2 ...]

INSERT * 
-- This is equivalent to INSERT (col1 [, col2 ...]) VALUES (source.col1 [, source.col2 ...]

Snowflake does not have these options, so for these scenarios, the equivalent of listing all the columns of the target table is applied.

MERGE Statement - Delete Case

MERGE INTO people_target pt 
USING people_source ps 
ON    (pt.person_id = ps.person_id)
WHEN MATCHED AND pt.person_id < 3 THEN DELETE
WHEN NOT MATCHED BY TARGET THEN INSERT *;

SELECT * FROM people_target;
PERSON_ID|FIRST_NAME|LAST_NAME|TITLE|
---------+----------+---------+-----+
        1|John      |Smith    |Mr   |
        3|Jane      |Doe      |Miss |
        4|Dave      |Brown    |Mr   |

Snowflake

MERGE INTO people_target pt 
USING people_source ps 
ON    (pt.person_id = ps.person_id)
WHEN MATCHED AND pt.person_id < 3 THEN DELETE
WHEN NOT MATCHED THEN INSERT 
  (pt.person_id, pt.first_name, pt.last_name, pt.title) 
  VALUES (ps.person_id, ps.first_name, ps.last_name, ps.title);

SELECT * FROM people_target;
PERSON_ID|FIRST_NAME|LAST_NAME|TITLE|
---------+----------+---------+-----+
        1|John      |Smith    |Mr   |
        3|Jane      |Doe      |Miss |
        4|Dave      |Brown    |Mr   |

The DELETE action is equivalently functional in Snowflake, as is the option to include an extra condition for the MATCHED/NOT MATCHED.

WHEN NOT MATCHED BY TARGET can be used as an alias for WHEN NOT MATCHED.

MERGE Statement - WHEN NOT MATCHED BY SOURCE

Snowflake does not support this clause, so a workaround is necessary to support it. Below is the workaround recommended for this case with the DELETE action, the same can be used for the UPDATE action

MERGE INTO people_target pt 
USING people_source ps 
ON pt.person_id = ps.person_id
WHEN NOT MATCHED BY SOURCE THEN DELETE;

SELECT * FROM people_target;
PERSON_ID|FIRST_NAME|LAST_NAME|TITLE|
---------+----------+---------+-----+
        2|Alice     |Jones    |NONE |

Snowflake

MERGE INTO people_target pt 
USING (
    SELECT 
        pt.person_id 
    FROM
        people_target pt LEFT 
    JOIN people_source ps ON pt.person_id = ps.person_id
    WHERE 
        ps.person_id is null
) s_src
    ON s_src.person_id = pt.person_id
WHEN MATCHED THEN DELETE;

SELECT * FROM people_target;
PERSON_ID|FIRST_NAME|LAST_NAME|TITLE|
---------+----------+---------+-----+
        2|Alice     |Jones    |NONE |

The DELETE action is equivalently functional in Snowflake, as is the option to include an extra condition for the MATCHED/NOT MATCHED.

Known issues

1. MERGE is very similar in both languages

Although Spark has some extra functionalities, these can be supported in Snowflake through workarounds as shown previously.

Related EWIs

No related EWIs.

In Snowflake, the MERGE syntax is (For more information check ):

WHEN NOT MATCHED BY SOURCE clauses are executed when a target row does not match any rows in the source table based on the merge_condition and not_match_by_source_condition (if applied) evaluates to true ().

Spark documentation
Snowflake documentation
Spark documentation