
SPRKSCL1101

org.apache.spark.sql.functions.broadcast, org.apache.spark.broadcast

This issue code has been deprecated since Spark Conversion Core 2.3.22

Message: Broadcast is not supported

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.broadcast function, which is not supported by Snowpark. This function is not supported because Snowflake does not support broadcast variables.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.broadcast function, which marks a DataFrame to be broadcast to every node of the Spark cluster when it is used in a join:

    var studentData = Seq(
      ("James", "Orozco", "Science"),
      ("Andrea", "Larson", "Business"),
    )

    var collegeData = Seq(
      ("Arts", 1),
      ("Business", 2),
      ("Science", 3)
    )

    val dfStudent = studentData.toDF("FirstName", "LastName", "CollegeName")
    val dfCollege = collegeData.toDF("CollegeName", "CollegeCode")

    dfStudent.join(
      broadcast(dfCollege),
      Seq("CollegeName")
    )

Output

The SMA adds the EWI SPRKSCL1101 to the output code to let you know that this function is not supported by Snowpark.

    var studentData = Seq(
      ("James", "Orozco", "Science"),
      ("Andrea", "Larson", "Business"),
    )

    var collegeData = Seq(
      ("Arts", 1),
      ("Business", 2),
      ("Science", 3)
    )

    val dfStudent = studentData.toDF("FirstName", "LastName", "CollegeName")
    val dfCollege = collegeData.toDF("CollegeName", "CollegeCode")

    dfStudent.join(
      /*EWI: SPRKSCL1101 => Broadcast is not supported*/
      broadcast(dfCollege),
      Seq("CollegeName")
    )

Recommended fix

Snowflake manages the storage and the workload on its clusters, so broadcast objects are not applicable. In many cases the call to broadcast may not be needed at all, but each case requires further analysis.

The recommended approach is to replace the Spark DataFrame broadcast with a regular Snowpark DataFrame, or to use a DataFrame method such as join.

For the input above, the fix is to pass the DataFrame dfCollege directly to the join, without wrapping it in broadcast.

    var studentData = Seq(
      ("James", "Orozco", "Science"),
      ("Andrea", "Larson", "Business"),
    )

    var collegeData = Seq(
      ("Arts", 1),
      ("Business", 2),
      ("Science", 3)
    )

    val dfStudent = studentData.toDF("FirstName", "LastName", "CollegeName")
    val dfCollege = collegeData.toDF("CollegeName", "CollegeCode")

    dfStudent.join(
      dfCollege,
      Seq("CollegeName")
    ).show()
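
The same reasoning applies when the source code uses a broadcast variable (org.apache.spark.broadcast) as a small lookup table: the lookup can usually be expressed as a join. The sketch below is only an illustration of that equivalence using plain Scala collections; it does not use Spark or Snowpark, and the object name BroadcastLookupVsJoin, the Student case class, and the helper names viaLookup and viaJoin are hypothetical. It assumes every lookup key is unique, as in the college data above.

```scala
object BroadcastLookupVsJoin {
  // Hypothetical row type, used only for this illustration.
  final case class Student(firstName: String, lastName: String, collegeName: String)

  val students: Seq[Student] = Seq(
    Student("James", "Orozco", "Science"),
    Student("Andrea", "Larson", "Business")
  )

  val colleges: Seq[(String, Int)] = Seq(
    ("Arts", 1),
    ("Business", 2),
    ("Science", 3)
  )

  // Broadcast-variable style: a small map is made available to every
  // worker and probed once per row.
  def viaLookup: Seq[(String, String, String, Int)] = {
    val codeByCollege: Map[String, Int] = colleges.toMap
    students.flatMap { s =>
      codeByCollege.get(s.collegeName).map(code => (s.firstName, s.lastName, s.collegeName, code))
    }
  }

  // Join style: the same result expressed as an inner join on CollegeName,
  // which is the shape a regular DataFrame join takes.
  def viaJoin: Seq[(String, String, String, Int)] =
    for {
      s <- students
      (name, code) <- colleges
      if name == s.collegeName
    } yield (s.firstName, s.lastName, s.collegeName, code)

  def main(args: Array[String]): Unit = {
    // Both formulations produce the same rows.
    assert(viaLookup == viaJoin)
    viaJoin.foreach(println)
  }
}
```

Because both formulations return the same rows, a broadcast lookup of this kind can typically be rewritten as a join between two regular DataFrames, subject to the case-by-case analysis mentioned above.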

Additional recommendations

  • The Snowflake architecture guide provides insight into how Snowflake manages storage.

  • The Snowpark DataFrame reference can help you adapt a particular broadcast scenario.

  • For more support, you can email us at sma-support@snowflake.com or post an issue in the SMA.

