Issue Analysis Issue Codes by Source Spark Scala SPRKSCL1102 org.apache.spark.sql.functions.explode
This issue code has been deprecated since Spark Conversion Core 2.3.22
Message:Explode is not supported
Category: Warning
Description
This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.explode function, which is not supported by Snowpark.
Scenario
Input
Below is an example of the org.apache.spark.sql.functions.explode
function used to get the consolidated information of the array fields of the dataset.
Copy val explodeData = Seq(
Row( "Cat" , Array( "Gato" , "Chat" )),
Row( "Dog" , Array( "Perro" , "Chien" )),
Row( "Bird" , Array( "Ave" , "Oiseau" ))
)
val explodeSchema = StructType(
List(
StructField( "Animal" , StringType),
StructField( "Translation" , ArrayType(StringType))
)
)
val rddExplode = session.sparkContext.parallelize(explodeData)
val dfExplode = session.createDataFrame(rddExplode, explodeSchema)
dfExplode.select(explode(dfExplode( "Translation" ).alias( "exploded" )))
Output
The SMA adds the EWI SPRKSCL1102
to the output code to let you know that this function is not supported by Snowpark.
Copy val explodeData = Seq(
Row( "Cat" , Array( "Gato" , "Chat" )),
Row( "Dog" , Array( "Perro" , "Chien" )),
Row( "Bird" , Array( "Ave" , "Oiseau" ))
)
val explodeSchema = StructType(
List(
StructField( "Animal" , StringType),
StructField( "Translation" , ArrayType(StringType))
)
)
val rddExplode = session.sparkContext.parallelize(explodeData)
val dfExplode = session.createDataFrame(rddExplode, explodeSchema)
/*EWI: SPRKSCL1102 => Explode is not supported */
dfExplode.select(explode(dfExplode( "Translation" ).alias( "exploded" )))
Recommended Fix
Since explode is not supported by Snowpark, the function flatten could be used as a substitute.
The following fix creates flatten of the dfExplode dataframe, then makes the query to replicate the result in Spark.
Copy val explodeData = Seq(
Row( "Cat" , Array( "Gato" , "Chat" )),
Row( "Dog" , Array( "Perro" , "Chien" )),
Row( "Bird" , Array( "Ave" , "Oiseau" ))
)
val explodeSchema = StructType(
List(
StructField( "Animal" , StringType),
StructField( "Translation" , ArrayType(StringType))
)
)
val rddExplode = session.sparkContext.parallelize(explodeData)
val dfExplode = session.createDataFrame(rddExplode, explodeSchema)
var dfFlatten = dfExplode.flatten(col( "Translation" )).alias( "exploded" )
.select(col( "exploded.value" ).alias( "Translation" ))
Additional recommendations