Below is an example of the org.apache.spark.sql.functions.explode function used to get the consolidated information of the array fields of the dataset.
val explodeData = Seq(
Row("Cat", Array("Gato","Chat")),
Row("Dog", Array("Perro","Chien")),
Row("Bird", Array("Ave","Oiseau"))
)
val explodeSchema = StructType(
List(
StructField("Animal", StringType),
StructField("Translation", ArrayType(StringType))
)
)
val rddExplode = session.sparkContext.parallelize(explodeData)
val dfExplode = session.createDataFrame(rddExplode, explodeSchema)
dfExplode.select(explode(dfExplode("Translation").alias("exploded")))
Output
The SMA adds the EWI SPRKSCL1102 to the output code to let you know that this function is not supported by Snowpark.
val explodeData = Seq(
Row("Cat", Array("Gato","Chat")),
Row("Dog", Array("Perro","Chien")),
Row("Bird", Array("Ave","Oiseau"))
)
val explodeSchema = StructType(
List(
StructField("Animal", StringType),
StructField("Translation", ArrayType(StringType))
)
)
val rddExplode = session.sparkContext.parallelize(explodeData)
val dfExplode = session.createDataFrame(rddExplode, explodeSchema)
/*EWI: SPRKSCL1102 => Explode is not supported */
dfExplode.select(explode(dfExplode("Translation").alias("exploded")))
Recommended Fix
Since explode is not supported by Snowpark, the function flatten could be used as a substitute.
The following fix creates flatten of the dfExplode dataframe, then makes the query to replicate the result in Spark.
val explodeData = Seq(
Row("Cat", Array("Gato","Chat")),
Row("Dog", Array("Perro","Chien")),
Row("Bird", Array("Ave","Oiseau"))
)
val explodeSchema = StructType(
List(
StructField("Animal", StringType),
StructField("Translation", ArrayType(StringType))
)
)
val rddExplode = session.sparkContext.parallelize(explodeData)
val dfExplode = session.createDataFrame(rddExplode, explodeSchema)
var dfFlatten = dfExplode.flatten(col("Translation")).alias("exploded")
.select(col("exploded.value").alias("Translation"))