SPRKSCL1102

org.apache.spark.sql.functions.explode

This issue code has been deprecated since Spark Conversion Core 2.3.22

Message:Explode is not supported

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.explode function, which is not supported by Snowpark.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.explode function used to get the consolidated information of the array fields of the dataset.

    val explodeData = Seq(
      Row("Cat", Array("Gato","Chat")),
      Row("Dog", Array("Perro","Chien")),
      Row("Bird", Array("Ave","Oiseau"))
    )

    val explodeSchema = StructType(
      List(
        StructField("Animal", StringType),
        StructField("Translation", ArrayType(StringType))
      )
    )

    val rddExplode = session.sparkContext.parallelize(explodeData)

    val dfExplode = session.createDataFrame(rddExplode, explodeSchema)

    dfExplode.select(explode(dfExplode("Translation").alias("exploded")))

Output

The SMA adds the EWI SPRKSCL1102 to the output code to let you know that this function is not supported by Snowpark.

    val explodeData = Seq(
      Row("Cat", Array("Gato","Chat")),
      Row("Dog", Array("Perro","Chien")),
      Row("Bird", Array("Ave","Oiseau"))
    )

    val explodeSchema = StructType(
      List(
        StructField("Animal", StringType),
        StructField("Translation", ArrayType(StringType))
      )
    )

    val rddExplode = session.sparkContext.parallelize(explodeData)

    val dfExplode = session.createDataFrame(rddExplode, explodeSchema)

    /*EWI: SPRKSCL1102 => Explode is not supported */
    dfExplode.select(explode(dfExplode("Translation").alias("exploded")))

Recommended Fix

Since explode is not supported by Snowpark, the function flatten could be used as a substitute.

The following fix creates flatten of the dfExplode dataframe, then makes the query to replicate the result in Spark.

    val explodeData = Seq(
      Row("Cat", Array("Gato","Chat")),
      Row("Dog", Array("Perro","Chien")),
      Row("Bird", Array("Ave","Oiseau"))
    )

    val explodeSchema = StructType(
      List(
        StructField("Animal", StringType),
        StructField("Translation", ArrayType(StringType))
      )
    )

    val rddExplode = session.sparkContext.parallelize(explodeData)

    val dfExplode = session.createDataFrame(rddExplode, explodeSchema)

     var dfFlatten = dfExplode.flatten(col("Translation")).alias("exploded")
                              .select(col("exploded.value").alias("Translation"))

Additional recommendations

Last updated