SPRKSCL1100

org.apache.spark.sql.DataFrame.repartition

This issue code has been deprecated since Spark Conversion Core 2.3.22.

Message: Repartition is not supported.

Category: Parsing error.

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.DataFrame.repartition function, which is not supported by Snowpark. Snowflake manages the storage and the workload on the clusters, which makes the repartition operation inapplicable.

Scenario

Input

Below is an example of the org.apache.spark.sql.DataFrame.repartition function used to return a new DataFrame partitioned by the given partitioning expressions.

    var nameData = Seq("James", "Sarah", "Dylan", "Leila", "Laura", "Peter")
    var jobData = Seq("Police", "Doctor", "Actor", "Teacher", "Dentist", "Fireman")
    var ageData = Seq(40, 38, 34, 27, 29, 55)

    val dfName = nameData.toDF("name")
    val dfJob = jobData.toDF("job")
    val dfAge = ageData.toDF("age")

    val dfRepartitionByExpression = dfName.repartition($"name")

    val dfRepartitionByNumber = dfJob.repartition(3)

    val dfRepartitionByBoth = dfAge.repartition(3, $"age")

    val joinedDf = dfRepartitionByExpression.join(dfRepartitionByNumber)

Output

The SMA adds the EWI SPRKSCL1100 to the output code to let you know that this function is not supported by Snowpark.

Recommended Fix

Snowflake manages the storage and the workload on the clusters, so the repartition operation is inapplicable. The calls to repartition before the join are therefore not required and can be removed.
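As a minimal sketch of this fix in Snowpark Scala, the join from the input example can be written directly against the unpartitioned DataFrames. The object name RepartitionFreeJoin and the connection file profile.properties are hypothetical and only serve the illustration. The sketch assumes a working Snowflake connection.

```scala
import com.snowflake.snowpark.Session

// Hypothetical wrapper object, used only to make the sketch self-contained.
object RepartitionFreeJoin {
  def main(args: Array[String]): Unit = {
    // Assumption: "profile.properties" is a placeholder for your own connection file.
    val session = Session.builder.configFile("profile.properties").create
    // Brings toDF into scope for local Seq values.
    import session.implicits._

    val dfName = Seq("James", "Sarah", "Dylan", "Leila", "Laura", "Peter").toDF("name")
    val dfJob = Seq("Police", "Doctor", "Actor", "Teacher", "Dentist", "Fireman").toDF("job")

    // No repartition call: the DataFrames are joined as they are.
    // As in the input example, no join condition is given.
    val joinedDf = dfName.join(dfJob)
    joinedDf.show()
  }
}
```

Because the original join has no condition, the result pairs every name with every job. If that is not the intent, pass an explicit join condition.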

Additional recommendations
