This issue appears when the SMA detects a use of the org.apache.spark.sql.DataFrame.repartition function, which is not supported by Snowpark. Snowflake manages the storage and the workload on the clusters, which makes the repartition operation inapplicable.
Scenario
Input
Below is an example of the org.apache.spark.sql.DataFrame.repartition function used to return a new DataFrame partitioned by the given partitioning expressions.
Output
The SMA adds the EWI SPRKSCL1100 to the output code to let you know that this function is not supported by Snowpark.
var nameData = Seq("James", "Sarah", "Dylan", "Leila", "Laura", "Peter")
var jobData = Seq("Police", "Doctor", "Actor", "Teacher", "Dentist", "Fireman")
var ageData = Seq(40, 38, 34, 27, 29, 55)

val dfName = nameData.toDF("name")
val dfJob = jobData.toDF("job")
val dfAge = ageData.toDF("age")

/*EWI: SPRKSCL1100 => Repartition is not supported*/
val dfRepartitionByExpresion = dfName.repartition($"name")

/*EWI: SPRKSCL1100 => Repartition is not supported*/
val dfRepartitionByNumber = dfJob.repartition(3)

/*EWI: SPRKSCL1100 => Repartition is not supported*/
val dfRepartitionByBoth = dfAge.repartition(3, $"age")

val joinedDf = dfRepartitionByExpresion.join(dfRepartitionByNumber)
Recommended Fix
Snowflake manages the storage and the workload on the clusters, which makes the repartition operation inapplicable. The repartition call before the join is therefore not required at all and can simply be removed.
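As a minimal sketch of that fix, the scenario's join can be written against the original DataFrames with the repartition calls dropped. The object name RepartitionFix and the profile.properties connection file are assumptions made for illustration and do not come from the SMA output; the DataFrames are built with session.createDataFrame so the sketch does not rely on implicit conversions.

```scala
import com.snowflake.snowpark._

// Hypothetical wrapper object, used only to make the sketch self-contained.
object RepartitionFix {
  def main(args: Array[String]): Unit = {
    // Assumption: connection settings are stored in a local "profile.properties" file.
    val session = Session.builder.configFile("profile.properties").create

    val nameData = Seq("James", "Sarah", "Dylan", "Leila", "Laura", "Peter")
    val jobData = Seq("Police", "Doctor", "Actor", "Teacher", "Dentist", "Fireman")

    val dfName = session.createDataFrame(nameData).toDF("name")
    val dfJob = session.createDataFrame(jobData).toDF("job")

    // No repartition step: Snowflake manages data distribution itself,
    // so the DataFrames are joined directly.
    // As in the scenario, no join condition is given.
    val joinedDf = dfName.join(dfJob)

    joinedDf.show()
  }
}
```

The join itself is unchanged from the scenario; only the intermediate repartitioned DataFrames are gone, so any later code that referenced them should point at dfName and dfJob instead.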