SPRKSCL1130
org.apache.spark.sql.functions.greatest
Message: org.apache.spark.sql.functions.greatest has a workaround, see documentation for more info
Category: Warning
Description
This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.greatest function, which has a workaround.
Scenario
Input
Below is an example of the org.apache.spark.sql.functions.greatest function, first used with multiple column names as arguments and then with multiple column objects.
val df = Seq(
  ("apple", 10, 20, 15),
  ("banana", 5, 25, 18),
  ("mango", 12, 8, 30)
).toDF("fruit", "value1", "value2", "value3")
val result1 = df.withColumn("greatest", greatest("value1", "value2", "value3"))
val result2 = df.withColumn("greatest", greatest(col("value1"), col("value2"), col("value3")))
Output
The SMA adds the EWI SPRKSCL1130 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.
val df = Seq(
  ("apple", 10, 20, 15),
  ("banana", 5, 25, 18),
  ("mango", 12, 8, 30)
).toDF("fruit", "value1", "value2", "value3")
/*EWI: SPRKSCL1130 => org.apache.spark.sql.functions.greatest has a workaround, see documentation for more info*/
val result1 = df.withColumn("greatest", greatest("value1", "value2", "value3"))
/*EWI: SPRKSCL1130 => org.apache.spark.sql.functions.greatest has a workaround, see documentation for more info*/
val result2 = df.withColumn("greatest", greatest(col("value1"), col("value2"), col("value3")))
Recommended fix
Snowpark has an equivalent greatest function (com.snowflake.snowpark.functions.greatest) that receives multiple column objects as arguments. The Spark overload that receives column objects is therefore directly supported by Snowpark and does not require any changes.
For the overload that receives multiple string arguments, convert each string into a column object with the com.snowflake.snowpark.functions.col function as a workaround.
val df = Seq(
  ("apple", 10, 20, 15),
  ("banana", 5, 25, 18),
  ("mango", 12, 8, 30)
).toDF("fruit", "value1", "value2", "value3")
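// String overload: each column name is wrapped in col(...)
val result1 = df.withColumn("greatest", greatest(col("value1"), col("value2"), col("value3")))
// Column overload: no changes required
val result2 = df.withColumn("greatest", greatest(col("value1"), col("value2"), col("value3")))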
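If the string overload appears in many places, the conversion can be centralized instead of being repeated at each call site. The following is a minimal sketch, not part of the SMA output: the object GreatestByName and the method greatestByName are hypothetical names introduced here for illustration, and the sketch assumes the Snowpark Scala library is on the classpath.

```scala
import com.snowflake.snowpark.Column
import com.snowflake.snowpark.functions.{col, greatest}

// Hypothetical helper (not part of Snowpark or the SMA): it mirrors the shape
// of Spark's string overload, greatest(columnName, columnNames*).
object GreatestByName {
  def greatestByName(first: String, rest: String*): Column = {
    // Wrap every column name in col(...) so the call matches
    // Snowpark's greatest(exprs: Column*) signature.
    val columns: Seq[Column] = (first +: rest).map(name => col(name))
    greatest(columns: _*)
  }
}
```

With this helper in scope, the first call from the example could be written as df.withColumn("greatest", GreatestByName.greatestByName("value1", "value2", "value3")).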
Additional recommendations
For more support, you can email us at sma-support@snowflake.com or post an issue in the SMA.