SPRKSCL1155

org.apache.spark.sql.functions.countDistinct

This issue code has been deprecated since Spark Conversion Core Version 4.3.2

Message: org.apache.spark.sql.functions.countDistinct has a workaround, see documentation for more info

Category: Warning

Description

This issue appears when the SMA detects a use of the org.apache.spark.sql.functions.countDistinct function, which has a workaround.

Scenario

Input

Below is an example of the org.apache.spark.sql.functions.countDistinct function, first used with column names as arguments and then with column objects.

val df = Seq(
  ("Alice", 1),
  ("Bob", 2),
  ("Alice", 3),
  ("Bob", 4),
  ("Alice", 1),
  ("Charlie", 5)
).toDF("name", "value")

val result1 = df.select(countDistinct("name", "value"))
val result2 = df.select(countDistinct(col("name"), col("value")))

Output

The SMA adds the EWI SPRKSCL1155 to the output code to let you know that this function is not fully supported by Snowpark, but it has a workaround.

val df = Seq(
  ("Alice", 1),
  ("Bob", 2),
  ("Alice", 3),
  ("Bob", 4),
  ("Alice", 1),
  ("Charlie", 5)
).toDF("name", "value")

/*EWI: SPRKSCL1155 => org.apache.spark.sql.functions.countDistinct has a workaround, see documentation for more info*/
val result1 = df.select(countDistinct("name", "value"))
/*EWI: SPRKSCL1155 => org.apache.spark.sql.functions.countDistinct has a workaround, see documentation for more info*/
val result2 = df.select(countDistinct(col("name"), col("value")))

Recommended fix

As a workaround, you can use the count_distinct function. For the Spark overload that receives string arguments, you additionally have to convert the strings into column objects using the com.snowflake.snowpark.functions.col function.

val df = Seq(
  ("Alice", 1),
  ("Bob", 2),
  ("Alice", 3),
  ("Bob", 4),
  ("Alice", 1),
  ("Charlie", 5)
).toDF("name", "value")

val result1 = df.select(count_distinct(col("name"), col("value")))
val result2 = df.select(count_distinct(col("name"), col("value")))

Additional recommendations

Last updated