Message: pyspark.sql.functions.udf without parameters or return type parameter are not supported
Category: Warning.
Description
This issue appears when the tool detects pyspark.sql.functions.udf used as a function or as a decorator in one of two unsupported cases: when it is called with no parameters, or when it is called without the return type parameter.
Scenarios
Scenario 1
Input
In PySpark, you can create a user-defined function (UDF) without specifying input or return type parameters.
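As an illustration, here is a minimal PySpark sketch of this pattern. It is reconstructed from the converted code in this scenario and is an assumption, not the original input listing. The names `DATA`, `COLUMNS`, and `show_value_lengths` are hypothetical, and the PySpark import is deferred into the function only so that the snippet loads on a machine without PySpark.

```python
# Sample rows and column names, taken from the converted code in this scenario.
DATA = [['Q1', 'Test 1'], ['Q2', 'Test 2'], ['Q3', 'Test 1'], ['Q4', 'Test 1']]
COLUMNS = ['Quadrant', 'Value']


def show_value_lengths(spark):
    """Apply an untyped UDF; `spark` is an existing SparkSession."""
    # Deferred import (an assumption made for portability of this sketch).
    from pyspark.sql.functions import col, udf

    df = spark.createDataFrame(DATA, COLUMNS)

    # No return type is passed, so PySpark falls back to its default (StringType).
    my_udf = udf(lambda s: len(s))
    df.withColumn('Len Value', my_udf(col('Value'))).show()
```

Calling `show_value_lengths(SparkSession.builder.getOrCreate())` would run the example against a local Spark session.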
Output
Snowpark requires the input and return types for a UDF. Because they are not provided in the source code and the SMA cannot determine these parameters, the tool flags the call with an EWI comment.
from snowflake.snowpark import Session, DataFrameStatFunctions
from snowflake.snowpark.functions import col, udf

spark = Session.builder.getOrCreate()
spark.update_query_tag({"origin":"sf_sit","name":"sma","version":{"major":0,"minor":0,"patch":0},"attributes":{"language":"Python"}})

data = [['Q1', 'Test 1'], ['Q2', 'Test 2'], ['Q3', 'Test 1'], ['Q4', 'Test 1']]
columns = ['Quadrant', 'Value']
df = spark.createDataFrame(data, columns)

#EWI: SPRKPY1073 => pyspark.sql.functions.udf function without the return type parameter is not supported. See documentation for more info.
my_udf = udf(lambda s: len(s))
df.withColumn('Len Value', my_udf(col('Value'))).show()
Recommended fix
To fix this scenario, add the import for the input and return types, and then pass the return_type and input_types[] parameters to the udf function my_udf.
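The fix described above could look roughly like the following sketch. It assumes that the UDF takes one string and returns an integer, so `StringType` and `IntegerType` from `snowflake.snowpark.types` are the types to import. The names `DATA`, `COLUMNS`, and `show_value_lengths`, and passing the session explicitly through `session=`, are choices made for this example. The Snowpark imports are deferred into the function only so that the snippet loads without a Snowflake connection.

```python
# Sample rows and column names, taken from the converted code in this scenario.
DATA = [['Q1', 'Test 1'], ['Q2', 'Test 2'], ['Q3', 'Test 1'], ['Q4', 'Test 1']]
COLUMNS = ['Quadrant', 'Value']


def show_value_lengths(session):
    """Register a typed UDF and apply it; `session` is an active Snowpark Session."""
    # Deferred imports (an assumption made for portability of this sketch).
    from snowflake.snowpark.functions import col, udf
    from snowflake.snowpark.types import IntegerType, StringType

    df = session.createDataFrame(DATA, COLUMNS)

    # Both types are stated explicitly: one string argument in, an integer out.
    my_udf = udf(
        lambda s: len(s),
        return_type=IntegerType(),
        input_types=[StringType()],
        session=session,
    )
    df.withColumn('Len Value', my_udf(col('Value'))).show()
```

With a configured connection, `show_value_lengths(Session.builder.getOrCreate())` would register the UDF and print the resulting DataFrame.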
Scenario 2
Output
In Snowpark, all parameters of the udf decorator are required, so a @udf() decorator without parameters is flagged with an EWI comment.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, udf

spark = Session.builder.getOrCreate()
spark.update_query_tag({"origin":"sf_sit","name":"sma","version":{"major":0,"minor":0,"patch":0},"attributes":{"language":"Python"}})

data = [['Q1', 'Test 1'], ['Q2', 'Test 2'], ['Q3', 'Test 1'], ['Q4', 'Test 1']]
columns = ['Quadrant', 'Value']
df = spark.createDataFrame(data, columns)

#EWI: SPRKPY1073 => pyspark.sql.functions.udf decorator without parameters is not supported. See documentation for more info.
@udf()
def my_udf(str):
    return len(str)

df.withColumn('Len Value', my_udf(col('Value'))).show()
Recommended fix
To fix this scenario, add the import for the input and return types, and then pass the return_type and input_types[] parameters to the @udf decorator.
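A minimal sketch of the decorator form with both parameters supplied is shown here. It assumes a single string argument and an integer result. The names `DATA`, `COLUMNS`, and `show_value_lengths`, and the explicit `session=` argument, are illustrative and do not come from the SMA output. The Snowpark imports are deferred into the function only so that the snippet loads without a Snowflake connection.

```python
# Sample rows and column names, taken from the converted code in this scenario.
DATA = [['Q1', 'Test 1'], ['Q2', 'Test 2'], ['Q3', 'Test 1'], ['Q4', 'Test 1']]
COLUMNS = ['Quadrant', 'Value']


def show_value_lengths(session):
    """Define a typed UDF with the decorator and apply it to a DataFrame."""
    # Deferred imports (an assumption made for portability of this sketch).
    from snowflake.snowpark.functions import col, udf
    from snowflake.snowpark.types import IntegerType, StringType

    # The decorator now carries the return type and the list of input types.
    @udf(return_type=IntegerType(), input_types=[StringType()], session=session)
    def my_udf(value):
        return len(value)

    df = session.createDataFrame(DATA, COLUMNS)
    df.withColumn('Len Value', my_udf(col('Value'))).show()
```

The parameter is named `value` rather than `str` here to avoid shadowing the Python built-in; the behavior is otherwise the same as in the flagged code.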