SPRKPY1073

pyspark.sql.functions.udf

Message: pyspark.sql.functions.udf without parameters or return type parameter are not supported

Category: Warning.

Description

This issue appears when the tool detects a use of pyspark.sql.functions.udf, as a function or as a decorator, in one of two unsupported cases: when it is called with no parameters, or when it is called without a return type parameter.

Scenarios

Scenario 1

Input

In PySpark you can create a user-defined function (UDF) without specifying input or return type parameters:

from pyspark.sql import SparkSession, DataFrameStatFunctions
from pyspark.sql.functions import col, udf

spark = SparkSession.builder.getOrCreate()
data = [['Q1', 'Test 1'],
        ['Q2', 'Test 2'],
        ['Q3', 'Test 1'],
        ['Q4', 'Test 1']]

columns = ['Quadrant', 'Value']
df = spark.createDataFrame(data, columns)

my_udf = udf(lambda s: len(s))
df.withColumn('Len Value', my_udf(col('Value'))).show()

Output

Snowpark requires the input and return types for a UDF. Because they are not provided here, SMA cannot determine these parameters.

Recommended fix

To fix this scenario, import the Snowpark types used for the input and the return value, and then pass the return_type and input_types parameters (input_types is a list) to the udf call that creates my_udf.
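A minimal sketch of the fix described above, assuming an open Snowpark session and a DataFrame with a string column named Value. The names value_length, add_len_column, and LEN_VALUE are hypothetical and chosen only for illustration. The Snowpark imports are deferred into the function so that the plain Python logic can be exercised without a Snowflake connection; in migrated code they would normally sit at the top of the file.

```python
def value_length(s: str) -> int:
    """Plain Python logic that the UDF wraps (hypothetical helper)."""
    return len(s)


def add_len_column(session, df):
    """Register value_length as a Snowpark UDF and apply it to df's Value column.

    Assumes `session` is an open snowflake.snowpark.Session and `df` is a
    Snowpark DataFrame that has a string column named Value.
    """
    # Deferred imports: only needed when a Snowpark session is available.
    from snowflake.snowpark.functions import col, udf
    from snowflake.snowpark.types import IntegerType, StringType

    # Both types are stated explicitly, which is what SPRKPY1073 asks for.
    my_udf = udf(
        value_length,
        return_type=IntegerType(),
        input_types=[StringType()],
        session=session,
    )
    # LEN_VALUE is an illustrative column name, not one taken from the tool.
    return df.with_column("LEN_VALUE", my_udf(col("Value")))
```

With a session and a DataFrame built by session.create_dataframe(data, schema=columns), calling add_len_column(session, df).show() would display the added column.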

Scenario 2

In PySpark you can use the @udf decorator without parameters.

Input

Output

In Snowpark, the return_type and input_types parameters of the udf decorator are required.

Recommended fix

To fix this scenario, import the Snowpark types used for the input and the return value, and then add the return_type and input_types parameters (input_types is a list) to the @udf decorator.
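The decorator form of the fix might look like the sketch below. It assumes an open Snowpark session is passed in; make_len_udf is a hypothetical factory used only so that the decorator runs once a session exists, and the choice of IntegerType and StringType assumes a function that maps a string to its length.

```python
def make_len_udf(session):
    """Build a Snowpark UDF with the decorator, stating both types explicitly.

    Assumes `session` is an open snowflake.snowpark.Session.
    """
    # Deferred imports: only needed when a Snowpark session is available.
    from snowflake.snowpark.functions import udf
    from snowflake.snowpark.types import IntegerType, StringType

    # The decorator carries the return type and the list of input types.
    @udf(return_type=IntegerType(), input_types=[StringType()], session=session)
    def my_udf(s):
        return len(s)

    return my_udf
```

When the script already has an active session at import time, the same decorator can be applied directly to a module-level function instead of going through a factory.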

Additional recommendations
