SPRKPY1063
pyspark.sql.pandas.functions.pandas_udf has workaround.
Description
This issue appears when the tool detects the usage of pyspark.sql.pandas.functions.pandas_udf which has a workaround.
Input code:
@pandas_udf(schema, PandasUDFType.GROUPED_MAP)
def modify_df(pdf):
return pd.DataFrame({'result': pdf['col1'] + pdf['col2'] + 1})
df = spark.createDataFrame([(1, 2), (3, 4), (1, 1)], ["col1", "col2"])
new_df = df.groupby().apply(modify_df)
Output code:
#EWI: SPRKPY1062 => pyspark.sql.pandas.functions.pandas_udf has a workaround, see documentation for more info
@pandas_udf(schema, PandasUDFType.GROUPED_MAP)
def modify_df(pdf):
return pd.DataFrame({'result': pdf['col1'] + pdf['col2'] + 1})
df = spark.createDataFrame([(1, 2), (3, 4), (1, 1)], ["col1", "col2"])
new_df = df.groupby().apply(modify_df)
Scenario
@pandas_udf(schema : str, PandasUDFType.GROUPED_MAP : int)
Action: Specify explicitly the parameters types as a new parameter 'input_types', and remove 'functionType' parameter if applies. Created function must be called inside a select statement.
Manual Adjustment:
@pandas_udf(
return_type = schema,
input_types = [PandasDataFrameType([IntegerType(), IntegerType()])]
)
def modify_df(pdf):
return pd.DataFrame({'result': pdf['col1'] + pdf['col2'] + 1})
df = spark.createDataFrame([(1, 2), (3, 4), (1, 1)], ["col1", "col2"])
new_df = df.groupby().apply(modify_df) # You must modify function call to be a select and not an apply
Recommendation
For more support, you can email us at [email protected]. If you have a contract for support with Snowflake, reach out to your sales engineer and they can direct your support needs.
Last updated