SPRKPY1009

pyspark.sql.dataframe.DataFrame.approxQuantile

Message: pyspark.sql.dataframe.DataFrame.approxQuantile has a workaround

Category: Warning.

Description

This issue appears when the tool detects the usage of pyspark.sql.dataframe.DataFrame.approxQuantile which has a workaround.

Scenario

Input

It's important understand that Pyspark uses two different approxQuantile functions, here we use the DataFrame approxQuantile version

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
data = [['Sun', 10],
        ['Mon', 64],
        ['Thr', 12],
        ['Wen', 15],
        ['Thu', 68],
        ['Fri', 14],
        ['Sat', 13]]

columns = ['Day', 'Ammount']
df = spark.createDataFrame(data, columns)
df.approxQuantile('Ammount', [0.25, 0.5, 0.75], 0)

Output

SMA returns the EWI SPRKPY1009 over the line where approxQuantile is used, so you can use to identify where to fix.

Recommended fix

Use Snowpark approxQuantile method. Some parameters don't match so they require some manual adjustments. for the output code's example a recommended fix could be:

pyspark.sql.dataframe.DataFrame.approxQuantile's relativeError parameter does't exist in SnowPark.

Additional recommendations

Last updated