SPRKPY1011

pyspark.sql.dataframe.DataFrameStatFunctions.approxQuantile

Message: pyspark.sql.dataframe.DataFrameStatFunctions.approxQuantile has a workaround

Category: Warning.

Description

This issue appears when the tool detects the usage of pyspark.sql.dataframe.DataFrameStatFunctions.approxQuantile which has a workaround.

Scenario

Input

It's important understand that Pyspark uses two different approxQuantile functions, here we use the DataFrameStatFunctions approxQuantile version.

import tempfile
from pyspark.sql import SparkSession, DataFrameStatFunctions
spark = SparkSession.builder.getOrCreate()
data = [['Q1', 300000],
        ['Q2', 60000],
        ['Q3', 500002],
        ['Q4', 130000]]

columns = ['Quarter', 'Gain']
df = spark.createDataFrame(data, columns)
aprox_quantille = DataFrameStatFunctions(df).approxQuantile('Gain', [0.25, 0.5, 0.75], 0)
print(aprox_quantille)

Output

SMA returns the EWI SPRKPY1011 over the line where approxQuantile is used, so you can use to identify where to fix.

Recommended fix

You can use Snowpark approxQuantile method. Some parameters don't match so they require some manual adjustments. for the output code's example a recommended fix could be:

pyspark.sql.dataframe.DataFrame.approxQuantile's relativeError parameter does't exist in SnowPark.

Additional recommendations

Last updated