SPRKPY1068
pyspark.sql.DataFrame.toPandas
Message: toPandas contains columns of type ArrayType that is not supported and has a workaround.
Category: Warning
Description
pyspark.sql.DataFrame.toPandas doesn't work properly if the DataFrame contains columns of type ArrayType. The workaround is to convert those columns into a Python dictionary with the json.loads method.
Scenario
Input
toPandas returns the data of the original DataFrame as a pandas DataFrame.
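from datetime import date, datetime
from pyspark.sql import Row

# Assumes an active SparkSession named spark.
sparkDF = spark.createDataFrame([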
Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0))
])
pandasDF = sparkDF.toPandas()
Output
The tool adds this EWI to let you know that toPandas is not supported if there are columns of type ArrayType, but that a workaround exists.
sparkDF = spark.createDataFrame([
Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0))
])
#EWI: SPRKPY1068 => toPandas doesn't work properly If there are columns of type ArrayType. The workaround for these cases is converting those columns into a Python Dictionary by using json.loads method. example: df[colName] = json.loads(df[colName]).
pandasDF = sparkDF.toPandas()
Recommended fix
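The workaround named in the EWI message can be sketched roughly as follows. This is a minimal illustration, not the tool's own fix: it assumes that each cell of an ArrayType column arrives as a JSON-encoded string after toPandas, and the helper name parse_json_array and the sample values are hypothetical. Note that json.loads returns a Python list when the JSON text is an array.

```python
import json


def parse_json_array(value):
    """Parse a JSON-encoded string; pass None and non-string values through.

    Hypothetical helper: assumes ArrayType cells arrive as JSON text.
    """
    if isinstance(value, str):
        return json.loads(value)
    return value


# Hypothetical cell values standing in for one ArrayType column
# of the pandas DataFrame returned by toPandas.
raw_column = ['[1, 2, 3]', None, '["a", "b"]']

parsed_column = [parse_json_array(v) for v in raw_column]
print(parsed_column)
```

On an actual pandas DataFrame, the same helper could be applied per column, for example pandasDF[colName] = pandasDF[colName].apply(parse_json_array), for each column whose original Spark type was ArrayType.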
Additional recommendations
For more support, you can email us at [email protected] or post an issue in the SMA.