Dataset
org.apache.spark.sql.Dataset[T] => com.snowflake.snowpark.DataFrame
This section describes the mappings from org.apache.spark.sql.Dataset[T] to com.snowflake.snowpark.DataFrame. Because Snowpark has no Dataset class, these Dataset methods are mapped to methods on the DataFrame class.
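As a brief illustration of the mapping, the sketch below shows a Spark-style call alongside its Snowpark counterpart. This is a minimal sketch, not a definitive implementation: the connection config file, the table name T1, and the columns A and B are illustrative assumptions, not from this document, and Snowpark's sort is used here as the counterpart for sorting since Snowpark's DataFrame exposes sort rather than orderBy.

```scala
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._

// Sketch only: assumes a Snowflake connection configured in "profile.properties"
// and an existing table T1 with numeric columns A and B (illustrative names).
object MappingSketch {
  def main(args: Array[String]): Unit = {
    val session = Session.builder.configFile("profile.properties").create
    val df: DataFrame = session.table("T1")

    // Spark: ds.filter(col("A") > 1)   ->  Snowpark: df.filter(col("A") > 1)
    val filtered = df.filter(col("A") > 1)

    // Spark: ds.orderBy(col("B").desc) ->  Snowpark: df.sort(col("B").desc)
    val ordered = df.sort(col("B").desc)

    filtered.show()
    ordered.show()
  }
}
```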
Spark | Snowpark | Notes |
---|---|---|
cache() | | Cache is an alias for persist. |
dropDuplicates() | ||
dropDuplicates(Seq colNames) | ||
dropDuplicates(String[] colNames) | ||
dropDuplicates(String col1, Seq cols) | ||
dropDuplicates(String col1, String... cols) | ||
filter(Column condition) | filter(condition: Column): DataFrame | Mapped to the same-named method in com.snowflake.snowpark.DataFrame |
orderBy(Column... sortExprs) | ||
orderBy(Seq[Column] sortExprs) | ||
orderBy(String sortCol, Seq[String] sortCols) | * | |
orderBy(String sortCol, String... sortCols) | * | |
persist() | ||
persist(newLevel: StorageLevel) | ||
repartition(partitionExprs: Column*) | N/A | Repartition is a Spark concept that is not needed in Snowpark |
repartition(numPartitions: Int, partitionExprs: Column*) | N/A | Repartition is a Spark concept that is not needed in Snowpark |
repartition(numPartitions: Int) | N/A | Repartition is a Spark concept that is not needed in Snowpark |
repartitionByRange(partitionExprs: Column*): DataFrame | N/A | Repartition by range is a Spark concept that is not needed in Snowpark |
repartitionByRange(numPartitions: Int, partitionExprs: Column*): DataFrame | N/A | Repartition by range is a Spark concept that is not needed in Snowpark |
transform(scala.Function1<Dataset,Dataset> t) | * | |
unionByName(Dataset other) | unionByName(other: DataFrame): DataFrame | Pending: Functional comparison |
unionByName(Dataset other, boolean allowMissingColumns) | ** | |
withColumn(String colName, Column col) | ||
withColumnRenamed(String existingName, String newName) | ||
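The sketch below exercises several of the mapped methods from the table. It is a hedged example, assuming a configured connection and illustrative tables T1 and T2 that share columns A and B; none of those names come from this document, and rename is shown only as a plausible Snowpark counterpart for withColumnRenamed, which the table above leaves unmapped.

```scala
import com.snowflake.snowpark._
import com.snowflake.snowpark.functions._

// Sketch only: connection details and table/column names are illustrative assumptions.
object DatasetMappingExamples {
  def main(args: Array[String]): Unit = {
    val session = Session.builder.configFile("profile.properties").create
    val df1 = session.table("T1") // assumed columns: A, B
    val df2 = session.table("T2") // assumed columns: A, B (possibly in a different order)

    // Spark: ds.dropDuplicates("A")   -> Snowpark: dropDuplicates takes column names
    val deduped = df1.dropDuplicates("A")

    // Spark: ds.unionByName(other)    -> Snowpark: unionByName(other: DataFrame)
    val combined = df1.unionByName(df2)

    // Spark: ds.withColumn("C", expr) -> Snowpark: withColumn(colName, col)
    val extended = df1.withColumn("C", col("A") + col("B"))

    // Spark: ds.withColumnRenamed("A", "A2") -> a possible Snowpark counterpart: rename
    val renamed = df1.rename("A2", col("A"))

    deduped.show(); combined.show(); extended.show(); renamed.show()
  }
}
```

Note that unionByName matches columns by name rather than by position, which is why it tolerates T2 having its columns in a different order.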