DataFrame Class
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
A distributed collection of data organized into named columns.
public sealed class DataFrame
type DataFrame = class
Public NotInheritable Class DataFrame
Inheritance
Object → DataFrame
Properties
Item[String] |
Selects column based on the column name. |
Methods
Agg(Column, Column[]) |
Aggregates on the entire DataFrame without groups. |
Alias(String) |
Returns a new DataFrame with an alias set. |
As(String) |
Returns a new DataFrame with an alias set. |
Cache() |
Persist this DataFrame with the default storage level MEMORY_AND_DISK. |
Checkpoint(Boolean) |
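Returns a checkpointed version of this DataFrame. |
Coalesce(Int32) |
Returns a new DataFrame that has exactly the given number of partitions, when fewer partitions are requested. |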
Col(String) |
Selects column based on the column name. |
Collect() |
Returns an array that contains all rows in this DataFrame. |
ColRegex(String) |
Selects column based on the column name specified as a regex. |
Columns() |
Returns all column names. |
Count() |
Returns the number of rows in the DataFrame. |
CreateGlobalTempView(String) |
Creates a global temporary view using the given name. The lifetime of this temporary view is tied to this Spark application. |
CreateOrReplaceGlobalTempView(String) |
Creates or replaces a global temporary view using the given name. The lifetime of this temporary view is tied to this Spark application. |
CreateOrReplaceTempView(String) |
Creates or replaces a local temporary view using the given name. The lifetime of this
temporary view is tied to the SparkSession that created this DataFrame. |
CreateTempView(String) |
Creates a local temporary view using the given name. The lifetime of this
temporary view is tied to the SparkSession that created this DataFrame. |
CrossJoin(DataFrame) |
Explicit Cartesian join with another DataFrame. |
Cube(Column[]) |
Create a multi-dimensional cube for the current DataFrame using the specified columns. |
Cube(String, String[]) |
Create a multi-dimensional cube for the current DataFrame using the specified columns. |
Describe(String[]) |
Computes basic statistics for numeric and string columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical or string columns. |
Distinct() |
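Returns a new DataFrame that contains only the unique rows from this DataFrame. |
Drop(Column) |
Returns a new DataFrame with a column dropped. |
Drop(String[]) |
Returns a new DataFrame with columns dropped. |
DropDuplicates() |
Returns a new DataFrame that contains only the unique rows from this DataFrame. |
DropDuplicates(String, String[]) |
Returns a new DataFrame with duplicate rows removed, considering only the given subset of columns. |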
DTypes() |
Returns all column names and their data types as an IEnumerable of Tuples. |
Except(DataFrame) |
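Returns a new DataFrame containing rows in this DataFrame but not in another DataFrame. |
ExceptAll(DataFrame) |
Returns a new DataFrame containing rows in this DataFrame but not in another DataFrame, while preserving duplicates. |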
Explain(Boolean) |
Prints the plans (logical and physical) to the console for debugging purposes. |
Explain(String) |
Prints the plans (logical and physical) with a format specified by a given explain mode. |
Filter(Column) |
Filters rows using the given condition. |
Filter(String) |
Filters rows using the given SQL expression. |
First() |
Returns the first row. Alias for Head(). |
GroupBy(Column[]) |
Groups the DataFrame using the specified columns, so we can run aggregation on them. |
GroupBy(String, String[]) |
Groups the DataFrame using the specified columns. |
Head() |
Returns the first row. |
Head(Int32) |
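Returns the first n rows. |
Hint(String, Object[]) |
Specifies some hint on the current DataFrame. |
Intersect(DataFrame) |
Returns a new DataFrame containing rows only in both this DataFrame and another DataFrame. |
IntersectAll(DataFrame) |
Returns a new DataFrame containing rows only in both this DataFrame and another DataFrame, while preserving duplicates. |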
IsEmpty() |
Returns true if this DataFrame is empty. |
IsLocal() |
Returns true if the Collect() and Take() methods can be run locally without any Spark executors. |
IsStreaming() |
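Returns true if this DataFrame contains one or more sources that continuously return data as it arrives. |
Join(DataFrame, Column, String) |
Join with another DataFrame, using the given join expression and join type. |
Join(DataFrame, IEnumerable<String>, String) |
Equi-join with another DataFrame using the given columns and join type. |
Join(DataFrame, String) |
Inner equi-join with another DataFrame using the given column. |
Join(DataFrame) |
Join with another DataFrame. |
Limit(Int32) |
Returns a new DataFrame by taking the first n rows. |
LocalCheckpoint(Boolean) |
Returns a locally checkpointed version of this DataFrame. |
Na() |
Returns a DataFrameNaFunctions for working with missing data. |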
Observe(String, Column, Column[]) |
Define (named) metrics to observe on the Dataset. This method returns an 'observed' DataFrame that returns the same result as the input, with the following guarantees:
Please note that continuous execution is currently not supported. |
OrderBy(Column[]) |
Returns a new Dataset sorted by the given expressions. |
OrderBy(String, String[]) |
Returns a new Dataset sorted by the given expressions. |
Persist() |
Persist this DataFrame with the default storage level MEMORY_AND_DISK. |
Persist(StorageLevel) |
Persist this DataFrame with the given storage level. |
PrintSchema() |
Prints the schema to the console in a nice tree format. |
PrintSchema(Int32) |
Prints the schema up to the given level to the console in a nice tree format. |
RandomSplit(Double[], Nullable<Int64>) |
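Randomly splits this DataFrame with the provided weights. |
Repartition(Column[]) |
Returns a new DataFrame partitioned by the given partitioning expressions. |
Repartition(Int32, Column[]) |
Returns a new DataFrame partitioned by the given partitioning expressions into the given number of partitions. |
Repartition(Int32) |
Returns a new DataFrame that has exactly the given number of partitions. |
RepartitionByRange(Column[]) |
Returns a new DataFrame range-partitioned by the given partitioning expressions. |
RepartitionByRange(Int32, Column[]) |
Returns a new DataFrame range-partitioned by the given partitioning expressions into the given number of partitions. |
Rollup(Column[]) |
Create a multi-dimensional rollup for the current DataFrame using the specified columns. |
Rollup(String, String[]) |
Create a multi-dimensional rollup for the current DataFrame using the specified columns. |
Sample(Double, Boolean, Nullable<Int64>) |
Returns a new DataFrame by sampling a fraction of rows. |
Schema() |
Returns the schema associated with this DataFrame. |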
Select(Column[]) |
Selects a set of column based expressions. |
Select(String, String[]) |
Selects a set of columns. This is a variant of Select() that can only select existing columns using column names (i.e. cannot construct expressions). |
SelectExpr(String[]) |
Selects a set of SQL expressions. This is a variant of Select() that accepts SQL expressions. |
Show(Int32, Int32, Boolean) |
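Displays rows of the DataFrame in tabular form. |
Sort(Column[]) |
Returns a new DataFrame sorted by the given expressions. |
Sort(String, String[]) |
Returns a new DataFrame sorted by the given expressions. |
SortWithinPartitions(Column[]) |
Returns a new DataFrame with each partition sorted by the given expressions. |
SortWithinPartitions(String, String[]) |
Returns a new DataFrame with each partition sorted by the given expressions. |
Stat() |
Returns a DataFrameStatFunctions for working with statistical functions. |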
StorageLevel() |
Get the DataFrame's current StorageLevel(). |
Summary(String[]) |
Computes specified statistics for numeric and string columns. |
Tail(Int32) |
Returns the last n rows in the DataFrame. |
Take(Int32) |
Returns the first n rows in the DataFrame. |
ToDF() |
Converts this strongly typed collection of data to a generic DataFrame. |
ToDF(String[]) |
Converts this strongly typed collection of data to a generic DataFrame with columns renamed. |
ToJSON() |
Returns the content of the DataFrame as a DataFrame of JSON strings. |
ToLocalIterator() |
Returns an iterator that contains all of the rows in this DataFrame. |
ToLocalIterator(Boolean) |
Returns an iterator that contains all of the rows in this DataFrame. |
Transform(Func<DataFrame,DataFrame>) |
Concise syntax for chaining custom transformations. |
Union(DataFrame) |
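Returns a new DataFrame containing the union of rows in this DataFrame and another DataFrame. |
UnionByName(DataFrame) |
Returns a new DataFrame containing the union of rows in this DataFrame and another DataFrame, resolving columns by name. |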
Unpersist(Boolean) |
Mark the Dataset as non-persistent, and remove all blocks for it from memory and disk. |
Where(Column) |
Filters rows using the given condition. This is an alias for Filter(). |
Where(String) |
Filters rows using the given SQL expression. This is an alias for Filter(). |
WithColumn(String, Column) |
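Returns a new DataFrame by adding a column or replacing the existing column that has the same name. |
WithColumnRenamed(String, String) |
Returns a new DataFrame with a column renamed.
This is a no-op if the schema doesn't contain the existing column name. |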
WithWatermark(String, String) |
Defines an event time watermark for this DataFrame. A watermark tracks a point in time before which we assume no more late data is going to arrive. |
Write() |
Interface for saving the content of the non-streaming Dataset out into external storage. |
WriteStream() |
Interface for saving the content of the streaming Dataset out into external storage. |
WriteTo(String) |
Create a write configuration builder for v2 sources. |
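Example

The following is a minimal sketch of how several of the members listed above might be chained together. It is illustrative rather than an official sample. It assumes the Microsoft.Spark package is referenced and that the application is launched through spark-submit against a working Spark installation. The application name, the `parity` column, and the `numbers` view name are hypothetical choices made for this example.

```csharp
using Microsoft.Spark.Sql;

class DataFrameSketch
{
    static void Main()
    {
        // Assumption: a Spark runtime is available and this app is started via spark-submit.
        SparkSession spark = SparkSession
            .Builder()
            .AppName("dataframe-sketch") // hypothetical application name
            .GetOrCreate();

        // Range(0, 10) produces a DataFrame with a single column named "id".
        DataFrame numbers = spark.Range(0, 10);

        // Item[String] (the indexer) selects a column by name;
        // WithColumn adds a derived column ("parity" is a hypothetical name).
        DataFrame labeled = numbers.WithColumn("parity", numbers["id"] % 2);

        // Filter keeps rows that match the condition; GroupBy plus Count aggregates them.
        DataFrame counts = labeled
            .Filter(labeled["id"] > 3)
            .GroupBy("parity")
            .Count()
            .OrderBy("parity");

        // PrintSchema and Show write to the console.
        counts.PrintSchema();
        counts.Show();

        // Register a local temporary view ("numbers" is a hypothetical view name)
        // and query it with SQL through the same SparkSession.
        labeled.CreateOrReplaceTempView("numbers");
        spark.Sql("SELECT parity, COUNT(*) AS n FROM numbers GROUP BY parity").Show();

        spark.Stop();
    }
}
```

Each transformation returns a new DataFrame and leaves the original unchanged, which is why the intermediate results can be kept in separate variables and reused.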