SparkSession Class

Definition

The entry point to programming Spark with the Dataset and DataFrame API.

public sealed class SparkSession : IDisposable
type SparkSession = class
    interface IDisposable
Public NotInheritable Class SparkSession
Implements IDisposable
Inheritance
Object → SparkSession
Implements
IDisposable

Properties

Catalog

Interface through which the user may create, drop, alter, or query underlying databases, tables, functions, etc.

SparkContext

Returns the SparkContext object associated with this SparkSession.

Methods

Active()

Returns the currently active SparkSession, otherwise the default one. If there is no default SparkSession, throws an exception.

Builder()

Creates a Builder object for SparkSession.
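A typical entry point can be sketched as follows; the app name is illustrative, and running this requires a configured .NET for Apache Spark deployment:

```csharp
using Microsoft.Spark.Sql;

class Program
{
    static void Main()
    {
        // Build (or reuse) a session; the app name is illustrative.
        SparkSession spark = SparkSession
            .Builder()
            .AppName("docs-example")
            .GetOrCreate();

        // Minimal smoke test: a one-column DataFrame of ids 0..2.
        spark.Range(3).Show();

        spark.Stop();
    }
}
```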

ClearActiveSession()

Clears the active SparkSession for current thread. Subsequent calls to GetOrCreate() will return the first created context instead of a thread-local override.

ClearDefaultSession()

Clears the default SparkSession that is returned by the builder.

Conf()

Runtime configuration interface for Spark. This is the interface through which the user can get and set all Spark and Hadoop configurations that are relevant to Spark SQL. When getting the value of a config, this defaults to the value set in the underlying SparkContext, if any.
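For instance, a Spark SQL setting can be overridden and read back at runtime. This sketch assumes an existing session `spark`; the key shown is a standard Spark SQL configuration:

```csharp
// Override, then read back, the shuffle-partition count.
spark.Conf().Set("spark.sql.shuffle.partitions", "8");
string partitions = spark.Conf().Get("spark.sql.shuffle.partitions");
```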

CreateDataFrame(IEnumerable<Boolean>)

Creates a DataFrame given data as an IEnumerable of type Boolean.

CreateDataFrame(IEnumerable<Date>)

Creates a DataFrame given data as an IEnumerable of type Date.

CreateDataFrame(IEnumerable<Double>)

Creates a DataFrame given data as an IEnumerable of type Double.

CreateDataFrame(IEnumerable<GenericRow>, StructType)

Creates a DataFrame from an IEnumerable containing GenericRows using the given schema. It is important to ensure that the structure of every GenericRow in the provided IEnumerable matches the provided schema; otherwise, a runtime exception will be thrown.
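A minimal sketch of this overload, assuming an existing session `spark` (the column names and values are illustrative):

```csharp
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;

var schema = new StructType(new[]
{
    new StructField("name", new StringType()),
    new StructField("age", new IntegerType())
});

var rows = new[]
{
    new GenericRow(new object[] { "Alice", 30 }),
    new GenericRow(new object[] { "Bob", 25 })
};

// Each GenericRow must line up with the schema, or a runtime
// exception is thrown when the DataFrame is evaluated.
DataFrame df = spark.CreateDataFrame(rows, schema);
```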

CreateDataFrame(IEnumerable<Int32>)

Creates a DataFrame given data as an IEnumerable of type Int32.

CreateDataFrame(IEnumerable<Nullable<Boolean>>)

Creates a DataFrame given data as an IEnumerable of type Nullable&lt;Boolean&gt;.

CreateDataFrame(IEnumerable<Nullable<Double>>)

Creates a DataFrame given data as an IEnumerable of type Nullable&lt;Double&gt;.

CreateDataFrame(IEnumerable<Nullable<Int32>>)

Creates a DataFrame given data as an IEnumerable of type Nullable&lt;Int32&gt;.

CreateDataFrame(IEnumerable<String>)

Creates a DataFrame given data as an IEnumerable of type String.

CreateDataFrame(IEnumerable<Timestamp>)

Creates a DataFrame given data as an IEnumerable of type Timestamp.
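The single-value overloads above are simpler; this sketch assumes an existing session `spark`:

```csharp
// A single-column DataFrame from plain values; the generated
// column name (e.g. "_1") may vary by library version.
DataFrame df = spark.CreateDataFrame(new[] { 1, 2, 3 });
```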

Dispose()

Synonym for Stop().

ExecuteCommand(String, String, Dictionary<String,String>)

Executes an arbitrary string command inside an external execution engine rather than Spark. This can be useful when the user wants to execute commands outside of Spark, for example: executing a custom DDL/DML command for JDBC, creating an index for Elasticsearch, or creating cores for Solr. The command is eagerly executed after this method is called, and the returned DataFrame will contain the output of the command (if any).

GetActiveSession()

Returns the active SparkSession for the current thread, returned by the builder.

GetDefaultSession()

Returns the default SparkSession that is returned by the builder.

NewSession()

Starts a new session with isolated SQL configurations and temporary tables; registered functions are also isolated, but the underlying SparkContext and cached data are shared.

Range(Int64)

Creates a DataFrame with a single column named id, containing elements in a range from 0 to end (exclusive) with step value 1.

Range(Int64, Int64)

Creates a DataFrame with a single column named id, containing elements in a range from start to end (exclusive) with step value 1.

Range(Int64, Int64, Int64)

Creates a DataFrame with a single column named id, containing elements in a range from start to end (exclusive) with a step value.

Range(Int64, Int64, Int64, Int32)

Creates a DataFrame with a single column named id, containing elements in a range from start to end (exclusive) with a step value, with the number of partitions specified.
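The fullest overload can be sketched as follows, assuming an existing session `spark`:

```csharp
// Range from 0 (inclusive) to 10 (exclusive) with step 2,
// i.e. ids 0, 2, 4, 6, 8, split across 4 partitions.
DataFrame df = spark.Range(0, 10, 2, 4);
```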

Read()

Returns a DataFrameReader that can be used to read non-streaming data in as a DataFrame.
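A common use, sketched with an existing session `spark` (the file path is illustrative):

```csharp
// Read a CSV file with a header row, letting Spark infer the schema.
DataFrame df = spark.Read()
    .Option("header", "true")
    .Option("inferSchema", "true")
    .Csv("data/people.csv");
```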

ReadStream()

Returns a DataStreamReader that can be used to read streaming data in as a DataFrame.

SetActiveSession(SparkSession)

Changes the SparkSession that will be returned in this thread when GetOrCreate() is called. This can be used to ensure that a given thread receives a SparkSession with an isolated session, instead of the global (first created) context.

SetDefaultSession(SparkSession)

Sets the default SparkSession that is returned by the builder.

Sql(String)

Executes a SQL query using Spark, returning the result as a DataFrame.
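For example, a DataFrame can be exposed as a temporary view and then queried with SQL. This sketch assumes an existing session `spark`; the view name is illustrative:

```csharp
// Register a temp view, then query it with SQL.
spark.Range(5).CreateOrReplaceTempView("nums");
DataFrame doubled = spark.Sql("SELECT id, id * 2 AS twice FROM nums");
doubled.Show();
```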

Stop()

Stops the underlying SparkContext.

Streams()

Returns a StreamingQueryManager that allows managing all the StreamingQuery instances active on this context.

Table(String)

Returns the specified table/view as a DataFrame.

Udf()

Returns a UDFRegistration object with which user-defined functions (UDFs) can be registered.
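A minimal sketch of registering and invoking a UDF, assuming an existing session `spark` (the function name is illustrative):

```csharp
// Register a one-argument UDF, then call it from SQL.
spark.Udf().Register<string, string>("shout", s => s.ToUpper() + "!");
DataFrame df = spark.Sql("SELECT shout('hello') AS greeting");
df.Show();
```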

Extension Methods

GetAssemblyInfo(SparkSession, Int32)

Gets the Microsoft.Spark.Utils.AssemblyInfoProvider.AssemblyInfo for the "Microsoft.Spark" assembly running on the Spark Driver and makes a "best effort" attempt at determining the Microsoft.Spark.Utils.AssemblyInfoProvider.AssemblyInfo of the "Microsoft.Spark.Worker" assembly on the Spark Executors.

There is no guarantee that a Spark Executor will be run on all the nodes in a cluster. To increase the likelihood, the Spark conf spark.executor.instances and the numPartitions settings should be adjusted to a reasonable number relative to the number of nodes in the Spark cluster.

Applies to