IForeachWriter Interface

Reference

Definition

Namespace: Microsoft.Spark.Sql

Assembly: Microsoft.Spark.dll

Package: Microsoft.Spark v1.0.0

Important

Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Interface for writing custom logic to process data generated by a query. This is often used to write the output of a streaming query to arbitrary storage systems.

public interface IForeachWriter

type IForeachWriter = interface

Public Interface IForeachWriter

Remarks

Any implementation of this interface will be used by Spark in the following way:

A single instance of this class is responsible for all the data generated by a single task in a query. In other words, one instance is responsible for processing one partition of the data generated in a distributed manner.
Any implementation of this interface must be marked with SerializableAttribute because each task gets a fresh serialized-deserialized copy of the provided object. Hence, it is strongly recommended that any initialization for writing data (e.g. opening a connection or starting a transaction) be done after the Open(Int64, Int64) method has been called, which signifies that the task is ready to generate data.
The lifecycle of the methods is as follows: for each partition of each batch/epoch, Open(Int64, Int64) is called first; if it returns true, Process(Row) is called for each row in that partition; finally, Close(Exception) is called with the error (if any) seen while processing rows.
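
The call order can be illustrated with a small simulation that runs without Spark. This is only a sketch under stated assumptions: ISketchWriter, RecordingWriter, and LifecycleSketch are hypothetical names introduced here, rows are plain strings instead of Row objects, and the loop approximates the engine's behavior as described in the Methods section rather than reproducing Spark's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in that mirrors the three IForeachWriter methods;
// rows are plain strings here so the sketch runs without a Spark cluster.
public interface ISketchWriter
{
    bool Open(long partitionId, long epochId);
    void Process(string row);
    void Close(Exception errorOrNull);
}

// Records every call so the order can be inspected afterwards.
public class RecordingWriter : ISketchWriter
{
    private readonly bool _accept;

    public RecordingWriter(bool accept) => _accept = accept;

    public List<string> Calls { get; } = new List<string>();

    public bool Open(long partitionId, long epochId)
    {
        Calls.Add($"Open({partitionId}, {epochId})");
        return _accept;
    }

    public void Process(string row) => Calls.Add($"Process({row})");

    public void Close(Exception errorOrNull) =>
        Calls.Add(errorOrNull == null ? "Close(null)" : "Close(error)");
}

public static class LifecycleSketch
{
    // Simulates what happens for one partition of one batch/epoch.
    public static void RunPartition(
        ISketchWriter writer, long partitionId, long epochId, IEnumerable<string> rows)
    {
        // If Open throws, the exception propagates and Close is never reached.
        bool accepted = writer.Open(partitionId, epochId);
        Exception error = null;
        try
        {
            if (accepted)
            {
                foreach (string row in rows)
                {
                    writer.Process(row);
                }
            }
        }
        catch (Exception e)
        {
            error = e;
        }

        // Close runs whether Open returned true or false.
        // A real engine would then fail the task on error; this sketch just stops.
        writer.Close(error);
    }
}
```

Running RunPartition with a RecordingWriter that accepts the partition records Open, one Process per row, then Close; with a writer that rejects the partition, only Open and Close are recorded.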

Important points to note:

The partitionId and epochId can be used to deduplicate generated data when failures cause reprocessing of some input data. Whether this works depends on the execution mode of the query. If the streaming query is being executed in the micro-batch mode, then every partition represented by a unique tuple (partitionId, epochId) is guaranteed to have the same data. Hence, (partitionId, epochId) can be used to deduplicate and/or transactionally commit data and achieve exactly-once guarantees. However, if the streaming query is being executed in the continuous mode, this guarantee does not hold, and (partitionId, epochId) should not be used for deduplication.
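
One possible way to apply this in micro-batch mode is to skip a partition in Open(Int64, Int64) when its (partitionId, epochId) pair has already been committed, and to commit buffered rows in Close(Exception) only when no error occurred. The sketch below assumes a hypothetical CommittedBatches class standing in for a durable, transactional sink; the in-memory set used here is visible only inside one process, so a real deployment would need storage shared by all executors. DeduplicatingWriter is likewise an illustrative name, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Spark.Sql;

// Hypothetical stand-in for a durable, transactional sink.
// ASSUMPTION: a real sink would persist this state outside the process.
public static class CommittedBatches
{
    private static readonly object s_lock = new object();
    private static readonly HashSet<(long, long)> s_committed = new HashSet<(long, long)>();

    public static bool IsCommitted(long partitionId, long epochId)
    {
        lock (s_lock)
        {
            return s_committed.Contains((partitionId, epochId));
        }
    }

    public static void Commit(long partitionId, long epochId, IReadOnlyList<Row> rows)
    {
        lock (s_lock)
        {
            // A real sink would write the rows and the marker in one transaction.
            s_committed.Add((partitionId, epochId));
        }
    }
}

[Serializable]
public class DeduplicatingWriter : IForeachWriter
{
    // Created in Open, so it is excluded from serialization.
    [NonSerialized]
    private List<Row> _buffer;

    private long _partitionId;
    private long _epochId;

    public bool Open(long partitionId, long epochId)
    {
        _buffer = null;
        if (CommittedBatches.IsCommitted(partitionId, epochId))
        {
            // Already written by an earlier attempt: skip Process for this partition.
            return false;
        }

        _partitionId = partitionId;
        _epochId = epochId;
        _buffer = new List<Row>();
        return true;
    }

    public void Process(Row row) => _buffer.Add(row);

    public void Close(Exception errorOrNull)
    {
        // Commit only when the partition was accepted and processed without error.
        if (errorOrNull == null && _buffer != null)
        {
            CommittedBatches.Commit(_partitionId, _epochId, _buffer);
        }

        _buffer = null;
    }
}
```

Buffering rows until Close keeps a failed attempt from leaving partial output behind, at the cost of holding one partition's rows in memory.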

Methods

Close(Exception)

Called when processing of one partition of new data stops on the executor side. This is guaranteed to be called whether Open(Int64, Int64) returns true or false. However, Close(Exception) won't be called in the following cases:

The CLR/JVM crashes without throwing an Exception.
Open(Int64, Int64) throws an Exception.

Open(Int64, Int64)

Called when starting to process one partition of new data in the executor.

Process(Row)

Called to process each Row on the executor side. This method is called only if Open(Int64, Int64) returns true.
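
Putting the three methods together, a minimal implementation might look like the following sketch. FileForeachWriter, the output directory, the file naming scheme, and the use of the built-in "rate" test source are assumptions made for illustration. The writer opens its file in Open(Int64, Int64) rather than in the constructor, in line with the recommendation in Remarks, and marks the stream field as NonSerialized because it is created per task.

```csharp
using System;
using System.IO;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Streaming;

[Serializable]
public class FileForeachWriter : IForeachWriter
{
    private readonly string _directory;

    // Opened per task in Open, so it must not be serialized with the object.
    [NonSerialized]
    private StreamWriter _writer;

    public FileForeachWriter(string directory) => _directory = directory;

    public bool Open(long partitionId, long epochId)
    {
        Directory.CreateDirectory(_directory);
        // Hypothetical naming scheme: one file per (partitionId, epochId) pair.
        string path = Path.Combine(_directory, $"part-{partitionId}-epoch-{epochId}.txt");
        _writer = new StreamWriter(path, append: false);
        return true;
    }

    public void Process(Row row)
    {
        // Write the row's column values as one comma-separated line.
        _writer.WriteLine(string.Join(",", row.Values));
    }

    public void Close(Exception errorOrNull)
    {
        // Called whether or not processing succeeded; release the file handle.
        _writer?.Dispose();
        _writer = null;
    }
}

public static class Program
{
    public static void Main()
    {
        SparkSession spark = SparkSession.Builder().AppName("foreach-writer-sketch").GetOrCreate();

        // The "rate" source generates rows continuously and is handy for experiments.
        DataFrame stream = spark.ReadStream().Format("rate").Load();

        // Assumed output location; adjust for the environment the executors run in.
        string outputDir = Path.Combine(Path.GetTempPath(), "foreach-writer-sketch");

        StreamingQuery query = stream
            .WriteStream()
            .Foreach(new FileForeachWriter(outputDir))
            .Start();

        query.AwaitTermination();
    }
}
```

Because the files are created on whichever executor runs the task, the output directory in this sketch is only meaningful for local experiments.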
