DataViewRowCursor Class

Definition

Class used to cursor through rows of an IDataView.

C#:
public abstract class DataViewRowCursor : Microsoft.ML.DataViewRow

F#:
type DataViewRowCursor = class
    inherit DataViewRow

VB:
Public MustInherit Class DataViewRowCursor
Inherits DataViewRow

Inheritance
Object → DataViewRow → DataViewRowCursor

Remarks

Note that a DataViewRowCursor is also a DataViewRow. Position is incremented by each successful call to MoveNext(). Before the first call to MoveNext(), or after MoveNext() returns false, Position is -1. Otherwise, when MoveNext() returns true, Position >= 0.
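
The contract above can be illustrated with a minimal sketch. It assumes the Microsoft.ML NuGet package is referenced; the `HouseRow` type, its values, and the `CursorBasics` helper are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical row type, used only for this illustration.
public class HouseRow
{
    public float Size { get; set; }
    public float Price { get; set; }
}

public static class CursorBasics
{
    // Reads every Price value and records Position before, during, and after iteration.
    public static (List<float> Prices, List<long> Positions) ReadAll(IDataView data)
    {
        var prices = new List<float>();
        var positions = new List<long>();

        // Passing the whole schema requests every column as active.
        using (DataViewRowCursor cursor = data.GetRowCursor(data.Schema))
        {
            // Before the first MoveNext(): documented to be -1.
            positions.Add(cursor.Position);

            // Create the getter once and reuse it for every row.
            ValueGetter<float> priceGetter = cursor.GetGetter<float>(data.Schema["Price"]);

            float price = default;
            while (cursor.MoveNext())
            {
                priceGetter(ref price);
                prices.Add(price);
                positions.Add(cursor.Position);
            }

            // After MoveNext() has returned false: documented to be -1 again.
            positions.Add(cursor.Position);
        }
        return (prices, positions);
    }

    public static void Main()
    {
        var mlContext = new MLContext();
        var rows = new List<HouseRow>
        {
            new HouseRow { Size = 1.1f, Price = 100f },
            new HouseRow { Size = 2.2f, Price = 200f },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        var (prices, positions) = ReadAll(data);
        Console.WriteLine(string.Join(", ", prices));
        Console.WriteLine(string.Join(", ", positions));
    }
}
```

Values are fetched only between a MoveNext() call that returned true and the next MoveNext() call, which is the only window in which Position >= 0.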

Constructors

DataViewRowCursor()

Properties

Batch

Provides a means of reconciling rows produced by multiple cursors, typically those returned by GetRowCursorSet(IEnumerable<DataViewSchema.Column>, Int32, Random). A cursor set allows parallel processing to proceed, but the original order should remain recoverable. Whether a caller cares about that order in a specific application is another matter (in practice most callers of GetRowCursorSet do not, otherwise they would not call it), but in principle it should be possible to reconstruct the order one would get from an identically configured GetRowCursor(IEnumerable<DataViewSchema.Column>, Random). So, for any cursor implementation, batch numbers should be non-decreasing. Furthermore, any given batch number should appear in only one of the cursors returned by GetRowCursorSet(IEnumerable<DataViewSchema.Column>, Int32, Random). In this way, order is determined by batch number. An operation that reconciles these cursors into a single consistent cursoring can do so by always drawing from the cursor, among all cursors in the set, that has the smallest batch number available.

Note that there is no suggestion that the batch assigned to a particular entry will be consistent from cursoring to cursoring, beyond resulting in the same overall ordering. The same entry could have different batch numbers from one cursoring to another. There is also no requirement that any given batch number appear at all. The batch is merely a mechanism for recovering ordering from a possibly arbitrary partitioning of the data. It follows that treating the batch as a property of the data is invalid.

(Inherited from DataViewRow)
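
The reconciliation rule described for Batch (always draw from the cursor with the smallest batch number available) can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the library's own consolidation logic: `HouseRow` and `BatchMerge` are hypothetical names, the Microsoft.ML package is assumed to be referenced, and a real consumer would more likely read each cursor on its own thread. Depending on the data view, the returned set may contain only one cursor.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical row type, used only for this illustration.
public class HouseRow
{
    public float Size { get; set; }
    public float Price { get; set; }
}

public static class BatchMerge
{
    // Reads one float column through a cursor set, restoring the serial order via Batch.
    public static List<float> ReadInOriginalOrder(IDataView data, string columnName, int n)
    {
        DataViewSchema.Column column = data.Schema[columnName];
        DataViewRowCursor[] cursors = data.GetRowCursorSet(new[] { column }, n);
        var result = new List<float>();
        try
        {
            var getters = new ValueGetter<float>[cursors.Length];
            var live = new bool[cursors.Length];
            for (int i = 0; i < cursors.Length; i++)
            {
                getters[i] = cursors[i].GetGetter<float>(column);
                live[i] = cursors[i].MoveNext(); // position each cursor on its first row
            }

            float value = default;
            while (true)
            {
                // Pick the live cursor whose current row has the smallest batch number.
                int best = -1;
                for (int i = 0; i < cursors.Length; i++)
                {
                    if (live[i] && (best < 0 || cursors[i].Batch < cursors[best].Batch))
                        best = i;
                }
                if (best < 0)
                    break; // every cursor is exhausted

                getters[best](ref value);
                result.Add(value);
                live[best] = cursors[best].MoveNext();
            }
        }
        finally
        {
            foreach (DataViewRowCursor cursor in cursors)
                cursor.Dispose();
        }
        return result;
    }

    public static void Main()
    {
        var mlContext = new MLContext();
        var rows = new List<HouseRow>
        {
            new HouseRow { Size = 1f, Price = 100f },
            new HouseRow { Size = 2f, Price = 200f },
            new HouseRow { Size = 3f, Price = 300f },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);
        Console.WriteLine(string.Join(", ", ReadInOriginalOrder(data, "Price", 4)));
    }
}
```

The merge relies on the two guarantees stated above: batch numbers never decrease within a cursor, and a given batch number appears in only one cursor of the set.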
Position

This is incremented when the underlying contents change, giving clients a way to detect change. It should be -1 when the object is in a state where values cannot be fetched. In particular, for a DataViewRowCursor, this is the case before MoveNext() is called for the first time, and after the first time MoveNext() returns false.

Note that this is not the position within the underlying data, but the position of this cursor only. For example, if one opened a set of parallel streaming cursors, or a shuffled cursor, each such cursor's first valid entry would always have position 0.

(Inherited from DataViewRow)
Schema

Gets a Schema, which provides name and type information for variables (i.e., columns in ML.NET's type system) stored in this row.

(Inherited from DataViewRow)

Methods

Dispose()

Implementation of dispose. Calls Dispose(Boolean) with true.

(Inherited from DataViewRow)
Dispose(Boolean)

The disposable method for the disposable pattern. This default implementation does nothing.

(Inherited from DataViewRow)
GetGetter<TValue>(DataViewSchema.Column)

Returns a value getter delegate that fetches the value of the given column from the row. This throws if the column is not active in this row, or if the type TValue differs from this column's type.

(Inherited from DataViewRow)
GetIdGetter()

A getter for a 128-bit ID value. It is common for objects to serve multiple DataViewRow instances that iterate over what is supposed to be the same data. For example, in an IDataView a cursor set will produce the same data as a serial cursor, just partitioned, and a shuffled cursor will produce the same data as a serial cursor or any other shuffled cursor, only shuffled. The ID exists for applications that need to reconcile which entry is actually which. Ideally this ID should be unique, but for practical reasons it suffices if collisions are extremely improbable.

Note that this ID, while it must be consistent across multiple streams according to the semantics above, is not considered part of the data per se. To take the example of a data view specifically, a single data view must render consistent IDs across all cursorings, but there is no suggestion that if the "same" data were presented in a different data view (for example, after being transformed, cached, or saved), the IDs between the two data views would have any discernible relationship.

(Inherited from DataViewRow)
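
As a hedged illustration of the ID semantics above, the sketch below collects IDs from two cursorings of the same data view and checks that they form the same set. `HouseRow` and `RowIds` are hypothetical names and the Microsoft.ML package is assumed to be referenced. Passing a Random only requests shuffling; whether the rows actually come back in a different order depends on the data view.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type, used only for this illustration.
public class HouseRow
{
    public float Size { get; set; }
    public float Price { get; set; }
}

public static class RowIds
{
    // Collects the ID of every row seen by one cursoring of the data view.
    public static List<DataViewRowId> CollectIds(IDataView data, Random rand = null)
    {
        var ids = new List<DataViewRowId>();

        // No column values are read here, so no columns are requested as active.
        using (DataViewRowCursor cursor =
            data.GetRowCursor(Enumerable.Empty<DataViewSchema.Column>(), rand))
        {
            ValueGetter<DataViewRowId> idGetter = cursor.GetIdGetter();
            DataViewRowId id = default;
            while (cursor.MoveNext())
            {
                idGetter(ref id);
                ids.Add(id);
            }
        }
        return ids;
    }

    // True when two cursorings of the same data view expose the same set of IDs.
    public static bool SameIds(IDataView data)
    {
        List<DataViewRowId> serial = CollectIds(data);
        List<DataViewRowId> shuffled = CollectIds(data, new Random(0));
        return serial.Count == shuffled.Count
            && new HashSet<DataViewRowId>(serial).SetEquals(shuffled);
    }

    public static void Main()
    {
        var mlContext = new MLContext();
        var rows = new List<HouseRow>
        {
            new HouseRow { Size = 1f, Price = 100f },
            new HouseRow { Size = 2f, Price = 200f },
            new HouseRow { Size = 3f, Price = 300f },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);
        Console.WriteLine(SameIds(data));
    }
}
```

The comparison is made within a single data view only. As noted above, IDs from two different data views are not expected to have any discernible relationship, even when the underlying data is the "same".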
IsColumnActive(DataViewSchema.Column)

Returns whether the given column is active in this row.

(Inherited from DataViewRow)
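
Because GetGetter<TValue> throws for an inactive column or a mismatched type, a caller can guard getter creation with IsColumnActive and a type check. The following is a minimal sketch under stated assumptions: `HouseRow` and `ActiveColumns` are hypothetical names, and the Microsoft.ML package is assumed to be referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical row type, used only for this illustration.
public class HouseRow
{
    public float Size { get; set; }
    public float Price { get; set; }
}

public static class ActiveColumns
{
    // Sums every float column that is active in a cursor opened on the requested columns.
    public static Dictionary<string, float> SumActiveFloatColumns(
        IDataView data, params string[] requested)
    {
        var sums = new Dictionary<string, float>();
        List<DataViewSchema.Column> columnsNeeded =
            requested.Select(name => data.Schema[name]).ToList();

        using (DataViewRowCursor cursor = data.GetRowCursor(columnsNeeded))
        {
            var getters = new List<(string Name, ValueGetter<float> Getter)>();
            foreach (DataViewSchema.Column column in data.Schema)
            {
                // Skip columns for which GetGetter<float> would throw.
                if (!cursor.IsColumnActive(column))
                    continue;
                if (column.Type.RawType != typeof(float))
                    continue;

                getters.Add((column.Name, cursor.GetGetter<float>(column)));
                sums[column.Name] = 0f;
            }

            float value = default;
            while (cursor.MoveNext())
            {
                foreach (var (name, getter) in getters)
                {
                    getter(ref value);
                    sums[name] += value;
                }
            }
        }
        return sums;
    }

    public static void Main()
    {
        var mlContext = new MLContext();
        var rows = new List<HouseRow>
        {
            new HouseRow { Size = 1f, Price = 100f },
            new HouseRow { Size = 2f, Price = 200f },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        foreach (KeyValuePair<string, float> entry in SumActiveFloatColumns(data, "Price"))
            Console.WriteLine($"{entry.Key}: {entry.Value}");
    }
}
```

The sketch relies only on requested columns being active. Whether a column that was not requested is also reported as active is left to the cursor implementation, which is why the check is made per column rather than assumed from the request.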
MoveNext()

Advance to the next row. When the cursor is first created, this method should be called to move to the first row. Returns false if there are no more rows.
