Functions Class

Definition

Functions available for DataFrame operations.

public static class Functions
type Functions = class
Public Class Functions
Inheritance
Functions

Methods

Abs(Column)

Computes the absolute value.

Acos(Column)

Inverse cosine of column in radians, as if computed by java.lang.Math.acos.

Acos(String)

Inverse cosine of columnName in radians, as if computed by java.lang.Math.acos.

AddMonths(Column, Column)

Returns the date that is numMonths after startDate.

AddMonths(Column, Int32)

Returns the date that is numMonths after startDate.

ApproxCountDistinct(Column)

Returns the approximate number of distinct items in a group.

ApproxCountDistinct(Column, Double)

Returns the approximate number of distinct items in a group.

ApproxCountDistinct(String)

Returns the approximate number of distinct items in a group.

ApproxCountDistinct(String, Double)

Returns the approximate number of distinct items in a group.

Array(Column[])

Creates a new array column. The input columns must all have the same data type.

Array(String, String[])

Creates a new array column. The input columns must all have the same data type.

ArrayContains(Column, Object)

Returns null if the array is null, true if the array contains value, and false otherwise.

ArrayDistinct(Column)

Removes duplicate values from the array.

ArrayExcept(Column, Column)

Returns an array of the elements that are in col1 but not in col2, without duplicates. The order of elements in the result is nondeterministic.

ArrayIntersect(Column, Column)

Returns an array of the elements in the intersection of the given two arrays, without duplicates.

ArrayJoin(Column, String)

Concatenates the elements of column using the delimiter.

ArrayJoin(Column, String, String)

Concatenates the elements of column using the delimiter. Null values are replaced with nullReplacement.

ArrayMax(Column)

Returns the maximum value in the array.

ArrayMin(Column)

Returns the minimum value in the array.

ArrayPosition(Column, Object)

Locates the position of the first occurrence of the value in the given array and returns it as a long. Returns null if either of the arguments is null.

ArrayRemove(Column, Object)

Removes all elements that are equal to element from the given array.

ArrayRepeat(Column, Column)

Creates an array containing the left argument repeated the number of times given by the right argument.

ArrayRepeat(Column, Int32)

Creates an array containing the left argument repeated the count number of times.

ArraySort(Column)

Sorts the input array in ascending order. The elements of the input array must be sortable. Null elements will be placed at the end of the returned array.

ArraysOverlap(Column, Column)

Returns true if a1 and a2 have at least one non-null element in common. If not and both arrays are non-empty and any of them contains a null, it returns null. It returns false otherwise.
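
The three possible results (true, null, false) are easy to misread, so here is a minimal sketch of the rule as worded above. It is a plain .NET model using nullable integers, not the Spark implementation, and the local function name is only illustrative.

```csharp
using System;
using System.Linq;

// Plain .NET model of the three-valued result described above.
static bool? ArraysOverlap(int?[] a1, int?[] a2)
{
    // true: at least one non-null element appears in both arrays.
    if (a1.Where(x => x.HasValue).Intersect(a2.Where(x => x.HasValue)).Any())
        return true;

    // null: nothing in common, both arrays non-empty, and a null in either one.
    bool bothNonEmpty = a1.Length > 0 && a2.Length > 0;
    bool anyNull = a1.Contains(null) || a2.Contains(null);
    if (bothNonEmpty && anyNull)
        return null;

    // false: every other case.
    return false;
}

Console.WriteLine(ArraysOverlap(new int?[] { 1, 2 }, new int?[] { 2, 3 }));
```

The null result signals that the answer is unknown: a null element might have matched something in the other array.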

ArraysZip(Column[])

Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.

ArrayUnion(Column, Column)

Returns an array of the elements in the union of the given two arrays, without duplicates.

Asc(String)

Returns a sort expression based on the ascending order of the column.

Ascii(Column)

Computes the numeric value of the first character of the string column, and returns the result as an int column.

AscNullsFirst(String)

Returns a sort expression based on the ascending order of the column, and null values appear before non-null values.

AscNullsLast(String)

Returns a sort expression based on the ascending order of the column, and null values appear after non-null values.

Asin(Column)

Inverse sine of column in radians, as if computed by java.lang.Math.asin.

Asin(String)

Inverse sine of columnName in radians, as if computed by java.lang.Math.asin.

Atan(Column)

Inverse tangent of column in radians, as if computed by java.lang.Math.atan.

Atan(String)

Inverse tangent of columnName in radians, as if computed by java.lang.Math.atan.

Atan2(Column, Column)

Computes atan2 for the given x and y.

Atan2(Column, Double)

Computes atan2 for the given x and y.

Atan2(Column, String)

Computes atan2 for the given x and y.

Atan2(Double, Column)

Computes atan2 for the given x and y.

Atan2(Double, String)

Computes atan2 for the given x and y.

Atan2(String, Column)

Computes atan2 for the given x and y.

Atan2(String, Double)

Computes atan2 for the given x and y.

Atan2(String, String)

Computes atan2 for the given x and y.

Avg(Column)

Returns the average of the values in a group.

Avg(String)

Returns the average of the values in a group.

Base64(Column)

Computes the BASE64 encoding of a binary column and returns it as a string column.

Bin(Column)

An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".

Bin(String)

An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".

BitwiseNOT(Column)

Computes bitwise NOT.

Broadcast(DataFrame)

Marks a DataFrame as small enough for use in broadcast joins.

Bround(Column)

Returns the value of the column rounded to 0 decimal places with HALF_EVEN round mode.

Bround(Column, Int32)

Returns the value of the column rounded to scale decimal places with HALF_EVEN round mode.
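
HALF_EVEN differs from the HALF_UP mode used by the Round methods of this class only when a value lies exactly halfway between two candidates. The sketch below illustrates the two tie-breaking rules with .NET's own Math.Round on decimal values. It assumes that MidpointRounding.ToEven and MidpointRounding.AwayFromZero are the .NET counterparts of those modes, and it does not call Spark.

```csharp
using System;

// HALF_EVEN: a tie goes to the neighbor whose last kept digit is even.
static decimal HalfEven(decimal value, int scale) =>
    Math.Round(value, scale, MidpointRounding.ToEven);

// HALF_UP: a tie goes to the neighbor farther from zero.
static decimal HalfUp(decimal value, int scale) =>
    Math.Round(value, scale, MidpointRounding.AwayFromZero);

Console.WriteLine(HalfEven(2.5m, 0)); // tie resolves to 2
Console.WriteLine(HalfEven(3.5m, 0)); // tie resolves to 4
Console.WriteLine(HalfUp(2.5m, 0));   // tie resolves to 3
```

Values that are not exact ties round the same way under both modes.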

Bucket(Column, Column)

A transform for any type that partitions by a hash of the input column.

Bucket(Int32, Column)

A transform for any type that partitions by a hash of the input column.

CallUDF(String, Column[])

Calls a user-defined function registered via SparkSession.Udf().Register().

Cbrt(Column)

Computes the cube-root of the given column.

Cbrt(String)

Computes the cube-root of the given column.

Ceil(Column)

Computes the ceiling of the given value.

Ceil(String)

Computes the ceiling of the given value.

Coalesce(Column[])

Returns the first column that is not null, or null if all inputs are null.

Col(String)

Returns a Column based on the given column name. Alias for Column().

CollectList(Column)

Returns a list of objects with duplicates.

CollectList(String)

Returns a list of objects with duplicates.

CollectSet(Column)

Returns a set of objects with duplicate elements eliminated.

CollectSet(String)

Returns a set of objects with duplicate elements eliminated.

Column(String)

Returns a Column based on the given column name.

Concat(Column[])

Concatenates multiple input columns together into a single column.

ConcatWs(String, Column[])

Concatenates multiple input string columns together into a single string column, using the given separator.

Conv(Column, Int32, Int32)

Convert a number in a string column from one base to another.

Corr(Column, Column)

Returns the Pearson Correlation Coefficient for two columns.

Corr(String, String)

Returns the Pearson Correlation Coefficient for two columns.

Cos(Column)

Computes cosine of the angle, as if computed by java.lang.Math.cos.

Cos(String)

Computes cosine of the angle, as if computed by java.lang.Math.cos.

Cosh(Column)

Computes hyperbolic cosine of the angle, as if computed by java.lang.Math.cosh.

Cosh(String)

Computes hyperbolic cosine of the angle, as if computed by java.lang.Math.cosh.

Count(Column)

Returns the number of items in a group.

Count(String)

Returns the number of items in a group.

CountDistinct(Column, Column[])

Returns the number of distinct items in a group.

CountDistinct(String, String[])

Returns the number of distinct items in a group.

CovarPop(Column, Column)

Returns the population covariance for two columns.

CovarPop(String, String)

Returns the population covariance for two columns.

CovarSamp(Column, Column)

Returns the sample covariance for two columns.

CovarSamp(String, String)

Returns the sample covariance for two columns.

Crc32(Column)

Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.

CumeDist()

Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are below the current row.

CurrentDate()

Returns the current date as a date column.

CurrentRow()

Window function: returns the special frame boundary that represents the current row in the window partition.

CurrentTimestamp()

Returns the current timestamp as a timestamp column.

DateAdd(Column, Column)

Returns the date that is days days after start.

DateAdd(Column, Int32)

Returns the date that is days days after start.

DateDiff(Column, Column)

Returns the number of days from start to end.

DateFormat(Column, String)

Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.

DateSub(Column, Column)

Returns the date that is days days before start.

DateSub(Column, Int32)

Returns the date that is days days before start.

DateTrunc(String, Column)

Returns timestamp truncated to the unit specified by the format.

DayOfMonth(Column)

Extracts the day of the month as an integer from a given date/timestamp/string.

DayOfWeek(Column)

Extracts the day of the week as an integer from a given date/timestamp/string.

DayOfYear(Column)

Extracts the day of the year as an integer from a given date/timestamp/string.

Days(Column)

A transform for timestamps and dates to partition data into days.

Decode(Column, String)

Decodes the first argument from binary into a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').

Degrees(Column)

Converts an angle measured in radians to an approximately equivalent angle measured in degrees.

Degrees(String)

Converts an angle measured in radians to an approximately equivalent angle measured in degrees.

DenseRank()

Window function: returns the rank of rows within a window partition, without any gaps.

Desc(String)

Returns a sort expression based on the descending order of the column.

DescNullsFirst(String)

Returns a sort expression based on the descending order of the column, and null values appear before non-null values.

DescNullsLast(String)

Returns a sort expression based on the descending order of the column, and null values appear after non-null values.

ElementAt(Column, Object)

Returns the element of the array at the index given in value if column is an array. Returns the value for the key given in value if column is a map.

Encode(Column, String)

Encodes the first argument from a string into binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').

Exp(Column)

Computes the exponential of the given value.

Exp(String)

Computes the exponential of the given value.

Explode(Column)

Creates a new row for each element in the given array or map column.

ExplodeOuter(Column)

Creates a new row for each element in the given array or map column. Unlike Explode(), if the array/map is null or empty then null is produced.

Expm1(Column)

Computes the exponential of the given value minus one.

Expm1(String)

Computes the exponential of the given value minus one.

Expr(String)

Parses the expression string into the column that it represents.

Factorial(Column)

Computes the factorial of the given value.

First(Column, Boolean)

Returns the first value of a column in a group.

First(String, Boolean)

Returns the first value of a column in a group.

Flatten(Column)

Creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed.
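
The "only one level" rule can be illustrated with ordinary jagged arrays. This is a small sketch with LINQ, not a call into Spark, and FlattenOnce is a name made up for the example.

```csharp
using System;
using System.Linq;

// Removes exactly one level of nesting.
static T[] FlattenOnce<T>(T[][] nested) =>
    nested.SelectMany(inner => inner).ToArray();

int[][] twoLevels = { new[] { 1, 2 }, new[] { 3 } };
Console.WriteLine(string.Join(",", FlattenOnce(twoLevels))); // 1,2,3

// With three levels, the result is still an array of arrays.
int[][][] threeLevels = { new[] { new[] { 1 }, new[] { 2 } }, new[] { new[] { 3 } } };
int[][] stillNested = FlattenOnce(threeLevels);
Console.WriteLine(stillNested.Length); // 3
```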

Floor(Column)

Computes the floor of the given value.

Floor(String)

Computes the floor of the given value.

FormatNumber(Column, Int32)

Formats the given numeric column to a format like '#,###,###.##', rounded to the given d decimal places with HALF_EVEN round mode, and returns the result as a string column.

FormatString(String, Column[])

Formats the arguments in printf-style and returns the result as a string column.

FromCsv(Column, Column, Dictionary<String,String>)

Parses a column containing a CSV string into a StructType with the specified schema.

FromCsv(Column, StructType, Dictionary<String,String>)

Parses a column containing a CSV string into a StructType with the specified schema.

FromJson(Column, Column, Dictionary<String,String>)

Parses a column containing a JSON string into a StructType or ArrayType of StructTypes with the specified schema.

FromJson(Column, String, Dictionary<String,String>)

Parses a column containing a JSON string into a StructType or ArrayType of StructTypes with the specified schema.

FromUnixTime(Column)

Converts the number of seconds from UNIX epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone with a default format "yyyy-MM-dd HH:mm:ss".

FromUnixTime(Column, String)

Converts the number of seconds from UNIX epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone with the given format.
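
As a rough illustration of the conversion, the sketch below formats epoch seconds with the .NET base class library. Two assumptions apply: it renders the value in UTC so that the result is reproducible, whereas the methods above use the current system time zone, and it relies on "yyyy-MM-dd HH:mm:ss" meaning the same thing in .NET format strings as in the format Spark expects, which does not hold for every pattern.

```csharp
using System;
using System.Globalization;

// Epoch seconds -> formatted string, rendered in UTC (assumption, see above).
static string FromUnixSecondsUtc(long seconds, string format = "yyyy-MM-dd HH:mm:ss") =>
    DateTimeOffset.FromUnixTimeSeconds(seconds)
        .UtcDateTime
        .ToString(format, CultureInfo.InvariantCulture);

Console.WriteLine(FromUnixSecondsUtc(0));                   // 1970-01-01 00:00:00
Console.WriteLine(FromUnixSecondsUtc(86400, "yyyy-MM-dd")); // 1970-01-02
```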

FromUtcTimestamp(Column, Column)

Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.

FromUtcTimestamp(Column, String)

Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.

GetJsonObject(Column, String)

Extracts a JSON object from a JSON string based on the specified path, and returns the extracted object as a JSON string.

Greatest(Column[])

Returns the greatest value of the list of values, skipping null values.

Greatest(String, String[])

Returns the greatest value of the list of column names, skipping null values.

Grouping(Column)

Indicates whether a specified column in a GROUP BY list is aggregated or not, returning 1 for aggregated or 0 for not aggregated in the result set.

Grouping(String)

Indicates whether a specified column in a GROUP BY list is aggregated or not, returning 1 for aggregated or 0 for not aggregated in the result set.

GroupingId(Column[])

Returns the level of grouping, equal to (Grouping(c1) << (n-1)) + (Grouping(c2) << (n-2)) + ... + Grouping(cn).

GroupingId(String, String[])

Returns the level of grouping, equal to (Grouping(c1) << (n-1)) + (Grouping(c2) << (n-2)) + ... + Grouping(cn).

Hash(Column[])

Calculates the hash code of given columns, and returns the result as an int column.

Hex(Column)

Computes hex value of the given column.

Hour(Column)

Extracts the hours as an integer from a given date/timestamp/string.

Hours(Column)

A transform for timestamps to partition data into hours.

Hypot(Column, Column)

Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

Hypot(Column, Double)

Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

Hypot(Column, String)

Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

Hypot(Double, Column)

Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

Hypot(Double, String)

Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

Hypot(String, Column)

Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

Hypot(String, Double)

Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

Hypot(String, String)

Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
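
To see why "without intermediate overflow" matters, compare the naive formula with a scaled one. The sketch below shows one common scaling technique in plain C#. It is not necessarily how Spark or the JVM computes the value, and it ignores NaN and infinite inputs.

```csharp
using System;

// Divides by the larger magnitude first so the squared term stays in range.
static double Hypot(double a, double b)
{
    double x = Math.Abs(a), y = Math.Abs(b);
    double max = Math.Max(x, y), min = Math.Min(x, y);
    if (max == 0.0) return 0.0;
    double ratio = min / max;
    return max * Math.Sqrt(1.0 + ratio * ratio);
}

double big = 1e200;
double naive = Math.Sqrt(big * big + big * big); // big * big overflows to infinity
double scaled = Hypot(big, big);

Console.WriteLine(double.IsInfinity(naive));  // True
Console.WriteLine(double.IsInfinity(scaled)); // False
```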

InitCap(Column)

Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.

InputFileName()

Creates a string column for the file name of the current Spark task.

Instr(Column, String)

Locate the position of the first occurrence of the given substring.

IsNaN(Column)

Return true iff the column is NaN.

IsNull(Column)

Return true iff the column is null.

JsonTuple(Column, String[])

Creates a new row for a JSON column according to the given field names.

Kurtosis(Column)

Returns the kurtosis of the values in a group.

Kurtosis(String)

Returns the kurtosis of the values in a group.

Lag(Column, Int32, Object)

Window function: returns the value that is 'offset' rows before the current row, and null if there are fewer than 'offset' rows before the current row. For example, an 'offset' of one will return the previous row at any given point in the window partition.

Lag(String, Int32, Object)

Window function: returns the value that is 'offset' rows before the current row, and null if there are fewer than 'offset' rows before the current row. For example, an 'offset' of one will return the previous row at any given point in the window partition.

Last(Column, Boolean)

Returns the last value of a column in a group.

Last(String, Boolean)

Returns the last value of a column in a group.

LastDay(Column)

Returns the last day of the month which the given date belongs to.

Lead(Column, Int32, Object)

Window function: returns the value that is 'offset' rows after the current row, and null if there are fewer than 'offset' rows after the current row. For example, an 'offset' of one will return the next row at any given point in the window partition.

Lead(String, Int32, Object)

Window function: returns the value that is 'offset' rows after the current row, and null if there are fewer than 'offset' rows after the current row. For example, an 'offset' of one will return the next row at any given point in the window partition.
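
The offset behavior of Lag and Lead can be pictured on a single, already ordered partition. The following sketch models that partition as an int array and is purely illustrative: it does not use Spark window specifications, and it ignores the third (Object) argument of the methods above.

```csharp
using System;
using System.Linq;

// Looks up the value `offset` positions away; positions outside the partition give null.
static int?[] Shift(int[] partition, int offset) =>
    partition
        .Select((_, i) => i + offset)
        .Select(j => j >= 0 && j < partition.Length ? partition[j] : (int?)null)
        .ToArray();

static int?[] Lag(int[] partition, int offset) => Shift(partition, -offset);
static int?[] Lead(int[] partition, int offset) => Shift(partition, offset);

int[] values = { 10, 20, 30 };
Console.WriteLine(string.Join(",", Lag(values, 1).Select(v => v?.ToString() ?? "null")));  // null,10,20
Console.WriteLine(string.Join(",", Lead(values, 1).Select(v => v?.ToString() ?? "null"))); // 20,30,null
```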

Least(Column[])

Returns the least value of the list of values, skipping null values.

Least(String, String[])

Returns the least value of the list of column names, skipping null values.

Length(Column)

Computes the character length of a given string or number of bytes of a binary string.

Levenshtein(Column, Column)

Computes the Levenshtein distance of the two given string columns.
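
For readers unfamiliar with the metric, the Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. Below is a minimal sketch of the textbook dynamic-programming computation in C#. It illustrates the metric only and makes no claim about how Spark implements it.

```csharp
using System;

// Two-row dynamic programming; each edit operation costs 1.
static int Levenshtein(string left, string right)
{
    var previous = new int[right.Length + 1];
    var current = new int[right.Length + 1];
    for (int j = 0; j <= right.Length; j++) previous[j] = j;

    for (int i = 1; i <= left.Length; i++)
    {
        current[0] = i;
        for (int j = 1; j <= right.Length; j++)
        {
            int substitution = previous[j - 1] + (left[i - 1] == right[j - 1] ? 0 : 1);
            current[j] = Math.Min(substitution, Math.Min(previous[j] + 1, current[j - 1] + 1));
        }
        (previous, current) = (current, previous); // reuse the two rows
    }
    return previous[right.Length];
}

Console.WriteLine(Levenshtein("kitten", "sitting")); // 3
```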

Lit(Object)

Creates a Column of literal value.

Locate(String, Column)

Locate the position of the first occurrence of the given substring.

Locate(String, Column, Int32)

Locate the position of the first occurrence of the given substring starting from the given position offset.

Log(Column)

Computes the natural logarithm of the given value.

Log(Double, Column)

Computes the first argument-base logarithm of the second argument.

Log(Double, String)

Computes the first argument-base logarithm of the second argument.

Log(String)

Computes the natural logarithm of the given value.

Log10(Column)

Computes the logarithm of the given value in base 10.

Log10(String)

Computes the logarithm of the given value in base 10.

Log1p(Column)

Computes the natural logarithm of the given value plus one.

Log1p(String)

Computes the natural logarithm of the given value plus one.

Log2(Column)

Computes the logarithm of the given column in base 2.

Log2(String)

Computes the logarithm of the given column in base 2.

Lower(Column)

Converts a string column to lower case.

Lpad(Column, Int32, String)

Left-pad the string column with pad to the given length len. If the string column is longer than len, the return value is shortened to len characters.
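
The padding and shortening rules can be sketched on ordinary strings. This is a plain C# model under two assumptions that the description does not state: a multi-character pad is repeated and cut to fit, and an empty pad leaves the value unchanged.

```csharp
using System;
using System.Text;

static string Lpad(string value, int len, string pad)
{
    if (len <= 0) return string.Empty;
    if (value.Length >= len) return value.Substring(0, len); // shortened to len characters
    if (pad.Length == 0) return value;                       // assumption: nothing to pad with

    int missing = len - value.Length;
    var padding = new StringBuilder();
    while (padding.Length < missing) padding.Append(pad);    // repeat the pad as needed
    return padding.ToString(0, missing) + value;
}

Console.WriteLine(Lpad("abc", 5, "*"));    // **abc
Console.WriteLine(Lpad("abcdef", 3, "*")); // abc
```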

Ltrim(Column)

Trim the spaces from left end for the given string column.

Ltrim(Column, String)

Trim the specified character string from left end for the given string column.

Map(Column[])

Creates a new map column.

MapConcat(Column[])

Returns the union of all the given maps.

MapEntries(Column)

Returns an unordered array of all entries in the given map.

MapFromArrays(Column, Column)

Creates a new map column. The array in the first column is used for keys. The array in the second column is used for values. No element of the key array may be null.

MapFromEntries(Column)

Returns a map created from the given array of entries.

MapKeys(Column)

Returns an unordered array containing the keys of the map.

MapValues(Column)

Returns an unordered array containing the values of the map.

Max(Column)

Returns the maximum value of the column in a group.

Max(String)

Returns the maximum value of the column in a group.

Md5(Column)

Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.
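
The "32 character hex string" follows from the digest size: MD5 produces 16 bytes, and each byte is written as two hexadecimal digits. A small sketch with the .NET cryptography classes shows the shape of the value. The lowercase output is an assumption of this example, and the helper name is illustrative.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static string Md5Hex(byte[] data)
{
    using var md5 = MD5.Create();
    byte[] digest = md5.ComputeHash(data); // always 16 bytes
    return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
}

string hex = Md5Hex(Encoding.UTF8.GetBytes("Spark"));
Console.WriteLine(hex.Length); // 32
```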

Mean(Column)

Returns the average value of the column in a group.

Mean(String)

Returns the average value of the column in a group.

Min(Column)

Returns the minimum value of the column in a group.

Min(String)

Returns the minimum value of the column in a group.

Minute(Column)

Extracts the minutes as an integer from a given date/timestamp/string.

MonotonicallyIncreasingId()

A column expression that generates monotonically increasing 64-bit integers.

Month(Column)

Extracts the month as an integer from a given date/timestamp/string.

Months(Column)

A transform for timestamps and dates to partition data into months.

MonthsBetween(Column, Column)

Returns the number of months between dates end and start.

MonthsBetween(Column, Column, Boolean)

Returns number of months between dates end and start. If roundOff is set to true, the result is rounded off to 8 digits; it is not rounded otherwise.

NaNvl(Column, Column)

Returns col1 if it is not NaN, or col2 if col1 is NaN.

Negate(Column)

Unary minus, i.e. negate the expression.

NextDay(Column, String)

Given a date column, returns the first date which is later than the value of the date column that is on the specified day of the week.

Not(Column)

Inversion of boolean expression, i.e. NOT.

Ntile(Int32)

Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4.
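
The quartile example above generalizes to any n. The sketch below assigns group ids to a given number of ordered rows in plain C#. It assumes the usual SQL NTILE convention that, when the rows do not divide evenly, the earlier groups each receive one extra row. The description above does not cover that case.

```csharp
using System;

// Returns the group id (1..n) for each of `rowCount` ordered rows.
static int[] Ntile(int rowCount, int n)
{
    int baseSize = rowCount / n, extra = rowCount % n;
    var ids = new int[rowCount];
    int row = 0;
    for (int bucket = 1; bucket <= n && row < rowCount; bucket++)
    {
        // Assumption: the first `extra` buckets hold one additional row.
        int size = baseSize + (bucket <= extra ? 1 : 0);
        for (int k = 0; k < size; k++) ids[row++] = bucket;
    }
    return ids;
}

Console.WriteLine(string.Join(",", Ntile(8, 4)));  // 1,1,2,2,3,3,4,4
Console.WriteLine(string.Join(",", Ntile(10, 4))); // 1,1,1,2,2,2,3,3,4,4
```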

Overlay(Column, Column, Column)

Overlay the specified portion of src with replace, starting from byte position pos of src.

Overlay(Column, Column, Column, Column)

Overlay the specified portion of src with replace, starting from byte position pos of src and proceeding for len bytes.

PercentRank()

Window function: returns the relative rank (i.e. percentile) of rows within a window partition.

Pmod(Column, Column)

Returns the positive value of dividend mod divisor.
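
C#'s % operator keeps the sign of the dividend, so a "positive modulus" gives a different answer for negative dividends. A minimal sketch of the idea for a positive divisor follows. Behavior for negative divisors is deliberately left out, since the description above does not specify it.

```csharp
using System;

// Shifts a negative remainder into the range [0, divisor) for a positive divisor.
static int Pmod(int dividend, int divisor) =>
    ((dividend % divisor) + divisor) % divisor;

Console.WriteLine(-7 % 3);      // -1
Console.WriteLine(Pmod(-7, 3)); // 2
Console.WriteLine(Pmod(7, 3));  // 1
```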

PosExplode(Column)

Creates a new row for each element with position in the given array or map column.

PosExplodeOuter(Column)

Creates a new row for each element with position in the given array or map column. Unlike PosExplode(), if the array/map is null or empty then the row (null, null) is produced.

Pow(Column, Column)

Returns the value of the first argument raised to the power of the second argument.

Pow(Column, Double)

Returns the value of the first argument raised to the power of the second argument.

Pow(Column, String)

Returns the value of the first argument raised to the power of the second argument.

Pow(Double, Column)

Returns the value of the first argument raised to the power of the second argument.

Pow(Double, String)

Returns the value of the first argument raised to the power of the second argument.

Pow(String, Column)

Returns the value of the first argument raised to the power of the second argument.

Pow(String, Double)

Returns the value of the first argument raised to the power of the second argument.

Pow(String, String)

Returns the value of the first argument raised to the power of the second argument.

Quarter(Column)

Extracts the quarter as an integer from a given date/timestamp/string.

Radians(Column)

Converts an angle measured in degrees to an approximately equivalent angle measured in radians.

Radians(String)

Converts an angle measured in degrees to an approximately equivalent angle measured in radians.

Rand()

Generate a random column with independent and identically distributed (i.i.d.) samples from U[0.0, 1.0].

Rand(Int64)

Generate a random column with independent and identically distributed (i.i.d.) samples from U[0.0, 1.0].

Randn()

Generate a random column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.

Randn(Int64)

Generate a random column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.

Rank()

Window function: returns the rank of rows within a window partition.

RegexpExtract(Column, String, Int32)

Extract a specific group matched by a Java regex, from the specified string column.

RegexpReplace(Column, Column, Column)

Replace all substrings of the specified string value that match the pattern with the given replacement string.

RegexpReplace(Column, String, String)

Replace all substrings of the specified string value that match the pattern with the given replacement string.

Repeat(Column, Int32)

Repeats a string column n times, and returns it as a new string column.

Reverse(Column)

Reverses the string column and returns it as a new string column.

Rint(Column)

Returns the double value that is closest in value to the argument and is equal to a mathematical integer.

Rint(String)

Returns the double value that is closest in value to the argument and is equal to a mathematical integer.

Round(Column)

Returns the value of the column rounded to 0 decimal places with HALF_UP round mode.

Round(Column, Int32)

Returns the value of the column rounded to scale decimal places with HALF_UP round mode.

RowNumber()

Window function: returns a sequential number starting at 1 within a window partition.

Rpad(Column, Int32, String)

Right-pad the string column with pad to the given length len. If the string column is longer than len, the return value is shortened to len characters.

Rtrim(Column)

Trim the spaces from right end for the specified string value.

Rtrim(Column, String)

Trim the specified character string from right end for the given string column.

SchemaOfCsv(Column)

Parses a CSV string and infers its schema in DDL format.

SchemaOfCsv(Column, Dictionary<String,String>)

Parses a CSV string and infers its schema in DDL format.

SchemaOfCsv(String)

Parses a CSV string and infers its schema in DDL format.

SchemaOfJson(Column)

Parses a JSON string and infers its schema in DDL format.

SchemaOfJson(Column, Dictionary<String,String>)

Parses a JSON string and infers its schema in DDL format.

SchemaOfJson(String)

Parses a JSON string and infers its schema in DDL format.

Second(Column)

Extracts the seconds as an integer from a given date/timestamp/string.

Sequence(Column, Column)

Generate a sequence of integers from start to stop, incrementing by 1 if start is less than or equal to stop, otherwise -1.

Sequence(Column, Column, Column)

Generate a sequence of integers from start to stop, incrementing by step.
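
Both overloads can be modeled with one small generator. This sketch assumes that stop is inclusive and that a zero step is an error, neither of which is spelled out above. It works on plain long values rather than columns.

```csharp
using System;
using System.Collections.Generic;

// step == null picks +1 or -1 from the direction of the range.
static List<long> Sequence(long start, long stop, long? step = null)
{
    long s = step ?? (start <= stop ? 1 : -1);
    if (s == 0) throw new ArgumentException("step must be non-zero", nameof(step));

    var result = new List<long>();
    for (long v = start; s > 0 ? v <= stop : v >= stop; v += s)
        result.Add(v); // assumption: stop itself is included
    return result;
}

Console.WriteLine(string.Join(",", Sequence(1, 5)));     // 1,2,3,4,5
Console.WriteLine(string.Join(",", Sequence(5, 1)));     // 5,4,3,2,1
Console.WriteLine(string.Join(",", Sequence(0, 10, 5))); // 0,5,10
```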

Sha1(Column)

Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.

Sha2(Column, Int32)

Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.

ShiftLeft(Column, Int32)

Shift the given value numBits left.

ShiftRight(Column, Int32)

(Signed) shift the given value numBits right.

ShiftRightUnsigned(Column, Int32)

Unsigned shift the given value numBits right.

Shuffle(Column)

Returns a random permutation of the given array.

Signum(Column)

Computes the signum of the given value.

Signum(String)

Computes the signum of the given value.

Sin(Column)

Computes sine of the angle, as if computed by java.lang.Math.sin.

Sin(String)

Computes sine of the angle, as if computed by java.lang.Math.sin.

Sinh(Column)

Computes hyperbolic sine of the angle, as if computed by java.lang.Math.sinh.

Sinh(String)

Computes hyperbolic sine of the angle, as if computed by java.lang.Math.sinh.

Size(Column)

Returns length of array or map.

Skewness(Column)

Returns the skewness of the values in a group.

Skewness(String)

Returns the skewness of the values in a group.

Slice(Column, Int32, Int32)

Returns an array containing all the elements in column from index start (or starting from the end if start is negative) with the specified length.
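
The start argument is easiest to understand by example. The sketch below models it on a plain array under assumptions the description leaves implicit: positions are 1-based, a negative start counts back from the end, a start of 0 is rejected, and a negative start that reaches before the first element yields an empty result.

```csharp
using System;
using System.Linq;

static T[] Slice<T>(T[] array, int start, int length)
{
    if (start == 0) throw new ArgumentException("start must not be 0", nameof(start));

    // Convert the 1-based (or negative) position to a 0-based index.
    int from = start > 0 ? start - 1 : array.Length + start;
    if (from < 0) return new T[0]; // assumption: out of range gives an empty array

    return array.Skip(from).Take(length).ToArray();
}

int[] source = { 1, 2, 3, 4, 5 };
Console.WriteLine(string.Join(",", Slice(source, 2, 3)));  // 2,3,4
Console.WriteLine(string.Join(",", Slice(source, -2, 2))); // 4,5
```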

SortArray(Column, Boolean)

Sorts the input array for the given column in ascending (default) or descending order, according to the natural ordering of the array elements.

Soundex(Column)

Returns the soundex code for the specified expression.

SparkPartitionId()

Partition ID.

Split(Column, String)

Splits string with a regular expression pattern.

Split(Column, String, Int32)

Splits str around matches of the given pattern.

Sqrt(Column)

Computes the square root of the specified float value.

Sqrt(String)

Computes the square root of the specified float value.

Stddev(Column)

Alias for StddevSamp().

Stddev(String)

Alias for StddevSamp().

StddevPop(Column)

Returns the population standard deviation of the expression in a group.

StddevPop(String)

Returns the population standard deviation of the expression in a group.

StddevSamp(Column)

Returns the sample standard deviation of the expression in a group.

StddevSamp(String)

Returns the sample standard deviation of the expression in a group.

Struct(Column[])

Creates a new struct column that composes multiple input columns.

Struct(String, String[])

Creates a new struct column that composes multiple input columns.

Substring(Column, Int32, Int32)

Returns the substring (or slice of byte array) starting from the given position for the given length.

SubstringIndex(Column, String, Int32)

Returns the substring from the given string before count occurrences of the given delimiter.
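
A short model makes "before count occurrences" concrete. This sketch handles positive counts only and assumes that the whole string is returned when the delimiter occurs fewer than count times. It is plain C#, not the Spark implementation.

```csharp
using System;

// Everything before the count-th occurrence of delim (positive count only).
static string SubstringIndex(string value, string delim, int count)
{
    if (count <= 0 || delim.Length == 0) return string.Empty; // outside this sketch's scope

    int searchFrom = 0, index = -1;
    for (int found = 0; found < count; found++)
    {
        index = value.IndexOf(delim, searchFrom, StringComparison.Ordinal);
        if (index < 0) return value; // assumption: too few delimiters gives the whole string
        searchFrom = index + delim.Length;
    }
    return value.Substring(0, index);
}

Console.WriteLine(SubstringIndex("www.apache.org", ".", 1)); // www
Console.WriteLine(SubstringIndex("www.apache.org", ".", 2)); // www.apache
```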

Sum(Column)

Returns the sum of all values in the expression.

Sum(String)

Returns the sum of all values in the expression.

SumDistinct(Column)

Returns the sum of distinct values in the expression.

SumDistinct(String)

Returns the sum of distinct values in the expression.

Tan(Column)

Computes tangent of the given value, as if computed by java.lang.Math.tan.

Tan(String)

Computes tangent of the given value, as if computed by java.lang.Math.tan.

Tanh(Column)

Computes hyperbolic tangent of the given value, as if computed by java.lang.Math.tanh.

Tanh(String)

Computes hyperbolic tangent of the given value, as if computed by java.lang.Math.tanh.

ToCsv(Column)

Converts a column containing a StructType into a CSV string with the specified schema.

ToCsv(Column, Dictionary<String,String>)

Converts a column containing a StructType into a CSV string with the specified schema.

ToDate(Column)

Converts the column into DateType by casting rules to DateType.

ToDate(Column, String)

Converts the column into a DateType with a specified format.

ToJson(Column, Dictionary<String,String>)

Converts a column containing a StructType, ArrayType of StructTypes, a MapType or ArrayType of MapTypes into a JSON string.

ToTimestamp(Column)

Converts a time string to a timestamp by casting rules to TimestampType.

ToTimestamp(Column, String)

Converts a time string to a timestamp using the specified format.

ToUtcTimestamp(Column, Column)

Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.

ToUtcTimestamp(Column, String)

Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.

Translate(Column, String, String)

Translates any character in the column that matches a character in matchingString to the corresponding character in replaceString.

Trim(Column)

Trim the spaces from both ends for the specified string column.

Trim(Column, String)

Trim the specified character from both ends for the specified string column.

Trunc(Column, String)

Returns date truncated to the unit specified by the format.

Udf(Func<Row>, StructType)

Creates a UDF from the specified delegate.

Udf<A1,RT>(Func<A1,RT>)

Creates a UDF from the specified delegate.

Udf<T>(Func<T,Row>, StructType)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,TResult>(Func<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,TResult>)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10>(Func<T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,Row>, StructType)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4,T5,T6,T7,T8,T9,TResult>(Func<T1,T2,T3,T4,T5,T6,T7,T8,T9,TResult>)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4,T5,T6,T7,T8,T9>(Func<T1,T2,T3,T4,T5,T6,T7,T8,T9,Row>, StructType)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4,T5,T6,T7,T8,TResult>(Func<T1,T2,T3,T4,T5,T6,T7,T8,TResult>)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4,T5,T6,T7,T8>(Func<T1,T2,T3,T4,T5,T6,T7,T8,Row>, StructType)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4,T5,T6,T7,TResult>(Func<T1,T2,T3,T4,T5,T6,T7,TResult>)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4,T5,T6,T7>(Func<T1,T2,T3,T4,T5,T6,T7,Row>, StructType)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4,T5,T6,TResult>(Func<T1,T2,T3,T4,T5,T6,TResult>)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4,T5,T6>(Func<T1,T2,T3,T4,T5,T6,Row>, StructType)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4,T5,TResult>(Func<T1,T2,T3,T4,T5,TResult>)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4,T5>(Func<T1,T2,T3,T4,T5,Row>, StructType)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4,TResult>(Func<T1,T2,T3,T4,TResult>)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,T4>(Func<T1,T2,T3,T4,Row>, StructType)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3,TResult>(Func<T1,T2,T3,TResult>)

Creates a UDF from the specified delegate.

Udf<T1,T2,T3>(Func<T1,T2,T3,Row>, StructType)

Creates a UDF from the specified delegate.

Udf<T1,T2,TResult>(Func<T1,T2,TResult>)

Creates a UDF from the specified delegate.

Udf<T1,T2>(Func<T1,T2,Row>, StructType)

Creates a UDF from the specified delegate.

Udf<TResult>(Func<TResult>)

Creates a UDF from the specified delegate.
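
A minimal sketch of the single-argument overload Udf<A1,RT>(Func<A1,RT>), assuming the Microsoft.Spark NuGet package and a reachable Spark runtime. WordCount, the column names, and the sample text are hypothetical and only serve the illustration.

```csharp
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

public static class UdfSketch
{
    // Plain C# logic (hypothetical helper): counts space-separated words.
    public static int WordCount(string text) =>
        string.IsNullOrWhiteSpace(text)
            ? 0
            : text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;

    public static void Main()
    {
        SparkSession spark = SparkSession.Builder()
            .AppName("udf-sketch")
            .GetOrCreate();

        // Hypothetical one-row input.
        DataFrame df = spark.Sql("SELECT 'to be or not to be' AS line");

        // Wrapping the delegate yields a function from Column to Column.
        Func<Column, Column> wordCountUdf = Udf<string, int>(WordCount);

        df.Select(Col("line"), wordCountUdf(Col("line")).Alias("words")).Show();

        spark.Stop();
    }
}
```

Keeping the logic in an ordinary static method lets it be unit tested without a Spark session. The overloads that take a StructType follow the same pattern but return a Row whose shape matches the supplied schema.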

Unbase64(Column)

Decodes a BASE64 encoded string column and returns it as a binary column.

UnboundedFollowing()

Window function: returns the special frame boundary that represents the last row in the window partition.

UnboundedPreceding()

Window function: returns the special frame boundary that represents the first row in the window partition.

Unhex(Column)

Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts it to the byte representation of that number.

UnixTimestamp()

Returns the current Unix timestamp (in seconds).

UnixTimestamp(Column)

Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), using the default timezone and the default locale.

UnixTimestamp(Column, String)

Converts time string with given format to Unix timestamp (in seconds).

Upper(Column)

Converts a string column to upper case.

Variance(Column)

Alias for VarSamp().

Variance(String)

Alias for VarSamp().

VarPop(Column)

Returns the population variance of the values in a group.

VarPop(String)

Returns the population variance of the values in a group.

VarSamp(Column)

Returns the unbiased variance of the values in a group.

VarSamp(String)

Returns the unbiased variance of the values in a group.
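
To make the difference between the variance functions concrete, the sketch below aggregates one small column three ways. It assumes the Microsoft.Spark NuGet package and a reachable Spark runtime; the inline table and the column name x are hypothetical. For the values 1, 2, 3, 4 the squared deviations from the mean sum to 5, so the population variance divides by n = 4 (1.25) and the unbiased sample variance divides by n - 1 = 3 (about 1.667).

```csharp
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

public static class VarianceSketch
{
    public static void Main()
    {
        SparkSession spark = SparkSession.Builder()
            .AppName("variance-sketch")
            .GetOrCreate();

        // Hypothetical four-row input with a single double column "x".
        DataFrame df = spark.Sql(
            "SELECT * FROM VALUES (1.0D), (2.0D), (3.0D), (4.0D) AS t(x)");

        df.Agg(
            VarPop(Col("x")).Alias("var_pop"),    // divides by n
            VarSamp(Col("x")).Alias("var_samp"),  // divides by n - 1
            Variance(Col("x")).Alias("variance")) // alias for VarSamp
          .Show();

        spark.Stop();
    }
}
```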

WeekOfYear(Column)

Extracts the week number as an integer from a given date/timestamp/string.

When(Column, Object)

Evaluates a condition and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
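
A short sketch of chaining conditions, assuming the Microsoft.Spark NuGet package and a reachable Spark runtime. The amount column, the thresholds, and the labels are hypothetical. Dropping the final Otherwise call would leave null for rows that match no condition, as described above.

```csharp
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

public static class WhenSketch
{
    public static void Main()
    {
        SparkSession spark = SparkSession.Builder()
            .AppName("when-sketch")
            .GetOrCreate();

        // Hypothetical three-row input.
        DataFrame df = spark.Sql("SELECT * FROM VALUES (5), (50), (500) AS t(amount)");

        // Conditions are checked in order; the first match supplies the value.
        Column bucket = When(Col("amount").Lt(10), "small")
            .When(Col("amount").Lt(100), "medium")
            .Otherwise("large");

        df.WithColumn("bucket", bucket).Show();

        spark.Stop();
    }
}
```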

Window(Column, String)

Generates tumbling time windows given a timestamp specifying column.

Window(Column, String, String)

Bucketizes rows into one or more time windows given a timestamp column.

Window(Column, String, String, String)

Bucketizes rows into one or more time windows given a timestamp column.
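
The sketch below groups rows by a tumbling window and then by a sliding window. It assumes the Microsoft.Spark NuGet package and a reachable Spark runtime, and it assumes the second and third string arguments are the window duration and slide duration, as in the underlying Spark window function. The event_time and value columns and the sample rows are hypothetical.

```csharp
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

public static class WindowSketch
{
    public static void Main()
    {
        SparkSession spark = SparkSession.Builder()
            .AppName("window-sketch")
            .GetOrCreate();

        // Hypothetical events with a timestamp column and a numeric value.
        DataFrame events = spark.Sql(
            "SELECT * FROM VALUES " +
            "(TIMESTAMP '2017-07-14 02:40:00', 1), " +
            "(TIMESTAMP '2017-07-14 02:47:00', 2), " +
            "(TIMESTAMP '2017-07-14 02:53:00', 3) " +
            "AS t(event_time, value)");

        // Tumbling windows: each row falls into exactly one 10-minute bucket.
        events
            .GroupBy(Window(Col("event_time"), "10 minutes"))
            .Agg(Sum(Col("value")).Alias("total"))
            .Show();

        // Sliding windows: 10-minute buckets that start every 5 minutes,
        // so a row can belong to more than one bucket.
        events
            .GroupBy(Window(Col("event_time"), "10 minutes", "5 minutes"))
            .Agg(Sum(Col("value")).Alias("total"))
            .Show();

        spark.Stop();
    }
}
```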

XXHash64(Column[])

Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.

Year(Column)

Extracts the year as an integer from a given date/timestamp/string.

Years(Column)

A transform for timestamps and dates to partition data into years.

Applies to