slice

Returns a new array column by slicing each array in the input column, starting at a given index and taking a given number of elements. Indices are 1-based; a negative start index counts from the end of the array. The length specifies the maximum number of elements in the resulting array, so shorter slices are returned when the array ends first.
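The indexing rules can be illustrated with a plain-Python sketch (an illustration of the semantics only, not Spark's actual implementation; the out-of-range behavior shown is an assumption):

from pyspark.sql import functions as sf  # not needed for the sketch below

def slice_like(arr, start, length):
    """Sketch of slice semantics: 1-based start, negative start counts
    from the end, at most `length` elements returned."""
    if start == 0:
        # Spark rejects a zero start index.
        raise ValueError("start index must not be zero")
    # Convert the 1-based / negative start to a 0-based Python index.
    begin = start - 1 if start > 0 else len(arr) + start
    if begin < 0:
        # Assumed behavior when the negative start reaches past the front.
        return []
    return arr[begin:begin + length]

print(slice_like([1, 2, 3], 2, 2))   # [2, 3]
print(slice_like([4, 5], 2, 2))      # [5]  (array ends before length is reached)
print(slice_like([1, 2, 3], -1, 1))  # [3]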

Syntax

from pyspark.sql import functions as sf

sf.slice(x, start, length)

Parameters

x : pyspark.sql.Column or str
    Input array column or column name to be sliced.
start : pyspark.sql.Column, str, or int
    The start index for the slice operation. If negative, the index starts from the end of the array.
length : pyspark.sql.Column, str, or int
    The length of the slice, representing the maximum number of elements in the resulting array.

Returns

pyspark.sql.Column: A new Column object of Array type, where each value is a slice of the corresponding list from the input column.

Examples

Example 1: Basic usage of the slice function.

from pyspark.sql import functions as sf
df = spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ['x'])
df.select(sf.slice(df.x, 2, 2)).show()
+--------------+
|slice(x, 2, 2)|
+--------------+
|        [2, 3]|
|           [5]|
+--------------+

Example 2: Slicing with negative start index.

from pyspark.sql import functions as sf
df = spark.createDataFrame([([1, 2, 3],), ([4, 5],)], ['x'])
df.select(sf.slice(df.x, -1, 1)).show()
+---------------+
|slice(x, -1, 1)|
+---------------+
|            [3]|
|            [5]|
+---------------+

Example 3: Slice function with column inputs for start and length.

from pyspark.sql import functions as sf
df = spark.createDataFrame([([1, 2, 3], 2, 2), ([4, 5], 1, 3)], ['x', 'start', 'length'])
df.select(sf.slice(df.x, df.start, df.length)).show()
+-----------------------+
|slice(x, start, length)|
+-----------------------+
|                 [2, 3]|
|                 [4, 5]|
+-----------------------+