Σημείωση
Η πρόσβαση σε αυτή τη σελίδα απαιτεί εξουσιοδότηση. Μπορείτε να δοκιμάσετε να συνδεθείτε ή να αλλάξετε καταλόγους.
Η πρόσβαση σε αυτή τη σελίδα απαιτεί εξουσιοδότηση. Μπορείτε να δοκιμάσετε να αλλάξετε καταλόγους.
Generates monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.
Syntax
from pyspark.sql import functions as sf
sf.monotonically_increasing_id()
Returns
pyspark.sql.Column: last value of the group.
Notes
The function is non-deterministic because its result depends on partition IDs.
As an example, consider a :class:DataFrame with two partitions, each with 3 records. This expression would return the following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
Examples
Example 1: Generate monotonically increasing IDs
from pyspark.sql import functions as sf
spark.range(0, 10, 1, 2).select(
"*",
sf.spark_partition_id(),
sf.monotonically_increasing_id()).show()
+---+--------------------+-----------------------------+
| id|SPARK_PARTITION_ID()|monotonically_increasing_id()|
+---+--------------------+-----------------------------+
| 0| 0| 0|
| 1| 0| 1|
| 2| 0| 2|
| 3| 0| 3|
| 4| 0| 4|
| 5| 1| 8589934592|
| 6| 1| 8589934593|
| 7| 1| 8589934594|
| 8| 1| 8589934595|
| 9| 1| 8589934596|
+---+--------------------+-----------------------------+