Randomly splits this DataFrame with the provided weights.
Syntax
randomSplit(weights: List[float], seed: Optional[int] = None)
Parameters
| Parameter | Type | Description |
|---|---|---|
| weights | list | List of doubles as weights with which to split the DataFrame. Weights will be normalized if they don't sum up to 1.0. |
| seed | int, optional | The seed for sampling. |
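The normalization described for weights can be illustrated with a small plain-Python sketch. This is not Spark's implementation; the helper names `normalize_weights` and `split_boundaries` are hypothetical, and the input validation is an assumption made for the sketch. It only shows the arithmetic: each weight is divided by the total, so `[1.0, 2.0]` corresponds to target fractions of one third and two thirds.

```python
from itertools import accumulate


def normalize_weights(weights):
    """Scale weights so they sum to 1.0 (hypothetical helper, not a Spark API)."""
    total = sum(weights)
    # Assumption for this sketch: reject negative weights and a zero total.
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return [w / total for w in weights]


def split_boundaries(weights):
    """Return one (lower, upper) range of the unit interval per normalized weight."""
    fractions = normalize_weights(weights)
    uppers = list(accumulate(fractions))  # cumulative sums of the fractions
    lowers = [0.0] + uppers[:-1]          # each range starts where the previous one ends
    return list(zip(lowers, uppers))


fractions = normalize_weights([1.0, 2.0])
boundaries = split_boundaries([1.0, 2.0])
print(fractions)
print(boundaries)
```

Because the split is random, the sizes of the returned DataFrames only approximate these fractions, and on a very small DataFrame they can differ noticeably from them.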
Returns
list: List of DataFrames, one per entry in weights.
Examples
from pyspark.sql import Row

# Assumes an active SparkSession is available as `spark`.
df = spark.createDataFrame([
    Row(age=10, height=80, name="Alice"),
    Row(age=5, height=None, name="Bob"),
    Row(age=None, height=None, name="Tom"),
    Row(age=None, height=None, name=None),
])

# Split with weights 1.0 and 2.0, using 24 as the sampling seed.
splits = df.randomSplit([1.0, 2.0], 24)
splits[0].count()
# 2
splits[1].count()
# 2