rxPartition: Partition Data by Key Values and Save the Results to a Partitioned .Xdf


Description

Partition an input data source by the values of one or more key variables and save the results to a partitioned .xdf file on disk.
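Conceptually, partitioning by key values groups the input rows by each distinct combination of the key variables' values and stores each group separately. The base-R sketch below illustrates only that idea. It is not how rxPartition is implemented. The small data frame, the partitionSketch directory and the part_*.rds file names are hypothetical.

```r
# Illustrative base-R sketch of partitioning by key values; this is NOT the
# RevoScaleR implementation. The data and file names are hypothetical.
claims <- data.frame(
  car.age = c("0-3", "4-7", "0-3", "8-9", "4-7"),
  type    = c("A", "B", "A", "C", "A"),
  cost    = c(289, 282, 224, 352, 194),
  stringsAsFactors = FALSE
)
varsToPartition <- c("car.age", "type")

# Group rows by each observed combination of key values; drop = TRUE skips
# combinations that never occur (e.g. car.age "8-9" with type "A").
groups <- split(claims, claims[varsToPartition], drop = TRUE)

# Write each group to its own file, mimicking one file per partition.
outDir <- file.path(tempdir(), "partitionSketch")
dir.create(outDir, showWarnings = FALSE)
files <- vapply(seq_along(groups), function(i) {
  path <- file.path(outDir, sprintf("part_%d.rds", i))
  saveRDS(groups[[i]], path)
  path
}, character(1))

# Key combinations found and the number of rows in each partition
sapply(groups, nrow)
```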


Usage

  rxPartition(inData, outData, varsToPartition, append = "rows", overwrite = FALSE, ...)



Arguments

inData: either a data source object, a character string specifying a .xdf file, or a data frame object.


outData: a partitioned data source object created by RxXdfData with createPartitionSet = TRUE.


varsToPartition: character vector of the names of the variables whose values are used for partitioning. Each distinct combination of values across these variables defines one partition.


append: either "none" to create a new file or "rows" to append rows to an existing file. If outData exists and append is "none", overwrite must be set to TRUE.


overwrite: logical value. If TRUE, an existing outData will be overwritten. overwrite is ignored if append = "rows".


...: additional arguments to be passed directly to the Revolution Compute Engine.
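The interaction between append and overwrite can be summarized as a small decision function. resolveWriteMode is a hypothetical helper written only to restate the documented rules. It is not part of RevoScaleR. Its outExists flag stands in for checking whether outData is already on disk. The "create" result for append = "rows" on a new output reflects the first example, which builds a new partitioned Xdf using the default append.

```r
# Hypothetical helper that restates the documented append/overwrite rules
# of rxPartition; it is not part of RevoScaleR. outExists stands in for
# "outData is already on disk".
resolveWriteMode <- function(outExists, append = c("rows", "none"), overwrite = FALSE) {
  append <- match.arg(append)
  if (append == "rows") {
    # overwrite is ignored: rows are added to an existing partition set,
    # or a new set is created if none exists yet.
    if (outExists) "append" else "create"
  } else if (!outExists) {
    "create"
  } else if (isTRUE(overwrite)) {
    "overwrite"
  } else {
    stop('outData exists and append = "none": set overwrite = TRUE')
  }
}

resolveWriteMode(outExists = TRUE)                                    # "append"
resolveWriteMode(outExists = TRUE, append = "none", overwrite = TRUE) # "overwrite"
```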


Value

A data frame with one row per partition. The leading columns hold the partitioning (key) values, and the last column holds the data source that contains the data for that partition.
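To picture the shape of the returned data frame, the base-R mock-up below builds an analogous structure. The key values go in the leading column, and a list column, placed last, holds each partition's data. Plain data frames stand in for the data source objects that rxPartition returns. The column name data is a hypothetical choice, and the real column names may differ.

```r
# Base-R mock-up of the *shape* of rxPartition's return value; plain data
# frames stand in for RxXdfData objects, and the "data" column name is
# hypothetical.
claims <- data.frame(
  car.age = c("0-3", "4-7", "0-3", "8-9"),
  cost    = c(289, 282, 224, 352),
  stringsAsFactors = FALSE
)
groups <- split(claims, claims$car.age)

# One row per partition: key value first, partition data in the last column.
partDF <- data.frame(car.age = names(groups), stringsAsFactors = FALSE)
partDF$data <- unname(groups)  # list column, one element per partition

partDF$car.age         # "0-3" "4-7" "8-9"
partDF$data[[1]]$cost  # 289 224
```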


Details

In the current version, this function is single-threaded.


See Also

rxExecBy, RxXdfData


Examples

 # Construct a partitioned Xdf

   # create an input Xdf data source
   inFile <- file.path(rxGetOption("sampleDataDir"), "claims.xdf")
   inXdfDS <- RxXdfData(file = inFile)

   # create an output partitioned Xdf data source
   outFile <- file.path(tempdir(), "partitionedClaims.xdf")
   outPartXdfDataSource <- RxXdfData(file = outFile, createPartitionSet = TRUE)

   # construct and save the partitioned Xdf to disk
   partDF <- rxPartition(inData = inXdfDS, outData = outPartXdfDataSource, varsToPartition = c("car.age"))

 # Append new data to an existing partitioned Xdf

   # import the sample Xdf into a data frame, then split it into two data frames
   inFile <- file.path(rxGetOption("sampleDataDir"), "claims.xdf")
   inXdfDS <- RxXdfData(file = inFile)
   inDF <- rxImport(inData = inXdfDS)

   df1 <- inDF[1:50,]
   df2 <- inDF[51:nrow(inDF),]

   # create an output partitioned Xdf data source
   outFile <- file.path(tempdir(), "partitionedClaims.xdf")
   outPartXdfDataSource <- RxXdfData(file = outFile, createPartitionSet = TRUE)

   # construct the partitioned Xdf from the first data set df1 and save to disk
   partDF1 <- rxPartition(inData = df1, outData = outPartXdfDataSource, varsToPartition = c("car.age", "type"), append = "none", overwrite = TRUE)

   # append data from the second data set to the existing partitioned Xdf (append = "rows" is the default)
   partDF2 <- rxPartition(inData = df2, outData = outPartXdfDataSource, varsToPartition = c("car.age", "type"))

   # overwrite the existing partitioned Xdf, this time partitioning by car.age only
   partDF3 <- rxPartition(inData = inXdfDS, outData = outPartXdfDataSource, varsToPartition = c("car.age"), append = "none", overwrite = TRUE)
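The append workflow above can be modeled in memory. The model shows that adding df2 to partitions built from df1 leaves each partition with the same rows as partitioning all the data in one pass. partitionInto and the named-list store are hypothetical stand-ins for rxPartition and a partitioned .xdf, written in base R for illustration only. Nothing here implies how rows are ordered inside the real partition files.

```r
# Hypothetical in-memory stand-in for a partitioned .xdf: a named list with
# one data frame per distinct key combination. Not part of RevoScaleR.
partitionInto <- function(store, newData, varsToPartition) {
  groups <- split(newData, newData[varsToPartition], drop = TRUE)
  for (key in names(groups)) {
    # Append rows to an existing partition, or start a new one.
    store[[key]] <- rbind(store[[key]], groups[[key]])
  }
  store
}

claims <- data.frame(
  car.age = c("0-3", "4-7", "0-3", "8-9", "4-7", "0-3"),
  cost    = c(289, 282, 224, 352, 194, 118),
  stringsAsFactors = FALSE
)
df1 <- claims[1:3, ]
df2 <- claims[4:nrow(claims), ]

# Build from df1, then append df2 (mirrors append = "none", then "rows").
store <- partitionInto(list(), df1, "car.age")
store <- partitionInto(store, df2, "car.age")

# Partition all rows in one pass for comparison.
oneShot <- partitionInto(list(), claims, "car.age")

# Same partitions, with the same values in each partition.
identical(names(store), names(oneShot))
all(mapply(function(a, b) identical(a$cost, b$cost), store, oneShot))
```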