rxPartition: Partition Data by Key Values and Save the Results to a Partitioned .xdf

Description

Partition input data sources by key values and save the results to a partitioned Xdf on disk.

Usage

  rxPartition(inData, outData, varsToPartition, append = "rows", overwrite = FALSE, ...)

Arguments

inData

either a data source object, a character string specifying a .xdf file, or a data frame object.

outData

a partitioned data source object created by RxXdfData with createPartitionSet = TRUE.

varsToPartition

character vector of names of the variables whose values are used for partitioning. Each unique combination of values across these variables defines one partition.

append

either "none" to create new files or "rows" to append rows to existing files. If outData exists and append is "none", overwrite must be set to TRUE.

overwrite

logical value. If TRUE, an existing outData will be overwritten. overwrite is ignored if append = "rows".
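The interaction between append and overwrite can be summarized as a small decision function. This is a sketch only: resolveWriteMode is a hypothetical helper that is not part of RevoScaleR. It restates the rules above, plus the behavior implied by the first example (where the default append = "rows" writes to a new file), rather than reproducing the internals of rxPartition.

```r
# Hypothetical helper (NOT part of RevoScaleR): restates the documented
# rules for `append` and `overwrite` as code.
resolveWriteMode <- function(append = "rows", overwrite = FALSE, outExists = FALSE) {
  append <- match.arg(append, c("rows", "none"))
  if (append == "rows") {
    # overwrite is ignored; rows are appended, or the output is created if absent
    return(if (outExists) "append" else "create")
  }
  # append = "none": an existing output may only be replaced when overwrite = TRUE
  if (outExists && !isTRUE(overwrite)) {
    stop("outData exists: set overwrite = TRUE when append = \"none\"")
  }
  if (outExists) "overwrite" else "create"
}
```

For example, resolveWriteMode("none", overwrite = FALSE, outExists = TRUE) stops with an error, mirroring the requirement to set overwrite = TRUE when replacing an existing outData.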

...

additional arguments to be passed directly to the Revolution Compute Engine.

Value

a data frame of partitioning values and data sources. Each row represents one partition, and the data source in the last column holds the data for that partition.
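To illustrate the shape of the returned data frame, the following base R sketch partitions an in-memory data frame by key values. It does not use RevoScaleR and writes nothing to disk. partitionInMemory is a hypothetical helper, the toy data are invented, and the last column holds each partition's rows as a data frame where rxPartition would hold an .xdf data source.

```r
# Conceptual sketch only (base R, no RevoScaleR): mimics the shape of the
# rxPartition() return value. `partitionInMemory` is a hypothetical helper.
partitionInMemory <- function(df, varsToPartition) {
  # One group per observed combination of key values; empty combinations dropped
  groups <- split(df, df[varsToPartition], drop = TRUE)
  # One row per partition holding its key values (taken from the group's first row)
  keys <- do.call(rbind, unname(lapply(groups, function(g) {
    g[1, varsToPartition, drop = FALSE]
  })))
  rownames(keys) <- NULL
  # Last column: the data of each partition (a list column of data frames)
  keys$data <- unname(groups)
  keys
}

# Toy data (invented) with two key variables
claimsToy <- data.frame(car.age = c("0-3", "0-3", "4-7", "4-7", "4-7"),
                        type = c("A", "B", "A", "A", "B"),
                        cost = c(10, 20, 30, 40, 50))
partToy <- partitionInMemory(claimsToy, c("car.age", "type"))
nrow(partToy)  # 4 partitions: one per observed (car.age, type) combination
```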

Note

In the current version, this function is single-threaded.

Author(s)

Microsoft Corporation Microsoft Technical Support

See Also

rxExecBy, RxXdfData

Examples



 ##############################################################################
 # Construct a partitioned Xdf
 ##############################################################################

   # create an input Xdf data source
   inFile <- file.path(rxGetOption("sampleDataDir"), "claims.xdf")
   inXdfDS <- RxXdfData(file = inFile)

   # create an output partitioned Xdf data source
   outFile <- file.path(tempdir(), "partitionedClaims.xdf")
   outPartXdfDataSource <- RxXdfData(file = outFile, createPartitionSet = TRUE)

   # construct and save the partitioned Xdf to disk
   partDF <- rxPartition(inData = inXdfDS, outData = outPartXdfDataSource, varsToPartition = c("car.age"))

 ##############################################################################
 # Append new data to an existing partitioned Xdf
 ##############################################################################

   # import the sample Xdf into memory and split it into two data frames
   inFile <- file.path(rxGetOption("sampleDataDir"), "claims.xdf")
   inXdfDS <- RxXdfData(file = inFile)
   inDF <- rxImport(inData = inXdfDS)

   df1 <- inDF[1:50,]
   df2 <- inDF[51:nrow(inDF),]

   # create an output partitioned Xdf data source
   outFile <- file.path(tempdir(), "partitionedClaims.xdf")
   outPartXdfDataSource <- RxXdfData(file = outFile, createPartitionSet = TRUE)

   # construct the partitioned Xdf from the first data set df1 and save to disk
   partDF1 <- rxPartition(inData = df1, outData = outPartXdfDataSource, varsToPartition = c("car.age", "type"), append = "none", overwrite = TRUE)

   # append data from the second data set to the existing partitioned Xdf
   partDF2 <- rxPartition(inData = df2, outData = outPartXdfDataSource, varsToPartition = c("car.age", "type"))

   # overwrite an existing partitioned Xdf
   partDF3 <- rxPartition(inData = inXdfDS, outData = outPartXdfDataSource, varsToPartition = c("car.age"), append = "none", overwrite = TRUE)