# rxLorenz: Lorenz Curve and Gini Coefficient

## Description

Compute and plot an empirical Lorenz curve from a variable in a data set, optionally specifying a separate variable from which to compute the y-values for the curve. Compute the Gini coefficient from the Lorenz curve data. Appropriate for big data sets because the data are binned and the computations are performed in one pass, rather than by sorting the data.

## Usage

```
rxLorenz(orderVarName, valueVarName = orderVarName, data, numBreaks = 1000,
         pweights = NULL, fweights = NULL, blocksPerRead = 1,
         reportProgress = 1, verbose = 0)

## S3 method for class 'rxLorenz':
rxGini(x)

## S3 method for class 'rxLorenz':
plot(x, title = NULL, subtitle = NULL,
     xTitle = NULL, yTitle = NULL, lineColor = NULL,
     lineStyle = "solid", lineWidth = 2, equalityGridLine = TRUE,
     equalityColor = "grey75", equalityStyle = NULL, equalityWidth = 2, ...)
```

## Arguments

`orderVarName`

A character string with the name of the variable to use in computing approximate quantiles.

`valueVarName`

A character string with the name of the variable to use to compute the mean values per quantile. Can be the same as `orderVarName`.

`data`

data frame, character string containing an .xdf file name (with path), or RxDataSource-class object representing a data set containing the order and value variables.

`numBreaks`

integer specifying the number of breaks to use in computing approximate quantiles.

`pweights`

character string specifying the variable to use as probability weights for the observations.

`fweights`

character string specifying the variable to use as frequency weights for the observations.

`blocksPerRead`

number of blocks to read for each chunk of data read from the data source.

`reportProgress`

integer value with options:

- `0`: no progress is reported.
- `1`: the number of processed rows is printed and updated.
- `2`: rows processed and timings are reported.
- `3`: rows processed and all timings are reported.

`verbose`

integer value. If `0`, no additional output is printed. If `1`, additional information is printed as summary statistics are computed.

`x`

output object from the `rxLorenz` function.

`title`

main title for the plot.

`subtitle`

subtitle (at the bottom) for the plot.

`xTitle`

title for the X axis.

`yTitle`

title for the Y axis.

`lineColor`

character or integer vector specifying the line color for the Lorenz curve. See `colors` for a list of available colors.

`lineStyle`

line style for line plot: `"blank"`, `"solid"`, `"dashed"`, `"dotted"`, `"dotdash"`, `"longdash"`, or `"twodash"`. Specify `"blank"` for no line, or set `type` to `"p"`.

`lineWidth`

a positive number specifying the line width for line plot. The interpretation is device-specific.

`equalityGridLine`

logical value. If `TRUE`, a diagonal grid line will be drawn representing complete equality.

`equalityColor`

character or integer vector specifying the line color for the equality grid line. If `NULL`, the color of other grid lines will be used.

`equalityStyle`

line style for the equality grid line: `"blank"`, `"solid"`, `"dashed"`, `"dotted"`, `"dotdash"`, `"longdash"`, or `"twodash"`. If `NULL`, the style of other grid lines will be used.

`equalityWidth`

a positive number specifying the line width for the equality grid line. If `NULL`, the width of other grid lines will be used.

`...`

Additional arguments to be passed to `xyplot`.

## Details

`rxLorenz` computes the cumulative percentage values of the variable specified in `valueVarName` for groups binned by the variable specified in `orderVarName`. The size of the bins is determined by `numBreaks`.

When plotted, the cumulative percentage values are plotted against the quantile percentages.
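The binning idea can be illustrated with a small base R sketch. This is not the `rxLorenz` implementation: the helper name `binnedLorenz` is hypothetical, and the sketch assumes equal-width bins over the range of the order variable, non-negative values with a positive total, and output scaled from 0 to 100 with the origin included. The point it shows is that only per-bin counts and per-bin value totals are needed, and both can be accumulated chunk by chunk without sorting.

```r
# Hypothetical helper (not part of RevoScaleR): approximate Lorenz curve
# built from per-bin counts and per-bin value totals.
binnedLorenz <- function(orderVar, valueVar = orderVar, numBreaks = 1000) {
  # Assumption: equal-width bins across the observed range of the order variable.
  rng <- range(orderVar)
  if (rng[1] == rng[2]) {
    bin <- rep(1L, length(orderVar))  # all values identical: a single bin
  } else {
    breaks <- seq(rng[1], rng[2], length.out = numBreaks + 1)
    bin <- findInterval(orderVar, breaks, rightmost.closed = TRUE)
  }
  nBins <- max(bin)
  # Per-bin sums; in a chunked setting these would be added up across chunks.
  counts <- tabulate(bin, nbins = nBins)
  totals <- as.numeric(tapply(valueVar, factor(bin, levels = seq_len(nBins)), sum))
  totals[is.na(totals)] <- 0          # empty bins contribute nothing
  keep <- counts > 0                  # drop empty bins from the curve
  data.frame(
    percent = c(0, 100 * cumsum(counts[keep]) / sum(counts)),
    cumVals = c(0, 100 * cumsum(totals[keep]) / sum(totals))
  )
}

incomes <- c(10, 20, 30, 40)
curve <- binnedLorenz(incomes, numBreaks = 4)
print(curve)
```

With four distinct incomes and four bins, each observation falls in its own bin, so the sketch reproduces the exact Lorenz points for this toy input. With fewer bins than distinct values, observations within a bin are treated as a group, which is where the approximation comes from.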

The Gini coefficient is computed by estimating the ratio of the area between the line of equality and the Lorenz curve to the total area under the line of equality (using trapezoidal integration). The Gini coefficient can range from 0 to 1, with 0 representing perfect equality.

Precision can be increased by increasing `numBreaks`.
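The trapezoidal calculation described above can be sketched in base R. The helper name `giniFromLorenz` is hypothetical and this is not the code used by `rxGini`; it assumes the Lorenz points are given as percentages from 0 to 100 and include the origin. Because the area under the line of equality is 1/2, the ratio reduces to one minus twice the area under the Lorenz curve.

```r
# Hypothetical helper (not the RevoScaleR implementation): Gini coefficient
# from Lorenz curve points using trapezoidal integration.
giniFromLorenz <- function(percent, cumVals) {
  # Assumption: both inputs run from 0 to 100 and start at the origin.
  x <- percent / 100
  y <- cumVals / 100
  # Trapezoidal area under the Lorenz curve.
  areaUnderCurve <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
  # (1/2 - area) / (1/2) simplifies to 1 - 2 * area.
  1 - 2 * areaUnderCurve
}

# Perfect equality: the Lorenz curve coincides with the diagonal.
giniEqual <- giniFromLorenz(c(0, 50, 100), c(0, 50, 100))    # 0

# The top quarter of the population holds the entire total.
giniUnequal <- giniFromLorenz(c(0, 75, 100), c(0, 0, 100))   # 0.75
```

A finer grid of points gives a closer approximation to the true curve, which is why a larger `numBreaks` improves precision.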

## Value

`rxLorenz` returns a data frame of class `"rxLorenz"` containing two variables: `cumVals` and `percent`. It may also have a `"description"` attribute containing the value variable name or description.

`rxGini` returns a numeric vector of length one containing the approximate Gini coefficient.

## Author(s)

Microsoft Corporation `Microsoft Technical Support`

## See Also

rxPredict, rxLogit, rxGlm, rxLinePlot, rxQuantile, rxRoc.

## Examples

```
########################################################################
# Example using simple data frames for extreme distributions
########################################################################
# Lorenz curve for complete equality
testData <- data.frame(income = rep(100, times=10))
lorenzOut1 <- rxLorenz("income", data = testData, numBreaks = 100)
plot(lorenzOut1)
rxGini(lorenzOut1)
# Extreme inequality
testData <- data.frame(income = c(rep(0, times=99), 100))
lorenzOut2 <- rxLorenz("income", data = testData, numBreaks = 100)
plot(lorenzOut2, equalityWidth = 3, equalityColor = "black")
rxGini(lorenzOut2)
########################################################################
# Example using xdf file from sample data
########################################################################
censusWorkers <- file.path(rxGetOption("sampleDataDir"), "CensusWorkers")
# Compute Lorenz data using probability weights
lorenzOut <- rxLorenz(orderVarName = "incwage", data = censusWorkers,
                      pweights = "perwt")
# Plot the Lorenz Curve
lorenzPlot <- plot(lorenzOut,
                   title = "Lorenz Curve for Workers from Three States",
                   subtitle = "Data Source: 5 Percent Sample of U.S. 2000 Census",
                   lineWidth = 3, equalityColor = "black", equalityStyle = "longdash")
# Compute the Gini Coefficient
giniCoef <- rxGini(lorenzOut)
```