DAX: Simple Linear Regression Pattern
OVERVIEW
In statistics, simple linear regression is the least squares estimator of a linear regression model with a single explanatory variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that makes the sum of squared residuals of the model (that is, vertical distances between the points of the data set and the fitted line) as small as possible.
The adjective simply refers to the fact that the outcome variable is related to a single predictor. The slope of the fitted line is equal to the correlation between y and x corrected by the ratio of standard deviations of these variables. The intercept of the fitted line is such that it passes through the center of mass (x, y) of the data points.
Other regression methods besides the simple ordinary least squares (OLS) also exist (see linear regression). In particular, when one wants to do regression by eye, one usually tends to draw a slightly steeper line, closer to the one produced by the total least squares method. This occurs because it is more natural for one's mind to consider the orthogonal distances from the observations to the regression line, rather than the vertical ones as OLS method does.
Linear regression can serve as a useful forecasting tool within the limits of linearity. For non-linear data, accuracy will suffer when using linear regression.
The regression formula can be stated as:
Regression Equation(y) = a + bx
Where:
Slope(b) = (NΣXY - (ΣX)(ΣY)) / (NΣX2 - (ΣX)2)
Intercept(a) = (ΣY - b(ΣX)) / N
x and y are the variables.
b = The slope of the regression line
a = The intercept point of the regression line and the y axis.
N = Number of values or elements
X = First Score
Y = Second Score
ΣXY = Sum of the product of first and Second Scores
ΣX = Sum of First Scores
ΣY = Sum of Second Scores
ΣX2 = Sum of square First Scores
IMPLEMENTATION
Given a date set of X and Y coordinates:
regression.csv:
X,Y
60,3.1
61,3.6
62,3.8
63,4
65,4.1
And desired X estimation values
estimation.csv:
X
64
75
58
In Power BI Desktop, import both CSV files into the model as "regression" and "estimation" tables. There should be no relationship between these tables.
In regression table, create the following columns:
X*Y = [X]*[Y]
X*X = [X]*[X]
In regression table, create the following measures:
Count = COUNTAX(ALL('regression'),[X])
SumX = SUMX(ALL('regression'),[X])
SumY = SUMX(ALL('regression'),[Y])
SumX*Y = SUMX(ALL('regression'),[X*Y])
SumX*X = SUMX(ALL('regression'),[X*X])
Slope = ([Count]*[SumX*Y] - [SumX]*[SumY]) / ([Count]*[SumX*X] - [SumX]*[SumX])
Intercept = ([SumY] - [Slope]*[SumX]) / [Count]
In both the regression and estimation tables create the following column:
Estimate = [Intercept] + [Slope]*[X]
You can now plot your original values and the linear regression estimation values (Estimate) as well as plot your X values for estimation and the linear regression estimates (Estimate).
Below is a screen shot of example data. The first graph shows the original values plotted with the estimates derived from linear regression. The graph below that shows estimated values for desired future values of X.