Weighted Regression

Most standard regression procedures attempt to minimize the sum of squares, SSQ, defined as

SSQ = \sum_i \left( y_i - \hat{y}_i \right)^2

where yᵢ is the i-th measured observation and ŷᵢ is the corresponding value predicted by the regression model.
This equation contains no information whatsoever about how confident we are in the accuracy of each of the measured observations, yᵢ. It would be nice if we could give more weight to observations that are known with high precision, and less weight to observations that are known with low precision. One way to do this is to include weights in the SSQ expression as

SSQ = \sum_i w_i \left( y_i - \hat{y}_i \right)^2
where wᵢ is the weight of the i-th observation. 
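To make the expression concrete, the short Python sketch below evaluates the weighted SSQ for a candidate straight-line model y = a + b·x. The function name, the data values, and the choice of a linear model are illustrative assumptions rather than part of the course material; setting all weights to one recovers the ordinary (unweighted) SSQ.

```python
import numpy as np

def weighted_ssq(x, y, w, a, b):
    """Weighted sum of squared residuals for the straight-line model y = a + b*x."""
    residuals = y - (a + b * x)       # y_i - yhat_i
    return np.sum(w * residuals**2)   # SSQ = sum_i w_i * (y_i - yhat_i)^2

# Made-up data: equal weights first, then the last point down-weighted
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.0, 6.2, 7.0])
print(weighted_ssq(x, y, np.ones_like(y), a=0.0, b=2.0))              # unweighted SSQ
print(weighted_ssq(x, y, np.array([1.0, 1.0, 1.0, 0.25]), 0.0, 2.0))  # weighted SSQ
```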

Weighted observations can help to correct situations like the one shown in the figure below. On the left is the (hypothetical) result of an unweighted, linear regression of four data points. The point farthest to the right lies significantly below the other three points and has the effect of lowering the slope of the regression line. On the right is the (hypothetical) result of a weighted, linear regression where the last point is weighted less than the other three points because its error bars are larger, indicating less confidence in its true value. The slope increases to better approximate the behavior of the first three points but still includes the effect of the fourth point, albeit much reduced compared to the unweighted case.
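The behavior in the figure can be reproduced numerically. The sketch below fits a line by minimizing the weighted SSQ in closed form (the normal equations for a weighted straight-line fit); the four data points and the weights are invented for illustration. With equal weights it gives the ordinary least-squares line, and down-weighting the fourth point raises the slope, as described above.

```python
import numpy as np

def weighted_linear_fit(x, y, w):
    """Intercept a and slope b that minimize sum_i w_i * (y_i - a - b*x_i)^2."""
    Sw   = np.sum(w)
    Swx  = np.sum(w * x)
    Swy  = np.sum(w * y)
    Swxx = np.sum(w * x * x)
    Swxy = np.sum(w * x * y)
    b = (Sw * Swxy - Swx * Swy) / (Sw * Swxx - Swx**2)  # slope
    a = (Swy - b * Swx) / Sw                             # intercept
    return a, b

# Hypothetical four-point data set: the last point lies below the trend of the first three
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.0, 6.5])

a_u, b_u = weighted_linear_fit(x, y, np.ones_like(x))                  # unweighted fit
a_w, b_w = weighted_linear_fit(x, y, np.array([1.0, 1.0, 1.0, 0.2]))   # last point down-weighted

print(f"unweighted slope = {b_u:.3f}, weighted slope = {b_w:.3f}")  # weighted slope is larger
```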

There's no universally accepted definition of observation weights, but most expressions are functions of the errors (uncertainties) of the data points. For example, if error is measured only in the y-component, then one way to define weights is as

w_i = \frac{1}{\sigma_{y,i}^2}

where σ is the error in the y-component of the i-th data point.
If error is present in both the x- and y-components, then another option is

w_i = \frac{1}{\left( \sigma_{x,i} / \bar{\sigma}_x \right)^2 + \left( \sigma_{y,i} / \bar{\sigma}_y \right)^2}

where the barred quantities are the means of the x- and y-components of the error taken over all data points. The division by these two quantities is necessary because the errors have different units and cannot simply be added together. For CENG 176 we'll use either of these two forms whenever we need to calculate weights for regression analysis.
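A minimal sketch of the two weight calculations above, assuming the uncertainties are available as NumPy arrays; the array and function names are illustrative:

```python
import numpy as np

def weights_from_y_errors(sigma_y):
    """Weights when only the y-component has error: w_i = 1 / sigma_y_i**2."""
    return 1.0 / sigma_y**2

def weights_from_xy_errors(sigma_x, sigma_y):
    """Weights when both components have error.

    Each error is first divided by the mean error of its own component so that
    the two contributions are dimensionless and can be added together.
    """
    sx = sigma_x / np.mean(sigma_x)
    sy = sigma_y / np.mean(sigma_y)
    return 1.0 / (sx**2 + sy**2)

# Example with made-up uncertainties; the last point is the least certain
sigma_y = np.array([0.1, 0.1, 0.1, 0.5])
sigma_x = np.array([0.02, 0.02, 0.03, 0.08])
print(weights_from_y_errors(sigma_y))
print(weights_from_xy_errors(sigma_x, sigma_y))
```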