Linear Regression

The calibration plot developed previously looks rather like a straight line so it's probably not a bad idea to assume that the regression equation should be a straight line. The following code demonstrates how to perform a linear (POLY1) fit of our calibration data and overlay the predicted fit on the same figure from the Propagation of Error page. Notice that we first calculate the fit weights as described on the Weighted Regression page (eliminating an NaN error along the way by replacing it with a mean value), then we use the fit function to perform the regression procedure.

eY(1) = mean(eY(2:end));

wVal = 1./(ek/mean(ek) + eY/mean(eY));

[fobj, gof] = fit(k, Y, 'poly1', 'Weights', wVal)

fobj =

Linear model Poly1:

fobj(x) = p1*x + p2

Coefficients (with 95% confidence bounds):

p1 = 2.298 (1.969, 2.626)

p2 = -0.0923 (-1.059, 0.8743)

gof = struct with fields:

sse: 0.4745

rsquare: 0.9895

dfe: 4

adjrsquare: 0.9869

rmse: 0.3444

errorbar(k, Y, eY, eY, ek, ek, 'ko')

hold on

plot(fobj, 'r')

hold off

axis square

xlabel('Knob Setting')

ylabel('Discharge Rate (mL/s)')

legend('data', 'fit', 'Location', 'Northwest')

Extracting coefficient data

If all you need are the fit coefficients and their uncertainties then you're mostly done at this point. The fit object FOBJ holds all the information about our calibration curve. Importantly, notice that the confidence intervals are given as a range instead of the +/- notation we use in this class. To get the errors into our standard +/- notation you should note that the range is symmetric, then subtract the lower value from the coefficient value. For example, the uncertainty on p1 is 2.298 - 1.969 = 0.329 such that p1 = 2.298 +/- 0.329 (if we needed to actually report this value then we'd use the rounding rules mentioned earlier such that p1 = 2.3 +/- 0.3, but if we're only using p1 for intermediate calculations then we'd keep all the digits).

If the fit coefficient is related to the quantity of interest by an algebraic relation then you'll have to do some algebra and propagation of error to extract the desired information. For example, suppose p1 in the previous example was related to another coefficient c as

where h is another experimentally measured parameter. If c is the parameter we want to report then we can do some algebra to get

and we can propagate error through this expression to get

Using the calibration curve

There are two common ways of using a calibration curve:

Given y, get x. This is common if you're using the calibration curve to convert values output from a machine to more useful units, such as voltage to temperature or absorbance to calibration.
Set x to get a desired value of y. This is common if you're using the calibration curve as part of a controller setup, like the pump example we've been working with on this page.

In either case we need to use the equation of best fit to estimate the new value, but more importantly we need to provide an estimate of the error on the resulting value. This error is tricky because it has to include two sources of error:

The fitted equation for the calibration curve does not perfectly capture all variation in data.
The newly measured value (the one that we make after we made the calibration curve) has itself some amount of error.

Combining these sources of error can be done in two ways (see NIST for more discussion):

Measure a third reference called a "check standard" and compare the output.
Propagate coefficient errors through the equation of best fit.

The second option is usually straightforward when the calibration equation is simple enough that the propagation of error procedure isn't awful. For example, suppose you set the knob to 2.7 (or your best estimate of 2.7) and you wanted to know what the corresponding flow rate is:

% Knob and coefficient values

k = 2.7;

p1 = fobj.p1;

p2 = fobj.p2;

% Uncertainties

ek = 0.25;

ep = diff(confint(fobj))/2;

ep1 = ep(1);

ep2 = ep(2);

% New flow rate

Y = fobj(2.7); % mL/s

eY = sqrt((k*ek)^2 + (p1*ep1)^2 + ep2^2); % prop error

sprintf('Y = %f +/- %f mL/s', Y, eY)

ans = 'Y = 6.119387 +/- 1.421925 mL/s'

(Notice again that we're keeping all the digits. Presumably we're going to use these values in another calculation somewhere else, but if we had to report them then we'd using the rounding rules to get Y = 6 +/- 1 mL/s.)

You could also reverse the steps if you needed to set the volumetric flow rate to some particular value: you could rearrange the regression equation to solve for the required knob setting (the error on such a flow rate would be determined in the same way).

Unfortunately, the above method doesn't include information about interactions between the fitted coefficients, called the covariance terms. To do so requires computation of additional partial derivatives and extraction of additional uncertainties, all of which are certainly possible but a little tedious. To avoid doing so we use a third type of error estimate, the prediction interval.

Using prediction intervals

A prediction interval is a means of estimating the uncertainty on a future observation (a new measurement) based on information already collected (the calibration curve).

Page updated

Report abuse