Choosing the Best Fit

We've seen many ways to fit a function to a data set, but how can we figure out which one is best? There are three general rules to follow:

Regarding the last point, and indeed any time you need to compare two or more fit results, you have two statistical tools to help: goodness-of-fit parameters and coefficient confidence intervals.

Goodness-of-fit Parameter

For a quick comparison, the goodness-of-fit parameter, also known as R2, is sufficient. More detailed and accurate evaluations can be obtained by ANOVA (specifically by comparing the P-values of the two models), but for ease of use we'll stick with R2 because it almost always varies between 0 and 1, with the following interpretation:

In most chemical engineering experiments that you'll encounter, a value of R2 > 0.95 is fairly good, but there's no strict definition of "fairly good" and the threshold can vary from experiment to experiment. You can ask the FIT function to return the goodness-of-fit parameter by providing an additional output argument:

[fobj, gof] = fit(k', Y', 'poly1')

fobj =

     Linear model Poly1:
     fobj(x) = p1*x + p2
     Coefficients (with 95% confidence bounds):
       p1 =       2.303  (1.974, 2.632)
       p2 =    -0.09937  (-1.095, 0.8967)

gof = struct with fields:

           sse: 0.9828
       rsquare: 0.9895
           dfe: 4
    adjrsquare: 0.9869
          rmse: 0.4957
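
To make the fields of gof more concrete, here is a minimal sketch that compares two candidate fits and reproduces R2 by hand from the residuals. The data vectors k and Y below are hypothetical values chosen for illustration; they are not the data behind the output above, so the numbers will differ. The sketch assumes the Curve Fitting Toolbox is available.

```matlab
% Hypothetical data for illustration only (not the data used in the text)
k = [1 2 3 4 5 6];
Y = [2.1 4.7 6.5 9.4 11.2 13.9];

% Fit a first- and a second-order polynomial, requesting gof for each
[f1, gof1] = fit(k', Y', 'poly1');
[f2, gof2] = fit(k', Y', 'poly2');

% Reproduce R2 for the linear fit by hand: R2 = 1 - SSE/SST
residuals = Y' - f1(k');             % observed minus fitted values
sse = sum(residuals.^2);             % sum of squared errors (gof1.sse)
sst = sum((Y' - mean(Y)).^2);        % total sum of squares about the mean
r2_manual = 1 - sse/sst;

fprintf('poly1: R2 = %.4f (by hand %.4f), adjusted R2 = %.4f\n', ...
        gof1.rsquare, r2_manual, gof1.adjrsquare);
fprintf('poly2: R2 = %.4f, adjusted R2 = %.4f\n', ...
        gof2.rsquare, gof2.adjrsquare);
```

Adding a term to a polynomial can never lower R2, so when the two models have different numbers of coefficients the adjrsquare field, which accounts for the degrees of freedom (dfe), is usually the fairer number to compare.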

Coefficient Confidence Intervals

You should always look at the coefficient confidence intervals for a simple reason:

If the coefficient's confidence interval contains zero then the coefficient should be excluded and you should repeat the fit procedure.
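
The rule above can be checked programmatically rather than by reading the printed bounds. The following is a minimal sketch, again using hypothetical data (k and Y are made-up values, not the data from the text) and assuming the Curve Fitting Toolbox functions confint, coeffnames, and fittype. The refit shown, a straight line through the origin, is just one way to drop an intercept whose interval contains zero.

```matlab
% Hypothetical data for illustration only (not the data used in the text)
k = [1 2 3 4 5 6];
Y = [2.1 4.7 6.5 9.4 11.2 13.9];

[fobj, gof] = fit(k', Y', 'poly1');

% confint returns a 2-by-n matrix: row 1 = lower bounds, row 2 = upper bounds
% (95% confidence level by default), one column per coefficient
ci = confint(fobj);
names = coeffnames(fobj);

% A coefficient is a candidate for exclusion if its interval straddles zero
containsZero = ci(1,:) <= 0 & ci(2,:) >= 0;
for i = 1:numel(names)
    fprintf('%s: (%.4g, %.4g)  contains zero: %d\n', ...
            names{i}, ci(1,i), ci(2,i), containsZero(i));
end

% If the intercept p2 is flagged, repeat the fit without it.
% The linear model below has the single term x, i.e. y = p1*x.
if containsZero(2)
    ft = fittype({'x'}, 'coefficients', {'p1'});
    [fobj2, gof2] = fit(k', Y', ft);
    disp(fobj2)
end
```

After refitting, the confidence interval of the remaining coefficient should be checked again, since removing a term changes the bounds on the others.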

This is how you can decide, for example, between a third-order polynomial and a fourth-order polynomial when both give almost identical goodness-of-fit parameters. The exception to this rule is when you're fitting data to a theoretical form: if the confidence interval of a coefficient in a fit to a theoretical form includes zero, then you have two choices: