Chapter 6

Regression Analysis

Linear regression is an approach for modeling the linear relationship between two variables.

Ordinary Least Squares

The ordinary least squares (OLS) approach to regression estimates the parameters of a linear model. It selects the line that minimizes the sum of the squared errors (SSE) between the observed responses in a dataset and the responses predicted by the model. Explore the OLS method through the four famous datasets of Anscombe's Quartet, which have nearly identical fitted lines even though they look very different when plotted.
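A minimal Python sketch of simple OLS follows. It assumes the closed-form estimates \(\hat{\beta}_1 = s_{xy}/s_{xx}\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\), where \(s_{xy}\) and \(s_{xx}\) are the centered sums of cross-products and squares. For each of the four Anscombe datasets, using the values as they are commonly reproduced, it computes \(n\), \(\bar{x}\), \(\bar{y}\), \(\hat{\beta}_0\), \(\hat{\beta}_1\), and SSE. The helper name `ols_fit` is hypothetical.

```python
from statistics import fmean

# Anscombe's Quartet: datasets I-III share the same x values; dataset IV has its own x.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def ols_fit(x, y):
    """Return (n, x_bar, y_bar, b0, b1, sse) for the simple OLS line y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx          # slope that minimizes the sum of squared errors
    b0 = y_bar - b1 * x_bar   # the fitted line passes through (x_bar, y_bar)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return n, x_bar, y_bar, b0, b1, sse

results = {name: ols_fit(x, y) for name, (x, y) in ANSCOMBE.items()}
for name, (n, x_bar, y_bar, b0, b1, sse) in results.items():
    print(f"{name:>3}: n={n} x_bar={x_bar:.2f} y_bar={y_bar:.2f} "
          f"b0={b0:.2f} b1={b1:.3f} SSE={sse:.2f}")
```

All four datasets should give a fitted line close to \(y = 3.00 + 0.500x\). This is the point of the quartet: identical OLS summaries can hide very different scatter plots.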

[Interactive regression table for the fitted model: \(n\), \(\bar{x}\), \(\bar{y}\), \(\hat{\beta}_{0}\), \(\hat{\beta}_{1}\), \(SSE\)]

Correlation

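Correlation is a measure of the strength and direction of the linear relationship between two variables. For a sample of \(n\) pairs \((x_i, y_i)\), the (Pearson) correlation coefficient is defined as follows and takes values between −1 and +1 inclusive:

$$r = \dfrac{s_{xy}}{\sqrt{s_{xx}}\sqrt{s_{yy}}}$$

where \(s_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\), \(s_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2\), and \(s_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2\).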
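The formula translates directly into code. This sketch assumes \(s_{xy}\), \(s_{xx}\), and \(s_{yy}\) are the centered sums of cross-products and squares. Any common scaling, such as dividing each by \(n-1\), cancels in the ratio. The function name `pearson_r` is hypothetical, and the data is Anscombe's dataset I. The last two calls check the bounds: an exact increasing line gives \(+1\), and an exact decreasing line gives \(-1\).

```python
from math import sqrt
from statistics import fmean

def pearson_r(x, y):
    """Sample correlation r = s_xy / (sqrt(s_xx) * sqrt(s_yy)) from centered sums."""
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / (sqrt(s_xx) * sqrt(s_yy))

# Anscombe's dataset I
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

r = pearson_r(x, y)
r_up = pearson_r(x, [2 * xi + 1 for xi in x])    # exact increasing line
r_down = pearson_r(x, [3 - 2 * xi for xi in x])  # exact decreasing line
print(f"r = {r:.3f}, r_up = {r_up:.3f}, r_down = {r_down:.3f}")
```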

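Geometrically, \(r\) is the cosine of the angle between the centered data vectors \((x_1 - \bar{x}, \ldots, x_n - \bar{x})\) and \((y_1 - \bar{y}, \ldots, y_n - \bar{y})\). It is also tied to the ordinary least squares lines fitted in both variable directions: the slope of the line regressing \(y\) on \(x\), multiplied by the slope of the line regressing \(x\) on \(y\), equals \(r^2\). Explore this concept through Edgar Anderson's famous Iris flower dataset.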
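Both geometric readings can be checked numerically. This sketch centers both variables and takes the cosine of the angle between the resulting vectors. It then fits the OLS slope in each direction and recovers \(r\) from their product. The helper names `center` and `dot` are hypothetical. Anscombe's dataset I stands in for the Iris measurements, which are not reproduced here.

```python
from math import copysign, sqrt
from statistics import fmean

def center(v):
    """Subtract the mean so the vector is centered at 0."""
    m = fmean(v)
    return [vi - m for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
xc, yc = center(x), center(y)

# Cosine of the angle between the centered data vectors in R^n
cos_theta = dot(xc, yc) / (sqrt(dot(xc, xc)) * sqrt(dot(yc, yc)))

# OLS slopes fitted in both directions (y on x, and x on y)
slope_y_on_x = dot(xc, yc) / dot(xc, xc)
slope_x_on_y = dot(xc, yc) / dot(yc, yc)

# The product of the two slopes is r^2; r shares the sign of the slopes
r_from_slopes = copysign(sqrt(slope_y_on_x * slope_x_on_y), slope_y_on_x)
print(f"cos(theta) = {cos_theta:.4f}, r from slopes = {r_from_slopes:.4f}")
```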

[Interactive correlation matrix of the Iris traits: Sepal Length, Sepal Width, Petal Length, Petal Width]

Analysis of Variance

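Analysis of Variance (ANOVA) is a statistical method for testing whether several groups share the same population mean. ANOVA generalizes the two-sample t-test to two or more groups. It compares the variability between the group means with the variability within the groups, each measured by a sum of squares. With exactly two groups, the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic.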
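Below is a minimal sketch of a one-way ANOVA in Python. It assumes independent groups and the standard decomposition \(SS_{\text{total}} = SS_{\text{treatment}} + SS_{\text{error}}\). It fills in the SS, df, MS, and F columns of an ANOVA table. The p-value is the upper tail of an F distribution with \((k-1, n-k)\) degrees of freedom. If SciPy is available, it could be obtained with `scipy.stats.f.sf(F, df_treat, df_error)`; that step is left out to keep the sketch dependency-free. The function names and group data are hypothetical. The last lines check the t-test connection numerically.

```python
from math import sqrt
from statistics import fmean

def one_way_anova(groups):
    """(SS, df, MS) for Treatment and Error, (SS, df) for Total, and the F statistic."""
    all_obs = [v for g in groups for v in g]
    n, k = len(all_obs), len(groups)
    grand_mean = fmean(all_obs)
    means = [fmean(g) for g in groups]
    # Between-group (treatment) and within-group (error) sums of squares
    ss_treat = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ss_total = sum((v - grand_mean) ** 2 for v in all_obs)
    df_treat, df_error = k - 1, n - k
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    return {
        "Treatment": (ss_treat, df_treat, ms_treat),
        "Error": (ss_error, df_error, ms_error),
        "Total": (ss_total, n - 1),
        "F": ms_treat / ms_error,
    }

def pooled_t(a, b):
    """Two-sample t statistic using a pooled variance estimate."""
    ma, mb = fmean(a), fmean(b)
    sp2 = (sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)) / (len(a) + len(b) - 2)
    return (ma - mb) / sqrt(sp2 * (1 / len(a) + 1 / len(b)))

# Hypothetical measurements for three groups (illustrative only)
groups = [[4.1, 5.0, 5.6, 4.8], [6.2, 5.9, 7.1, 6.6], [5.0, 5.4, 4.9, 5.7]]
table = one_way_anova(groups)
for row in ("Treatment", "Error", "Total"):
    print(row, table[row])
print("F =", table["F"])

# With exactly two groups, F equals the square of the pooled t statistic
two = one_way_anova(groups[:2])
t = pooled_t(*groups[:2])
print(two["F"], t ** 2)
```

A large F means the group means differ by more than the within-group scatter alone would suggest.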

[Interactive ANOVA table: columns \(SS\), \(df\), \(MS\), \(F\), \(p\); rows Treatment, Error, Total]