Lesson 2

Introduction to Model Comparison

In our previous lesson, we built a simple model predicting car fuel efficiency from weight. But what if we could improve our predictions by adding more variables? This lesson teaches you how to compare models systematically and decide when adding complexity is worthwhile.

Model selection is one of the most important skills in statistical analysis. Too simple, and you miss important relationships. Too complex, and you overfit to your specific dataset. Today we'll learn principled approaches to finding the right balance.

What You'll Learn

  • Fit and compare multiple competing models
  • Use ANOVA to test if adding variables improves model fit
  • Interpret AIC, BIC, and R-squared for model selection
  • Apply practical guidelines for building better models
  • Understand the trade-off between complexity and overfitting
Before You Begin:
  • Completed the basic statistical modeling tutorial
  • Understanding of R-squared and p-values
  • Same packages: tidyverse, glmmTMB, broom.mixed
We'll continue using the mtcars dataset, but now we'll explore whether adding horsepower and cylinder count improves our weight-based model for predicting fuel efficiency.
Step 1

Building Competing Models

Let's start by fitting several competing models, each with different combinations of predictors. We'll systematically add variables to see which combinations provide the best predictions.

# Load our familiar packages
library(tidyverse)
library(glmmTMB)
library(broom.mixed)

# Load data
data(mtcars)

# Examine correlations between potential predictors
mtcars %>% 
  select(mpg, wt, hp, cyl) %>% 
  cor() %>% 
  round(3)
       mpg     wt     hp    cyl
mpg  1.000 -0.868 -0.776 -0.852
wt  -0.868  1.000  0.659  0.782
hp  -0.776  0.659  1.000  0.832
cyl -0.852  0.782  0.832  1.000

Understanding the Relationships

The correlation matrix reveals important patterns:

  • Strong negative correlations: All predictors negatively correlate with mpg (heavier, more powerful cars get worse mileage)
  • Multicollinearity: Weight, horsepower, and cylinder count are moderately to strongly correlated with one another (r = 0.66-0.83)
  • Selection challenge: We need to balance additional predictive power against redundancy
# Fit competing models systematically
model1 <- glmmTMB(mpg ~ wt, data = mtcars)                    # Baseline: weight only
model2 <- glmmTMB(mpg ~ wt + hp, data = mtcars)               # Add horsepower
model3 <- glmmTMB(mpg ~ wt + hp + cyl, data = mtcars)         # Add cylinders
model4 <- glmmTMB(mpg ~ wt * hp, data = mtcars)               # Weight-horsepower interaction

# Create a summary function for easy comparison
summarize_model <- function(model) {
  model_data <- augment(model)
  # Pseudo-R-squared: 1 - (residual variance / outcome variance)
  r_squared <- 1 - var(model_data$.resid) / var(mtcars$mpg)
  
  data.frame(
    R_squared = round(r_squared, 3),
    AIC = round(AIC(model), 1),
    BIC = round(BIC(model), 1),
    RMSE = round(sqrt(mean(model_data$.resid^2)), 2)
  )
}

# Compare all models
model_comparison <- bind_rows(
  summarize_model(model1),
  summarize_model(model2),
  summarize_model(model3),
  summarize_model(model4)
) %>%
  mutate(Model = c("Weight Only", "Weight + HP", "Weight + HP + Cyl", "Weight × HP")) %>%
  relocate(Model)  # put the model name in the first column

print(model_comparison)
              Model R_squared   AIC   BIC RMSE
1       Weight Only     0.753 166.0 170.4 2.95
2       Weight + HP     0.827 156.7 162.5 2.44
3 Weight + HP + Cyl     0.843 155.5 162.8 2.32
4       Weight × HP     0.885 145.6 152.9 1.98
Each successive model improves R-squared and lowers AIC, suggesting the additional complexity is justified. The interaction model (Weight × HP) performs best, capturing how the effect of weight on fuel efficiency depends on horsepower.
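One caveat before moving on: plain R-squared can only increase as you add terms. A quick safeguard is adjusted R-squared, which discounts the fit by the number of parameters spent. Here is a minimal sketch computed directly from the values in the table above (adj_r2 is a small helper we define here, not a package function):

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1),
# where k is the number of predictor terms (excluding the intercept)
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)
adj_r2(c(0.753, 0.827, 0.843, 0.885), n = 32, k = c(1, 2, 3, 3))
# 0.745 0.815 0.826 0.873 -- the ordering survives the penalty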
Step 2

Statistical Testing with ANOVA

Comparing model fit metrics gives us a sense of which models perform better, but ANOVA provides formal statistical tests of whether adding variables significantly improves model fit.

# Use ANOVA to test nested models
anova_result <- anova(model1, model2, model3)
print(anova_result)
Data: mtcars
Models:
model1: mpg ~ wt, zi=~0, disp=~1
model2: mpg ~ wt + hp, zi=~0, disp=~1
model3: mpg ~ wt + hp + cyl, zi=~0, disp=~1
       Df    AIC    BIC  logLik deviance   Chisq Chi Df Pr(>Chisq)    
model1  3 166.03 170.43 -80.015   160.03                              
model2  4 156.65 162.51 -74.326   148.65 11.3771      1  0.0007436 ***
model3  5 155.48 162.81 -72.738   145.48  3.1757      1  0.0747407 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpreting ANOVA Results

  • Model 1 → Model 2: Adding horsepower significantly improves fit (p = 0.0007)
  • Model 2 → Model 3: Adding cylinders shows marginal improvement (p = 0.075)
  • Chi-square test: Tests whether the reduction in deviance is statistically significant
  • Degrees of freedom: Each additional parameter costs 1 degree of freedom
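To demystify what anova() is doing, we can reproduce the Model 1 → Model 2 test by hand from the deviances in the table above, using nothing beyond base R's pchisq():

# Likelihood-ratio test by hand: the drop in deviance between nested
# models follows a chi-square distribution, with df equal to the
# number of parameters added
lrt_chisq <- 160.03 - 148.65   # deviance(model1) - deviance(model2)
lrt_df <- 4 - 3                # one extra parameter (hp)
pchisq(lrt_chisq, df = lrt_df, lower.tail = FALSE)
# 0.000743 -- matches the Pr(>Chisq) column for model2
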
# Test the interaction model against its nested baseline
# (model2 is nested within model4, so this comparison is valid;
#  model3 and model4 are NOT nested in each other, so they can't
#  be compared with a likelihood-ratio test)
anova_interaction <- anova(model2, model4)
print(anova_interaction)
Data: mtcars
Models:
model2: mpg ~ wt + hp, zi=~0, disp=~1
model4: mpg ~ wt * hp, zi=~0, disp=~1
       Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)   
model2  4 156.65 162.51 -74.326   148.65                           
model4  5 145.61 152.94 -67.805   135.61 13.044      1   0.000304 ***
The interaction term is highly significant (p = 0.0003), indicating that the effect of weight on fuel efficiency really does depend on horsepower. Weight affects fuel efficiency differently for high- and low-horsepower cars; the coefficients in Step 4 will tell us in which direction.
Interpretation Challenge: Why might the weight-horsepower interaction be significant? Think about different types of cars: a heavy sports car vs. a heavy truck. How might weight affect fuel efficiency differently for each?
Step 3

Understanding Model Fit Metrics

Different metrics tell us different things about model quality. Let's understand what each one means and when to use them for model selection.

R-squared: Proportion of Variance Explained

# Visualize R-squared improvement
r_squared_data <- data.frame(
  Model = factor(c("Weight Only", "Weight + HP", "Weight + HP + Cyl", "Weight × HP"),
                 levels = c("Weight Only", "Weight + HP", "Weight + HP + Cyl", "Weight × HP")),
  R_squared = c(0.753, 0.827, 0.843, 0.885),
  Variables = c(1, 2, 3, 3)  # the interaction model has 3 predictor terms (wt, hp, wt:hp)
)

ggplot(r_squared_data, aes(x = Variables, y = R_squared)) +
  geom_line(color = "#7c3aed", linewidth = 1.2) +
  geom_point(color = "#7c3aed", size = 3) +
  geom_text(aes(label = Model), vjust = -0.5, size = 3.5) +
  scale_y_continuous(limits = c(0.7, 0.9), labels = scales::percent_format()) +
  labs(
    title = "Model Performance: R-squared vs Complexity",
    subtitle = "Each additional variable explains more variance",
    x = "Number of Predictor Variables",
    y = "R-squared (Variance Explained)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray60")
  )

AIC and BIC: Balancing Fit and Complexity

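It helps to see where these numbers come from. Both criteria start from the log-likelihood and add a complexity penalty; here is a minimal sketch using model1's values from the ANOVA table (logLik = -80.015, k = 3 estimated parameters, n = 32):

# AIC = -2*logLik + 2*k;  BIC = -2*logLik + k*log(n)
# k counts all estimated parameters: intercept, wt slope, residual variance
logLik_m1 <- -80.015
k <- 3
n <- 32
-2 * logLik_m1 + 2 * k        # AIC: 166.03
-2 * logLik_m1 + k * log(n)   # BIC: 170.43

Because BIC's per-parameter penalty is log(n) ≈ 3.47 here (versus AIC's flat 2), it punishes added complexity harder as samples grow.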
# Compare information criteria
ic_data <- model_comparison %>%
  select(Model, AIC, BIC) %>%
  pivot_longer(cols = c(AIC, BIC), names_to = "Criterion", values_to = "Value") %>%
  mutate(Model = factor(Model, levels = c("Weight Only", "Weight + HP", 
                                         "Weight + HP + Cyl", "Weight × HP")))

ggplot(ic_data, aes(x = Model, y = Value, fill = Criterion)) +
  geom_col(position = "dodge", alpha = 0.8) +
  scale_fill_manual(values = c("AIC" = "#2563eb", "BIC" = "#dc2626")) +
  labs(
    title = "Information Criteria Comparison",
    subtitle = "Lower values indicate better models (accounting for complexity)",
    x = "Model",
    y = "Information Criterion Value"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray60")
  )

Key Differences Between Metrics

Metric    | What it measures                                   | Good for                                  | Limitation
----------|----------------------------------------------------|-------------------------------------------|-----------
R-squared | Proportion of variance explained                   | Understanding explanatory power           | Always increases with more variables
AIC       | Fit penalized by number of parameters              | Model selection with a prediction focus   | Can overfit with small samples
BIC       | Fit with a stronger complexity penalty             | Finding the "true" model, larger samples  | May be too conservative
RMSE      | Average prediction error (in the outcome's units)  | Practical prediction accuracy             | No penalty for added complexity
Notice that AIC and BIC both favor the interaction model: even BIC, with its stronger complexity penalty, ranks it best. This suggests the interaction is robust and worth the added complexity.
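A convenient way to read AIC tables is through differences rather than raw values. A common rule of thumb (from Burnham and Anderson's model-selection guidelines) is that a model more than about 2 AIC units above the best has noticeably less support, and more than about 10 units essentially none:

# Delta-AIC: each model's distance from the best (lowest-AIC) model
aic_values <- c(AIC(model1), AIC(model2), AIC(model3), AIC(model4))
aic_values - min(aic_values)
# 20.4 11.0  9.9  0.0 -- only the interaction model retains meaningful support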
Step 4

Model Selection Strategy

With multiple competing models, how do we choose? The best approach depends on your goals: prediction accuracy, scientific understanding, or model simplicity.

The Final Model: Weight × Horsepower Interaction

# Examine our best model in detail
best_model <- model4  # Weight × HP interaction
model_summary <- tidy(best_model)
print(model_summary)

# Get model fit statistics
fit_stats <- glance(best_model)
print(fit_stats)
# Model coefficients
# A tibble: 4 × 7
  effect component term        estimate std.error statistic  p.value
  <chr>  <chr>     <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 fixed  cond      (Intercept)  49.8       3.31       15.0  4.20e-16
2 fixed  cond      wt           -8.22      1.60       -5.13 2.48e- 5
3 fixed  cond      hp           -0.120     0.0205     -5.87 4.30e- 6
4 fixed  cond      wt:hp         0.0279    0.00774     3.61 1.15e- 3

# Model fit statistics
   nobs sigma logLik   AIC   BIC deviance df.residual
1    32  1.98  -67.8  145.6 152.9     124.        28

Interpreting the Interaction Model

The final model equation is: mpg = 49.8 - 8.22×wt - 0.120×hp + 0.0279×wt×hp

  • Weight main effect (-8.22): For low-horsepower cars, each 1000 lbs reduces mpg by ~8.2
  • Horsepower main effect (-0.120): For light cars, each additional hp reduces mpg by ~0.12
  • Interaction term (+0.0279): The weight penalty is less severe for high-horsepower cars
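A quick way to read these numbers together is to plug the coefficients into the partial derivative of the equation with respect to weight, which gives the implied weight slope at any horsepower level:

# Implied effect of +1000 lbs on mpg at a given horsepower:
# d(mpg)/d(wt) = -8.22 + 0.0279 * hp
weight_slope <- function(hp) -8.22 + 0.0279 * hp
weight_slope(c(100, 200, 300))
# -5.43 -2.64  0.15

The near-zero slope at 300 hp sits at the sparse upper edge of the data, so read it as a caution against extrapolating the interaction rather than as evidence that weight stops mattering.
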
# Visualize the interaction effect
model_data <- augment(best_model)

# Create predictions for visualization
prediction_grid <- expand_grid(
  wt = seq(1.5, 5.5, 0.1),
  hp = c(100, 200, 300)  # Low, medium, high horsepower
)

prediction_grid$predicted_mpg <- predict(best_model, newdata = prediction_grid)

# Plot the interaction
ggplot(prediction_grid, aes(x = wt, y = predicted_mpg, color = factor(hp))) +
  geom_line(linewidth = 1.2) +
  geom_point(data = model_data, aes(x = wt, y = mpg), 
             color = "black", alpha = 0.6, inherit.aes = FALSE) +
  scale_color_manual(values = c("100" = "#22c55e", "200" = "#eab308", "300" = "#ef4444"),
                     name = "Horsepower") +
  labs(
    title = "Weight-Horsepower Interaction Effects on Fuel Efficiency",
    subtitle = "The weight penalty varies by horsepower level",
    x = "Weight (1000 lbs)",
    y = "Predicted MPG"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray60"),
    legend.position = "bottom"
  )
The interaction reveals that weight matters less for high-horsepower cars. This makes intuitive sense: if you're already burning fuel for a powerful engine, the extra weight penalty is relatively smaller than for an efficient, low-power engine.
Step 5

Practical Model Selection Guidelines

When to Add Variables

Decision Framework:
  1. Theoretical justification: Does the variable make scientific sense?
  2. Statistical significance: Does ANOVA show significant improvement? (p < 0.05)
  3. Practical significance: Is the R-squared improvement meaningful?
  4. Sample size: Do you have at least 10-20 observations per parameter?

Model Selection Checklist

Check                 | Our Example                                          | Result
----------------------|------------------------------------------------------|--------
Theoretical sense     | Weight and horsepower should affect fuel efficiency  | ✅ Pass
ANOVA significance    | p < 0.001 for interaction                            | ✅ Pass
R-squared improvement | 0.753 → 0.885 (+13.2 points)                         | ✅ Pass
Sample size           | 32 observations, 4 parameters (8:1 ratio)            | ✅ Pass
Residual patterns     | Random scatter in diagnostics                        | ✅ Pass

Common Pitfalls to Avoid

  • Stepwise selection: Automated variable selection often produces unstable results
  • P-hacking: Testing many variables increases the chance of false positives
  • Overfitting: Complex models may not generalize to new data
  • Ignoring multicollinearity: Highly correlated predictors can cause unstable estimates
  • Model complexity without justification: Simpler models are often more interpretable and robust
# Final model validation: check for multicollinearity
# The standard tool is the VIF (Variance Inflation Factor) via car::vif(),
# but that function works on standard lm()/glm() fits and isn't set up
# for glmmTMB objects, so we inspect the predictor correlations directly

# For interaction models, centering the predictors first reduces the
# artificial correlation between the main effects and the interaction term
mtcars_centered <- mtcars %>%
  mutate(
    wt_c = wt - mean(wt),
    hp_c = hp - mean(hp)
  )

# Refit the interaction model on the centered predictors
validation_model <- glmmTMB(mpg ~ wt_c * hp_c, data = mtcars_centered)

# High correlations (> 0.7) between predictors suggest multicollinearity

correlation_check <- mtcars %>%
  select(wt, hp) %>%
  cor()

print("Correlation between predictors:")
print(round(correlation_check, 3))
Correlation between predictors:
     wt    hp
wt 1.000 0.659
hp 0.659 1.000
The moderate correlation (0.659) between weight and horsepower is acceptable for our model. Values above 0.8-0.9 would suggest problematic multicollinearity that could make coefficient estimates unstable.
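With only two predictors, the VIF itself can be computed by hand from this correlation, since VIF = 1 / (1 - R²) and the R² between two variables is just their squared correlation:

# Manual VIF for a two-predictor model
r <- 0.659            # cor(wt, hp) from the output above
1 / (1 - r^2)
# 1.77 -- far below the levels (roughly 5-10) that usually signal trouble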
Step 6

Summary and Next Steps

What You've Accomplished

✅ Completed Learning Objectives

  • Built and compared multiple competing models systematically
  • Used ANOVA to test statistical significance of model improvements
  • Interpreted AIC, BIC, and R-squared for model selection decisions
  • Discovered and interpreted a meaningful interaction effect
  • Applied practical guidelines for responsible model building

Key Insights from Our Analysis

  • Systematic approach works: Building models incrementally helps identify which variables add value
  • Interactions matter: The relationship between weight and fuel efficiency depends on horsepower
  • Multiple metrics tell different stories: Use R-squared, AIC, BIC, and ANOVA together for robust decisions
  • Theory guides statistics: Understanding the domain helps interpret statistical results
Practice Exercises:
  1. Try the same analysis with the diamonds dataset (price ~ carat + cut + color)
  2. Explore polynomial terms: does mpg ~ wt + I(wt^2) improve fit?
  3. Test three-way interactions: mpg ~ wt * hp * cyl (watch for overfitting!)
  4. Apply these techniques to your own research data

Advanced Topics to Explore Next

  • Cross-validation: Testing model performance on holdout data
  • Regularization: LASSO and Ridge regression for automatic variable selection
  • Non-linear models: GAMs (Generalized Additive Models) for flexible relationships
  • Mixed-effects models: Handling hierarchical or repeated-measures data
  • Model diagnostics: Advanced residual analysis and assumption checking
Building Your Statistical Toolkit: You now have the core skills for responsible model building: systematic comparison, formal testing, and practical guidelines. These principles apply whether you're working with simple regression or advanced machine learning methods.