Lesson 2

Introduction to Model Comparison

In our previous lesson, we built a simple model predicting car fuel efficiency from weight. But what if we could improve our predictions by adding more variables? This lesson teaches you how to compare models systematically and decide when adding complexity is worthwhile.

Model selection is one of the most important skills in statistical analysis. Too simple, and you miss important relationships. Too complex, and you overfit to your specific dataset. Today we'll learn principled approaches to finding the right balance.

What You'll Learn

  • Fit and compare multiple competing models
  • Use ANOVA to test if adding variables improves model fit
  • Interpret AIC, BIC, and R-squared for model selection
  • Apply practical guidelines for building better models
  • Understand the trade-off between complexity and overfitting
Before You Begin:
  • Completed the basic statistical modeling tutorial
  • Understanding of R-squared and p-values
  • Same packages: tidyverse, glmmTMB, broom.mixed
We'll continue using the mtcars dataset, but now we'll explore whether adding horsepower and cylinder count improves our weight-based model for predicting fuel efficiency.
Step 1

Building Competing Models

Let's start by fitting several competing models, each with different combinations of predictors. We'll systematically add variables to see which combinations provide the best predictions.

# Load our familiar packages
library(tidyverse)
library(glmmTMB)
library(broom.mixed)

# Load data
data(mtcars)

# Examine correlations between potential predictors
mtcars %>% 
  select(mpg, wt, hp, cyl) %>% 
  cor() %>% 
  round(3)
       mpg     wt     hp    cyl
mpg  1.000 -0.868 -0.776 -0.852
wt  -0.868  1.000  0.659  0.782
hp  -0.776  0.659  1.000  0.832
cyl -0.852  0.782  0.832  1.000

Understanding the Relationships

The correlation matrix reveals important patterns:

  • Strong negative correlations: All predictors negatively correlate with mpg (heavier, more powerful cars get worse mileage)
  • Multicollinearity: Weight, horsepower, and cylinder count are moderately to strongly correlated with one another (r = 0.66-0.83)
  • Selection challenge: We need to balance additional predictive power against redundancy
# Fit competing models systematically
model1 <- glmmTMB(mpg ~ wt, data = mtcars)                    # Baseline: weight only
model2 <- glmmTMB(mpg ~ wt + hp, data = mtcars)               # Add horsepower
model3 <- glmmTMB(mpg ~ wt + hp + cyl, data = mtcars)         # Add cylinders
model4 <- glmmTMB(mpg ~ wt * hp, data = mtcars)               # Weight-horsepower interaction

# Create a summary function for easy comparison
summarize_model <- function(model) {
  model_data <- augment(model)
  # Pseudo-R-squared: 1 - (residual variance / outcome variance)
  r_squared <- 1 - var(model_data$.resid) / var(mtcars$mpg)
  
  data.frame(
    R_squared = round(r_squared, 3),
    AIC = round(AIC(model), 1),
    BIC = round(BIC(model), 1),
    RMSE = round(sqrt(mean(model_data$.resid^2)), 2)
  )
}

# Compare all models
model_comparison <- bind_rows(
  summarize_model(model1),
  summarize_model(model2),
  summarize_model(model3),
  summarize_model(model4)
) %>%
  mutate(Model = c("Weight Only", "Weight + HP", "Weight + HP + Cyl", "Weight × HP")) %>%
  relocate(Model)  # put the model name in the first column

print(model_comparison)
              Model R_squared   AIC   BIC RMSE
1       Weight Only     0.753 166.0 170.4 2.95
2       Weight + HP     0.827 156.7 162.5 2.44
3 Weight + HP + Cyl     0.843 155.5 162.8 2.32
4       Weight × HP     0.885 145.6 152.9 1.98
Each successive model improves R-squared and lowers AIC, suggesting the additional complexity is justified. The interaction model (Weight × HP) performs best, capturing how the effect of weight on fuel efficiency depends on horsepower.
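One caveat before moving on: plain R-squared can only increase as you add terms. A quick safeguard is adjusted R-squared, which discounts the fit by the number of parameters spent. Here is a minimal sketch computed directly from the values in the table above (adj_r2 is a small helper we define here, not a package function):

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1),
# where k is the number of predictor terms (excluding the intercept)
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)
adj_r2(c(0.753, 0.827, 0.843, 0.885), n = 32, k = c(1, 2, 3, 3))
# 0.745 0.815 0.826 0.873 -- the ordering survives the penalty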
Step 2

Statistical Testing with ANOVA

Comparing model fit metrics gives us a sense of which models perform better, but ANOVA provides formal statistical tests of whether adding variables significantly improves model fit.

# Use ANOVA to test nested models
anova_result <- anova(model1, model2, model3)
print(anova_result)
Data: mtcars
Models:
model1: mpg ~ wt, zi=~0, disp=~1
model2: mpg ~ wt + hp, zi=~0, disp=~1
model3: mpg ~ wt + hp + cyl, zi=~0, disp=~1
       Df    AIC    BIC  logLik deviance   Chisq Chi Df Pr(>Chisq)    
model1  3 166.03 170.43 -80.015   160.03                              
model2  4 156.65 162.51 -74.326   148.65 11.3771      1  0.0007436 ***
model3  5 155.48 162.81 -72.738   145.48  3.1757      1  0.0747407 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpreting ANOVA Results

  • Model 1 → Model 2: Adding horsepower significantly improves fit (p = 0.0007)
  • Model 2 → Model 3: Adding cylinders shows marginal improvement (p = 0.075)
  • Chi-square test: Tests whether the reduction in deviance is statistically significant
  • Degrees of freedom: Each additional parameter costs 1 degree of freedom
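To demystify what anova() is doing, we can reproduce the Model 1 → Model 2 test by hand from the deviances in the table above, using nothing beyond base R's pchisq():

# Likelihood-ratio test by hand: the drop in deviance between nested
# models follows a chi-square distribution, with df equal to the
# number of parameters added
lrt_chisq <- 160.03 - 148.65   # deviance(model1) - deviance(model2)
lrt_df <- 4 - 3                # one extra parameter (hp)
pchisq(lrt_chisq, df = lrt_df, lower.tail = FALSE)
# 0.000743 -- matches the Pr(>Chisq) column for model2
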
# Test the interaction model against its nested baseline
# (model2 is nested within model4, so this comparison is valid;
#  model3 and model4 are NOT nested in each other, so they can't
#  be compared with a likelihood-ratio test)
anova_interaction <- anova(model2, model4)
print(anova_interaction)
Data: mtcars
Models:
model2: mpg ~ wt + hp, zi=~0, disp=~1
model4: mpg ~ wt * hp, zi=~0, disp=~1
       Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)   
model2  4 156.65 162.51 -74.326   148.65                           
model4  5 145.61 152.94 -67.805   135.61 13.044      1   0.000304 ***
The interaction term is highly significant (p = 0.0003), indicating that the effect of weight on fuel efficiency really does depend on horsepower. Weight affects fuel efficiency differently for high- and low-horsepower cars; the coefficients in Step 4 will tell us in which direction.
Interpretation Challenge: Why might the weight-horsepower interaction be significant? Think about different types of cars: a heavy sports car vs. a heavy truck. How might weight affect fuel efficiency differently for each?
Step 3

Understanding Model Fit Metrics

Different metrics tell us different things about model quality. Let's understand what each one means and when to use them for model selection.

R-squared: Proportion of Variance Explained

# Visualize R-squared improvement
r_squared_data <- data.frame(
  Model = factor(c("Weight Only", "Weight + HP", "Weight + HP + Cyl", "Weight × HP"),
                 levels = c("Weight Only", "Weight + HP", "Weight + HP + Cyl", "Weight × HP")),
  R_squared = c(0.753, 0.827, 0.843, 0.885),
  Variables = c(1, 2, 3, 3)  # the interaction model has 3 predictor terms (wt, hp, wt:hp)
)

ggplot(r_squared_data, aes(x = Variables, y = R_squared)) +
  geom_line(color = "#7c3aed", linewidth = 1.2) +
  geom_point(color = "#7c3aed", size = 3) +
  geom_text(aes(label = Model), vjust = -0.5, size = 3.5) +
  scale_y_continuous(limits = c(0.7, 0.9), labels = scales::percent_format()) +
  labs(
    title = "Model Performance: R-squared vs Complexity",
    subtitle = "Each additional variable explains more variance",
    x = "Number of Predictor Variables",
    y = "R-squared (Variance Explained)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray60")
  )

AIC and BIC: Balancing Fit and Complexity

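It helps to see where these numbers come from. Both criteria start from the log-likelihood and add a complexity penalty; here is a minimal sketch using model1's values from the ANOVA table (logLik = -80.015, k = 3 estimated parameters, n = 32):

# AIC = -2*logLik + 2*k;  BIC = -2*logLik + k*log(n)
# k counts all estimated parameters: intercept, wt slope, residual variance
logLik_m1 <- -80.015
k <- 3
n <- 32
-2 * logLik_m1 + 2 * k        # AIC: 166.03
-2 * logLik_m1 + k * log(n)   # BIC: 170.43

Because BIC's per-parameter penalty is log(n) ≈ 3.47 here (versus AIC's flat 2), it punishes added complexity harder as samples grow.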
# Compare information criteria
ic_data <- model_comparison %>%
  select(Model, AIC, BIC) %>%
  pivot_longer(cols = c(AIC, BIC), names_to = "Criterion", values_to = "Value") %>%
  mutate(Model = factor(Model, levels = c("Weight Only", "Weight + HP", 
                                         "Weight + HP + Cyl", "Weight × HP")))

ggplot(ic_data, aes(x = Model, y = Value, fill = Criterion)) +
  geom_col(position = "dodge", alpha = 0.8) +
  scale_fill_manual(values = c("AIC" = "#2563eb", "BIC" = "#dc2626")) +
  labs(
    title = "Information Criteria Comparison",
    subtitle = "Lower values indicate better models (accounting for complexity)",
    x = "Model",
    y = "Information Criterion Value"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray60")
  )

Key Differences Between Metrics

Metric    | What it measures                                   | Good for                                  | Limitation
----------|----------------------------------------------------|-------------------------------------------|-----------
R-squared | Proportion of variance explained                   | Understanding explanatory power           | Always increases with more variables
AIC       | Fit penalized by number of parameters              | Model selection with a prediction focus   | Can overfit with small samples
BIC       | Fit with a stronger complexity penalty             | Finding the "true" model, larger samples  | May be too conservative
RMSE      | Average prediction error (in the outcome's units)  | Practical prediction accuracy             | No penalty for added complexity
Notice that AIC and BIC both favor the interaction model: even BIC, with its stronger complexity penalty, ranks it best. This suggests the interaction is robust and worth the added complexity.
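A convenient way to read AIC tables is through differences rather than raw values. A common rule of thumb (from Burnham and Anderson's model-selection guidelines) is that a model more than about 2 AIC units above the best has noticeably less support, and more than about 10 units essentially none:

# Delta-AIC: each model's distance from the best (lowest-AIC) model
aic_values <- c(AIC(model1), AIC(model2), AIC(model3), AIC(model4))
aic_values - min(aic_values)
# 20.4 11.0  9.9  0.0 -- only the interaction model retains meaningful support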
Step 4

Model Selection Strategy

With multiple competing models, how do we choose? The best approach depends on your goals: prediction accuracy, scientific understanding, or model simplicity.

The Final Model: Weight × Horsepower Interaction

# Examine our best model in detail
best_model <- model4  # Weight × HP interaction
model_summary <- tidy(best_model)
print(model_summary)

# Get model fit statistics
fit_stats <- glance(best_model)
print(fit_stats)
# Model coefficients
# A tibble: 4 × 7
  effect component term        estimate std.error statistic  p.value
  <chr>  <chr>     <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 fixed  cond      (Intercept)  49.8       3.31       15.0  4.20e-16
2 fixed  cond      wt           -8.22      1.60       -5.13 2.48e- 5
3 fixed  cond      hp           -0.120     0.0205     -5.87 4.30e- 6
4 fixed  cond      wt:hp         0.0279    0.00774     3.61 1.15e- 3

# Model fit statistics
   nobs sigma logLik   AIC   BIC deviance df.residual
1    32  1.98  -67.8  145.6 152.9     124.        28

Interpreting the Interaction Model

The final model equation is: mpg = 49.8 - 8.22×wt - 0.120×hp + 0.0279×wt×hp

  • Weight main effect (-8.22): For low-horsepower cars, each 1000 lbs reduces mpg by ~8.2
  • Horsepower main effect (-0.120): For light cars, each additional hp reduces mpg by ~0.12
  • Interaction term (+0.0279): The weight penalty is less severe for high-horsepower cars
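A quick way to read these numbers together is to plug the coefficients into the partial derivative of the equation with respect to weight, which gives the implied weight slope at any horsepower level:

# Implied effect of +1000 lbs on mpg at a given horsepower:
# d(mpg)/d(wt) = -8.22 + 0.0279 * hp
weight_slope <- function(hp) -8.22 + 0.0279 * hp
weight_slope(c(100, 200, 300))
# -5.43 -2.64  0.15

The near-zero slope at 300 hp sits at the sparse upper edge of the data, so read it as a caution against extrapolating the interaction rather than as evidence that weight stops mattering.
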
# Visualize the interaction effect
model_data <- augment(best_model)

# Create predictions for visualization
prediction_grid <- expand_grid(
  wt = seq(1.5, 5.5, 0.1),
  hp = c(100, 200, 300)  # Low, medium, high horsepower
)

prediction_grid$predicted_mpg <- predict(best_model, newdata = prediction_grid)

# Plot the interaction
ggplot(prediction_grid, aes(x = wt, y = predicted_mpg, color = factor(hp))) +
  geom_line(linewidth = 1.2) +
  geom_point(data = model_data, aes(x = wt, y = mpg), 
             color = "black", alpha = 0.6, inherit.aes = FALSE) +
  scale_color_manual(values = c("100" = "#22c55e", "200" = "#eab308", "300" = "#ef4444"),
                     name = "Horsepower") +
  labs(
    title = "Weight-Horsepower Interaction Effects on Fuel Efficiency",
    subtitle = "The weight penalty varies by horsepower level",
    x = "Weight (1000 lbs)",
    y = "Predicted MPG"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray60"),
    legend.position = "bottom"
  )
The interaction reveals that weight matters less for high-horsepower cars. This makes intuitive sense: if you're already burning fuel for a powerful engine, the extra weight penalty is relatively smaller than for an efficient, low-power engine.
Step 5

Practical Model Selection Guidelines

When to Add Variables

Decision Framework:
  1. Theoretical justification: Does the variable make scientific sense?
  2. Statistical significance: Does ANOVA show significant improvement? (p < 0.05)
  3. Practical significance: Is the R-squared improvement meaningful?
  4. Sample size: Do you have at least 10-20 observations per parameter?

Model Selection Checklist

Check                 | Our Example                                          | Result
----------------------|------------------------------------------------------|--------
Theoretical sense     | Weight and horsepower should affect fuel efficiency  | ✅ Pass
ANOVA significance    | p < 0.001 for interaction                            | ✅ Pass
R-squared improvement | 0.753 → 0.885 (+13.2 points)                         | ✅ Pass
Sample size           | 32 observations, 4 parameters (8:1 ratio)            | ✅ Pass
Residual patterns     | Random scatter in diagnostics                        | ✅ Pass

Common Pitfalls to Avoid

  • Stepwise selection: Automated variable selection often produces unstable results
  • P-hacking: Testing many variables increases the chance of false positives
  • Overfitting: Complex models may not generalize to new data
  • Ignoring multicollinearity: Highly correlated predictors can cause unstable estimates
  • Model complexity without justification: Simpler models are often more interpretable and robust
# Final model validation: check for multicollinearity
# The standard tool is the VIF (Variance Inflation Factor) via car::vif(),
# but that function works on standard lm()/glm() fits and isn't set up
# for glmmTMB objects, so we inspect the predictor correlations directly

# For interaction models, centering the predictors first reduces the
# artificial correlation between the main effects and the interaction term
mtcars_centered <- mtcars %>%
  mutate(
    wt_c = wt - mean(wt),
    hp_c = hp - mean(hp)
  )

# Refit the interaction model on the centered predictors
validation_model <- glmmTMB(mpg ~ wt_c * hp_c, data = mtcars_centered)

# High correlations (> 0.7) between predictors suggest multicollinearity

correlation_check <- mtcars %>%
  select(wt, hp) %>%
  cor()

print("Correlation between predictors:")
print(round(correlation_check, 3))
Correlation between predictors:
     wt    hp
wt 1.000 0.659
hp 0.659 1.000
The moderate correlation (0.659) between weight and horsepower is acceptable for our model. Values above 0.8-0.9 would suggest problematic multicollinearity that could make coefficient estimates unstable.
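With only two predictors, the VIF itself can be computed by hand from this correlation, since VIF = 1 / (1 - R²) and the R² between two variables is just their squared correlation:

# Manual VIF for a two-predictor model
r <- 0.659            # cor(wt, hp) from the output above
1 / (1 - r^2)
# 1.77 -- far below the levels (roughly 5-10) that usually signal trouble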
Step 6

Summary and Next Steps

What You've Accomplished

✅ Completed Learning Objectives

  • Built and compared multiple competing models systematically
  • Used ANOVA to test statistical significance of model improvements
  • Interpreted AIC, BIC, and R-squared for model selection decisions
  • Discovered and interpreted a meaningful interaction effect
  • Applied practical guidelines for responsible model building

Key Insights from Our Analysis

  • Systematic approach works: Building models incrementally helps identify which variables add value
  • Interactions matter: The relationship between weight and fuel efficiency depends on horsepower
  • Multiple metrics tell different stories: Use R-squared, AIC, BIC, and ANOVA together for robust decisions
  • Theory guides statistics: Understanding the domain helps interpret statistical results
Practice Exercises:
  1. Try the same analysis with the diamonds dataset (price ~ carat + cut + color)
  2. Explore polynomial terms: does mpg ~ wt + I(wt^2) improve fit?
  3. Test three-way interactions: mpg ~ wt * hp * cyl (watch for overfitting!)
  4. Apply these techniques to your own research data

Advanced Topics to Explore Next

  • Cross-validation: Testing model performance on holdout data
  • Regularization: LASSO and Ridge regression for automatic variable selection
  • Non-linear models: GAMs (Generalized Additive Models) for flexible relationships
  • Mixed-effects models: Handling hierarchical or repeated-measures data
  • Model diagnostics: Advanced residual analysis and assumption checking
Building Your Statistical Toolkit: You now have the core skills for responsible model building: systematic comparison, formal testing, and practical guidelines. These principles apply whether you're working with simple regression or advanced machine learning methods.