预计阅读

贝叶斯计算扩展包 RStan & brms





1 Stan 编码

考虑一个 Stan 编码的线性模型,代码来自 Stan 用户指南,如下:

data {
  int<lower=0> N;   // number of data items
  int<lower=0> K;   // number of predictors
  matrix[N, K] x;   // predictor matrix
  vector[N] y;      // outcome vector
}
parameters {
  real alpha;           // intercept
  vector[K] beta;       // coefficients for predictors
  real<lower=0> sigma;  // error scale
}
model {
  // alpha ~ normal(0, 1);  // prior
  // beta ~ normal(0, 1);   // prior
  y ~ normal(x * beta + alpha, sigma);  // likelihood
}

Stan 模型代码至少需要三块:数据、参数和模型,各个部分尽量采用向量化的编程方式,少用 for 循环。编译 Stan 模型,传递数据进去,采样计算获得结果。

2 Stan 配置

在 R 语言社区,很早就存在一个 R 包 rstan ,它整合了编译代码、运行采样、模型输出等功能。下面先加载 R 包,设置 Stan 运行参数,准备输入数据,拟合模型。

library(StanHeaders)
library(rstan)
# Stan 代码不变就不要重复编译模型
rstan_options(auto_write = TRUE)
# 单核跑模型
options(mc.cores = 1)
# 每条链用一个线程
rstan_options(threads_per_chain = 1)
# 准备数据
cars_d <- list(
  N = nrow(cars), # 观测记录的条数
  K = 1,          # 协变量个数
  x = cars[, "speed", drop = F], # N x 1 矩阵
  y = cars[, "dist"]             # N 向量
)

设置拟合模型的算法的各项参数。

# 拟合模型
fit_rstan_cars <- stan(
  file = "code/cars.stan",   # stan 代码文件路径
  data = cars_d,             # 观测数据
  iter = 3000,               # 每条链迭代次数
  warmup = 1000,             # 初始化阶段迭代次数
  chains = 4,                # MCMC 链条数
  model_name = "cars_model", # 模型名称
  verbose = FALSE,           # 不显示中间输出
  algorithm = "HMC",         # 指定 HMC 算法
  seed = 2022,               # 设置随机数种子
  control = list(adapt_delta = 0.8),
  refresh = 0                # 不显示采样进度
)

3 Stan 输出

fit_rstan_cars 是一个模型输出对象,可以用函数 print() 打印结果。

# 模型结果
print(fit_rstan_cars, probs = c(0.025, 0.5, 0.975), digits_summary = 3)
## Inference for Stan model: cars_model.
## 4 chains, each with iter=3000; warmup=1000; thin=1; 
## post-warmup draws per chain=2000, total post-warmup draws=8000.
## 
##             mean se_mean    sd     2.5%      50%    97.5% n_eff  Rhat
## alpha    -17.606   0.070 6.943  -31.268  -17.640   -3.751  9719 1.000
## beta[1]    3.935   0.004 0.429    3.077    3.933    4.781 10281 1.000
## sigma     15.772   0.025 1.697   12.868   15.618   19.507  4593 1.001
## lp__    -159.481   0.024 1.266 -162.730 -159.170 -158.033  2888 1.000
## 
## Samples were drawn using HMC(diag_e) at Mon Mar 10 10:37:39 2025.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

设置 4 条迭代链,每条链初始化阶段 warmup 迭代 1000 次,总迭代 3000 次,则初始化后阶段迭代 2000 次,4 条链总迭代 8000 次。

参数 alpha 是截距,beta[1] 是斜率,sigma 是标准差,lp__ 后验似然的对数,n_eff 表示采样效率,值越大越好,Rhat 表示马氏链混合效果,越接近 1 越好。

4 分布检查

4.1 参数迭代轨迹

以参数 \(\alpha\) 的迭代轨迹为例,如下图所示:

library(ggplot2)
stan_trace(fit_rstan_cars, pars = "alpha") +
  labs(y = expression(alpha), color = "链条")
参数 $\alpha$ 的迭代轨迹

图 1: 参数 \(\alpha\) 的迭代轨迹

可见,4条链在初始化阶段之后的混合很好,很平稳。参数 \(\alpha\)\(\beta_1\) 的联合分布如下:

stan_scat(fit_rstan_cars,
  pars = c("alpha", "beta[1]"),
  size = 1, color = "lightblue"
) +
  geom_point(
    data = data.frame(alpha = -17.579, beta1 = 3.932),
    aes(x = alpha, y = beta1), color = "orange", size = 2
  ) +
  labs(x = expression(alpha), y = expression(beta[1]))
参数 $\alpha$ 和 $\beta_1$ 的联合分布

图 2: 参数 \(\alpha\)\(\beta_1\) 的联合分布

理论上,参数 \(\alpha\)\(\beta_1\) 的联合分布是二元正态分布,图上所示也是吻合的。

4.2 参数后验分布

根据每条马氏链的迭代值,获得参数 \(\alpha\) 的后验分布,每条链都是接近正态分布。

stan_dens(fit_rstan_cars, pars = "alpha", 
          separate_chains = TRUE, alpha = 0.35) +
  labs(x = expression(alpha), fill = "链条")
参数 $\alpha$ 的核密度估计

图 3: 参数 \(\alpha\) 的核密度估计

分别查看参数 \(\alpha\)\(\beta_1\) 的后验分布。

stan_dens(fit_rstan_cars, pars = "alpha", fill = "lightblue") +
  geom_vline(xintercept = -17.579, color = "orange", lwd = 1) +
  labs(x = expression(alpha))

stan_dens(fit_rstan_cars, pars = "beta[1]", fill = "lightblue") +
  geom_vline(xintercept = 3.932, color = "orange", lwd = 1) +
  labs(x = expression(beta[1]))
参数的后验分布参数的后验分布

图 4: 参数的后验分布

5 brms 包

Paul-Christian Bürkner 开发和维护的brms 包是站在 RStan 和 cmdstanr 包肩膀上扩展包,封装了大量基于 Stan 语言的贝叶斯统计模型,可以拟合相当广泛的模型,线性模型不在话下。

fit_brm_cars <- brms::brm(
  dist ~ speed, # 使用语法与 lme4 包类似
  data = cars, # 数据集
  family = gaussian(), # 响应变量的分布
  backend = "cmdstanr", # 调用 cmdstanr 还是 rstan 包
  algorithm = "sampling", # Stan 采样器
  refresh = 0, # 不显示采样详细过程
  seed = 2022, cores = 1, # 设置随机数种子和不并行
  chains = 4, iter = 2000 # 4条马氏链迭代2000次
)
## Start sampling
## Running MCMC with 4 sequential chains...
## 
## Chain 1 finished in 0.0 seconds.
## Chain 2 finished in 0.0 seconds.
## Chain 3 finished in 0.0 seconds.
## Chain 4 finished in 0.0 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 0.0 seconds.
## Total execution time: 0.7 seconds.

此线性模型是非常简单的,模型拟合时间非常短。拟合模型的输出结果如下:

fit_brm_cars
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: dist ~ speed 
##    Data: cars (Number of observations: 50) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept   -17.57      6.99   -31.59    -3.97 1.00     3667     2596
## speed         3.92      0.43     3.09     4.76 1.00     3792     2995
## 
## Further Distributional Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma    15.71      1.70    12.78    19.40 1.00     3915     2523
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

可见 brms 包与 rstan 包拟合模型的输出结果是非常相近的。各个参数的后验分布如下:

plot(fit_brm_cars)
参数的后验分布和迭代轨迹

图 5: 参数的后验分布和迭代轨迹

在得知迭代过程平稳,后验分布也已获得,接下来,想要知道模型的效果,这有一些评估指标,函数 brms::loo() 基于后验似然做近似交叉留一验证。类似 AIC 和 BIC,针对贝叶斯模型这里有 LOOIC 以及 Pareto k 诊断。

brms::loo(fit_brm_cars)
## 
## Computed from 4000 by 50 log-likelihood matrix.
## 
##          Estimate   SE
## elpd_loo   -210.1  6.5
## p_loo         3.5  1.3
## looic       420.2 13.0
## ------
## MCSE of elpd_loo is 0.1.
## MCSE and ESS estimates assume MCMC draws (r_eff in [0.7, 1.0]).
## 
## All Pareto k estimates are good (k < 0.7).
## See help('pareto-k-diagnostic') for details.

类似调整的 \(R^2\) ,这里也可以计算贝叶斯 \(R^2\)

brms::bayes_R2(fit_brm_cars)
##    Estimate Est.Error  Q2.5  Q97.5
## R2    0.641   0.05341 0.514 0.7163

与频率派结果对照。

m <- lm(dist ~ speed, data = cars)
summary(m)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -29.07  -9.53  -2.27   9.21  43.20 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -17.579      6.758   -2.60    0.012 *  
## speed          3.932      0.416    9.46  1.5e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.4 on 48 degrees of freedom
## Multiple R-squared:  0.651,	Adjusted R-squared:  0.644 
## F-statistic: 89.6 on 1 and 48 DF,  p-value: 1.49e-12