本文假定读者对简单线性模型（Simple Linear Models）已经比较熟悉了，在其他条件不变的情况下，当响应变量不是高斯分布，而是其他分布时，如何拟合模型呢？在 R 语言中，内置的函数 glm() 可以拟合响应变量的分布属于指数族的模型。这类模型单独取了个名字，叫广义线性模型（Generalized Linear Models）。典型的情况有这样一些，响应变量服从泊松分布、伯努利分布、二项分布、指数分布、伽马分布等。这些常见的分布都可以写成一个指数族（Exponential family）的统一形式，如下。

\[ f(y|x;\theta,\phi) = \exp\Big\{\frac{y\theta-b(\theta)}{a(\phi)}+c(y,\theta)\Big\} \]

其中，\(\theta,\phi\) 是未知的参数，\(\{a(\cdot),b(\cdot),c(\cdot)\}\) 是已知的函数，\(x\) 是已知的协变量的观测值。\(\theta\) 是依赖于 \(x\) 的的自然参数（natural parameter，这个参数的形式仅取决于分布的形式），\(\phi\) 是离散参数（dispersion parameter），更多介绍见文末参考文献。规范联系函数（canonical link function）由自然参数决定。

1 泊松分布

响应变量服从泊松分布的广义线性模型（见下面的方程），在环境受到核辐射污染问题上，可用来刻画辐射强度（单位时间内辐射源释放的粒子数），在汽车保险的场景中，可以用来刻画事故发生率（单位时间内事故发生的次数，一段时间内某投保群体发生事故的次数）。

\[ \begin{aligned} & \bm{y} \sim \mathrm{Poisson}(\bm{\lambda}) \\ & \log(\bm{\lambda}) = X\bm{\beta} \end{aligned} \]

其中，\(\bm{y}\) 是响应变量对应的数据向量，\(\bm{\lambda}\) 是建模的对象，是未知的（待估）向量，\(\mathrm{log}\) 表示对数变换，\(\bm{\beta}\) 也是未知的向量，是待估的回归系数，\(X\) 是由样本观察值组成的矩阵，是已知的。

在 R 语言中，泊松分布\(\mathrm{Poisson}(\lambda)\)的概率密度函数（可查看函数dpois()的帮助文档）如下：

\[ p(x) = \frac{\lambda^x\mathrm{e}^{-\lambda}}{x!},\quad x = 0, 1, 2, \ldots \]

泊松分布的期望与方差都是 \(\lambda\) 。

1.1 函数 `glm()` 拟合模型

在 R 语言中，用函数 glm() 来拟合响应变量服从泊松分布的广义线性模型，在参数 family 指定泊松分布及对应的联系函数（Link function）。

set.seed(2026)
x <- rnorm(n = 100)
# 线性预测 linear predictor 设定为 1 + x
# 生成 100 个服从泊松分布的随机数
y <- rpois(n = 100, lambda = exp(1 + x))
# 拟合泊松回归模型
fit1 <- glm(y ~ x, family = poisson(link = "log"))
summary(fit1)

## 
## Call:
## glm(formula = y ~ x, family = poisson(link = "log"))
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.0759     0.0648    16.6   <2e-16 ***
## x             0.9283     0.0509    18.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 452.78  on 99  degrees of freedom
## Residual deviance: 101.52  on 98  degrees of freedom
## AIC: 370.2
## 
## Number of Fisher Scoring iterations: 5

在输出结果中，可以看到回归系数的估计 \(\hat{\bm{\beta}}\) 是比较接近预设的值 \((1,1)^\top\) ，说明这个模拟过程没有什么问题。

1.2 离散参数与函数 `quasipoisson()`

模型输出结果中，包含如下一行内容。

(Dispersion parameter for poisson family taken to be 1)

一个分布的离散程度是用该分布的方差与均值的比值来表示。泊松分布的离散参数 Dispersion parameter 设为 1，因为泊松分布的方差和均值是一样的。

值得注意，这里的离散参数和分布的变异系数 coefficient of variation 是不同的。变异系数是标准差（方差的平方根）与均值的比值（无量纲）。

# 对数似然
logLik(fit1)

## 'log Lik.' -183.1 (df=2)

当参数 family 设定为 quasipoisson(link = "log") ，可以看到稍微不同的结果。

fit2 <- glm(y ~ x, family = quasipoisson(link = "log"))
summary(fit2)

## 
## Call:
## glm(formula = y ~ x, family = quasipoisson(link = "log"))
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.0759     0.0623    17.3   <2e-16 ***
## x             0.9283     0.0490    18.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 0.9257)
## 
##     Null deviance: 452.78  on 99  degrees of freedom
## Residual deviance: 101.52  on 98  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

回归系数是完全一样的，但是其相应的标准误差略有差别，离散参数的估计值是 0.9257 而不是 1 。分布族 family 参数值从 poisson(link = "log") 到 quasipoisson(link = "log") ，区别在于离散参数是当作固定值还是为待估参数。

1.3 函数 `quasi()` 的说明

在分布族的设定中，还可以使用函数 quasi() ，如下。

fit3 <- glm(y ~ x, family = quasi(variance = "mu", link = "log"))
summary(fit3)

## 
## Call:
## glm(formula = y ~ x, family = quasi(variance = "mu", link = "log"))
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.0759     0.0623    17.3   <2e-16 ***
## x             0.9283     0.0490    18.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasi family taken to be 0.9257)
## 
##     Null deviance: 452.78  on 99  degrees of freedom
## Residual deviance: 101.52  on 98  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

函数 quasi() 的参数 variance 对应于分布的方差，link 代表广义线性模型的联系函数。在函数 glm() 的符号表示中，用 mu 表示响应变量的分布的均值，对泊松分布，其方差和期望是一样的。

从输出结果来看，family = quasi(variance = "mu", link = "log") 与 family = quasipoisson(link = "log") 的效果是一样的。

1.4 VGAM 和 glmnet 包

在 R 语言社区中，相比于 R 函数 glm() 内置的分布类型，扩展包 VGAM 支持更加丰富的分布类型。对于泊松分布回归模型，代码如下：

library(VGAM)
fit4 <- vglm(y ~ x, family = poissonff(link = "loglink"))
summary(fit4)

## 
## Call:
## vglm(formula = y ~ x, family = poissonff(link = "loglink"))
## 
## Coefficients: 
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.0759     0.0648    16.6   <2e-16 ***
## x             0.9283     0.0509    18.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Name of linear predictor: loglink(lambda) 
## 
## Residual deviance: 101.5 on 98 degrees of freedom
## 
## Log-likelihood: -183.1 on 98 degrees of freedom
## 
## Number of Fisher scoring iterations: 5

从输出结果来看，和前面在函数 glm() 中设置分布族 family = poisson(link = "log") 时，是一样的。

glmnet 包不仅包含函数 glm() 的功能，还支持变量选择（Lasso），处理共线性问题（岭回归）。

library(glmnet)
# lambda = 0 去掉惩罚项
# 退化为一般的 GLM 
fit5 = glmnet(x = cbind(1, x), y = y, family = "poisson", lambda = 0)
# 回归系数
coef(fit5)

## 3 x 1 sparse Matrix of class "dgCMatrix"
##                 s0
## (Intercept) 1.0759
##             .     
## x           0.9283

# Null deviance
fit5$nulldev

## [1] 452.8

1.5 漂移项 offset

对于响应变量服从泊松分布的情况，在实际应用场景中，出现漂移项是很常见的，本章开篇所举两个实际例子都是带有漂移项。所谓漂移项，它不是模型的参数。

set.seed(2026)
x <- rnorm(n = 100)
# 准备漂移项
m <- rep(c(5, 10), each = 50)
# 线性预测 linear predictor 设定为 1 + x
# 生成 100 个服从泊松分布的随机数
y <- rpois(n = 100, lambda = m * exp(1 + x))
# 拟合带漂移项的泊松回归模型
fit5 <- glm(y ~ x + offset(log(m)), family = poisson(link = "log"))
summary(fit5)

## 
## Call:
## glm(formula = y ~ x + offset(log(m)), family = poisson(link = "log"))
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.0134     0.0246    41.2   <2e-16 ***
## x             0.9892     0.0182    54.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 3207.160  on 99  degrees of freedom
## Residual deviance:   81.836  on 98  degrees of freedom
## AIC: 557.5
## 
## Number of Fisher Scoring iterations: 4

glmnet 包拟合的结果如下：

library(glmnet)
fit6 <- glmnet(
  x = cbind(1, x), y = y, offset = log(m),
  family = "poisson", lambda = 0
)
coef(fit6)

## 3 x 1 sparse Matrix of class "dgCMatrix"
##                 s0
## (Intercept) 1.0134
##             .     
## x           0.9892

1.6 泊松分布还是二项分布？

一个关于车险的带漂移项的真实数据案例见 MASS 包的数据 Insurance 的帮助文档（输入命令 help(Insurance, package="MASS")可查看帮助文档）。

数据集 Insurance 共有 5 个变量，其中， Holders 表示持有车险保单的数量，Claims 表示提出索赔的数量。

# 加载数据
data(Insurance, package = "MASS")
str(Insurance)

## 'data.frame':	64 obs. of  5 variables:
##  $ District: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Group   : Ord.factor w/ 4 levels "<1l"<"1-1.5l"<..: 1 1 1 1 2 2 2 2 3 3 ...
##  $ Age     : Ord.factor w/ 4 levels "<25"<"25-29"<..: 1 2 3 4 1 2 3 4 1 2 ...
##  $ Holders : int  197 264 246 1680 284 536 696 3582 133 286 ...
##  $ Claims  : int  38 35 20 156 63 84 89 400 19 52 ...

根据前面的基于模拟数据的模型介绍，下面基于数据集 Insurance 的代码如下。

# 默认设置
options(contrasts = c(
  unordered = "contr.treatment",
  ordered = "contr.poly"
)) # 模型处理 factor 变量的策略

fit7 = glm(Claims ~ District + Group + Age + offset(log(Holders)),
    data = Insurance, family = poisson(link = "log"))
summary(fit7)

## 
## Call:
## glm(formula = Claims ~ District + Group + Age + offset(log(Holders)), 
##     family = poisson(link = "log"), data = Insurance)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.810508   0.032972  -54.91  < 2e-16 ***
## District2    0.025868   0.043016    0.60  0.54760    
## District3    0.038524   0.050512    0.76  0.44566    
## District4    0.234205   0.061673    3.80  0.00015 ***
## Group.L      0.429708   0.049459    8.69  < 2e-16 ***
## Group.Q      0.004632   0.041988    0.11  0.91215    
## Group.C     -0.029294   0.033069   -0.89  0.37570    
## Age.L       -0.394432   0.049404   -7.98  1.4e-15 ***
## Age.Q       -0.000355   0.048918   -0.01  0.99421    
## Age.C       -0.016737   0.048478   -0.35  0.72991    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 236.26  on 63  degrees of freedom
## Residual deviance:  51.42  on 54  degrees of freedom
## AIC: 388.7
## 
## Number of Fisher Scoring iterations: 4

根据问题的背景，可不可以理解为持有车险保单的人当中提出索赔的人的比例？从模型的角度理解为索赔率。因此，新的问题是响应变量假定服从带漂移项的泊松分布还是二项分布？

# 响应变量是带漂移项的泊松分布与二项分布的区别
fit8 = glm(cbind(Claims, Holders-Claims) ~ District + Group + Age,
           data = Insurance, family = binomial(link = "logit"))
summary(fit8)

## 
## Call:
## glm(formula = cbind(Claims, Holders - Claims) ~ District + Group + 
##     Age, family = binomial(link = "logit"), data = Insurance)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.62302    0.03621  -44.82  < 2e-16 ***
## District2    0.03007    0.04640    0.65     0.52    
## District3    0.04488    0.05445    0.82     0.41    
## District4    0.27698    0.06753    4.10  4.1e-05 ***
## Group.L      0.50558    0.05417    9.33  < 2e-16 ***
## Group.Q      0.01158    0.04591    0.25     0.80    
## Group.C     -0.03392    0.03600   -0.94     0.35    
## Age.L       -0.47103    0.05513   -8.54  < 2e-16 ***
## Age.Q        0.00514    0.05415    0.09     0.92    
## Age.C       -0.01966    0.05325   -0.37     0.71    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 279.406  on 63  degrees of freedom
## Residual deviance:  63.088  on 54  degrees of freedom
## AIC: 387.4
## 
## Number of Fisher Scoring iterations: 4

对数似然越大越好，AIC 越小越好，上述两个模型的比较。

logLik(fit7)

## 'log Lik.' -184.4 (df=10)

logLik(fit8)

## 'log Lik.' -183.7 (df=10)

无论从 AIC 还是对数似然来看，将响应变量视为服从二项分布更好。

2 伯努利分布

响应变量服从伯努利分布的广义线性模型（见下面的方程），在互联网上，常用来刻画用户购买与否（购买率）的情况，广告的点击与否（点击率）等等。

\[ \begin{aligned} & \bm{y} \sim \mathrm{Bernoulli}(\bm{p}) \\ & \mathrm{logit}(\bm{p}) = X\bm{\beta} \end{aligned} \]

其中，\(\bm{y}\) 是响应变量对应的数据向量，\(\bm{p}\) 是建模的对象，是未知的（待估）向量，\(\mathrm{logit}\) 表示 logit 变换，\(\bm{\beta}\) 也是未知的向量，是待估的回归系数。无特殊说明时，大写字母表示矩阵，小写字母表示标量，加粗的小写字母表示向量。后续将会对此模型中涉及的统计概念，及其对应的 R 语言表示进行一一说明。

2.1 二项分布

在 R 语言中，二项分布的定义（可查看函数 rbinom() 的帮助文档）如下：

\[ p(x) = {n \choose x} p^x(1-p)^{n-x}, \quad x = 0, 1, \ldots,n \]

其中，\(p\) 代表成功概率，可解释为在 \(n\) 次独立可重复的实验中事件（比如点击或购买）发生的次数。二项分布是一个离散分布，它的期望是 \(np\)，方差是 \(np(1-p)\) 。当 \(n = 1\) 时，二项分布退化为伯努利分布，它的期望是 \(p\)，方差是 \(p(1-p)\) 。值得注意，它们的方差是随期望变化的。

2.2 Logistic 分布

在 R 语言中，Logistic 分布的概率密度函数的定义（可查看函数 dlogis() 的帮助文档）如下：

\[ f(x;\mu,\sigma) = \frac{1}{\sigma} \frac{\mathrm{e}^{(x-\mu)/\sigma}}{(1+\mathrm{e}^{(x-\mu)/\sigma})^2} \] 其中，\(\mu\) 被称为位置参数 location parameter，\(\sigma\) 被称为尺度参数 scale parameter。Logistic 的概率分布函数如下：

\[ F(x;\mu,\sigma) = \frac{1}{1 + \mathrm{e}^{-(x-\mu)/\sigma}} \] 它的期望是 \(\mu\) ，方差是 \(\frac{\pi^2}{3}\sigma^2\)。 Logistic 的概率分布函数（下左图）和概率密度函数（下右图）的图像如下。

2.3 logit 变换

与 Logistic 分布相关联的还有 logit 函数（变换）及其逆函数（变换）。

\[ \mathrm{logit}(p) = \log \frac{p}{1-p} \]

\[ \mathrm{logit}^{-1}(p) = \frac{\mathrm{e}^p}{1 + \mathrm{e}^p} = \frac{1}{1 + \mathrm{e}^{-p}} \]

从 logit 的逆变换可以看出，它与 Logistic 分布的概率分布函数很接近。如果令 Logistic 分布的参数 \(\mu=0,\sigma=1\) ，那么 logit 的逆变换就是 Logistic 分布的概率分布函数。

2.4 模拟二项分布

set.seed(2026)
x <- rnorm(100)
# 线性预测 linear predictor 设定为 1+x
# 对线性预测做 logit 逆变换，结果作为伯努利分布的参数 p
# 生成 100 个服从伯努利分布的随机数
y <- rbinom(n = 100, size = 1, prob = plogis(1 + x))
y

##   [1] 0 1 1 0 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0
##  [38] 1 1 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 0 1 1 1 0 1 1 1 1 1 1 1
##  [75] 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1

伯努利分布是二项分布的特殊情况，因此，生成服从伯努利分布的随机数，只需在函数 rbinom() 中设置 size 为 1 （这个 size 指的是试验空间的大小，而不是样本量 n ）。另一个稍显麻烦的方式是通过函数 sample() 生成随机数。

# 生成 100 个服从伯努利分布的随机数
sample(x = c(0, 1), size = 100, replace = T, prob = c(0.4, 0.6))

##   [1] 1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 0 1 1 0 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1
##  [38] 1 1 0 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 0 0 1 1 0 1 1 0 1 0 1 0 0 1 0 0 1 0 1
##  [75] 0 1 0 1 1 1 0 0 1 1 0 0 0 1 1 1 0 0 1 1 1 0 1 0 1 1

在 R 语言中，函数 glm() 可以拟合响应变量服从伯努利分布的广义线性模型。设置分布族的参数 family 为 binomial(link = "logit") ，这里联系函数为 logit 变换。

fit1 <- glm(y ~ x, family = binomial(link = "logit"))
summary(fit1)

## 
## Call:
## glm(formula = y ~ x, family = binomial(link = "logit"))
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    1.485      0.283    5.24  1.6e-07 ***
## x              0.751      0.282    2.66   0.0078 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 105.382  on 99  degrees of freedom
## Residual deviance:  97.179  on 98  degrees of freedom
## AIC: 101.2
## 
## Number of Fisher Scoring iterations: 4

在前面的模型参数设定中，回归系数 \(\bm{\beta}\) 是 \((1,1)^{\top}\) 。眼下，模拟实验结果显示回归系数的估计 \(\hat{\bm{\beta}}\) 有明显偏离。原因是样本量有点小，若样本量从 100 增加到 1000 ，回归系数的估计值会接近 \((1,1)^{\top}\) 。

2.5 函数 `quasibinomial()` 及其等价表示

在前面的输出结果中，有这么一行信息。

(Dispersion parameter for binomial family taken to be 1)

对于二项分布族的离散参数（Dispersion parameter）取值为 1。

如果不想让离散参数固定下来，可以用函数 quasibinomial() 替代 binomial() 。当遇到真实的数据时，数据一般是不会严丝合缝地服从二项分布的。样本均值与样本方差的关系不是精确的 \(p\) 和 \(p(1-p)\) 的关系。

fit2 <- glm(y ~ x, family = quasibinomial(link = "logit"))
summary(fit2)

## 
## Call:
## glm(formula = y ~ x, family = quasibinomial(link = "logit"))
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1.485      0.278    5.35  5.8e-07 ***
## x              0.751      0.276    2.72   0.0078 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 0.9606)
## 
##     Null deviance: 105.382  on 99  degrees of freedom
## Residual deviance:  97.179  on 98  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 4

对于函数 quasibinomial() ，还有另一种等价的方式，使用函数 quasi() 构造。在函数 glm() 对指数族的符号表示体系中，mu 表示期望。对于伯努利分布，若其期望为 mu ，则其方差为 mu(1-mu) 。

# 需要指定回归系数的初始值
fit3 <- glm(y ~ x, family = quasi(variance = "mu(1-mu)", link = "logit"), start = c(0.5, 0.5))
summary(fit3)

## 
## Call:
## glm(formula = y ~ x, family = quasi(variance = "mu(1-mu)", link = "logit"), 
##     start = c(0.5, 0.5))
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1.485      0.278    5.35  5.8e-07 ***
## x              0.751      0.276    2.72   0.0078 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasi family taken to be 0.9607)
## 
##     Null deviance: 105.382  on 99  degrees of freedom
## Residual deviance:  97.179  on 98  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 4

这个结果与前面设置 family = quasibinomial(link = "logit") 时，是一样的。

2.6 使用函数 `vglm()` 的等价表示

VGAM 包支持拟合响应变量是伯努利分布的广义线性模型。

library(VGAM)
fit4 <- vglm(y ~ x, family = binomialff(link = "logitlink"))
summary(fit4)

## 
## Call:
## vglm(formula = y ~ x, family = binomialff(link = "logitlink"))
## 
## Coefficients: 
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    1.485      0.283    5.24  1.6e-07 ***
## x              0.751      0.282    2.66   0.0078 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Name of linear predictor: logitlink(prob) 
## 
## Residual deviance: 97.18 on 98 degrees of freedom
## 
## Log-likelihood: -48.59 on 98 degrees of freedom
## 
## Number of Fisher scoring iterations: 4

这个结果与本节开头拟合的结果是完全一样的。

3 指数分布

响应变量服从指数分布的广义线性模型，在互联网上，常用来刻画用户流失情况（对流失率/风险建模），在医疗上，用来刻画病人的生存风险（对存活率建模）。

根据 R 语言中指数分布的记号（可查看函数 dexp() 的帮助文档），指数分布\(\mathrm{Exp}(\lambda)\)的概率密度函数如下：

\[ f(x;\lambda) = \lambda\mathrm{e}^{-\lambda x},\quad x \geq 0 \]

它的期望是 \(\frac{1}{\lambda}\) ，方差是 \(\frac{1}{\lambda^2}\) ，方差是期望的平方。

3.1 使用函数 `quasi()` 设置指数分布

在 Base R 中，拟合广义线性模型的函数是 glm()，但是函数 glm() 内置的分布中并没有指数分布，需要自己根据已有的条件组合。本质上，函数 glm() 对条件期望建模，就是将各个特征的观察值看作固定值，此处条件分布服从指数分布，而指数分布的方差是其期望的平方，联系函数是对数函数。因此，family 参数设置如下：

family = quasi(variance = "mu^2", link = "log")

在 R 语言中，生成服从指数分布的随机数的函数 rexp(n, rate) 中的参数 rate 在实际意义中通常表示（衰减）速率，这也是参数名 rate 的由来。随机数生成函数 rexp(n, rate) 对应的指数分布的均值是 1/rate ，函数 glm() 是对均值进行回归，协变量 x 影响均值 1/rate 。在函数 glm() 中，参数 family 用来设置响应变量的分布。

# 固定随机数种子使得结果可复现
set.seed(2026)
# 生成协变量
x <- rnorm(100)
# 生成服从指数分布的随机数
y <- rexp(n = 100, rate = 1/exp(1 + x)) # 对应期望是 exp(1 + x)
# 函数 glm 对条件期望建模，条件分布是指数分布，而指数分布的方差是其期望的平方
fit1 <- glm(y ~ x, family = quasi(variance = "mu^2", link = "log"))
summary(fit1)

## 
## Call:
## glm(formula = y ~ x, family = quasi(variance = "mu^2", link = "log"))
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.8896     0.0945    9.42  2.2e-15 ***
## x             0.9861     0.0942   10.47  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasi family taken to be 0.8842)
## 
##     Null deviance: 196.14  on 99  degrees of freedom
## Residual deviance: 108.25  on 98  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6

3.2 离散参数

在输出模型拟合结果的函数 summary() 中，还可设置参数 dispersion 控制离散参数的值。当调用函数 quasi() 构造概率分布时，包含未知的离散参数，它是待估的，一并随回归系数进行估计。在函数 summary() 中指定 dispersion = 1 就是固定参数值，不需要估计。对于逻辑回归（响应变量服从0-1分布）和泊松回归（响应变量服从泊松分布），离散参数是固定值为 1。

summary(fit1, dispersion = 1)

## 
## Call:
## glm(formula = y ~ x, family = quasi(variance = "mu^2", link = "log"))
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    0.890      0.100    8.85   <2e-16 ***
## x              0.986      0.100    9.84   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasi family taken to be 1)
## 
##     Null deviance: 196.14  on 99  degrees of freedom
## Residual deviance: 108.25  on 98  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6

3.3 Gamma 分布

在 R 语言中，Gamma 伽马分布（可查看函数 dgamma() 的帮助文档）的概率密度函数如下：

\[ f(x;\alpha,\sigma) = \frac{1}{\sigma^\alpha\Gamma(\alpha)}x^{\alpha-1}\mathrm{e}^{-\frac{x}{\sigma}}, \quad x \geq 0, \alpha > 0, \sigma > 0 \]

伽马分布的概率密度函数带有两个参数 \(\alpha\) 和 \(\sigma\) ，其中，参数 \(\alpha\) 称之为形状参数 shape parameter ，参数 \(\sigma\) 称之为尺度参数 scale parameter 。伽马分布的均值和方差分别为 \(\alpha\sigma\) 和 \(\alpha\sigma^2\) 。\(\Gamma(\cdot)\) 表示伽马函数（可查看函数 gamma() 的帮助文档），伽马分布的由来离不开伽马函数。

3.4 伽马分布可退化为指数分布

根据前述伽马分布的概率密度函数，当形状参数 \(\alpha\) 等于 1 时，伽马分布退化为指数分布。若保持与前面指数分布 \(\mathrm{Exp}(\lambda)\) 一致的记号，伽马分布退化下来的指数分布为 \(\mathrm{Exp}(\frac{1}{\sigma})\) ，此时，期望为 \(\sigma\) 。

\[ f(x;\sigma) = \frac{1}{\sigma}\mathrm{e}^{-\frac{x}{\sigma}}, \quad x \geq 0, \sigma > 0 \]

在 R 语言中，函数 glm() 的参数 family 支持选择伽马分布 Gamma()。

3.5 函数 `glm()` 中使用 Gamma 分布族

注意函数 Gamma() 的代码内容，以下这段话只属于这段代码。

在函数 glm() 中，伽马分布族是由均值参数 mu 和离散参数 disp 两个参数来刻画的。对于伽马分布，当离散参数 disp 是 1 时，伽马分布的形状参数 shape 也是 1，这就退化为指数分布 \(\mathrm{Exp}(\frac{1}{\sigma})\)，伽马分布的尺度参数 scale 就是指数分布的期望 \(\sigma\) 。

# https://stat.ethz.ch/pipermail/r-help/2003-June/034852.html
# 伽马分布的形状参数 alpha=1 对应于指数分布
# 离散参数为 1 导出形状参数 alpha=1 
fit2 <- glm(y ~ x, family = Gamma(link = "log"))
summary(fit2, dispersion = 1)

## 
## Call:
## glm(formula = y ~ x, family = Gamma(link = "log"))
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    0.890      0.100    8.85   <2e-16 ***
## x              0.986      0.100    9.84   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 1)
## 
##     Null deviance: 196.14  on 99  degrees of freedom
## Residual deviance: 108.25  on 98  degrees of freedom
## AIC: 365.5
## 
## Number of Fisher Scoring iterations: 6

从结果来看，伽马分布退化之后，与前面的计算结果是一样的。

3.6 响应变量是否可取值 0 ？

另，值得注意的是当 family = Gamma(link = "log") ，函数 glm() 不允许响应变量 \(y\) 含 0 。根据前面对伽马分布的定义，这相当于 \(x^{\alpha-1}\) 中的 \(x\) 不能取 0 ，幂函数的底数只能是非零实数。但是，如果形状参数取 1 ，退化为指数分布，响应变量 \(y\) 是可以含 0 的。也就是说，当采用如下形式拟合模型时，响应变量 \(y\) 是可以含 0 的。

glm(y ~ x, family = quasi(variance = "mu^2", link = "log"))

3.7 使用函数 `vglm()` 的等价表示

相比于函数 glm() 支持的响应变量的分布类型，VGAM 包支持更多，对于指数分布，它有相应的分布。

library(VGAM)
fit3 <- vglm(y ~ x, family = exponential(link = "loglink"))
summary(fit3)

## 
## Call:
## vglm(formula = y ~ x, family = exponential(link = "loglink"))
## 
## Coefficients: 
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -0.890      0.100   -8.85   <2e-16 ***
## x             -0.986      0.100   -9.84   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Name of linear predictor: loglink(rate) 
## 
## Residual deviance: 108.2 on 98 degrees of freedom
## 
## Log-likelihood: -179.3 on 98 degrees of freedom
## 
## Number of Fisher scoring iterations: 5

但是，要注意回归系数的符号，在线性预测部分，它是对指数分布 rexp() 中的 rate 建模，而不是 1/rate （期望）。

Name of linear predictor: loglink(rate)

4 参考文献

Peng Ding, 2026, Linear Model and Extensions. Boca Raton, Florida. Chapman and Hall/CRC. https://arxiv.org/abs/2401.00649

广义线性模型与 R 语言

黄湘云

1 泊松分布

1.1 函数 `glm()` 拟合模型

1.2 离散参数与函数 `quasipoisson()`

1.3 函数 `quasi()` 的说明

1.4 VGAM 和 glmnet 包

1.5 漂移项 offset

1.6 泊松分布还是二项分布？

2 伯努利分布

2.1 二项分布

2.2 Logistic 分布

2.3 logit 变换

2.4 模拟二项分布

2.5 函数 `quasibinomial()` 及其等价表示

2.6 使用函数 `vglm()` 的等价表示

3 指数分布

3.1 使用函数 `quasi()` 设置指数分布

3.2 离散参数

3.3 Gamma 分布

3.4 伽马分布可退化为指数分布

3.5 函数 `glm()` 中使用 Gamma 分布族

3.6 响应变量是否可取值 0 ？

3.7 使用函数 `vglm()` 的等价表示

4 参考文献

广义线性模型与 R 语言

黄湘云

1 泊松分布

1.1 函数 glm() 拟合模型

1.2 离散参数与函数 quasipoisson()

1.3 函数 quasi() 的说明

1.4 VGAM 和 glmnet 包

1.5 漂移项 offset

1.6 泊松分布还是二项分布？

2 伯努利分布

2.1 二项分布

2.2 Logistic 分布

2.3 logit 变换

2.4 模拟二项分布

2.5 函数 quasibinomial() 及其等价表示

2.6 使用函数 vglm() 的等价表示

3 指数分布

3.1 使用函数 quasi() 设置指数分布

3.2 离散参数

3.3 Gamma 分布

3.4 伽马分布可退化为指数分布

3.5 函数 glm() 中使用 Gamma 分布族

3.6 响应变量是否可取值 0 ？

3.7 使用函数 vglm() 的等价表示

4 参考文献

1.1 函数 `glm()` 拟合模型

1.2 离散参数与函数 `quasipoisson()`

1.3 函数 `quasi()` 的说明

2.5 函数 `quasibinomial()` 及其等价表示

2.6 使用函数 `vglm()` 的等价表示

3.1 使用函数 `quasi()` 设置指数分布

3.5 函数 `glm()` 中使用 Gamma 分布族

3.7 使用函数 `vglm()` 的等价表示