预计阅读

动画制作与 Plotly Python





Plotly Python

plotly.js 是一款功能非常全面的开源数据可视化库,目前 Github 上的星赞超过 14K。图形种类非常丰富,仅地图制作就涵盖了热力图、专题地图、飞线图、散点图、气泡图等常用的图形,且支持调用 MapBox 地图服务用作底图。相比较于 R 版(Sievert 2020),Python 版 plotly.py 几乎与上游 JS 版同步更新,功能更加全面,BUG 也少,模块依赖也少,接口也好用些。本文图形和动画主要用 Python 版 plotly 制作。

散点图

目前,Plotly 的高级绘图子模块 express 支持35 种常见图形,各图形参数见此,帮助文档见此

  • Basics: scatter, line, area, bar, funnel, timeline
  • Part-of-Whole: pie, sunburst, treemap, icicle, funnel_area
  • 1D Distributions: histogram, box, violin, strip, ecdf
  • 2D Distributions: density_heatmap, density_contour
  • Matrix or Image Input: imshow
  • 3-Dimensional: scatter_3d, line_3d
  • Multidimensional: scatter_matrix, parallel_coordinates, parallel_categories
  • Tile Maps: scatter_mapbox, line_mapbox, choropleth_mapbox, density_mapbox
  • Outline Maps: scatter_geo, line_geo, choropleth
  • Polar Charts: scatter_polar, line_polar, bar_polar
  • Ternary Charts: scatter_ternary, line_ternary
c(
  "scatter", "density_contour", "density_heatmap",
  "line", "area", "bar", "timeline", "histogram",
  "histogram", "violin", "box", "strip", "scatter_3d",
  "line_3d", "scatter_ternary", "line_ternary",
  "scatter_polar", "line_polar", "bar_polar", "choropleth",
  "scatter_geo", "line_geo", "scatter_mapbox",
  "choropleth_mapbox", "density_mapbox", "line_mapbox",
  "scatter_matrix", "parallel_coordinates",
  "parallel_categories", "pie", "sunburst", "treemap",
  "icicle", "funnel", "funnel_area"
)

express 支持快速地绘制图形,使用接口类似 R 语言,从 R 语言切换过来简单,和统计模型结合的非常好,以分组线性回归散点图为例。

import plotly.express as px

px.scatter(
    px.data.iris(),
    x="sepal_width",
    y="sepal_length",
    color="species",
    trendline="ols",
    template="simple_white",
    labels={
        "sepal_length": "Sepal Length (cm)",
        "sepal_width": "Sepal Width (cm)",
        "species": "Species of Iris",
    },
    title="Edgar Anderson's Iris Data",
    color_discrete_sequence=px.colors.qualitative.Set2
)

导出图形非常简单,保存绘图对象,然后调函数 write_image() 导出即可。

import plotly.express as px
fig = px.scatter(
    px.data.iris(),
    x="sepal_width",
    y="sepal_length",
    color="species",
    trendline="ols",
    template="simple_white",
    labels={
        "sepal_length": "Sepal Length (cm)",
        "sepal_width": "Sepal Width (cm)",
        "species": "Species of Iris",
    },
    title="Edgar Anderson's Iris Data",
    color_discrete_sequence=px.colors.qualitative.Set2,
)
fig.write_image("img/iris.png", engine="kaleido")
fig.write_image("img/iris.pdf", engine="kaleido")
fig.write_image("img/iris.svg", engine="kaleido")

支持导出多种格式,且导出的图片效果很好,如图1所示。

Plotly 导出高质量的图片

图 1: Plotly 导出高质量的图片

气泡动画

plotly 内置了著名的 gapminder 数据集,如下:

import plotly.express as px
df = px.data.gapminder()
df.head(6)
#        country continent  year  ...   gdpPercap  iso_alpha  iso_num
# 0  Afghanistan      Asia  1952  ...  779.445314        AFG        4
# 1  Afghanistan      Asia  1957  ...  820.853030        AFG        4
# 2  Afghanistan      Asia  1962  ...  853.100710        AFG        4
# 3  Afghanistan      Asia  1967  ...  836.197138        AFG        4
# 4  Afghanistan      Asia  1972  ...  739.981106        AFG        4
# 5  Afghanistan      Asia  1977  ...  786.113360        AFG        4
# 
# [6 rows x 8 columns]

相比于 R 包 gapminder 内置的数据集,Python 版新加了一列 iso_alpha

import plotly.express as px

df = px.data.gapminder()
px.scatter(
    df,
    # 横轴
    x="gdpPercap",
    # 纵轴
    y="lifeExp",
    # 动画帧
    animation_frame="year",
    # 动画分组
    animation_group="country",
    # 气泡大小映射到总人口
    size="pop",
    # 颜色映射到地区/大洲
    color="continent",
    # 悬浮
    hover_name="country",
    # 图形主题
    template="none",
    # 横轴坐标取对数
    log_x=True,
    size_max=55,
    # 横轴范围
    range_x=[100, 100000],
    # 纵轴范围
    range_y=[25, 90],
    # 动画标题
    title = "各国人均寿命与人均GDP关系演变",
    # 设置横纵坐标轴标题
    labels={
        "gdpPercap": "人均 GDP (美元)",
        "lifeExp": "人均寿命 (岁)",
        "year": "年份",
        "continent": "大洲",
        "country": "国家",
        "pop": "人口总数"
    },
    # 调用来自 colorbrewer 的 Set2 调色板
    color_discrete_sequence=px.colors.colorbrewer.Set2
)

类似 ggplot2 包的主题设置,Plotly Express 也是支持 自定义风格样式的,其中图形主题 最为方便快捷。默认主题为 plotly,可用的有:

import plotly.io as pio
pio.templates
# Templates configuration
# -----------------------
#     Default template: 'plotly'
#     Available templates:
#         ['ggplot2', 'seaborn', 'simple_white', 'plotly',
#          'plotly_white', 'plotly_dark', 'presentation', 'xgridoff',
#          'ygridoff', 'gridon', 'none']

只需将 template="none" 替换为以上任意一种,即可获得不一样的效果。

import plotly.express as px
df = px.data.gapminder()
px.scatter(
    df,
    # 横轴
    x="gdpPercap",
    # 纵轴
    y="lifeExp",
    # 动画帧
    animation_frame="year",
    # 动画分组
    animation_group="country",
    # 气泡大小映射到总人口
    size="pop",
    # 颜色映射到地区/大洲
    color="continent",
    # 悬浮
    hover_name="country",
    # 图形主题
    template="ggplot2",
    # 横轴坐标取对数
    log_x=True,
    size_max=55,
    # 横轴范围
    range_x=[100, 100000],
    # 纵轴范围
    range_y=[25, 90],
    # 动画标题
    title = "各国人均寿命与人均GDP关系演变",
    # 设置横纵坐标轴标题
    labels={
        "gdpPercap": "人均 GDP (美元)",
        "lifeExp": "人均寿命 (岁)",
        "year": "年份",
        "continent": "大洲",
        "country": "国家",
        "pop": "人口总数"
    },
    # 调用来自 colorbrewer 的 Set2 调色板
    color_discrete_sequence=px.colors.colorbrewer.Set2
)

除了主题模版,plotly 模块还内置了很多调色板,比如家喻户晓的 colorbrewer 调色板,见下图。

import plotly.express as px
fig = px.colors.colorbrewer.swatches()
fig.show()

地图动画

参考 plotly 官网 choropleth map

import plotly.express as px
# 提取 2007 年的数据
df = px.data.gapminder().query("year == 2007")
# 2007 年世界平均寿命
avg_lifeExp = (df["lifeExp"] * df["pop"]).sum() / df["pop"].sum()
# 绘制地区分布图
px.choropleth(
    df,
    locations="iso_alpha",
    color="lifeExp",
    color_continuous_scale=px.colors.diverging.BrBG,
    color_continuous_midpoint=avg_lifeExp,
    title="2007 年世界平均寿命 %.1f(岁)" % avg_lifeExp,
    labels={
        "iso_alpha": "国家编码",
        "lifeExp": "人均寿命 (岁)"
    }
)

与气泡动画类似,只需指定 animation_frameanimation_group 的参数值,就可以生成动画。

import plotly.express as px
df = px.data.gapminder()
px.choropleth(
    df,
    locations="iso_alpha",
    color="lifeExp",
    # 动画帧
    animation_frame="year",
    # 动画分组
    animation_group="country",
    # 悬浮提示
    hover_name="country", 
    color_continuous_scale=px.colors.sequential.Viridis,
)

Plotly R

下面采用 Carson Sievert 开发的 plotly(Sievert 2020)制作网页动画,仍是以 gapminder 数据集为例,示例修改自 plotly 官网的 动画示例。关于 plotly 以及绘制散点图的介绍,见前文交互式网页图形与 R 语言,此处不再赘述。

library(gapminder)
library(plotly)
plot_ly(
  data = gapminder,
  # 横轴
  x = ~gdpPercap,
  # 纵轴
  y = ~lifeExp,
  # 气泡大小
  size = ~pop,
  fill = ~"",
  # 分组配色
  color = ~continent,
  # 时间变量
  frame = ~year,
  # 悬浮提示
  text = ~ paste0(
    "年份:", year, "<br>",
    "国家:", country, "<br>",
    "人口总数:", round(pop / 10^6, 2), "百万", "<br>",
    "人均 GDP:", round(gdpPercap), "美元", "<br>",
    "预期寿命:", round(lifeExp), "岁"
  ),
  hoverinfo = "text",
  type = "scatter",
  mode = "markers",
  marker = list(
    symbol = "circle",
    sizemode = "diameter",
    line = list(width = 2, color = "#FFFFFF"),
    opacity = 0.8
  )
) |>
  layout(
    xaxis = list(
      type = "log", title = "人均 GDP(美元)"
    ),
    yaxis = list(title = "预期寿命(年)"),
    legend = list(title = list(text = "<b> 大洲 </b>"))
  ) |>
  animation_opts(
    # 帧与帧之间播放的时间间隔,包含转场时间
    frame = 1000, 
    # 转场的效果
    easing = "linear", 
    redraw = FALSE,
    # 动画帧与帧之间转场的时间间隔,单位毫秒
    transition = 1000,
    mode = "immediate"
  ) |>
  animation_slider(
    currentvalue = list(
      prefix = "年份 ",
      xanchor = "right",
      font = list(color = "gray", size = 30)
    )
  ) |>
  # 下面的参数会传递给 layout.updatemenus 的 button 对象
  animation_button(
    # 按钮位置
    x = 0, xanchor = "right",
    y = -0.3, yanchor = "bottom",
    visible = TRUE, # 显示播放按钮
    label = "播放", # 按钮文本
    # bgcolor = "#5b89f7", # 按钮背景色
    font = list(color = "orange")# 文本颜色
  ) |>
  config(
    # 去掉图片右上角工具条
    displayModeBar = FALSE
  )

图 2: plotly 包 制作网页动画

在上图的制作过程中,有几点值得一提:

  1. 如图 2 所示,相比于 echarts4r, 气泡即使有重叠和覆盖,只要鼠标悬浮其上,就能显示被覆盖的 tooltip。

  2. 读者看到 plot_ly(fill = ~"",...) 可能会奇怪,为什么参数 fill 的值是空字符串?这应该是 plotly 的 R 包或 JS 库的问题,不加会有很多警告:

    `line.width` does not currently support multiple values.

    笔者是参考SO的帖子加上的。

  3. 动画控制参数 animation_opts(),详见 Plotly 动画属性,结合此帮助文档,可知参数 easing 除了上面的取值 linear,还有很多,全部参数值见表1

    表 1: 动画转场特效
    linear quad cubic sin exp circle
    elastic back bounce linear-in quad-in cubic-in
    sin-in exp-in circle-in elastic-in back-in bounce-in
    linear-out quad-out cubic-out sin-out exp-out circle-out
    elastic-out back-out bounce-out linear-in-out quad-in-out cubic-in-out
    sin-in-out exp-in-out circle-in-out elastic-in-out back-in-out bounce-in-out

    animation_opts() 的其它默认参数设置见 plotly:::animation_opts_defaults()

  4. 动画播放和暂停的控制按钮一直有问题,即点击播放后,按钮不会切换为暂停按钮,点击暂停后也不能恢复,详见 plotly 包的 Github 问题贴。然而,Python 版的 plotly 模块制作此动画,一切正常,代码也紧凑很多。

  5. 添加如下 CSS 可以去掉图形右上角烦人的工具条。

    .modebar {
      display: none !important;
    }

运行环境

本文是在 RStudio IDE 内用 R Markdown 编辑的,用 blogdown 构建网站,Hugo 渲染 knitr 之后的 Markdown 文件,得益于 blogdown 对 R Markdown 格式的支持,图、表和参考文献的交叉引用非常方便,省了不少文字编辑功夫。文中使用了多个 R 包,为方便复现本文内容,下面列出详细的环境信息:

xfun::session_info(packages = c(
  "knitr", "rmarkdown", "blogdown", "reticulate"
), dependencies = FALSE)
# R version 4.2.0 (2022-04-22)
# Platform: x86_64-apple-darwin17.0 (64-bit)
# Running under: macOS Big Sur/Monterey 10.16
# 
# Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8
# 
# Package version:
#   blogdown_1.10   knitr_1.39      reticulate_1.25 rmarkdown_2.14 
# 
# Pandoc version: 2.18
# 
# Hugo version: 0.98.0

本文用 virtualenv 创建虚拟 Python 环境,安装 plotly 模块 5.5.0 版本,还安装 kaleido (导出图形)和 statsmodels (线性回归)模块。

## 安装 virtualenv
brew install virtualenv
## 准备虚拟环境存放地址
sudo mkdir -p /opt/.virtualenvs/r-tensorflow
sudo chown -R $(whoami):staff /opt/.virtualenvs/r-tensorflow
## 方便后续复用
export RETICULATE_PYTHON_ENV=/opt/.virtualenvs/r-tensorflow
## 创建虚拟环境
virtualenv -p /usr/local/bin/python3 $RETICULATE_PYTHON_ENV
## 激活虚拟环境
source $RETICULATE_PYTHON_ENV/bin/activate

最后,在文件 ~/.Rprofile 里设置环境变量 RETICULATE_PYTHONRETICULATE_PYTHON_ENV,这样, reticulate 包就能发现和使用它了。

Sys.setenv(RETICULATE_PYTHON="/usr/local/bin/python3")
Sys.setenv(RETICULATE_PYTHON_ENV="/opt/.virtualenvs/r-tensorflow")

本文 Python 运行环境如下:

reticulate::py_config()
# python:         /opt/.virtualenvs/r-tensorflow/bin/python
# libpython:      /usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/config-3.9-darwin/libpython3.9.dylib
# pythonhome:     /opt/.virtualenvs/r-tensorflow:/opt/.virtualenvs/r-tensorflow
# virtualenv:     /opt/.virtualenvs/r-tensorflow/bin/activate_this.py
# version:        3.9.13 (main, May 19 2022, 14:10:54)  [Clang 13.1.6 (clang-1316.0.21.2)]
# numpy:          /opt/.virtualenvs/r-tensorflow/lib/python3.9/site-packages/numpy
# numpy_version:  1.22.0
# 
# NOTE: Python version was forced by RETICULATE_PYTHON

参考文献

Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman; Hall/CRC. https://plotly-r.com.