Plotly Python
plotly.js 是一款功能非常全面的开源数据可视化库,目前 Github 上的星赞超过 14K。图形种类非常丰富,仅地图制作就涵盖了热力图、专题地图、飞线图、散点图、气泡图等常用的图形,且支持调用 MapBox 地图服务用作底图。相比较于 R 版(Sievert 2020),Python 版 plotly.py 几乎与上游 JS 版同步更新,功能更加全面,BUG 也少,模块依赖也少,接口也好用些。本文图形和动画主要用 Python 版 plotly 制作。
散点图
目前,Plotly 的高级绘图子模块 express 支持35 种常见图形,各图形参数见此,帮助文档见此。
- Basics: scatter, line, area, bar, funnel, timeline
- Part-of-Whole: pie, sunburst, treemap, icicle, funnel_area
- 1D Distributions: histogram, box, violin, strip, ecdf
- 2D Distributions: density_heatmap, density_contour
- Matrix or Image Input: imshow
- 3-Dimensional: scatter_3d, line_3d
- Multidimensional: scatter_matrix, parallel_coordinates, parallel_categories
- Tile Maps: scatter_mapbox, line_mapbox, choropleth_mapbox, density_mapbox
- Outline Maps: scatter_geo, line_geo, choropleth
- Polar Charts: scatter_polar, line_polar, bar_polar
- Ternary Charts: scatter_ternary, line_ternary
c(
"scatter", "density_contour", "density_heatmap",
"line", "area", "bar", "timeline", "histogram",
"histogram", "violin", "box", "strip", "scatter_3d",
"line_3d", "scatter_ternary", "line_ternary",
"scatter_polar", "line_polar", "bar_polar", "choropleth",
"scatter_geo", "line_geo", "scatter_mapbox",
"choropleth_mapbox", "density_mapbox", "line_mapbox",
"scatter_matrix", "parallel_coordinates",
"parallel_categories", "pie", "sunburst", "treemap",
"icicle", "funnel", "funnel_area"
)
express 支持快速地绘制图形,使用接口类似 R 语言,从 R 语言切换过来简单,和统计模型结合的非常好,以分组线性回归散点图为例。
import plotly.express as px
px.scatter(
px.data.iris(),
x="sepal_width",
y="sepal_length",
color="species",
trendline="ols",
template="simple_white",
labels={
"sepal_length": "Sepal Length (cm)",
"sepal_width": "Sepal Width (cm)",
"species": "Species of Iris",
},
title="Edgar Anderson's Iris Data",
color_discrete_sequence=px.colors.qualitative.Set2
)
导出图形非常简单,保存绘图对象,然后调函数 write_image()
导出即可。
import plotly.express as px
fig = px.scatter(
px.data.iris(),
x="sepal_width",
y="sepal_length",
color="species",
trendline="ols",
template="simple_white",
labels={
"sepal_length": "Sepal Length (cm)",
"sepal_width": "Sepal Width (cm)",
"species": "Species of Iris",
},
title="Edgar Anderson's Iris Data",
color_discrete_sequence=px.colors.qualitative.Set2,
)
fig.write_image("img/iris.png", engine="kaleido")
fig.write_image("img/iris.pdf", engine="kaleido")
fig.write_image("img/iris.svg", engine="kaleido")
支持导出多种格式,且导出的图片效果很好,如图1所示。
气泡动画
plotly 内置了著名的 gapminder 数据集,如下:
import plotly.express as px
df = px.data.gapminder()
df.head(6)
# country continent year ... gdpPercap iso_alpha iso_num
# 0 Afghanistan Asia 1952 ... 779.445314 AFG 4
# 1 Afghanistan Asia 1957 ... 820.853030 AFG 4
# 2 Afghanistan Asia 1962 ... 853.100710 AFG 4
# 3 Afghanistan Asia 1967 ... 836.197138 AFG 4
# 4 Afghanistan Asia 1972 ... 739.981106 AFG 4
# 5 Afghanistan Asia 1977 ... 786.113360 AFG 4
#
# [6 rows x 8 columns]
相比于 R 包 gapminder 内置的数据集,Python 版新加了一列 iso_alpha
。
import plotly.express as px
df = px.data.gapminder()
px.scatter(
df,
# 横轴
x="gdpPercap",
# 纵轴
y="lifeExp",
# 动画帧
animation_frame="year",
# 动画分组
animation_group="country",
# 气泡大小映射到总人口
size="pop",
# 颜色映射到地区/大洲
color="continent",
# 悬浮
hover_name="country",
# 图形主题
template="none",
# 横轴坐标取对数
log_x=True,
size_max=55,
# 横轴范围
range_x=[100, 100000],
# 纵轴范围
range_y=[25, 90],
# 动画标题
title = "各国人均寿命与人均GDP关系演变",
# 设置横纵坐标轴标题
labels={
"gdpPercap": "人均 GDP (美元)",
"lifeExp": "人均寿命 (岁)",
"year": "年份",
"continent": "大洲",
"country": "国家",
"pop": "人口总数"
},
# 调用来自 colorbrewer 的 Set2 调色板
color_discrete_sequence=px.colors.colorbrewer.Set2
)
类似 ggplot2 包的主题设置,Plotly Express 也是支持 自定义风格样式的,其中图形主题 最为方便快捷。默认主题为 plotly
,可用的有:
import plotly.io as pio
pio.templates
# Templates configuration
# -----------------------
# Default template: 'plotly'
# Available templates:
# ['ggplot2', 'seaborn', 'simple_white', 'plotly',
# 'plotly_white', 'plotly_dark', 'presentation', 'xgridoff',
# 'ygridoff', 'gridon', 'none']
只需将 template="none"
替换为以上任意一种,即可获得不一样的效果。
import plotly.express as px
df = px.data.gapminder()
px.scatter(
df,
# 横轴
x="gdpPercap",
# 纵轴
y="lifeExp",
# 动画帧
animation_frame="year",
# 动画分组
animation_group="country",
# 气泡大小映射到总人口
size="pop",
# 颜色映射到地区/大洲
color="continent",
# 悬浮
hover_name="country",
# 图形主题
template="ggplot2",
# 横轴坐标取对数
log_x=True,
size_max=55,
# 横轴范围
range_x=[100, 100000],
# 纵轴范围
range_y=[25, 90],
# 动画标题
title = "各国人均寿命与人均GDP关系演变",
# 设置横纵坐标轴标题
labels={
"gdpPercap": "人均 GDP (美元)",
"lifeExp": "人均寿命 (岁)",
"year": "年份",
"continent": "大洲",
"country": "国家",
"pop": "人口总数"
},
# 调用来自 colorbrewer 的 Set2 调色板
color_discrete_sequence=px.colors.colorbrewer.Set2
)
除了主题模版,plotly 模块还内置了很多调色板,比如家喻户晓的 colorbrewer 调色板,见下图。
import plotly.express as px
fig = px.colors.colorbrewer.swatches()
fig.show()
地图动画
参考 plotly 官网 choropleth map
import plotly.express as px
# 提取 2007 年的数据
df = px.data.gapminder().query("year == 2007")
# 2007 年世界平均寿命
avg_lifeExp = (df["lifeExp"] * df["pop"]).sum() / df["pop"].sum()
# 绘制地区分布图
px.choropleth(
df,
locations="iso_alpha",
color="lifeExp",
color_continuous_scale=px.colors.diverging.BrBG,
color_continuous_midpoint=avg_lifeExp,
title="2007 年世界平均寿命 %.1f(岁)" % avg_lifeExp,
labels={
"iso_alpha": "国家编码",
"lifeExp": "人均寿命 (岁)"
}
)
与气泡动画类似,只需指定 animation_frame
和 animation_group
的参数值,就可以生成动画。
import plotly.express as px
df = px.data.gapminder()
px.choropleth(
df,
locations="iso_alpha",
color="lifeExp",
# 动画帧
animation_frame="year",
# 动画分组
animation_group="country",
# 悬浮提示
hover_name="country",
color_continuous_scale=px.colors.sequential.Viridis,
)
Plotly R
下面采用 Carson Sievert 开发的 plotly 包(Sievert 2020)制作网页动画,仍是以 gapminder 数据集为例,示例修改自 plotly 官网的 动画示例。关于 plotly 以及绘制散点图的介绍,见前文交互式网页图形与 R 语言,此处不再赘述。
library(gapminder)
library(plotly)
plot_ly(
data = gapminder,
# 横轴
x = ~gdpPercap,
# 纵轴
y = ~lifeExp,
# 气泡大小
size = ~pop,
fill = ~"",
# 分组配色
color = ~continent,
# 时间变量
frame = ~year,
# 悬浮提示
text = ~ paste0(
"年份:", year, "<br>",
"国家:", country, "<br>",
"人口总数:", round(pop / 10^6, 2), "百万", "<br>",
"人均 GDP:", round(gdpPercap), "美元", "<br>",
"预期寿命:", round(lifeExp), "岁"
),
hoverinfo = "text",
type = "scatter",
mode = "markers",
marker = list(
symbol = "circle",
sizemode = "diameter",
line = list(width = 2, color = "#FFFFFF"),
opacity = 0.8
)
) |>
layout(
xaxis = list(
type = "log", title = "人均 GDP(美元)"
),
yaxis = list(title = "预期寿命(年)"),
legend = list(title = list(text = "<b> 大洲 </b>"))
) |>
animation_opts(
# 帧与帧之间播放的时间间隔,包含转场时间
frame = 1000,
# 转场的效果
easing = "linear",
redraw = FALSE,
# 动画帧与帧之间转场的时间间隔,单位毫秒
transition = 1000,
mode = "immediate"
) |>
animation_slider(
currentvalue = list(
prefix = "年份 ",
xanchor = "right",
font = list(color = "gray", size = 30)
)
) |>
# 下面的参数会传递给 layout.updatemenus 的 button 对象
animation_button(
# 按钮位置
x = 0, xanchor = "right",
y = -0.3, yanchor = "bottom",
visible = TRUE, # 显示播放按钮
label = "播放", # 按钮文本
# bgcolor = "#5b89f7", # 按钮背景色
font = list(color = "orange")# 文本颜色
) |>
config(
# 去掉图片右上角工具条
displayModeBar = FALSE
)
在上图的制作过程中,有几点值得一提:
如图 2 所示,相比于 echarts4r, 气泡即使有重叠和覆盖,只要鼠标悬浮其上,就能显示被覆盖的 tooltip。
读者看到
plot_ly(fill = ~"",...)
可能会奇怪,为什么参数fill
的值是空字符串?这应该是 plotly 的 R 包或 JS 库的问题,不加会有很多警告:`line.width` does not currently support multiple values.
笔者是参考SO的帖子加上的。
动画控制参数
animation_opts()
,详见 Plotly 动画属性,结合此帮助文档,可知参数easing
除了上面的取值linear
,还有很多,全部参数值见表1。表 1: 动画转场特效 linear quad cubic sin exp circle elastic back bounce linear-in quad-in cubic-in sin-in exp-in circle-in elastic-in back-in bounce-in linear-out quad-out cubic-out sin-out exp-out circle-out elastic-out back-out bounce-out linear-in-out quad-in-out cubic-in-out sin-in-out exp-in-out circle-in-out elastic-in-out back-in-out bounce-in-out animation_opts()
的其它默认参数设置见plotly:::animation_opts_defaults()
。动画播放和暂停的控制按钮一直有问题,即点击播放后,按钮不会切换为暂停按钮,点击暂停后也不能恢复,详见 plotly 包的 Github 问题贴。然而,Python 版的 plotly 模块制作此动画,一切正常,代码也紧凑很多。
添加如下 CSS 可以去掉图形右上角烦人的工具条。
.modebar { display: none !important; }
运行环境
本文是在 RStudio IDE 内用 R Markdown 编辑的,用 blogdown 构建网站,Hugo 渲染 knitr 之后的 Markdown 文件,得益于 blogdown 对 R Markdown 格式的支持,图、表和参考文献的交叉引用非常方便,省了不少文字编辑功夫。文中使用了多个 R 包,为方便复现本文内容,下面列出详细的环境信息:
xfun::session_info(packages = c(
"knitr", "rmarkdown", "blogdown", "reticulate"
), dependencies = FALSE)
# R version 4.2.0 (2022-04-22)
# Platform: x86_64-apple-darwin17.0 (64-bit)
# Running under: macOS Big Sur/Monterey 10.16
#
# Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8
#
# Package version:
# blogdown_1.10 knitr_1.39 reticulate_1.25 rmarkdown_2.14
#
# Pandoc version: 2.18
#
# Hugo version: 0.98.0
本文用 virtualenv 创建虚拟 Python 环境,安装 plotly 模块 5.5.0 版本,还安装 kaleido (导出图形)和 statsmodels (线性回归)模块。
## 安装 virtualenv
brew install virtualenv
## 准备虚拟环境存放地址
sudo mkdir -p /opt/.virtualenvs/r-tensorflow
sudo chown -R $(whoami):staff /opt/.virtualenvs/r-tensorflow
## 方便后续复用
export RETICULATE_PYTHON_ENV=/opt/.virtualenvs/r-tensorflow
## 创建虚拟环境
virtualenv -p /usr/local/bin/python3 $RETICULATE_PYTHON_ENV
## 激活虚拟环境
source $RETICULATE_PYTHON_ENV/bin/activate
最后,在文件 ~/.Rprofile
里设置环境变量 RETICULATE_PYTHON
和 RETICULATE_PYTHON_ENV
,这样, reticulate 包就能发现和使用它了。
Sys.setenv(RETICULATE_PYTHON="/usr/local/bin/python3")
Sys.setenv(RETICULATE_PYTHON_ENV="/opt/.virtualenvs/r-tensorflow")
本文 Python 运行环境如下:
reticulate::py_config()
# python: /opt/.virtualenvs/r-tensorflow/bin/python
# libpython: /usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/config-3.9-darwin/libpython3.9.dylib
# pythonhome: /opt/.virtualenvs/r-tensorflow:/opt/.virtualenvs/r-tensorflow
# virtualenv: /opt/.virtualenvs/r-tensorflow/bin/activate_this.py
# version: 3.9.13 (main, May 19 2022, 14:10:54) [Clang 13.1.6 (clang-1316.0.21.2)]
# numpy: /opt/.virtualenvs/r-tensorflow/lib/python3.9/site-packages/numpy
# numpy_version: 1.22.0
#
# NOTE: Python version was forced by RETICULATE_PYTHON