NumPy 快速入门

1 基础
2 数组形状操作
3 拷贝和视图
4 高级索引和索引技巧
- 4.1 用索引数组来索引

本文继《10分钟入门pandas》之后，又一篇快速入门 Python 模块的文章，来自 NumPy 的官网NumPy quickstart，也是要同步添加 R 代码实现。

介绍 NumPy 中的数组操作，一维数组、二维数组和N维数组的差异。
使用常见函数操作数组（一些线性代数运算），不使用 for 循环，更 Python 范。
理解 NumPy 中 N 维数组的两个属性：轴 axis 和形状 shape。

首先导入一个 Python 模块：

import numpy as np

R 内置一套数组操作的工具，abind 包提供额外的一些数组操作函数。

library(abind) # abind / asub

1 基础

NumPy 中主要的数据对象是多维数组，且数组中的元素类型都是相同的。数组的维数称之为轴 axes 。

比如，三维空间中的一个点的坐标 [1, 2, 1] 可以用数组来表示，它有一个轴，轴上有三个元素，它的长度为 3。下面这个数组有两个轴，第一个轴长度为 2，第二个轴长度为 3 。

[[1., 0., 0.],
 [0., 1., 2.]]

NumPy 中的数组用 ndarray 或 array 来表示。注意，numpy.array 和标准 Python 库中的类 array.array 是不同的，后者仅能处理一维数组的情况，功能更少。一个 ndarray 对象有如下一些重要的属性。

ndarray.ndim 数组的维数（轴的数目）。
ndarray.shape 数组的大小。用一个元组来表示数组中各个轴/维的大小。对一个 n 行 m 列的矩阵来说，它的形状 shape 是 (n,m) 。这个形状元组的长度是轴的数目，维数。
ndarray.size 数组中所有元素的总数。它是形状元组中元素的连乘积。
ndarray.dtype 数组中元素的数据类型。数组中元素的类型可以用标准 Python 类型来指定，NumPy 也提供它自己的类型，如 numpy.int32, numpy.int16 和 numpy.float64。
ndarray.itemsize 数组中每个元素的存储大小（以字节 bytes 计）。如，数组中 float64 类型的元素，它的 itemsize 是 8 (=64/8) 个字节，而 complex32 类型有 4 (=32/8) 个字节。ndarray.itemsize 等价于 ndarray.dtype.itemsize 。
ndarray.data 数组中的实际元素。通常，我们不需要使用这个属性，而是使用索引来连接数组中的元素。

1.1 一个例子

import numpy as np
a = np.arange(15).reshape(3, 5)
a

## array([[ 0,  1,  2,  3,  4],
##        [ 5,  6,  7,  8,  9],
##        [10, 11, 12, 13, 14]])

a.shape

## (3, 5)

a.ndim

## 2

a.dtype.name

## 'int64'

a.itemsize

## 8

a.size

## 15

type(a)

## <class 'numpy.ndarray'>

b = np.array([6, 7, 8])
b

## array([6, 7, 8])

type(b)

## <class 'numpy.ndarray'>

在 R 语言中，二维数组和矩阵是等价的。

array(data = 0:14, dim = c(3, 5)) # 默认按列填充

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    3    6    9   12
## [2,]    1    4    7   10   13
## [3,]    2    5    8   11   14

a = matrix(data = 0L:14L, nrow = 3, ncol = 5, byrow = T)
a

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    1    2    3    4
## [2,]    5    6    7    8    9
## [3,]   10   11   12   13   14

dim(a)

## [1] 3 5

length(dim(a))

## [1] 2

typeof(a)

## [1] "integer"

length(a)

## [1] 15

class(a)

## [1] "matrix" "array"

mode(a)

## [1] "numeric"

storage.mode(a)

## [1] "integer"

object.size(a)

## 280 bytes

1.2 创建数组

创建不同元素类型的数组，比如整型数组、浮点型数组。

import numpy as np
a = np.array([2, 3, 4])
a

## array([2, 3, 4])

a.dtype

## dtype('int64')

b = np.array([1.2, 3.5, 5.1])
b.dtype

## dtype('float64')

a = array(data = c(2L, 3L, 4L), dim = c(1, 3))
a

##      [,1] [,2] [,3]
## [1,]    2    3    4

typeof(a)

## [1] "integer"

b = array(data = c(1.2, 3.5, 5.1), dim = c(1, 3))
typeof(b)

## [1] "double"

创建一个二维数组，数组的元素类型可以在创建数组时指定。

b = np.array([(1.5, 2, 3), (4, 5, 6)])
b

## array([[1.5, 2. , 3. ],
##        [4. , 5. , 6. ]])

c = np.array([(1, 2), (3, 4)], dtype=complex)
c

## array([[1.+0.j, 2.+0.j],
##        [3.+0.j, 4.+0.j]])

当知道数组的大小，但是数组的内容暂定，需要初始化数组时，有三个函数可用，分别是 zeros 、ones 、empty 。

NumPy 的 arange 函数可以创建一个数值列（一维数组），返回类型也是数组，类似 Python 内置的函数 range 。但是，当传递浮点数参数时，由于有限的浮点数精度，arange 函数返回的元素个数难以预测。此时，最好使用函数 linspace ，它可以指定元素的个数（一维数组的长度）。

# 初始化二维、多维数组
np.zeros((3, 4))

## array([[0., 0., 0., 0.],
##        [0., 0., 0., 0.],
##        [0., 0., 0., 0.]])

np.ones((2, 3, 4), dtype=np.int16)

## array([[[1, 1, 1, 1],
##         [1, 1, 1, 1],
##         [1, 1, 1, 1]],
## 
##        [[1, 1, 1, 1],
##         [1, 1, 1, 1],
##         [1, 1, 1, 1]]], dtype=int16)

np.empty((2, 3)) # 数组元素是随机的，不推荐

## array([[1.5, 2. , 3. ],
##        [4. , 5. , 6. ]])

# 初始化一维数组 指定首尾和间距
np.arange(10, 30, 5)

## array([10, 15, 20, 25])

range(10, 30, 5)

## range(10, 30, 5)

np.arange(0, 2, 0.3)  # it accepts float arguments

## array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])

# 初始化一维数组 指定首尾和长度
from numpy import pi
np.linspace(0, 2, 9)                   # 9 numbers from 0 to 2

## array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

x = np.linspace(0, 2 * pi, 100)        # useful to evaluate function at lots of points
f = np.sin(x)

array(data = 0, dim = c(3, 4))

##      [,1] [,2] [,3] [,4]
## [1,]    0    0    0    0
## [2,]    0    0    0    0
## [3,]    0    0    0    0

array(data = 1, dim = c(3, 4, 2))

## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    1    1    1
## [2,]    1    1    1    1
## [3,]    1    1    1    1
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    1    1    1
## [2,]    1    1    1    1
## [3,]    1    1    1    1

# 参数 by 指定间距
seq(from = 10, to = 30, by = 5)

## [1] 10 15 20 25 30

seq(from = 0, to = 2, by = 0.3)

## [1] 0.0 0.3 0.6 0.9 1.2 1.5 1.8

# 参数 length.out 指定长度
seq(from = 0, to = 2, length.out = 9)

## [1] 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00

# pi 是内置的常数
x = seq(from = 0, to = 2*pi, length.out = 100)
f = sin(x)

1.3 打印数组

在Python 环境中，打印（显示） NumPy 创建的数组，一维数组是向量，二维数组是矩阵，三维数组是矩阵列表。

a = np.arange(6)                    # 1d array
print(a)

## [0 1 2 3 4 5]

b = np.arange(12).reshape(4, 3)     # 2d array
print(b)

## [[ 0  1  2]
##  [ 3  4  5]
##  [ 6  7  8]
##  [ 9 10 11]]

c = np.arange(24).reshape(2, 3, 4)  # 3d array
print(c)

## [[[ 0  1  2  3]
##   [ 4  5  6  7]
##   [ 8  9 10 11]]
## 
##  [[12 13 14 15]
##   [16 17 18 19]
##   [20 21 22 23]]]

如果数组规模非常大，NumPy 会自动跳过数组的中间部分，只打印四周角落。

print(np.arange(10000))

## [   0    1    2 ... 9997 9998 9999]

print(np.arange(10000).reshape(100, 100))

## [[   0    1    2 ...   97   98   99]
##  [ 100  101  102 ...  197  198  199]
##  [ 200  201  202 ...  297  298  299]
##  ...
##  [9700 9701 9702 ... 9797 9798 9799]
##  [9800 9801 9802 ... 9897 9898 9899]
##  [9900 9901 9902 ... 9997 9998 9999]]

1.4 基本操作

两个数组之间的算术操作是数组中的元素逐对进行的。

a = np.array([20, 30, 40, 50])
b = np.arange(4)
b

## array([0, 1, 2, 3])

c = a - b
c

## array([20, 29, 38, 47])

b**2

## array([0, 1, 4, 9])

10 * np.sin(a)

## array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])

a < 35

## array([ True,  True, False, False])

a = array(data = c(20, 30, 40, 50), dim = c(1, 4))
b = array(data = 0:3, dim = c(1, 4))
b

##      [,1] [,2] [,3] [,4]
## [1,]    0    1    2    3

c = a -b 
c

##      [,1] [,2] [,3] [,4]
## [1,]   20   29   38   47

b**2

##      [,1] [,2] [,3] [,4]
## [1,]    0    1    4    9

10 * sin(a)

##       [,1]  [,2]  [,3]   [,4]
## [1,] 9.129 -9.88 7.451 -2.624

a < 35

##      [,1] [,2]  [,3]  [,4]
## [1,] TRUE TRUE FALSE FALSE

二维数组即矩阵，乘积算子 * 是对数组中的元素逐对进行的。而真正的矩阵乘法是用 @ 或者函数 dot 。

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])
A * B     # elementwise product

## array([[2, 0],
##        [0, 4]])

A @ B     # matrix product

## array([[5, 4],
##        [3, 4]])

A.dot(B)  # another matrix product

## array([[5, 4],
##        [3, 4]])

A <- array(data = c(1, 0, 1, 1), dim = c(2, 2))
B <- array(data = c(2, 3, 0, 4), dim = c(2, 2))
A * B

##      [,1] [,2]
## [1,]    2    0
## [2,]    0    4

A %*% B

##      [,1] [,2]
## [1,]    5    4
## [2,]    3    4

crossprod(t(A), B)

##      [,1] [,2]
## [1,]    5    4
## [2,]    3    4

算子 *= 和 += 修改现有的数组，而不是创建一个新的数组。

rg = np.random.default_rng(1)  # create instance of default random number generator
a = np.ones((2, 3), dtype=int)
b = rg.random((2, 3))
a *= 3
a

## array([[3, 3, 3],
##        [3, 3, 3]])

b += a
b

## array([[3.51182162, 3.9504637 , 3.14415961],
##        [3.94864945, 3.31183145, 3.42332645]])

# a += b  # b is not automatically converted to integer type

当不同类型的数组进行算术操作，会涉及类型转化，通常是向上类型转化，如整型遇到浮点型，整型转化为浮点型。实数遇到复数，先将实数转化为复数。

a = np.ones(3, dtype=np.int32)
b = np.linspace(0, pi, 3)
b.dtype.name

## 'float64'

c = a + b
c

## array([1.        , 2.57079633, 4.14159265])

c.dtype.name

## 'float64'

d = np.exp(c * 1j)
d

## array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
##        -0.54030231-0.84147098j])

d.dtype.name

## 'complex128'

a = array(data = 1L, dim = c(1, 3))
b = seq(from = 0, to = pi, length.out = 3)
typeof(a)

## [1] "integer"

typeof(b)

## [1] "double"

c = a + b
c

##      [,1]  [,2]  [,3]
## [1,]    1 2.571 4.142

typeof(c)

## [1] "double"

d = exp(c * 1i)
d

##                [,1]            [,2]            [,3]
## [1,] 0.5403+0.8415i -0.8415+0.5403i -0.5403-0.8415i

typeof(d)

## [1] "complex"

很多一元算子，如对数组元素求和、求最大值、求最小值，ndarray 类型提供内置的计算方法。

a = rg.random((2, 3))
a

## array([[0.82770259, 0.40919914, 0.54959369],
##        [0.02755911, 0.75351311, 0.53814331]])

a.sum()

## np.float64(3.1057109529998157)

a.min()

## np.float64(0.027559113243068367)

a.max()

## np.float64(0.8277025938204418)

a = array(data = rnorm(6, 0, 1), dim = c(2, 3))
a

##         [,1]   [,2]   [,3]
## [1,]  0.3887 0.4374 -1.732
## [2,] -0.5898 0.5872 -1.054

sum(a)

## [1] -1.962

min(a)

## [1] -1.732

max(a)

## [1] 0.5872

还有一些算子是根据数组的轴（行、列）计算的。

b = np.arange(12).reshape(3, 4)
b

## array([[ 0,  1,  2,  3],
##        [ 4,  5,  6,  7],
##        [ 8,  9, 10, 11]])

b.sum(axis=0)     # sum of each column

## array([12, 15, 18, 21])

b.min(axis=1)     # min of each row

## array([0, 4, 8])

b.cumsum(axis=1)  # cumulative sum along each row

## array([[ 0,  1,  3,  6],
##        [ 4,  9, 15, 22],
##        [ 8, 17, 27, 38]])

b = matrix(data = 0:11, nrow = 3, ncol = 4, byrow = T)
b

##      [,1] [,2] [,3] [,4]
## [1,]    0    1    2    3
## [2,]    4    5    6    7
## [3,]    8    9   10   11

colSums(b)

## [1] 12 15 18 21

apply(b, 2, sum)

## [1] 12 15 18 21

apply(b, 1, min)

## [1] 0 4 8

t(apply(b, 1, cumsum))

##      [,1] [,2] [,3] [,4]
## [1,]    0    1    3    6
## [2,]    4    9   15   22
## [3,]    8   17   27   38

1.5 通用函数

NumPy 提供一些常见的数学函数，如 sin、 cos 和 exp 等，这些函数称之为通用函数 Universal functions 。在 NumPy 中，在数组上应用正弦函数、指数函数、开平方根函数等是对数组的元素逐个运算。

B = np.arange(3)
B

## array([0, 1, 2])

np.exp(B)

## array([1.        , 2.71828183, 7.3890561 ])

np.sqrt(B)

## array([0.        , 1.        , 1.41421356])

C = np.array([2., -1., 4.])
np.add(B, C)

## array([2., 0., 6.])

B = array(data = 0:2, dim = c(1, 3))
B

##      [,1] [,2] [,3]
## [1,]    0    1    2

exp(B)

##      [,1]  [,2]  [,3]
## [1,]    1 2.718 7.389

sqrt(B)

##      [,1] [,2]  [,3]
## [1,]    0    1 1.414

C = array(data = c(2., -1., 4.), dim = c(1, 3))
B + C

##      [,1] [,2] [,3]
## [1,]    2    0    6

1.6 索引、切片和迭代

可以通过索引、切片和迭代的方式访问一维数组中的元素。

a = np.arange(10)**3
a

## array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])

a[2]

## np.int64(8)

a[2:5]

## array([ 8, 27, 64])

# equivalent to a[0:6:2] = 1000;
# from start to position 6, exclusive, set every 2nd element to 1000
a[:6:2] = 1000
a

## array([1000,    1, 1000,   27, 1000,  125,  216,  343,  512,  729])

a[::-1]  # reversed a

## array([ 729,  512,  343,  216,  125, 1000,   27, 1000,    1, 1000])

for i in a:
    print(i**(1 / 3.))

## 9.999999999999998
## 1.0
## 9.999999999999998
## 3.0
## 9.999999999999998
## 4.999999999999999
## 5.999999999999999
## 6.999999999999999
## 7.999999999999999
## 8.999999999999998

a = array(data = 0:9, dim = c(1, 10)) ** 3
a

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    0    1    8   27   64  125  216  343  512   729

a[3]

## [1] 8

a[3:5]

## [1]  8 27 64

a[,c(1, 3, 5)] = 1000
a

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1000    1 1000   27 1000  125  216  343  512   729

rev(a)

##  [1]  729  512  343  216  125 1000   27 1000    1 1000

for (i in a)
    print(i**(1 / 3.))

## [1] 10
## [1] 1
## [1] 10
## [1] 3
## [1] 10
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9

对于多维数组，每个轴可以有一个索引，索引以逗号分隔的元组表示。当提供的索引比轴的数目更少，缺失的索引将看作切片。

def f(x, y):
    return 10 * x + y
b = np.fromfunction(f, (5, 4), dtype=int)
b

## array([[ 0,  1,  2,  3],
##        [10, 11, 12, 13],
##        [20, 21, 22, 23],
##        [30, 31, 32, 33],
##        [40, 41, 42, 43]])

b[2, 3]

## np.int64(23)

b[0:5, 1]  # each row in the second column of b

## array([ 1, 11, 21, 31, 41])

b[:, 1]    # equivalent to the previous example

## array([ 1, 11, 21, 31, 41])

b[1:3, :]  # each column in the second and third row of b

## array([[10, 11, 12, 13],
##        [20, 21, 22, 23]])

b[-1]   # the last row. Equivalent to b[-1, :]

## array([40, 41, 42, 43])

a = array(data = 1, dim = c(5, 4))
b = array(data = rep(0:3, each = 5), dim = c(5, 4))
b = 10 * 0:4 + b
b

##      [,1] [,2] [,3] [,4]
## [1,]    0    1    2    3
## [2,]   10   11   12   13
## [3,]   20   21   22   23
## [4,]   30   31   32   33
## [5,]   40   41   42   43

b[3, 4]

## [1] 23

b[1:5, 2]

## [1]  1 11 21 31 41

b[, 2]

## [1]  1 11 21 31 41

b[2:3, ]

##      [,1] [,2] [,3] [,4]
## [1,]   10   11   12   13
## [2,]   20   21   22   23

b[-1, ] # 它表示去掉第一行，这与 Python 差别很大

##      [,1] [,2] [,3] [,4]
## [1,]   10   11   12   13
## [2,]   20   21   22   23
## [3,]   30   31   32   33
## [4,]   40   41   42   43

访问多维数组：切片。下面是一个 2x2x3 的三维数组，访问数组中的切片时，以省略号 ... 表示未明写的维度。

c = np.array([[[  0,  1,  2],  # a 3D array (two stacked 2D arrays)
               [ 10, 12, 13]],
              [[100, 101, 102],
               [110, 112, 113]]])
c

## array([[[  0,   1,   2],
##         [ 10,  12,  13]],
## 
##        [[100, 101, 102],
##         [110, 112, 113]]])

c.shape

## (2, 2, 3)

c[1, ...]  # same as c[1, :, :] or c[1]

## array([[100, 101, 102],
##        [110, 112, 113]])

c[..., 2]  # same as c[:, :, 2]

## array([[  2,  13],
##        [102, 113]])

注意：缺省维度的表示与 R 语言是不一样的。

c = array(data = c(0 , 10, 1, 12, 2, 13,
                   100, 110, 101, 112,
                   102, 113), dim = c(2, 3, 2))
c

## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    0    1    2
## [2,]   10   12   13
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]  100  101  102
## [2,]  110  112  113

dim(c)

## [1] 2 3 2

c[, ,2]

##      [,1] [,2] [,3]
## [1,]  100  101  102
## [2,]  110  112  113

c[, 3,]

##      [,1] [,2]
## [1,]    2  102
## [2,]   13  113

多维数组的迭代操作是基于第一个轴（行）。

for row in c:
    print(row)

## [[ 0  1  2]
##  [10 12 13]]
## [[100 101 102]
##  [110 112 113]]

如果是遍历数组中的每一个元素，则需要用 flat 将数组转成向量。

for element in c.flat:
    print(element)

## 0
## 1
## 2
## 10
## 12
## 13
## 100
## 101
## 102
## 110
## 112
## 113

R 语言中迭代多维数组与 Python 不同。

for (row in c) {
  print(row)
}

## [1] 0
## [1] 10
## [1] 1
## [1] 12
## [1] 2
## [1] 13
## [1] 100
## [1] 110
## [1] 101
## [1] 112
## [1] 102
## [1] 113

2 数组形状操作

2.1 改变数组的形状

将 3x4 的数组转为 6x2 的数组。数组中的元素是按行填充/排列的，操作数组时，也是按行进行的。

a = np.floor(10 * rg.random((3, 4)))
a

## array([[3., 7., 3., 4.],
##        [1., 4., 2., 2.],
##        [7., 2., 4., 9.]])

a.shape

## (3, 4)

a.ravel()  # returns the array, flattened

## array([3., 7., 3., 4., 1., 4., 2., 2., 7., 2., 4., 9.])

a.reshape(6, 2)  # returns the array with a modified shape

## array([[3., 7.],
##        [3., 4.],
##        [1., 4.],
##        [2., 2.],
##        [7., 2.],
##        [4., 9.]])

a.T  # returns the array, transposed

## array([[3., 1., 7.],
##        [7., 4., 2.],
##        [3., 2., 4.],
##        [4., 2., 9.]])

a.T.shape

## (4, 3)

a.shape

## (3, 4)

R 语言中，对数组的操作默认都是按列进行的。

a = floor(10 * array(data = runif(12), dim = c(3, 4)))
a

##      [,1] [,2] [,3] [,4]
## [1,]    8    7    1    6
## [2,]    8    9    3    1
## [3,]    8    1    2    7

dim(a)

## [1] 3 4

as.vector(a)

##  [1] 8 8 8 7 9 1 1 3 2 6 1 7

array(as.vector(a), dim = c(6, 2))

##      [,1] [,2]
## [1,]    8    1
## [2,]    8    3
## [3,]    8    2
## [4,]    7    6
## [5,]    9    1
## [6,]    1    7

aperm(a, perm = c(2, 1))

##      [,1] [,2] [,3]
## [1,]    8    8    8
## [2,]    7    9    1
## [3,]    1    3    2
## [4,]    6    1    7

dim(aperm(a))

## [1] 4 3

dim(a)

## [1] 3 4

函数 aperm() 的参数 perm 接受一个整型向量，向量的长度是维数，向量的元素是 1:N （N 代表维数）之间的数。常见的三维数组中， 1 代表行，2 代表列，3 表示剩下的那个轴（维度）。

NumPy 中的 resize 函数可以直接修改原来的数组。如果函数 reshape 其中一个参数为 -1 ，它表示该位置根据计算得出。

## array([[3., 7., 3., 4.],
##        [1., 4., 2., 2.],
##        [7., 2., 4., 9.]])

a.resize((2, 6))
a

## array([[3., 7., 3., 4., 1., 4.],
##        [2., 2., 7., 2., 4., 9.]])

a.reshape(3, -1)

## array([[3., 7., 3., 4.],
##        [1., 4., 2., 2.],
##        [7., 2., 4., 9.]])

R 语言中无此强行转化数组形状的操作。我暂时也没有想到应用场景。

2.2 堆叠合并数组

垂直 vstack 还是水平 hstack 方向堆叠数组。

a = np.floor(10 * rg.random((2, 2)))
a

## array([[9., 7.],
##        [5., 2.]])

b = np.floor(10 * rg.random((2, 2)))
b

## array([[1., 9.],
##        [5., 1.]])

np.vstack((a, b))

## array([[9., 7.],
##        [5., 2.],
##        [1., 9.],
##        [5., 1.]])

np.hstack((a, b))

## array([[9., 7., 1., 9.],
##        [5., 2., 5., 1.]])

a = floor(10 * array(data = runif(4), dim = c(2, 2)))
a

##      [,1] [,2]
## [1,]    1    3
## [2,]    1    3

b = floor(10 * array(data = runif(4), dim = c(2, 2)))
b

##      [,1] [,2]
## [1,]    9    3
## [2,]    6    5

rbind(a, b)

##      [,1] [,2]
## [1,]    1    3
## [2,]    1    3
## [3,]    9    3
## [4,]    6    5

cbind(a, b)

##      [,1] [,2] [,3] [,4]
## [1,]    1    3    9    3
## [2,]    1    3    6    5

library(abind)
abind(a, b, along = 1) # rbind

##      [,1] [,2]
## [1,]    1    3
## [2,]    1    3
## [3,]    9    3
## [4,]    6    5

abind(a, b, along = 2) # cbind

##      [,1] [,2] [,3] [,4]
## [1,]    1    3    9    3
## [2,]    1    3    6    5

函数 column_stack 可以将两个一维数组堆叠形成一个二维数组。对二维数组的堆叠操作来说，函数 column_stack 等价于函数 hstack 。

from numpy import newaxis
np.column_stack((a, b))  # with 2D arrays

## array([[9., 7., 1., 9.],
##        [5., 2., 5., 1.]])

a = np.array([4., 2.])
b = np.array([3., 8.])
np.column_stack((a, b))  # returns a 2D array

## array([[4., 3.],
##        [2., 8.]])

np.hstack((a, b))        # the result is different

## array([4., 2., 3., 8.])

a[:, newaxis]  # view `a` as a 2D column vector

## array([[4.],
##        [2.]])

np.column_stack((a[:, newaxis], b[:, newaxis]))

## array([[4., 3.],
##        [2., 8.]])

np.hstack((a[:, newaxis], b[:, newaxis]))  # the result is the same

## array([[4., 3.],
##        [2., 8.]])

2.3 拆分数组

函数 hsplit 和 vsplit 可以分别将数组按照水平方向、垂直方向拆分，还可以指定分割点。

a = np.floor(10 * rg.random((2, 12)))
a

## array([[6., 7., 6., 9., 0., 5., 4., 0., 6., 8., 5., 2.],
##        [8., 5., 5., 7., 1., 8., 6., 7., 1., 8., 1., 0.]])

# Split `a` into 3
np.hsplit(a, 3)

## [array([[6., 7., 6., 9.],
##        [8., 5., 5., 7.]]), array([[0., 5., 4., 0.],
##        [1., 8., 6., 7.]]), array([[6., 8., 5., 2.],
##        [1., 8., 1., 0.]])]

# Split `a` after the third and the fourth column
np.hsplit(a, (3, 4))

## [array([[6., 7., 6.],
##        [8., 5., 5.]]), array([[9.],
##        [7.]]), array([[0., 5., 4., 0., 6., 8., 5., 2.],
##        [1., 8., 6., 7., 1., 8., 1., 0.]])]

a = floor(10 * array(data = runif(24), dim = c(2, 12)))
a

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,]    6    9    1    2    0    1    4    4    4     5     5     3
## [2,]    2    0    6    1    6    1    1    6    6     8     5     6

asplit(a, MARGIN = 1)

## [[1]]
##  [1] 6 9 1 2 0 1 4 4 4 5 5 3
## 
## [[2]]
##  [1] 2 0 6 1 6 1 1 6 6 8 5 6

asplit() 函数只能按某一个边际维度拆分数据，并不支持 3 等分，按指定位置切分这样的操作。

3 拷贝和视图

3.1 无拷贝

a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])
b = a            # no new object is created
b is a           # a and b are two names for the same ndarray object

## True

def f(x):
    print(id(x))
id(a)  # id is a unique identifier of an object

## 4508540272

f(a)

## 4508540272

注意： R 语言中都是深拷贝行为，先复制一份原数据，再修改数据，最后，保存在新的数据对象中。

3.2 视图和浅拷贝

修改原来数组中的元素

c = a.view()
c is a

## False

c.base is a            # c is a view of the data owned by a

## True

c.flags.owndata

## False

c = c.reshape((2, 6))  # a's shape doesn't change, reassigned c is still a view of a
a.shape

## (3, 4)

c[0, 4] = 1234         # a's data changes
a

## array([[   0,    1,    2,    3],
##        [1234,    5,    6,    7],
##        [   8,    9,   10,   11]])

s = a[:, 1:3]
s[:] = 10  # s[:] is a view of s. Note the difference between s = 10 and s[:] = 10
a

## array([[   0,   10,   10,    3],
##        [1234,   10,   10,    7],
##        [   8,   10,   10,   11]])

对数组 c 的修改，同步发生到 a 身上。

3.3 深拷贝

创建一个新的数组，在新的数组中操作

d = a.copy()  # a new array object with new data is created
d is a

## False

d.base is a  # d doesn't share anything with a

## False

d[0, 0] = 9999
d

## array([[9999,   10,   10,    3],
##        [1234,   10,   10,    7],
##        [   8,   10,   10,   11]])

## array([[   0,   10,   10,    3],
##        [1234,   10,   10,    7],
##        [   8,   10,   10,   11]])

4 高级索引和索引技巧

4.1 用索引数组来索引

一个数组的元素代表另一个数组元素的索引位置。

a = np.arange(12)**2  # the first 12 square numbers
a

## array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121])

i = np.array([1, 1, 3, 8, 5])  # an array of indices
a[i]  # the elements of `a` at the positions `i`

## array([ 1,  1,  9, 64, 25])

j = np.array([[3, 4], [9, 7]])  # a bidimensional array of indices
a[j]  # the same shape as `j`

## array([[ 9, 16],
##        [81, 49]])

a = np.arange(12).reshape(3, 4)
a

## array([[ 0,  1,  2,  3],
##        [ 4,  5,  6,  7],
##        [ 8,  9, 10, 11]])

i = np.array([[0, 1],  # indices for the first dim of `a`
              [1, 2]])
j = np.array([[2, 1],  # indices for the second dim
              [3, 3]])
a[i, j]  # i and j must have equal shape

## array([[ 2,  5],
##        [ 7, 11]])

a[i, 2]

## array([[ 2,  6],
##        [ 6, 10]])

a[:, j]

## array([[[ 2,  1],
##         [ 3,  3]],
## 
##        [[ 6,  5],
##         [ 7,  7]],
## 
##        [[10,  9],
##         [11, 11]]])

# 一维数组
a = array(data = (0:11)^2, dim = c(1, 12)) 
a

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,]    0    1    4    9   16   25   36   49   64    81   100   121

i = array(data = c(2, 2, 4, 9, 6), dim = c(1, 5))
a[i]

## [1]  1  1  9 64 25

j = array(data = c(4, 10, 5, 8), dim = c(2, 2))
a[as.vector(j)]

## [1]  9 81 16 49

library(abind)
# 按列索引
asub(x = a, idx = list(c(4, 10, 5, 8)), dims = 2)

## [1]  9 81 16 49

R 语言的索引规则与 Python 很不一样，在 R 语言中，按照边际 MARGIN 来切片或索引。

黄湘云