1 Preparation

Create the same 2×2 matrix as a TensorFlow tensor and as a NumPy array.

import tensorflow as tf
A = tf.constant([[1.0, -1.0], [1.0, 1.0]])
A

## <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
## array([[ 1., -1.],
##        [ 1.,  1.]], dtype=float32)>

import numpy as np
a = np.array([[1., -1.], [1., 1.]])
a

## array([[ 1., -1.],
##        [ 1.,  1.]])

2 Tensor operations

2.1 Basic operations

Means and matrix multiplication

# Mean of all entries of the matrix
tf.reduce_mean(A)

## <tf.Tensor: shape=(), dtype=float32, numpy=0.5>

# Column means: reduce along axis 0 (down the rows)
tf.reduce_mean(A, axis=0)

## <tf.Tensor: shape=(2,), dtype=float32, numpy=array([1., 0.], dtype=float32)>

# Row means: reduce along axis 1 (across the columns)
tf.reduce_mean(A, axis=1)

## <tf.Tensor: shape=(2,), dtype=float32, numpy=array([0., 1.], dtype=float32)>
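The `axis` argument names the dimension that gets collapsed. `axis=0` averages down the rows and leaves one value per column. `axis=1` averages across the columns and leaves one value per row. NumPy's `mean` uses the same convention, so the results above can be cross-checked without TensorFlow. The `keepdims=True` option, which `tf.reduce_mean` also accepts, keeps the reduced axis with length 1. This is a minimal NumPy sketch; the variable names are only illustrative.

```python
import numpy as np

a = np.array([[1., -1.], [1., 1.]])

overall = a.mean()                    # mean of all four entries: 0.5
col_means = a.mean(axis=0)            # collapse rows -> one value per column
row_means = a.mean(axis=1)            # collapse columns -> one value per row
row_means_2d = a.mean(axis=1, keepdims=True)  # same values, shape (2, 1)

print(overall, col_means, row_means, row_means_2d.shape)
```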

w = tf.Variable([[1.], [2.]])  # 2x1
x = tf.constant([[3., 4.]])    # 1x2
tf.matmul(w, x)  # matrix product: (2x1)(1x2) -> 2x2

## <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
## array([[3., 4.],
##        [6., 8.]], dtype=float32)>
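Multiplying a 2×1 column by a 1×2 row, as `tf.matmul(w, x)` does above, is an outer product: entry (i, j) equals w[i]·x[j]. Reversing the order gives a 1×1 inner product instead. The sketch below cross-checks this in NumPy, using NumPy arrays in place of the TensorFlow objects. It also shows that elementwise `*` with broadcasting produces the same 2×2 result for these particular shapes.

```python
import numpy as np

w = np.array([[1.], [2.]])  # 2x1 column vector
x = np.array([[3., 4.]])    # 1x2 row vector

outer = w @ x         # (2x1)(1x2) -> 2x2, entry (i, j) = w[i] * x[j]
inner = x @ w         # (1x2)(2x1) -> 1x1, 3*1 + 4*2
broadcast = w * x     # elementwise product broadcast to (2, 2)

print(outer)
print(inner)
```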

2.2 Linear algebra

2.2.1 Singular value decomposition (SVD)

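Note how differently NumPy and TensorFlow return the factors. np.linalg.svd returns (U, singular values, V^H). tf.linalg.svd returns (singular values, U, V), and its V is not transposed. The signs of the singular vectors can also differ between the two libraries.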

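u, s, vh = np.linalg.svd(a)  # NumPy returns U, the singular values, and V^H
u

## array([[-0.70710678, -0.70710678],
##        [-0.70710678,  0.70710678]])

s

## array([1.41421356, 1.41421356])

vh

## array([[-1., -0.],
##        [ 0.,  1.]])

np.dot(u * s, vh)  # u * s scales column j of u by s[j], i.e. u @ np.diag(s)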

## array([[ 1., -1.],
##        [ 1.,  1.]])

S, U, V = tf.linalg.svd(A)  # singular values first, then U and V (V is not transposed)
S

## <tf.Tensor: shape=(2,), dtype=float32, numpy=array([1.4142135, 1.4142135], dtype=float32)>

U

## <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
## array([[ 0.70710677, -0.70710677],
##        [ 0.70710677,  0.70710677]], dtype=float32)>

V

## <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
## array([[1., 0.],
##        [0., 1.]], dtype=float32)>

tf.matmul(U, tf.matmul(tf.linalg.diag(S), V, adjoint_b=True))

## <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
## array([[ 0.99999994, -0.99999994],
##        [ 0.99999994,  0.99999994]], dtype=float32)>
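The two conventions can be converted in NumPy alone. NumPy's `vh` is V^H, so TensorFlow's `v` corresponds to `vh.conj().T`. The sketch also shows why the signs in U differ between the two libraries. Flipping the sign of a column of U together with the matching row of V^H gives another valid SVD. In the outputs above, exactly such a flip of the first singular pair separates the NumPy and TensorFlow results. In general, with repeated singular values (both are sqrt(2) here), the singular vectors are not unique beyond signs either. The variable names below are only illustrative.

```python
import numpy as np

a = np.array([[1., -1.], [1., 1.]])

# NumPy order: U, singular values, V^H (already conjugate-transposed)
u, s, vh = np.linalg.svd(a)

# TensorFlow-style v (not transposed) is the conjugate transpose of vh
v = vh.conj().T

# Both conventions rebuild the same matrix
a_from_numpy = u @ np.diag(s) @ vh
a_from_tf_style = u @ np.diag(s) @ v.conj().T

# Singular vectors are defined only up to sign: flip column 0 of u
# together with row 0 of vh and the product is unchanged
u_flipped = u.copy()
u_flipped[:, 0] *= -1
vh_flipped = vh.copy()
vh_flipped[0, :] *= -1
a_from_flipped = u_flipped @ np.diag(s) @ vh_flipped

print(np.allclose(a_from_numpy, a), np.allclose(a_from_flipped, a))
```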

2.2.2 QR decomposition

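QR decomposition factors a matrix as a = q @ r, where q has orthonormal columns and r is upper triangular. TensorFlow's tf.linalg.qr also accepts a batch of matrices (a tensor of shape [..., M, N]), in which case q and r are batches as well.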
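The definition can be made concrete with modified Gram-Schmidt. The sketch below computes a reduced QR: it orthonormalizes the columns of `a` one at a time and records the projection coefficients in `r`. It is for illustration only. NumPy calls LAPACK, which uses Householder reflections rather than Gram-Schmidt. Gram-Schmidt also always produces a positive diagonal in `r`, so its signs can differ from the library output. `gram_schmidt_qr` is a hypothetical helper name, and it assumes `a` has full column rank.

```python
import numpy as np

def gram_schmidt_qr(a):
    """Reduced QR by modified Gram-Schmidt; a must have full column rank."""
    a = np.asarray(a, dtype=float)
    m, n = a.shape
    q = np.zeros((m, n))
    r = np.zeros((n, n))
    v = a.copy()
    for j in range(n):
        # Column j is already orthogonal to q[:, :j]; normalize it
        r[j, j] = np.linalg.norm(v[:, j])
        q[:, j] = v[:, j] / r[j, j]
        # Remove the q_j component from every later column
        for k in range(j + 1, n):
            r[j, k] = q[:, j] @ v[:, k]
            v[:, k] -= r[j, k] * q[:, j]
    return q, r

a = np.array([[1., -1.], [1., 1.]])
q, r = gram_schmidt_qr(a)
print(q)
print(r)
```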

q, r = np.linalg.qr(a)
q

## array([[-0.70710678, -0.70710678],
##        [-0.70710678,  0.70710678]])

r

## array([[-1.41421356e+00, -3.31822250e-16],
##        [ 0.00000000e+00,  1.41421356e+00]])

q @ r

## array([[ 1., -1.],
##        [ 1.,  1.]])

q @ q.T

## array([[ 1.00000000e+00, -2.22044605e-16],
##        [-2.22044605e-16,  1.00000000e+00]])

Q, R = tf.linalg.qr(A)
Q

## <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
## array([[-0.7071068 , -0.70710677],
##        [-0.70710677,  0.7071068 ]], dtype=float32)>

R

## <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
## array([[-1.4142135e+00,  1.1920929e-07],
##        [ 0.0000000e+00,  1.4142135e+00]], dtype=float32)>

Q @ R  # equivalent to tf.matmul(Q, R)

## <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
## array([[ 1.        , -1.        ],
##        [ 0.99999994,  0.99999994]], dtype=float32)>
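Here NumPy and TensorFlow agree, and both put a negative first entry on the diagonal of R. However, for a full-rank matrix QR is unique only up to the signs of the columns of Q. As far as I know, neither library promises a particular sign convention. A common way to make results comparable across backends is to flip signs so that the diagonal of R is non-negative. This is a minimal NumPy sketch; `normalize_qr` is a hypothetical helper name.

```python
import numpy as np

def normalize_qr(q, r):
    """Flip signs so that diag(r) >= 0 while keeping q @ r unchanged.

    With D = diag(+-1), (Q D)(D R) = Q D^2 R = Q R.
    """
    d = np.sign(np.diag(r))
    d[d == 0] = 1.0                 # leave zero pivots as they are
    return q * d, d[:, None] * r    # scale columns of q and rows of r

a = np.array([[1., -1.], [1., 1.]])
q, r = np.linalg.qr(a)
q_pos, r_pos = normalize_qr(q, r)
print(r_pos)
```

For a full-rank input, the normalized factors are unique, so they should agree across libraries up to rounding.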

3 Numerical optimization

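A 2017 article introducing TensorFlow, which fits the parameters of a Gaussian mixture, is an interesting read: see "Why statisticians should also learn TensorFlow" (为什么统计学家也应该学学 TensorFlow). I keep things simpler here. I take a ready-made objective function, the Rosenbrock function, and reuse SciPy's example of unconstrained nonlinear optimization.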

import numpy as np
from scipy.optimize import minimize

def rosen(x):
    """The Rosenbrock function"""
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)
# Initial point
x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
# Run the optimizer
res = minimize(rosen, x0, method='L-BFGS-B')
# Final result: the minimizer (the global minimum is at x = [1, 1, 1, 1, 1])
print(res.x)

## [0.99999957 0.99999915 0.99999834 0.99999667 0.99999336]
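`minimize` is called above without a `jac` argument, so L-BFGS-B approximates the gradient with finite differences. TensorFlow's `GradientTape`, used below, instead computes the gradient exactly by automatic differentiation. For reference, here is a NumPy sketch of the analytic gradient, checked against central differences. SciPy itself ships an equivalent function, `scipy.optimize.rosen_der`. The names `rosen_grad` and `fd_grad` are my own.

```python
import numpy as np

def rosen(x):
    """Rosenbrock function, same formula as the SciPy example."""
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1 - x[:-1]) ** 2)

def rosen_grad(x):
    """Analytic gradient: term i depends on x[i] and x[i+1]."""
    g = np.zeros_like(x)
    t = x[1:] - x[:-1] ** 2
    # d/dx[i] of 100*(x[i+1] - x[i]**2)**2 + (1 - x[i])**2
    g[:-1] += -400.0 * x[:-1] * t - 2.0 * (1 - x[:-1])
    # d/dx[i+1] of the same term
    g[1:] += 200.0 * t
    return g

def fd_grad(f, x, h=1e-6):
    """Central finite differences, used here only as a check."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
print(rosen_grad(x0))  # approximately [515.4, -285.4, -341.6, 2085.4, -482.0]
print(np.allclose(rosen_grad(x0), fd_grad(rosen, x0), rtol=1e-6))  # True
```

The analytic gradient can be handed to SciPy as `minimize(rosen, x0, method='L-BFGS-B', jac=rosen_grad)`.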

# Rosenbrock function, TensorFlow version
# tf.reduce_sum and tf.square are basic ops, just like tf.reduce_mean
def rosen_tf(x):
    return tf.reduce_sum(
        100.0 * tf.square(x[1:] - tf.square(x[:-1])) + tf.square(1.0 - x[:-1])
    )

# Initial point (the same as in the SciPy code)
x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2], dtype=np.float32)
# x is a trainable variable
x = tf.Variable(x0, trainable=True, dtype=tf.float32)

# Create the optimizer (Adam; adjust the learning rate as needed)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

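# Optimization loop: automatic differentiation, then a gradient step.
# Only 10 iterations here; 1000 iterations would get much closer to the optimum.
for _ in range(10):
    with tf.GradientTape() as tape:
        # Loss: the objective function
        loss = rosen_tf(x)
    # Automatic differentiation
    gradients = tape.gradient(loss, [x])
    # Gradient step
    optimizer.apply_gradients(zip(gradients, [x]))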

## <Variable path=adam/iteration, shape=(), dtype=int64, value=1>
## <Variable path=adam/iteration, shape=(), dtype=int64, value=2>
## <Variable path=adam/iteration, shape=(), dtype=int64, value=3>
## <Variable path=adam/iteration, shape=(), dtype=int64, value=4>
## <Variable path=adam/iteration, shape=(), dtype=int64, value=5>
## <Variable path=adam/iteration, shape=(), dtype=int64, value=6>
## <Variable path=adam/iteration, shape=(), dtype=int64, value=7>
## <Variable path=adam/iteration, shape=(), dtype=int64, value=8>
## <Variable path=adam/iteration, shape=(), dtype=int64, value=9>
## <Variable path=adam/iteration, shape=(), dtype=int64, value=10>

# Current estimate after only 10 steps, still far from the minimizer [1, 1, 1, 1, 1]
x.numpy()

## array([1.201897 , 0.798988 , 0.899658 , 1.8009351, 1.2992563],
##       dtype=float32)

# Objective value at the start of the last step (loss is computed before the final update)
loss.numpy()

## np.float32(553.37646)

loss

## <tf.Tensor: shape=(), dtype=float32, numpy=553.37646484375>

# Gradient of the objective at that same point
gradients

## [<tf.Tensor: shape=(5,), dtype=float32, numpy=
## array([ 329.04385, -220.2965 , -309.52826, 1646.1017 , -397.82605],
##       dtype=float32)>]
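In the `x.numpy()` output, every coordinate has moved by almost exactly 0.1, which is 10 × `learning_rate`. This happens even though the gradient entries range from a few hundred to about two thousand. That is a property of Adam. It divides a running mean of the gradient by the square root of a running mean of the squared gradient. When the gradient's sign is stable, each step is therefore close to `learning_rate` per coordinate.

Below is a minimal NumPy sketch of the textbook Adam update with bias correction. It assumes Keras' documented defaults (`beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-7`). Keras may place epsilon slightly differently internally, so the last digits need not match. The helper names are hypothetical.

```python
import numpy as np

def rosen(x):
    """Rosenbrock function, same formula as above."""
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1 - x[:-1]) ** 2)

def rosen_grad(x):
    """Analytic gradient of the Rosenbrock function."""
    g = np.zeros_like(x)
    t = x[1:] - x[:-1] ** 2
    g[:-1] += -400.0 * x[:-1] * t - 2.0 * (1 - x[:-1])
    g[1:] += 200.0 * t
    return g

def adam(grad_fn, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-7, steps=10):
    """Textbook Adam update with bias correction."""
    x = np.array(x0, dtype=float)  # copy, so x0 is not modified
    m = np.zeros_like(x)           # running mean of gradients
    v = np.zeros_like(x)           # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
x10 = adam(rosen_grad, x0)
print(x10 - x0)   # each entry close to +/-0.1
print(rosen(x0), rosen(x10))
```

This also explains why 10 steps cannot reach the minimum from this starting point. Each coordinate moves only about 0.01 per step, yet x0[3] = 1.9 alone has to travel 0.9.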

TensorFlow Probability provides an L-BFGS solver similar to the one used with SciPy above. The code below is adapted from the API documentation of tfp.optimizer.lbfgs_minimize.

pip install tensorflow_probability tf-keras

import tensorflow_probability as tfp

# Objective value and its gradient
def rosen_loss_and_gradient(x):
    return tfp.math.value_and_gradient(rosen_tf, x)
# L-BFGS optimizer
optim_results = tfp.optimizer.lbfgs_minimize(
    value_and_gradients_function=rosen_loss_and_gradient,
    initial_position=x0,
    num_correction_pairs=10,
    tolerance=1e-8
)

# Number of iterations
optim_results.num_iterations.numpy()

## np.int32(18)

# Whether the run converged (it did)
optim_results.converged.numpy()

## np.True_

# Minimizer
optim_results.position

## <tf.Tensor: shape=(5,), dtype=float32, numpy=array([1., 1., 1., 1., 1.], dtype=float32)>

optim_results.position.numpy()

## array([1., 1., 1., 1., 1.], dtype=float32)

# Objective value at the minimizer
optim_results.objective_value.numpy()

## np.float32(0.0)

# Number of objective (value-and-gradient) evaluations
optim_results.num_objective_evaluations.numpy()

## np.int32(52)
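`num_correction_pairs=10` is the memory size m of L-BFGS. It is the number of most recent (s, y) pairs kept, where s is a change in position and y the matching change in gradient. These pairs define an implicit approximation of the inverse Hessian. Below is a NumPy sketch of the standard two-loop recursion that turns the stored pairs into a search direction. It illustrates the general algorithm and is not TFP's internal code. The function name `lbfgs_direction` and the random quadratic test data are hypothetical.

```python
import numpy as np

def lbfgs_direction(grad, s_hist, y_hist):
    """Two-loop recursion.

    s_hist[i] = x_{i+1} - x_i and y_hist[i] = g_{i+1} - g_i, oldest first.
    Returns r = H @ grad for the implicit L-BFGS inverse-Hessian
    approximation H; the search direction is -r.
    """
    q = np.array(grad, dtype=float)
    saved = []  # (rho, alpha) for each pair, newest first
    # First loop: newest pair to oldest
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / (y @ s)
        alpha = rho * (s @ q)
        q -= alpha * y
        saved.append((rho, alpha))
    # Initial inverse Hessian H0 = gamma * I, scaled by the newest pair
    if s_hist:
        gamma = (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: oldest pair to newest
    for (s, y), (rho, alpha) in zip(zip(s_hist, y_hist), reversed(saved)):
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return r

# Hypothetical data: on a quadratic with SPD Hessian A, y = A @ s
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5.0 * np.eye(5)
s_hist = [rng.standard_normal(5) for _ in range(3)]
y_hist = [A @ s for s in s_hist]

# Secant condition: H maps the newest gradient change to the newest step
r = lbfgs_direction(y_hist[-1], s_hist, y_hist)
print(np.allclose(r, s_hist[-1]))  # True
```

A full optimizer would step along -r with a line search and append the new (s, y) pair. Once m pairs are stored, it would drop the oldest one, for example by using `collections.deque(maxlen=10)`. Pairs with y @ s <= 0 are typically skipped, or a Wolfe line search is used to prevent them.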

After a quick skim through TensorFlow's API index, I came away with the feeling that with TensorFlow in hand, the world is mine.

Tensor (multidimensional array) operations in TensorFlow

黄湘云