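Learning Chainer with Examples: An Introduction to Deep Learning

Introduction

The tutorials for the deep learning framework PyTorch include a page titled "Learning PyTorch with Examples". It implements a simple network with a single hidden layer (a multilayer perceptron) using only numpy, and then replaces the pieces one by one with PyTorch APIs. It is an excellent introductory article because it gives an intuitive picture of what each API does. This post does the same with Chainer. Before getting to the main topic, we briefly explain backpropagation using the network structure treated here.

Environment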

macOS Sierra
Python 2.7.14
Chainer 2.1.0

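Structure of the Target Network

We consider the network described below.

The network consists of three layers: an input layer, a hidden layer, and an output layer, with $N_{\rm I}$, $N_{\rm H}$, and $N_{\rm O}$ units, respectively. $\vec{x}$ denotes the input data (an $N_{\rm I}$-dimensional column vector) and $\vec{y}$ the corresponding teacher data (an $N_{\rm O}$-dimensional column vector). Training is carried out by feeding in $\vec{x}$ and comparing the output with the teacher data $\vec{y}$. $W_1$ is the $N_{\rm H}\times N_{\rm I}$ matrix of weights between the input and hidden layers, and $W_2$ is the $N_{\rm O}\times N_{\rm H}$ matrix of weights between the hidden and output layers.

Loss Function

The output $\vec{y}_p$ for the input data $\vec{x}$ is given by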

(1) $\begin{eqnarray*} \vec{y}_p &=& W_2\;\vec{h}_r \\ \vec{h}_r &=& \vec{g}(\vec{h}) \\ \vec{h} &=& W_1\;\vec{x} \end{eqnarray*}$

Here $\vec{g}(\vec{h})$ is the Rectified Linear Unit (ReLU) activation function, defined by

(2) $\begin{eqnarray*} \vec{g}(\vec{h})^T&=&\left(f(h_1),\cdots,f(h_{N_{\rm H}})\right) \\ f(h_i)&=& \left \{ \begin{array}{ll} h_i & (h_i\geq 0) \\ 0 & (\mbox{otherwise}) \end{array} \right. \end{eqnarray*}$
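
As a minimal sketch of Eqs. (1) and (2), the forward pass for a single sample can be written with numpy as follows. The layer sizes, the random values, and the helper names relu and forward are assumptions chosen only for illustration; the column-vector convention of the text is used.

```python
import numpy as np

# Illustrative sizes (assumptions, much smaller than in the scripts later on)
N_I, N_H, N_O = 4, 3, 2

rng = np.random.RandomState(0)
x = rng.randn(N_I, 1)      # input, column vector
W1 = rng.randn(N_H, N_I)   # input -> hidden weights
W2 = rng.randn(N_O, N_H)   # hidden -> output weights


def relu(h):
    # Eq. (2): h_i if h_i >= 0, otherwise 0, applied element-wise
    return np.maximum(h, 0.0)


def forward(x, W1, W2):
    # Eq. (1), evaluated from its last line to its first
    h = W1.dot(x)
    h_r = relu(h)
    y_p = W2.dot(h_r)
    return h, h_r, y_p


h, h_r, y_p = forward(x, W1, W2)
print(h.shape, h_r.shape, y_p.shape)
```

The shapes confirm that $\vec{h}$ and $\vec{h}_r$ have $N_{\rm H}$ components and $\vec{y}_p$ has $N_{\rm O}$ components.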

To compare $\vec{y}_p$ with the teacher data $\vec{y}$, we define the following loss function.

(3) $\begin{equation*} L(W_1,W_2)=\frac{1}{N_{\rm O}}\|\vec{y}_p-\vec{y}\|^{2} \end{equation*}$

This is the mean of the squared differences between the components of $\vec{y}_p$ and $\vec{y}$. Training optimizes the weights $W_1$ and $W_2$ so that $L(W_1,W_2)$ is minimized.
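
A small worked example of Eq. (3) may help. The two vectors below are hypothetical values, not taken from the scripts in this post, and the function name loss is likewise an assumption.

```python
import numpy as np


def loss(y_p, y):
    # Eq. (3): squared Euclidean distance divided by the number of output units
    n_o = y.shape[0]
    return float(np.sum((y_p - y) ** 2)) / n_o


y_p = np.array([1.0, 2.0, 3.0])   # hypothetical network output
y = np.array([1.0, 0.0, 5.0])     # hypothetical teacher data
print(loss(y_p, y))               # (0 + 4 + 4) / 3
```

The loss vanishes only when the output coincides with the teacher data in every component.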

Backpropagation and Gradient Descent

We differentiate $L(W_1,W_2)$ with respect to $W_1$ and $W_2$. Write the components of $W_1,W_2$ as $w_{1,ij},w_{2,ij}$, and those of $\vec{y}_{p},\vec{h}_r, \vec{h}$ as $y_{p,i},h_{r,i},h_{i}$. Using the chain rule, we obtain

(4) $\begin{eqnarray*} \frac{\partial L}{\partial w_{2,ij}} &=&\sum_{m}\frac{\partial L}{\partial y_{p,m}}\frac{\partial y_{p,m}}{\partial w_{2,ij}} \\ &=&\sum_{m,n}\frac{\partial L}{\partial y_{p,m}}\frac{\partial}{\partial w_{2,ij}}\left( w_{2,mn}h_{r,n} \right) \\ &=&\sum_{m,n}\frac{\partial L}{\partial y_{p,m}}\delta_{mi}\delta_{nj}h_{r,n} \\ &=&\frac{\partial L}{\partial y_{p,i}}h_{r,j} \end{eqnarray*}$

(5) $\begin{eqnarray*} \frac{\partial L}{\partial w_{1,ij}}&=&\sum_{m}\frac{\partial L}{\partial h_m}\frac{\partial h_m}{\partial w_{1,ij}} \\ &=&\sum_{m,n}\frac{\partial L}{\partial h_m}\frac{\partial}{\partial w_{1,ij}}\left(w_{1,mn}x_n\right) \\ &=&\sum_{m,n}\frac{\partial L}{\partial h_m}\delta_{mi}\delta_{nj}x_n \\ &=&\frac{\partial L}{\partial h_i}x_j \end{eqnarray*}$

The factor $\frac{\partial L}{\partial h_i}$ appearing in the last line of Eq. (5) can be rewritten as follows.

(6) $\begin{eqnarray*} \frac{\partial L}{\partial h_i}&=&\sum_j \frac{\partial L}{\partial y_{p,j}}\frac{\partial y_{p,j}}{\partial h_{i}} \\ &=&\sum_{j,m} \frac{\partial L}{\partial y_{p,j}} \frac{\partial}{\partial h_i}\left(w_{2,jm}f(h_m)\right) \\ &=&\sum_{j,m} \frac{\partial L}{\partial y_{p,j}} w_{2,jm}f'(h_m) \delta_{mi} \\ &=&\sum_{j} \frac{\partial L}{\partial y_{p,j}} w_{2,ji}f'(h_i) \\ &=&f'(h_i)\sum_{j} \frac{\partial L}{\partial y_{p,j}}w_{2,ji} \end{eqnarray*}$

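That is, $\frac{\partial L}{\partial h_i}$ can be computed from $\frac{\partial L}{\partial y_{p,i}}$, which appears in the last line of Eq. (4). $\frac{\partial L}{\partial y_{p,i}}$ is the derivative with respect to the output $\vec{y}_p$ of the top layer (the output layer); once it is known, the derivative $\frac{\partial L}{\partial h_i}$ with respect to $\vec{h}$, the pre-activation of the layer below, follows from it. In the present case this derivative is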

(7) $\begin{equation*} \frac{\partial L}{\partial y_{p,i}}=\frac{1}{N_{\rm O}}2(y_{p,i}-y_{i}) \end{equation*}$

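This is the difference between the network output and the teacher data, that is, the error. The error is thus propagated toward the lower layers (backpropagation). Using Eq. (7), we obtain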

(8) $\begin{eqnarray*} \frac{\partial L}{\partial w_{2,ij}}&=&\frac{\partial L}{\partial y_{p,i}}h_{r,j} \\ &=&\frac{1}{N_{\rm O}}2(y_{p,i}-y_{i})h_{r,j} \end{eqnarray*}$

In matrix form, this reads

(9) $\begin{equation*} \frac{\partial L}{\partial W_2}=\frac{1}{N_{\rm O}}2(\vec{y}_p-\vec{y})\;\vec{h}_r^T \end{equation*}$

where $T$ denotes the transpose. Similarly, we obtain

(10) $\begin{eqnarray*} \frac{\partial L}{\partial w_{1,ij}}&=&\frac{\partial L}{\partial h_i}x_j \\ &=&f'(h_i)\sum_{m} \frac{\partial L}{\partial y_{p,m}}w_{2,mi}\;x_j \\ &=&\frac{1}{N_{\rm O}}f'(h_i)\sum_{m} 2(y_{p,m}-y_{m})w_{2,mi}\;x_j \\ &=&\frac{1}{N_{\rm O}}f'(h_i)\sum_{m}w_{2,mi}\;2(y_{p,m}-y_{m})\;x_j \end{eqnarray*}$

Note that $f'(x)$ is $0$ for $x<0$ and $1$ otherwise. In matrix form, this reads

(11) $\begin{equation*} \frac{\partial L}{\partial W_1}=\frac{1}{N_{\rm O}}\left[W_2^T\;2(\vec{y}_p-\vec{y})\right]\;\vec{x}^T \end{equation*}$

Note, however, that the $i$-th component of the bracketed vector is set to $0$ when $h_i<0$; this is the factor $f'(h_i)$ of Eq. (10). Using these gradients, $W_1$ and $W_2$ are updated as follows.

(12) $\begin{eqnarray*} W_1 &\leftarrow& W_1 - \eta\;\frac{\partial L}{\partial W_1} \\ W_2 &\leftarrow& W_2 - \eta\;\frac{\partial L}{\partial W_2} \end{eqnarray*}$

Here $\eta$ is the learning rate (a small positive number). Repeating this update brings $L$ closer to its minimum (gradient descent).
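
The formulas above can be checked numerically. The following sketch, under assumed small layer sizes and random data, evaluates Eqs. (7), (9), and (11) for one sample, compares them with central finite differences, and then applies one update of Eq. (12). The helper numerical_gradient and the step size eta are illustrative choices, not part of the original scripts.

```python
import numpy as np

rng = np.random.RandomState(0)
N_I, N_H, N_O = 4, 3, 2            # illustrative sizes (assumption)
x = rng.randn(N_I, 1)
y = rng.randn(N_O, 1)
W1 = rng.randn(N_H, N_I)
W2 = rng.randn(N_O, N_H)


def loss(W1, W2):
    y_p = W2.dot(np.maximum(W1.dot(x), 0.0))
    return float(np.sum((y_p - y) ** 2)) / N_O


def gradients(W1, W2):
    h = W1.dot(x)
    h_r = np.maximum(h, 0.0)
    y_p = W2.dot(h_r)
    grad_y_p = 2.0 * (y_p - y) / N_O          # Eq. (7)
    grad_W2 = grad_y_p.dot(h_r.T)             # Eq. (9)
    grad_h = W2.T.dot(grad_y_p) * (h >= 0)    # Eq. (6): factor f'(h_i)
    grad_W1 = grad_h.dot(x.T)                 # Eq. (11)
    return grad_W1, grad_W2


def numerical_gradient(f, W, eps=1.0e-6):
    # central finite differences, one matrix element at a time
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        orig = W[idx]
        W[idx] = orig + eps
        f_plus = f()
        W[idx] = orig - eps
        f_minus = f()
        W[idx] = orig
        grad[idx] = (f_plus - f_minus) / (2.0 * eps)
    return grad


grad_W1, grad_W2 = gradients(W1, W2)
num_W1 = numerical_gradient(lambda: loss(W1, W2), W1)
num_W2 = numerical_gradient(lambda: loss(W1, W2), W2)
print(np.abs(grad_W1 - num_W1).max())
print(np.abs(grad_W2 - num_W2).max())

# one update of Eq. (12)
eta = 1.0e-3
loss_before = loss(W1, W2)
W1 -= eta * grad_W1
W2 -= eta * grad_W2
loss_after = loss(W1, W2)
print(loss_before)
print(loss_after)
```

The two printed differences should be tiny compared with the gradients themselves, and the loss after the update should be smaller than before.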

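The discussion above concerns a single data pair $(\vec{x},\vec{y})$. For $M$ pairs, Eqs. (9) and (11) are extended as follows, with the same convention that entries corresponding to $h<0$ are set to $0$.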

(13) $\begin{eqnarray*} \frac{\partial L}{\partial W_2}&=&\frac{1}{N_{\rm O}}2(Y_p-Y)\;H_r^T \\ \frac{\partial L}{\partial W_1}&=&\frac{1}{N_{\rm O}}\left[W_2^T\;2(Y_p-Y)\right]\;X^T \end{eqnarray*}$

Here we have introduced the following matrices.

(14) $\begin{eqnarray*} X&=&\left(\vec{x}^{\;(1)},\cdots,\vec{x}^{\;(M)}\right) \\ Y&=&\left(\vec{y}^{\;(1)},\cdots,\vec{y}^{\;(M)}\right) \\ Y_p&=&\left(\vec{y}_p^{\;(1)},\cdots,\vec{y}_p^{\;(M)}\right) \\ H_r&=&\left(\vec{h}_r^{\;(1)},\cdots,\vec{h}_r^{\;(M)}\right) \end{eqnarray*}$
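
As a sanity check of the batched form, the sketch below (with assumed small sizes and random data) evaluates Eq. (13) on matrices built as in Eq. (14) and compares the result with the per-sample gradients of Eqs. (9) and (11) accumulated over the $M$ samples.

```python
import numpy as np

rng = np.random.RandomState(0)
N_I, N_H, N_O, M = 4, 3, 2, 5      # illustrative sizes (assumption)
X = rng.randn(N_I, M)              # Eq. (14): one sample per column
Y = rng.randn(N_O, M)
W1 = rng.randn(N_H, N_I)
W2 = rng.randn(N_O, N_H)

# batched forward pass and Eq. (13)
H = W1.dot(X)
H_r = np.maximum(H, 0.0)
Y_p = W2.dot(H_r)
G = 2.0 * (Y_p - Y) / N_O
grad_W2 = G.dot(H_r.T)
grad_W1 = (W2.T.dot(G) * (H >= 0)).dot(X.T)   # entries with h < 0 are zeroed

# the same quantities accumulated sample by sample with Eqs. (9) and (11)
sum_W2 = np.zeros_like(W2)
sum_W1 = np.zeros_like(W1)
for k in range(M):
    x = X[:, k:k + 1]
    y = Y[:, k:k + 1]
    h = W1.dot(x)
    h_r = np.maximum(h, 0.0)
    g = 2.0 * (W2.dot(h_r) - y) / N_O
    sum_W2 += g.dot(h_r.T)
    sum_W1 += (W2.T.dot(g) * (h >= 0)).dot(x.T)

print(np.allclose(grad_W2, sum_W2), np.allclose(grad_W1, sum_W1))
```

The matrix products in Eq. (13) therefore perform the sum over samples implicitly.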

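In actual code, for computational efficiency, we work with the transposes of all the matrices defined so far. To match the code shown later, we transform Eq. (13) further. Taking the transpose of both sides of Eq. (13) gives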

(15) $\begin{eqnarray*} \left(\frac{\partial L}{\partial W_2}\right)^T &=&\left(\frac{1}{N_{\rm O}}2(Y_p-Y)\;H_r^T\right)^T \\ &=&\frac{1}{N_{\rm O}}H_r\;2(Y_p-Y)^T \\ \left(\frac{\partial L}{\partial W_1}\right)^T &=&\left(\frac{1}{N_{\rm O}}\left[W_2^T\;2(Y_p-Y)\right]\;X^T\right)^T \\ &=&\frac{1}{N_{\rm O}}X \left[2(Y_p-Y)^T\;W_2\right] \end{eqnarray*}$

Redefining the matrices with rows and columns interchanged, while keeping the same notation, we obtain

(16) $\begin{eqnarray*} \frac{\partial L}{\partial W_2} &=&\frac{1}{N_{\rm O}}H_r^T\;2(Y_p-Y) \\ \frac{\partial L}{\partial W_1} &=&\frac{1}{N_{\rm O}}X^T\left[2(Y_p-Y)\;W_2^T\right] \end{eqnarray*}$
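
The change of convention can be verified directly. In the sketch below (assumed sizes and random data, with hypothetical function names grads_column and grads_row), the row-convention formulas of Eq. (16) are fed the transposed matrices and return the transposes of the gradients of Eq. (13).

```python
import numpy as np

rng = np.random.RandomState(0)
N_I, N_H, N_O, M = 4, 3, 2, 5       # illustrative sizes (assumption)

# column convention of Eqs. (13) and (14): one sample per column
X = rng.randn(N_I, M)
Y = rng.randn(N_O, M)
W1 = rng.randn(N_H, N_I)
W2 = rng.randn(N_O, N_H)


def grads_column(X, Y, W1, W2):
    H = W1.dot(X)
    H_r = np.maximum(H, 0.0)
    G = 2.0 * (W2.dot(H_r) - Y) / N_O
    grad_W2 = G.dot(H_r.T)                          # Eq. (13), first line
    grad_W1 = (W2.T.dot(G) * (H >= 0)).dot(X.T)     # Eq. (13), second line
    return grad_W1, grad_W2


def grads_row(X, Y, W1, W2):
    # every argument is the transpose of its column-convention counterpart
    H = X.dot(W1)
    H_r = np.maximum(H, 0.0)
    G = 2.0 * (H_r.dot(W2) - Y) / N_O
    grad_W2 = H_r.T.dot(G)                          # Eq. (16), first line
    grad_W1 = X.T.dot(G.dot(W2.T) * (H >= 0))       # Eq. (16), second line
    return grad_W1, grad_W2


gc_W1, gc_W2 = grads_column(X, Y, W1, W2)
gr_W1, gr_W2 = grads_row(X.T, Y.T, W1.T, W2.T)
print(np.allclose(gr_W1, gc_W1.T), np.allclose(gr_W2, gc_W2.T))
```

In the row convention each sample occupies one row, which is the layout used by the scripts that follow.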

Implementation with numpy

We first show an implementation that uses only numpy, without Chainer (the same as in the PyTorch article).

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np

EPOCHS = 300
M = 64
N_I = 1000
N_H = 100
N_O = 10
LEARNING_RATE = 1.0e-04

# seed the random number generator so that the results are reproducible
np.random.seed(1)

X = np.random.randn(M, N_I).astype(np.float32)
Y = np.random.randn(M, N_O).astype(np.float32)
W1 = np.random.randn(N_I, N_H).astype(np.float32)
W2 = np.random.randn(N_H, N_O).astype(np.float32)


def sample_1():
    # create random input and output data
    x = X
    y = Y

    # randomly initialize weights
    w1 = W1
    w2 = W2

    y_size = np.float32(M * N_O)
    for t in range(EPOCHS):
        # forward pass
        h = x.dot(w1)
        h_r = np.maximum(h, 0)
        y_p = h_r.dot(w2)

        # compute mean squared error and print loss
        loss = np.square(y_p - y).sum() / y_size
        print(loss)

        # backward pass: compute gradients of loss with respect to w2
        grad_y_p = 2.0 * (y_p - y) / y_size
        grad_w2 = h_r.T.dot(grad_y_p)

        # backward pass: compute gradients of loss with respect to w1
        grad_h_r = grad_y_p.dot(w2.T)
        grad_h = grad_h_r
        grad_h[h < 0] = 0
        grad_w1 = x.T.dot(grad_h)

        # update weights
        w1 -= LEARNING_RATE * grad_w1
        w2 -= LEARNING_RATE * grad_w2


if __name__ == '__main__':
    sample_1()

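Lines 24-25: set $\vec{x}$ and $\vec{y}$ to suitable values.
Lines 28-29: initialize $W_1$ and $W_2$ with suitable values.
Line 32 onward: the routine that performs gradient descent.
Lines 34-36: compute $\vec{y}_p$ (forward computation).
Line 39: compute $L$.
Lines 43-44: compute $\frac{\partial L}{\partial W_2}$ (backward computation).
Lines 47-50: compute $\frac{\partial L}{\partial W_1}$ (backward computation).
Lines 53-54: update $W_1$ and $W_2$.

Note the one-to-one correspondence with the formulas in Eq. (16). The only difference is the normalization: the code divides by y_size = M * N_O, that is, it averages over all elements of the batch, whereas Eq. (16) carries the factor $1/N_{\rm O}$.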
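
One detail of the script is worth a short illustration: assignments such as w1 = W1 and grad_h = grad_h_r bind a second name to the same numpy array, so the in-place updates also modify the original array. This is harmless in the script, which runs once and does not reuse grad_h_r, but it matters if the initial weights are needed again. The sketch below uses hypothetical array names to show the difference between an alias and a copy.

```python
import numpy as np

W_shared = np.ones((2, 2), dtype=np.float32)
w_alias = W_shared            # same array object, as in `w1 = W1`
w_alias -= 0.5                # in-place update: W_shared changes too
print(W_shared[0, 0])         # prints 0.5

W_kept = np.ones((2, 2), dtype=np.float32)
w_copy = W_kept.copy()        # independent buffer
w_copy -= 0.5                 # W_kept is left untouched
print(W_kept[0, 0])           # prints 1.0
```

Using .copy() is the usual way to keep the initial values intact when several training runs share them.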

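Introducing chainer.Variable

In the code above, all the derivative computations were written out by hand. With chainer.Variable, they can be automated. The rewriting steps are as follows.

Create Variable objects from the numpy.array objects (lines 26, 27, 30, 31).
Replace the functions that operate on numpy.array objects with functions in chainer.functions (lines 35, 36, 37, 40).
After computing loss (lines 35-40), call loss.backward (line 49). This is where backpropagation is executed.
The derivatives are available as w1.grad and w2.grad (lines 52-53).
The gradients of w1 and w2 must be reset at an appropriate point (lines 44-45).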

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np
import chainer.functions as F
from chainer import Variable

EPOCHS = 300
M = 64
N_I = 1000
N_H = 100
N_O = 10
LEARNING_RATE = 1.0e-04

# seed the random number generator so that the results are reproducible
np.random.seed(1)

X = np.random.randn(M, N_I).astype(np.float32)
Y = np.random.randn(M, N_O).astype(np.float32)
W1 = np.random.randn(N_I, N_H).astype(np.float32)
W2 = np.random.randn(N_H, N_O).astype(np.float32)


def sample_2():
    # create random input and output data
    x = Variable(X)
    y = Variable(Y)

    # randomly initialize weights
    w1 = Variable(W1)
    w2 = Variable(W2)

    for t in range(EPOCHS):
        # forward pass: compute predicted y
        h = F.matmul(x, w1)
        h_r = F.relu(h)
        y_p = F.matmul(h_r, w2)

        # compute and print loss
        loss = F.mean_squared_error(y_p, y)
        print(loss.data)

        # clear the gradients (zerograd() is deprecated in favor of cleargrad())
        w1.cleargrad()
        w2.cleargrad()

        # backward pass
        # loss.grad = np.ones(loss.shape, dtype=np.float32)
        loss.backward()

        # update weights
        w1.data -= LEARNING_RATE * w1.grad
        w2.data -= LEARNING_RATE * w2.grad


if __name__ == '__main__':
    sample_2()

With Variable and chainer.functions, automatic differentiation is carried out behind the scenes.
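
To give a rough idea of what "behind the scenes" means, here is a minimal sketch of reverse-mode automatic differentiation for scalars in plain Python. The class Var and the functions mul, sub, relu, and square are hypothetical and far simpler than chainer.Variable and chainer.functions; they only illustrate the principle that each value remembers its inputs and local derivatives, and that backward applies the chain rule. The sketch also shows why the gradients have to be reset between iterations.

```python
class Var(object):
    """Scalar that remembers how it was computed (hypothetical, not chainer.Variable)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        # tuple of (parent Var, local derivative d(self)/d(parent))
        self.parents = parents

    def backward(self, upstream=1.0):
        # accumulate, then push the gradient to the inputs (chain rule)
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


def mul(a, b):
    return Var(a.data * b.data, ((a, b.data), (b, a.data)))


def sub(a, b):
    return Var(a.data - b.data, ((a, 1.0), (b, -1.0)))


def relu(a):
    return Var(max(a.data, 0.0), ((a, 1.0 if a.data >= 0 else 0.0),))


def square(a):
    return Var(a.data ** 2, ((a, 2.0 * a.data),))


# scalar version of the network: loss = (w2 * relu(w1 * x) - y)^2
x, y = Var(2.0), Var(1.0)
w1, w2 = Var(0.5), Var(-0.5)


def build_loss():
    return square(sub(mul(w2, relu(mul(w1, x))), y))


build_loss().backward()
grads_first = (w1.grad, w2.grad)          # (3.0, -3.0)

build_loss().backward()                   # no reset: the gradients add up
grads_accumulated = (w1.grad, w2.grad)    # (6.0, -6.0)

w1.grad = 0.0                             # reset before the next pass
w2.grad = 0.0
build_loss().backward()
grads_after_reset = (w1.grad, w2.grad)    # (3.0, -3.0)

print(grads_first)
print(grads_accumulated)
print(grads_after_reset)
```

By hand, the loss is $(w_2 f(w_1 x)-y)^2$ with $w_1x=1$, so $\partial L/\partial w_2=2(-1.5)\cdot 1=-3$ and $\partial L/\partial w_1=2(-1.5)\cdot(-0.5)\cdot 2=3$, in agreement with the first result.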

Introducing chainer.Chain

A custom network can be implemented as a class that inherits from chainer.Chain. The forward computation is defined in the method __call__ (lines 34-38).

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np
import chainer
from chainer import Variable
import chainer.functions as F
import chainer.links as L

EPOCHS = 300
M = 64
N_I = 1000
N_H = 100
N_O = 10
LEARNING_RATE = 1.0e-04

# seed the random number generator so that the results are reproducible
np.random.seed(1)

X = np.random.randn(M, N_I).astype(np.float32)
Y = np.random.randn(M, N_O).astype(np.float32)
W1 = np.random.randn(N_I, N_H).astype(np.float32)
W2 = np.random.randn(N_H, N_O).astype(np.float32)


class TwoLayerNet(chainer.Chain):

    def __init__(self, d_in, h, d_out):
        super(TwoLayerNet, self).__init__(
            linear1=L.Linear(d_in, h,  initialW=W1.transpose()),
            linear2=L.Linear(h, d_out, initialW=W2.transpose())
        )

    def __call__(self, x):
        g = self.linear1(x)
        h_r = F.relu(g)
        y_p = self.linear2(h_r)
        return y_p


def sample_3():
    # create random input and output data
    x = Variable(X)
    y = Variable(Y)

    # create a network
    model = TwoLayerNet(N_I, N_H, N_O)

    for t in range(EPOCHS):
        # forward
        y_p = model(x)

        # compute and print loss
        loss = F.mean_squared_error(y_p, y)
        print(loss.data)

        # zero the gradients
        model.cleargrads()

        # backward
        loss.backward()

        # update weights
        model.linear1.W.data -= LEARNING_RATE * model.linear1.W.grad
        model.linear2.W.data -= LEARNING_RATE * model.linear2.W.grad


if __name__ == '__main__':
    sample_3()
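
The reason for passing W1.transpose() as initialW is that a fully connected layer of this kind stores its weight with shape (output size, input size) and computes $xW^T+b$. The numpy sketch below mimics that behavior with a hypothetical class Linear; it is a stand-in written for this illustration, not chainer.links.Linear itself, and the array sizes are assumptions.

```python
import numpy as np


class Linear(object):
    """Hypothetical stand-in for a fully connected layer (not chainer.links.Linear)."""

    def __init__(self, in_size, out_size, initialW=None):
        if initialW is None:
            initialW = np.zeros((out_size, in_size), dtype=np.float32)
        assert initialW.shape == (out_size, in_size)
        self.W = initialW.astype(np.float32)       # stored as (out_size, in_size)
        self.b = np.zeros(out_size, dtype=np.float32)

    def __call__(self, x):
        # rows of x are samples: y = x W^T + b
        return x.dot(self.W.T) + self.b


rng = np.random.RandomState(1)
X = rng.randn(5, 4).astype(np.float32)
W1 = rng.randn(4, 3).astype(np.float32)    # (N_I, N_H), as in the scripts above

layer = Linear(4, 3, initialW=W1.transpose())
print(np.allclose(layer(X), X.dot(W1), atol=1.0e-5))
```

With the bias left at zero, the layer reproduces x.dot(w1) from the numpy script.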

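Introducing chainer.optimizers

In the examples so far, the weight update rule was written out explicitly. With chainer.optimizers, this bookkeeping can be removed (line 71). The optimization method used here is (stochastic) gradient descent (line 51), but chainer.optimizers provides a variety of optimization methods.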

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np
import chainer
from chainer import Variable
import chainer.functions as F
import chainer.optimizers as P
import chainer.links as L

EPOCHS = 300
M = 64
N_I = 1000
N_H = 100
N_O = 10
LEARNING_RATE = 1.0e-04

# seed the random number generator so that the results are reproducible
np.random.seed(1)

X = np.random.randn(M, N_I).astype(np.float32)
Y = np.random.randn(M, N_O).astype(np.float32)
W1 = np.random.randn(N_I, N_H).astype(np.float32)
W2 = np.random.randn(N_H, N_O).astype(np.float32)


class TwoLayerNet(chainer.Chain):

    def __init__(self, d_in, h, d_out):
        super(TwoLayerNet, self).__init__(
            linear1=L.Linear(d_in, h,  initialW=W1.transpose()),
            linear2=L.Linear(h, d_out, initialW=W2.transpose())
        )

    def __call__(self, x):
        g = self.linear1(x)
        h_r = F.relu(g)
        y_p = self.linear2(h_r)
        return y_p


def sample_4():
    # create random input and output data
    x = Variable(X)
    y = Variable(Y)

    # create a network
    model = TwoLayerNet(N_I, N_H, N_O)

    # create an optimizer
    optimizer = P.SGD(lr=LEARNING_RATE)

    # connect the optimizer with the network
    optimizer.setup(model)

    for t in range(EPOCHS):
        # forward pass: compute predicted y
        y_p = model(x)

        # compute and print loss
        loss = F.mean_squared_error(y_p, y)
        print(loss.data)

        # zero the gradients
        model.cleargrads()

        # backward
        loss.backward()

        # update weights
        optimizer.update()


if __name__ == '__main__':
    sample_4()

With the replacements so far, the processing inside the loop has become considerably cleaner.
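
The idea of an optimizer object can be sketched in numpy as follows: the update rule is hidden behind a common update method, so the training loop does not change when the rule is swapped. The classes SGD and MomentumSGD and the function minimize below are hypothetical helpers written for this illustration and are not Chainer's implementations; the toy objective $L(w)=\|w\|^2$ is also an assumption.

```python
import numpy as np


class SGD(object):
    """Plain gradient descent: w <- w - lr * grad (hypothetical helper class)."""

    def __init__(self, lr):
        self.lr = lr

    def update(self, params, grads):
        for w, g in zip(params, grads):
            w -= self.lr * g          # in-place, so the caller's arrays change


class MomentumSGD(object):
    """Velocity form: v <- momentum * v - lr * grad; w <- w + v (hypothetical)."""

    def __init__(self, lr, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.velocities = None

    def update(self, params, grads):
        if self.velocities is None:
            self.velocities = [np.zeros_like(w) for w in params]
        for w, g, v in zip(params, grads, self.velocities):
            v *= self.momentum
            v -= self.lr * g
            w += v


def minimize(optimizer, steps=100):
    # toy problem (assumption): L(w) = ||w||^2 with gradient 2w
    w = np.array([1.0, -2.0])
    for _ in range(steps):
        optimizer.update([w], [2.0 * w])
    return w


w_sgd = minimize(SGD(lr=0.1))
w_momentum = minimize(MomentumSGD(lr=0.01))
print(np.linalg.norm(w_sgd))
print(np.linalg.norm(w_momentum))
```

Both runs move the parameters toward the minimum at the origin; only the object passed to minimize differs.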

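Introducing chainer.training

In the examples so far, the loop that performs gradient descent was written out explicitly. Using chainer.training and related modules, the loop itself can be eliminated.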

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np
import chainer
import chainer.functions as F
import chainer.optimizers as P
import chainer.links as L
import chainer.datasets as D
import chainer.iterators as Iter
from chainer import training
from chainer.training import extensions
from chainer import reporter

EPOCHS = 300
M = 64
N_I = 1000
N_H = 100
N_O = 10
LEARNING_RATE = 1.0e-04

# seed the random number generator so that the results are reproducible
np.random.seed(1)

X = np.random.randn(M, N_I).astype(np.float32)
Y = np.random.randn(M, N_O).astype(np.float32)
W1 = np.random.randn(N_I, N_H).astype(np.float32)
W2 = np.random.randn(N_H, N_O).astype(np.float32)


class TwoLayerNet(chainer.Chain):

    def __init__(self, d_in, h, d_out):
        super(TwoLayerNet, self).__init__(
            linear1=L.Linear(d_in, h,  initialW=W1.transpose()),
            linear2=L.Linear(h, d_out, initialW=W2.transpose())
        )

    def __call__(self, x):
        g = self.linear1(x)
        h_r = F.relu(g)
        y_p = self.linear2(h_r)
        return y_p


class LossCalculator(chainer.Chain):

    def __init__(self, model):
        super(LossCalculator, self).__init__()
        with self.init_scope():
            self.model = model

    def __call__(self, x, y):
        y_p = self.model(x)
        loss = F.mean_squared_error(y_p, y)
        reporter.report({'loss': loss}, self)
        return loss


def sample_5():
    # make an iterator
    dataset = D.TupleDataset(X, Y)
    train_iter = Iter.SerialIterator(dataset, batch_size=M, shuffle=False)

    # create a network
    model = TwoLayerNet(N_I, N_H, N_O)
    loss_calculator = LossCalculator(model)

    # create an optimizer
    optimizer = P.SGD(lr=LEARNING_RATE)

    # connect the optimizer with the network
    optimizer.setup(loss_calculator)

    # make an updater
    updater = training.StandardUpdater(train_iter, optimizer)

    # make a trainer
    trainer = training.Trainer(updater, (EPOCHS, 'epoch'), out='result')
    trainer.extend(extensions.LogReport())
    trainer.extend(extensions.PrintReport(['epoch', 'main/loss', 'elapsed_time']))

    trainer.run()


if __name__ == '__main__':
    sample_5()

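Unlike the numpy code, the code above declares what is to be done. This is the benefit of the abstraction provided by Chainer. Of course, the relationship between the loss and the epoch computed by these five scripts is (almost) identical. One likely source of small differences is that L.Linear includes a bias term by default, which the optimizer in the last two scripts updates along with the weights.

The loss decreases as the number of epochs increases.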
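
To make concrete what the iterator, updater, and trainer take over, here is a rough numpy sketch of the loop they replace: something yields mini-batches in order, one step performs forward, backward, and update, and the whole thing repeats until a given number of epochs is reached. The functions serial_batches and train, the linear model, and the data are all assumptions made for this illustration; they are not Chainer APIs.

```python
import numpy as np


def serial_batches(n_samples, batch_size, epochs):
    """Yield (epoch, index array) pairs in order, without shuffling (hypothetical)."""
    for epoch in range(1, epochs + 1):
        for start in range(0, n_samples, batch_size):
            yield epoch, np.arange(start, min(start + batch_size, n_samples))


def train(X, Y, w, lr, batch_size, epochs):
    """Linear least-squares model trained by mini-batch gradient descent."""
    log = []
    for epoch, idx in serial_batches(len(X), batch_size, epochs):
        x, y = X[idx], Y[idx]
        y_p = x.dot(w)                                   # forward
        loss = np.mean((y_p - y) ** 2)
        grad_w = x.T.dot(2.0 * (y_p - y)) / y.size       # backward
        w -= lr * grad_w                                 # update (in place)
        log.append((epoch, float(loss)))
    return log


rng = np.random.RandomState(1)
X = rng.randn(8, 3)
Y = X.dot(np.array([[1.0], [-2.0], [0.5]]))   # noiseless targets (assumption)
w = np.zeros((3, 1))
log = train(X, Y, w, lr=0.05, batch_size=4, epochs=200)
print(log[0])
print(log[-1])
```

The list log plays the role of the report printed by the trainer extensions: one loss value per update, tagged with its epoch.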

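Summary

By using the APIs that Chainer provides, one can implement a network structure and its training procedure while hiding the complexity behind deep learning (the details of backpropagation and of optimization methods such as gradient descent). Although not covered here, adding just a few lines makes the code run on either GPU or CPU.
A great many deep learning frameworks have been released to date. Rather than sticking to a single one, I recommend using several: knowing the strengths and weaknesses of each framework also deepens one's understanding of deep learning.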

Kumada Seiya
