Deep Learning part2

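Picking up where the previous post (Caffe installation and a quick smoke test) left off. This time I got as far as using a pretrained model to classify a random image I picked up from Google Image Search.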

models, net, layer

When you define how Caffe should train, terms like models, net, and layer keep coming up. I want to understand how they relate to each other.

In the source tree, it looks like this.

* models/bvlc_xxxnet/
    - deploy.prototxt
    - solver.prototxt
    - train_val.prototxt

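? What does bvlc stand for? → Apparently the Berkeley Vision and Learning Center, the group behind Caffe.
solver.prototxt seems to list the parameters used for training:
- the definition of the net to train
- parameters such as the iteration count, the initial learning rate, and the test interval.
deploy.prototxt and train_val.prototxt each describe what looks like a hierarchy of layers, but at first I couldn't tell when to use which. → Apparently one is the layer network for training and the other is the layer network for classification.

Let's take a look at bvlc_alexnet.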

models/bvlc_alexnet/solver.prototxt

net: "models/bvlc_alexnet/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 450000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
snapshot_prefix: "models/bvlc_alexnet/caffe_alexnet_train"
solver_mode: GPU

The net used for training seems to point at models/bvlc_alexnet/train_val.prototxt, so solver.prototxt is probably the top-level config. train_val.prototxt holds the layer hierarchy and is fairly long.
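To get a feel for what lr_policy: "step" does with the values above, here is a small sketch. It assumes the usual definition of Caffe's step policy, base_lr * gamma ^ floor(iter / stepsize); the function name step_learning_rate is my own and not part of Caffe.

```python
import math


def step_learning_rate(iteration, base_lr=0.01, gamma=0.1, stepsize=100000):
    """Learning rate under the "step" policy: base_lr * gamma^floor(iter / stepsize).

    Defaults are copied from models/bvlc_alexnet/solver.prototxt above.
    """
    return base_lr * gamma ** (iteration // stepsize)


# Print the rate at a few iterations up to max_iter (450000).
for it in (0, 99999, 100000, 250000, 449999):
    print(it, step_learning_rate(it))
```

So with these settings the rate should drop by a factor of 10 every 100,000 iterations, four times in total before max_iter is reached.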

What is the purpose of deploy.prototxt?

I know the MATLAB wrapper uses the deploy.prototxt for forward propagation. I think in general it’s used for the deployment of an already trained model.
Also note that you can simply use the deploy.prototxt from the bvlc_reference_caffenet, making sure to change the “name” and “top” of the fc8 layer, as well as the “bottom” of the layer after the fc8 layer, to the name you gave it in train_val.prototxt.

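Loose paraphrase:

The MATLAB wrapper uses it for forward propagation. Normally it is used to deploy a model that has already been trained. You can also reuse the deploy.prototxt from bvlc_reference_caffenet, as long as you change the name and top of the fc8 layer, and the bottom of the layer after fc8, to the name you gave it in train_val.prototxt.

Not the clearest answer...

Hmm. My guess: train_val.prototxt is what you use when training on a large amount of data, and deploy.prototxt is what you use when classifying with an already trained model? → Apparently so.

Training

Following the steps in Brewing ImageNet, let's get our hands a bit dirtier and try this training thing!

First we need data. Lots of image data. ImageNet apparently hosts the data that was used in the ILSVRC12 challenge. Here is the overview of the ILSVRC12 challenge.

Here is an excerpt.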

Data

The validation and test data for this competition will consist of 150,000 photographs, collected from flickr and other search engines, hand labeled with the presence or absence of 1000 object categories. The 1000 object categories contain both internal nodes and leaf nodes of ImageNet, but do not overlap with each other. A random subset of 50,000 of the images with labels will be released as validation data included in the development kit along with a list of the 1000 categories. The remaining images will be used for evaluation and will be released without labels at test time.
The training data, the subset of ImageNet containing the 1000 categories and 1.2 million images, will be packaged for easy downloading. The validation and test data for this competition are not contained in the ImageNet training data (we will remove any duplicates).

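Loose paraphrase:

There are 150,000 validation and test images, collected from flickr and other search engines and hand labeled with 1000 categories. Those 1000 categories include both leaf nodes and internal nodes of ImageNet, but they do not overlap. A random 50,000 of the labeled images are released as validation data together with the list of the 1000 categories. The rest are used for evaluation and are released without labels at test time. The training data is a subset of ImageNet covering the 1000 categories, 1.2 million images in total, packaged up for easy download. The validation and test data are not contained in the ImageNet training data.

1.2 million images... 138GB of data... there's no way! I-is there really no smaller training set with a label file?!

Downloading 138GB is not going to happen, but let's at least look at what we would do next if it ever finished. Besides the training images, we need information saying which category each image belongs to. There is a script in the Caffe repository that fetches this, so I used it.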

./data/ilsvrc12/get_ilsvrc_aux.sh

DIR="$( cd "$(dirname "$0")" ; pwd -P )"
cd $DIR

echo "Downloading..."

wget http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz

echo "Unzipping..."

tar -xf caffe_ilsvrc12.tar.gz && rm -f caffe_ilsvrc12.tar.gz

echo "Done."

It just downloads a tar.gz and extracts it. These are the files that get unpacked.

$ ls
det_synset_words.txt  imagenet.bet.pickle        synset_words.txt  test.txt   val.txt
get_ilsvrc_aux.sh     imagenet_mean.binaryproto  synsets.txt       train.txt

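The labels for the training data are listed line after line like this. test.txt and val.txt have the same format; presumably they are for testing and validation. The 0 looks like the category number. It goes up to 999.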

train.txt

n01440764/n01440764_10026.JPEG 0
n01440764/n01440764_10027.JPEG 0
n01440764/n01440764_10029.JPEG 0
n01440764/n01440764_10040.JPEG 0
n01440764/n01440764_10042.JPEG 0
n01440764/n01440764_10043.JPEG 0
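Just to make the format concrete, here is a minimal sketch that reads lines in this "relative path, then label" shape. The helper name parse_label_file is made up for illustration, and the sample text is simply the first few lines shown above rather than the real 1.2 million line file.

```python
from collections import Counter

# A few lines copied from the train.txt listing above.
SAMPLE = """\
n01440764/n01440764_10026.JPEG 0
n01440764/n01440764_10027.JPEG 0
n01440764/n01440764_10029.JPEG 0
"""


def parse_label_file(text):
    """Return a list of (relative_path, label) pairs from 'path label' lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace: the label is the final field.
        path, label = line.rsplit(None, 1)
        entries.append((path, int(label)))
    return entries


entries = parse_label_file(SAMPLE)
# Count how many images each category number has.
images_per_label = Counter(label for _, label in entries)
print(entries[0], images_per_label)
```

For the real file you would pass the contents of data/ilsvrc12/train.txt instead of SAMPLE.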

Separately, here is the table that maps categories to category names.

synset_words.txt

n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
n01491361 tiger shark, Galeocerdo cuvieri
n01494475 hammerhead, hammerhead shark
...
n13054560 bolete
n13133613 ear, spike, capitulum
n15075141 toilet tissue, toilet paper, bathroom tissue

Exactly 1000 lines. It bothers me that it lists directory names instead of category numbers... but I suppose they are 0, 1, 2, ..., 999 from the top?
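If that guess is right (line position equals category number, which is my assumption here and not something the file states), the lookup table could be built like this. load_synsets is a hypothetical helper, and the sample is just the first three lines of the listing above.

```python
# First three lines of synset_words.txt, copied from the listing above.
SAMPLE = """\
n01440764 tench, Tinca tinca
n01443537 goldfish, Carassius auratus
n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
"""


def load_synsets(text):
    """Return [(wnid, names), ...]; the list position is the assumed category number."""
    synsets = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            # The first field is the directory name (WordNet id), the rest is the name.
            wnid, names = line.split(" ", 1)
            synsets.append((wnid, names))
    return synsets


synsets = load_synsets(SAMPLE)
# Reverse lookup: directory name -> assumed category number.
index_of = {wnid: i for i, (wnid, _) in enumerate(synsets)}
print(synsets[1], index_of["n01440764"])
```

This at least agrees with train.txt, where every image under n01440764 (the first line here) carries label 0.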

Next, resizing the training data.

You may want to resize the images to 256x256 in advance. By default, we do not explicitly do this because in a cluster environment, one may benefit from resizing images in a parallel fashion, using mapreduce. For example, Yangqing used his lightweight mincepie package. If you prefer things to be simpler, you can also use shell commands, something like:

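Loose paraphrase:

You may want to resize the images to 256x256 beforehand. By default no resizing is done, because with a cluster you are better off resizing in parallel with MapReduce. Yangqing, for example, used his lightweight mincepie package. If you want to keep things simple, a shell command does the job too!

And here is that shell command.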

for name in /path/to/imagenet/val/*.JPEG; do
    convert -resize 256x256\! $name $name
done

Take a look at examples/imagenet/create_imagenet.sh. Set the paths to the train and val dirs as needed, and set “RESIZE=true” to resize all images to 256x256 if you haven’t resized the images in advance.

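Loose paraphrase:

Open examples/imagenet/create_imagenet.sh and set the paths to the training and validation directories. If you did not resize the images with the command above, also set RESIZE=true, which resizes every image to 256 x 256.

Creating the databases (LMDB) for the training and validation data.

A script is provided for this. Let's look inside.

An excerpt from examples/imagenet/create_imagenet.sh.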

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/ilsvrc12_train_lmdb

echo "Creating val lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/val.txt \
    $EXAMPLE/ilsvrc12_val_lmdb

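So it just calls a conversion program named convert_imageset. The arguments are the training or validation image directory plus train.txt or val.txt (files holding pairs of image path and category). The outputs are $EXAMPLE/ilsvrc12_train_lmdb and $EXAMPLE/ilsvrc12_val_lmdb. I have neither the resources nor the time to run it, but I get the idea!

Image Mean - a mean image?

The model apparently needs something called an Image Mean for training (the average over all the images?), so we are told to create one.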

./examples/imagenet/make_imagenet_mean.sh

EXAMPLE=examples/imagenet
DATA=data/ilsvrc12
TOOLS=build/tools

$TOOLS/compute_image_mean $EXAMPLE/ilsvrc12_train_lmdb \
  $DATA/imagenet_mean.binaryproto

That is the whole script. You pass it the training LMDB created above, and it produces the Image Mean as imagenet_mean.binaryproto.
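My understanding is that the Image Mean is the per-pixel average over all training images, which then gets subtracted from each input. The toy sketch below shows only that idea in plain Python on made-up 2x2 grayscale "images"; it is not how compute_image_mean is implemented, and mean_image is a name I invented.

```python
def mean_image(images):
    """Per-pixel mean over equally sized images given as nested lists [row][col]."""
    count = float(len(images))
    height, width = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / count
             for c in range(width)]
            for r in range(height)]


# Two made-up 2x2 grayscale images (the real data is 256x256 with 3 channels).
toy_images = [
    [[0, 10], [20, 30]],
    [[10, 30], [40, 50]],
]

mean = mean_image(toy_images)

# Subtract the mean image from the first image, pixel by pixel.
centered = [[toy_images[0][r][c] - mean[r][c] for c in range(2)]
            for r in range(2)]
print(mean, centered)
```

With real data the same averaging would run over every pixel of every channel across the whole training set.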

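TODO: look for a way to display this mean image later...

Model definition

Model definitions (can I treat that as a synonym for a deep learning network?) are already provided. These instructions use models/bvlc_reference_caffenet/train_val.prototxt, which is said to be based on the one by Krizhevsky et al.

There are others too, listed below. From what I have read, many people seem to use bvlc_alexnet. Supposedly it is based on the network that took first place in ILSVRC12, with the layer arrangement tweaked.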

models/bvlc_alexnet/train_val.prototxt
models/bvlc_googlenet/train_val.prototxt
models/bvlc_reference_rcnn_ilsvrc13/train_val.prototxt

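This file contains places where the paths to the LMDB and Image Mean created above are written, so if you created them in a different directory you will need to change those. I don't understand what the individual parameters mean yet.

The prototxt file also holds two network definitions in one, phase: TRAIN and phase: TEST. The two networks are mostly the same, and anything used in only one phase is marked with something like include { phase: TEST }. In this case only the input layers and one output layer differ.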
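To convince myself how one file can describe two networks, here is a toy model of the phase filtering in plain Python. The layer names and the include_phase key are hypothetical and heavily shortened; this is not Caffe's API or its actual prototxt parser, only the selection rule as I understand it.

```python
def layers_for_phase(layers, phase):
    """Keep layers with no phase restriction plus those restricted to `phase`."""
    return [layer["name"] for layer in layers
            if layer.get("include_phase") in (None, phase)]


# Hypothetical, shortened layer list: two phase-specific inputs, shared middle
# layers, and one output that exists only in the TEST phase.
toy_net = [
    {"name": "data_train", "include_phase": "TRAIN"},
    {"name": "data_test", "include_phase": "TEST"},
    {"name": "conv1"},
    {"name": "fc8"},
    {"name": "accuracy", "include_phase": "TEST"},
    {"name": "loss"},
]

train_layers = layers_for_phase(toy_net, "TRAIN")
test_layers = layers_for_phase(toy_net, "TEST")
print(train_layers)
print(test_layers)
```

Everything without a phase marker ends up in both networks, which matches the "mostly the same" impression from reading the file.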

Running the training

./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt

This looks like it will take forever. To resume training (is it OK to stop partway through?), you apparently do this.

./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt --snapshot=models/bvlc_reference_caffenet/caffenet_train_10000.solverstate

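Getting a pretrained model

I gave up on training on the ImageNet data (138GB). Instead I'll download a pretrained model and first check whether classification works.

The pretrained model is distributed at https://github.com/BVLC/caffe/tree/master/models/bvlc_reference_caffenet. Let's use that.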

cd models/bvlc_reference_caffenet
wget http://dl.caffe.berkeleyvision.org/bvlc_reference_caffenet.caffemodel

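The file is 233M, but that is nothing compared with the training data (138GB).

Classification with the pretrained model

Using this article as a reference, I adapted its Python script and tried classifying a cat image.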

cd ~/caffe
cp models/bvlc_reference_caffenet/deploy.prototxt works/
cp models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel works/
cp python/caffe/imagenet/ilsvrc_2012_mean.npy works/
# download a cat image into works/

The Python script to run.

import numpy as np
import matplotlib.pyplot as plt
import sys
caffe_root = '../'
sys.path.insert(0, caffe_root + 'python')
import caffe

# Set the paths of the model definition, the trained model, and the image file to be predicted
MODEL_FILE = './deploy.prototxt'
PRETRAINED = './bvlc_reference_caffenet.caffemodel'
IMAGE_FILE = './cat.jpg'

# load classifier
net = caffe.Classifier(MODEL_FILE, PRETRAINED,
                       mean=np.load('./ilsvrc_2012_mean.npy').mean(1).mean(1),
                       channel_swap=(2,1,0),
                       raw_scale=255,
                       image_dims=(256, 256))

#net.set_phase_test()
#caffe.set_phase_test()
#net.set_mode_cpu()
caffe.set_mode_cpu()

# load image file to be predicted
input_image = caffe.io.load_image(IMAGE_FILE)

# predict
prediction = net.predict([input_image])
sorted_predict = sorted(range(len(prediction[0])),key=lambda x:prediction[0][x],reverse=True)

# print top5 result
for i in sorted_predict[0:5]:
    print 'class=',i,', score=',prediction[0][i]

# plot image and result
plt.subplot(2,1,1)
plt.imshow(input_image)
plt.subplot(2,1,2)
plt.plot(prediction[0])
plt.show()

Let's run it.

python
execfile('imagenet.py')
...
E0702 18:35:09.714195 12505 upgrade_proto.cpp:618] Attempting to upgrade input file specified using deprecated V1LayerParameter: ./bvlc_reference_caffenet.caffemodel
I0702 18:35:09.836817 12505 upgrade_proto.cpp:626] Successfully upgraded file specified using deprecated V1LayerParameter
class= 285 , score= 0.496321
class= 281 , score= 0.395892
class= 282 , score= 0.081973
class= 287 , score= 0.00814288
class= 284 , score= 0.00703797

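It complained that I'm using deprecated parameters, but a result came out anyway. Maybe it is old because it is a 2012 model? In any case, it thinks the answer is class=285. What is 285? Let's check the mapping table.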

data/ilsvrc12/synset_words.txt

285 n02124075 Egyptian cat
281 n02123045 tabby, tabby cat
282 n02123159 tiger cat
287 n02127052 lynx, catamount
284 n02123597 Siamese cat, Siamese

Ooh... that's pretty close. It does recognize a cat. Here is an image I found by searching for Egyptian cat. The features really do look similar! I'm kind of moved!!
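The ranking and the manual table lookup can be tried in isolation with a small sketch. The scores and labels below are made up (four classes instead of 1000), and top_k is my own helper name; only the sorting idea is the same as in the script.

```python
def top_k(scores, k=5):
    """Indices of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Made-up scores and placeholder labels for a 4-class toy problem.
toy_scores = [0.05, 0.60, 0.10, 0.25]
toy_labels = ["label_0", "label_1", "label_2", "label_3"]

ranked = top_k(toy_scores, k=3)
# Pair each winning index with its label, as done by hand with synset_words.txt.
named = [(toy_labels[i], toy_scores[i]) for i in ranked]
print(ranked, named)
```

With the real model, toy_scores would be prediction[0] and toy_labels the lines of data/ilsvrc12/synset_words.txt in file order.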

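Next I want to prepare my own image data and categories and try training. To be continued.

References

Brewing ImageNet
- Caffe: instructions for training with the ImageNet training data
ImageNet
- Runs an annual image recognition contest and distributes things like the training image data.
- The distributed data is so large that playing with it as an individual is a bit painful.
Deep Learning覚え書き（Caffenetによる画像分類編） (Deep Learning notes: image classification with CaffeNet)
Caffe Model Zoo
- A collection of links to pretrained models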


Yukaary Craft

Thursday, July 2, 2015