Monday!
Today we cover training on complex images (Kaggle's Dogs vs. Cats example) and Generators.
1. Converting image files → a CSV file
Read each jpg file, decode it to get the pixel values (grayscale here), convert them to floats, and normalize.
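Once OpenCV is installed (next step), decoding and normalizing a single file looks roughly like this minimal sketch (using cat.0.jpg from the training folder as an example):
import cv2 as cv
# Decode one jpg into a grayscale uint8 pixel array, resize, then scale to [0, 1]
img = cv.imread('C:/jupyter_home/data/cat_dog/train/cat.0.jpg', cv.IMREAD_GRAYSCALE)
img = cv.resize(img, (80,80))          # unify the image size
img = img.astype('float32') / 255.0    # uint8 [0, 255] -> float [0.0, 1.0]
print(img.shape)                       # (80, 80)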
For this work, install, with administrator privileges, tqdm (a progress-bar library that reports progress status) and ipywidgets, plus OpenCV for image processing.
@ Anaconda Prompt
conda install -c conda-forge tqdm
conda install -c conda-forge ipywidgets
pip install opencv-python
jupyter notebook
@ Jupyter Notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cv2 as cv                  # OpenCV
from sklearn import utils
from tqdm.notebook import tqdm    # progress bar
import os                         # file path handling

# file path
train_dir = 'C:/jupyter_home/data/cat_dog/train'

# Get the label (target) from the file name (e.g. cat.0.jpg): cat => 0, dog => 1
def labeling(img):
    class_name = img.split('.')[0]
    if class_name == 'cat':
        return 0
    if class_name == 'dog':
        return 1

# Variables to hold the label data (t_data) and pixel data (x_data)
x_data = []
t_data = []

# Process the files one at a time
for img in tqdm(os.listdir(train_dir),
                total=len(os.listdir(train_dir)),
                position=0,
                leave=True):
    label_data = labeling(img)   # 0 or 1
    img_path = os.path.join(train_dir, img)
    # Extract the pixel data from img_path (the full image path) with OpenCV
    img_data = cv.resize(cv.imread(img_path, cv.IMREAD_GRAYSCALE), (80,80))
    t_data.append(label_data)        # [0, 1, 1, 0, 0, ...]
    x_data.append(img_data.ravel())  # flatten 80x80 -> 6400 values

t_df = pd.DataFrame({'label' : t_data})
# display(t_df.head())
x_df = pd.DataFrame(x_data)
# display(x_df.head())

# Join x_data and t_data
df = pd.merge(t_df, x_df, left_index=True, right_index=True)
# display(df.head()); print(df.shape)   # (25000, 6401)

# Shuffle the DataFrame rows
shuffle_df = utils.shuffle(df)
# display(shuffle_df.head())

# Save the final DataFrame to a file
shuffle_df.to_csv('C:/jupyter_home/data/cat_dog/train.csv', index=False)
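As an optional sanity check, the CSV can be read back to confirm the shape and the label balance:
check_df = pd.read_csv('C:/jupyter_home/data/cat_dog/train.csv')
print(check_df.shape)                    # (25000, 6401)
print(check_df['label'].value_counts())  # 12,500 of each class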
2. Implementing the CNN in Google Colab
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
# Raw Data Loading
# The data is about 500MB, so it still loads; anything much bigger would require the paid Google Storage service
df = pd.read_csv('/content/drive/MyDrive/Colab 멀캠 이지연/cat_dog/train.csv')
# display(df.head()); print(df.shape)   # (25000, 6401)
x_data = df.drop('label', axis=1, inplace=False).values
t_data = df['label'].values
plt.imshow(x_data[777:778].reshape(80,80),cmap='gray')
plt.show()
# Data Split
train_x_data, test_x_data, train_t_data, test_t_data = \
train_test_split(x_data, t_data, test_size=0.3, stratify=t_data)
# Normalization
scaler = MinMaxScaler()
scaler.fit(train_x_data)
norm_train_x_data = scaler.transform(train_x_data)
norm_test_x_data = scaler.transform(test_x_data)
# Model
model = Sequential()

# CNN: Feature Extracting
model.add(Conv2D(filters=64,
                 kernel_size=(3,3),
                 activation='relu',
                 padding='SAME',
                 input_shape=(80,80,1)))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=128,
                 kernel_size=(3,3),
                 activation='relu',
                 padding='SAME'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64,
                 kernel_size=(3,3),
                 activation='relu',
                 padding='SAME'))
model.add(MaxPooling2D(pool_size=(2,2)))

# DNN: FC Layer
model.add(Flatten())
model.add(Dropout(rate=0.5))

# DNN: Hidden Layer
model.add(Dense(units=256, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))  # Binary Classification

print(model.summary())
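With SAME padding each Conv2D keeps the spatial size and each MaxPooling2D halves it, so the feature maps shrink 80 → 40 → 20 → 10, and Flatten produces 10 x 10 x 64 = 6,400 features.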
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(norm_train_x_data.reshape(-1,80,80,1),
                    train_t_data.reshape(-1,1),
                    epochs=200,
                    batch_size=100,
                    verbose=1,
                    validation_split=0.3)
# Save the model
model.save('/content/drive/MyDrive/Colab 멀캠 이지연/cat_dog/full_data_model/full_data_model.h5')

# Evaluation
result = model.evaluate(norm_test_x_data.reshape(-1,80,80,1),
                        test_t_data.reshape(-1,1))
print(result)   # loss: 0.8640 - accuracy: 0.7831

# To reload the saved model later (needs: from tensorflow.keras.models import load_model):
# new_model = load_model('/content/drive/MyDrive/Colab 멀캠 이지연/cat_dog/full_data_model/full_data_model.h5')
# new_model.evaluate(norm_test_x_data.reshape(-1,80,80,1),
#                    test_t_data.reshape(-1,1))   # loss: 0.8640 - accuracy: 0.7831
# Plot graphs from the history object
train_acc = history.history['accuracy']
train_loss = history.history['loss']
valid_acc = history.history['val_accuracy']
valid_loss = history.history['val_loss']
figure = plt.figure(figsize=(15,5))
ax1 = figure.add_subplot(1,2,1)
ax2 = figure.add_subplot(1,2,2)
ax1.plot(train_acc, color='b', label='training accuracy')
ax1.plot(valid_acc, color='r', label='valid accuracy')
ax1.legend()
ax1.grid()
ax2.plot(train_loss, color='b', label='training loss')
ax2.plot(valid_loss, color='r', label='valid loss')
ax2.legend()
ax2.grid()
plt.tight_layout()
plt.show()
The approach above converted every image into a single CSV file (about 500MB) and loaded it as one DataFrame. If each image were larger, the data could not be loaded into a DataFrame all at once.
→ In that case, use Keras's ImageDataGenerator instead; note that it is slower.
It can read images in two ways:
- 1. From folders: split the data (images) into a folder hierarchy that follows a fixed naming rule (the method used below).
- 2. From a DataFrame (see the sketch after this list).
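Method 2 is not used in this post, but for reference it looks roughly like the sketch below. The DataFrame here is hypothetical: it assumes a 'filename' column with paths relative to directory and a string 'class' column.
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical DataFrame: one row per image file with its string label
file_df = pd.DataFrame({
    'filename': ['cat.0.jpg', 'dog.0.jpg'],
    'class'   : ['cat', 'dog']
})

datagen = ImageDataGenerator(rescale=1/255)
generator = datagen.flow_from_dataframe(
    file_df,
    directory='./data/cat_dog/train',   # folder the filenames are relative to
    x_col='filename',
    y_col='class',
    target_size=(150,150),
    batch_size=20,
    class_mode='binary'
)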
3. Using Keras's ImageDataGenerator: create folders that follow a fixed naming rule and split the data (images) into them
@ Jupyter Notebook
import os, shutil
original_dataset_dir = './data/cat_dog/train'
base_dir = 'data/cat_dog_full'
os.mkdir(base_dir)
train_dir = os.path.join(base_dir,'train').replace('\\','/')
os.mkdir(train_dir)
validation_dir = os.path.join(base_dir,'validation').replace('\\','/')
os.mkdir(validation_dir)
test_dir = os.path.join(base_dir,'test').replace('\\','/')
os.mkdir(test_dir)
train_cats_dir = os.path.join(train_dir,'cats').replace('\\','/')
os.mkdir(train_cats_dir)
train_dogs_dir = os.path.join(train_dir,'dogs').replace('\\','/')
os.mkdir(train_dogs_dir)
validation_cats_dir = os.path.join(validation_dir,'cats').replace('\\','/')
os.mkdir(validation_cats_dir)
validation_dogs_dir = os.path.join(validation_dir,'dogs').replace('\\','/')
os.mkdir(validation_dogs_dir)
test_cats_dir = os.path.join(test_dir,'cats').replace('\\','/')
os.mkdir(test_cats_dir)
test_dogs_dir = os.path.join(test_dir,'dogs').replace('\\','/')
os.mkdir(test_dogs_dir)
## Copy the files ##
## 12,500 cats and 12,500 dogs in total
## train : 7,000 per class
## validation : 3,000 per class
## test : 2,500 per class
fnames = ['cat.{}.jpg'.format(i) for i in range(7000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir,fname).replace('\\','/')
    dst = os.path.join(train_cats_dir, fname).replace('\\','/')
    shutil.copyfile(src,dst)

fnames = ['cat.{}.jpg'.format(i) for i in range(7000,10000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir,fname).replace('\\','/')
    dst = os.path.join(validation_cats_dir, fname).replace('\\','/')
    shutil.copyfile(src,dst)

fnames = ['cat.{}.jpg'.format(i) for i in range(10000,12500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir,fname).replace('\\','/')
    dst = os.path.join(test_cats_dir, fname).replace('\\','/')
    shutil.copyfile(src,dst)

fnames = ['dog.{}.jpg'.format(i) for i in range(7000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir,fname).replace('\\','/')
    dst = os.path.join(train_dogs_dir, fname).replace('\\','/')
    shutil.copyfile(src,dst)

fnames = ['dog.{}.jpg'.format(i) for i in range(7000,10000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir,fname).replace('\\','/')
    dst = os.path.join(validation_dogs_dir, fname).replace('\\','/')
    shutil.copyfile(src,dst)

fnames = ['dog.{}.jpg'.format(i) for i in range(10000,12500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir,fname).replace('\\','/')
    dst = os.path.join(test_dogs_dir, fname).replace('\\','/')
    shutil.copyfile(src,dst)
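A quick count of the copied files confirms the split (7,000 / 3,000 / 2,500 per class):
print(len(os.listdir(train_cats_dir)))        # 7000
print(len(os.listdir(validation_cats_dir)))   # 3000
print(len(os.listdir(test_cats_dir)))         # 2500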
# ImageDataGenerator provided by Keras
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

train_dir = './data/cat_dog_full/train'
valid_dir = './data/cat_dog_full/validation'

#------------------------------------------------------------------#
# Create the ImageDataGenerators
train_datagen = ImageDataGenerator(rescale=1/255)
validation_datagen = ImageDataGenerator(rescale=1/255)

train_generator = train_datagen.flow_from_directory(
    train_dir,                # target directory
    classes=['cats','dogs'],  # cats => 0, dogs => 1
    target_size=(150,150),    # image resize
    batch_size=20,
    class_mode='binary'
)

validation_generator = validation_datagen.flow_from_directory(
    valid_dir,                # target directory
    classes=['cats','dogs'],  # cats => 0, dogs => 1
    target_size=(150,150),    # image resize
    batch_size=20,
    class_mode='binary'
)

# A generator yields batches forever, so break after the first one
for x_data, t_data in train_generator:
    print(x_data.shape)   # (20, 150, 150, 3)
    print(t_data.shape)   # (20,)
    break

# Display one batch of 20 images on a 4 x 5 grid
figure = plt.figure()
ax = []
for i in range(20):
    ax.append(figure.add_subplot(4,5,i+1))   # 4 rows x 5 cols

for x_data, t_data in train_generator:
    print(x_data.shape)   # (20, 150, 150, 3)
    print(t_data.shape)   # (20,)
    for idx, img_data in enumerate(x_data):   # enumerate yields (index, value) pairs
        ax[idx].imshow(img_data)
    break

plt.tight_layout()
plt.show()
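These generators plug straight into fit(). A minimal sketch, assuming a model built like the one in section 2 but with input_shape=(150,150,3), since the generator yields 150x150 color images; the epoch count is just an example value:
# A generator never runs out of batches, so steps_per_epoch / validation_steps
# tell Keras how many batches make up one pass over the data
history = model.fit(train_generator,
                    steps_per_epoch=700,    # 14,000 train images / batch_size 20
                    epochs=30,              # example value
                    validation_data=validation_generator,
                    validation_steps=300,   # 6,000 validation images / batch_size 20
                    verbose=1)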
| 1. 4/11 Mon | 2. 4/12 Tue | 3. 4/13 Wed | 4. 4/14 Thu | 5. 4/15 Fri |
|---|---|---|---|---|
| Deep Learning: Perceptron, Neural Network | Deep Learning: Initialization, ReLU, Drop-out, Early-Stopping | Deep Learning: Image, CNN, Convolution Layer, Channel, Filter, Stride, Padding, Feature Map, Activation Map | Deep Learning: CNN, Feature Extraction, Pooling | Deep Learning: CNN |

| 6. 4/18 Mon | 7. 4/19 Tue | 8. 4/20 Wed | 9. 4/21 Thu | 10. 4/22 Fri |
|---|---|---|---|---|
| Deep Learning: Generator | Deep Learning: Transfer Learning, AWS | Deep Learning | | |