4/28 Thu_ReduceLROnPlateau, Optimizer(Momentum, AdaGrad, RMSProp)



Jeon2 2022. 4. 29. 02:05


1. ReduceLROnPlateau()

A callback that lowers the learning rate when the monitored metric stops improving, to nudge the model toward improving again

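from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate when val_loss has not improved for 5 epochs
reduce_lr = ReduceLROnPlateau(monitor='val_loss', patience=5, mode='auto', verbose=1, factor=0.5)

# Or pass the callback directly when training.
# fit_generator is deprecated in TF 2.x; model.fit accepts generators as well.
# The monitored name must match the metric name used in model.compile.
model.fit(
    train_generator, epochs=50, validation_data=val_generator,
    callbacks=[ReduceLROnPlateau(monitor='val_acc', factor=0.2, patience=10, verbose=1, mode='auto', min_lr=1e-05)])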

Arguments

monitor: The metric the callback watches to decide when to reduce the learning rate (e.g. val_loss)
factor: How much to reduce the learning rate. The new learning rate is the current learning rate * factor
patience: How many epochs to wait, counted from the best monitored value so far, without any improvement before the learning rate is reduced
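
To make the roles of factor and patience concrete, here is a minimal pure-Python sketch of the plateau logic. It is a simplification written for this note, not the Keras implementation: the function name simulate_reduce_lr_on_plateau is made up, it assumes a "lower is better" metric such as val_loss, and it ignores options like cooldown and min_delta.

```python
def simulate_reduce_lr_on_plateau(val_losses, lr=1e-3, factor=0.5, patience=5, min_lr=1e-5):
    """Return the learning rate used at each epoch (hypothetical helper, 'min' mode only)."""
    best = float("inf")   # best monitored value seen so far
    wait = 0              # epochs since the monitored value last improved
    lr_per_epoch = []
    for loss in val_losses:
        lr_per_epoch.append(lr)
        if loss < best:
            # Improvement: remember it and reset the patience counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: shrink the learning rate,
                # but never go below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
    return lr_per_epoch


# val_loss stalls at 0.9, so with patience=3 the rate is halved for the sixth epoch
lrs = simulate_reduce_lr_on_plateau([1.0, 0.9, 0.9, 0.9, 0.9, 0.9], lr=0.1, factor=0.5, patience=3)
```

With a larger patience the callback tolerates longer plateaus before it reacts, and min_lr keeps repeated reductions from driving the learning rate toward zero.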

Other callbacks include ModelCheckpoint (saves the model weights during training) and EarlyStopping (stops training early when model performance stops improving)

2. Momentum

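An optimization method (optimizer) that, unlike plain SGD, uses an additional variable v (velocity, as in physics)

Momentum is the physical quantity of the same name: the parameters are pushed in the downhill direction indicated by the gradient and accelerate, so the update moves like a ball rolling down a slope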
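
The velocity idea can be sketched in a few lines of plain Python. This is an illustrative toy, assuming the loss f(w) = w ** 2 and the common update v = momentum * v - lr * grad, w = w + v; the function names and hyperparameter values are made up for the example.

```python
def grad(w):
    # Gradient of the toy loss f(w) = w ** 2
    return 2.0 * w


def plain_sgd(w, lr=0.1, steps=3):
    # Baseline: each step depends only on the current gradient
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


def sgd_momentum(w, lr=0.1, momentum=0.9, steps=3):
    v = 0.0  # velocity starts at rest
    for _ in range(steps):
        # Keep a fraction of the previous velocity and add the new gradient push
        v = momentum * v - lr * grad(w)
        w = w + v
    return w, v


w_plain = plain_sgd(1.0, steps=2)
w_mom, v_mom = sgd_momentum(1.0, steps=2)
```

Starting from w = 1.0, the momentum version ends up closer to the minimum after two steps because the second step reuses part of the first step's velocity.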

3. AdaGrad(Adaptive Gradient)

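An optimization method that adds per-parameter learning rate adjustment, based on how much each parameter has been updated so far (the accumulated squared gradients)

That is, parameters that have changed little get a larger learning rate, and parameters that have changed a lot get a smaller one

(A parameter that has already changed a lot is assumed to be close to the minimum, so it moves in small steps and is fine-tuned)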
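
A minimal sketch of this per-parameter scaling, assuming the usual AdaGrad form h += g * g and w -= lr * g / (sqrt(h) + eps). The helper names and the example gradients are invented for illustration.

```python
import math


def adagrad_step(w, g, h, lr=0.1, eps=1e-8):
    """One AdaGrad update over lists of parameters w, gradients g, accumulators h."""
    # Accumulate squared gradients; h only ever grows
    h = [hi + gi * gi for hi, gi in zip(h, g)]
    # Each parameter is scaled by its own accumulated gradient magnitude
    w = [wi - lr * gi / (math.sqrt(hi) + eps) for wi, gi, hi in zip(w, g, h)]
    return w, h


def effective_lr(h, lr=0.1, eps=1e-8):
    # The step size multiplier each parameter currently sees
    return [lr / (math.sqrt(hi) + eps) for hi in h]


# The first parameter receives large gradients, the second small ones
w, h = adagrad_step([0.0, 0.0], [10.0, 0.1], [0.0, 0.0])
rates = effective_lr(h)  # the first rate is much smaller than the second
```

Because h never decreases, the effective learning rate can only shrink over time, which is the behavior RMSProp was designed to relax.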

4. RMSProp(Root Mean Square Propagation)

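An optimization method that, instead of simply summing all past gradients, gives more weight to the most recent gradient information

To weight past information lightly and recent information heavily, it uses an exponential moving average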
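
The exponential moving average can be sketched as follows. This assumes the textbook form h = rho * h + (1 - rho) * g * g with the defaults shown in the class signature in this section; the helper name rmsprop_step is made up, and the exact placement of epsilon in a real library may differ.

```python
import math


def rmsprop_step(w, g, h, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSProp update for a single parameter w with gradient g."""
    # Exponential moving average of squared gradients:
    # old information decays by rho, the newest gradient gets weight (1 - rho)
    h = rho * h + (1.0 - rho) * g * g
    w = w - lr * g / (math.sqrt(h) + eps)
    return w, h


w, h = rmsprop_step(1.0, 2.0, 0.0)   # h becomes roughly 0.4
w, h = rmsprop_step(w, 0.0, h)       # with a zero gradient, h decays to roughly 0.36
```

Unlike the AdaGrad accumulator, h here can shrink again when recent gradients are small, so the effective learning rate is able to recover.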

- RMSprop class

tf.keras.optimizers.RMSprop(
    learning_rate=0.001,
    rho=0.9,
    momentum=0.0,
    epsilon=1e-07,
    centered=False,
    name="RMSprop",
    **kwargs
)

Arguments

learning_rate: A Tensor, floating point value, or a schedule that is a tf.keras.optimizers.schedules.LearningRateSchedule, or a callable that takes no arguments and returns the actual value to use. The learning rate. Defaults to 0.001.
rho: Discounting factor for the history/coming gradient. Defaults to 0.9.
momentum: A scalar or a scalar Tensor. Defaults to 0.0.
epsilon: A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper. Defaults to 1e-7.
centered: Boolean. If True, gradients are normalized by the estimated variance of the gradient; if False, by the uncentered second moment. Setting this to True may help with training, but is slightly more expensive in terms of computation and memory. Defaults to False.
name: Optional name prefix for the operations created when applying gradients. Defaults to "RMSprop".
**kwargs: keyword arguments. Allowed arguments are clipvalue, clipnorm, global_clipnorm. If clipvalue (float) is set, the gradient of each weight is clipped to be no higher than this value. If clipnorm (float) is set, the gradient of each weight is individually clipped so that its norm is no higher than this value. If global_clipnorm (float) is set the gradient of all weights is clipped so that their global norm is no higher than this value.
