This article is based on the following document:

www.tensorflow.org/guide/keras/train_and_evaluate

(The translation was done by hand with help from Papago and Google Translate, so please excuse any awkward phrasing.)


Part 2: Writing your own training & evaluation loops from scratch

If you want lower-level control over training and evaluation than fit() and evaluate() provide, you can write your own loops from scratch; it is actually quite simple to customize. Be warned, though: debugging will then take a lot more of your own effort.

 

Using the GradientTape: a first end-to-end example

To retrieve the gradients of a layer's trainable weights with respect to a loss value, you run the forward pass inside a GradientTape scope. Using an optimizer instance, you can then apply those gradients to update the variables in model.trainable_variables.

Let's look at an end-to-end example:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Prepare the data: flattened MNIST digits, holding out the last
# 10,000 training samples for validation (as in the original guide).
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_val, y_val = x_train[-10000:], y_train[-10000:]
x_train, y_train = x_train[:-10000], y_train[:-10000]

# Get the model. The final Dense layer has no softmax, so the model
# outputs raw logits.
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

# Iterate over epochs.
for epoch in range(3):
  print('Start of epoch %d' % (epoch,))

  # Iterate over the batches of the dataset.
  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

    # Open a GradientTape to record the operations run
    # during the forward pass, which enables autodifferentiation.
    with tf.GradientTape() as tape:

      # Run the forward pass of the layer.
      # The operations that the layer applies
      # to its inputs are going to be recorded
      # on the GradientTape.
      logits = model(x_batch_train, training=True)  # Logits for this minibatch

      # Compute the loss value for this minibatch.
      loss_value = loss_fn(y_batch_train, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_variables)

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # Log every 200 batches.
    if step % 200 == 0:
        print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
        print('Seen so far: %s samples' % ((step + 1) * batch_size))
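
As an aside: an eager loop like this is easy to debug, but once your train step works you can usually make it much faster by compiling the per-batch work with tf.function. A minimal sketch (the same logic as above, just wrapped so TensorFlow can trace and optimize it as a graph):

@tf.function
def train_step(x, y):
  with tf.GradientTape() as tape:
    logits = model(x, training=True)
    loss_value = loss_fn(y, logits)
  grads = tape.gradient(loss_value, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss_value

# The outer epoch/batch iteration stays the same.
for epoch in range(3):
  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
    loss_value = train_step(x_batch_train, y_batch_train)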

 

Low-level handling of metrics

In loops like these, you can easily reuse built-in metrics (or custom ones you wrote). The flow, demonstrated by the snippets below, is:

  • Instantiate the metric at the start of the loop.
  • Call metric.update_state() after each batch.
  • Call metric.result() when you need to display the current value of the metric.
  • Call metric.reset_states() when you need to clear the state of the metric (typically at the end of an epoch).
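
In isolation, those four calls look like this (a minimal sketch; the label and prediction values are made up for illustration):

m = keras.metrics.SparseCategoricalAccuracy()

# One "batch" of two samples; both argmax predictions match the labels.
m.update_state([1, 2], [[0.1, 0.9, 0.0], [0.0, 0.2, 0.8]])
print(float(m.result()))  # 1.0

# Clear the accumulated state, as you would at the end of an epoch.
m.reset_states()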

Here's a full loop using a SparseCategoricalAccuracy metric on both the training and validation data:

# Get model
inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)  # raw logits
model = keras.Model(inputs=inputs, outputs=outputs)

# Instantiate an optimizer to train the model.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()

# Prepare the training dataset.
batch_size = 64
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

# Prepare the validation dataset.
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(64)


# Iterate over epochs.
for epoch in range(3):
  print('Start of epoch %d' % (epoch,))

  # Iterate over the batches of the dataset.
  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
    with tf.GradientTape() as tape:
      logits = model(x_batch_train, training=True)
      loss_value = loss_fn(y_batch_train, logits)
    grads = tape.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # Update training metric.
    train_acc_metric.update_state(y_batch_train, logits)

    # Log every 200 batches.
    if step % 200 == 0:
        print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
        print('Seen so far: %s samples' % ((step + 1) * batch_size))

  # Display metrics at the end of each epoch.
  train_acc = train_acc_metric.result()
  print('Training acc over epoch: %s' % (float(train_acc),))
  # Reset training metrics at the end of each epoch
  train_acc_metric.reset_states()

  # Run a validation loop at the end of each epoch.
  for x_batch_val, y_batch_val in val_dataset:
    val_logits = model(x_batch_val, training=False)
    # Update val metrics
    val_acc_metric.update_state(y_batch_val, val_logits)
  val_acc = val_acc_metric.result()
  val_acc_metric.reset_states()
  print('Validation acc: %s' % (float(val_acc),))
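
The same loop works unchanged with a metric you define yourself. As an illustrative sketch (this class is made up for the example, not part of the guide), here is a custom metric that tracks the mean confidence of the model's top prediction:

class MeanTopConfidence(keras.metrics.Metric):

  def __init__(self, name='mean_top_confidence', **kwargs):
    super(MeanTopConfidence, self).__init__(name=name, **kwargs)
    self.total = self.add_weight(name='total', initializer='zeros')
    self.count = self.add_weight(name='count', initializer='zeros')

  def update_state(self, y_true, y_pred, sample_weight=None):
    # y_pred are logits; convert to probabilities, take the max per sample.
    probs = tf.nn.softmax(y_pred)
    self.total.assign_add(tf.reduce_sum(tf.reduce_max(probs, axis=-1)))
    self.count.assign_add(tf.cast(tf.shape(y_pred)[0], tf.float32))

  def result(self):
    return self.total / self.count

  def reset_states(self):
    self.total.assign(0.)
    self.count.assign(0.)

An instance of this class plugs into the loop exactly like the accuracy metrics above: update_state() after each batch, result() and reset_states() at the end of each epoch.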

 

Low-level handling of extra losses

Earlier, we saw that a layer can create regularization losses by calling self.add_loss(value) in its call method.

In the general case, you will want to take these losses into account in your custom training loops (unless you wrote the model yourself and already know that it creates no such losses). Consider this layer, which creates an activity regularization loss:

class ActivityRegularizationLayer(layers.Layer):

  def call(self, inputs):
    self.add_loss(1e-2 * tf.reduce_sum(inputs))
    return inputs

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
# Insert activity regularization as a layer
x = ActivityRegularizationLayer()(x)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)  # raw logits

model = keras.Model(inputs=inputs, outputs=outputs)

Having built the model, let's run a forward pass:

logits = model(x_train[:64])
print(model.losses)

The losses created during the forward pass are collected into model.losses.

Note that the tracked losses are cleared at the start of the model's __call__, so you only ever see the losses created during a single forward pass. For instance, calling the model repeatedly and then querying model.losses only shows the latest losses, created during the last call:

logits = model(x_train[:64])
logits = model(x_train[64: 128])
logits = model(x_train[128: 192])
print(model.losses)
[<tf.Tensor: id=999851, shape=(), dtype=float32, numpy=6.88884>]

 

To take all of these losses into account during training, all you have to do is modify your training loop to add sum(model.losses) to the total loss value:

optimizer = keras.optimizers.SGD(learning_rate=1e-3)

for epoch in range(3):
  print('Start of epoch %d' % (epoch,))

  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
    with tf.GradientTape() as tape:
      logits = model(x_batch_train, training=True)
      loss_value = loss_fn(y_batch_train, logits)

      # Add extra losses created during this forward pass:
      loss_value += sum(model.losses)

    grads = tape.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # Log every 200 batches.
    if step % 200 == 0:
        print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
        print('Seen so far: %s samples' % ((step + 1) * batch_size))