Explained Simply by a Grad Student

All posts

[Programmers] Finding Lucy and Ella 2022.06.18
[Programmers] Finding Lost Records 2022.06.18
unable to create link(name already exists) error 2022.03.02
Numerical Feature Engineering: a Gaussian Rank example 2022.02.19
Traffic forecasting using graph neural networks and LSTM 2022.02.06
Structured data learning with TabTransformer 2022.01.23
Installing Mecab on Colab 2022.01.11
Binary Accuracy vs Accuracy in TF 2021.11.10
Implementing a TensorFlow learning rate warm-up scheduler 2021.08.06
A look at useful Python decorators 2021.07.31


[Programmers] Finding Lucy and Ella

SELECT ANIMAL_ID
    , NAME  
    , SEX_UPON_INTAKE
    
    FROM ANIMAL_INS
    WHERE NAME REGEXP "^(Lucy|Ella|Pickle|Rogan|Sabrina|Mitty)$"
    ORDER BY ANIMAL_ID
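
As a quick check on the pattern itself, the anchored alternation can be tried outside the database. The sketch below uses Python's `re` module with the same pattern; the candidate names are made-up sample values, and case handling may differ from MySQL's `REGEXP` depending on the column collation, so this only illustrates the effect of `^` and `$`.

```python
import re

# Same pattern as in the query: ^ and $ force a whole-string match.
pattern = re.compile(r"^(Lucy|Ella|Pickle|Rogan|Sabrina|Mitty)$")

# Hypothetical sample names, not taken from the real ANIMAL_INS table.
names = ["Lucy", "Ella", "Lucy Jr", "Mitty", "Bella", "Pickles"]

# Keep only the names that match the pattern from start to end.
matched = [name for name in names if pattern.search(name)]
print(matched)
```

Without the anchors, names such as "Bella" or "Pickles" would also match, because they contain one of the listed names as a substring.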


[Programmers] Finding Lost Records

SELECT T1.ANIMAL_ID
    , T1.NAME
    FROM ANIMAL_OUTS T1
        LEFT JOIN ANIMAL_INS T2
            ON T1.ANIMAL_ID = T2.ANIMAL_ID
                AND T1.ANIMAL_TYPE = T2.ANIMAL_TYPE
    WHERE T2.ANIMAL_ID IS NULL
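
The query above is an anti-join: the LEFT JOIN keeps every row of ANIMAL_OUTS, and the IS NULL filter keeps only the rows that found no partner in ANIMAL_INS. A minimal way to try it locally is sketched below with Python's built-in `sqlite3`. The three-column schema and all rows are assumptions made for illustration (the real tables have more columns), and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ANIMAL_INS  (ANIMAL_ID TEXT, ANIMAL_TYPE TEXT, NAME TEXT);
    CREATE TABLE ANIMAL_OUTS (ANIMAL_ID TEXT, ANIMAL_TYPE TEXT, NAME TEXT);
    -- Made-up rows: A2 and A4 appear in ANIMAL_OUTS but have no intake record.
    INSERT INTO ANIMAL_INS  VALUES ('A1', 'Dog', 'Rex'), ('A3', 'Cat', 'Momo');
    INSERT INTO ANIMAL_OUTS VALUES ('A1', 'Dog', 'Rex'), ('A2', 'Cat', 'Nabi'),
                                   ('A3', 'Cat', 'Momo'), ('A4', 'Dog', 'Bori');
    """
)

# Same anti-join as in the post, plus ORDER BY for a stable result.
rows = conn.execute(
    """
    SELECT T1.ANIMAL_ID, T1.NAME
    FROM ANIMAL_OUTS T1
        LEFT JOIN ANIMAL_INS T2
            ON T1.ANIMAL_ID = T2.ANIMAL_ID
                AND T1.ANIMAL_TYPE = T2.ANIMAL_TYPE
    WHERE T2.ANIMAL_ID IS NULL
    ORDER BY T1.ANIMAL_ID
    """
).fetchall()
conn.close()
print(rows)
```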


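unable to create link(name already exists) error

This error can usually be resolved by checking three things.

1. The name of each Custom Layer
2. The name of each Variable
3. The name of the Model

Items 1 and 2 are the main causes; item 3 almost never is (actually, never..).

Where a Custom Layer is stacked or used several times, check that a name attribute is assigned each time, or change the code as follows.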

stacked_layer = [CustomLayer(~, name = f'layer_{i}') for i in range(10)]

If the error still occurs, check whether the code contains something like the following, as described in item 2.

class CustomLayer(Layer):
  def __init__(self, ~):
    a = tf.Variable(shape)
    b = tf.Variable(shape)

Give each variable a different name.

class CustomLayer(Layer):
  def __init__(self, ~):
    a = tf.Variable(shape, name = 'a')
    b = tf.Variable(shape, name = 'b')
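
Before saving, it can help to list which names collide. The helper below, `find_duplicate_names`, is a hypothetical utility written for this note, not part of TensorFlow, and the weight names are made-up examples. In a Keras model the input would presumably be something like `[w.name for w in model.weights]`; that usage is an assumption and is not exercised here.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]


# Hypothetical weight names: two variables ended up with the same default name.
weight_names = [
    "layer_0/a:0",
    "layer_0/Variable:0",
    "layer_0/Variable:0",
    "layer_1/a:0",
]
duplicates = find_duplicate_names(weight_names)
print(duplicates)
```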


Numerical Feature Engineering: a Gaussian Rank example

For most models, not just neural networks, normalizing or standardizing the range of the data before training is very important.

Using these methods to bring value ranges closer together, or to give the data the zero-mean form of a standard normal distribution, helps a great deal during training.

In real-world data, however, balanced value ranges or normally distributed features are rare; features are usually skewed, with a long tail on one side. Skewed data is troublesome: before it goes into a model, you have to stop and think about which method to use in the preprocessing (feature engineering) step.

Three basic methods come to mind as candidates.

Min-Max Scaler; normalization
Standard Mean/Std Scaler; standardization
Apply Log-function; log transform
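
The three candidates can be compared on a small skewed sample. The sketch below uses only NumPy; the log-normal sample and the `skewness` helper are assumptions made for illustration, not part of the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up right-skewed sample (log-normal), standing in for a real feature.
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# 1. Min-max scaling: maps values into [0, 1].
x_minmax = (x - x.min()) / (x.max() - x.min())

# 2. Standardization: zero mean, unit standard deviation.
x_standard = (x - x.mean()) / x.std()

# 3. Log transform: log1p compresses the long right tail (requires x > -1).
x_log = np.log1p(x)


def skewness(values):
    """Sample skewness: the third standardized moment."""
    centered = values - values.mean()
    return float((centered ** 3).mean() / values.std() ** 3)


print(skewness(x), skewness(x_minmax), skewness(x_standard), skewness(x_log))
```

Min-max scaling and standardization are affine maps, so they change the range but leave the skewness untouched; of the three, only the log transform changes the shape of the distribution.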

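These methods can improve performance considerably, and beyond them the Gaussian Rank method is also worth considering.

The Gaussian Rank method transforms the distribution of a numeric feature into a normal distribution.

Rank the values (sort order) and rescale the ranks to lie between -1 and 1, clipping them just inside that interval.
Apply the inverse error function to the rescaled ranks so that the result looks like a normal distribution.

According to the interview with the first-place solution of Porto Seguro's Safe Driver Prediction (a Kaggle competition, see the URL below),
the Gaussian Rank method performed best, ahead of both the Min-Max Scaler and the Standard mean/std Scaler.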

https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/discussion/44629#250927


I found three implementations in total; please refer to the three code sources below.

NumPy implementation - 1 (from GitHub): https://github.com/affjljoo3581/Job-Recommend-Competition
NumPy implementation - 2 (from Kaggle): https://www.kaggle.com/tottenham/10-fold-simple-dnn-with-rank-gauss
sklearn, using QuantileTransformer (from Kaggle): https://www.kaggle.com/kushal1506/moa-pytorch-0-01859-rankgauss-pca-nn

< Common imports >

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from scipy.special import erfinv
from sklearn.preprocessing import QuantileTransformer

np.random.seed(42)

First, generate dummy data to use as the example.

# Generate data
X = (np.random.randn(200, 1) * 5 + 10) ** 2
df = pd.DataFrame(X)
df.head()

1. NumPy implementation - 1

Applying Gaussian Rank

epsilon = 1e-4
noise_scale = 0.001

gaussian_noise = np.random.normal(0, 1, df[df.columns[0]].shape)
transformed_data = df[df.columns[0]] + noise_scale * df[df.columns[0]].max() * gaussian_noise

data_rank = np.argsort(np.argsort(transformed_data))
data_rank = 2 * data_rank / data_rank.max() - 1

clipped_data_rank = np.clip(data_rank, -1 + epsilon, 1 - epsilon)
# Apply the inverse error function
transformed_data = erfinv(clipped_data_rank)

Checking the result

plt.figure(figsize = (10, 5))

plt.subplot(1, 2, 1)
plt.hist(df[df.columns[0]])
plt.title('Before')

plt.subplot(1, 2, 2)
plt.hist(transformed_data)
plt.title('After Gaussian Rank')

plt.tight_layout()
plt.show()

2. NumPy implementation - 2

Function definition

def rank_gauss(x):
    N = x.shape[0]
    temp = x.argsort()
    
    rank_x = temp.argsort() / N
    rank_x -= rank_x.mean()
    rank_x *= 2
    
    efi_x = erfinv(rank_x)
    efi_x -= efi_x.mean()
    return efi_x
    
transformed_data = rank_gauss(df[df.columns[0]])
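
One detail worth noting: `erfinv` applied to ranks in (-1, 1) gives values whose standard deviation is about 1/√2 ≈ 0.71 rather than 1, because the standard normal quantile function is Φ⁻¹(p) = √2 · erfinv(2p − 1). The sketch below is a variant, not the code from the linked notebooks: it maps ranks to quantiles in (0, 1) and uses the standard library's `statistics.NormalDist().inv_cdf` in place of `erfinv`, so the result has roughly unit variance. The function name `rank_gauss_unit_variance` and the sample data are assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist


def rank_gauss_unit_variance(x):
    """Map values to approximately N(0, 1) via their ranks."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    ranks = x.argsort().argsort()      # 0 .. n-1, the smallest value gets rank 0
    quantiles = (ranks + 0.5) / n      # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf     # standard normal quantile function
    return np.array([inv_cdf(q) for q in quantiles])


rng = np.random.default_rng(42)
# Same recipe as the dummy data in the post, drawn from a different generator.
skewed = (rng.standard_normal(200) * 5 + 10) ** 2
z = rank_gauss_unit_variance(skewed)
print(round(float(z.mean()), 3), round(float(z.std()), 3))
```

Whether the extra √2 matters depends on the model; it only rescales the feature, and the ordering of the values is preserved either way.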

Checking the result

3. Using sklearn QuantileTransformer

transformer = QuantileTransformer(n_quantiles=100, random_state=0, output_distribution="normal")
transformer.fit(X)

transformed_data = transformer.transform(X)

Checking the result


Traffic forecasting using graph neural networks and LSTM

This post translates the following Keras example.

https://keras.io/examples/timeseries/timeseries_traffic_forecasting/


Introduction

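This example uses a GNN and an LSTM to forecast traffic conditions. Specifically, it uses the history of traffic speed on a set of road segments to predict future traffic speed.

A common approach to this problem is to treat the traffic speed of each road segment as a separate time series and to predict the future from the past values of that series.

A plain time series approach, however, cannot use the traffic speed of adjacent roads. To exploit the complex relationships between neighboring roads, the traffic network is represented as a graph, and traffic speed is treated as a signal on that graph. This example builds a model that takes time series data defined over a graph as input. We first look at how to process the data and create a tf.data.Dataset for forecasting over graphs. We then implement a model composed of a GCN (Graph Convolution Network) and an LSTM.

The data preprocessing and model architecture follow this paper: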

Yu, Bing, Haoteng Yin, and Zhanxing Zhu. "Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting." Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018. (github)

Setup

import pandas as pd
import numpy as np
import os
import typing
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

Data preparation

Data description

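We use the real-world traffic speed dataset PeMSD7. The dataset was collected and prepared by Yu et al., 2018, and can be downloaded from the authors' GitHub repository (the URL used in the loading code below). See the paper for a detailed description of the dataset.

The dataset consists of two files.

W_228.csv: pairwise distances between 228 roads in District 7 of California
V_228.csv: traffic speed data for weekdays in May and June 2012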

Loading Data

url = "https://github.com/VeritasYin/STGCN_IJCAI-18/raw/master/data_loader/PeMS-M.zip"
data_dir = keras.utils.get_file(origin=url, extract=True, archive_format="zip")
data_dir = os.path.splitext(data_dir)[0]  # drop the ".zip" suffix (rstrip strips a character set, not a suffix)

route_distances = pd.read_csv(
    os.path.join(data_dir, "W_228.csv"), header=None
).to_numpy()
speeds_array = pd.read_csv(os.path.join(data_dir, "V_228.csv"), header=None).to_numpy()

print(f"route_distances shape={route_distances.shape}") # (228, 228)
print(f"speeds_array shape={speeds_array.shape}") # (12672, 228)

sub-sampling roads

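To reduce the problem size and speed up training, we use only 26 of the 228 roads. The selection starts from road 0, and the resulting sample_routes list holds 26 roads chosen so that their speed series are correlated with one another.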

sample_routes = [
    0, 1, 4,
    7, 8, 11,
    15, 108, 109,
    114, 115, 118,
    120, 123, 124,
    126, 127, 129,
    130, 132, 133,
    136, 139, 144,
    147, 216,
]
route_distances = route_distances[np.ix_(sample_routes, sample_routes)]
speeds_array = speeds_array[:, sample_routes]

print(f"route_distances shape={route_distances.shape}") # (26, 26)
print(f"speeds_array shape={speeds_array.shape}") # (12672, 26)
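
The distance matrix has to be restricted along both axes, which is what `np.ix_` does, while the speed array only needs its columns restricted. A small sketch on toy arrays (all values and the `keep` list are made up for illustration):

```python
import numpy as np

# Toy 5x5 "distance" matrix: entry (i, j) is 10*i + j so positions are easy to read.
distances = np.arange(5)[:, None] * 10 + np.arange(5)[None, :]
# Toy speeds: 3 time steps x 5 roads.
speeds = np.arange(15).reshape(3, 5)

keep = [0, 2, 4]  # hypothetical subset of road indices

sub_distances = distances[np.ix_(keep, keep)]  # rows AND columns restricted to `keep`
sub_speeds = speeds[:, keep]                   # only the columns (roads) restricted
diagonal_only = distances[keep, keep]          # pairs up indices instead: (0,0), (2,2), (4,4)

print(sub_distances)
print(sub_speeds.shape)
print(diagonal_only)
```

Indexing with `distances[keep, keep]` returns only three elements, which is why the post uses `np.ix_` to get the full 3x3 sub-matrix.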

Data Visualization

We select two roads and plot their speeds.

plt.figure(figsize=(18, 6))
plt.plot(speeds_array[:, [0, -1]])
plt.legend(["route_0", "route_25"])

The correlation between roads can also be visualized.

plt.figure(figsize=(8, 8))
plt.matshow(np.corrcoef(speeds_array.T), 0)
plt.xlabel("road number")
plt.ylabel("road number")

Splitting and normalizing data

train_size, val_size = 0.5, 0.2


def preprocess(data_array: np.ndarray, train_size: float, val_size: float):
    """Splits data into train/val/test sets and normalizes the data.

    Args:
        data_array: ndarray of shape `(num_time_steps, num_routes)`
        train_size: A float value between 0.0 and 1.0 that represent the proportion of the dataset
            to include in the train split.
        val_size: A float value between 0.0 and 1.0 that represent the proportion of the dataset
            to include in the validation split.

    Returns:
        `train_array`, `val_array`, `test_array`
    """

    # Split the data
    num_time_steps = data_array.shape[0]
    num_train, num_val = (
        int(num_time_steps * train_size),
        int(num_time_steps * val_size),
    )
    train_array = data_array[:num_train]
    mean, std = train_array.mean(axis=0), train_array.std(axis=0)

    # Normalize using the training-set mean and std
    train_array = (train_array - mean) / std
    val_array = (data_array[num_train : (num_train + num_val)] - mean) / std
    test_array = (data_array[(num_train + num_val) :] - mean) / std

    return train_array, val_array, test_array


train_array, val_array, test_array = preprocess(speeds_array, train_size, val_size)

print(f"train set size: {train_array.shape}") # (6336, 26)
print(f"validation set size: {val_array.shape}") # (2534, 26)
print(f"test set size: {test_array.shape}") # (3802, 26)

Creating TensorFlow Datasets

Next, we build the Dataset for the forecasting problem. In this example, a sequence of values at times t+1, t+2, ..., t+T is used to predict future traffic speed at times t+T+1, ..., t+T+h. For each time t, the input is T vectors of size N, and the prediction (or target) is h vectors of size N, where N is the number of roads.

To build the dataset we use the timeseries_dataset_from_array() function provided by Keras. The create_tf_dataset() function below takes a numpy.ndarray and returns a tf.data.Dataset. Inside this function, input_sequence_length and forecast_horizon correspond to T and h above.

The multi_horizon argument needs some extra explanation. Assume forecast_horizon=3. If multi_horizon=True, the model forecasts times t+T+1, t+T+2, t+T+3, and each target has shape (3, N). If multi_horizon=False, only the last time step t+T+3 is forecast, and each target has shape (1, N).

The input tensor used here has shape (batch_size, input_sequence_length, num_routes, 1). The last dimension is there to keep the model input general.
For example, to use the temperature of each road as an additional feature, each road (num_routes) would have two features in the last dimension, speed and temperature, giving (batch_size, input_sequence_length, num_routes, 2). This example does not go that far and always keeps the last dimension at 1.

We use the past 12 values to forecast 3 future values.
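
The windowing can be checked on a toy series with plain NumPy. `make_windows` below is a hypothetical helper written for this note; it is not the Keras function and ignores batching and shuffling, but it is meant to pair inputs and targets in the way the text describes.

```python
import numpy as np


def make_windows(data, input_len, horizon, multi_horizon=True):
    """Build (inputs, targets) arrays from a (num_time_steps, num_routes) array."""
    num_steps = data.shape[0]
    inputs, targets = [], []
    # The last usable start leaves room for input_len past and horizon future steps.
    for start in range(num_steps - input_len - horizon + 1):
        past = data[start : start + input_len]
        future = data[start + input_len : start + input_len + horizon]
        inputs.append(past[..., np.newaxis])  # add the trailing feature axis
        targets.append(future if multi_horizon else future[-1:])
    return np.stack(inputs), np.stack(targets)


# Toy series: 20 time steps, 4 routes; each value equals its time index.
series = np.repeat(np.arange(20)[:, None], 4, axis=1).astype(float)

x_multi, y_multi = make_windows(series, input_len=12, horizon=3, multi_horizon=True)
x_last, y_last = make_windows(series, input_len=12, horizon=3, multi_horizon=False)

print(x_multi.shape, y_multi.shape)  # (6, 12, 4, 1) (6, 3, 4)
print(x_last.shape, y_last.shape)    # (6, 12, 4, 1) (6, 1, 4)
```

For the first window the inputs cover time indices 0 to 11; the multi-horizon target covers indices 12, 13, 14, and the single-horizon target is index 14 only.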

from tensorflow.keras.preprocessing import timeseries_dataset_from_array

batch_size = 64
input_sequence_length = 12
forecast_horizon = 3
multi_horizon = False


def create_tf_dataset(
    data_array: np.ndarray,
    input_sequence_length: int,
    forecast_horizon: int,
    batch_size: int = 128,
    shuffle=True,
    multi_horizon=True,
):
    """Creates tensorflow dataset from numpy array.

    This function creates a dataset where each element is a tuple `(inputs, targets)`.
    `inputs` is a Tensor
    of shape `(batch_size, input_sequence_length, num_routes, 1)` containing
    the `input_sequence_length` past values of the timeseries for each node.
    `targets` is a Tensor of shape `(batch_size, forecast_horizon, num_routes)`
    containing the `forecast_horizon`
    future values of the timeseries for each node.

    Args:
        data_array: np.ndarray with shape `(num_time_steps, num_routes)`
        input_sequence_length: Length of the input sequence (in number of timesteps).
        forecast_horizon: If `multi_horizon=True`, the target will be the values of the timeseries for 1 to
            `forecast_horizon` timesteps ahead. If `multi_horizon=False`, the target will be the value of the
            timeseries `forecast_horizon` steps ahead (only one value).
        batch_size: Number of timeseries samples in each batch.
        shuffle: Whether to shuffle output samples, or instead draw them in chronological order.
        multi_horizon: See `forecast_horizon`.

    Returns:
        A tf.data.Dataset instance.
    """

    inputs = timeseries_dataset_from_array(
        np.expand_dims(data_array[:-forecast_horizon], axis=-1),
        None,
        sequence_length=input_sequence_length,
        shuffle=False,
        batch_size=batch_size,
    ) # shape of one input batch = (64, 12, 26, 1)

    target_offset = (
        input_sequence_length
        if multi_horizon
        else input_sequence_length + forecast_horizon - 1
    )
    # If multi_horizon is True, forecast_horizon steps are used as the target
    target_seq_length = forecast_horizon if multi_horizon else 1
    targets = timeseries_dataset_from_array(
        data_array[target_offset:], # starts at target_offset
        None,
        sequence_length=target_seq_length, # window of target_seq_length steps
        shuffle=False,
        batch_size=batch_size,
    ) # (64, 3, 26)

    dataset = tf.data.Dataset.zip((inputs, targets))
    if shuffle:
        dataset = dataset.shuffle(100)

    return dataset.prefetch(16).cache()


# Yields batches of shape (64, 12, 26, 1) and (64, 3, 26)
train_dataset, val_dataset = (
    create_tf_dataset(data_array, input_sequence_length, forecast_horizon, batch_size)
    for data_array in [train_array, val_array]
)

test_dataset = create_tf_dataset(
    test_array,
    input_sequence_length,
    forecast_horizon,
    batch_size=test_array.shape[0],
    shuffle=False,
    multi_horizon=multi_horizon,
)

Roads Graph

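The PeMSD7 dataset includes the distances between road segments. We use these distances to build an adjacency matrix. As in the paper, an edge exists in the graph between two roads if the distance between them is below a threshold.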

def compute_adjacency_matrix(
    route_distances: np.ndarray, sigma2: float, epsilon: float
):
    """Computes the adjacency matrix from distances matrix.

    It uses the formula in https://github.com/VeritasYin/STGCN_IJCAI-18#data-preprocessing to
    compute an adjacency matrix from the distance matrix.
    The implementation follows that paper.

    Args:
        route_distances: np.ndarray of shape `(num_routes, num_routes)`. Entry `i,j` of this array is the
            distance between roads `i,j`.
        sigma2: Determines the width of the Gaussian kernel applied to the square distances matrix.
        epsilon: A threshold specifying if there is an edge between two nodes. Specifically, `A[i,j]=1`
            if `np.exp(-w2[i,j] / sigma2) >= epsilon` and `A[i,j]=0` otherwise, where `A` is the adjacency
            matrix and `w2=route_distances * route_distances`

    Returns:
        A boolean graph adjacency matrix.
    """
    num_routes = route_distances.shape[0]
    route_distances = route_distances / 10000.0
    w2, w_mask = (
        route_distances * route_distances,
        np.ones([num_routes, num_routes]) - np.identity(num_routes),
    )
    return (np.exp(-w2 / sigma2) >= epsilon) * w_mask

compute_adjacency_matrix() returns a boolean adjacency matrix, in which 1 means an edge exists between two nodes and 0 means it does not.
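
To see what the kernel and threshold do, the same formula can be applied by hand to a tiny distance matrix. The three-road distances below are made up; `sigma2` and `epsilon` are the values the post uses later.

```python
import numpy as np

# Hypothetical pairwise distances between 3 roads, in the same raw units as W_228.csv
# (the function above divides by 10000 before squaring).
toy_distances = np.array(
    [
        [0.0, 1000.0, 5000.0],
        [1000.0, 0.0, 2000.0],
        [5000.0, 2000.0, 0.0],
    ]
)
sigma2, epsilon = 0.1, 0.5

scaled = toy_distances / 10000.0
kernel = np.exp(-(scaled ** 2) / sigma2)  # Gaussian kernel: 1 at distance 0, decays with distance
no_self_loops = 1.0 - np.identity(3)      # zero out the diagonal
adjacency = (kernel >= epsilon) * no_self_loops

print(np.round(kernel, 3))
print(adjacency)
```

With these settings the kernel value falls below 0.5 once the scaled distance exceeds sqrt(0.1 · ln 2) ≈ 0.263, so the roads 5000 units apart are not connected while the closer pairs are.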

class GraphInfo:
    def __init__(self, edges: typing.Tuple[list, list], num_nodes: int):
        self.edges = edges
        self.num_nodes = num_nodes


sigma2 = 0.1
epsilon = 0.5
adjacency_matrix = compute_adjacency_matrix(route_distances, sigma2, epsilon)
node_indices, neighbor_indices = np.where(adjacency_matrix == 1)
graph = GraphInfo(
    edges=(node_indices.tolist(), neighbor_indices.tolist()),
    num_nodes=adjacency_matrix.shape[0],
)
# 26, 150
print(f"number of nodes: {graph.num_nodes}, number of edges: {len(graph.edges[0])}")

Network architecture

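The model for forecasting over the graph consists of a Graph Convolution Layer and an LSTM Layer.

Graph convolution layer

The graph convolution layer used here resembles the one in another Keras example. The difference is that that example feeds a 2D tensor of shape (num_nodes, in_feat), whereas here the input is a 4D tensor of shape (num_nodes, batch_size, input_seq_length, in_feat). The graph convolution layer performs the following steps.

The node representations are computed in self.compute_nodes_representation() by multiplying the input features by self.weight.
The aggregated neighbors' messages are computed in self.compute_aggregated_messages() by first aggregating the neighbors' representations and then multiplying the result by self.weight.
The final output of the layer is computed in self.update() by combining the node representations with the neighbors' aggregated messages.

These steps are easier to follow alongside the comments in the code below.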
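
The three steps can also be traced by hand on a tiny graph. The NumPy sketch below is a simplified 2D illustration under stated assumptions (made-up edges, features, and weight; every node has at least one neighbor; "mean" aggregation and "concat" combination); it is not the layer itself, which works on 4D tensors with TensorFlow segment ops.

```python
import numpy as np

# Toy graph with 3 nodes; edges are stored as two index lists in the same
# (node_indices, neighbor_indices) layout that GraphInfo.edges uses.
node_indices = np.array([0, 1, 1, 2])
neighbor_indices = np.array([1, 0, 2, 1])
num_nodes = 3

features = np.array([[1.0], [2.0], [4.0]])  # (num_nodes, in_feat), in_feat = 1
weight = np.array([[1.0, -1.0]])            # (in_feat, out_feat), out_feat = 2

# Step 1: node representations = features @ weight.
nodes_representation = features @ weight

# Step 2: gather neighbor features, average them per node, then apply the same weight.
gathered = features[neighbor_indices]                   # one row per edge
sums = np.zeros((num_nodes, features.shape[1]))
np.add.at(sums, node_indices, gathered)                 # per-node (segment) sum
counts = np.bincount(node_indices, minlength=num_nodes)[:, None]
aggregated_messages = (sums / counts) @ weight          # per-node mean, then projection

# Step 3: combine the two with "concat".
h = np.concatenate([nodes_representation, aggregated_messages], axis=-1)
print(h)
```

Node 1 has neighbors 0 and 2, so its aggregated feature is the mean of 1.0 and 4.0, that is 2.5, before the weight is applied.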

class GraphConv(layers.Layer):
    def __init__(
        self,
        in_feat,
        out_feat,
        graph_info: GraphInfo,
        aggregation_type="mean",
        combination_type="concat",
        activation: typing.Optional[str] = None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.in_feat = in_feat
        self.out_feat = out_feat
        self.graph_info = graph_info
        self.aggregation_type = aggregation_type
        self.combination_type = combination_type
        self.weight = tf.Variable(
            initial_value=keras.initializers.glorot_uniform()(
                shape=(in_feat, out_feat), dtype="float32"
            ),
            trainable=True,
        )
        self.activation = layers.Activation(activation)

    def aggregate(self, neighbour_representations: tf.Tensor):
        # The function used depends on the aggregation type
        aggregation_func = {
            "sum": tf.math.unsorted_segment_sum,
            "mean": tf.math.unsorted_segment_mean,
            "max": tf.math.unsorted_segment_max,
        }.get(self.aggregation_type)

        if aggregation_func:
            return aggregation_func(
                neighbour_representations,
                self.graph_info.edges[0],
                num_segments=self.graph_info.num_nodes,
            )

        raise ValueError(f"Invalid aggregation type: {self.aggregation_type}")

    def compute_nodes_representation(self, features: tf.Tensor):
        """Computes each node's representation.

        The node representations are obtained by multiplying the features tensor by
        `self.weight`, which has shape `(in_feat, out_feat)`.

        Args:
            features: Tensor of shape `(num_nodes, batch_size, input_seq_len, in_feat)`

        Returns:
            A tensor of shape `(num_nodes, batch_size, input_seq_len, out_feat)`
        """
        return tf.matmul(features, self.weight)

    def compute_aggregated_messages(self, features: tf.Tensor):
        neighbour_representations = tf.gather(features, self.graph_info.edges[1])
        aggregated_messages = self.aggregate(neighbour_representations)
        return tf.matmul(aggregated_messages, self.weight)

    def update(self, nodes_representation: tf.Tensor, aggregated_messages: tf.Tensor):
        if self.combination_type == "concat":
            h = tf.concat([nodes_representation, aggregated_messages], axis=-1)
        elif self.combination_type == "add":
            h = nodes_representation + aggregated_messages
        else:
            raise ValueError(f"Invalid combination type: {self.combination_type}.")

        return self.activation(h)

    def call(self, features: tf.Tensor):
        """Forward pass.

        Args:
            features: tensor of shape `(num_nodes, batch_size, input_seq_len, in_feat)`

        Returns:
            A tensor of shape `(num_nodes, batch_size, input_seq_len, out_feat)`
        """
        nodes_representation = self.compute_nodes_representation(features)
        aggregated_messages = self.compute_aggregated_messages(features)
        return self.update(nodes_representation, aggregated_messages)

LSTM plus graph convolution

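Passing the input through the graph convolution layer yields a 4D tensor containing the node representations. At each time step, a node's representation has aggregated information from its neighboring nodes.

Good forecasts need information over time as well as information from neighbors. To this end, the node tensor is passed through a recurrent layer. The LSTMGC layer in the code below first feeds the input to the graph convolution layer and then passes the result through an LSTM layer.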

class LSTMGC(layers.Layer):
    """Layer comprising a convolution layer followed by LSTM and dense layers."""

    def __init__(
        self,
        in_feat,
        out_feat,
        lstm_units: int,
        input_seq_len: int,
        output_seq_len: int,
        graph_info: GraphInfo,
        graph_conv_params: typing.Optional[dict] = None,
        **kwargs,
    ):
        super().__init__(**kwargs)

        # graph conv layer
        if graph_conv_params is None:
            graph_conv_params = {
                "aggregation_type": "mean",
                "combination_type": "concat",
                "activation": None,
            }
        self.graph_conv = GraphConv(in_feat, out_feat, graph_info, **graph_conv_params)

        self.lstm = layers.LSTM(lstm_units, activation="relu")
        self.dense = layers.Dense(output_seq_len)

        self.input_seq_len, self.output_seq_len = input_seq_len, output_seq_len

    def call(self, inputs):
        """Forward pass.

        Args:
            inputs: tf.Tensor of shape `(batch_size, input_seq_len, num_nodes, in_feat)`

        Returns:
            A tensor of shape `(batch_size, output_seq_len, num_nodes)`.
        """

        # convert shape to  (num_nodes, batch_size, input_seq_len, in_feat)
        inputs = tf.transpose(inputs, [2, 0, 1, 3])

        gcn_out = self.graph_conv(
            inputs
        )  # gcn_out has shape: (num_nodes, batch_size, input_seq_len, out_feat)
        shape = tf.shape(gcn_out)
        num_nodes, batch_size, input_seq_len, out_feat = (
            shape[0],
            shape[1],
            shape[2],
            shape[3],
        )

        # LSTM takes only 3D tensors as input
        gcn_out = tf.reshape(gcn_out, (batch_size * num_nodes, input_seq_len, out_feat))
        lstm_out = self.lstm(
            gcn_out
        )  # lstm_out has shape: (batch_size * num_nodes, lstm_units)

        dense_output = self.dense(
            lstm_out
        )  # dense_output has shape: (batch_size * num_nodes, output_seq_len)
        output = tf.reshape(dense_output, (num_nodes, batch_size, self.output_seq_len))
        return tf.transpose(
            output, [1, 2, 0]
        )  # returns Tensor of shape (batch_size, output_seq_len, num_nodes)
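
The transposes and reshapes in call() are easy to get wrong, so it may help to replay them with NumPy. In the sketch below the LSTM and Dense layers are replaced by a stand-in that simply picks the last and first value of each sequence; that stand-in and the toy sizes are assumptions made so the bookkeeping can be verified.

```python
import numpy as np

batch_size, seq_len, num_nodes, feat = 2, 3, 4, 1
x = np.arange(batch_size * seq_len * num_nodes * feat, dtype=float).reshape(
    batch_size, seq_len, num_nodes, feat
)

# (batch, time, nodes, feat) -> (nodes, batch, time, feat), as in LSTMGC.call
node_major = np.transpose(x, (2, 0, 1, 3))

# Merge the first two axes so each (node, batch) pair becomes one sequence.
sequences = node_major.reshape(num_nodes * batch_size, seq_len, feat)

# Stand-in for LSTM + Dense: one vector of length output_seq_len per sequence.
output_seq_len = 2
per_sequence_out = np.stack([sequences[:, -1, 0], sequences[:, 0, 0]], axis=-1)

# Undo the merge, then move to (batch, output_seq_len, nodes).
restored = per_sequence_out.reshape(num_nodes, batch_size, output_seq_len)
result = np.transpose(restored, (1, 2, 0))
print(result.shape)  # (2, 2, 4)
```

Because the merge and the later split both use the (nodes, batch) order, every output lands back on the batch element and node it came from.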

Model training

To recap, the data looks like this.

X has shape (64, 12, 26, 1): (batch_size, input_sequence_length (past traffic speed values per road), road nodes, road features)
Y has shape (64, 3, 26): (batch_size, number of forecast steps, road nodes)

in_feat = 1
batch_size = 64
epochs = 20
input_sequence_length = 12
forecast_horizon = 3
multi_horizon = False
out_feat = 10
lstm_units = 64
graph_conv_params = {
    "aggregation_type": "mean",
    "combination_type": "concat",
    "activation": None,
}

st_gcn = LSTMGC(
    in_feat,
    out_feat,
    lstm_units,
    input_sequence_length,
    forecast_horizon,
    graph,
    graph_conv_params,
)
inputs = layers.Input((input_sequence_length, graph.num_nodes, in_feat))
outputs = st_gcn(inputs)

model = keras.models.Model(inputs, outputs)
model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=0.0002),
    loss=keras.losses.MeanSquaredError(),
)
model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=epochs,
    callbacks=[keras.callbacks.EarlyStopping(patience=10)],
)

Making forecasts on test set

We now use the trained model to make forecasts on the test set. Below, the MSE of the model's forecasts is compared with the MSE of naive forecasts. The naive forecast for each node is its last observed value.

# (3788, 12, 26, 1), (3788, 1, 26)
x_test, y = next(test_dataset.as_numpy_iterator())
y_pred = model.predict(x_test)
plt.figure(figsize=(18, 6))
plt.plot(y[:, 0, 0])
plt.plot(y_pred[:, 0, 0])
plt.legend(["actual", "forecast"])

# The naive forecast repeats the last input value; both errors are mean squared errors
naive_mse, model_mse = (
    np.square(x_test[:, -1, :, 0] - y[:, 0, :]).mean(),
    np.square(y_pred[:, 0, :] - y[:, 0, :]).mean(),
)
print(f"naive MSE: {naive_mse}, model MSE: {model_mse}")
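
The comparison above is computed from squared errors. The sketch below reproduces the naive baseline on made-up random-walk data and prints both the mean squared error and the mean absolute error, since the two are easy to confuse; the data and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, input_len, num_nodes = 100, 12, 5

# Made-up series: one random walk per node, cut into (inputs, target) pairs.
walk = np.cumsum(rng.normal(scale=0.1, size=(num_samples + input_len, num_nodes)), axis=0)
x_toy = np.stack([walk[i : i + input_len] for i in range(num_samples)])[..., np.newaxis]
y_toy = np.stack([walk[i + input_len] for i in range(num_samples)])[:, np.newaxis, :]

# Naive forecast: repeat the last observed value of each node.
naive_pred = x_toy[:, -1, :, 0]
error = naive_pred - y_toy[:, 0, :]

naive_mse = np.square(error).mean()  # mean squared error
naive_mae = np.abs(error).mean()     # mean absolute error
print(f"naive MSE: {naive_mse:.4f}, naive MAE: {naive_mae:.4f}")
```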

Stacking LSTMGC blocks can give somewhat better results.


Structured data learning with TabTransformer

This post translates the following Keras example.

https://keras.io/examples/structured_data/tabtransformer/


Introduction

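This example covers TabTransformer, which can be used for supervised and semi-supervised learning. TabTransformer is built on self-attention based Transformers. Instead of ordinary embedding layers for categorical features, it uses contextual embeddings, which lets it reach higher accuracy.

This example requires TensorFlow 2.7 or higher, as well as TensorFlow Addons.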

pip install -U tensorflow-addons

Setup

import math
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_addons as tfa
import matplotlib.pyplot as plt

Prepare the data

This example uses the United States Census Income Dataset provided by the UC Irvine Machine Learning Repository. The task is binary classification: predict whether a person is likely to earn more than USD 50,000 a year.

The dataset contains 48,842 instances with 5 numerical features and 9 categorical features.

First, load the dataset.

CSV_HEADER = [
    "age",
    "workclass",
    "fnlwgt",
    "education",
    "education_num",
    "marital_status",
    "occupation",
    "relationship",
    "race",
    "gender",
    "capital_gain",
    "capital_loss",
    "hours_per_week",
    "native_country",
    "income_bracket",
]

train_data_url = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
)
train_data = pd.read_csv(train_data_url, header=None, names=CSV_HEADER)

test_data_url = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test"
)
test_data = pd.read_csv(test_data_url, header=None, names=CSV_HEADER)

print(f"Train dataset shape: {train_data.shape}") # (32561, 15)
print(f"Test dataset shape: {test_data.shape}") # (16282, 15)

The first row of test_data is not a valid data example, so we remove it, and we also strip the trailing '.' from the labels.

test_data = test_data[1:]
test_data.income_bracket = test_data.income_bracket.apply(
    lambda value: value.replace(".", "")
)

Save the data to CSV files.

train_data_file = "train_data.csv"
test_data_file = "test_data.csv"

train_data.to_csv(train_data_file, index=False, header=False)
test_data.to_csv(test_data_file, index=False, header=False)

Define dataset metadata

Next, we define the dataset metadata, which is useful for encoding and processing the input features.

# List of the numerical feature names
NUMERIC_FEATURE_NAMES = [
    "age",
    "education_num",
    "capital_gain",
    "capital_loss",
    "hours_per_week",
]
# Dict mapping each categorical feature to its vocabulary
CATEGORICAL_FEATURES_WITH_VOCABULARY = {
    "workclass": sorted(list(train_data["workclass"].unique())),
    "education": sorted(list(train_data["education"].unique())),
    "marital_status": sorted(list(train_data["marital_status"].unique())),
    "occupation": sorted(list(train_data["occupation"].unique())),
    "relationship": sorted(list(train_data["relationship"].unique())),
    "race": sorted(list(train_data["race"].unique())),
    "gender": sorted(list(train_data["gender"].unique())),
    "native_country": sorted(list(train_data["native_country"].unique())),
}
# Name of the column used as the instance weight
WEIGHT_COLUMN_NAME = "fnlwgt"
# List of the categorical feature names
CATEGORICAL_FEATURE_NAMES = list(CATEGORICAL_FEATURES_WITH_VOCABULARY.keys())
# List of all input features
FEATURE_NAMES = NUMERIC_FEATURE_NAMES + CATEGORICAL_FEATURE_NAMES
# Default per column: [0.0] for numerical features and the weight column, ["NA"] otherwise
COLUMN_DEFAULTS = [
    [0.0] if feature_name in NUMERIC_FEATURE_NAMES + [WEIGHT_COLUMN_NAME] else ["NA"]
    for feature_name in CSV_HEADER
]
# Name of the target feature
TARGET_FEATURE_NAME = "income_bracket"
# List of the target feature labels
TARGET_LABELS = [" <=50K", " >50K"]

Configure the hyperparameters

Define the hyperparameters for the model architecture and the training options.

LEARNING_RATE = 0.001
WEIGHT_DECAY = 0.0001
DROPOUT_RATE = 0.2
BATCH_SIZE = 265
NUM_EPOCHS = 15

NUM_TRANSFORMER_BLOCKS = 3  # number of transformer blocks
NUM_HEADS = 4  # number of attention heads
EMBEDDING_DIMS = 16  # embedding dimensions
MLP_HIDDEN_UNITS_FACTORS = [
    2,
    1,
]  # factors that set the MLP hidden layer unit counts
NUM_MLP_BLOCKS = 2  # number of MLP blocks
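
To make these numbers concrete, the width of the concatenated feature vector can be worked out from the metadata: 8 categorical features are listed in CATEGORICAL_FEATURES_WITH_VOCABULARY and 5 in NUMERIC_FEATURE_NAMES. The calculation below is a sketch; in particular, treating the factors as multipliers of that width is an assumption suggested by the name MLP_HIDDEN_UNITS_FACTORS, not something shown in this part of the post.

```python
EMBEDDING_DIMS = 16
MLP_HIDDEN_UNITS_FACTORS = [2, 1]

num_categorical = 8  # workclass ... native_country, as listed in the metadata
num_numeric = 5      # age, education_num, capital_gain, capital_loss, hours_per_week

# Each categorical feature becomes an EMBEDDING_DIMS-wide vector;
# each numeric feature contributes a single value.
concatenated_width = num_categorical * EMBEDDING_DIMS + num_numeric

# Assumption: each factor multiplies the input width to give a hidden layer size.
mlp_hidden_units = [factor * concatenated_width for factor in MLP_HIDDEN_UNITS_FACTORS]
print(concatenated_width, mlp_hidden_units)  # 133 [266, 133]
```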

Implement data reading pipeline

We define functions that read and parse the files, and convert the features and labels into a tf.data.Dataset for training and evaluation.

target_label_lookup = layers.StringLookup(
    vocabulary=TARGET_LABELS, mask_token=None, num_oov_indices=0
)

# Pass the target (label) through the StringLookup layer
def prepare_example(features, target):
    target_index = target_label_lookup(target)
    weights = features.pop(WEIGHT_COLUMN_NAME)
    return features, target_index, weights


def get_dataset_from_csv(csv_file_path, batch_size=128, shuffle=False):
    dataset = tf.data.experimental.make_csv_dataset(
        csv_file_path,
        batch_size=batch_size,
        column_names=CSV_HEADER,
        column_defaults=COLUMN_DEFAULTS,
        label_name=TARGET_FEATURE_NAME,
        num_epochs=1,
        header=False,
        na_value="?",
        shuffle=shuffle,
    ).map(prepare_example, num_parallel_calls=tf.data.AUTOTUNE, deterministic=False)
    return dataset.cache()

Implement a training and evaluation procedure

def run_experiment(
    model,
    train_data_file,
    test_data_file,
    num_epochs,
    learning_rate,
    weight_decay,
    batch_size,
):

    optimizer = tfa.optimizers.AdamW(
        learning_rate=learning_rate, weight_decay=weight_decay
    )

    model.compile(
        optimizer=optimizer,
        loss=keras.losses.BinaryCrossentropy(),
        metrics=[keras.metrics.BinaryAccuracy(name="accuracy")],
    )

    train_dataset = get_dataset_from_csv(train_data_file, batch_size, shuffle=True)
    validation_dataset = get_dataset_from_csv(test_data_file, batch_size)

    print("Start training the model...")
    history = model.fit(
        train_dataset, epochs=num_epochs, validation_data=validation_dataset
    )
    print("Model training finished")

    _, accuracy = model.evaluate(validation_dataset, verbose=0)

    print(f"Validation accuracy: {round(accuracy * 100, 2)}%")

    return history

Create model inputs

The model inputs are defined as a dictionary.

def create_model_inputs():
    inputs = {}
    for feature_name in FEATURE_NAMES:
        if feature_name in NUMERIC_FEATURE_NAMES:
            inputs[feature_name] = layers.Input(
                name=feature_name, shape=(), dtype=tf.float32
            )
        else:
            inputs[feature_name] = layers.Input(
                name=feature_name, shape=(), dtype=tf.string
            )
    return inputs

Encode features

The encode_inputs method returns encoded_categorical_feature_list, in which the categorical features are embedded with embedding_dims dimensions, together with numerical_feature_list.

def encode_inputs(inputs, embedding_dims):

    encoded_categorical_feature_list = []
    numerical_feature_list = []

    for feature_name in inputs:
        if feature_name in CATEGORICAL_FEATURE_NAMES:

            # Get the vocabulary of the categorical feature.
            vocabulary = CATEGORICAL_FEATURES_WITH_VOCABULARY[feature_name]

            # Convert the string values of the vocabulary to integer indices.
            # No mask token is used, so mask_token is None,
            # and num_oov_indices is set to 0.
            lookup = layers.StringLookup(
                vocabulary=vocabulary,
                mask_token=None,
                num_oov_indices=0,
                output_mode="int",
            )

            # Convert the string input values to integer indices.
            encoded_feature = lookup(inputs[feature_name])

            # Define the Embedding Layer.
            embedding = layers.Embedding(
                input_dim=len(vocabulary), output_dim=embedding_dims
            )

            # Pass the indices through the Embedding Layer to get the embedded values.
            encoded_categorical_feature = embedding(encoded_feature)
            encoded_categorical_feature_list.append(encoded_categorical_feature)

        else:

            # Numerical features are added to the list as-is, with only a trailing axis added.
            numerical_feature = tf.expand_dims(inputs[feature_name], -1)
            numerical_feature_list.append(numerical_feature)

    return encoded_categorical_feature_list, numerical_feature_list
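
To make the two steps concrete, here is a plain-Python sketch of what the lookup and embedding stages compute for one categorical feature. It is only an illustration under assumptions: the vocabulary, the table values, and the helper names build_lookup and embed are hypothetical, and the real layers are layers.StringLookup and layers.Embedding with trainable weights.

```python
import random

def build_lookup(vocabulary):
    # With mask_token=None and num_oov_indices=0, each vocabulary entry
    # maps to its position in the list, starting from 0.
    return {token: index for index, token in enumerate(vocabulary)}

def embed(indices, table):
    # An embedding is a row lookup: index i selects row i of the table.
    return [table[i] for i in indices]

# Hypothetical vocabulary for a single categorical feature.
vocabulary = ["Private", "Self-emp", "Gov"]
embedding_dims = 4

random.seed(0)
# One row per vocabulary entry (input_dim=len(vocabulary)),
# each of length embedding_dims (output_dim=embedding_dims).
table = [[random.uniform(-0.05, 0.05) for _ in range(embedding_dims)]
         for _ in vocabulary]

lookup = build_lookup(vocabulary)
batch = ["Gov", "Private", "Gov"]
indices = [lookup[value] for value in batch]
vectors = embed(indices, table)

print(indices)          # [2, 0, 2]
print(len(vectors[0]))  # 4
```

Because the indices start at 0 and there is no out-of-vocabulary slot, a table with exactly len(vocabulary) rows is enough, which is why input_dim=len(vocabulary) is used above.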

Implement an MLP block

def create_mlp(hidden_units, dropout_rate, activation, normalization_layer, name=None):

    mlp_layers = []
    for units in hidden_units:
        mlp_layers.append(normalization_layer)
        mlp_layers.append(layers.Dense(units, activation=activation))
        mlp_layers.append(layers.Dropout(dropout_rate))

    return keras.Sequential(mlp_layers, name=name)

Experiment 1: a baseline model

As a first experiment, we build a simple multi-layer feed-forward network.

def create_baseline_model(
    embedding_dims, num_mlp_blocks, mlp_hidden_units_factors, dropout_rate
):

    # Create the model inputs.
    inputs = create_model_inputs()
    # Encode the categorical and numerical features.
    encoded_categorical_feature_list, numerical_feature_list = encode_inputs(
        inputs, embedding_dims
    )
    # Concatenate all features.
    features = layers.concatenate(
        encoded_categorical_feature_list + numerical_feature_list
    )
    # Use the last dimension of features as the hidden_units hyperparameter.
    feedforward_units = [features.shape[-1]]

    # Create several feedforward blocks.
    for layer_idx in range(num_mlp_blocks):
        features = create_mlp(
            hidden_units=feedforward_units,
            dropout_rate=dropout_rate,
            activation=keras.activations.gelu,
            normalization_layer=layers.LayerNormalization(epsilon=1e-6),
            name=f"feedforward_{layer_idx}",
        )(features)

    # Compute MLP hidden_units.
    mlp_hidden_units = [
        factor * features.shape[-1] for factor in mlp_hidden_units_factors
    ]
    # Create final MLP.
    features = create_mlp(
        hidden_units=mlp_hidden_units,
        dropout_rate=dropout_rate,
        activation=keras.activations.selu,
        normalization_layer=layers.BatchNormalization(),
        name="MLP",
    )(features)

    # Add a sigmoid as a binary classifier.
    outputs = layers.Dense(units=1, activation="sigmoid", name="sigmoid")(features)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model


baseline_model = create_baseline_model(
    embedding_dims=EMBEDDING_DIMS,
    num_mlp_blocks=NUM_MLP_BLOCKS,
    mlp_hidden_units_factors=MLP_HIDDEN_UNITS_FACTORS,
    dropout_rate=DROPOUT_RATE,
)

print("Total model weights:", baseline_model.count_params())
keras.utils.plot_model(baseline_model, show_shapes=True, rankdir="LR")

Run training and evaluation.

history = run_experiment(
    model=baseline_model,
    train_data_file=train_data_file,
    test_data_file=test_data_file,
    num_epochs=NUM_EPOCHS,
    learning_rate=LEARNING_RATE,
    weight_decay=WEIGHT_DECAY,
    batch_size=BATCH_SIZE,
)

Experiment 2: TabTransformer

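The TabTransformer architecture works as follows.

1. Every categorical feature is embedded with the same embedding_dims, so each categorical feature value gets its own embedding vector.
2. A column embedding is added for each categorical column. The example model adds an Embedding layer that represents each column and sums its output with the categorical feature embedding vector from step 1.
3. The embedded categorical features are fed into the Transformer. Each Transformer block consists of a multi-head self-attention layer and a feed-forward layer.
4. The output of the last Transformer layer, which holds the contextual embeddings of the categorical features, is concatenated with the numerical features and fed into the MLP block.
5. The description states that, unlike Experiment 1, a softmax classifier is used; the code below, however, ends with a single-unit sigmoid layer, just as in Experiment 1.

The TabTransformer paper covers column embedding in detail and shows the model architecture.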

The model is assembled in the following order.

categorical, numerical encoding & embedding → categorical column embedding →
categorical embedding vector + column embedding vector → (Multi-head attention → skip connection →
MLP block → skip connection) → concat with numerical features → MLP block → Classifier
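
The "categorical embedding vector + column embedding vector" step relies on broadcasting: a single (columns, dims) table is added to every sample of a (batch, columns, dims) tensor. A small pure-Python sketch with made-up numbers may help; the function name add_column_embedding is hypothetical and not part of the example code.

```python
def add_column_embedding(features, column_table):
    # features: [batch][column][dim], column_table: [column][dim].
    # The same column vector is added to that column in every sample.
    return [
        [
            [value + offset for value, offset in zip(vector, column_vector)]
            for vector, column_vector in zip(sample, column_table)
        ]
        for sample in features
    ]

# Two samples, two categorical columns, embedding size 3 (made-up values).
features = [
    [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]],
    [[3.0, 3.0, 3.0], [4.0, 4.0, 4.0]],
]
column_table = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]

result = add_column_embedding(features, column_table)
print(result[0])  # [[1.5, 1.0, 1.0], [2.0, 2.5, 2.0]]
print(result[1])  # [[3.5, 3.0, 3.0], [4.0, 4.5, 4.0]]
```

Every sample receives the same per-column offset, so the model can tell which column a vector came from even when two columns share similar values.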

def create_tabtransformer_classifier(
    num_transformer_blocks,
    num_heads,
    embedding_dims,
    mlp_hidden_units_factors,
    dropout_rate,
    use_column_embedding=False,
):

    # Create the model inputs.
    inputs = create_model_inputs()
    # Encode each feature.
    encoded_categorical_feature_list, numerical_feature_list = encode_inputs(
        inputs, embedding_dims
    )
    # Stack the categorical features so they can be fed to the Transformer.
    # The result has shape (None, 8, 16).
    encoded_categorical_features = tf.stack(encoded_categorical_feature_list, axis=1)
    # The result has shape (None, 5).
    numerical_features = layers.concatenate(numerical_feature_list)

    # Add the column embedding to the categorical feature embeddings.
    if use_column_embedding:
        num_columns = encoded_categorical_features.shape[1]
        column_embedding = layers.Embedding(
            input_dim=num_columns, output_dim=embedding_dims
        )
        column_indices = tf.range(start=0, limit=num_columns, delta=1)
        # (None, 8, 16) + (8, 16)
        encoded_categorical_features = encoded_categorical_features + column_embedding(
            column_indices
        )

    # Create multiple layers of the Transformer block.
    for block_idx in range(num_transformer_blocks):
        # Create a multi-head attention layer.
        attention_output = layers.MultiHeadAttention(
            num_heads=num_heads,
            key_dim=embedding_dims,
            dropout=dropout_rate,
            name=f"multihead_attention_{block_idx}",
        )(encoded_categorical_features, encoded_categorical_features)
        # Skip connection 1.
        x = layers.Add(name=f"skip_connection1_{block_idx}")(
            [attention_output, encoded_categorical_features]
        )
        # Layer normalization 1.
        x = layers.LayerNormalization(name=f"layer_norm1_{block_idx}", epsilon=1e-6)(x)
        # Feedforward.
        feedforward_output = create_mlp(
            hidden_units=[embedding_dims],
            dropout_rate=dropout_rate,
            activation=keras.activations.gelu,
            normalization_layer=layers.LayerNormalization(epsilon=1e-6),
            name=f"feedforward_{block_idx}",
        )(x)
        # Skip connection 2.
        x = layers.Add(name=f"skip_connection2_{block_idx}")([feedforward_output, x])
        # Layer normalization 2.
        encoded_categorical_features = layers.LayerNormalization(
            name=f"layer_norm2_{block_idx}", epsilon=1e-6
        )(x)

    # Flatten the "contextualized" embeddings of the categorical features.
    categorical_features = layers.Flatten()(encoded_categorical_features)
    # Apply layer normalization to the numerical features.
    numerical_features = layers.LayerNormalization(epsilon=1e-6)(numerical_features)
    # Prepare the input for the final MLP block.
    features = layers.concatenate([categorical_features, numerical_features])

    # Compute MLP hidden_units.
    mlp_hidden_units = [
        factor * features.shape[-1] for factor in mlp_hidden_units_factors
    ]
    # Create final MLP.
    features = create_mlp(
        hidden_units=mlp_hidden_units,
        dropout_rate=dropout_rate,
        activation=keras.activations.selu,
        normalization_layer=layers.BatchNormalization(),
        name="MLP",
    )(features)

    # Add a sigmoid as a binary classifier.
    outputs = layers.Dense(units=1, activation="sigmoid", name="sigmoid")(features)
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model


tabtransformer_model = create_tabtransformer_classifier(
    num_transformer_blocks=NUM_TRANSFORMER_BLOCKS,
    num_heads=NUM_HEADS,
    embedding_dims=EMBEDDING_DIMS,
    mlp_hidden_units_factors=MLP_HIDDEN_UNITS_FACTORS,
    dropout_rate=DROPOUT_RATE,
)

print("Total model weights:", tabtransformer_model.count_params())
keras.utils.plot_model(tabtransformer_model, show_shapes=True, rankdir="LR")
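
Each Transformer block above repeats the same pattern twice: run a sublayer, add its output back to its input (skip connection), then apply layer normalization. The sketch below shows that pattern on a single vector in plain Python. It is a simplification under assumptions: the learnable scale and offset of LayerNormalization are omitted, and the stand-in sublayer double is hypothetical.

```python
import math

def layer_norm(vector, epsilon=1e-6):
    # Normalize one vector to zero mean and unit variance over its last axis.
    # The learnable scale and offset of the real layer are left out.
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(variance + epsilon) for v in vector]

def residual_block(vector, sublayer):
    # Skip connection: add the sublayer output to its own input, then normalize.
    added = [a + b for a, b in zip(sublayer(vector), vector)]
    return layer_norm(added)

def double(vector):
    # Stand-in sublayer (hypothetical): doubles every value.
    return [2.0 * v for v in vector]

out = residual_block([1.0, 2.0, 3.0], double)
print([round(v, 3) for v in out])
```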

Run training and evaluation.

history = run_experiment(
    model=tabtransformer_model,
    train_data_file=train_data_file,
    test_data_file=test_data_file,
    num_epochs=NUM_EPOCHS,
    learning_rate=LEARNING_RATE,
    weight_decay=WEIGHT_DECAY,
    batch_size=BATCH_SIZE,
)

As a final note, here is a brief reading of the example's conclusion, quoted below.
Because embeddings are the core idea of TabTransformer, unlabeled data can be used for pre-training. This appears to describe a semi-supervised setup.

TabTransformer significantly outperforms MLP and recent deep networks for tabular data while matching the performance of tree-based ensemble models. TabTransformer can be learned in end-to-end supervised training using labeled examples. For a scenario where there are a few labeled examples and a large number of unlabeled examples, a pre-training procedure can be employed to train the Transformer layers using unlabeled data. This is followed by fine-tuning of the pre-trained Transformer layers along with the top MLP layer using the labeled data.


Installing Mecab on Colab

As of January 11, 2022, the code from the blog below installed and ran without errors.

Since the code comes from that blog, it is best to visit it directly to see the code : )

https://sosomemo.tistory.com/72



Binary Accuracy vs Accuracy in TF

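Starting with TensorFlow 2.2, the description of the accuracy metric written as Accuracy()/'acc' changed from "matches" to "equals", which sets it apart from Binary Accuracy.

(TF ~2.1) Calculates how often predictions matches labels.
↓
(TF 2.2~) Calculates how often predictions equals labels.

Therefore:

Accuracy: the fraction of predictions that are exactly equal to the labels
(how often they are equal)
Binary_Accuracy: accuracy computed after applying the configured threshold to the predictions
(how often they match)

According to the official documentation, the default threshold of Binary Accuracy is 0.5.

When code written for a version earlier than TF 2.2 is trained with a recent version
and the accuracy score differs, it is worth checking which of ['acc', 'binary_accuracy'] is being used.

The following example shows the difference between acc and binary_acc.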

import tensorflow as tf

y_true = [[1], [1], [0], [0]]
y_pred = [[0.51], [1], [0.49], [0]]

print(tf.keras.metrics.Accuracy()(y_true, y_pred))  # 0.5
print(tf.keras.metrics.BinaryAccuracy()(y_true, y_pred)) # 1.0
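
The same two numbers can be reproduced by hand, which makes the difference explicit. This is a plain-Python sketch of the arithmetic, not the TensorFlow implementation; the function names accuracy and binary_accuracy are hypothetical.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions exactly equal to the labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_accuracy(y_true, y_pred, threshold=0.5):
    # Threshold the predictions first, then compare with the labels.
    hard = [1 if p > threshold else 0 for p in y_pred]
    return sum(t == h for t, h in zip(y_true, hard)) / len(y_true)

y_true = [1, 1, 0, 0]
y_pred = [0.51, 1, 0.49, 0]

print(accuracy(y_true, y_pred))         # 0.5
print(binary_accuracy(y_true, y_pred))  # 1.0
```

Only the predictions that are already exactly 0 or 1 count as correct for accuracy, while binary_accuracy first rounds 0.51 up and 0.49 down around the threshold.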


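Implementing a TensorFlow Learning Rate WarmUp Scheduler

Learning rate warmup is a well-known technique used in many papers.

With warmup, the learning rate changes over time: it rises linearly to init_lr during the warmup epochs and then follows the decay function.

For the implementation, see the two versions below (scheduler and callback), and adjust decay_fn and the other hyperparameters to suit your setup.

Scheduler version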

class LRSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, init_lr, warmup_epoch,
                 steps_per_epoch,
                 decay_fn, *,
                 continue_epoch = 0):
        self.init_lr = init_lr
        self.decay_fn = decay_fn
        self.warmup_epoch = warmup_epoch
        self.continue_epoch = continue_epoch
        self.steps_per_epoch = steps_per_epoch
        self.lr = 1e-4 # remove

    def on_epoch_begin(self, epoch):
        epoch = tf.cast(epoch, tf.float64)
        
        global_epoch = tf.cast(epoch + 1, tf.float64)
        warmup_epoch_float = tf.cast(self.warmup_epoch, tf.float64)
        
        lr = tf.cond(
            global_epoch < warmup_epoch_float,
            lambda: tf.cast(self.init_lr * (global_epoch / warmup_epoch_float), tf.float64),
            lambda: tf.cast(self.decay_fn(epoch - warmup_epoch_float), tf.float64)
        )
        self.lr = lr
    
    def __call__(self, step):
        def compute_epoch(step):
            return step // self.steps_per_epoch
        
        epoch = compute_epoch(step)
        epoch = epoch + self.continue_epoch
        
        self.on_epoch_begin(epoch)
        
        return self.lr

def get_steps(x_size, batch_size):
    # Steps per epoch; a partial final batch counts as one more step.
    if x_size % batch_size == 0:
        return x_size // batch_size
    else:
        return x_size // batch_size + 1

# data_size: size of the training set
data_size = 100000
BATCH_SIZE = 512
EPOCHS = 100
warmup_epoch = int(EPOCHS * 0.1)
init_lr = 0.1
min_lr = 1e-6
power = 1.
    
lr_scheduler = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate = init_lr,
    decay_steps = EPOCHS - warmup_epoch,
    end_learning_rate = min_lr,
    power = power
)

# get_steps: computes the total number of steps per epoch
lr_schedule = LRSchedule(init_lr, warmup_epoch,
                         steps_per_epoch = get_steps(data_size, BATCH_SIZE),
                         decay_fn = lr_scheduler,
                         continue_epoch = 0)

# Usage example
optimizer = tf.keras.optimizers.Adam(learning_rate = lr_schedule)

Callback version

class LRSchedule(tf.keras.callbacks.Callback):
    def __init__(self, init_lr, warmup_epoch, decay_fn):
        super().__init__()
        self.init_lr = init_lr
        self.decay_fn = decay_fn
        self.warmup_epoch = warmup_epoch
        self.lrs = []

    def on_epoch_begin(self, epoch, logs = None):
        global_epoch = tf.cast(epoch + 1, tf.float64)
        warmup_epoch_float = tf.cast(self.warmup_epoch, tf.float64)

        # Linear warmup up to init_lr, then hand over to decay_fn.
        lr = tf.cond(
                global_epoch < warmup_epoch_float,
                lambda: self.init_lr * (global_epoch / warmup_epoch_float),
                lambda: self.decay_fn(global_epoch - warmup_epoch_float),
                )

        tf.print('learning rate: ', lr)
        tf.keras.backend.set_value(self.model.optimizer.lr, lr)

        self.lrs.append(lr)
        
        
epochs = 1000
warmup_epoch = int(epochs * 0.1)
init_lr = 0.1
min_lr = 1e-6
power = 1.
    
lr_scheduler = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate = init_lr,
    decay_steps = epochs - warmup_epoch,
    end_learning_rate = min_lr,
    power = power
)

# lr_schedule = LRSchedule(init_lr = init_lr,
#                          warmup_epoch = warmup_epoch,
#                          decay_fn = lr_scheduler)

# for i in range(epochs):
#     lr_schedule.on_epoch_begin(i)

# Usage example
model.fit(..., callbacks = [LRSchedule(init_lr = init_lr,
                                      warmup_epoch = warmup_epoch,
                                      decay_fn = lr_scheduler)],
          initial_epoch = 0)
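
To see the shape of the schedule without running TensorFlow, the arithmetic of the callback version can be sketched in plain Python. This is an illustration under assumptions: polynomial_decay and warmup_lr are hypothetical helper names, and the decay follows the non-cycling polynomial decay formula with the same hyperparameters as above but 100 epochs.

```python
def polynomial_decay(step, init_lr, end_lr, decay_steps, power=1.0):
    # Clamp the step, then interpolate from init_lr down to end_lr.
    step = min(step, decay_steps)
    return (init_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr

def warmup_lr(epoch, init_lr, warmup_epoch, total_epochs, end_lr=1e-6, power=1.0):
    # epoch is 0-based, as passed to on_epoch_begin.
    global_epoch = epoch + 1
    if global_epoch < warmup_epoch:
        # Linear ramp towards init_lr during warmup.
        return init_lr * global_epoch / warmup_epoch
    # After warmup, decay over the remaining epochs.
    return polynomial_decay(global_epoch - warmup_epoch, init_lr, end_lr,
                            total_epochs - warmup_epoch, power)

schedule = [warmup_lr(e, init_lr=0.1, warmup_epoch=10, total_epochs=100)
            for e in range(100)]

# First epoch, peak at the end of warmup, and final epoch.
print(schedule[0], schedule[9], schedule[99])
```

The learning rate peaks at init_lr on the epoch where warmup ends and reaches min_lr on the last epoch.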


Useful Python Decorators

Decorators are a piece of Python syntax that wraps a function, letting you apply extra behavior to it flexibly and conveniently.

A decorator is applied by prefixing it with the special symbol @, as below.

@tf.function
def train_step(args):
    pass

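This post does not cover the concept or mechanics of decorators; instead it looks at a few decorators that are, or could be, useful in practice.

@retry

The @retry decorator retries a function a configured number of times when a specific exception is raised.

It is rarely needed in ordinary code, but it is useful for network and database calls.

You can use the retrying library or implement it yourself, but the tenacity library is the popular choice. The code below is the example from the official tenacity documentation.

- tenacity official documentation: https://tenacity.readthedocs.io/en/latest/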

import random
from tenacity import retry

@retry
def do_something_unreliable():
    if random.randint(0, 10) > 1:
        raise IOError("Broken sauce, everything is hosed!!!111one")
    else:
        return "Awesome sauce!"

print(do_something_unreliable())

The number of attempts before giving up, the wait time between attempts, and similar settings can all be controlled through parameters.
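
For readers who prefer the hand-written route mentioned above, here is a minimal sketch of a retry decorator using only the standard library. It is not tenacity's implementation or API; the parameter names max_attempts, delay, and exceptions are assumptions made for this illustration.

```python
import functools
import time

def retry(max_attempts=3, delay=0.0, exceptions=(Exception,)):
    # Decorator factory: re-run the function until it succeeds
    # or max_attempts calls have failed.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator

calls = {"count": 0}

@retry(max_attempts=5, exceptions=(IOError,))
def flaky():
    # Fails on the first two calls, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("temporary failure")
    return "Awesome sauce!"

result = flaky()
print(result, calls["count"])  # Awesome sauce! 3
```

Limiting the retried exception types keeps programming errors such as TypeError from being silently retried.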

@classmethod

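@classmethod is best read together with @staticmethod below.

Anyone with some Java/C experience will be familiar with class (static) variables. The same concept carries over to Python, where class variables and class methods are available.

They are useful for adjusting a variable of a single class or of a class that inherits it.

By convention, a method decorated with @classmethod takes cls as its first parameter. It is analogous to self in an instance method, except that it refers to the class rather than to an instance. Through cls we can access class variables.

As mentioned above, @classmethod lets us adjust class variables, and it can also serve as an alternative constructor, as the upgrade_os method does.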

class Computer:
    os = 'Linux' # class variable
    
class Personal_Computer(Computer):
    def __init__(self, c_id, pos):
        self.c_id = c_id
        self.partial_os = pos
        
    @classmethod
    def change_os(cls, this_os):
        if cls.os != this_os:
            cls.os = this_os
        else:
            print(f'{cls.os} Already up-to-date!')
            
    @classmethod
    def upgrade_os(cls, c_id, new_os):
        return cls(c_id, new_os)
            
computer_1 = Personal_Computer('2021', 'window11')
computer_2 = Personal_Computer('2020', 'window10')

# Before the change: Linux -> Window
print(f'os change Before: {computer_1.os}, {computer_2.os}')
Personal_Computer.change_os(this_os = 'Window')

# After the change
print(f'os change After: {computer_1.os}, {computer_2.os}', end = '\n\n')

# upgrade os
print(f'os upgrade Before: {computer_2.partial_os}')
upgraded_computer_2 = computer_2.upgrade_os(computer_1.c_id, 'window11')
print(f'os upgrade After: {upgraded_computer_2.partial_os}')
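
One detail worth seeing in isolation is which class a @classmethod actually modifies. In the sketch below (the names PersonalComputer and from_year are hypothetical, chosen only for this illustration), cls is the class the method was called on, so the assignment lands on the subclass and the base class keeps its value.

```python
class Computer:
    os = 'Linux'  # class variable on the base class

class PersonalComputer(Computer):
    def __init__(self, c_id):
        self.c_id = c_id

    @classmethod
    def change_os(cls, this_os):
        # cls is the class the method was called on, so this assigns
        # PersonalComputer.os and leaves Computer.os untouched.
        cls.os = this_os

    @classmethod
    def from_year(cls, year):
        # Alternative constructor: builds an instance from an int year.
        return cls(str(year))

pc = PersonalComputer.from_year(2021)
PersonalComputer.change_os('Window')

print(pc.c_id)              # 2021
print(PersonalComputer.os)  # Window
print(Computer.os)          # Linux
```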

@staticmethod

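This is a static method. With this decorator, the method can be called directly through the class without creating an object, as in the code below. It looks useful, but it is not used very often.

class Computer:
    os = 'Linux' # class variable
    
class Personal_Computer(Computer):      
    ...  # omitted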
    
    @staticmethod
    def print_os():
        print(Computer.os)

Personal_Computer.print_os() # Linux
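
A typical use is a small helper that belongs with a class conceptually but needs neither the instance nor the class. A short sketch with a hypothetical Temperature class:

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @staticmethod
    def to_fahrenheit(celsius):
        # No self or cls: this is a plain function kept in the class namespace.
        return celsius * 9 / 5 + 32

    def describe(self):
        # A static method can also be reached through an instance.
        return f'{self.to_fahrenheit(self.celsius)}F'

print(Temperature.to_fahrenheit(100))  # 212.0
print(Temperature(0).describe())       # 32.0F
```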

@property

The @property decorator is easiest to understand as a getter/setter. It can be used as in the code below.

@property acts as the getter, and @name.setter acts as the setter.

class ERP:
    def __init__(self):
        self._salary = 100
    
    @property
    def salary(self):
        return self._salary
    
    @salary.setter
    def salary(self, value):
        if self._salary < 500:
            self._salary = value
        else:
            raise ValueError('No!')
        
erp = ERP()
print(erp.salary) # 100
erp.salary = 200
print(erp.salary) # 200

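Of course, Python being what it is, you can freely access the attribute or assign a new value without any of this.
Python does, however, use leading underscores ('_') to signal private/protected intent.

By convention, '_name' means protected (internal use) and '__name' means private (it also triggers name mangling).

Names with a leading underscore are also left out of the usual attribute completion (for example, pressing Tab to see which methods exist). To see them, you have to type the underscore explicitly.

In addition, @name.setter does what any setter does, with the added convenience that you can attach conditions to the assignment, as in the example above.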
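
As a variation on the idea, the sketch below validates the incoming value itself rather than the current one. The Account class and its rule are hypothetical and exist only to illustrate a conditional setter.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance  # goes through the setter below

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        # Validate the incoming value before storing it.
        if value < 0:
            raise ValueError('balance must not be negative')
        self._balance = value

account = Account(100)
account.balance = 250
print(account.balance)  # 250

try:
    account.balance = -1
except ValueError as error:
    print(error)        # balance must not be negative

print(account.balance)  # 250
```

Because __init__ assigns through the property as well, the same check applies when the object is created.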

@dataclass

Available from Python 3.7, the @dataclass decorator automatically generates methods such as __init__, __eq__, and __repr__.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
        
person1 = Person('kim', 10)
person2 = Person('kim', 10)

print(person1.name, person1.age)
print(person1 == person2)

As shown above, these work right away without defining any of the methods yourself.
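
The decorator also accepts options. The sketch below, using a hypothetical Version class, shows three of them: order=True generates comparison methods, frozen=True blocks attribute assignment, and field(default_factory=...) gives each instance its own mutable default.

```python
from dataclasses import dataclass, field

@dataclass(order=True, frozen=True)
class Version:
    major: int
    minor: int = 0
    # Excluded from comparison; default_factory gives each instance its own list.
    tags: list = field(default_factory=list, compare=False)

v1 = Version(1, 2)
v2 = Version(1, 10, tags=['beta'])

print(v1)                               # Version(major=1, minor=2, tags=[])
print(v1 < v2)                          # True
print(v1 == Version(1, 2, tags=['x']))  # True

try:
    v1.major = 3
except AttributeError:
    # frozen=True raises FrozenInstanceError, a subclass of AttributeError.
    print('frozen')
```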

@lru_cache

@lru_cache provides memoization.

With this decorator, the return values of a function are cached, up to maxsize entries.

from functools import lru_cache
import urllib.error
import urllib.request

@lru_cache(maxsize = 32)
def get_pep(num):
    'Retrieve text of a Python Enhancement Proposal'
    resource = 'https://www.python.org/dev/peps/pep-%04d/' % num
    try:
        with urllib.request.urlopen(resource) as s:
            return s.read()
    except urllib.error.HTTPError:
        return 'Not Found'
    
id_check = []
for i in [8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991]:
    pep = get_pep(i)
    
    if i == 8:
        id_check.append(pep)
    
print(get_pep.cache_info()) # 3 hits in total -> 320 twice, 8 once
assert id_check[0] is id_check[1] # identity check; fails without @lru_cache!

Thanks to @lru_cache, the objects stored in id_check are the same object (the second call is served from the cache), so no AssertionError is raised.
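
The same behavior can be checked without network access. In this sketch the function load is a hypothetical stand-in for an expensive call; it returns a new list on every real call, so object identity shows whether a result came from the cache.

```python
from functools import lru_cache

@lru_cache(maxsize=32)
def load(num):
    # Stand-in for an expensive call; returns a new list on every real call.
    return [num] * 3

for num in [8, 290, 8, 320, 320]:
    load(num)

info = load.cache_info()
print(info.hits, info.misses)  # 2 3
print(load(8) is load(8))      # True
```

Note that the cached list is shared between callers, so mutating a returned value changes what later calls receive.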
