Posts in "All categories" (page 47)

Using a specific GPU in TensorFlow 2019.07.22
Reducing DataFrame memory (IEEE-CIS Fraud Detection) 2019.07.21
Baekjoon 10814 (Python) 2019.07.19
Baekjoon 11650 (Python) 2019.07.19
Python XML parsing (xml.etree.ElementTree) 2019.07.16


Using a specific GPU in TensorFlow

To see the list of available GPUs, run the code below and look at the name field of each device in the output.

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

If the machine has two GPUs, for example, the devices are listed as cpu:0, gpu:0, and gpu:1.

import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# To use the GPU with index 1 (set this before TensorFlow initializes its GPUs):
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
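
The environment-variable approach can be wrapped in a small helper. This is only a sketch: `select_gpus` is a hypothetical name introduced here, not part of TensorFlow, and it assumes it is called before TensorFlow is imported, because CUDA reads these variables when it initializes. Keep in mind that the GPUs left visible are renumbered from 0 inside the process, so when only physical GPU 1 is exposed it appears to TensorFlow as gpu:0.

```python
import os


def select_gpus(indices):
    """Hypothetical helper: expose only the given physical GPU indices to CUDA."""
    # Order devices by PCI bus ID so the indices match what nvidia-smi shows.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Build a comma-separated list such as "1" or "0,1".
    visible = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = visible
    return visible


# Call this before importing tensorflow.
visible = select_gpus([1])
print("CUDA_VISIBLE_DEVICES =", visible)
```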

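In addition, you can place the training code on a device explicitly:

# K is assumed to be the Keras backend module (from keras import backend as K)
with K.tf.device('/gpu:1'):
    ...
    model.fit(...)
    ...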

Reducing DataFrame memory (IEEE-CIS Fraud Detection)

https://www.kaggle.com/c/ieee-fraud-detection/data

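After a long run of mostly image-related competitions, one that asks you to predict from numeric data has opened for the first time in a while.

The task is to detect anomalies in customer transaction data.

The competition data is large and takes up a lot of memory. Even loading it takes a long time.

Many kernels include code to reduce memory usage. This post refers to the following kernel.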

https://www.kaggle.com/kabure/extensive-eda-and-modeling-xgb-hyperopt

import numpy as np

## Function to reduce the DF size by downcasting numeric columns
def reduce_mem_usage(df, verbose=True):
    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    start_mem = df.memory_usage().sum() / 1024**2  # size in MB before downcasting
    for col in df.columns:
        col_type = df[col].dtypes
        if col_type in numerics:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                # Pick the smallest integer type whose (inclusive) range covers the column
                if c_min >= np.iinfo(np.int8).min and c_max <= np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min >= np.iinfo(np.int16).min and c_max <= np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min >= np.iinfo(np.int32).min and c_max <= np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min >= np.iinfo(np.int64).min and c_max <= np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)
            else:
                # Note: float16 keeps only about 3 significant decimal digits
                if c_min >= np.finfo(np.float16).min and c_max <= np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min >= np.finfo(np.float32).min and c_max <= np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
    end_mem = df.memory_usage().sum() / 1024**2  # size in MB after downcasting
    if verbose:
        print('Mem. usage decreased to {:5.2f} Mb ({:.1f}% reduction)'.format(end_mem, 100 * (start_mem - end_mem) / start_mem))
    return df

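If you print the dtypes of the data, most columns turn out to be int64 or float64.

But what if a column is typed as int64 while its values actually fit within the int16 range?

Downcasting such a column to the smaller type is what reduces the memory usage.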
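
As a small illustration of the idea, the sketch below builds a made-up column (the name `amount` and its values are invented for this example) whose values fit in int16, and compares the memory used before and after the downcast.

```python
import numpy as np
import pandas as pd

# Invented data: 1,000 values from 0 to 999, stored as int64.
df = pd.DataFrame({"amount": np.arange(1000, dtype=np.int64)})

before = df["amount"].memory_usage(index=False)  # bytes used by the values only
info = np.iinfo(np.int16)

# Downcast only when every value fits in the int16 range.
if info.min <= df["amount"].min() and df["amount"].max() <= info.max:
    df["amount"] = df["amount"].astype(np.int16)

after = df["amount"].memory_usage(index=False)
print(before, "->", after, "bytes")  # 8000 -> 2000 bytes
```

Each int64 value takes 8 bytes and each int16 value takes 2, so this column shrinks to a quarter of its original size.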

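Beyond this, the kernel above (and other kernels as well) walks through a range of visualization and modeling techniques for working with the data.

Studying this one kernel alone should teach you quite a lot.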

Baekjoon 10814 (Python)

# Number of entries to read
Case = int(input())

customer_list = []
for i in range(Case):
    age, name = input().split()
    age = int(age)
    # Keep the input index i so ties on age are broken by input order
    customer_list.append([age, name, i])
# Sort by age, then by input order
customer_list = sorted(customer_list, key = lambda x : (x[0], x[2]))
# Drop the index before printing
sorted_customer_list = list(map(lambda x : x[:-1], customer_list))
for customer in sorted_customer_list:
    print(*customer)
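
Python's `sorted` is stable, so elements with equal keys keep their input order. Under that guarantee the explicit index is not strictly needed, and a variant that sorts on age alone might look like the sketch below. `sort_by_age` is a name introduced here for illustration, and it takes the lines as a list instead of reading stdin so that it can be run directly.

```python
def sort_by_age(lines):
    """Sort "age name" strings by age, keeping input order for equal ages."""
    members = []
    for line in lines:
        age, name = line.split()
        members.append((int(age), name))
    # sorted() is stable: ties on age stay in the order they were entered.
    return sorted(members, key=lambda member: member[0])


result = sort_by_age(["21 Junkyu", "21 Dohyun", "20 Sunyoung"])
for age, name in result:
    print(age, name)
```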

Baekjoon 11650 (Python)

# Number of points to read
Case = int(input())

coord_list = []
for _ in range(Case):
    # Each line holds one point as "x y"
    coord_list.append(list(map(int, input().split())))
# Sort by x, then by y
coord_list = sorted(coord_list, key = lambda x : (x[0], x[1]))

for coord in coord_list:
    print(*coord)

Python XML parsing (xml.etree.ElementTree)

(The data differs from what is shown in the picture.)

import xml.etree.ElementTree as elemTree

# annotation_list is assumed to be a list of paths to annotation XML files
# (it is not defined in the original post)
tree = elemTree.parse(annotation_list[0])
object_a = tree.find('object')
bndbox = object_a.find('bndbox')
# Print the first four children of <bndbox> along with their text
for child in list(bndbox)[:4]:
    print(child, child.text)
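
Since the original picture and annotation files are not available here, the following sketch parses a small made-up annotation from a string. The XML content is invented for illustration and assumes a Pascal VOC style layout, based on the `object` and `bndbox` tags used above. It looks up each `bndbox` field by tag name rather than by position, so it does not depend on the order of the child elements.

```python
import xml.etree.ElementTree as ET

# Invented sample annotation; real files would be read with ET.parse(path).
SAMPLE_XML = """
<annotation>
  <object>
    <name>dog</name>
    <bndbox>
      <xmin>48</xmin>
      <ymin>240</ymin>
      <xmax>195</xmax>
      <ymax>371</ymax>
    </bndbox>
  </object>
</annotation>
""".strip()

root = ET.fromstring(SAMPLE_XML)
bndbox = root.find("object").find("bndbox")

# Read each coordinate by tag name and convert the text to int.
box = {tag: int(bndbox.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")}
print(box)
```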

A Grad Student Tries to Explain Things Simply