sklearn

10/12 Wed 1. LogisticRegression (sklearn.linear_model) When you use the logistic regression in scikit-learn's ML algorithm module, there are options you can set to optimize the model. max_iter defaults to 100. If you occasionally get a warning that the iteration count is too low (ConvergenceWarning: lbfgs failed to converge (status=1)), raise max_iter so that the solver (the algorithm used for optimization, default='lbfgs') has enough iterations to converge. In general, liblinear performs well for binary classification on small datasets, while lbfgs suits large datasets and multiclass classification. 2. Visualizing a decision tree model: the Graphviz package ..
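A minimal sketch of the max_iter and solver options described in the 10/12 note. The dataset (scikit-learn's bundled breast cancer data), the train/test split, and the value 10000 are assumptions chosen for illustration, not settings taken from the post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed example data: a small binary classification set bundled with scikit-learn
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# max_iter defaults to 100; a larger budget gives lbfgs room to converge
lbfgs_model = LogisticRegression(solver="lbfgs", max_iter=10000)
lbfgs_model.fit(X_train, y_train)
lbfgs_accuracy = lbfgs_model.score(X_test, y_test)

# liblinear, the solver the note suggests for small binary problems
liblinear_model = LogisticRegression(solver="liblinear")
liblinear_model.fit(X_train, y_train)
liblinear_accuracy = liblinear_model.score(X_test, y_test)

print(lbfgs_accuracy, liblinear_accuracy)
```

Scaling the features first usually lets lbfgs converge in far fewer iterations, so raising max_iter is only one of the possible fixes.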
4/8 Fri Friday! 🐱‍🏍 Today we finish Regression~~ The written machine learning exam is on Monday 4/11, and four performance assessments are due on Sunday 4/17. Missing values are handled either by deleting them or by imputation: an averaging approach (replace missing independent-variable values with a representative value) or a machine learning approach (used when the dependent variable is the one missing, e.g. KNN). KNN (K-Nearest Neighbors): the hyperparameters are k (even k=1 guarantees a certain level of performance) and the distance metric (usually Euclidean). Normalization is a must. Because the distance to every data point has to be computed, prediction can take a long time. 1. Logistic Regression + KNN - BMI data import numpy as np import pandas as pd fro..
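The KNN points in the 4/8 note (normalize first, measure Euclidean distance to every training row, then vote among the k nearest) can be sketched in plain NumPy. The height/weight rows and labels below are made up for illustration and are not the BMI data from the post. The helper names min_max_scale and knn_predict are hypothetical.

```python
import numpy as np

def min_max_scale(X, lo, hi):
    """Rescale each column to the 0..1 range using the training min and max."""
    return (X - lo) / (hi - lo)

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict a label by majority vote among the k nearest training rows."""
    # Euclidean distance from the query to EVERY training row (the costly part)
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical rows: [height in cm, weight in kg]; label 0 = lighter, 1 = heavier
X_train = np.array([[150.0, 40.0], [160.0, 45.0], [170.0, 50.0],
                    [155.0, 80.0], [165.0, 90.0], [175.0, 100.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

lo, hi = X_train.min(axis=0), X_train.max(axis=0)
X_scaled = min_max_scale(X_train, lo, hi)
query = min_max_scale(np.array([168.0, 52.0]), lo, hi)

prediction = knn_predict(X_scaled, y_train, query, k=3)
print(prediction)
```

Without the scaling step, whichever column has the larger numeric range would dominate the distance, which is why the note insists on normalization.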
4/7 Thu Thursday! Today we keep learning Multinomial Classification through its classic example, MNIST, a large database of handwritten digits~ Each MNIST image is 2-D, and because there are many of them the dataset as a whole is 3-D. Each image has to be flattened to 1-D with ravel(). https://www.kaggle.com/competitions/digit-recognizer/data?select=test.csv Tensorflow Ver. 1.15 is good for understanding the theory we learned through code, but the code is very hard. 1. Multinomial Classification by Tensorflow Ver. 1.15 - MNIST import nump..
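A small sketch of the flattening step mentioned in the 4/7 note. The array below is a zero-filled stand-in with the usual MNIST image size of 28x28 pixels, not real data. The batch size of 5 is an arbitrary assumption.

```python
import numpy as np

# Stand-in for a stack of digit images: 5 images, each 28x28 pixels (3-D overall)
images = np.zeros((5, 28, 28), dtype=np.uint8)

# One 2-D image flattened to a 1-D vector of 784 pixel values
single_flat = images[0].ravel()

# The whole stack flattened image by image into a 2-D table: one row per image
batch_flat = images.reshape(len(images), -1)

print(single_flat.shape, batch_flat.shape)
```

The row-per-image layout is the shape most classifiers expect as input.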
4/6 Wed Wednesday! Today we learn Multinomial Classification. Logistic Regression is an evolution of Linear Regression (predicting continuous numeric values) → Classification (predicting which class something belongs to) - Binary Classification - Multinomial Classification. Logistic Regression is specialized for binary classification. The classifier scikit-learn provides for this is SGDClassifier (Stochastic Gradient Descent), an evolved form of Gradient Descent. 1. Binary Classification - Wisconsin breast cancer data by Gradient Descent Cl..
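A hedged sketch of binary classification on the Wisconsin breast cancer data with scikit-learn's SGDClassifier, as named in the 4/6 note. The split, the scaling step, and the random seed are assumptions. The post's own code is truncated, so this is not a reconstruction of it. SGDClassifier is left at its default loss (hinge, a linear SVM) because the name of the logistic loss option differs between scikit-learn versions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Wisconsin breast cancer data as bundled with scikit-learn (binary labels)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# SGD is sensitive to feature scale, so standardize before fitting
classifier = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))
classifier.fit(X_train, y_train)

sgd_accuracy = classifier.score(X_test, y_test)
print(sgd_accuracy)
```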
4/5 Tue Tuesday! Today we look at what to watch out for when doing machine learning with Logistic Regression. From now on we will use Accuracy as the metric for (binary) Classification. Things to consider before evaluating a model: 1. learning rate: adjust it while watching the loss value. It is usually set to around 1e-4 (10 to the power of minus 4). If the learning rate is too large, the global minimum (W') can no longer be found → OverShooting occurs. If the learning rate is very small, training moves slowly and can end up in a local minimum. 2. Normalization: MinMax Scaling - 0 ~ 1, sensitive to outliers. Standardization - Z-Score, relatively insensitive to outliers, for every column..
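The learning-rate behaviour in the 4/5 note can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w^2 (an assumed stand-in loss, minimum at w = 0) with three learning rates. The function name gradient_descent and the step count are hypothetical. Because this loss is convex it has no local minima, so the tiny learning rate only demonstrates slow progress.

```python
def gradient_descent(learning_rate, start=5.0, steps=50):
    """Minimize f(w) = w**2 with fixed-step gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w            # derivative of w**2
        w = w - learning_rate * gradient
    return w

moderate = gradient_descent(0.1)      # shrinks steadily toward the minimum at 0
too_large = gradient_descent(1.1)     # each step jumps past 0 and lands farther away
too_small = gradient_descent(1e-4)    # barely moves away from the starting point

print(moderate, too_large, too_small)
```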
4/1 Fri Friday! 😎 Today we learn Logistic Regression, which was briefly introduced yesterday~ Logistic Regression is an evolution of Linear Regression (predicting continuous numeric values) → Classification (predicting which class something belongs to) - Binary Classification - Multinomial Classification. Let's additionally install a utility module for viewing graphs (mglearn): conda activate maching_TF15, then pip install mglearn. conda install picks the best version by taking into account the dependencies of the modules and packages already installed, whereas pip install simply installs the package. Logistic Regression : L..
3/30 Wed Wednesday! Let's find out why the results looked different when we implemented Simple Linear Regression on yesterday's Ozone data in Python and in Sklearn~ Reason 1. Missing value handling - Deletion: when the whole dataset has at least 1 million rows and no more than 5% of the values are missing - Replacement: replace with a representative value (mean, median, max, min, mode) or use a machine learning technique (the better way! when the missing value is in the dependent variable). Reason 2. Outlier handling. An outlier deviates strongly from the other, ordinary data, so it has a large effect on the mean and variance → it makes the data quite unstable. - Leverage point: an outlier in an independent variable (cause) - Outlier: an outlier in the dependent variable (result) 1. Outliers..
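A minimal pandas sketch of the two preprocessing steps in the 3/30 note. The column names and numbers are invented for illustration and are not the Ozone data from the post. The outlier rule used here (Tukey's 1.5 x IQR fences) is an assumption, since the post's own outlier method is cut off in the excerpt.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with one missing value in each column
df = pd.DataFrame({
    "Temp": [67.0, 72.0, np.nan, 62.0, 66.0, 65.0],
    "Ozone": [41.0, 36.0, 12.0, 18.0, np.nan, 28.0],
})

# Option 1: deletion, dropping every row that contains a missing value
dropped = df.dropna()

# Option 2: replacement, filling each gap with that column's mean
filled = df.fillna(df.mean())

# Outlier check on a hypothetical series using the 1.5 x IQR fences
values = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0])
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
is_outlier = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
outliers = values[is_outlier].tolist()

print(len(dropped), filled.loc[2, "Temp"], outliers)
```

Deletion keeps only complete rows, so on a small table like this one it throws away a third of the data, which is why the note reserves it for very large datasets with few gaps.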
