
๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ๋น„๊ต

- Power, expressibility: ์–ผ๋งˆ๋‚˜ ๋ณต์žกํ•œ ์ž‘์—…์„ ํ•  ์ˆ˜ ์žˆ๋Š๋ƒ

- Interpretability

- Ease of Use

- Training speed

- Prediction speed

|                      | Linear Regression | Nearest Neighbor | Deep Learning |
|----------------------|-------------------|------------------|---------------|
| Power/Expressibility | L                 | L                | H             |
| Interpretability     | H                 | H                | L             |
| Ease of Use          | H                 | H                | L             |
| Training speed       | H                 | H                | L             |
| Prediction speed     | H                 | L                | H             |

cf) ๋”ฅ๋Ÿฌ๋‹์€ Foward Fast๋ฅผ ์ด์šฉํ•œ๋‹ค. Nearest Neighbor๋Š” ๊ฑฐ๋ฆฌ ๊ณ„์‚ฐ ๋•Œ๋ฌธ์— ์—ฐ์‚ฐ ์‹œ๊ฐ„์ด ๊ธธ๋‹ค.

 

XOR & Linear Classifier

- Linear Classifier๋Š” XOR ๊ฐ™์€ ๊ฐ„๋‹จํ•œ ๋น„์„ ํ˜•ํ•จ์ˆ˜๋ฅผ ์ ํ•ฉ์‹œํ‚ฌ ์ˆ˜ ์—†๋‹ค.

- ๋Œ€์•ˆ: Decision tree, Random forest, Support Vector Machines, Deep Learning

 

Decision Tree Classifier

- A model that classifies an example by passing it along a root->leaf path

- The tree decomposes the training examples into relatively uniform groups

Example: Titanic survival modeling

- Built in a top-down manner

- Splits along one feature/dimension at a time to refine the information about the m classes

* pure split: produces a node containing only a single class

* balanced split: partitions the items so the group sizes are roughly equal

Information-Theoretic Entropy

- entropy: measures the amount of class confusion
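
For reference, the standard definition behind this (p_i denotes the fraction of examples in D belonging to class i, out of m classes):

$$\mathrm{entropy}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

It is 0 for a pure node and log2(m) when the m classes are maximally confused.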

Split Criteria

- information gain

gain(D, A_i) = entropy(D) - entropy_{A_i}(D)

where entropy_{A_i}(D) is the weighted average entropy of the subsets of D produced by splitting on attribute A_i.

- Usage: compute the gain for every data column and choose the column with the largest value; that gives the best split (a small sketch follows).

 

Stopping Criteria

- information gain์ด 0์ด ๋  ๋•Œ๊ฐ€ ์•„๋‹ˆ๋ผ, ์ž…์‹ค๋ก ๋ณด๋‹ค ์ž‘๋‹ค๋ฉด ๋ฉˆ์ถฐ๋„ ๋œ๋‹ค. -> ์ด์ •๋„๋ฉด ์ถฉ๋ถ„ํ•˜๋‹ค๋Š” ๋œป

- alternate strategy: full tree๋ฅผ ๋งŒ๋“ค์–ด์„œ low value node๋ฅผ ๊ฐ€์ง€์น˜๊ธฐ ํ•˜๊ธฐ

-> subtree์ค‘ ์˜๋ฏธ๊ฐ€ ๊ฑฐ์˜ ์—†๋Š” ๋ถ€๋ถ„์„ leaf๋กœ ํ†ต์ผํ•œ ํ›„, ์›๋ž˜ ํŠธ๋ฆฌ์™€ ์„ฑ๋Šฅ์„ ๋น„๊ตํ•˜์—ฌ ์ฑ„ํƒ

 

Decision Tree์˜ ์žฅ์ 

- ๋น„์„ ํ˜•์„ฑ

- categorical variable์„ ์ž˜ ์ ์šฉ

- ์„ค๋ช… ๊ฐ€๋Šฅ์„ฑ ๋†’์Œ

- robustness: ๋‹ค๋ฅธ ํŠธ๋ฆฌ๋“ค๊ณผ์˜ ์•™์ƒ๋ธ”์„ ์ง„ํ–‰ํ•ด์„œ ๋” ๋‚˜์€ ๊ฒƒ์„ voteํ•  ์ˆ˜ ์žˆ์Œ

 

Ensemble Methods

1. Bagging

training

- k๊ฐœ์˜ bootstrap sample ์ƒ์„ฑ

- ๊ฐ๊ฐ์˜ S[i] ์ƒ˜ํ”Œ์— ๋Œ€ํ•ด classifier๋ฅผ ์ƒ์„ฑํ•ด์„œ k๊ฐœ์˜ classifier๋ฅผ ๋งŒ๋“ฆ (๊ฐ™์€ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ด์šฉ)

testing

- k๊ฐœ์˜ classifier๋ฅผ ๋™์ผํ•œ ๊ฐ€์ค‘์น˜๋กœ ํˆฌํ‘œํ•ด์„œ ์ƒˆ๋กœ์šด ์‚ฌ๋ก€๋“ค์„ ๋ถ„๋ฅ˜ํ•ด๋ณด๊ธฐ

 

2. Boosting

training

- classifier์˜ sequence๋ฅผ ์ƒ์„ฑ (๊ฐ™์€ base learner ์ด์šฉ)

- ๊ฐ๊ฐ์˜ classifier๋Š” ์ด์ „ classifier์— ์˜์กด์ ์ด๋ฉฐ ๊ทธ๊ฒƒ์˜ ์—๋Ÿฌ๋ฅผ ์ฐพ๋Š” ๋ฐ ์ง‘์ค‘

- ์ด์ „ classifier์—์„œ ์ž˜๋ชป ์˜ˆ์ธก๋œ ์‚ฌ๋ก€๋“ค์€ ๋” ๋†’์€ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถ€์—ฌ

testing

- classifier๋“ค์˜ ์—ฐ์†์œผ๋กœ ํŒ๋‹จ๋œ ๊ฒฐ๊ณผ๋ฅผ ๊ฒฐํ•ฉํ•˜์—ฌ test case์˜ ์ตœ์ข… ํด๋ž˜์Šค๋ฅผ ๋ถ€์—ฌ

 

Random Forest

- Bagging with decision tree + split attribute selection on random subspace

-> uses a modified tree-growing algorithm that picks each split from a randomly drawn set of candidate features -> random subset of features

* Step 1: create a bootstrapped dataset

* Step 2: build a decision tree on the bootstrapped dataset, using only a random subset of the features at each step

This diversifies the structure of the trees / the subset size is usually sqrt(number of features)

- At each new node, as at the root, randomly pick two variables as candidates (in the slides' 3-column example, one column is ignored each time)

* Step 3: repeat, generating new trees again and again

* Step 4: inference

- Check which option received the most votes

* Measuring accuracy

- Typically about 1/3 of the original data does not appear in a given bootstrapped dataset

-> validate on these data (the Out-Of-Bag samples), as in the sketch below

 

Support Vector Machines

- ๋น„์„ ํ˜•์„ฑ ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ๋งŒ๋“œ๋Š” ์ค‘์š”ํ•œ ๋ฐฉ๋ฒ•

- 2๊ฐœ์˜ ํด๋ž˜์Šค ์‚ฌ์ด์—์„œ maximum margin linear separator๋ฅผ ์ถ”๊ตฌ

 

SVM vs Logistic Regression

- ๊ณตํ†ต์ : seperating plane

- ์ฐจ์ด์ : LR๋Š” ๋ชจ๋“  ๊ฐ’์— ๋Œ€ํ•ด์„œ ํ‰๊ฐ€ํ•˜์ง€๋งŒ, SVM์€ ๊ฒฝ๊ณ„์— ์žˆ๋Š” ์ ๋งŒ ํ™•์ธํ•จ

- SVM: ๊ธฐ๋ณธ์ ์œผ๋กœ ์„ ํ˜•์ ์ด์ง€๋งŒ ๋” ๊ณ ์ฐจ์›์—๋„ ์ ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

 

๊ณ ์ฐจ์›์œผ๋กœ์˜ projection

- ์ฐจ์› ์ˆ˜๋ฅผ ๋Š˜๋ฆฌ๋ฉด ๋ชจ๋“  ๊ฒƒ์„ linearly separableํ•˜๊ฒŒ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค.

Kernels and non-linear functions
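
A kernel supplies such a projection implicitly, without ever materializing the high-dimensional coordinates. A minimal sketch (assuming scikit-learn) on concentric circles, which no line can separate:

```python
# RBF-kernel SVM vs. linear SVM on data that is not linearly separable.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

print(SVC(kernel="linear").fit(X, y).score(X, y))  # ~0.5: no separating line
print(SVC(kernel="rbf").fit(X, y).score(X, y))     # ~1.0: implicit projection
```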

Feature Engineering

- Domain-dependent data cleaning is important

* Z-scores, normalization

* Creating bell-shaped distributions

* Imputing missing values

* Dimensionality reduction (SVD) -> caution: along with the noise, it can crush the very small signals that matter for predicting y, which can hurt performance

* Explicit incorporation of non-linear combinations (e.g., products, ratios, ...)

 

Neural Networks

 
