
๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์˜ ๋น„๊ต

- Power, expressibility: how complex a task the model can handle

- Interpretability

- Ease of Use

- Training speed

- Prediction speed

| | Linear Regression | Nearest Neighbor | Deep Learning |
|---|---|---|---|
| Power/Expressibility | L | L | H |
| Interpretability | H | H | L |
| Ease of Use | H | H | L |
| Training speed | H | H | L |
| Prediction speed | H | L | H |

cf) ๋”ฅ๋Ÿฌ๋‹์€ Foward Fast๋ฅผ ์ด์šฉํ•œ๋‹ค. Nearest Neighbor๋Š” ๊ฑฐ๋ฆฌ ๊ณ„์‚ฐ ๋•Œ๋ฌธ์— ์—ฐ์‚ฐ ์‹œ๊ฐ„์ด ๊ธธ๋‹ค.

 

XOR & Linear Classifier

- Linear Classifier๋Š” XOR ๊ฐ™์€ ๊ฐ„๋‹จํ•œ ๋น„์„ ํ˜•ํ•จ์ˆ˜๋ฅผ ์ ํ•ฉ์‹œํ‚ฌ ์ˆ˜ ์—†๋‹ค.

- ๋Œ€์•ˆ: Decision tree, Random forest, Support Vector Machines, Deep Learning

 

Decision Tree Classifier

- A model that classifies an example by following a root->leaf path

- The tree decomposes the training examples into relatively homogeneous groups

Example: modeling Titanic survival

- Built in a top-down manner

- Each split follows a single feature/dimension, refining the information about the m classes

* pure split: produces a single-class node

* balanced split: separates the items so the group sizes come out roughly equal

Information-Theoretic Entropy

- entropy: measures the amount of class confusion

Split Criteria

- information gain

gain(D, A_i) = entropy(D) - entropy_{A_i}(D)

- Usage: compute the gain for every data column and split on the one with the largest value (see the sketch below).

 

Stopping Criteria

- information gain์ด 0์ด ๋  ๋•Œ๊ฐ€ ์•„๋‹ˆ๋ผ, ์ž…์‹ค๋ก ๋ณด๋‹ค ์ž‘๋‹ค๋ฉด ๋ฉˆ์ถฐ๋„ ๋œ๋‹ค. -> ์ด์ •๋„๋ฉด ์ถฉ๋ถ„ํ•˜๋‹ค๋Š” ๋œป

- alternate strategy: full tree๋ฅผ ๋งŒ๋“ค์–ด์„œ low value node๋ฅผ ๊ฐ€์ง€์น˜๊ธฐ ํ•˜๊ธฐ

-> subtree์ค‘ ์˜๋ฏธ๊ฐ€ ๊ฑฐ์˜ ์—†๋Š” ๋ถ€๋ถ„์„ leaf๋กœ ํ†ต์ผํ•œ ํ›„, ์›๋ž˜ ํŠธ๋ฆฌ์™€ ์„ฑ๋Šฅ์„ ๋น„๊ตํ•˜์—ฌ ์ฑ„ํƒ

 

Decision Tree์˜ ์žฅ์ 

- ๋น„์„ ํ˜•์„ฑ

- categorical variable์„ ์ž˜ ์ ์šฉ

- ์„ค๋ช… ๊ฐ€๋Šฅ์„ฑ ๋†’์Œ

- robustness: ๋‹ค๋ฅธ ํŠธ๋ฆฌ๋“ค๊ณผ์˜ ์•™์ƒ๋ธ”์„ ์ง„ํ–‰ํ•ด์„œ ๋” ๋‚˜์€ ๊ฒƒ์„ voteํ•  ์ˆ˜ ์žˆ์Œ

 

Ensemble Methods

1. Bagging

training

- k๊ฐœ์˜ bootstrap sample ์ƒ์„ฑ

- ๊ฐ๊ฐ์˜ S[i] ์ƒ˜ํ”Œ์— ๋Œ€ํ•ด classifier๋ฅผ ์ƒ์„ฑํ•ด์„œ k๊ฐœ์˜ classifier๋ฅผ ๋งŒ๋“ฆ (๊ฐ™์€ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ด์šฉ)

testing

- k๊ฐœ์˜ classifier๋ฅผ ๋™์ผํ•œ ๊ฐ€์ค‘์น˜๋กœ ํˆฌํ‘œํ•ด์„œ ์ƒˆ๋กœ์šด ์‚ฌ๋ก€๋“ค์„ ๋ถ„๋ฅ˜ํ•ด๋ณด๊ธฐ

 

2. Boosting

training

- classifier์˜ sequence๋ฅผ ์ƒ์„ฑ (๊ฐ™์€ base learner ์ด์šฉ)

- ๊ฐ๊ฐ์˜ classifier๋Š” ์ด์ „ classifier์— ์˜์กด์ ์ด๋ฉฐ ๊ทธ๊ฒƒ์˜ ์—๋Ÿฌ๋ฅผ ์ฐพ๋Š” ๋ฐ ์ง‘์ค‘

- ์ด์ „ classifier์—์„œ ์ž˜๋ชป ์˜ˆ์ธก๋œ ์‚ฌ๋ก€๋“ค์€ ๋” ๋†’์€ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถ€์—ฌ

testing

- classifier๋“ค์˜ ์—ฐ์†์œผ๋กœ ํŒ๋‹จ๋œ ๊ฒฐ๊ณผ๋ฅผ ๊ฒฐํ•ฉํ•˜์—ฌ test case์˜ ์ตœ์ข… ํด๋ž˜์Šค๋ฅผ ๋ถ€์—ฌ

 

Random Forest

- Bagging with decision trees + split-attribute selection on a random subspace

-> a tree-algorithm variant that, at each split, learns from a randomly drawn set of candidates -> a random subset of features

* Step 1: create a bootstrapped dataset

* Step 2: build a decision tree on the bootstrapped dataset, using only a random subset of the features at each step

This makes the trees diverse / the subset size is usually sqrt(number of features)

- At every new node, pick candidate variables at random again, just as at the root (e.g., ignoring 1 of the 3 columns)

* Step 3: repeat, growing new tree after new tree

* Step 4: inference

- Check which option received the most votes

* Measuring accuracy

- Typically about 1/3 of the original data never appears in a given bootstrapped dataset

-> validate on this data (the Out-Of-Bag sample); see the sketch below

 

Support Vector Machines

- ๋น„์„ ํ˜•์„ฑ ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ๋งŒ๋“œ๋Š” ์ค‘์š”ํ•œ ๋ฐฉ๋ฒ•

- 2๊ฐœ์˜ ํด๋ž˜์Šค ์‚ฌ์ด์—์„œ maximum margin linear separator๋ฅผ ์ถ”๊ตฌ

 

SVM vs Logistic Regression

- ๊ณตํ†ต์ : seperating plane

- ์ฐจ์ด์ : LR๋Š” ๋ชจ๋“  ๊ฐ’์— ๋Œ€ํ•ด์„œ ํ‰๊ฐ€ํ•˜์ง€๋งŒ, SVM์€ ๊ฒฝ๊ณ„์— ์žˆ๋Š” ์ ๋งŒ ํ™•์ธํ•จ

- SVM: ๊ธฐ๋ณธ์ ์œผ๋กœ ์„ ํ˜•์ ์ด์ง€๋งŒ ๋” ๊ณ ์ฐจ์›์—๋„ ์ ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

 

๊ณ ์ฐจ์›์œผ๋กœ์˜ projection

- ์ฐจ์› ์ˆ˜๋ฅผ ๋Š˜๋ฆฌ๋ฉด ๋ชจ๋“  ๊ฒƒ์„ linearly separableํ•˜๊ฒŒ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค.

Kernels and non-linear functions

Feature Engineering

- Domain-dependent data cleaning matters:

* Z-scores, normalization

* producing bell-shaped distributions

* imputing missing values

* dimensionality reduction (SVD) -> caution: along with the noise, it can crush the small but important signals that predict y, hurting performance

* explicit incorporation of non-linear combinations (e.g., products, ratios...)

 

Neural Networks

 


Nearest Neighbor Classification

- ์–ด๋–ค training example์ด target์— ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด์ง€ ํ™•์ธํ•ด์„œ class label์„ ๋ถ™์ด๋Š” ๊ฒƒ

- distance function์„ ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ์ •์˜ํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”

- ์žฅ์ : simplicity, interpretability, non-linearity

- k-nearest neighbor

 

Distance Metrics

๋‹ค์Œ๊ณผ ๊ฐ™์€ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋Š” ๊ฒฝ์šฐ

- d(x, y) >= 0 for all x, y (positivity)

- d(x, y) = 0 iff x = y (identity)

- d(x, y) = d(y, x) (symmetry)

- d(x, y) <= d(x, z) + d(z, y) (triangle inequality)

 

Not a Metric

๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ž์—ฐ ์œ ์‚ฌ์„ฑ ์ธก์ • ๊ฐ’๋“ค์€ distance metric์ด ์•„๋‹ˆ๋‹ค.

- correlation coefficient (-1~1)

- cosine similarity, dot product

- mutual information measures

- cheapest airfare 

 

Euclidean Distance Metric

- A weighted Euclidean distance d(p, q) = sqrt(sum_i c_i (p_i - q_i)^2) lets a coefficient c_i weight each dimension

- At a minimum, normalize so that the dimensions become comparable

 

L_k Distance Norms

- ์œ ํด๋ฆฌ๋“œ ๊ฑฐ๋ฆฌ๋ฅผ ์ผ๋ฐ˜ํ™”ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์‹ ์‚ฌ์šฉ

- k=1 ์ผ ๋•Œ, Manhattan distance metric

- k=∞ ์ผ ๋•Œ, maximum component 

- k๋Š” largest and total dimensional difference ์‚ฌ์ด์˜ tradeoff๋ฅผ ์กฐ์ ˆ

p1(2, 0), p2(1.5, 1.5) ์ค‘ (0, 0)์—์„œ ๋” ๋จผ ์ ์€?

- k=1์ผ ๋•Œ, k=2์ผ ๋•Œ, k=∞์ผ ๋•Œ ๋จผ ์ ์ด ๋‹ค๋ฅด๋‹ค.

- distance metric์€ ์–ด๋–ค ์ ์ด ๋” ๊ฐ€๊นŒ์šด์ง€ ์„ค์ •

 

Circles for different k

- L_k circle์˜ ๋ชจ์–‘์€ ์›์ ์— ๋Œ€ํ•ด ์–ด๋– ํ•œ ์ ์ด ๋™๋“ฑํ•œ ๊ฒƒ๋“ค์ธ์ง€ ๋‚˜ํƒ€๋‚ธ ๊ฒƒ

 

Projections from higher dimensions

- Projection methods (e.g., SVD) reduce the dimensionality to cut representational complexity

- Nearest neighbors become more robust than in the original space

 

Regression / Interpolation by NN

- NN์ด๋ผ๋Š” ์ปจ์…‰์„ function interpolation์— ์ด์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

- k๊ฐœ์˜ ๊ทผ์ ‘ ์ ์˜ ๊ฐ’์˜ ํ‰๊ท ์„ ๋‚ด๋Š” ๋ฐฉ์‹

- weighted average scheme๋Š” (1) distance rank, (2) actual distances์— ์˜ํ•ด ๋”ฐ๋ผ ์ ๋“ค์„ ๋‹ค๋ฅด๊ฒŒ ๊ฐ’์„ ๋งค๊ธธ ์ˆ˜ ์žˆ๋‹ค.

- ๋ถ„๋ฅ˜์—๋„ ๋น„์Šทํ•œ ๋ฐฉ์‹์„ ์ด์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

 

Gender Classification by Height/Weight

- k๊ฐ€ ์ปค์ง€๋ฉด ๋” ๋งค๋„๋Ÿฌ์šด ๊ตฌ๋ถ„์„ ์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Œ

 

Seeking good analogies

- ๋งŽ์€ ์ง€์‹์˜ ๋ถ„์•ผ๋Š” analogy(๋น„์œ )์— ๊ธฐ์ดˆํ•ด ์žˆ๋‹ค.

* Law: ์–ด๋–ค ๋ฒ•์  ํŒ๋ก€๊ฐ€ ์ด ์‚ฌ๋ก€์™€ ๋น„์Šทํ•œ๊ฐ€?

* Medicine: ๋น„์Šทํ•œ ์ฆ์ƒ์„ ๊ฐ€์กŒ๋˜ ํ™˜์ž๋ฅผ ๊ณผ๊ฑฐ์— ์–ด๋–ป๊ฒŒ ์น˜๋ฃŒํ–ˆ๊ณ , ๊ทธ ํ™˜์ž๊ฐ€ ์‚ด์•„๋‚จ์•˜๋Š”๊ฐ€?

* Real Estate: ์ด์›ƒ ์ง€์—ญ์— ๋น„๊ต ๊ฐ€๋Šฅํ•œ ์ž์‚ฐ์ด ์–ด๋Š์ •๋„ ๊ฐ€๊ฒฉ์— ํŒ”๋ ธ๋Š”๊ฐ€?

 

Finding Nearest Neighbors

- n๊ฐœ์˜ ์ ์ด ์ฃผ์–ด์ง„ d์ฐจ์›์—์„œ NN์„ ๊ตฌํ•˜๋Š” ๊ฒƒ์€ O(nd)์˜ ์‹œ๊ฐ„๋ณต์žก๋„๋ฅผ ๊ฐ€์ง„๋‹ค.

- training set์ด ์ปค์ง€๊ฑฐ๋‚˜ ์ฐจ์›์ด ์ปค์ง€๋ฉด ์‹œ๊ฐ„์ด ๊ต‰์žฅํžˆ ๋งŽ์ด ๋“ ๋‹ค.

-> grid indices, kd-trees, Voronoi diagrams, locality sensitive hashing ๋“ฑ์„ ์ด์šฉ

 

Voronoi Diagrams / Kd-trees

- Voronoi diagrams: partition space into regions whose points share the same nearest neighbor

- kd-trees and similar structures are efficient nearest-neighbor algorithms in low dimensions.

- For sufficiently high-dimensional data, exact NN search degenerates into linear search.

 

Grid files

- ์ผ์ •ํ•œ ๊ฐ„๊ฒฉ์œผ๋กœ ๋‚˜๋‰˜์–ด์ง„ ๊ฒฉ์ž์— ์ฐํžŒ ์ ์„ ๋ฒ„์ผ“ํŒ… (Bucketting) ํ•˜๋Š” ๊ฒƒ์€ ์œ ์‚ฌ์„ฑ์œผ๋กœ ์ ๋“ค์„ ๊ทธ๋ฃนํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜์ž„

- ์ฐจ์›์ด ์ฆ๊ฐ€ํ•˜๋ฉด index๋Š” expensiveํ•ด์ง„๋‹ค. -> ์ฃผ๋ณ€ ๊ฒฉ์ž๋“ค๋„ ๋ชจ๋‘ ํƒ์ƒ‰ํ•ด์•ผ ํ•˜๋ฏ€๋กœ, exponentialํ•˜๊ฒŒ ์ฆ๊ฐ€ํ•  ์ˆ˜๋ฐ–์— ์—†์Œ

์™ผ: Voronoi Diagram / ์˜ค: Grid file

Locality Sensitive Hashing

- Hashing์€ ์ด์›ƒํ•˜๋Š” ์ ์ด ๊ฐ™์€ bucket์œผ๋กœ hash ๋˜์—ˆ์„ ๋•Œ NN search๋ฅผ ๋” ๋น ๋ฅด๊ฒŒ ์ง„ํ–‰๋  ์ˆ˜ ์žˆ๋„๋ก ํ•จ

* hashing: ๊ฒ€์ƒ‰์„ ๋น ๋ฅด๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•ด hash key ๊ฐ’์„ ๊ฐ€์ ธ์™€์„œ ๋น ๋ฅด๊ฒŒ ์ฐพ๋„๋ก ํ•จ

- normal hashing์€ distance bucket์œผ๋กœ ์œ ์‚ฌํ•œ ์ ๋“ค์„ ํผ๋œจ๋ฆผ

- Locality Sensitive Hashing (LSH): ์ ์ด๋‚˜ ๋ฒกํ„ฐ a, b๋ฅผ ๊ฐ€์ ธ์™€์„œ a๊ฐ€ b์™€ ์ธ์ ‘ํ•˜๋ฉด h(a)=h(b)์ผ ๊ฒƒ์ด๋ผ๊ณ  ์ƒ๊ฐํ•จ.

 

LSH for points on a sphere

1) ์›์ ์„ ๊ฐ€๋กœ์ง€๋ฅด๋Š” ๋žœ๋ค ๋ฉด์„ ์„ ํƒ

2) ์„œ๋กœ ์ด์›ƒํ•ด ์žˆ๋‹ค๋ฉด ๋‘ ์ ์€ ๊ทธ ๋ฉด์˜ ๊ฐ™์€ ๋ฉด์— ์กด์žฌํ•  ๊ฒƒ์ž„. (์™ผ์ชฝ ๋˜๋Š” ์˜ค๋ฅธ์ชฝ)

3) d๊ฐœ์˜ ๋žœ๋ค ๋ฉด์— ๋Œ€ํ•œ L/R ํŒจํ„ด์€ d-bit LSH hash code๋ฅผ ์ƒ์„ฑ

์˜ˆ์‹œ๋ฅผ ์‚ดํŽด๋ณด์ž! ํŒŒ๋ž€ ์ƒ‰ ์ ์— ๋Œ€ํ•ด์„œ ์ฝ”๋“œ๋ฅผ ์‚ดํŽด๋ณผ ๊ฒƒ.

1๋ฒˆ ์„ ์„ ๊ทธ์—ˆ์„ ๋•Œ, ํ•ด๋‹น ์ ์€ ์˜ค๋ฅธ์ชฝ์— ์žˆ์–ด์„œ 0

2๋ฒˆ ์„ ์„ ๊ทธ์—ˆ์„ ๋•Œ ํ•ด๋‹น ์ ์€ ์™ผ์ชฝ์— ์žˆ์–ด์„œ 1

3๋ฒˆ ์„ ์€ ์˜ค๋ฅธ์ชฝ์ด๋ฏ€๋กœ 0

4๋ฒˆ ์„ ์€ ์™ผ์ชฝ์ด๋ฏ€๋กœ 1 --> ์ตœ์ข… LSH ์ฝ”๋“œ๋Š” 0101

 

Network data

| | vertices | edges |
|---|---|---|
| social network | people | friendships |
| WWW | pages | hyperlinks |
| product/customer networks | products, customers (bipartite graph) | sales |
| genetic networks | genes | interactions |

 

Point Sets and Graphs

- A point set defines a graph -> add an edge (x, y) when points x and y are neighbors / or when their distance falls under a threshold

- A graph defines a point set -> run SVD on the adjacency matrix

ex) run a 2-dimensional SVD and extract principal components 1 and 2

 

Classical Graph Algorithms

- ๋‘ vertices ์‚ฌ์ด์˜ ์ตœ์†Œ ๊ฑฐ๋ฆฌ๋Š” ๊ณง ๊ทธ๋ž˜ํ”„์—์„œ์˜ ๊ฑฐ๋ฆฌ๊ฐ€ ๋œ๋‹ค.

- ์ตœ์†Œ ๊ฑฐ๋ฆฌ๋ฅผ ์ฐพ๋Š” ํด๋ž˜์‹ํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜, connected components, spanning trees, cuts, flows, matchings, topological sorting ๋“ฑ์ด ์ ์šฉ๋  ์ˆ˜ ์žˆ์Œ

 

PageRank

- PageRank(v, G): the probability that a random walk on G stops at vertex v

- Defined recursively and computed iteratively

- Converges quickly in practice.

- e.g., an important friend splitting their vote thinly -> repeated updates eventually stop changing anything (convergence)

- Used as a key measure of centrality and importance

- e.g., Wikipedia pages

- Design decisions to make in PageRank:

* editing out seemingly irrelevant vertices/edges

* handling vertices with outdegree 0

* allowing random jumps: the damping factor (see the sketch below)

 

Clustering

- ์ •์˜: ์œ ์‚ฌ๋„๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์ ์„ ๊ทธ๋ฃนํ™”ํ•˜๋Š” ๋ฌธ์ œ

- ์š”์†Œ๋“ค์ด ์ ์€ ์ˆ˜์˜ source๋กœ๋ถ€ํ„ฐ ์œ ๋ž˜ํ•œ ์ƒํ™ฉ์— ์ด๋Ÿฌํ•œ origin์„ ๋‚˜ํƒ€๋‚ด๊ธฐ ์œ„ํ•ด ํด๋Ÿฌ์Šคํ„ฐ๋ง์„ ์ด์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

- ill-defined problem: ๋ณด๋Š” ์‚ฌ๋žŒ์— ๋”ฐ๋ผ ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ๋‚˜๋ˆ„๋Š” ๊ธฐ์ค€์ด ๋‹ฌ๋ผ์งˆ ์ˆ˜ ์žˆ์Œ. ์ฃผ๊ด€์ ์ธ ๋ถ€๋ถ„

- ์˜ˆ์‹œ: ์œ ์ „์ž ๋ฐ์ดํ„ฐ ๊ทธ๋ฃนํ™”

- ํด๋Ÿฌ์Šคํ„ฐ๋ง์„ ์‚ฌ์šฉํ•˜๋Š” ์ด์œ 

* hypothesis development: ๋‚ด ๋ฐ์ดํ„ฐ์— ๊ตฌ๋ณ„๋˜๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ์–ผ๋งˆ๋‚˜ ์žˆ๋Š”๊ฐ€?

* Modeling over smaller groups: ๊ฐ๊ฐ์˜ ํด๋Ÿฌ์Šคํ„ฐ์— ๋Œ€ํ•ด ๊ตฌ๋ถ„๋˜๋Š” ์˜ˆ์ธก ๋ชจ๋ธ์„ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Œ

* Data reduction: ๊ฐ ํด๋Ÿฌ์Šคํ„ฐ์˜ centroid๋ฅผ ์ด์šฉํ•˜์—ฌ ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ๋Œ€์ฒด/๋Œ€ํ‘œ

* Outlier detection: ์–ด๋–ค ํ•ญ๋ชฉ์ด ํด๋Ÿฌ์Šคํ„ฐ์˜ ์ค‘์‹ฌ์œผ๋กœ๋ถ€ํ„ฐ ๋ฉ€๋ฆฌ ๋–จ์–ด์ ธ์žˆ๊ฑฐ๋‚˜ ์•„์ฃผ ์ž‘์€ ํด๋Ÿฌ์Šคํ„ฐ์— ๊ฐ‡ํ˜€์žˆ๋Š”๊ฐ€?

 

K-means Clustering

- ์ •์˜: k๊ฐœ์˜ ์ ์„ ์ค‘์‹ฌ์œผ๋กœ ์ •์˜ -> ๋ชจ๋“  ํ•ญ๋ชฉ๋“ค์„ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ์ค‘์‹ฌ์— ํ• ๋‹น -> ์ค‘์‹ฌ์„ ๋‹ค์‹œ ๊ณ„์‚ฐ -> ์ด ๊ณผ์ •์„ ๊ณ„์† ๋ฐ˜๋ณตํ•˜์—ฌ ์ถฉ๋ถ„ํžˆ ์•ˆ์ •๋˜๋„๋ก ํ•จ

- ๋ฌธ์ œ์ : local optima์— ๊ฐ‡ํž ์œ„ํ—˜์ด ์žˆ์Œ

 

Centroids or Center Points?

- The centroid is ill-defined when grouping on attributes such as color or gender that are not numbers.

* such data is usually one-hot encoded first

- Using the input example closest to the middle as the "center" means k-means can run with any meaningful distance function we choose.

 

How Many Clusters?

- ์˜ฌ๋ฐ”๋ฅธ ํด๋Ÿฌ์Šคํ„ฐ์˜ ๊ฐœ์ˆ˜๋Š” ํด๋Ÿฌ์Šคํ„ฐ๋ง์„ ์ง„ํ–‰ํ•˜๊ธฐ ์ „์—๋Š” ์•Œ ์ˆ˜ ์—†๋‹ค.

- ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ๋”ํ•ด๊ฐˆ ๋•Œ, ์ ๊ณผ ์ค‘์‹ฌ ์‚ฌ์ด์˜ MSE ๊ฐ’์€ ์ ์ง„์ ์œผ๋กœ ๊ฐ์†Œํ•ด์•ผ ํ•œ๋‹ค.

- ์˜ฌ๋ฐ”๋ฅธ ํด๋Ÿฌ์Šคํ„ฐ์˜ ๊ฐœ์ˆ˜๋ฅผ ์ดˆ๊ณผํ•˜๊ฒŒ ๋˜๋ฉด MSE ๊ฐ’์ด ๊ฐ์†Œํ•˜๋Š” ์†๋„๊ฐ€ ๋Š๋ ค์ง„๋‹ค.

elbow method๋กœ ์ตœ์  ํด๋Ÿฌ์Šคํ„ฐ ๊ฐœ์ˆ˜ ๊ตฌํ•˜๊ธฐ

K-means์˜ ํ•œ๊ณ„

- nested cluster, long thin cluster์— ๋Œ€ํ•ด์„œ ์ข‹์ง€ ์•Š์€ ์„ฑ๋Šฅ์„ ๋ณด์ž„

- ๋ฐ˜๋ณต์ ์œผ๋กœ ํด๋Ÿฌ์Šคํ„ฐ๋ง์„ ์ง„ํ–‰ํ•˜๋Š” ๊ฒƒ์€ local optima๋ฅผ ํ”ผํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•จ (random seed ๊ฐ’์„ ๋‹ฌ๋ฆฌ ํ•จ)

 

Expectation Maximization (EM)

- EM ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๋Œ€ํ‘œ์ ์ธ ์˜ˆ์‹œ๊ฐ€ K-means

- E-step: ์ถ”์ •๋˜๋Š” ํด๋Ÿฌ์Šคํ„ฐ์— ์ ์„ ํ• ๋‹น

- M-step: ํ• ๋‹น์„ ์ด์šฉํ•˜์—ฌ parameter ์ถ”์ •์„ ํ–ฅ์ƒ์‹œํ‚ด

 

Agglomerative Clustering

- ์ด๋Ÿฐ bottom up ๋ฐฉ์‹์€ ๋ฐ˜๋ณต์ ์œผ๋กœ 2๊ฐœ์˜ ๊ทผ์ ‘ํ•œ ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ๋ณ‘ํ•ฉ์‹œํ‚ด

 

๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ํด๋Ÿฌ์Šคํ„ฐ๊ฐ€ ์˜๋ฏธํ•˜๋Š” ๊ฒƒ

- ๋ณดํ†ต Average, Centroid ๋ฐฉ์‹์„ ์ฑ„ํƒ

 

Linkage Criteria

Advantages of Cluster Hierarchies

- Organizes clusters and subclusters

- Visualizes the clustering process

- Clusters new items efficiently

 

์–ด๋–ค ํด๋Ÿฌ์Šคํ„ฐ๋ง ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ• ๊นŒ?

- ์˜ฌ๋ฐ”๋ฅธ ๊ฑฐ๋ฆฌ ํ•จ์ˆ˜ ์‚ฌ์šฉ

- ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ๋ณ€์ˆ˜๋ฅผ ์ •๊ทœํ™”

- ์ตœ์ข… ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ์‹œ๊ฐํ™”ํ•ด์„œ ์ œ๋Œ€๋กœ ๋˜์—ˆ๋Š”์ง€ ํ™•์ธํ•˜๊ธฐ


Linear Regression

- n๊ฐœ์˜ ์ ์ด ์ฃผ์–ด์กŒ์„ ๋•Œ, ๊ฐ€์žฅ ๊ทผ์‚ฌ๋ฅผ ์ž˜ ํ•˜๊ฑฐ๋‚˜ ์ž˜ ๋งž๋Š” ์ง์„ ์„ ์ฐพ๋Š” ๊ฒƒ

 

Error in Linear Regression

- residual error: the difference between the predicted and actual values

- Least squares regression minimizes the sum of squared residuals over all points.

-> chosen because it has a nice closed form and squaring ignores the sign of the error

 

Contour plots - gradient descent

 

Linear function์„ ์‚ฌ์šฉํ•˜๋Š” ์ด์œ 

- ์ดํ•ดํ•˜๊ธฐ ์‰ฌ์›€

- default model์— ์ ํ•ฉ

* ์ผํ•œ ์‹œ๊ฐ„์— ๋”ฐ๋ผ ๊ธ‰์—ฌ๊ฐ€ ์ฆ๊ฐ€ / ์ง€์—ญ์ด ์ปค์ง์œผ๋กœ์จ ์„ ํ˜•์ ์œผ๋กœ ์ง‘๊ฐ’์ด ์ƒ์Šน / ๋จน์€ ์Œ์‹์— ๋”ฐ๋ผ ๋ชธ๋ฌด๊ฒŒ๊ฐ€ ์„ ํ˜•์ ์œผ๋กœ ์ฆ๊ฐ€

 

๋ณ€์ˆ˜๊ฐ€ ์—ฌ๋Ÿฌ ๊ฐœ์ผ ๋•Œ

- ๊ฐ๊ฐ์˜ x_n ๋ณ€์ˆ˜๋“ค๊ณผ y๊ฐ’์„ ํ–‰๋ ฌ๋กœ ๋‚˜ํƒ€๋‚ด์–ด ์„ธํƒ€ ๊ฐ’์— ๋Œ€ํ•œ ํ–‰๋ ฌ์„ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.

 

๋” ๋‚˜์€ ํšŒ๊ท€ ๋ชจ๋ธ

1. ์ด์ƒ์น˜ ์ œ๊ฑฐ

- ์ž”์ฐจ์˜ quadratic weight ๋•Œ๋ฌธ์— ์ด์ƒ์น˜๋Š” ํšŒ๊ท€ ๋ชจ๋ธ์˜ fit์— ์˜ํ–ฅ์„ ์ค„ ์ˆ˜ ์žˆ๋‹ค.

- ์ด๋Ÿฌํ•œ ์ž”์ฐจ๋ฅผ ์ œ๊ฑฐํ•˜๋Š” ๊ฒƒ์€ ๋” ์ ํ•ฉํ•œ ๋ชจ๋ธ์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค. 

 

2. nonlinear function fitting

- ๊ธฐ๋ณธ์ ์œผ๋กœ Linear regression์€ ์ง์„ ์ด์ง€๋งŒ, x^2, sqrt(x) ๋“ฑ์„ ์ด์šฉํ•˜๋ฉด ๊ณก์„ ์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค.

- ์ž„์˜๋กœ polynomial, exponential, logarithm ๋“ฑ์„ ์ ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

cf) ๋”ฅ๋Ÿฌ๋‹์€ raw feature์—์„œ ์Šค์Šค๋กœ ์›ํ•˜๋Š” ๊ฒƒ์„ ๋ฝ‘์•„๋‚ผ ์ˆ˜ ์žˆ์–ด feature engineering์— ๋Œ€ํ•œ ์ˆ˜์š”๊ฐ€ ์ ๋‹ค. ์ตœ๊ทผ์—๋Š” prompt engineering์„ ์ค‘์š”ํ•˜๊ฒŒ ์—ฌ๊ธด๋‹ค.

 

3. feature/target scaling

- ๋„“์€ ๋ฒ”์œ„์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃจ๊ฒŒ ๋˜๋ฉด coefficient๊ฐ€ ์ง€๋‚˜์น˜๊ฒŒ ์ปค์งˆ ์ˆ˜ ์žˆ๋‹ค.

- Z-score ๋“ฑ์œผ๋กœ ์Šค์ผ€์ผ์„ ์กฐ์ •ํ•  ํ•„์š”๊ฐ€ ์žˆ๋‹ค.

- power law์ด ์ ์šฉ๋˜๋Š” ์ˆ˜์ž… ๋“ฑ์˜ ๋ฐ์ดํ„ฐ์—์„œ๋Š” ํŠนํžˆ ์ค‘์š”ํ•˜๋‹ค.

- x๊ฐ’์„ log(x), sqrt(x) ๋“ฑ์œผ๋กœ ๋Œ€์ฒดํ•  ์ˆ˜ ์žˆ๋‹ค.

- feature๊ฐ€ ์ •๊ทœ๋ถ„ํฌ ํ˜•ํƒœ๋ผ๋ฉด, power law distribution์„ ๊ฐ–๋Š” ๋ฐ์ดํ„ฐ๋Š” linearํ•œ ์กฐํ•ฉ์œผ๋กœ ๋‚˜ํƒ€๋‚ด๊ธฐ ์–ด๋ ต๋‹ค.

- Z normalization์œผ๋กœ ๋ณ€ํ˜•๋œ ๋ฐ์ดํ„ฐ๋ฅผ ํ•™์Šตํ•œ ํ›„, ๊ฒฐ๊ณผ๋Š” ์›๋ž˜ ์ƒํƒœ๋กœ ๋Œ๋ ค๋‘” ํ›„ ๋‚˜ํƒ€๋‚ด๋ฉด ๋œ๋‹ค.

 

4. Removing highly correlated variables

- Two highly correlated features give us no additional information; they only add confusion.

--> so one of them can be removed.

- Build the covariance matrix to find the features worth removing.

 

Closed form solution์˜ ๋ฌธ์ œ

- ์„ธํƒ€ ๊ฐ’์„ ๊ตฌํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ํฐ ๋ฐ์ดํ„ฐ์—์„œ๋Š” ์—ฐ์‚ฐ ์†๋„๊ฐ€ ์—„์ฒญ ๋Š๋ ค์ง„๋‹ค. - O(n^3)

- linear algebra๋Š” ๋‹ค๋ฅธ ๊ณต์‹์— ์ ์šฉํ•˜๊ธฐ ์–ด๋ ต๋‹ค.

- gradient descent ๋ฐฉ์‹์„ ์„ ํƒํ•˜๊ฒŒ ๋งŒ๋“ ๋‹ค.

 

Lines in Parameter Space

- The error function J is convex.

 

Gradient Descent Search

- convex: 1๊ฐœ์˜ local/global minima ๋ฅผ ๊ฐ–๋Š” ๊ณต๊ฐ„

- convexํ•œ ๊ณต๊ฐ„์—์„œ๋Š” minima๋ฅผ ์ฐพ๊ธฐ ์‰ฝ๋‹ค. -> ๊ทธ๋ƒฅ ๊ฒฝ์‚ฌ๋ฅผ ๋”ฐ๋ผ์„œ ๋‚ด๋ ค๊ฐ€๊ธฐ๋งŒ ํ•˜๋ฉด ์ฐพ์„ ์ˆ˜ ์žˆ๋‹ค.

- ์–ด๋–ค ์ ์—์„œ ๋‚ด๋ ค๊ฐ€๋Š” ๋ฐฉํ–ฅ์„ ์ฐพ๋Š” ๋ฐฉ๋ฒ•์€, ๋ฏธ๋ถ„์„ ํ•ด์„œ tangent line์„ ๋”ฐ๋ผ ๊ฐ€๋ฉด ๋จ

--> (x+dx, f(x+dx))์ ์„ ์ฐพ์€ ํ›„, (x, f(x)) ์ ์— fit

 

Batch Gradient Descent

- Batch: ๊ฐ๊ฐ์˜ ๊ฒฝ์‚ฌํ•˜๊ฐ•์—์„œ ๋ชจ๋“  training sample์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ

- ํ†ต์ƒ์ ์œผ๋กœ๋Š” batch size๋ฅผ ์ค„์—ฌ๊ฐ€๋ฉฐ ๊ฒฝ์‚ฌํ•˜๊ฐ•

 

Local Optima

- J๊ฐ€ convex๊ฐ€ ์•„๋‹ˆ๋ผ๋ฉด, ๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•์„ ๋”ฐ๋ผ ๊ฐ”์„ ๋•Œ, Local optima์— ๋น ์ ธ๋ฒ„๋ฆด ์ˆ˜ ์žˆ๋‹ค.

 

Effect of Learning Rate / Step Size

- ๋„ˆ๋ฌด ์ž‘์€ ์Šคํ…์œผ๋กœ ์›€์ง์ด๋ฉด optima์— convergenceํ•˜๋Š” ์†๋„๊ฐ€ ๋Šฆ๋‹ค.

- ๋„ˆ๋ฌด ํฐ ์Šคํ…์œผ๋กœ ์›€์ง์ด๋ฉด ๋ชฉํ‘œ์— ๋„๋‹ฌํ•˜์ง€ ๋ชปํ•  ์ˆ˜ ์žˆ๋‹ค.

- ์ ์ ˆํ•œ step size๋ฅผ ๊ตฌํ•˜๋ ค๋ฉด?

-> step size๊ฐ€ ์ ์ ˆํ•œ์ง€ ํŒ๋‹จํ•˜๊ณ , ๋„ˆ๋ฌด ๋Šฆ๋‹ค๋ฉด multiplicative factor (3์˜ ์ง€์ˆ˜๋ฐฐ ๋“ฑ๋“ฑ) ๋ฅผ ์ด์šฉํ•˜์—ฌ ๋Š˜๋ ค๋ณด๊ธฐ

-> ๋„ˆ๋ฌด ํฌ๋‹ค๋ฉด (1/3์˜ ์ง€์ˆ˜๋ฐฐ ๋“ฑ) ์ค„์—ฌ๋ณด๊ธฐ

 

Stochastic Gradient Descent

- batch size๋„ hyperparameter์ด๋‹ค.

- ๋ชจ๋“  example์ด ์•„๋‹Œ ์ผ๋ถ€๋งŒ ์ด์šฉํ•˜์—ฌ derivative๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ๊ฒƒ๋„ ๋ฐฉ๋ฒ•

 

Regularization

- Add a lambda-weighted term to J so the coefficients are kept small

- As lambda approaches 0 the error term dominates; as it approaches infinity, only theta_0 survives in the model.

- Fitting the data as closely as possible reduces the error, but grows the penalty term (the blue part of the formula above).

 

Interpreting/Penalizing Coefficients

- Squared coefficient์˜ ํ•ฉ์„ Penalizing ํ•˜๋Š” ๊ฒƒ์€ ridge regression or Tikhonov regularization

- coefficient์˜ ์ ˆ๋Œ“๊ฐ’์„ penalizingํ•˜๋Š” ๊ฒƒ์€ LASSO regression์ด๋‹ค.

* L1 metric

* L2: ๊ฐ ์ฐจ์›์— ๋Œ€ํ•œ ์ œ๊ณฑ์˜ ํ•ฉ -> ์œ ํด๋ฆฌ๋“œ ๊ฑฐ๋ฆฌ 

 

LASSO (Least Absolute Shrinkage and Selection Operator)

์ฐจ์ด์ ์€ ์ ˆ๋Œ“๊ฐ’!

- sparse solution์„ ์„ ํƒํ•˜๋Š” ๊ฒฝํ–ฅ

- ๋ณ€์ˆ˜ ์„ ํƒ ๋ฐ regularization ๊ธฐ๋Šฅ

- interpretability๋ฅผ ํ–ฅ์ƒ

 

What is the right lambda?

- A large lambda pushes the parameters to be small -> e.g., all set to zero

- A small lambda lets every parameter move freely to reduce the training error

- Keep the balance between overfitting and underfitting

 

Normal form with regularization

- The normal-form equation can be generalized to handle regularization.

- Alternatively, use gradient descent.

 

Classification

- ๋ถ„๋ฅ˜๋Š” ๋‚จ์ž/์—ฌ์ž, ์ŠคํŒธ/์ผ๋ฐ˜๋ฉ”์ผ, ์•…์„ฑ/์–‘์„ฑ ์ข…์–‘ ๋“ฑ์˜ ๊ตฌ๋ถ„์— ์ด์šฉ

- input record์— ๋ผ๋ฒจ์„ ๋ถ€์—ฌ

 

Regression for Classification

- linear regression์„ ์ด์šฉํ•˜์—ฌ ๋ถ„๋ฅ˜ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋‹ค.

- ์ด๋•Œ ๊ฐ๊ฐ์˜ ๋ถ„๋ฅ˜์— ๋Œ€ํ•ด 0/1์˜ ์ด์ง„ ๋ถ„๋ฅ˜๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.

- positive = 1, negative = 0

- regression ์„ ์€ ์ด๋Ÿฌํ•œ ๋ถ„๋ฅ˜๋ฅผ ๋‚˜๋ˆŒ ๊ฒƒ์ด๋‹ค.

- ๊ทน๋‹จ์ ์ธ +, - ์‚ฌ๋ก€๋ฅผ ์ถ”๊ฐ€ํ•  ๊ฒฝ์šฐ ์„ ์ด ๋ฐ”๋€๋‹ค.

 

Decision Boundaries

- Feature space์—์„œ ์„ ์„ ํ†ตํ•ด ํด๋ž˜์Šค๋ฅผ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ๋‹ค.

- Logistic Regression: ๊ฐ€์žฅ ์ ํ•ฉํ•œ ๋ถ„๋ฅ˜ ์„ ์„ ์ฐพ๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•

sigmoid, logistic

 

Cost for Positive/Negative Cases

- ์„ธํƒ€ ๊ฐ’์„ ์ค„์ด๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ž„

- ์ƒˆ๋กœ์šด x์— ๋Œ€ํ•œ ์˜ˆ์ธก

 

Logistic Regression via Gradient Descent

- loss function์ด convexํ•˜๋ฏ€๋กœ, ๊ฒฝ์‚ฌ ํ•˜๊ฐ•์„ ํ†ตํ•ด ๊ฐ€์žฅ ์ ํ•ฉํ•œ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ๋‹ค.

-> ๋”ฐ๋ผ์„œ ๋‘ ํด๋ž˜์Šค์— ๋Œ€ํ•œ linear seperator๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ๋‹ค.

 

Logistic Gender Classification

(Lecture example figure: the red region holds 229 women / 63 men; the blue region 223 men / 65 women.)

Classification์˜ ๋ฌธ์ œ

1. Balanced Training Classes

- ๊ธ์ • ๋ผ๋ฒจ์„ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ๊ฐ€ 1๊ฐœ๊ณ  ๋ถ€์ • ๋ผ๋ฒจ์„ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ๊ฐ€ 10๋งŒ๊ฐœ ์žˆ๋‹ค๋ฉด ์˜ฌ๋ฐ”๋ฅธ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ฌ ์ˆ˜ ์—†๋‹ค.

- ๊ฐ๊ฐ์˜ ๋ผ๋ฒจ ๋ฐ์ดํ„ฐ ์ˆ˜๋ฅผ ๋งž์ถ”์ž.

* minority class์— ํ•ด๋‹นํ•˜๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์ฐพ๊ธฐ ์œ„ํ•ด ๋” ๋…ธ๋ ฅํ•˜๊ธฐ

* ๋” ํฐ class์˜ ์š”์†Œ๋ฅผ ๋ฒ„๋ฆฌ๊ธฐ

* minority class์— ๊ฐ€์ค‘์น˜ ๋ถ€์—ฌ -> overfitting ์กฐ์‹ฌํ•˜๊ธฐ

* small class์— ๋Œ€ํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ๋ณต์ œํ•˜๊ธฐ -> random perturbation (๋ณต์›์ถ”์ถœ๋กœ ์—ฌ๋Ÿฌ๊ฐœ ๋ฝ‘์•„์„œ ์•™์ƒ๋ธ” ์ง„ํ–‰)

 

2. Multi-Class Classifications

- ๋ชจ๋“  ๋ถ„๋ฅ˜๊ฐ€ ์ด์ง„์ ์ด์ง€๋Š” ์•Š์Œ.

- ordering ๊ด€๊ณ„๊ฐ€ ์—†๋Š” ๋ถ„๋ฅ˜์— ๋Œ€ํ•ด์„œ๋Š” ๋‹จ์ˆœํžˆ ์ˆซ์ž๋กœ ํ‘œํ˜„ํ•˜์—ฌ ๋ถ„๋ฅ˜๋ฅผ ์ง„ํ–‰ํ•˜๋ฉด ์•ˆ ๋œ๋‹ค.

- ordinal data์— ๋Œ€ํ•ด์„œ๋งŒ ์ˆซ์ž๋กœ ๋ผ๋ฒจ๋ง ๊ฐ€๋Šฅ. ์•„๋‹Œ ๊ฒฝ์šฐ ์› ํ•ซ ์ธ์ฝ”๋”ฉ ์ด์šฉ.

 

cf) One Versus All Classifiers

- ๋‹ค์ค‘ ๋…๋ฆฝ ์ด์ง„๋ถ„๋ฅ˜๊ธฐ๋ฅผ ์ด์šฉํ•˜์—ฌ multiclass classifier๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค.

- ๊ฐ ๋ถ„๋ฅ˜๊ธฐ๊ฐ€ ์˜ˆ์ธกํ•œ ๊ฐ€๋Šฅ์„ฑ ์ค‘ ๊ฐ€์žฅ ํฐ ๊ฒƒ์„ ์ฑ„ํƒ.

 

3. Hierarchical Classification

- ์œ ์‚ฌ์„ฑ์„ ์ด์šฉํ•ด ๊ทธ๋ฃน์œผ๋กœ ๋‚˜๋ˆ„๊ณ  taxonomy๋ฅผ ๋งŒ๋“œ๋Š” ๊ฒƒ์€ ํšจ์œจ์ ์ธ class ๊ฐœ์ˆ˜๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ๊ฒŒ ํ•œ๋‹ค.

- top-down tree๋ฅผ ์ด์šฉํ•ด ๋ถ„๋ฅ˜


Linear Algebra

- The mathematics of matrices

- Plays a central role in data science

 

n * m matrix๊ฐ€ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋Š” ๊ฒƒ

  row column
Data object features
Geometric Point Sets point dimensions
Systems of equations equations ๊ฐ ๋ณ€์ˆ˜์˜ coefficient

- Graphs/Networks: M[i, j] = vertex i -> vertex j edge ๊ฐœ์ˆ˜

- Vectors: any row, column or d*1 matrix

 

Vector ์‚ฌ์ด์˜ ๊ฐ 

- ๋ฒกํ„ฐ A์™€ B ์‚ฌ์ด์˜ ๊ฐ๋„

- cos(0) = 1 ---> perfect similarity = 0

- cos(pi/2) = 0 ---> ๊ด€๋ จ์ด ์—†๋‹ค

- cos(pi) = -1 ---> perfect anticorrelation

==> cos = correlation of mean zero variables

- unit vector์— ๋Œ€ํ•ด์„œ ๊ทธ ๋ฒกํ„ฐ์˜ ํฌ๊ธฐ๋Š” 1์ด๋ฏ€๋กœ, dot product๋กœ ์ •์˜๋œ๋‹ค.

 

Transpose

- ์ •์˜: a*b matrix -> b*a matrix๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ

- addition and transposition

-> B = A^T๋ผ๊ณ  ๊ฐ€์ •ํ–ˆ์„ ๋•Œ ํ•ฉํ•˜๋Š” ๋ฐฉ๋ฒ•

- a ๊ฐ’์„ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ์Œ.

 

Matrix multiplication & Dot Products

- A*B๋Š” ๊ฐ™์€ ๋‚ด๋ถ€ ์ฐจ์›์„ ๊ณต์œ ํ•ด์•ผ๋งŒ ๊ณ„์‚ฐ ๊ฐ€๋Šฅํ•˜๋‹ค.

- ๊ฒฐ๊ณผํ–‰๋ ฌ์˜ ๊ฐ ์š”์†Œ๋Š” row/column ๋ฒกํ„ฐ์˜ dot product 

- dot product๋Š” ๋‘ ๊ฐœ์˜ ๋ฒกํ„ฐ๊ฐ€ ์–ผ๋งˆ๋‚˜ ์œ ์‚ฌํ•œ์ง€ ์ธก์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.

- ํ–‰๋ ฌ์˜ ๊ณฑ์…ˆ์€ ๊ฒฐํ•ฉ๋ฒ•์น™์€ ์„ฑ๋ฆฝํ•˜๋‚˜ ๊ตํ™˜๋ฒ•์น™์€ ์„ฑ๋ฆฝํ•˜์ง€ ์•Š๋Š”๋‹ค.

 

Multiplying Feature Matrices

- ํ–‰๋ ฌ A๊ฐ€ n*d data matrix๋ผ๊ณ  ๊ฐ€์ •ํ•ด๋ณด์ž.

- n: ๋ฌธ์„œ, d: ์šฉ์–ด

- C ํ–‰๋ ฌ์€ n*n matrix of dot products - ์ ๋“ค ๊ฐ„์˜ ์œ ์‚ฌ๋„๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ํ–‰๋ ฌ์ด ๋œ๋‹ค.

- D ํ–‰๋ ฌ์€ d*d matrix of dot products - ํŠน์„ฑ ๊ฐ„์˜ ์œ ์‚ฌ๋„๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ํ–‰๋ ฌ์ด ๋œ๋‹ค.

- covariance matrix ๋กœ ํ•ด์„ํ•  ์ˆ˜ ์žˆ๋‹ค.

์˜ˆ์‹œ) car - automobile : D matrix ๊ณ„์‚ฐ ํ›„ ๋‘ ๊ฐ’์„ ๋‚˜ํƒ€๋‚ด๋Š” cell์„ ํ™•์ธํ•ด๋ณด๋ฉด ๋‹ค๋ฅธ cell๋ณด๋‹ค ๊ฐ’์ด ๋†’์€ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. - ์œ ์‚ฌ๋„๊ฐ€ ๋†’๋‹ค.

 

Interpreting Matrix Multiplication

- Multiplying 0/1 adjacency matrices counts walks between vertices (e.g., the square counts length-2 paths), a building block for graph distances.

- Multiplication by permutation matrices reorders the rows/columns.

 

Matrix Rank

- ์ •์˜: ์„ ํ˜• ๋…๋ฆฝ์ ์ธ row์˜ ์ˆ˜๋ฅผ ์ธก์ •ํ•˜๋Š” ๊ฒƒ

- n*n matrix๋Š” ์ˆœ์œ„๊ฐ€ n์ด ๋˜์–ด์•ผ ํ•œ๋‹ค.

Rank 1 ์ธ Matrix - ๋‘ ์‹์ด ์‚ฌ์‹ค์ƒ ๊ฐ™์€ ์˜๋ฏธ์ด๋ฏ€๋กœ ํ’€ ์ˆ˜ ์—†๋‹ค.
Rank 2์ธ matrix

* ๋žญํฌ๋ž€ ํ–‰๋ ฌ์˜ ์—ด๋“ค๋กœ ์ƒ์„ฑ๋  ์ˆ˜ ์žˆ๋Š” ๋ฒกํ„ฐ ๊ณต๊ฐ„์˜ ์ฐจ์›.

์ฐธ๊ณ ๋ฌธํ—Œ: https://blog.naver.com/sw4r/221416614473

 

[๊ธฐ์ดˆ ์„ ํ˜•๋Œ€์ˆ˜] ํ–‰๋ ฌ์—์„œ Rank (๋žญํฌ) ๋ž€?

์„ ํ˜•๋Œ€์ˆ˜์—์„œ ๋“ฑ์žฅํ•˜๋Š” Rank ๋ผ๋Š” ๊ฐœ๋…์— ๋Œ€ํ•ด์„œ ๊ฐ„๋žตํ•˜๊ฒŒ ์•Œ์•„๋ณด์ž.  ์œ„ํ‚ค์˜ ์ •์˜๋ฅผ ์šฐ์„  ํ™•์ธํ•ด๋ณด...

blog.naver.com

 

2=1 ์ฆ๋ช…์œผ๋กœ๋ถ€ํ„ฐ ์•Œ ์ˆ˜ ์žˆ๋Š” ๊ฒƒ

- ์ฆ๋ช…์— ์˜ค๋ฅ˜๊ฐ€ ์ƒ๊ธฐ๋Š” ์›์ธ: 0์œผ๋กœ ๋‚˜๋ˆ„๋Š” ๊ฒƒ์ด ๋ถˆ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ๊ฒƒ์„ ๊ฐ„๊ณผํ•จ

- ์„ ํ˜• ๋Œ€์ˆ˜์—์„œ singular matrix๋„ ํฌํ•จ๋œ๋‹ค.

 

Matrix๋ฅผ ๋‚˜๋ˆ„๋Š” ๊ฒƒ

- inverse operation: x๋ฅผ identity element๋กœ ๋‚ด๋ฆฌ๋Š” ๊ฒƒ

- ๊ณฑ์…ˆ์˜ inverse๋Š” ๋‚˜๋ˆ—์…ˆ.

- ๋ง์…ˆ์˜ inverse๋Š” ๋บ„์…ˆ.

 

Matrix Inversion

- A^-1 : the matrix with A * A^-1 = I (I = identity matrix)

- If the matrix A has an inverse, it can be computed via Gaussian elimination.

* Gaussian elimination reference: https://m.blog.naver.com/siri0831/222033492473

 

์„ ํ˜•๋Œ€์ˆ˜ํ•™(1) - ๊ฐ€์šฐ์Šค ์†Œ๊ฑฐ๋ฒ•(Gauss Elimination)

์•ˆ๋…•ํ•˜์„ธ์š”! ์˜ค๋Š˜๋ถ€ํ„ฐ ์„ ํ˜•๋Œ€์ˆ˜ํ•™์„ ์กฐ๊ธˆ์”ฉ ์˜ฌ๋ ค ์ •๋ฆฌํ•ด๋ณผ๊นŒ ํ•ด์š”~ ๋Œ€ํ•™๊ต ์ž…ํ•™ํ•˜์—ฌ ๋ชจ๋‘๋“ค ์ฒ˜์Œ ์ ‘ํ•˜๊ฒŒ ๋ ...

blog.naver.com

 

Matrix Inversion and Linear Systems

- Ax=b ์‹์— A์˜ ์—ญํ–‰๋ ฌ A^-1์„ ๊ณฑํ•˜๋ฉด ๋‹ค์Œ์˜ ์‹์ด ๋‚˜ํƒ€๋‚œ๋‹ค.

- ์„ ํ˜•์‹์˜ ํ•ด๋ฅผ ๊ตฌํ•˜๋Š” ๊ฒƒ์€ matrix์˜ ์—ญํ–‰๋ ฌ์„ ๊ณฑํ•˜๋Š” ๊ฒƒ๊ณผ ๊ฐ™๋‹ค.

Principal Component Analysis (PCA)

- Projects the data onto a small number of axes (e.g., two).

- By what criterion should the axes be chosen?

1) Choose the direction whose 1-dimensional projection has the largest variance.

* reason: to minimize the loss of information

2) Choose the largest-variance axis among those orthogonal to the first --> together they span a new space (a minimal sketch follows).

 

Singular Value Decomposition (SVD)

- Used in image processing and elsewhere

- U* and V* denote the conjugate transposes; since U and V are orthogonal, these act as their inverses

- Reference: https://darkpgmr.tistory.com/106

 

[์„ ํ˜•๋Œ€์ˆ˜ํ•™ #4] ํŠน์ด๊ฐ’ ๋ถ„ํ•ด(Singular Value Decomposition, SVD)์˜ ํ™œ์šฉ

ํ™œ์šฉ๋„ ์ธก๋ฉด์—์„œ ์„ ํ˜•๋Œ€์ˆ˜ํ•™์˜ ๊ฝƒ์ด๋ผ ํ•  ์ˆ˜ ์žˆ๋Š” ํŠน์ด๊ฐ’ ๋ถ„ํ•ด(Singular Value Decomposition, SVD)์— ๋Œ€ํ•œ ๋‚ด์šฉ์ž…๋‹ˆ๋‹ค. ๋ณดํ†ต์€ ๋ณต์†Œ์ˆ˜ ๊ณต๊ฐ„์„ ํฌํ•จํ•˜์—ฌ ์ •์˜ํ•˜๋Š” ๊ฒƒ์ด ์ผ๋ฐ˜์ ์ด์ง€๋งŒ ์ด ๊ธ€์—์„œ๋Š” ์‹ค์ˆ˜(real

darkpgmr.tistory.com

 

Reconstructing Lincoln
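The lecture demo reconstructs a Lincoln portrait from just a few singular values; a minimal stand-in sketch, with a random matrix assumed in place of the image:

```python
import numpy as np

# Stand-in for a grayscale image (the lecture uses a Lincoln portrait).
rng = np.random.default_rng(0)
img = rng.normal(size=(64, 64))

U, S, Vt = np.linalg.svd(img, full_matrices=False)
k = 8                                        # keep only the top-k singular values
approx = U[:, :k] * S[:k] @ Vt[:k, :]        # best rank-k approximation
print(np.linalg.norm(img - approx))          # error shrinks as k grows
```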


Data Science Analysis Pipeline

- Modeling: the process of packaging information into a tool that can make predictions

- Core steps: building, fitting, and validating the model

 

Philosophies of Modeling

1. Occam's Razor

- 14์„ธ๊ธฐ ์˜๊ตญ ์ˆ˜๋„์Šน

- ๋œป: ๊ฐ€์žฅ ๋‹จ์ˆœํ•œ ์„ค๋ช…์ด ๊ฐ€์žฅ ์ข‹๋‹ค.

- ๊ฐ€์žฅ ์ ์€ ๊ฐ€์ •์„ ๋งŒ๋“œ๋Š” ๋‹ต์„ ์„ ํƒํ•ด์•ผ ํ•œ๋‹ค. -> ๋ชจ๋ธ์—์„œ parameter์˜ ์ˆ˜๋ฅผ ์ค„์—ฌ์•ผ ํ•จ์„ ์˜๋ฏธ

- LASSO/ridge regression ๋“ฑ์˜ ๋จธ์‹ ๋Ÿฌ๋‹ ๊ธฐ๋ฒ•์€ ํ”ผ์ณ๋ฅผ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด penalty function์„ ์‚ฌ์šฉ -> ๋ถˆํ•„์š”ํ•œ coefficient๋ฅผ ์ตœ์†Œํ™”

 

2. Bias-Variance Tradeoffs

- "๋ชจ๋“  ๋ชจ๋ธ์€ ํ‹€๋ฆฌ๋‹ค. ๊ทธ๋ ‡์ง€๋งŒ ์–ด๋–ค ๋ชจ๋ธ์€ ์œ ์šฉํ•˜๋‹ค."

- Bias: ๋ชจ๋ธ์—์„œ ์—๋Ÿฌ๊ฐ€ ์žˆ๋Š” ๊ฐ€์ •์œผ๋กœ๋ถ€ํ„ฐ ์ƒ์„ฑ๋œ ์˜ค๋ฅ˜ -> underfitting; ๊ฐ€์ • ์ž์ฒด๊ฐ€ ์ •๋‹ต์—์„œ ๋ฉ€์–ด์ ธ ์žˆ๋Š” ๊ฒฝ์šฐ -> linear๋กœ ๋ณ€ํ™˜๋จ

- Variance: training set ์•ˆ์˜ ์ž‘์€ ๋ณ€ํ™”์— ๋ฏผ๊ฐํ•จ์œผ๋กœ์จ ์ƒ๊ธฐ๋Š” ์˜ค๋ฅ˜ -> overfitting; ๊ฐ€์ • ์ž์ฒด๋Š” ๋งž์œผ๋‚˜ ๋ณ€์ˆ˜์˜ ํผ์ง„ ์ •๋„๊ฐ€ ๋„ˆ๋ฌด ํฐ ๊ฒฝ์šฐ

 

3. Principles of Nate Silver

- ํ™•๋ฅ ์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ ์ƒ๊ฐํ•˜๊ธฐ

- ์ƒˆ๋กœ์šด ์ •๋ณด์— ๋ฐ˜์‘ํ•˜์—ฌ ์˜ˆ์ธก์„ ๋ฐ”๊พธ๊ธฐ

- consensus๋ฅผ ์ถ”๊ตฌํ•˜๊ธฐ

 

๋‚˜์˜ ๋ชจ๋ธ์— ๋Œ€ํ•œ ๊ฒฐ๊ณผ

- ๋ชจ๋ธ๋กœ๋ถ€ํ„ฐ 1๊ฐœ์˜ ๊ฒฐ์ •๋ก ์ ์ธ ์˜ˆ์ธก์„ ์š”๊ตฌํ•˜๋Š” ๊ฒƒ์€ ๋ถˆ๊ฐ€๋Šฅํ•œ ์ผ์ด๋‹ค.

- ์ข‹์€ ์˜ˆ์ธก ๋ชจ๋ธ์€ ๋ชจ๋“  ๊ฐ€๋Šฅํ•œ ๊ฒฝ์šฐ์— ๋Œ€ํ•œ ํ™•๋ฅ  ๋ถ„ํฌ๋ฅผ ์ œ๊ณต

- logistic regression, k-nearest neighbor ๋“ฑ์˜ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ

 

Live Models

- ์ƒˆ๋กœ์šด ์ •๋ณด์— ๋ฐ˜์‘ํ•˜์—ฌ ์—์ธก์„ ๊พธ์ค€ํžˆ ๋ณ€๊ฒฝํ•˜๋ฉด Liveํ•œ ๋ชจ๋ธ์ด๋ผ๊ณ  ํ•œ๋‹ค.

- ์—์ธก์ด ์ •ํ™•ํ•œ ๋‹ต์œผ๋กœ ์ˆ˜๋ ดํ•˜๋Š”๊ฐ€? / ์ด์ „ ์˜ˆ์ธก์„ ๋ณด์—ฌ์ค˜์„œ ๋ชจ๋ธ์˜ ์ผ๊ด€์„ฑ์„ ํŒ๋‹จํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•˜๋Š”๊ฐ€? / ๋ชจ๋ธ์ด ๋” ์ตœ๊ทผ ๋ฐ์ดํ„ฐ๋กœ ๋‹ค์‹œ ํ•™์Šต๋˜๋Š”๊ฐ€?

 

Consensus ์ถ”๊ตฌ

- Google Flue Trends: illness term์— ๋Œ€ํ•œ ๊ฒ€์ƒ‰ ๋นˆ๋„๋ฅผ ์ด์šฉํ•˜์—ฌ flu ๋ฐœ์ƒ์„ ์˜ˆ์ธก -> ๊ตฌ๊ธ€์ด ๊ฒ€์ƒ‰ ์ œ์•ˆ ๊ธฐ๋Šฅ์„ ๋„์ž…ํ•˜๊ณ  ์‹คํŒจํ•จ

- ๋น„๊ต ๊ฐ€๋Šฅํ•œ ๊ฒฝ์Ÿ ์˜ˆ์ธก์ด ์žˆ๋Š”๊ฐ€? / baseline model์ด ํ•˜๋Š” ๋ง์ด ๋ฌด์—‡์ธ๊ฐ€? / ์˜ˆ์ธก์„ ํ•˜๋Š” ์ ‘๊ทผ ๋ฐฉ๋ฒ•์ด ์„œ๋กœ ๋‹ค๋ฅธ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋Š”๊ฐ€?

- Boosting, Bagging: ๋ถ„๋ฅ˜๊ธฐ์˜ ensemble์„ ๋ช…์‹œ์ ์œผ๋กœ ํ•ฉํ•˜๋Š” ๋จธ์‹ ๋Ÿฌ๋‹ ๊ธฐ์ˆ 

 

Taxonomy of Models

1. Linear vs. Non-linear Model

- Linear: ๊ฐ๊ฐ์˜ ํ”ผ์ณ ๋ณ€์ˆ˜๋ฅผ coefficient๋กœ ์ค‘์š”๋„๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ์‹์œผ๋กœ ๋‚˜ํƒ€๋ƒ„ -> ์ด๋Ÿฌํ•œ ๊ฐ’๋“ค์„ ๋”ํ•ด ์ ์ˆ˜๋ฅผ ๊ณ„์‚ฐํ•จ

- Non-linear: ๊ณ ์ฐจ์›์˜ polynomial, logarithm, exponential ๋“ฑ์„ ํฌํ•จํ•˜๋Š” ์‹

 

2. Blackbox vs. Descriptive Model

- ๋ชจ๋ธ์€ ๋Œ€๋ถ€๋ถ„ ์„ค๋ช…์ ์ž„ = ์™œ ๊ทธ๋Ÿฌํ•œ ๊ฒฐ์ •์„ ๋‚ด๋ ธ๋Š”์ง€ ์„ค๋ช…ํ•จ

- Linear regression model์ด ๊ทธ ์˜ˆ์‹œ, ํŠธ๋ฆฌ๋„ ์˜ˆ์‹œ์ผ ์ˆ˜ ์žˆ์Œ.

- Neural Network Model์€ ์™œ ๊ทธ๋Ÿฌํ•œ ๊ฒฐ์ •์„ ๋‚ด๋ ธ๋Š”์ง€ ์•Œ ์ˆ˜ ์—†์Œ.

-> opaque

 

3. First principle vs. Data-driven Model

- First principle: based on a theoretical account of how the system works -> simulations, scientific formulas

- Data-driven: based on observed correlations between the input parameters and the outcome variables -> most machine learning models

- Good models come from blending the two appropriately

 

4. General vs. Ad Hoc Model

- ๋ถ„๋ฅ˜๋‚˜ ํšŒ๊ท€ ๋“ฑ์˜ ๋จธ์‹ ๋Ÿฌ๋‹ ๋ชจ๋ธ์€ general = ๋ฌธ์ œ์— ํŠน์ •๋˜๋Š” ์•„์ด๋””์–ด๊ฐ€ ์•„๋‹Œ ํŠน์ • ๋ฐ์ดํ„ฐ๋งŒ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•จ

- Ad Hoc Model์€ ํŠน์ • ๋„๋ฉ”์ธ์˜ ์ง€์‹์„ ์ด์šฉ - ํŠน๋ณ„ํ•œ ๋ชฉ์ 

 

5. Stochastic vs. deterministic

6. Flat vs. Hierarchical

 

๋ชจ๋ธํ‰๊ฐ€ : Baseline

- simplest reasonable models to compare against

- ๋Œ€ํ‘œ์ ์ธ ์˜ˆ์‹œ (๋ณต์žก๋„ ์˜ค๋ฆ„์ฐจ์ˆœ)

* uniform/random selection among labels

* training data์˜ ๊ฐ€์žฅ ํ”ํ•œ label

* ๊ฐ€์žฅ ์„ฑ๋Šฅ์ด ์ข‹์€ single-variable model

* ์ด์ „ ์‹œ๊ฐ„ ํฌ์ธํŠธ์™€ ๊ฐ™์€ label

* mean, median

* linear regression

* existing model: SOTA model...

- baseline ๋ชจ๋ธ์€ ๊ณต์ •ํ•ด์•ผํ•œ๋‹ค.

 

๋ถ„๋ฅ˜๊ธฐ ํ‰๊ฐ€

- ์ด์ง„ ๋ถ„๋ฅ˜๊ธฐ์— ์˜ํ•œ ๊ฒฐ๊ณผ๋Š” ์ด 4๊ฐ€์ง€๊ฐ€ ๋‚˜์˜ฌ ์ˆ˜ ์žˆ์Œ

 

Threshold Classifiers

- ์ ์ ˆํ•œ threshold๋ฅผ ์ฐพ๋Š” ๊ฒƒ์ด ์ค‘์š”

Accuracy

- ์ „์ฒด ์˜ˆ์ธก์— ๋Œ€ํ•œ ์˜ณ์€ ์˜ˆ์ธก์˜ ๋น„์œจ

- Accuracy๊ฐ€ ๋†’๋‹ค๊ณ  ํ•˜๋”๋ผ๋„ ์‹ค์ œ์˜ ๊ฒฐ๊ณผ์™€๋Š” ์ฐจ์ด๊ฐ€ ์žˆ๋Š” ๊ฒฝ์šฐ๊ฐ€ ์žˆ์œผ๋ฏ€๋กœ ์ •ํ™•๋„๋งŒ์œผ๋กœ ๋น„๊ตํ•˜๋Š” ๊ฒƒ์€ ์˜ณ์ง€ ์•Š๋‹ค

- |P| << |N| ์ผ ๋•Œ ํŠนํžˆ ๋ถ€์ •ํ™•

 

Precision

Recall

F-score

- ์กฐํ™”ํ‰๊ท ์€ ์‚ฐ์ˆ ํ‰๊ท ๋ณด๋‹ค ํ•ญ์ƒ ์ž‘๊ฑฐ๋‚˜ ๊ฐ™์œผ๋ฏ€๋กœ F-score๊ฐ€ ๋†’์•„์ง€๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€

- ๋‘ ๊ฐœ์˜ ์ ์ˆ˜๊ฐ€ ๋†’์•„์•ผ ์ตœ์ข…์ ์œผ๋กœ F ๊ฐ’๋„ ๋†’์•„์ง

 

Sensitivity vs. Specificity

- Sensitivity: the True Positive Rate

- Specificity: the True Negative Rate

- For cancer patients, spam, etc., the "true" label does not mean the content is good or bad; it means the case is the target we are screening for

 

์ข…ํ•ฉ์ ์œผ๋กœ๋Š”!

- Accuracy: class size๊ฐ€ ๋Œ€์ฒด๋กœ ๋‹ค๋ฅผ ๋•Œ ์ž˜๋ชป๋œ ๊ฒฐ๊ณผ๋ฅผ ๋‚ณ์Œ

- Recall: ๋ถ„๋ฅ˜๊ธฐ๊ฐ€ ๊ท ํ˜•์ด ์žกํ˜€์žˆ์„ ๋•Œ ์ •ํ™•๋„์™€ ๊ฐ™์€ ๊ฐ’์ด ๋จ

- High precision: ๊ท ํ˜• ์žกํžˆ์ง€ ์•Š์€ class size์— ๋Œ€ํ•ด์„œ๋Š” ๋งค์šฐ ๋‹ฌ์„ฑํ•˜๊ธฐ ์–ด๋ ค์›€

- F-score: ์–ด๋–ค ๋‹จ์ผ ํ†ต๊ณ„๋Ÿ‰์— ๋Œ€ํ•ด์„œ๋Š” ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ๋ƒ„, but 4๊ฐœ ๊ฐ’์ด ๋ชจ๋‘ ๋ถ„๋ฅ˜๊ธฐ์˜ ์„ฑ๋Šฅ์— ๋Œ€ํ•ด ์„ค๋ช…ํ•  ์ˆ˜ ์žˆ์Œ

 

Receiver-Operator Characteristic (ROC) curves

- threshold ๊ฐ’์ด ๋ฐ”๋€Œ๋ฉด recall/precision, TPR/FPR ๊ฐ’์ด ๋ณ€ํ•œ๋‹ค.

- ROC curve ์•„๋ž˜์˜ ๋ฉด์  = AUC๋Š” ์ •ํ™•๋„ ํŒ๋‹จ ๊ธฐ์ค€์ด ๋จ

 

๋ณต์ˆ˜ ๋ถ„๋ฅ˜ ์‹œ์Šคํ…œ ํ‰๊ฐ€

- class๊ฐ€ ๋งŽ์•„์ง€๋ฉด ๋ถ„๋ฅ˜๊ฐ€ ๋” ์–ด๋ ค์›Œ์ง

- top-k success rate: first k๊ฐœ์˜ ์˜ˆ์ธก์— ๋Œ€ํ•ด ์„ฑ๊ณตํ•  ๊ฒฝ์šฐ ๋ณด์ƒ์„ ์ œ๊ณต

(๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ: ์˜ˆ์ธกํ•œ ๊ฐ€์žฅ ๊ฐ€๋Šฅ์„ฑ์ด ๋†’์€ N๊ฐœ์˜ ํด๋ž˜์Šค์™€ ๋™์ผํ•œ ์‹ค์ œ ํด๋ž˜์Šค์˜ ํ‘œ์ค€ ์ •ํ™•๋„)

 

Summary statistics: Numerical Error

- f: forecast, o: observation

- absolute error: (f - o)

- relative error: (f - o) / o -> usually the more informative choice

- Several ways to summarize error:

* mean or median squared error (MSE)

* root mean squared error (RMSE)

Error histograms

 

Evaluation Data

- out-of-sample prediction: derive the results on data never seen while building the model

- Split the data along the lines of training 60%, validation 20%, testing 20%

- The testing data must never be looked at except for the final test itself.

 

Sins in evaluation

- Did the training and test data get mixed together?

- Is there a bug in my implementation?

- Asking these questions helps catch evaluation errors.

 

Cross-validation

- ๋ฐ์ดํ„ฐ๊ฐ€ ์ถฉ๋ถ„ํ•˜์ง€ ์•Š์„ ๊ฒฝ์šฐ, ๋ฐ์ดํ„ฐ๋ฅผ k๊ฐœ๋กœ ์ชผ๊ฐœ์„œ k-1๋ฒˆ์งธ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ํ•™์Šต์„ ์ง„ํ–‰, ๋‚˜๋จธ์ง€๋กœ ํ‰๊ฐ€, ๊ทธ๋ฆฌ๊ณ  ๋‹ค๋ฅธ ๋ฐ์ดํ„ฐ ์กฐ๊ฐ์œผ๋กœ ๊ณ„์† ๋ฐ˜๋ณต 

- limiting case: leave one out validation 

- k-fold
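A minimal sklearn sketch of 5-fold cross-validation (the dataset and model are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold CV: train on 4 folds, evaluate on the held-out fold, rotate 5 times.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())
```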
