
Linear Regression

- n๊ฐœ์˜ ์ ์ด ์ฃผ์–ด์กŒ์„ ๋•Œ, ๊ฐ€์žฅ ๊ทผ์‚ฌ๋ฅผ ์ž˜ ํ•˜๊ฑฐ๋‚˜ ์ž˜ ๋งž๋Š” ์ง์„ ์„ ์ฐพ๋Š” ๊ฒƒ

 

Error in Linear Regression

- Residual error: the difference between the predicted value and the actual value

- Least squares regression minimizes the sum of squared residuals over all points.

-> chosen because it has a nice closed form and squaring ignores the sign of each residual
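Minimizing the sum of *squared* residuals matters because raw residuals of opposite sign cancel. A minimal sketch with made-up points (the line y = 2x + 1 and the data are assumptions for illustration):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.5, 4.5, 7.0])   # made-up points near y = 2x + 1

def sse(theta0, theta1):
    """Sum of squared residuals for the line y_hat = theta0 + theta1 * x."""
    residuals = y - (theta0 + theta1 * x)
    return np.sum(residuals ** 2)

# Raw residuals can cancel out even when the fit is imperfect:
residuals = y - (1.0 + 2.0 * x)
print(residuals.sum())   # 0.0 — the +0.5 and -0.5 errors cancel
print(sse(1.0, 2.0))     # 0.5 — squaring keeps every error positive
```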

 

Contour plots - gradient descent

 

Linear function์„ ์‚ฌ์šฉํ•˜๋Š” ์ด์œ 

- ์ดํ•ดํ•˜๊ธฐ ์‰ฌ์›€

- default model์— ์ ํ•ฉ

* ์ผํ•œ ์‹œ๊ฐ„์— ๋”ฐ๋ผ ๊ธ‰์—ฌ๊ฐ€ ์ฆ๊ฐ€ / ์ง€์—ญ์ด ์ปค์ง์œผ๋กœ์จ ์„ ํ˜•์ ์œผ๋กœ ์ง‘๊ฐ’์ด ์ƒ์Šน / ๋จน์€ ์Œ์‹์— ๋”ฐ๋ผ ๋ชธ๋ฌด๊ฒŒ๊ฐ€ ์„ ํ˜•์ ์œผ๋กœ ์ฆ๊ฐ€

 

When there are multiple variables

- The x_n variables and y values can be arranged as matrices, giving a matrix equation for the theta values.

 

Better regression models

1. Removing outliers

- Because residuals are weighted quadratically, outliers can distort the fit of a regression model.

- Removing such outliers can yield a better-fitting model.

 

2. Nonlinear function fitting

- Linear regression produces a straight line by default, but features such as x^2 or sqrt(x) let it fit curves.

- Polynomial, exponential, logarithmic, and other transformations can be applied as needed.

cf) Deep learning can extract what it needs from raw features on its own, so there is less demand for feature engineering; these days prompt engineering is considered important instead.

 

3. Feature/target scaling

- When the data spans a wide range, coefficients can become excessively large.

- Scales should be adjusted, e.g. with Z-scores.

- This is especially important for data such as income, which follows a power law.

- x can also be replaced with log(x), sqrt(x), etc.

- If the features are normally distributed, data following a power-law distribution is hard to express as a linear combination of them.

- After training on Z-normalized data, map the results back to the original scale before reporting them.
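A small sketch of Z-normalization and the inverse mapping back to the original scale (the income figures are made up):

```python
import numpy as np

incomes = np.array([20_000.0, 35_000.0, 50_000.0, 80_000.0, 200_000.0])

# Z-normalize: subtract the mean, divide by the standard deviation.
mu, sigma = incomes.mean(), incomes.std()
z = (incomes - mu) / sigma

# The transformed data has mean 0 and standard deviation 1 ...
print(z.mean(), z.std())

# ... and the transform is invertible, so results map back to the original scale:
restored = z * sigma + mu
print(np.allclose(restored, incomes))  # True
```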

 

4. Removing highly correlated variables

- If two variables are highly correlated with each other, the second one gives us no new information; it only adds confusion.

--> So it can be removed.

- Building a covariance matrix helps identify which features should be removed.
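As a sketch, the correlation matrix (the normalized covariance matrix) exposes near-duplicate features; here b is constructed to be almost a copy of a, while c is independent:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a * 2.0 + rng.normal(scale=0.01, size=200)  # nearly a duplicate of a
c = rng.normal(size=200)                        # independent of both

# Each row of the stacked array is treated as one variable.
corr = np.corrcoef(np.vstack([a, b, c]))
print(corr.round(2))
# The (a, b) entry is close to 1, flagging b as a candidate for removal.
```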

 

Closed form solution์˜ ๋ฌธ์ œ

- ์„ธํƒ€ ๊ฐ’์„ ๊ตฌํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ํฐ ๋ฐ์ดํ„ฐ์—์„œ๋Š” ์—ฐ์‚ฐ ์†๋„๊ฐ€ ์—„์ฒญ ๋Š๋ ค์ง„๋‹ค. - O(n^3)

- linear algebra๋Š” ๋‹ค๋ฅธ ๊ณต์‹์— ์ ์šฉํ•˜๊ธฐ ์–ด๋ ต๋‹ค.

- gradient descent ๋ฐฉ์‹์„ ์„ ํƒํ•˜๊ฒŒ ๋งŒ๋“ ๋‹ค.

 

Lines in Parameter Space

- The error function J is convex.

 

Gradient Descent Search

- convex: 1๊ฐœ์˜ local/global minima ๋ฅผ ๊ฐ–๋Š” ๊ณต๊ฐ„

- convexํ•œ ๊ณต๊ฐ„์—์„œ๋Š” minima๋ฅผ ์ฐพ๊ธฐ ์‰ฝ๋‹ค. -> ๊ทธ๋ƒฅ ๊ฒฝ์‚ฌ๋ฅผ ๋”ฐ๋ผ์„œ ๋‚ด๋ ค๊ฐ€๊ธฐ๋งŒ ํ•˜๋ฉด ์ฐพ์„ ์ˆ˜ ์žˆ๋‹ค.

- ์–ด๋–ค ์ ์—์„œ ๋‚ด๋ ค๊ฐ€๋Š” ๋ฐฉํ–ฅ์„ ์ฐพ๋Š” ๋ฐฉ๋ฒ•์€, ๋ฏธ๋ถ„์„ ํ•ด์„œ tangent line์„ ๋”ฐ๋ผ ๊ฐ€๋ฉด ๋จ

--> (x+dx, f(x+dx))์ ์„ ์ฐพ์€ ํ›„, (x, f(x)) ์ ์— fit
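The downhill walk above can be sketched on a simple convex function; f(x) = (x - 3)^2 and the learning rate here are illustrative choices:

```python
# Gradient descent on the convex function f(x) = (x - 3)^2.
# Its derivative f'(x) = 2 * (x - 3) gives the slope of the tangent line.

def gradient_descent(start, lr=0.1, steps=100):
    x = start
    for _ in range(steps):
        grad = 2 * (x - 3)   # slope of the tangent at the current point
        x = x - lr * grad    # step against the slope, i.e. downhill
    return x

print(gradient_descent(start=10.0))  # converges near the minimum x = 3
```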

 

Batch Gradient Descent

- Batch: ๊ฐ๊ฐ์˜ ๊ฒฝ์‚ฌํ•˜๊ฐ•์—์„œ ๋ชจ๋“  training sample์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ

- ํ†ต์ƒ์ ์œผ๋กœ๋Š” batch size๋ฅผ ์ค„์—ฌ๊ฐ€๋ฉฐ ๊ฒฝ์‚ฌํ•˜๊ฐ•

 

Local Optima

- If J is not convex, gradient descent can get stuck in a local optimum.

 

Effect of Learning Rate / Step Size

- ๋„ˆ๋ฌด ์ž‘์€ ์Šคํ…์œผ๋กœ ์›€์ง์ด๋ฉด optima์— convergenceํ•˜๋Š” ์†๋„๊ฐ€ ๋Šฆ๋‹ค.

- ๋„ˆ๋ฌด ํฐ ์Šคํ…์œผ๋กœ ์›€์ง์ด๋ฉด ๋ชฉํ‘œ์— ๋„๋‹ฌํ•˜์ง€ ๋ชปํ•  ์ˆ˜ ์žˆ๋‹ค.

- ์ ์ ˆํ•œ step size๋ฅผ ๊ตฌํ•˜๋ ค๋ฉด?

-> step size๊ฐ€ ์ ์ ˆํ•œ์ง€ ํŒ๋‹จํ•˜๊ณ , ๋„ˆ๋ฌด ๋Šฆ๋‹ค๋ฉด multiplicative factor (3์˜ ์ง€์ˆ˜๋ฐฐ ๋“ฑ๋“ฑ) ๋ฅผ ์ด์šฉํ•˜์—ฌ ๋Š˜๋ ค๋ณด๊ธฐ

-> ๋„ˆ๋ฌด ํฌ๋‹ค๋ฉด (1/3์˜ ์ง€์ˆ˜๋ฐฐ ๋“ฑ) ์ค„์—ฌ๋ณด๊ธฐ

 

Stochastic Gradient Descent

- batch size๋„ hyperparameter์ด๋‹ค.

- ๋ชจ๋“  example์ด ์•„๋‹Œ ์ผ๋ถ€๋งŒ ์ด์šฉํ•˜์—ฌ derivative๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ๊ฒƒ๋„ ๋ฐฉ๋ฒ•

 

Regularization

- Add a lambda term to the cost function J to keep the coefficients small

- As lambda approaches 0 the training error decreases; as it approaches infinity, only theta_0 remains in the model.

- Fitting the data as closely as possible decreases the error, but the blue (penalty) part of the formula above grows.

 

Interpreting/Penalizing Coefficients

- Squared coefficient์˜ ํ•ฉ์„ Penalizing ํ•˜๋Š” ๊ฒƒ์€ ridge regression or Tikhonov regularization

- coefficient์˜ ์ ˆ๋Œ“๊ฐ’์„ penalizingํ•˜๋Š” ๊ฒƒ์€ LASSO regression์ด๋‹ค.

* L1 metric

* L2: ๊ฐ ์ฐจ์›์— ๋Œ€ํ•œ ์ œ๊ณฑ์˜ ํ•ฉ -> ์œ ํด๋ฆฌ๋“œ ๊ฑฐ๋ฆฌ 
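Concretely, for an assumed coefficient vector:

```python
import numpy as np

theta = np.array([3.0, -4.0, 0.0])

l1 = np.sum(np.abs(theta))       # L1: sum of absolute values -> 7.0
l2 = np.sqrt(np.sum(theta**2))   # L2: Euclidean length       -> 5.0
print(l1, l2)
```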

 

LASSO (Least Absolute Shrinkage and Selection Operator)

์ฐจ์ด์ ์€ ์ ˆ๋Œ“๊ฐ’!

- sparse solution์„ ์„ ํƒํ•˜๋Š” ๊ฒฝํ–ฅ

- ๋ณ€์ˆ˜ ์„ ํƒ ๋ฐ regularization ๊ธฐ๋Šฅ

- interpretability๋ฅผ ํ–ฅ์ƒ

 

What is the right Lambda?

- A larger lambda emphasizes keeping the parameters small -> e.g. driving them all to zero

- A smaller lambda lets every parameter be used freely to reduce the training error

- A balance must be struck between overfitting and underfitting

 

Normal form with regularization

- The normal form equation can be generalized to handle regularization.

- Alternatively, gradient descent can be used.

 

Classification

- ๋ถ„๋ฅ˜๋Š” ๋‚จ์ž/์—ฌ์ž, ์ŠคํŒธ/์ผ๋ฐ˜๋ฉ”์ผ, ์•…์„ฑ/์–‘์„ฑ ์ข…์–‘ ๋“ฑ์˜ ๊ตฌ๋ถ„์— ์ด์šฉ

- input record์— ๋ผ๋ฒจ์„ ๋ถ€์—ฌ

 

Regression for Classification

- linear regression์„ ์ด์šฉํ•˜์—ฌ ๋ถ„๋ฅ˜ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋‹ค.

- ์ด๋•Œ ๊ฐ๊ฐ์˜ ๋ถ„๋ฅ˜์— ๋Œ€ํ•ด 0/1์˜ ์ด์ง„ ๋ถ„๋ฅ˜๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.

- positive = 1, negative = 0

- regression ์„ ์€ ์ด๋Ÿฌํ•œ ๋ถ„๋ฅ˜๋ฅผ ๋‚˜๋ˆŒ ๊ฒƒ์ด๋‹ค.

- ๊ทน๋‹จ์ ์ธ +, - ์‚ฌ๋ก€๋ฅผ ์ถ”๊ฐ€ํ•  ๊ฒฝ์šฐ ์„ ์ด ๋ฐ”๋€๋‹ค.

 

Decision Boundaries

- Feature space์—์„œ ์„ ์„ ํ†ตํ•ด ํด๋ž˜์Šค๋ฅผ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ๋‹ค.

- Logistic Regression: ๊ฐ€์žฅ ์ ํ•ฉํ•œ ๋ถ„๋ฅ˜ ์„ ์„ ์ฐพ๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•

sigmoid, logistic

 

Cost for Positive/Negative Cases

- ์„ธํƒ€ ๊ฐ’์„ ์ค„์ด๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ž„

- ์ƒˆ๋กœ์šด x์— ๋Œ€ํ•œ ์˜ˆ์ธก

 

Logistic Regression via Gradient Descent

- loss function์ด convexํ•˜๋ฏ€๋กœ, ๊ฒฝ์‚ฌ ํ•˜๊ฐ•์„ ํ†ตํ•ด ๊ฐ€์žฅ ์ ํ•ฉํ•œ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ๋‹ค.

-> ๋”ฐ๋ผ์„œ ๋‘ ํด๋ž˜์Šค์— ๋Œ€ํ•œ linear seperator๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ๋‹ค.

 

Logistic Gender Classification

Red region: 229 w / 63 m, Blue region: 223 m / 65 w

Classification์˜ ๋ฌธ์ œ

1. Balanced Training Classes

- ๊ธ์ • ๋ผ๋ฒจ์„ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ๊ฐ€ 1๊ฐœ๊ณ  ๋ถ€์ • ๋ผ๋ฒจ์„ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ๊ฐ€ 10๋งŒ๊ฐœ ์žˆ๋‹ค๋ฉด ์˜ฌ๋ฐ”๋ฅธ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ฌ ์ˆ˜ ์—†๋‹ค.

- ๊ฐ๊ฐ์˜ ๋ผ๋ฒจ ๋ฐ์ดํ„ฐ ์ˆ˜๋ฅผ ๋งž์ถ”์ž.

* minority class์— ํ•ด๋‹นํ•˜๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์ฐพ๊ธฐ ์œ„ํ•ด ๋” ๋…ธ๋ ฅํ•˜๊ธฐ

* ๋” ํฐ class์˜ ์š”์†Œ๋ฅผ ๋ฒ„๋ฆฌ๊ธฐ

* minority class์— ๊ฐ€์ค‘์น˜ ๋ถ€์—ฌ -> overfitting ์กฐ์‹ฌํ•˜๊ธฐ

* small class์— ๋Œ€ํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ๋ณต์ œํ•˜๊ธฐ -> random perturbation (๋ณต์›์ถ”์ถœ๋กœ ์—ฌ๋Ÿฌ๊ฐœ ๋ฝ‘์•„์„œ ์•™์ƒ๋ธ” ์ง„ํ–‰)
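A sketch of oversampling the minority class with replacement until the counts match (the class sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(7)

majority = np.arange(1000)      # stand-ins for 1000 majority-class examples
minority = np.array([0, 1, 2])  # only 3 minority-class examples

# Sample the minority class with replacement until the counts match.
resampled = rng.choice(minority, size=len(majority), replace=True)
print(len(resampled), sorted(set(resampled.tolist())))
```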

 

2. Multi-Class Classification

- Not all classification is binary.

- Classes with no ordering relation must not simply be encoded as numbers.

- Only ordinal data can be labeled with numbers; otherwise, use one-hot encoding.
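A minimal one-hot encoding sketch for unordered labels (the label set is made up):

```python
import numpy as np

labels = ["cat", "dog", "bird", "dog"]  # no natural ordering
classes = sorted(set(labels))           # ['bird', 'cat', 'dog']

# One column per class; exactly one 1 per row.
one_hot = np.array([[1 if c == lab else 0 for c in classes] for lab in labels])
print(one_hot)
```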

 

cf) One Versus All Classifiers

- ๋‹ค์ค‘ ๋…๋ฆฝ ์ด์ง„๋ถ„๋ฅ˜๊ธฐ๋ฅผ ์ด์šฉํ•˜์—ฌ multiclass classifier๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค.

- ๊ฐ ๋ถ„๋ฅ˜๊ธฐ๊ฐ€ ์˜ˆ์ธกํ•œ ๊ฐ€๋Šฅ์„ฑ ์ค‘ ๊ฐ€์žฅ ํฐ ๊ฒƒ์„ ์ฑ„ํƒ.

 

3. Hierarchical Classification

- ์œ ์‚ฌ์„ฑ์„ ์ด์šฉํ•ด ๊ทธ๋ฃน์œผ๋กœ ๋‚˜๋ˆ„๊ณ  taxonomy๋ฅผ ๋งŒ๋“œ๋Š” ๊ฒƒ์€ ํšจ์œจ์ ์ธ class ๊ฐœ์ˆ˜๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ๊ฒŒ ํ•œ๋‹ค.

- top-down tree๋ฅผ ์ด์šฉํ•ด ๋ถ„๋ฅ˜
