
What Is Linear Regression?

Regression is the process of finding a function that, for each x value, outputs the value (Ε·, "y hat") closest to the corresponding y value. When the function used is linear, the method is called linear regression.

 

The appeal of linear regression lies in its simplicity: each feature's coefficient expresses the relationship between that feature and the target variable in an intuitive way.

 

μ„ ν˜• νšŒκ·€μ˜ μ’…λ₯˜

- Simple linear regression

A model with a single feature (x); the target variable and the feature can be plotted on one coordinate plane, and the model takes the form of a straight line. Simple linear regression conforms to a linear mapping.

 

- Multiple linear regression

A model with several features (x); it lives in a higher-dimensional space, where the fitted model is a hyperplane (the generalization of a straight line to many dimensions). It likewise conforms to a linear mapping.

 

- Polynomial regression

There is only one feature, but it enters as a polynomial, so the fitted model is a curve. Each increase in the degree makes a polynomial regression model far more complex, so the risk of overfitting rises quickly.

 

Cost Function

λ¨Έμ‹ λŸ¬λ‹μ—μ„œλŠ” 주어진 데이터λ₯Ό λͺ¨λΈμ— νˆ¬μž…ν•¨μœΌλ‘œμ¨ λͺ¨λΈμ„ μ ν•©ν•˜κ²Œ λ§Œλ“œλŠ” ν•™μŠ΅κ³Όμ •μ„ κ±°μΉœλ‹€. μ΄λ•Œ μ œλŒ€λ‘œ 적합이 되렀면 λͺ¨λΈμ˜ μ˜ˆμΈ‘κ°’κ³Ό μ‹€μ œ 값을 쀄여야 ν•œλ‹€. λͺ¨λΈμ˜ μ˜ˆμΈ‘κ°’κ³Ό μ‹€μ œ κ°’μ˜ 차이λ₯Ό λΉ„κ΅ν•˜λŠ” ν•¨μˆ˜λ₯Ό λΉ„μš©(cost) ν•¨μˆ˜λΌκ³  λΆ€λ₯Έλ‹€. λΉ„μš©ν•¨μˆ˜λŠ” λͺ¨λΈμ˜ νŠΉμ„±μ— 맞게 였차λ₯Ό μΈ‘μ •ν•˜λŠ” 역할을 ν•˜λ©°, 보톡 λΉ„μš©ν•¨μˆ˜ 값이 μž‘μ„μˆ˜λ‘ μ˜€μ°¨κ°€ μž‘μœΌλ©° λͺ¨λΈμ΄ 잘 ν•™μŠ΅λ˜μ—ˆλ‹€λŠ” 것을 μ˜λ―Έν•œλ‹€.

λ‹€μŒκ³Ό 같은 λΉ„μš©ν•¨μˆ˜λ₯Ό μ‚¬μš©ν•  수 μžˆλ‹€.

 

Implementing Simple Linear Regression

sklearn (μ‹Έμ΄ν‚·λŸ°) 을 μ΄μš©ν•œλ‹€. λͺ¨λΈ μ½”λ“œλ₯Ό κ΅¬ν˜„ν•  λ•Œ λ‹€μŒκ³Ό 같은 λͺ¨λ“ˆμ„ μ‚¬μš©ν•˜μ˜€λ‹€.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score, mean_squared_error

Since the data must be split into training and test sets, the following function is used:

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 156)

Here x and y are the feature and target values loaded from the data.

test_size sets the fraction of the data to hold out as the test set. random_state is a seed for the random split; any fixed integer works, and fixing it makes the split reproducible.
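To make the idea concrete, here is a rough pure-Python sketch of what such a split does — shuffle indices with a fixed seed, then cut off a test fraction. This is an illustration of the concept, not sklearn's exact algorithm:

```python
import random

# A toy version of a train/test split: shuffle indices with a seeded RNG
# (random_state) so the split is reproducible, then carve off test_size.
def toy_split(x, y, test_size=0.2, random_state=156):
    idx = list(range(len(x)))
    random.Random(random_state).shuffle(idx)      # seeded, hence reproducible
    n_test = int(len(x) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([x[i] for i in train_idx], [x[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])

x = list(range(10))
y = [2 * v for v in x]
X_train, X_test, y_train, y_test = toy_split(x, y)
print(len(X_train), len(X_test))  # 8 2
```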

λ°μ΄ν„°μ…‹λ“€λ‘œ ν•™μŠ΅μ„ ν•˜λ €λ©΄ λ‹€μŒκ³Ό 같은 과정을 κ±°μΉœλ‹€.

lr = LinearRegression(fit_intercept = True)
lr.fit(X_train, y_train)

Setting fit_intercept to True includes the intercept term b0 in the simple linear regression model.

The model is then trained via lr.fit.

lr.coef_ # array of the coefficients (b values)
lr.coef_[0] # access b1
lr.intercept_ # access b0 (the intercept)

λͺ¨λΈμ—μ„œ coef_λ₯Ό μ΄μš©ν•˜λ©΄ bκ°’λ“€μ˜ arrayλ₯Ό ꡬ할 수 있고, indexλ₯Ό μ§€μ •ν•˜μ—¬ 각각에 μ ‘κ·Όν•  수 있으며, intercept_λŠ” 절편, 즉 b0 값을 μ•Œλ €μ£ΌλŠ” λ©”μ„œλ“œλ‹€.

y_train_pred = lr.predict(X_train)
y_test_pred = lr.predict(X_test)

train_r2 = r2_score(y_train, y_train_pred)
test_r2 = r2_score(y_test, y_test_pred)

lr.score(X_test, y_test)

The predict method is used to obtain the model's predictions.

r2_scoreμ•ˆμ˜ νŒŒλΌλ―Έν„°λŠ” μ‹€μ œ yκ°’κ³Ό λͺ¨λΈμ΄ λ„μΆœν•œ y hat 값을 λ°›λŠ”λ‹€. r2_scoreκ°€ λ°˜ν™˜ν•˜λŠ” 값이 클수둝 잘 μ˜ˆμΈ‘ν•˜μ˜€λ‹€λŠ” λœ»μ΄λ‹€.

The test results can also be checked with lr.score, which for a regression model returns the same RΒ² value.
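For reference, RΒ² can be computed by hand as 1 − SS_res/SS_tot, the same quantity r2_score and score report (toy numbers for illustration):

```python
# R^2 = 1 - SS_res / SS_tot: 1.0 is a perfect fit,
# 0.0 is no better than always predicting the mean of y.
def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [1, 2, 3, 4]))          # 1.0 (perfect fit)
print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))  # β‰ˆ 0.98
```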

 

Implementing Polynomial Regression - 2nd-Order Polynomial Regression

quad = PolynomialFeatures(degree = 2)
X_quad = quad.fit_transform(x)  # x must be 2-D, e.g. x.reshape(-1, 1)

Quadratic regression means 2nd-degree polynomial regression.

PolynomialFeatures takes a degree parameter; the number passed in determines the degree of the polynomial regression.
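To see what the transform produces, a small example (made-up input): with degree=2 and a single feature, each sample x is expanded into the columns [1, x, xΒ²], so a linear model fitted on them is a quadratic in x:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# degree=2 expands one feature x into the columns [1, x, x^2].
x = np.array([[1.0], [2.0], [3.0]])   # input must be 2-D
quad = PolynomialFeatures(degree=2)
X_quad = quad.fit_transform(x)
print(X_quad)
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]
```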

X_train, X_test, y_train, y_test = train_test_split(X_quad, y, random_state = 156)

plr = LinearRegression()
plr.fit(X_train, y_train)

y_train_pred = plr.predict(X_train)
y_test_pred = plr.predict(X_test)

plr.score(X_test, y_test)

train_test_split separates the training and test data, and the model is fitted on the training data. The model's goodness of fit can then be evaluated with score.

To try a 3rd-, 4th-, or higher-degree polynomial regression, simply change the degree value above and repeat the same steps.
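As a sketch (synthetic data generated for illustration), looping over degree values compares the fits side by side; since this data follows y = xΒ² plus small noise, degree 2 and above should score near 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: y = x^2 with a little Gaussian noise.
rng = np.random.RandomState(156)
x = np.linspace(0, 5, 30).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(scale=0.1, size=30)

# Only `degree` changes between runs; the rest of the pipeline is identical.
for degree in (1, 2, 3):
    X_poly = PolynomialFeatures(degree=degree).fit_transform(x)
    plr = LinearRegression().fit(X_poly, y)
    print(degree, round(plr.score(X_poly, y), 4))  # degree, training R^2
```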

'CS study/λ¨Έμ‹ λŸ¬λ‹' Related Articles +