
Central Dogma of Statistics

 

 

Statistical Data Distributions

- Every random variable follows a particular frequency (probability) distribution.

- Common types: binomial distribution, normal distribution, Poisson distribution, power law distribution

 

Why Classical Distributions Matter

- Real data sometimes genuinely follows them.

- Closed-form formulas (cdf, pdf) and standard tests (e.g., the t-test) can be used.

- However, data should not be assumed to follow one of these distributions just because its shape looks similar.

 

Binomial Distribution

- An experiment made up of n independent trials, each of which must have exactly two possible outcomes

- Example: coin flipping

- Shape: discrete, but bell-shaped (or half-bell-shaped)
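As a small illustration (a sketch, not part of the original notes), the binomial pmf can be computed directly with the standard library's `math.comb`; the function name `binomial_pmf` is our own. With ten fair coin flips, the printed probabilities rise and fall around n*p = 5, giving the discrete bell shape described above.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): choose which k of the n trials succeed."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Ten fair coin flips: discrete, but bell-shaped around n * p = 5.
for k in range(11):
    print(k, round(binomial_pmf(k, 10, 0.5), 4))
```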

 

Normal Distribution

- Bell-shaped

- Examples: height, IQ

- Fully described by its mean and standard deviation

- Not every bell-shaped distribution is normal.

- It is the limit of the binomial distribution as n goes to infinity.

- The sum of independent normally distributed variables is itself normal.

- A mixture of normal distributions is not normal.

-> Women's heights and men's heights are each normally distributed, but the heights of the whole population are not.

 

Lifespan Distribution

- ๋งค์ผ๋งค์ผ์˜ ์ƒ์กด ํ™•๋ฅ ์ด p๋ผ๋ฉด, n์ผ๋™์•ˆ ์ƒ์กดํ•  ํ™•๋ฅ ์€ p^(n-1)*(1-p)

 

Poisson Distribution

- Describes how often rare events occur over an interval (the number of occurrences within a fixed window)
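A minimal sketch of the Poisson pmf, assuming the usual parameterization by lam, the expected number of events per interval. The value lam = 2.0 in the usage lines is hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events in an interval where lam events are expected on average)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


lam = 2.0  # hypothetical average number of rare events per interval
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 4))
```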

 

Power Law Distribution

- Definition: P(X=x) = cx^(-a)

* c: normalization constant, a: exponent

* c: once a is given, c is fixed to a single value, because the probabilities must sum to 1.

- Unlike the normal distribution, values do not cluster around the center; very large values keep showing up.

- The 80-20 rule: 20% of X holds 80% of Y -> e.g., in wealth inequality, a small number of rich people take most of the wealth

ex) City Population - Power Law

: The rich get richer (those who already have wealth keep gaining more).

- When x doubles, the probability drops by a factor of 2^a.

- Examples of power laws

1) Websites with x links

2) Earthquake frequency vs. Richter magnitude

3) Word usage frequency

4) Number of wars that killed x people
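The sketch below computes the normalization constant c for a given exponent a and checks the doubling rule. One assumption: the support is truncated at a finite x_max so that the sum can be computed. For a = 2 on the infinite support, c = 6/π² ≈ 0.6079, and the truncated result should land very close to that value.

```python
def power_law_constant(a: float, x_max: int) -> float:
    """c such that sum_{x=1}^{x_max} c * x**(-a) == 1 (support truncated at x_max)."""
    return 1.0 / sum(x ** (-a) for x in range(1, x_max + 1))


def power_law_pmf(x: int, a: float, c: float) -> float:
    return c * x ** (-a)


a = 2.0
c = power_law_constant(a, 1_000_000)
print(c)  # close to 6 / pi**2 ≈ 0.6079

# Doubling x divides the probability by 2**a = 4, no matter where we start.
ratios = [power_law_pmf(x, a, c) / power_law_pmf(2 * x, a, c) for x in (1, 10, 1000)]
print(ratios)
```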

 

Word Frequencies and Zipf's Law

- Zipf's law: the k-th most popular word is used 1/k as often as the most popular word.

-> A power law with a = 1

-> The word ranked 2x is used half as often as the word ranked x.

 

Power Law์˜ ํŠน์„ฑ

- ํ‰๊ท ์€ ์˜๋ฏธ๊ฐ€ ์—†๋‹ค.

- ํ‘œ์ค€ํŽธ์ฐจ๋„ ์˜๋ฏธ๊ฐ€ ์—†๋‹ค. ํ‰๊ท ๋ณด๋‹ค ํ›จ์”ฌ ํฐ ๊ฐ’์ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์—

- ์ค‘์•™๊ฐ’์€ ์˜๋ฏธ๊ฐ€ ์žˆ๋‹ค.

- ๋ถ„ํฌ๋Š” scale invariantํ•˜๋‹ค = ํ™•๋Œ€ํ•œ ๋ถ„ํฌ์˜ ์ผ๋ถ€๊ฐ€ ์ „์ฒด ๋ถ„ํฌ์˜ ๋ชจ์–‘๊ณผ ๋น„์Šทํ•ด ๋ณด์ธ๋‹ค. 

 

Statisticians vs. Data Miners

- Statisticians: care whether what they find in the data is significant

- Data miners: care whether what they find in the data is interesting

- Meaningful findings: in a large dataset, a strong correlation can look important on the surface, but the question may need to be examined more closely.

 

Comparing Population Means

- t-test: evaluates the difference between the means of two samples

- When the means are clearly different, or the standard deviations differ, the difference can often be judged just by looking.

 

T-test

- When are two means significantly different?

* When the difference between the means is relatively large

* When the standard deviations are sufficiently small

* When the samples are sufficiently large

- Welch's t-statistic: t = (x̄1 - x̄2) / √(s1^2/n1 + s2^2/n2)

* s^2: sample variance (s1^2 and s2^2 for the two samples; n1 and n2 are the sample sizes)

- t distribution table

* df: degrees of freedom (for a sample of size n, df = n-1)

* For a one-tailed test, the critical t value is lower.

* At df = 120 the critical value (two-tailed, 0.05) is 1.98; as df goes to infinity it approaches 1.96.
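A minimal sketch of Welch's t-statistic using only the standard library. The degrees of freedom use the Welch–Satterthwaite approximation. The notes do not mention it, but it is the standard df to pair with Welch's t when reading the t table. The function name `welch_t` is our own.

```python
import math
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Return (t, df): Welch's t-statistic and Welch–Satterthwaite degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1  # s1^2 / n1, with s^2 the sample variance (n - 1 denominator)
    v2 = variance(sample2) / n2  # s2^2 / n2
    t = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


t, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df)  # compare |t| against the table's critical value for this df
```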

 

Kolmogorov-Smirnov Test (KS-test)

- Measures the difference between two probability distributions using the maximum vertical (y) distance between their cdfs.

- max distance between two cdfs: D(C1, C2) = max|C1(x)-C2(x)| (-∞ <= x <= ∞)

- At significance level a, the two distributions are judged different if D(C1, C2) > c(a)·√((n1+n2)/(n1·n2)).

* c(a): found by table lookup
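The two-sample test can be sketched with empirical cdfs. In place of a table lookup, the code assumes the common large-sample approximation c(a) = √(-ln(a/2)/2), which gives c(0.05) ≈ 1.358. The function names are our own.

```python
import bisect
import math


def ks_statistic(sample1, sample2):
    """D = max over x of |C1(x) - C2(x)| for the two empirical cdfs."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    # Empirical cdfs only change at observed points, so checking those points suffices.
    for x in s1 + s2:
        c1 = bisect.bisect_right(s1, x) / n1
        c2 = bisect.bisect_right(s2, x) / n2
        d = max(d, abs(c1 - c2))
    return d


def ks_critical_value(alpha, n1, n2):
    """c(alpha) * sqrt((n1 + n2) / (n1 * n2)), approximating c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n1 + n2) / (n1 * n2))


# The distributions are judged different when D exceeds the critical value.
print(ks_statistic([1, 2, 3], [4, 5, 6]), ks_critical_value(0.05, 100, 100))
```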

 

Normality Testing

- To test normality, run a KS-test between the data and a sample drawn from the theoretical (normal) distribution.

 

Bonferroni Correction

- A significance level of 0.05 means that, if the null hypothesis is true, a result this extreme would arise by chance only 1 time in 20.

-> This is why results must be held to a stricter standard when many tests are run.

- When running n hypothesis tests, a p-value must be below a/n to count as significant at level a.
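The sketch below shows why the correction is needed and how it is applied. The p-values in the usage lines are hypothetical, and `bonferroni_significant` is an illustrative name.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """A test counts as significant only if its p-value is below alpha / n."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# With 20 independent tests at 0.05 and every H0 true, the chance of
# at least one false positive is 1 - 0.95**20, about 0.64.
family_wise = 1 - 0.95**20
print(family_wise)

# Hypothetical p-values from 20 tests; the threshold becomes 0.05 / 20 = 0.0025.
flags = bonferroni_significant([0.001, 0.03] + [0.5] * 18)
print(flags[:2])  # [True, False]: 0.03 would pass alone, but not after correction
```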

 

Significance of Significance

- With a large enough sample, even an extremely small difference can come out as highly significant.

- Significance measures how confident we can be that the distributions differ. It does not measure the effect size or the importance/magnitude of the difference.

 

Measuring Effect Size

- Pearson correlation coefficient: 0.2 = small effect, 0.5 = medium, 0.8 = large

- Percentage of overlap between distributions: 53% = small, 67% = medium, 85% = large

- Cohen's d = (μ - μ')/σ: small > 0.2, medium > 0.5, large > 0.8
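A sketch of Cohen's d with the thresholds listed above. The notes do not specify which σ to use, so this sketch assumes the common choice of the pooled sample standard deviation. The "below small" label for |d| ≤ 0.2 is our own wording.

```python
import math
from statistics import mean, variance


def cohens_d(sample1, sample2):
    """(mu - mu') / sigma, taking sigma as the pooled sample standard deviation (assumption)."""
    n1, n2 = len(sample1), len(sample2)
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    return (mean(sample1) - mean(sample2)) / math.sqrt(pooled_var)


def effect_label(d):
    d = abs(d)
    if d > 0.8:
        return "large"
    if d > 0.5:
        return "medium"
    if d > 0.2:
        return "small"
    return "below small"


d = cohens_d([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(d, effect_label(d))
```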

 

Permutation test, p-values

- ๋ฐ์ดํ„ฐ๋กœ ๊ฐ€์„ค์„ ์ž…์ฆํ•  ์ˆ˜ ์žˆ๋‹ค๋ฉด, ๋žœ๋คํ•˜๊ฒŒ ์„ž์€ ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ๋„ ๊ฐ€์„ค์ด ์ž…์ฆ๋˜์–ด์•ผ ํ•œ๋‹ค.

- random permutation ์‚ฌ์ด์˜ ์‹ค์ œ ๋ฐ์ดํ„ฐ ์ˆœ์œ„๊ฐ€ significance๋ฅผ ๊ฒฐ์ •ํ•œ๋‹ค.

- (์ตœ์†Œ 1000๋ฒˆ ์ด์ƒ-p value๊ฐ€ ์†Œ์ˆ˜์  ์„ธ์ž๋ฆฌ๊นŒ์ง€ ๋‚˜์˜จ๋‹ค) permutation์„ ๋งŽ์ด ์ˆ˜ํ–‰ํ•  ์ˆ˜๋ก, significance๊ฐ€ ๋” ์ค‘์š”ํ•ด์ง„๋‹ค. 

์•Œ๊ณ ๋ฆฌ์ฆ˜์œผ๋กœ ์„ค๋ช…ํ•  ๊ฒฝ์šฐ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

for i=1 to n do a[i]=i;
for i=1 to n-1 do swap(a[i], a[Random[i,n]);

*Random ํ•จ์ˆ˜์—์„œ 1~n๋ฒˆ์งธ ์‚ฌ์ด์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ณจ๋ผ์„œ ์„ž์œผ๋ฉด uniformํ•˜๊ฒŒ ์„ž์ด์ง€ ์•Š์€ ๋ฐ์ดํ„ฐ๊ฐ€ ๋œ๋‹ค.
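The shuffle above, translated to 0-based Python and used inside a simple permutation test on the difference of means. This is a sketch under stated assumptions. The test statistic (absolute mean difference), the default of 1000 permutations, and the sample values are all illustrative choices. The p-value is the share of shuffles that are at least as extreme as the real data.

```python
import random
from statistics import mean


def shuffle_in_place(a, rng):
    """Fisher-Yates: swap a[i] with a[j], j drawn uniformly from i..n-1 (0-based)."""
    n = len(a)
    for i in range(n - 1):
        j = rng.randint(i, n - 1)  # drawing j from 0..n-1 instead would bias the result
        a[i], a[j] = a[j], a[i]


def permutation_test(group1, group2, num_permutations=1000, seed=0):
    """Share of random relabelings whose |mean difference| >= the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(group1) - mean(group2))
    pooled = list(group1) + list(group2)
    k = len(group1)
    extreme = 0
    for _ in range(num_permutations):
        shuffle_in_place(pooled, rng)
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            extreme += 1
    return extreme / num_permutations


# Hypothetical measurements: the groups do not overlap, so few shuffles are as extreme.
print(permutation_test([10, 11, 12, 13, 14], [0, 1, 2, 3, 4]))
```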

 

Sampling from distributions

- ์›ํ˜• ๋ฐ์ดํ„ฐ์—์„œ ๋ฐ˜์ง€๋ฆ„ ๊ธธ์ด์™€ ๊ฐ์œผ๋กœ ๋žœ๋คํ•˜๊ฒŒ ๋ฐ์ดํ„ฐ๋ฅผ ๊ณ ๋ฅด๋ฉด ์ค‘์•™์— ๋ชฐ๋ฆฌ๋Š” ๊ฐ’์ด ๋งŽ์•„์ง„๋‹ค.

- (x, y)๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ์ฐ์–ด์•ผ ํ•œ๋‹ค.
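A quick check of both approaches on the unit circle (a sketch with illustrative function names). The inner circle of radius 0.5 covers 25% of the area. Uniform samples should land there about 25% of the time, while the naive radius-and-angle method puts about half of its points there.

```python
import math
import random


def sample_disk_naive(rng):
    """Uniform radius and angle: points pile up near the center."""
    r, theta = rng.random(), rng.uniform(0, 2 * math.pi)
    return r * math.cos(theta), r * math.sin(theta)


def sample_disk_rejection(rng):
    """Uniform (x, y) in the bounding square, rejecting points outside the unit circle."""
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y


def inner_fraction(sampler, n=100_000, seed=42):
    """Fraction of n sampled points that land within radius 0.5 of the center."""
    rng = random.Random(seed)
    pts = [sampler(rng) for _ in range(n)]
    return sum(x * x + y * y <= 0.25 for x, y in pts) / n


print(inner_fraction(sample_disk_naive), inner_fraction(sample_disk_rejection))
```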

 

Sampling in One dimension

- To sample from any probability distribution, work through its cdf: draw a uniform random number u in [0, 1) and return the x where cdf(x) = u.
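A sketch of this inverse-cdf idea for one continuous case and one discrete case. The exponential distribution is chosen only because its cdf F(x) = 1 - exp(-rate·x) inverts in closed form. The rate and the discrete probabilities are hypothetical.

```python
import bisect
import itertools
import math
import random


def sample_exponential(rate, rng):
    """Continuous case: solve F(x) = u for F(x) = 1 - exp(-rate * x)."""
    u = rng.random()  # uniform in [0, 1), so 1 - u is never 0
    return -math.log(1 - u) / rate


def sample_discrete(values, probs, rng):
    """Discrete case: return the first value whose cumulative probability exceeds u."""
    cumulative = list(itertools.accumulate(probs))
    u = rng.random() * cumulative[-1]
    index = bisect.bisect_right(cumulative, u)
    return values[min(index, len(values) - 1)]  # guard against float round-off


rng = random.Random(7)
draws = [sample_exponential(2.0, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to the exponential mean 1 / rate = 0.5
picks = [sample_discrete(["a", "b", "c"], [0.2, 0.5, 0.3], rng) for _ in range(100_000)]
print({v: picks.count(v) / len(picks) for v in "abc"})  # close to 0.2, 0.5, 0.3
```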

 

Statistical Hypothesis Testing

Central Limit Theorem

- Consider a random variable that is the mean of a large number of independent, identically distributed (i.i.d.) random variables.

- Such a random variable is approximately normally distributed.

- If x1, ..., xn are i.i.d. random variables with mean μ and variance σ^2, and n is very large,

Z = 1/n * (x1+...+xn) is approximately normally distributed, with mean μ and standard deviation σ/√n (variance σ^2/n).

- As n grows large, Binomial(n, p) can be approximated by Normal(np, np(1-p)) (mean np, variance np(1-p)).

(A binomial random variable is the sum of n independent Bernoulli trials.)

- Why the CLT matters: problems involving other kinds of distributions can be handled with the normal distribution.
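A small simulation of the theorem (a sketch; the choice of Uniform(0, 1) inputs, n = 50 and 10,000 trials is arbitrary). Uniform(0, 1) has μ = 0.5 and σ = √(1/12). The sample means should therefore center on 0.5 with standard deviation σ/√50 ≈ 0.0408, and roughly 95% of them should fall within 1.96 standard errors of μ, as a normal distribution predicts.

```python
import random
import statistics


def sample_means(n, trials, seed=0):
    """Means of n i.i.d. Uniform(0, 1) draws, repeated `trials` times."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]


n = 50
means = sample_means(n, 10_000)
mu, sigma = 0.5, (1 / 12) ** 0.5  # mean and standard deviation of Uniform(0, 1)
std_error = sigma / n**0.5        # sigma / sqrt(n) ≈ 0.0408

within = sum(abs(m - mu) <= 1.96 * std_error for m in means) / len(means)
print(statistics.fmean(means), statistics.stdev(means), within)
```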

 

Statistical hypothesis testing

Suppose we want to test whether a coin is unfair, i.e., whether its two sides do not come up equally often.

H0 (null hypothesis): the coin is fair, i.e., p = 0.5.

H1 (alternative hypothesis): the coin is not fair. -> This is what we want to show.

 

The test proceeds as follows.

- Flip the coin n times and count the number of heads, X.

- Each flip is a Bernoulli trial, so X ~ Binomial(n, p).

- By the CLT, X can be approximated by Normal(np, np(1-p)).

- Choose the significance level: how much Type I error (false positive) we are willing to tolerate.

- Suppose the significance level is 0.05 and we flip the coin n = 1000 times. If heads comes up 532 times, the p-value is 4.63%, which is below 0.05 -> the result is significant, so we reject H0 and accept H1.

 

Error์˜ ์ข…๋ฅ˜

  ๊ท€๋ฌด๊ฐ€์„ค์˜ ์ฐธ/๊ฑฐ์ง“ ์—ฌ๋ถ€
์ฐธ ๊ฑฐ์ง“
H0(๊ท€๋ฌด๊ฐ€์„ค)์— ๋Œ€ํ•œ ํŒ๋‹จ ๊ธฐ๊ฐ (positive call) 1์ข… ์˜ค๋ฅ˜ (False Positive) ์ •๋‹ต (True Positive)
๊ธฐ๊ฐ ์‹คํŒจ (negative call) ์ •๋‹ต (True Negative) 2์ข… ์˜ค๋ฅ˜ (False Negative)

Statistical Hypothesis Testing with p-value

- p-value: the probability, assuming H0 is true, of observing a result at least as extreme as the one actually observed

- The significance level is usually set to 0.05 or 0.01.

- For n = 1000 flips (two-sided test): X = 530 gives p-value = 0.062, X = 532 gives p-value = 0.0463
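The quoted p-values can be reproduced with the normal approximation Normal(np, np(1-p)) under H0: p = 0.5 and n = 1000. One detail is our inference rather than something stated in the notes: the figures match when a continuity correction is applied, i.e., X = 530 is evaluated at 529.5 and X = 532 at 531.5. The code uses the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def two_sided_p_value(x, mu, sigma):
    """P(a Normal(mu, sigma) draw lands at least as far from mu as x does), both tails."""
    z = abs(x - mu) / sigma
    return 2 * (1 - NormalDist().cdf(z))


n, p = 1000, 0.5
mu_0 = n * p                        # 500 expected heads under H0
sigma_0 = (n * p * (1 - p)) ** 0.5  # ≈ 15.81

# Continuity correction: a count of 530 is treated as the value 529.5 on the continuous scale.
p_530 = two_sided_p_value(529.5, mu_0, sigma_0)  # ≈ 0.062 -> not significant at 0.05
p_532 = two_sided_p_value(531.5, mu_0, sigma_0)  # ≈ 0.046 -> significant at 0.05
print(p_530, p_532)
```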

 

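Confidence Interval

Having seen 529 heads in 1000 flips, we estimate p̂ = 0.529.

import math
from statistics import NormalDist

def normal_two_sided_bounds(probability, mu, sigma):  # symmetric interval holding `probability` of the Normal(mu, sigma) mass
    tail = (1 - probability) / 2
    return NormalDist(mu, sigma).inv_cdf(tail), NormalDist(mu, sigma).inv_cdf(1 - tail)
p_hat = 0.529
sigma_hat = math.sqrt(p_hat * (1 - p_hat) / 1000)
print(normal_two_sided_bounds(0.95, p_hat, sigma_hat))  # 0.95 = confidence level

>>> (0.49806..., 0.55994...)  # ≈ [0.498, 0.560]
# About 95% of intervals built this way contain the true p.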

 

Example: Running an A/B test

- We need to choose which of two ads draws more clicks.

- Na = number of people who see ad A, na = number of people who click ad A, pa = probability of clicking ad A

-> na/Na can be approximated by Normal(pa, pa(1-pa)/Na). (Reason: it comes from repeating a Bernoulli trial Na times.)

-> Likewise, nb/Nb can be approximated by Normal(pb, pb(1-pb)/Nb).

- The two estimates are independent, so their difference is also normal.

- We can test the null hypothesis H0: pa = pb.

- Let's run the test at a significance level of 0.05.
