Utilisateur
with ols, the regression curve is the line that lies as close as possible to the observations (you draw squares on the vertical distances between the observations and the line; the line with the smallest total area of these squares is the ordinary least squares fit)
if the fitted curve is less steep than the true curve, then we have underestimated the true causal effect.
the fitted reg. curve is for our sample. we want it to be as close as possible to the true reg. curve, to capture the true causal effect as well as possible.
the law of large numbers suggests that if we run a simulation many times, we can count the estimates that fall into this range as a proportion of the number of simulations we run. that gives us an estimate of the probability of landing in the range, and it will be very accurate if we run a lot of simulations.
e.g.: each simulation consists of one sample with n=40, and we have run 400 simulations (draws). that means we have 400 B1^, one for each simulated sample. sometimes we will under- or overshoot. the variability can be summarized in the variance of the ols estimator.
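a minimal sketch of the simulation described above, in Python with numpy. the true parameters (beta0 = 1, beta1 = 2), the standard normal x and u, and the range of +/- 0.2 around the true slope are all assumptions picked for illustration; only n=40 and the 400 simulations come from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Assumed data-generating process (illustrative, not from the notes)
beta0, beta1 = 1.0, 2.0
n, n_sims = 40, 400  # sample size and number of simulations, as in the notes


def ols_slope(x, y):
    """OLS slope with one regressor: sum of cross-deviations / sum of squared x-deviations."""
    x_dev = x - x.mean()
    return np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)


slopes = np.empty(n_sims)
for s in range(n_sims):
    x = rng.normal(0.0, 1.0, size=n)  # random sampling
    u = rng.normal(0.0, 1.0, size=n)  # error term drawn independently of x
    y = beta0 + beta1 * x + u
    slopes[s] = ols_slope(x, y)

# Share of the 400 estimates that land inside an (assumed) range around the true slope
range_low, range_high = beta1 - 0.2, beta1 + 0.2
share_in_range = np.mean((slopes >= range_low) & (slopes <= range_high))

print("mean of B1^:     ", slopes.mean())
print("variance of B1^: ", slopes.var(ddof=1))
print("share in range:  ", share_in_range)
```

each entry of slopes is one B1^. some lie above 2 (overshoot) and some below (undershoot); their variance is the simulated counterpart of the variance of the ols estimator.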
random sampling
exogeneity assumption holds
larger n => decreased error => sampling distribution of the ols estimator becomes more concentrated around its average value.
so over- and undershooting decrease and randomness decreases, so the ols estimator becomes more precise.
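a rough way to see this is to repeat the simulation for several sample sizes and compare the variance of the estimates. the sample sizes (40, 400, 4000), the true parameters and the normal draws below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 2.0  # assumed true parameters (illustrative)
n_sims = 400


def simulate_slopes(n, n_sims, rng):
    """Return n_sims OLS slope estimates, each computed from a fresh sample of size n."""
    x = rng.normal(size=(n_sims, n))
    u = rng.normal(size=(n_sims, n))
    y = beta0 + beta1 * x + u
    x_dev = x - x.mean(axis=1, keepdims=True)
    y_dev = y - y.mean(axis=1, keepdims=True)
    return (x_dev * y_dev).sum(axis=1) / (x_dev ** 2).sum(axis=1)


# Variance of the simulated B1^ for each sample size
variances = {n: simulate_slopes(n, n_sims, rng).var(ddof=1) for n in (40, 400, 4000)}
for n, v in variances.items():
    print(f"n={n:5d}  var(B1^)={v:.5f}")
```

the printed variance should shrink as n grows, which is the sampling distribution concentrating around its average value.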
homoscedasticity restricts the variance of the error term. when you plot the residuals, their spread must not vary, e.g. the variance must not get larger as we increase the value on the x-axis. the observations/residuals should be spread equally around the value 0 at every value of x. homoscedasticity is typically not satisfied in reality
var(B1^) = var(u) / [n • (1-r) • var(x1)]
if var(u) is large, the variance of B1^ is large
large n, small variance of B1^
(1-r) is only relevant if we have more than one regressor in our model. r is a value between 0 and 1 and shows to what degree x1 is predicted by the other regressors (the R-squared from regressing x1 on them). so a large r gives a larger variance of B1^.
larger var(x1) gives a smaller variance of B1^.
variance inflation: if you add more regressors that correlate with x1 to a model, r will be higher, which shrinks (1-r) and so raises the variance of B1^
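a small numerical check of the variance formula, as a sketch. it assumes a model with two regressors x1 and x2, a known var(u) = 1, and takes r to be the R-squared from regressing x1 on x2 (with a constant). the formula is compared with the matrix expression var(u) • (X'X)^-1 for the coefficient on x1, which is the usual homoscedastic variance conditional on the regressors.

```python
import numpy as np

rng = np.random.default_rng(2)
n, var_u = 500, 1.0  # assumed sample size and error variance (illustrative)


def slope_variance(corr):
    """Return (r, formula value, matrix value) for a given corr(x1, x2)."""
    x2 = rng.normal(size=n)
    x1 = corr * x2 + np.sqrt(1 - corr ** 2) * rng.normal(size=n)

    # r: R-squared from regressing x1 on the other regressor plus a constant
    Z = np.column_stack([np.ones(n), x2])
    fitted = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
    r = 1 - np.sum((x1 - fitted) ** 2) / np.sum((x1 - x1.mean()) ** 2)

    # Formula from the notes; x1.var() divides by n (ddof=0)
    formula = var_u / (n * (1 - r) * x1.var())

    # Matrix version: var(u) times the x1 diagonal entry of (X'X)^-1
    X = np.column_stack([np.ones(n), x1, x2])
    matrix = var_u * np.linalg.inv(X.T @ X)[1, 1]
    return r, formula, matrix


results = {c: slope_variance(c) for c in (0.0, 0.5, 0.9)}
for c, (r, formula, matrix) in results.items():
    print(f"corr={c:.1f}  r={r:.3f}  formula={formula:.5f}  matrix={matrix:.5f}")
```

the two columns should agree, and the variance should rise as the correlation between x1 and x2 (and with it r) goes up. that rise is the variance inflation.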
heteroscedasticity: when the errors are not homoscedastic. the variance of u (the error term) will increase or decrease as we move along the x-axis, i.e. as the value of x1 increases.
standard error: it estimates the standard deviation of B1^, which is a measure of the precision of B1^. stata computes the SE
classical SE: assumes homoscedastic errors, so the SE will only estimate the correct population value if the error term (u) is homoscedastic.
robust SE: standard errors corrected for heteroscedasticity.
always best to run regressions with robust to be safe. robust SE work in both cases, homo- or heteroscedastic
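to make the difference concrete, here is a sketch that computes both kinds of SE by hand with numpy on simulated heteroscedastic data. the data-generating process (x uniform on [0, 2], error sd equal to x squared, true slope 2) is an assumption for illustration. the robust version is the HC1 sandwich estimator, which as far as I know is what stata's robust option reports, but treat that as an assumption too.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(0.0, 2.0, size=n)
u = rng.normal(size=n) * x ** 2  # error spread grows with x: heteroscedastic
y = 1.0 + 2.0 * x + u            # assumed true model (illustrative)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
k = X.shape[1]

# Classical SE: one common error variance for all observations
sigma2_hat = resid @ resid / (n - k)
se_classical = np.sqrt(np.diag(sigma2_hat * XtX_inv))

# Robust SE (HC1 sandwich): each observation keeps its own squared residual
meat = (X * resid[:, None] ** 2).T @ X
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv) * n / (n - k))

print("B1^:                ", beta_hat[1])
print("classical SE of B1^:", se_classical[1])
print("robust SE of B1^:   ", se_robust[1])
```

in this setup the classical SE should come out smaller than the robust one, i.e. it overstates the precision of B1^. with homoscedastic errors the two would be close, which is why running with robust is the safe default.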
clustered standard errors: used if e.g. the students have interacted with each other and therefore may share some of their u. clustered SE allow us to take such dependencies into account
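a sketch of the idea with numpy, under assumed data: 100 hypothetical classes of 20 students, where both x and u contain a component shared within the class. the cluster-robust variance sums X_i • residual_i within each cluster before forming the sandwich, so errors are allowed to be correlated inside a cluster. no small-sample correction is applied here, so the numbers will differ slightly from what statistical packages report.

```python
import numpy as np

rng = np.random.default_rng(4)
n_clusters, per_cluster = 100, 20  # hypothetical: 100 classes of 20 students
groups = np.repeat(np.arange(n_clusters), per_cluster)
n = groups.size

# x and u each contain a component shared by everyone in the same cluster
x = rng.normal(size=n_clusters)[groups] + rng.normal(size=n)
u = rng.normal(size=n_clusters)[groups] + rng.normal(size=n)
y = 1.0 + 2.0 * x + u  # assumed true model (illustrative)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
resid = y - X @ (XtX_inv @ X.T @ y)

# Heteroscedasticity-robust (HC0): treats observations as independent
scores = X * resid[:, None]
meat_robust = scores.T @ scores
se_robust = np.sqrt(np.diag(XtX_inv @ meat_robust @ XtX_inv))

# Cluster-robust: add up the scores within each cluster, then form the sandwich
cluster_scores = np.zeros((n_clusters, X.shape[1]))
np.add.at(cluster_scores, groups, scores)
meat_cluster = cluster_scores.T @ cluster_scores
se_cluster = np.sqrt(np.diag(XtX_inv @ meat_cluster @ XtX_inv))

print("robust SE of B1^:   ", se_robust[1])
print("clustered SE of B1^:", se_cluster[1])
```

because the shared component makes observations within a class move together, the clustered SE should be clearly larger than the robust SE that assumes independence.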
more n
=> the fitted curve and the population curve move closer to each other
as we increase n, the distribution of the standardized OLS estimator converges to a normal distribution, with a density curve, tails, and probabilities given by the area under the density curve
(B1^ - B1) / sd(B1^) is approximately N(0, 1) for large n
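a sketch that checks this by simulation. the errors are deliberately drawn from a skewed distribution (exponential, shifted to mean zero) so that any normality comes from the large sample and not from the errors. n = 200, 5000 simulations and the true slope of 2 are assumptions for illustration, and sd(B1^) is taken as the standard deviation of the simulated estimates.

```python
import numpy as np

rng = np.random.default_rng(5)
beta1, n, n_sims = 2.0, 200, 5000  # assumed values (illustrative)

x = rng.normal(size=(n_sims, n))
u = rng.exponential(1.0, size=(n_sims, n)) - 1.0  # skewed errors with mean zero
y = 1.0 + beta1 * x + u

# OLS slope per simulated sample; the x-deviations sum to zero,
# so y does not need to be demeaned in the numerator
x_dev = x - x.mean(axis=1, keepdims=True)
slopes = (x_dev * y).sum(axis=1) / (x_dev ** 2).sum(axis=1)

# Standardized estimator: (B1^ - B1) / sd(B1^)
z = (slopes - beta1) / slopes.std(ddof=1)

# For a standard normal, about 95% of the mass lies within +/- 1.96
share_within_196 = np.mean(np.abs(z) <= 1.96)
print("share of z within +/-1.96:", share_within_196)
```

if the normal approximation is good, the printed share should be close to 0.95, i.e. the area under the standard normal density between -1.96 and 1.96.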