when we observe many observational units over several time periods.
we can remove omitted variables that break the exogeneity assumption, which makes causal analysis possible
wide is where we have one row per unit, with the time periods spread across columns; common in cross-sectional data. Long is where we have one row per unit and time period, with the observations sorted by unit (e.g. A A A, B B B) and then by time period
it describes the evolution of unit characteristics over time for many units
it is the process of rearranging data, for example from wide format to long format
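a minimal sketch of the wide-to-long reshape in Stata, assuming a unit identifier called unit and wide variables such as tax1970, tax1998, fr1970, fr1998 (the identifier name is my assumption):
* go from one row per unit (wide) to one row per unit-year (long)
reshape long tax fr, i(unit) j(year)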
it means that the rows are statistically independent, i.e. period 2 has no "memory" of what happened in period 1; their correlation is zero.
it means that every row is sampled from the same population.
there are omitted variables that are important, but they are unobserved. Since they are important, they sit in the error term and correlate with the regressor and the outcome variable. We therefore split the error into two parts, A and U: A contains the unobserved variables that remain fixed and do not change over time, and the remaining U no longer correlates with the regressors, so our exogeneity assumption holds. We can then use the two different transformations to get rid of A and estimate the causal effect.
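in model form (my notation, anticipating the tax/fr example used further down): fr_it = B1*tax_it + A_i + U_it, where A_i collects the unobserved factors that are fixed over time for unit i and U_it is the remaining time-varying error.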
it is "i" that contains every same "i" in t periods. It is called observing one unit at two different times.
by jumping from unit to unit.
there can be correlation within units, e.g. the observations for unit A in periods t = 1, 2, ... can be correlated with each other, but unit A cannot be correlated with unit B. We can therefore think of one unit as one cross-sectional observation.
When we subtract the regression model for period 1 from the model for period 2, the variables without a time subscript (the fixed effect A) disappear
change in Y = B1*change in variable of interest + change in u
1) A is eliminated, so we can estimate the model with OLS
2) B1 from OLS reflects the causal effect, provided the assumptions hold; we have to check them:
- exogeneity assumption E[change in u | change in x1] = 0
- full rank assumption: var(change in x1) > 0, which states that there is positive probability that x1 changes between the two time periods
- random sampling: we observe the change in Y for every cross-sectional unit, and these units are independent due to the random sampling, so it holds
we could observe more than two time periods to get more information, although because of potential serial correlation each extra period may add less information than an independent observation would
we want a large cross-section dimension, so we want lots of units
that the fixed-effect assumption is plausible, i.e. we have to study an economic environment where it is reasonable that the unobserved omitted factors are fixed over time.
generate new variables for the change in the outcome and the change in the variable of interest, and then run the regression with these differenced variables instead.
gen d_tax = tax1998-tax1970
gen d_fr = fr1998-fr1970
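with the differenced variables in place, the regression itself is one line (a minimal sketch using the variables generated above):
* OLS on the first-differenced data: change in fr on change in tax
regress d_fr d_tax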
you compute the averages within each unit and then subtract the unit-specific mean from each time-period observation within that unit.
the formula then becomes:
fr = B1*tax + A + U
-
fr_bar = B1*tax_bar + A + U_bar
=
fr_tilde = B1*tax_tilde + U_tilde
we get rid of the fixed effect A and obtain a B1 that can estimate the causal effect, but we have to check that it is a good estimate
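a minimal sketch of the same estimation done directly in Stata, assuming long-format data with identifier variables called unit and year (those names are my assumption):
* declare the panel structure
xtset unit year
* within (fixed-effects) estimator: fr on tax, which eliminates A
xtreg fr tax, fe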
The first-difference transformation gives only one regression equation per unit (change in Y on change in x1), while the fixed-effect transformation gives one regression equation per unit for each time period, i.e. two per unit here: the ordinary equation minus the averaged equation.
Stata will compute standard errors under the assumption that all observations from one unit are statistically independent across time periods, t ≠ t', which also says that cov(U_t, U_t') = 0, so there is no correlation between the error terms
that a positive shock today does not increase the probability of a (continued) positive shock tomorrow. Under this assumption such persistence does not occur, and the condition cov(U_t, U_t') = 0 rules out serial correlation
it is better to assume cross-sectional independence, i.e. that U and U' are statistically independent between units i ≠ i', rather than across time periods within a unit. So we now assume instead that there is no correlation between units, but correlation is allowed within units.
it is when standard errors are computed under the assumption that certain blocks of observations exhibit correlation. These blocks of correlated observations are called "clusters"
that all observations (across t) of one unit form a cluster, so each unit is one cluster consisting of all of that unit's observations.
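a minimal sketch of how the clustering is requested in Stata, again assuming the panel identifiers unit and year from the sketch above:
* cluster the standard errors at the unit level, allowing correlation within a unit over time
xtreg fr tax, fe vce(cluster unit)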
an additional time period may be correlated with the time periods already observed, so it does not provide much new information.
With clustered standard errors, Stata will account for however much correlation there actually is within each cluster!
less information -> less precise -> larger SE
then we could be too optimistic about our relatively small SEs, since OLS assumes statistically independent observations, and believe that adding more time periods gives us more information, when in fact it does not.