1. summarize sample/population data with numbers/tables
2. make inferences about population parameters based on sample data.
a variable that is removed or kept constant is called a control variable.
correlation between A and B; B takes place after A; the correlation is not explained by a third factor.
RCT, where time order is manipulated and randomization excludes third-variable explanations.
examine the relationship between x and y within subgroups, or include alternative explanations in the model.
not included in the study, but explain the association investigated.
when both x and y are related to a third variable, their association disappears when controlling for this third variable.
we find no association between x and y, until we control for a third variable.
the relationship between x and y is reversed within levels of a third variable.
mediation: x1 indirectly causes y, as x2 intervenes as mediator. when controlling for x2, the association between x1 and y disappears.
the association between x1 and y differs across levels of x2.
no association, positive, or negative.
differences in the criterion have multiple causes; these can be correlated/confounding (adding the extra x changes the x-y relationship) or uncorrelated (the x-y relationship does not change).
the association between x1 and y changes, but does not disappear.
formulate hypotheses. study the variables. descriptive analysis. inferential statistics. interpret and report.
non-directional, directional (positive/negative).
random selection, no manipulation, two variables.
1. check each variable individually: shape, location, scale.
2. analyse the variables together: scatterplot.
if the scatterplot suggests a linear relationship, we calculate the straight line that falls closest to all data points in the scatterplot.
ŷ = a + bx
ŷ = predicted criterion
a= expected y when x=0, intercept
b= change in y for a one-unit increase in x, slope.
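as a concrete illustration, a minimal numpy sketch (made-up data, not from the course) that computes the least-squares intercept and slope by hand:

```python
import numpy as np

# made-up example data
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])   # predictor
y = np.array([3.0, 5.5, 6.0, 8.5, 10.0])  # criterion

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # slope
a = y.mean() - b * x.mean()                          # intercept
y_hat = a + b * x                                    # predicted criterion (y-hat)
print(a, b)
```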
create a scale-free measure by using the SDs of both x and y.
r=(sx/sy)b
-1 ≤ r ≤ 1
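a small sketch, on the same made-up data, checking that r = (sx/sy)b matches the Pearson correlation computed directly:

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.0, 5.5, 6.0, 8.5, 10.0])

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
r_from_b = b * x.std(ddof=1) / y.std(ddof=1)   # r = (sx/sy) * b
r_direct = np.corrcoef(x, y)[0, 1]
print(r_from_b, r_direct)   # both agree, and lie between -1 and 1
```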
variation around the predicted score: the vertical distance between the observed and predicted value: e = y - ŷ
y= a+bx+e
1. TSS = total sum of squares: squared differences between observed scores and the mean.
2. SSE = sum of squared errors: squared differences between observed and predicted scores.
3. RSS = regression sum of squares: squared differences between predicted scores and the mean.
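a short sketch (made-up data) computing the three sums of squares and checking that TSS = SSE + RSS:

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.0, 5.5, 6.0, 8.5, 10.0])

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
y_hat = a + b * x

TSS = np.sum((y - y.mean())**2)      # observed vs mean
SSE = np.sum((y - y_hat)**2)         # observed vs predicted
RSS = np.sum((y_hat - y.mean())**2)  # predicted vs mean
print(TSS, SSE, RSS, np.isclose(TSS, SSE + RSS))  # TSS = SSE + RSS
```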
to assess how well the prediction model works.
proportion of variation in y that is explained by the model.
R2 = (TSS - SSE)/TSS.
0 ≤ R2 ≤ 1.
R2 = 0: b = 0, the predictor has no explanatory power.
R2 = 1: perfect prediction.
the larger R2 the better the model.
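a minimal sketch of R2 = (TSS - SSE)/TSS on the same made-up data:

```python
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.0, 5.5, 6.0, 8.5, 10.0])

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
y_hat = a + b * x

TSS = np.sum((y - y.mean())**2)
SSE = np.sum((y - y_hat)**2)
R2 = (TSS - SSE) / TSS   # proportion of variation in y explained by the model
print(R2)                # lies between 0 and 1
```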
use H0: inspect the probability of finding this b if H0 were true.
how likely is it that we find such a strong b when H0 is true?
H0: β=0, t=b/se, df=n-2.
F = (RSS/1)/(SSE/(n-2)); in general df1 = k and df2 = n - k - 1, where k = number of b's (predictors); with one predictor, df1 = 1 and df2 = n - 2.
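a sketch computing t and F by hand for the simple regression above (made-up data); note that F = t^2 with a single predictor:

```python
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.0, 5.5, 6.0, 8.5, 10.0])
n = len(x)

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
y_hat = a + b * x
SSE = np.sum((y - y_hat)**2)
RSS = np.sum((y_hat - y.mean())**2)

se_b = np.sqrt(SSE / (n - 2)) / np.sqrt(np.sum((x - x.mean())**2))  # standard error of b
t = b / se_b                              # H0: beta = 0, df = n - 2
F = (RSS / 1) / (SSE / (n - 2))           # df1 = 1, df2 = n - 2
print(t, 2 * stats.t.sf(abs(t), n - 2))   # t and two-sided p-value
print(F, stats.f.sf(F, 1, n - 2))         # F and p-value; F = t**2 here
```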
1. type I error: rejecting H0 when it is correct (α of .05 means you accept making this wrong decision 5% of the time).
2. type II error: not rejecting H0 when it is false.
1. random sample.
2. linear relation x-y
3. conditional variance of y around the regression line is equal for all x.
4. conditional distribution of y is normal for all x.
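these assumptions are usually checked with residual plots; a rough numerical sketch (made-up data, informal checks only) could look like this:

```python
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.0, 5.5, 6.0, 8.5, 10.0])

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
e = y - (a + b * x)                      # residuals

# rough constant-variance check: |residuals| should not trend with x
print(np.corrcoef(x, np.abs(e))[0, 1])
# rough normality check of the residuals (Shapiro-Wilk)
print(stats.shapiro(e))
```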
multiple predictors in the model that explain one outcome variable. we investigate how much the predictors TOGETHER explain variation in y.
y = a + b1x1 + ... + bkxk + e
a=intercept
b= regression slope of each predictor; effect of one predictor on y, when controlling for all other predictors in the model
eliminate variation explained by PFM or CS, and keep the residuals, between which you investigate the relationship.
this interpretation is identical for all predictors.
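a minimal sketch (made-up data, generic x1/x2 standing in for predictors such as PFM and CS) fitting a multiple regression with numpy's least-squares solver:

```python
import numpy as np

# made-up data: two predictors and one outcome
x1 = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 3.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 2.0])
y  = np.array([3.0, 6.0, 6.5, 9.0, 9.5, 5.0])

X = np.column_stack([np.ones_like(x1), x1, x2])  # intercept column + predictors
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = a + b1 * x1 + b2 * x2                    # predicted outcome
print(a, b1, b2)  # each b is a partial slope: effect controlling for the other predictor
```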
1. when predicting y without any x: TSS (sample mean)
2. when predicting y with the x's: SSE (prediction equation).
tells if the predictors collectively explain variation in y.
uses H0 and HA.
F = MSR/MSE; MSR = RSS/k, the variation explained per predictor; MSE = SSE/(n - k - 1), the average variation each predictor we could still add would explain.
F = ratio between MSR and MSE.
F > 1: the predictors explain more variation than expected from any additional predictor.
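a sketch of the overall F test on the same made-up two-predictor data:

```python
import numpy as np
from scipy import stats

x1 = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 3.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 2.0])
y  = np.array([3.0, 6.0, 6.5, 9.0, 9.5, 5.0])
n, k = len(y), 2                                  # k = number of predictors

X = np.column_stack([np.ones(n), x1, x2])
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

MSR = np.sum((y_hat - y.mean())**2) / k           # RSS / k
MSE = np.sum((y - y_hat)**2) / (n - k - 1)        # SSE / (n - k - 1)
F = MSR / MSE
print(F, stats.f.sf(F, k, n - k - 1))             # F and its p-value
```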
inspecting the effect size of a single predictor on y.
3 ways: b*, r^2p, ΔR^2
scaling each b using SD of respective predictor and outcome.
b1* = b1(sx1/sy), etc.; the same formula as the Pearson correlation in simple linear regression. it tells how many SDs y is expected to change when xi increases by 1 SD.
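a sketch computing standardized slopes b* from the unstandardized ones (same made-up data):

```python
import numpy as np

x1 = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 3.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 2.0])
y  = np.array([3.0, 6.0, 6.5, 9.0, 9.5, 5.0])

X = np.column_stack([np.ones(len(y)), x1, x2])
_, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

b1_star = b1 * x1.std(ddof=1) / y.std(ddof=1)     # b1* = b1 (s_x1 / s_y)
b2_star = b2 * x2.std(ddof=1) / y.std(ddof=1)
print(b1_star, b2_star)  # SDs of change in y per 1 SD increase in each predictor
```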
squared partial correlation between x1 and y, controlling for x2: the proportion of the variation in y not explained by x2 that is explained by x1.
e.g. PFM explained 0.8% of variation in AP not explained by CS.
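a sketch of the squared partial correlation via residuals: regress x2 out of both y and x1, then correlate what is left (made-up data; PFM/CS/AP replaced by generic x1/x2/y):

```python
import numpy as np

# made-up stand-ins: x1 ~ PFM, x2 ~ CS, y ~ AP
x1 = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 3.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 2.0])
y  = np.array([3.0, 6.0, 6.5, 9.0, 9.5, 5.0])

def residuals(target, control):
    """residuals of target after regressing out an intercept and the control variable"""
    X = np.column_stack([np.ones_like(control), control])
    coef = np.linalg.lstsq(X, target, rcond=None)[0]
    return target - X @ coef

r_partial = np.corrcoef(residuals(y, x2), residuals(x1, x2))[0, 1]
print(r_partial**2)  # share of the variation in y not explained by x2 that x1 explains
```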
difference in explained variation as we compare two models:
1. complete: all predictors, ŷc = a + b1x1 + b2x2, Rc^2 = RSSc/TSS
2. reduced: without x2, ŷr = a + b1x1, Rr^2 = RSSr/TSS
ΔR^2 = Rc^2 - Rr^2.
for two models differing by 1 parameter: the proportion of variation in y uniquely explained by xi.
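a sketch computing ΔR^2 by fitting the complete and reduced models (made-up data):

```python
import numpy as np

x1 = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 3.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 2.0])
y  = np.array([3.0, 6.0, 6.5, 9.0, 9.5, 5.0])

def r_squared(X, y):
    y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y_hat - y.mean())**2) / np.sum((y - y.mean())**2)  # RSS / TSS

Rc2 = r_squared(np.column_stack([np.ones_like(y), x1, x2]), y)  # complete model
Rr2 = r_squared(np.column_stack([np.ones_like(y), x1]), y)      # reduced model (no x2)
print(Rc2 - Rr2)  # delta R^2: variation in y uniquely explained by x2
```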
1. predicting with x1: SSEr, reduced model.
2. predicting with both x1 and x2: SSEc, complete model.
F = ((SSEr - SSEc)/df1)/(SSEc/df2), with df1 = dfr - dfc and df2 = dfc.
comparing complete and reduced models differing in 1 b, we test H0 (partial effect = 0), because the complete model reflects HA (partial effect is not 0).
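a sketch of this F test for two nested models differing in one b (made-up data):

```python
import numpy as np
from scipy import stats

x1 = np.array([2.0, 4.0, 5.0, 7.0, 9.0, 3.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 2.0])
y  = np.array([3.0, 6.0, 6.5, 9.0, 9.5, 5.0])
n = len(y)

def sse(X, y):
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ coef)**2)

SSEc = sse(np.column_stack([np.ones(n), x1, x2]), y)  # complete model, dfc = n - 3
SSEr = sse(np.column_stack([np.ones(n), x1]), y)      # reduced model,  dfr = n - 2
df1 = (n - 2) - (n - 3)                                # dfr - dfc = number of b's dropped
df2 = n - 3                                            # dfc
F = ((SSEr - SSEc) / df1) / (SSEc / df2)
print(F, stats.f.sf(F, df1, df2))  # H0: partial effect of x2 = 0
```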