The probability of obtaining a test statistic at least as unusual (extreme) as the one obtained from our sample, given that the null hypothesis is true.
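A minimal sketch of this definition, assuming the test statistic follows a standard normal distribution when the null hypothesis is true (a z-test; a t-test would use the t distribution instead). The function name `two_sided_p_value` is hypothetical. It counts both tails because "at least as extreme" means at least as far from 0 in either direction.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for Z ~ N(0, 1), i.e. assuming the null hypothesis is true."""
    tail = 1 - NormalDist().cdf(abs(z))  # area in one tail beyond |z|
    return 2 * tail                      # both tails count as "at least as extreme"

print(round(two_sided_p_value(1.96), 3))  # prints 0.05
```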
Because they might have an undue effect on the regression coefficients.
For a 2 sample analysis, the null hypothesis is that Mu1 - Mu2 = 0,
i.e. Mu1 and Mu2 are equal and there is no difference between them.
The null hypothesis is essentially saying that both groups have the same population mean.
It is used when we are interested in comparing a measurement between 2 populations.
- There are usually 2 box plots, 1 per group.
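The grand mean is the average of all the observations from both groups combined.
When the 2 groups are the same size, this equals the average of the 2 group means, i.e. (Mu1 + Mu2) / 2 = Grand Mean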
1. Group Means Model
yij = yiBar + rij
yij = Observation
yiBar = the typical value for its sample (i.e. the mean for that sample)
rij = residual / error. This is the difference between the observation and typical value for its sample (mean).
2. Effects Model
yij = yBar + Alphai + rij
yij = observation
yBar = grand mean
Alphai = the group effect. It is the difference between the mean of the sample (group i) and the grand mean, i.e. Alphai = yiBar - yBar.
rij = residual / error. This is the difference between the observation and typical value for its sample (mean).
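A small sketch of how the two models relate, using hypothetical data with 2 groups of equal size (so the grand mean of all observations equals the average of the 2 group means). It takes the group effect as group mean minus grand mean and checks that grand mean + group effect + residual rebuilds every observation.

```python
from statistics import mean

# Hypothetical data: two groups of equal size (3 observations each)
groups = {"A": [12.0, 15.0, 14.0], "B": [20.0, 22.0, 19.0]}

all_obs = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_obs)                                    # yBar
group_means = {g: mean(ys) for g, ys in groups.items()}       # yiBar
alpha = {g: m - grand_mean for g, m in group_means.items()}   # Alphai = yiBar - yBar

for g, ys in groups.items():
    for y in ys:
        r = y - group_means[g]               # rij from the group means model
        # Effects model: yij = yBar + Alphai + rij reproduces every observation
        assert abs(grand_mean + alpha[g] + r - y) < 1e-9

print(grand_mean, alpha)
```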
The alternative hypothesis is saying that the mean for group 1 is not the same as the mean for group 2. Hence the group effect is not zero.
1. Independence
- the observations within each sample are independent of each other (same as 1 sample analysis)
- the observations in one sample are also independent of the observations in the other sample.
2. Equal Variance
- Both groups have the same (population) variance, i.e. the average squared distance of an observation from its group mean is the same for both groups.
3. Normality
- The data in each group come from a normal distribution.
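One informal way to check the equal variance assumption is to compare the sample standard deviations. The sketch below uses a common rule of thumb that is an assumption here, not something stated in these notes: if the largest SD is less than twice the smallest, equal variance is plausible. The data and the helper name `sd_ratio` are hypothetical.

```python
from statistics import stdev

def sd_ratio(*samples):
    """Ratio of the largest to the smallest sample standard deviation."""
    sds = [stdev(s) for s in samples]
    return max(sds) / min(sds)

group1 = [12.0, 15.0, 14.0, 13.0]  # hypothetical data
group2 = [20.0, 22.0, 19.0, 23.0]
ratio = sd_ratio(group1, group2)
# Rule of thumb (assumption, not a formal test): ratio below 2 => equal variance plausible
print(ratio, ratio < 2)
```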
A paired sample analysis is an analysis between 2 dependent samples. Whereas the 2 sample analysis is an analysis between 2 independent samples.
- paired samples also have the same number of observations in each sample
e.g. an analysis between a group's blood pressure before taking a drug and the same group's blood pressure after taking it. Each before and after pair is directly dependent, as it is the BP of the same person. This is a paired sample analysis.
e.g. how much do Asian engineers earn compared to English engineers in the USA? The 2 samples have no relation to each other whatsoever, hence are independent of each other. This is a 2 sample analysis.
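A sketch of why the distinction matters, using hypothetical before/after blood pressures for the same 5 people. The paired analysis works on the per-person differences (a 1 sample t statistic on the differences); the 2 sample version, shown only for contrast, wrongly treats the two columns as independent and uses a pooled SD under the equal variance assumption. Pairing removes person-to-person variation, so the paired t statistic here comes out much larger.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical blood pressures for the same 5 people, before and after a drug
before = [150.0, 142.0, 138.0, 160.0, 155.0]
after = [144.0, 139.0, 135.0, 151.0, 150.0]

# Paired: one difference per person, then a 1 sample t statistic on the differences
diffs = [b - a for b, a in zip(before, after)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# 2 sample (treats the groups as independent): pooled SD under equal variance
n1, n2 = len(before), len(after)
sp = sqrt(((n1 - 1) * stdev(before) ** 2 + (n2 - 1) * stdev(after) ** 2) / (n1 + n2 - 2))
t_two = (mean(before) - mean(after)) / (sp * sqrt(1 / n1 + 1 / n2))

print(f"paired t = {t_paired:.2f}, 2 sample t = {t_two:.2f}")
```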
- Intro
- CI explanation
- State sample size
- State what difference is being measured
- Conclusion
- State which group has the higher mean
- State the confidence interval of the difference and which group is higher/lower.
- Intro
- CI
- comment on which mean higher
- comment on CI and state which group is higher/lower
1. Random sample of independent observations
2. Random variable with constant mean and constant variance
1. By looking at the Q-Q plot and seeing if all the points roughly lie on the line.
2. By looking at the histogram and checking whether it roughly follows a bell curve.
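The points on a normal Q-Q plot pair each sorted observation with the matching quantile of a standard normal distribution. A minimal sketch, assuming the plotting position (i - 0.5) / n (one common choice; software may use a slightly different one) and hypothetical data. If the pairs lie roughly on a straight line, normality is plausible.

```python
from statistics import NormalDist

def qq_points(data):
    """Pairs (theoretical normal quantile, sorted observation) for a normal Q-Q plot."""
    n = len(data)
    ordered = sorted(data)
    # (i - 0.5) / n is an assumed plotting position
    theo = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theo, ordered))

pts = qq_points([4.1, 5.3, 3.8, 6.0, 4.9, 5.5, 4.4])  # hypothetical sample
for q, y in pts:
    print(f"{q:6.2f}  {y:5.2f}")
```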
It is the difference between an observed point and the mean of that sample.
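The Central Limit Theorem states that if the sample size is large enough (a common rule of thumb is 30 or more), the sampling distribution of the sample mean is approximately normal, regardless of the shape of the distribution of the data. So with a large sample we do not need to rely on the Q-Q plot/histogram to justify normality of the mean.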
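A small simulation of this idea, under assumptions chosen only for illustration: a strongly right-skewed exponential population, samples of size 30, and 2000 repetitions. The raw data stay skewed, but the sample means are close to symmetric (skewness near 0), which is what the theorem is about.

```python
import random
from statistics import mean, pstdev

def skewness(xs):
    """Average cubed z-score (about 0 for a symmetric distribution)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

random.seed(1)
# Strongly right-skewed population: exponential with mean 1
raw = [random.expovariate(1.0) for _ in range(2000)]
# 2000 sample means, each from a sample of size n = 30
means = [mean(random.expovariate(1.0) for _ in range(30)) for _ in range(2000)]

print(f"skewness of raw data:     {skewness(raw):.2f}")
print(f"skewness of sample means: {skewness(means):.2f}")
```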
1. Null Hypothesis
- this is what we test
- this is what we try to disprove
- but we can never prove it to be true
2. Alternative Hypothesis
- this is the researcher's hypothesis
- we try to support this by finding evidence against the null hypothesis
y= Mu + Epsilon
50= 38 + Epsilon
Epsilon = 50-38 = 12
Residual is 12.
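The p-value is the probability of getting a test statistic at least as extreme as ours if the null hypothesis were true (it is not the probability that the null hypothesis is true). If the p-value is less than 5% (0.05) then we have evidence against the null hypothesis.
e.g. p = 0.01.
This means that if the null hypothesized mean were the true mean, there would only be a 1% chance of getting a result at least as extreme as ours. Hence we have evidence against the null hypothesis and the test is statistically significant (at the 5% level).
Therefore a 95% confidence interval will not contain the null hypothesized mean (the null value sits exactly on the boundary of the 99% interval).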
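The link between the p-value and the confidence interval can be checked directly. A sketch assuming a z-test with a hypothetical estimate, null value and standard error: p is below 0.05 exactly when the 95% interval excludes the null hypothesized mean.

```python
from statistics import NormalDist

xbar, mu0, se = 52.0, 50.0, 0.8   # hypothetical estimate, null value, standard error
z = (xbar - mu0) / se
p = 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided p-value

z_crit = NormalDist().inv_cdf(0.975)            # about 1.96 for a 95% interval
ci = (xbar - z_crit * se, xbar + z_crit * se)

print(f"z = {z:.2f}, p = {p:.4f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
# p < 0.05 exactly when the 95% CI excludes mu0
print(p < 0.05, not (ci[0] <= mu0 <= ci[1]))
```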
A 95% CI is a range of plausible values for the true mean of the whole population: if we repeated the sampling many times, about 95% of the intervals constructed this way would contain the true mean.
No, as we only examined a sample and the interval may or may not contain the true mean. It is impossible to be sure. Under repeated sampling, about 5% of such intervals would not contain the true mean.
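The "repeated sampling" interpretation can be simulated. A sketch under assumed conditions: a hypothetical normal population with known SD (so a z-interval is used), samples of size 25, and 2000 repetitions. Roughly 95% of the intervals should contain the true mean; any single interval either does or does not.

```python
import random
from statistics import NormalDist, mean

random.seed(2)
true_mu, sigma, n, reps = 100.0, 15.0, 25, 2000   # hypothetical population and design
z_crit = NormalDist().inv_cdf(0.975)
half_width = z_crit * sigma / n ** 0.5              # sigma treated as known (z-interval)

covered = 0
for _ in range(reps):
    xbar = mean(random.gauss(true_mu, sigma) for _ in range(n))
    if xbar - half_width <= true_mu <= xbar + half_width:
        covered += 1

coverage = covered / reps
print(f"{coverage:.3f} of intervals contained the true mean")
```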
1. Intro
2. Confidence interval: give the estimated interval for the average (e.g. the 2.5% and 97.5% values from the table)
3. The model explains R^2 % of the variation in the response.
4. Comment on whether this is greater/lower, and whether it supports / does not support the null hypothesis.
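R^2 is 1 minus the ratio of unexplained (residual) variation to total variation. A sketch with hypothetical observations and fitted values (the fitted values come from the least-squares line y = 1.9x for x = 1, 2, 3, 4); the helper name `r_squared` is hypothetical.

```python
from statistics import mean

def r_squared(observed, fitted):
    """Proportion of the variation in the response explained by the model."""
    ybar = mean(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # unexplained
    ss_tot = sum((y - ybar) ** 2 for y in observed)                # total
    return 1 - ss_res / ss_tot

# Hypothetical observations and fitted values from the line y = 1.9x
y = [2.0, 4.0, 5.0, 8.0]
fitted = [1.9, 3.8, 5.7, 7.6]
print(f"R^2 = {100 * r_squared(y, fitted):.1f}%")
```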
The square root of the variance.
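A quick check with a hypothetical sample that the standard deviation is the square root of the (n - 1) sample variance:

```python
from math import isclose, sqrt
from statistics import stdev, variance

data = [4.0, 7.0, 9.0, 10.0, 15.0]    # hypothetical sample
var = variance(data)                  # sample variance (divides by n - 1)
sd = sqrt(var)                        # standard deviation = variance square rooted
assert isclose(sd, stdev(data))       # matches the library's standard deviation
print(var, sd)
```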
No evidence against Ho
Evidence against Ho
H0: Mu1 = Mu2 = ... = Mun
- difference between means
estimate the difference between the means using the Tukey-adjusted interval values
In the F-test for ANOVA, we compare the between-group variation with the within-group variation to assess whether there is a difference in the population means.
Thus, by comparing these two measures of variation (spread) with each other, we are able to detect if there are true differences among underlying group population means.
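A sketch of the F statistic computed by hand for a one-way ANOVA (the function name `anova_f` and the data are hypothetical). A large F means the group means are spread out a lot compared with the scatter inside the groups. Turning F into a p-value needs the F distribution with (k - 1, n - k) degrees of freedom, e.g. `scipy.stats.f.sf`, which is not done here.

```python
from statistics import mean

def anova_f(*groups):
    """One-way ANOVA F = (between-group mean square) / (within-group mean square)."""
    all_obs = [y for g in groups for y in g]
    grand = mean(all_obs)
    k, n = len(groups), len(all_obs)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_between = ss_between / (k - 1)   # df = k - 1
    ms_within = ss_within / (n - k)     # df = n - k
    return ms_between / ms_within

print(anova_f([1, 2, 3], [4, 5, 6]))
```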
- is the mean of the quantitative measurement the same between all levels of the categorical variable, or are they different?
- what are the differences in measurement between the levels, if any?
- also uses "dummy" variables
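A categorical variable with k levels is entered into the model as k - 1 dummy (0/1) variables, one per non-baseline level; the baseline level is absorbed into the intercept. A minimal sketch with a hypothetical helper `dummy_encode` and made-up levels:

```python
def dummy_encode(levels, baseline):
    """0/1 indicator columns for every level except the baseline."""
    others = sorted(set(levels) - {baseline})
    return others, [[1 if lev == other else 0 for other in others] for lev in levels]

cols, rows = dummy_encode(["A", "B", "C", "A", "C"], baseline="A")
print(cols)   # ['B', 'C']
for row in rows:
    print(row)
```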
- overly right skewed
- residuals "fan" out
- compare estimates
- if larger than 1 standard error then leave out
= better fits data
- the p-value in an ANOVA test tells us how likely it is to see differences between the group means at least as large as those observed if the population means were all equal (i.e. purely due to random variation).
e.g. the lower the p-value for a given F-ratio, the stronger the evidence against the null hypothesis that a particular source (model term or parameter) has no effect.
- something_i = the y value
- B1 × (x value)
- Always add epsilon_i, where epsilon_i = ...
- might need to state the baseline & dummy variables as well
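A sketch of where the pieces of the written model y_i = B0 + B1 × x_i + epsilon_i come from for a simple linear regression, using hypothetical data: least-squares estimates of B0 and B1, and residuals as the estimates of epsilon_i.

```python
from statistics import mean

# Hypothetical data for the model  y_i = B0 + B1 * x_i + epsilon_i
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

xbar, ybar = mean(x), mean(y)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar                                        # least-squares intercept
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]    # estimates of epsilon_i

print(f"y_i = {b0:.2f} + {b1:.2f} * x_i + epsilon_i")
```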
- non-constant scatter
- "fan" effect
-> can log-transform the data and refit the model from there
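A sketch of the effect of a log transform on right-skewed, strictly positive data (hypothetical values). The skewness measure used here (average cubed z-score) is an assumption for illustration; in practice the residual plots would be checked again after refitting.

```python
from math import log
from statistics import mean, pstdev

def skewness(xs):
    """Average cubed z-score (about 0 for a symmetric distribution)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

y = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]     # hypothetical right-skewed, positive response
log_y = [log(v) for v in y]                  # natural log; requires all values > 0
print(f"skewness before: {skewness(y):.2f}, after log: {skewness(log_y):.2f}")
```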
- If estimating an "average", use a confidence interval ("CI")
- If predicting an individual point, use a "Prediction" interval
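A sketch of why a prediction interval is wider than a CI, shown for the simplest case of a single sample with no predictor (an assumption; in regression the same idea applies at a given x). It uses normal critical values for simplicity; with small samples a t critical value on n - 1 df would be more appropriate. The function name `intervals` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def intervals(sample, level=0.95):
    """Return (CI for the mean, prediction interval for one new value)."""
    n, xbar, s = len(sample), mean(sample), stdev(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # normal approximation to t
    ci_half = z * s / sqrt(n)                   # uncertainty in the estimated average only
    pi_half = z * s * sqrt(1 + 1 / n)           # plus the scatter of an individual value
    return (xbar - ci_half, xbar + ci_half), (xbar - pi_half, xbar + pi_half)

ci, pi = intervals([4.0, 7.0, 9.0, 10.0, 15.0])
print("CI for the average:", ci)
print("Prediction interval:", pi)
```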