Stat 151

what is a census?

special sample that includes everyone in the entire population

what are the problems with a census (3)

too expensive, undercoverage, time consuming

what is a population parameter, how can they get it? what do they use instead to estimate the population parameter?

a numerical summary of a population; it can be obtained with a census, but sample statistics are usually used instead to estimate it

there are two inferences made from sample statistics how do they differ?

population inference: results from the sample can be generalized to the entire population (as an estimate)

causal inference: the difference in the responses is caused by the difference in treatments when comparing the results from two treatment groups

when should we make population inferences?

only when we have random sampling

what does randomizing help with

eliminates the effect of unknown extraneous factors; makes sure that, on average, the sample looks like the rest of the population

non random sampling leads to

biased results

what are the 4 random sampling methods

1. simple random samples
2. stratified random sampling
3. systematic random sampling
4. cluster random sampling

how does a simple random sample work

each sample of size n in the pop has the same chance of being selected
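A minimal Python sketch of drawing a simple random sample, assuming the population is available as a list (the student-ID population here is hypothetical). `random.Random.sample` draws without replacement, so every subset of size n is equally likely; rerunning with a different seed gives a different sample, which is sampling variability in action.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n without replacement.

    Every subset of n units is equally likely to be selected.
    """
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return rng.sample(population, n)

# hypothetical population: 20 student ID numbers
population = list(range(1, 21))
sample = simple_random_sample(population, 5, seed=1)
print(sample)
```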

what do we call sample-to-sample differences?

sampling variability

how does stratified random sampling work? what does it reduce?

the population is divided into homogeneous groups called strata, then an SRS is taken within each stratum before the results are combined. can reduce bias and variability of results
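The stratified procedure above can be sketched as follows, assuming the strata are already known and passed in as a dict. The faculty strata and the per-stratum sample sizes are hypothetical; the key step is that a separate SRS is drawn inside each stratum and the pieces are then combined.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of the requested size within each stratum, then combine."""
    rng = random.Random(seed)
    combined = []
    for name, members in strata.items():
        # independent SRS inside this stratum
        combined.extend(rng.sample(members, n_per_stratum[name]))
    return combined

# hypothetical strata: students grouped by faculty
strata = {
    "arts": [f"A{i}" for i in range(30)],
    "science": [f"S{i}" for i in range(50)],
}
sample = stratified_sample(strata, {"arts": 3, "science": 5}, seed=2)
print(sample)
```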

how does systematic random sampling work

start with a randomly chosen individual, then sample every kth person; make sure the order of the list is not associated with the responses sought
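A short sketch of systematic sampling, assuming the sampling frame is an ordered list and that the random start is chosen among the first k units (the 100-unit frame and k = 10 are hypothetical). Python slicing `frame[start::k]` then picks every kth unit after the start.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Random start among the first k units, then every kth unit after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start in 0..k-1
    return frame[start::k]

frame = list(range(100))  # hypothetical ordered list of 100 units
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```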

how does cluster random sampling work

split the population into clusters (similar groups), then select 1 or a few clusters at random and perform a census of each selected cluster
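A minimal sketch of cluster sampling, assuming the clusters are given as a dict of lists; the lecture-section clusters are hypothetical. Unlike stratified sampling, the randomness is in which whole clusters get picked, and then every unit in a chosen cluster is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then take a census of every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # pick cluster names at random
    return {name: list(clusters[name]) for name in chosen}

# hypothetical clusters: 8 lecture sections of 25 students each
clusters = {f"section{j}": [f"sec{j}-student{i}" for i in range(25)] for j in range(8)}
sample = cluster_sample(clusters, 2, seed=4)
print(list(sample))
```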

bias means

the tendency for a sample to differ from the corresponding population in some systematic way

what are the 4 sources of bias?

1) selection bias (undercoverage)
2) response bias
3) voluntary response bias
4) nonresponse bias

selection bias (undercoverage) is when

some portion of the population is not sampled at all or has a smaller representation in the sample than it has in the population (usually these people differ from the rest of the population)

response bias means

anything in the survey design that influences the responses (ex. being asked about illegal or unpopular behavior)

voluntary response bias

when individuals can choose on their own whether to participate in the sample

nonresponse bias

when a large proportion of those sampled fail to respond

when should we make causal (cause and effect) inference

we should only make causal inf when we have random allocation; without random allocation, the difference in responses could be caused by lurking variables

what are lurking variables

variables that are related both to group membership and to the responses

what are two types of study designs

1) observational studies
2) randomized experiments

what is an observational study, and what are the two types?

the investigator observes individuals and measures variables of interest but does NOT attempt to influence the response; good for identifying trends and possible relationships

1) retrospective
2) prospective

retrospective - observational study

identify subjects and collect data on events that have already happened (past)

prospective- observational study

identify subjects in advance and collect data as events unfold over the next period of time (future)

a randomized comparative experiment

allows us to establish a cause-and-effect relationship by doing the following:

a) manipulates factor levels to create treatments
b) randomly assigns subjects to these treatment levels
c) compares the responses of the subject groups across treatment levels

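can we make causal inf from observational studies?

no; observational studies have no random allocation, so differences in the responses could be caused by lurking variables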

random allocation allows for what inf

causal inf are allowed

random selection of individuals allows for what inferences

population inf are allowed

categorical variables get what type of graph

bar chart and pie chart

numerical variables get what type of graphs

dot plots, stem plots, histograms, time plots, box plots, scatterplots

after categorical data has been sampled, it should be summarized to provide what information?

what values have been observed? how often did every value occur?

the distribution of a categorical variable is given in the form of a table providing what information?

each possible category, frequency of individuals who fall into each category or relative frequency of individuals who fall into each category

how do frequency and relative frequency differ

frequency: the count of individuals in a category; relative frequency: the proportion (or percentage) of individuals in a category

how to calculate relative frequency

frequency/ number of observations
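A small sketch of building a frequency and relative-frequency table with `collections.Counter`; the survey responses are hypothetical. Each relative frequency is frequency / n, so the relative frequencies add up to 1 (100%), apart from rounding when they are displayed as percentages.

```python
from collections import Counter

def frequency_table(data):
    """Return {category: (frequency, relative frequency)} for categorical data."""
    n = len(data)
    counts = Counter(data)  # frequency of each category
    return {cat: (freq, freq / n) for cat, freq in counts.items()}

# hypothetical responses to "which faculty are you in?"
responses = ["arts", "science", "science", "business",
             "science", "arts", "science", "business"]
table = frequency_table(responses)
for cat, (freq, rel) in table.items():
    print(f"{cat:10s} {freq:3d} {rel:6.1%}")
```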

what should a relative frequency table add up to?

100%

what will the bar chart effectively show

the frequencies or percentages in different categories

what will the pie chart effectively show

the relationship between parts and the whole

the shape of a frequency bar chart and a relative freq bar chart will be the same, true or false?

true

how to calculate slice size for a pie chart

category relative freq X 360 degrees
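The slice-size rule above as a one-line Python sketch; the relative frequencies are hypothetical. Because the relative frequencies sum to 1, the slice angles sum to 360 degrees.

```python
def pie_slice_angles(rel_freqs):
    """Slice angle in degrees = category relative frequency x 360."""
    return {cat: rel * 360 for cat, rel in rel_freqs.items()}

angles = pie_slice_angles({"arts": 0.25, "science": 0.5, "business": 0.25})
print(angles)  # {'arts': 90.0, 'science': 180.0, 'business': 90.0}
```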

how can we explore the relationship between TWO categorical variables?

contingency table

what is a margin in a contingency table

the margins of the table give totals and the frequency distributions for each of the variables

each frequency distribution in a contingency table is called what

a marginal distribution of its respective variable

how to calculate a marginal distribution

each row (or column) total for the variable / grand total of the sample
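A sketch of computing both marginal distributions of a two-way (contingency) table stored as a nested dict. The faculty by enrolment-status counts are hypothetical; each margin total is divided by the grand total.

```python
# hypothetical counts: rows = faculty, columns = enrolment status
table = {
    "arts":    {"full-time": 20, "part-time": 20},
    "science": {"full-time": 48, "part-time": 12},
}

def marginal_distributions(table):
    """Row and column marginal distributions: each margin total / grand total."""
    row_totals = {r: sum(cols.values()) for r, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for c, count in cols.items():
            col_totals[c] = col_totals.get(c, 0) + count
    grand = sum(row_totals.values())
    row_marginal = {r: t / grand for r, t in row_totals.items()}
    col_marginal = {c: t / grand for c, t in col_totals.items()}
    return row_marginal, col_marginal

row_m, col_m = marginal_distributions(table)
print(row_m)  # faculty: arts 0.4, science 0.6
print(col_m)  # status: full-time 0.68, part-time 0.32
```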

what is a conditional distribution

shows the distribution of one variable for just the observations that satisfy a condition on another variable. for example, instead of looking at the full-time/part-time split for arts and science students combined, you look at the split for arts students alone and for science students alone

what does a conditional distribution tell us

whether two variables are associated (dependent): if the conditional distributions of one variable differ across the categories of the other, the variables are dependent

dependent variables are

if the conditional distribution of one variable is not the same for each category of another, there is an association between these variables

independent variables are

if the conditional distribution of one variable is the same for each category of another, there is no association between these variables
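A sketch of conditional distributions and the dependence check described above, using hypothetical faculty by enrolment-status counts. The `associated` helper is a hypothetical illustration: it flags dependence when the row-conditional distributions differ, with `tol` only absorbing floating-point error. With real sample data, conditional distributions are almost never exactly equal, so in practice you judge whether they are approximately the same.

```python
def conditional_distributions(table):
    """Distribution of the column variable within each row category."""
    result = {}
    for r, cols in table.items():
        row_total = sum(cols.values())
        result[r] = {c: count / row_total for c, count in cols.items()}
    return result

def associated(table, tol=1e-9):
    """True if the conditional distributions differ between row categories."""
    conds = list(conditional_distributions(table).values())
    first = conds[0]
    return any(abs(d[c] - first[c]) > tol for d in conds[1:] for c in first)

# hypothetical counts: arts splits 50/50, science splits 80/20 -> dependent
table = {
    "arts":    {"full-time": 20, "part-time": 20},
    "science": {"full-time": 48, "part-time": 12},
}
print(conditional_distributions(table))
print(associated(table))  # True
```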

n is ?

sample size, number of observations of the variable y

variable y is

the variable of interest, what we have sample data of

y1 vs y2

y1 is the first sample observation of the variable y, whereas y2 is the second sample observation of the variable y

the three most common measures for centre are

the mean, the median and the mode

y bar is what

sample mean

how to calculate sample mean

sum of all observations/ number of observations
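The sample mean formula as a short Python sketch; both data lists are hypothetical. Replacing one value with an outlier pulls ȳ far from the rest of the data, which is why the mean is not resistant.

```python
def sample_mean(ys):
    """y-bar = (y1 + y2 + ... + yn) / n"""
    return sum(ys) / len(ys)

ys = [4, 8, 6, 5, 7]
ys_outlier = [4, 8, 6, 5, 70]  # same data with the last value replaced by an outlier
print(sample_mean(ys))          # 6.0
print(sample_mean(ys_outlier))  # 18.6
```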

when there are outliers, is the mean a good measure of centre?

no because it is not resistant to outliers

what is the alternative measure of centre when there are outliers?

median

what is the median (M)

the value that divides the ordered sample into two sets of the same size, one half below M and the other half above M

how to find the median, if n is odd? if n is even?

order the data from smallest to largest. if n is odd, use the single middle value; if n is even, use the average of the middle 2 values
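The median rule above written out in Python (the small data lists are hypothetical): sort, then take the middle value for odd n or the average of the two middle values for even n.

```python
def median(ys):
    """Order the data; odd n -> middle value, even n -> average of the two middle values."""
    s = sorted(ys)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```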

what is mode

an alternative measure for the centre, it is the value that occurs with the highest frequency in a data set

pro and con of mode

pro- easy to locate, con- data may have none or more than 1 mode whereas it will only have one mean and one median
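A sketch that finds the mode(s) with `collections.Counter`, returning every value tied for the highest frequency since a data set can have more than one mode. The convention that "no value repeats" means no mode is an assumption made here for illustration; the data lists are hypothetical.

```python
from collections import Counter

def modes(ys):
    """Return every value tied for the highest frequency."""
    counts = Counter(ys)
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []  # assumption: if no value repeats, report no mode
    return [v for v, c in counts.items() if c == top]

print(modes([2, 3, 3, 5]))     # [3]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
print(modes([1, 2, 3]))        # []
```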

on a graph the mode is the value where the distribution is the

tallest

on a graph the median is the point where the distribution is

cut into two parts of the same area

on a graph the mean is

the balance point of the distribution

in a symmetric distribution, how do the measures of centre compare to each other?

mean= median= mode

in a positively (right) skewed distribution, how do the measures of centre compare to each other?

mean > median > mode

in a negatively (left) skewed distribution, how do the measures of centre compare to each other?

mean < median < mode

what is range

measure of spread. max- min

what are the 3 measures of spread

range, SD, IQR

what is always true about the sum of the deviations from the mean?

the sum of deviations, Σ(yi - ȳ), is always equal to 0
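A sketch covering the range, the deviations from the mean, and the standard deviation for a hypothetical data list. The SD uses the usual sample formula with n - 1 in the denominator, which is assumed here because these notes do not spell it out. The deviations sum to 0 (in general only up to floating-point rounding).

```python
import math

def deviations(ys):
    """Deviation of each observation from the sample mean: y_i - y_bar."""
    ybar = sum(ys) / len(ys)
    return [y - ybar for y in ys]

def sample_sd(ys):
    """s = sqrt( sum (y_i - y_bar)^2 / (n - 1) )"""
    devs = deviations(ys)
    return math.sqrt(sum(d * d for d in devs) / (len(ys) - 1))

ys = [4, 8, 6, 5, 7]
data_range = max(ys) - min(ys)  # range = max - min
print(data_range)               # 4
print(sum(deviations(ys)))      # 0.0
print(sample_sd(ys))            # sqrt(10 / 4), about 1.58
```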

how to calculate IQR

Q3 (75th percentile) - Q1 (25th percentile)

what are upper and lower fences

they are used to determine outliers.

upper fence = Q3 + 1.5 × IQR; any observation above it is an outlier

lower fence = Q1 - 1.5 × IQR; any observation below it is an outlier
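A sketch of the IQR and the 1.5 × IQR fences. Quartile conventions vary between textbooks and software; this sketch assumes Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the overall median when n is odd. The data list is hypothetical, and at least a few observations are needed for the halves to be non-empty.

```python
def median(s):
    """Median of an already-sorted list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(ys):
    """Assumed convention: Q1, Q3 = medians of the lower and upper halves."""
    s = sorted(ys)
    n = len(s)
    lower = s[: n // 2]         # values below the overall median
    upper = s[(n + 1) // 2 :]   # values above the overall median
    return median(lower), median(upper)

def outlier_fences(ys):
    q1, q3 = quartiles(ys)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
lo, hi = outlier_fences(data)
outliers = [y for y in data if y < lo or y > hi]
print(quartiles(data), (lo, hi), outliers)  # (4.0, 8.5) (-2.75, 15.25) [30]
```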

symmetric distribution

median line in center of box and whiskers of equal length

skewed right

median line left of center and long right whisker

skewed left

median line right of center and long left whisker
