getting to know a situation through and through in order to be able to decide

getting to know an individual's psychological functioning

testing a test's validity and reliability; the procedure should ideally be repeatable.

-confirmation bias: interpreting new evidence as confirmation of one's existing beliefs or theories

-availability heuristic: tendency to check for symptoms related to disorders with a high prevalence, based on assumptions made beforehand.

is a standardized procedure for sampling behavior and describing it with categories or scores.

-standardized procedures

-behavior sample

-scores/categories

-norm or standard

- prediction of nontest behaviors

problem analysis, classification and diagnosis, treatment planning, program/treatment evaluation, self-knowledge, and scientific research

intelligence, aptitude, achievement, creativity, personality, interest inventory, behavioral procedures, and neuropsychological/cognitive

every test score contains some degree of measurement error, expressed by the formula: X = T + e

X observed score

T true score

e positive or negative error

to keep error probability low, through: repeatability, integrality, use of scores or categories, interpretation of scores, prediction of nontest behavior

appraising or estimating the magnitude of one or more attributes in a person

COTAN psychometric criteria

NIP guideline for test use

Committee on Tests and Testing in the Netherlands; informs users about the quality of tests and helps check for systematic errors in instruments.

X=T+e

determines the correlation between test scores after repeated assessment.

T = consistent part (true score)

e = inconsistent part (measurement error)

information about the relationship between test scores after repeated assessment or between items within the test.

behavioral differences

measurement errors: item selection, test administration, test scoring

1. systematic errors: affect validity, because they indicate how close scores are to the actual behavior; consistently positive or negative.

2. unsystematic errors: affect reliability; they influence the constancy of scores and can be positive or negative.

errors are + and -, with an average of 0; they are not related to T, not related to each other, and are normally distributed.

the measure of consistency over multiple assessments.

r = σT² / σX² (true-score variance divided by observed-score variance)

the closer σT² is to σX², the closer r is to 1.
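the X = T + e decomposition and the variance-ratio definition of reliability can be sketched with a small simulation (all numbers below are hypothetical choices for illustration):

```python
import random
import statistics

random.seed(42)

# Classical test theory: observed score X = true score T + error e.
# Errors have mean 0 and are independent of T.
true_scores = [random.gauss(100, 15) for _ in range(10_000)]
errors = [random.gauss(0, 5) for _ in range(10_000)]
observed = [t + e for t, e in zip(true_scores, errors)]

var_T = statistics.pvariance(true_scores)
var_X = statistics.pvariance(observed)

# Reliability r = true-score variance / observed-score variance.
r = var_T / var_X
print(round(r, 2))  # expected near 15**2 / (15**2 + 5**2) = 0.9
```

with a larger error SD the ratio drops, mirroring "the closer σT² is to σX², the closer r is to 1."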

test-retest

alternate forms

split half & Spearman-Brown

coefficient alpha

KR-20

inter-scorer/rater

assess the relation between the scores of a group on a test over repeated assessments. assumptions:

true scores are correlated; measurement errors are not correlated with each other or with the true scores.

estimates: random fluctuation within individuals and due to the environment.

assess the relationship between the scores of a group on two alternate assessments: two versions of the same test. estimation of: random fluctuation (within individuals, due to the environment, and due to the sample of items).

assess the relation between the scores of a group on two test halves: split the test in two halves, then calculate the correlation between them.

rSB = 2rhh / (1 + rhh)

rSB = estimated reliability of the complete test using Spearman-Brown

rhh = correlation between the two test halves

estimation of: random fluctuation due to sample of items.
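the Spearman-Brown correction above can be sketched as a one-line function (the half-test correlation 0.6 is a made-up example value):

```python
# Spearman-Brown: estimate the reliability of the complete test
# from the correlation between its two halves.
def spearman_brown(r_hh: float) -> float:
    return 2 * r_hh / (1 + r_hh)

print(spearman_brown(0.6))  # 1.2 / 1.6 = 0.75
```

note that the corrected value is always higher than the half-test correlation, since a longer test is more reliable.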

assess the relation between scores on all possible split halves, with Spearman brown correction.

rα = (N/(N-1)) (1 - Σσj²/σ²)

α = reliability in terms of coefficient alpha

N = number of items in the test

σj^2= variance of item

Σσj^2 = sum of variances all items

σ^2= variance of the total test score

α increases when you have more items and when the total test-score variance increases relative to the sum of the item variances.

for questions answered only with yes or no (dichotomous items); measures internal consistency. when questions are very alike -> overestimation of reliability.
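the coefficient alpha formula can be sketched directly from the definitions above; for the dichotomous (0/1) responses used here it coincides with KR-20 (the response data are hypothetical):

```python
import statistics

def cronbach_alpha(items):
    """Coefficient alpha: rα = (N/(N-1)) (1 - Σσj²/σ²).

    items: list of lists, items[j][i] = score of person i on item j.
    """
    n = len(items)  # N = number of items in the test
    item_vars = [statistics.pvariance(col) for col in items]   # σj²
    totals = [sum(scores) for scores in zip(*items)]           # total per person
    total_var = statistics.pvariance(totals)                   # σ²
    return (n / (n - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical dichotomous responses: 4 items, 6 persons.
items = [
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0],
]
print(round(cronbach_alpha(items), 3))  # ≈ 0.833
```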

assess relation between scores of different examiners. when using more subjective methods or diagnostic interviews.

3 assumptions:

1. variance of measurement error on a test is the same for all individuals taking the test

2. the measurement error is normally distributed

3. is a measure of the expected deviation of X relative to T.

SEM = SDe = standard deviation of the measurement error

for a perfectly reliable test R is 1 and SEM is 0.
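the standard error of measurement follows from the test SD and the reliability as SEM = SD √(1 - r); a minimal sketch, with hypothetical SD and reliability values:

```python
import math

# SEM = SD * sqrt(1 - r): expected deviation of X relative to T.
def sem(sd: float, reliability: float) -> float:
    return sd * math.sqrt(1 - reliability)

print(sem(15, 0.91))  # 15 * sqrt(0.09) = 4.5
print(sem(15, 1.0))   # perfectly reliable test: SEM = 0.0
```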

depends on:

-context: what you'll do with the scores

-level of the decision: individual decisions need higher reliability, group-level decisions less.

-- Important decisions:

Good: ≥ .90

Sufficient: .80 - .90

Insufficient: < .80

- Less important decisions:

Good: ≥ .80

Sufficient: .70 - .80

Insufficient: < .70

- Group level:

Good: ≥ .70

Sufficient: .60 - .70

Insufficient: < .60

the more important the decision, the stricter the reliability requirement.

item response theory: the measurement error depends on the test score. it is more about test construction: whether the relevant items are included or not.

68% CI: X ± 1 SDe

90% CI: X ± 1.65 SDe

95% CI: X ± 1.96 SDe

99.7% CI: X ± 3 SDe
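the confidence intervals around an observed score can be sketched as follows (the observed score and SEM are hypothetical example values):

```python
# CI around observed score X: X ± z * SEM, with the usual
# normal-distribution z-multipliers.
def confidence_interval(x: float, sem: float, z: float):
    return (x - z * sem, x + z * sem)

x, sem_val = 100, 4.5
for level, z in [("68%", 1.0), ("90%", 1.65), ("95%", 1.96), ("99.7%", 3.0)]:
    lo, hi = confidence_interval(x, sem_val, z)
    print(f"{level} CI: {lo:.1f} - {hi:.1f}")
```

a wider interval corresponds to a higher confidence level, and a less reliable test (larger SEM) widens every interval.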

= summary of distribution of characteristics in a representative sample.

two types:

1. relative: classify on a continuum set of data. compare the single to the norm group.

2. absolute: determine if a specific requirement is reached.

percentiles: represent cumulative frequency distributions divided into groups of the same size. a percentile is a measure of ranking: PX means that X% of the norm group scores lower.

P50 =median

P2 ≈ -2 SD (scores about 2 SD below the mean)

P16 ≈ -1 SD

P84 ≈ +1 SD

P98 ≈ +2 SD

also referred to as z-scores: M = 0 and SD = 1. they are linear transformations of the raw scores, so they keep the same distribution; they can be positive and negative.

t-scores (M = 50, SD = 10), to avoid negative/decimal scores. a normalized score that tells you about your score relative to the norm group.

in order to compare norm scores you need the participants' scores; transform them into percentiles, then into the equivalent z-scores (= normalized standard scores). if the data are skewed, scores are harder to compare. this is a non-linear transformation; don't assume the raw scores are always normally distributed.
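the normalization procedure above (raw scores -> percentile ranks -> equivalent z-scores) can be sketched with the stdlib normal distribution; the raw scores and the midpoint convention for ties are illustrative assumptions:

```python
from statistics import NormalDist

# Hypothetical raw scores from a skewed norm group.
raw = [3, 7, 7, 10, 12, 15, 18, 21, 25, 40]

def percentile_rank(scores, x):
    # Midpoint convention: count half of the ties at x.
    below = sum(s < x for s in scores)
    equal = sum(s == x for s in scores)
    return (below + 0.5 * equal) / len(scores)

# Map each percentile to the z-score with that cumulative probability:
# this is the non-linear normalization step.
normal = NormalDist()
normalized_z = [normal.inv_cdf(percentile_rank(raw, x)) for x in raw]
print([round(z, 2) for z in normalized_z])
```

unlike a linear z-transformation, equal raw-score gaps no longer map to equal z-gaps; the output is forced toward a normal shape.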

other types of standard scores, e.g. stanines: M = 5 and SD = 2

summative assessment: did you learn enough

formative assessment: where you at, what you need to improve.

sex, age, class, educational level, ethnicity, regional variance, city/rural areas. 3 ways to build one: random sampling, stratified random sampling, arbitrary sampling. sample size: N > 400 good, N < 300 insufficient. plus durability: norms should be up to date.

to give them meaning: to determine when behavior is no longer typical or normal, and to find the reason for this.

it is a criterion, a number, that can be qualified as weak, acceptable, or good. it reflects the individual's functioning in the behavior you want to assess.

content validity

criterion validity

construct validity

the extent to which the contents of the test are representative of the behavior/construct you are assessing. to check it, ask two experts and divide the number of items they both consider relevant by the total number of items: a high ratio means high content validity.

the correlation between test scores and the behavior you want to predict. 2 types: concurrent (predicting something assessed at the same time) and predictive (predicting something that will be assessed in the future). acceptable level of validity: decision theory (hits, false positives, false negatives).

hits, false positives, false negatives

sensitivity, specificity, test prevalence, actual prevalence.

the extent to which the test score is a good reflection of the construct you want to assess.

a variable that cannot be measured directly but is based on theoretical assumptions.

research on the relation between the test scores and scores on other variables.

1- convergent: same construct, different tests

2- discriminant: different construct, different tests

statistical analysis of the components of a test: exploratory and confirmatory factor analysis.

theory-consistent group differences and theory-consistent intervention effects. multimethod and multitrait.

low validity leads to less useful test results.

-not reliable

-theory is incorrect

-test does not measure the right thing but something else

-to assess construct validity, both convergent and discriminant

-reliability> convergent validity> discriminant validity

- a>c>b>d

-if discriminant validity is higher than convergent validity, your test is not good.

1. define test purpose

2. choose scaling method

3. item construction and analysis

4. revision

only with a defined and specific purpose can you determine the psychometric properties (validity, reliability, ...). the goal also determines which scale you'll use.

scale = a collection of items to which the responses are scored and combined into a scale score, which tells you about the behavior of that person.

-unidimensional scaling method: opinions on a topic, individual differences, making predictions

-expert ranking (Glasgow Coma Scale)

-Thurstone scale

-Absolute scale (categorizes items based on the absolute deviation from one reference group)

- Likert scale

-Guttman scale

-Empirical scale

to optimize item quality; it is applicable to all tests and uses an item-characteristic curve.

you define 4 properties:

-difficulty index

-discrimination index

-reliability index

-validity index

pi represents the proportion of participants who correctly answered an item (i). 0 < pi < 1; the best values are 0.3 < pi < 0.7.

pi = (1 + g) / 2

g=chance success level
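the difficulty index and the guessing-adjusted optimum can be sketched as follows (the 12-of-20 example item is hypothetical):

```python
# Difficulty index p_i: proportion of correct answers on an item.
def difficulty(responses):
    return sum(responses) / len(responses)

# Optimal difficulty shifts upward when guessing is possible:
# p_opt = (1 + g) / 2, with g the chance success level.
def optimal_difficulty(g: float) -> float:
    return (1 + g) / 2

# Hypothetical item answered correctly by 12 of 20 test takers.
p_i = difficulty([1] * 12 + [0] * 8)
print(p_i)                       # 0.6
print(optimal_difficulty(0.25))  # 4-option multiple choice: 0.625
```

with no guessing (g = 0) the optimum is 0.5, in the middle of the recommended 0.3 - 0.7 band.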

-item characteristic curve

describes the relation between the value of the characteristic and the likelihood of a correct answer.

how efficiently does an item discriminate (di) between persons who obtain high and low scores on the entire test. in the graph, the most discriminating item has the steepest line.

di = (Uc - Lc) / 100

Uc: % who answered correctly in the upper range

Lc: % who answered correctly in the lower range

di>0: is a discriminatory item

di<0: negative discriminatory item

di=0: not a discriminatory item

ideally: 0.3<di<0.6, so positive.
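the discrimination index can be sketched from the formula above (the upper/lower group percentages are hypothetical example values):

```python
# d_i = (Uc - Lc) / 100: Uc and Lc are the percentages answering the
# item correctly in the upper and lower total-score groups.
def discrimination(uc_percent: float, lc_percent: float) -> float:
    return (uc_percent - lc_percent) / 100

d_i = discrimination(80, 35)
print(d_i)  # 0.45, inside the ideal 0.3 < d_i < 0.6 range
```

a negative d_i (lower group outperforms the upper group) flags a flawed or miskeyed item.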

which items lower the reliability of the test.

SDi x riT

riT= correlation between item score and the rest of the test

SDi= standard deviation of item score

higher=better-> more variance, stronger relation between item and test

which items cause low validity of the test.

SDi x riC

riC= correlation between item score and criterion

SDi= standard deviation of item scores

higher=better-> more variance, stronger relation between item and criterion

re-assess the quality of the items in a new try-out sample of the test's target population

it is what the intelligence test assesses (an operational definition). BUT IQ tests are not for defining intelligence, rather for measuring it.

it reflects the global capacity of the individual to act purposefully, to think rationally, and to deal effectively with the environment.

to map mental skills, predict academic outcomes, explain why some people have difficulties in certain areas, and assess the relationship between disorders and intelligence.

Galton-intelligence from senses

Spearman-global capacity G and specific factors s

Thurstone- 7 individual factors

Luria-simultaneous and successive processing

Guilford-creative thinking

Cattell-Horn-Carroll

Gardner

Sternberg

intelligence has 3 levels:

level3- overall capacity G

level2- broad cognitive capabilities (fluid vs crystallized)

level1- narrow cognitive abilities

critiques the g factor, basing his theory on evidence from brain studies and from savants: people with intellectual deficits but one highly developed talent.

critique of g. 3 levels:

analytical

experiential (creative)

contextual

is a statistical method to investigate how many relatively independent constructs the test consists of.

2 types:

1. exploratory factor analysis (for developing theories)

2. confirmatory factor analysis (to test theories)

1-factor and 5-factor solutions, between which there should not be a strong difference.

verbal comprehension index

ability to access and apply word knowledge

similarities and vocabulary

visual spatial index

spatial relationships, speed is important.

block design and visual puzzle

fluid reasoning index

fluid intelligence, use reasoning to identify and apply rules.

matrix reasoning and figure weights

working memory index

ability to register, maintain and manipulate information

digit span and picture span

processing speed index

how quickly you identify something

coding and symbol search

full scale IQ M=100 SD=15

individual subtests M=10 SD=3
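converting a person's relative standing (z-score) into the two Wechsler metrics above is a linear rescaling; a minimal sketch:

```python
# Rescale a z-score to a standard-score metric: score = M + z * SD.
# Full-scale IQ uses M = 100, SD = 15; subtests use M = 10, SD = 3.
def to_standard(z: float, mean: float, sd: float) -> float:
    return mean + z * sd

z = 1.0  # one SD above the norm-group mean
print(to_standard(z, 100, 15))  # 115.0 full-scale IQ
print(to_standard(z, 10, 3))    # 13.0 subtest scaled score
```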

- arbitrary cutoffs

- specific clinical interpretation

reliability- coefficient α and test-retest reliability

validity- factor analysis, convergent, discriminant, predictive validity.

1. difference scores

2. SEdiff

3. 95% CI around difference score.

detecting relations between figures by means of perceptual similarity or analogy.