transparency, verification and collaboration
research integrity: don't waste resources
• standards change over time, so people have to relearn and keep up to date
• mentors/leadership to pass on the rules
• reflection now to deal with dilemmas later
• make values explicit
responsible scholarship
conducting work with integrity while meeting the need for better quality and efficiency in science.
• safeguard quality
• enable trust within scientific community
• safeguard reputation of science
• equality and equity in opportunities
• prevent waste
• robust cumulative science
1. infrastructure
2. user experience
3. communities
4. incentives
5. policy
• scientific misconduct
• questionable research practice (QRP)
• poor research practice (competence)
• honest errors (fallibility)
same method, different data
replication
same data, different analysis
robustness
same data, same analysis
reproducibility
fragility
• theory maturity
o old, well-established theories replicate better than new ones with (still) abstract variables
• features of the original study
• features of the replication
small sample, poorly controlled design, false positive, low statistical power, low transparency
small sample, poorly controlled design, false negative, not adhering to the original design
- specifies the population
- is quantifiable
- is testable
(- mentions the direction)
1. to generate a hypothesis
2. to test the hypothesis
uses the data twice:
- This inflates the Type I error rate (false positives): the data that suggested the hypothesis will tend to confirm it.
- It also makes the findings less likely to replicate.
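To see why using the data twice inflates false positives, here is a minimal simulation sketch. Assumptions, all hypothetical: two groups of 30, ten outcome variables with no true effect on any of them, and a simple z-test with known unit variance standing in for a t-test. The "hypothesis" is whichever outcome shows the largest group difference; it is then tested either on the same data or on a fresh sample.

```python
import math
import random
import statistics

def z_test_p(a, b):
    """Two-sided p-value for a mean difference, assuming unit variance (a simplification)."""
    se = math.sqrt(1 / len(a) + 1 / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))

def double_dipping_rates(n_sims=1000, n=30, k=10, alpha=0.05, seed=1):
    """Simulate studies with NO true effect on any of k outcomes.

    Returns (false-positive rate when the hypothesis is tested on the data that generated it,
             false-positive rate when it is tested on fresh data).
    """
    rng = random.Random(seed)

    def new_groups():
        return [rng.gauss(0, 1) for _ in range(n)], [rng.gauss(0, 1) for _ in range(n)]

    same_hits = fresh_hits = 0
    for _ in range(n_sims):
        outcomes = [new_groups() for _ in range(k)]
        # 1. generate the hypothesis: pick the outcome with the largest group difference
        best = max(outcomes, key=lambda g: abs(statistics.fmean(g[0]) - statistics.fmean(g[1])))
        # 2a. test it on the same data (using the data twice)
        same_hits += z_test_p(*best) < alpha
        # 2b. test it on a fresh sample (what a replication would do)
        fresh_hits += z_test_p(*new_groups()) < alpha
    return same_hits / n_sims, fresh_hits / n_sims

same, fresh = double_dipping_rates()
print(f"same data: {same:.2f}, fresh data: {fresh:.2f}")
```

Because all ten outcomes are null, the same-data rate should land near 1 − 0.95^10 ≈ 0.40, while the fresh-data rate should stay near the nominal 0.05.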
- sin of bias
- sin of hidden flexibility
- sin of unreliability
- sin of data hoarding
- sin of corruptibility
- sin of internment
- sin of bean counting
favouring studies that confirm your hypothesis
sin of bias
torturing data / using QRPs to achieve significant results
sin of hidden flexibility
low statistical power, which raises the share of significant results that are false positives
sin of unreliability
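The sense in which low power "leads to false positives" can be made concrete. α fixes the false-positive rate per test, but lower power shrinks the share of significant results that reflect true effects (the positive predictive value). A minimal sketch; the prior of 0.25 (the proportion of tested hypotheses assumed to be true) is a hypothetical value chosen for illustration.

```python
def positive_predictive_value(power, alpha, prior):
    """Share of significant results that reflect true effects.

    prior: assumed proportion of tested hypotheses that are actually true.
    """
    true_pos = power * prior         # true effects that reach significance
    false_pos = alpha * (1 - prior)  # null effects that reach significance anyway
    return true_pos / (true_pos + false_pos)

for power in (0.8, 0.2):
    ppv = positive_predictive_value(power, alpha=0.05, prior=0.25)
    print(f"power = {power:.0%}: {ppv:.0%} of significant results are true")
```

With these assumed inputs the share drops from about 84% at 80% power to about 57% at 20% power.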
not sharing raw data
sin of data hoarding
allowing professional incentives to encourage fraud or ethical breaches
sin of corruptibility
publishing behind a paywall
sin of internment
obsessive reliance on specific metrics over research quality
sin of bean counting
1. data citation
2. reporting guidelines
3. data transparency
4. replication
5. analysis transparency
6. badges
7. publication bias
8. material transparency
9. preregistration of analysis
10. preregistration of study
a framework for journals and institutions to improve the transparency and replicability of research
constraints on generalization: whom the study's findings can or cannot be extended to.
the strength and direction of the relationship between a predictor and the outcome
β (beta)
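A small sketch of what β measures in a simple regression with one predictor, assuming ordinary least squares. Conventions differ: some texts write the raw slope as b and keep β for the standardized slope, so both are computed. The function names and data are hypothetical.

```python
import statistics

def ols_beta(x, y):
    """Raw slope of the least-squares line y = a + b*x, i.e. cov(x, y) / var(x)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / statistics.variance(x)

def standardized_beta(x, y):
    """Slope after z-scoring both variables; with one predictor this equals Pearson's r."""
    return ols_beta(x, y) * statistics.stdev(x) / statistics.stdev(y)

hours = [1, 2, 3, 4, 5]       # hypothetical predictor (e.g. hours studied)
score = [52, 55, 61, 64, 68]  # hypothetical outcome (e.g. exam score)
print(ols_beta(hours, score), standardized_beta(hours, score))
```

The sign of either coefficient gives the direction of the relationship; the standardized version puts strength on a unit-free scale.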
1. random sampling cases from a population
2. random allocation of treatment
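A minimal sketch of the two kinds of randomization, using Python's standard `random` module. The sampling frame, sample size, and condition names are hypothetical. Random sampling supports generalizing to the population; random allocation supports causal conclusions about the treatment.

```python
import random

def random_sample(population, n, rng):
    """Draw n cases without replacement; every case is equally likely to be chosen."""
    return rng.sample(population, n)

def random_allocation(participants, conditions, rng):
    """Shuffle participants, then deal them out to conditions in turn (balanced group sizes)."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {cond: shuffled[i::len(conditions)] for i, cond in enumerate(conditions)}

rng = random.Random(2024)  # fixed seed so the procedure can be reported and reproduced
population = [f"P{i:03d}" for i in range(1000)]  # hypothetical sampling frame
sample = random_sample(population, 40, rng)
groups = random_allocation(sample, ["treatment", "control"], rng)
print({cond: len(members) for cond, members in groups.items()})  # {'treatment': 20, 'control': 20}
```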
- observations are independent of one another
- observations are identically distributed
• Western
• Educated
• Industrialized
• Rich
• Democratic
- sample size
- effect size
- significance level
- power
false positive
false negative
You accept a 5% chance of a false positive when there is no real effect
The probability of finding an effect when it is really there
80% power means you're willing to miss a real effect 20% of the time (a false negative).
The expected magnitude of the phenomenon being studied
small sample sizes might give a distorted view of the population
- sample might not be random
- leads to smaller power (might not detect it)
- which leads to more Type II errors
- so, low replicability
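A rough illustration of how sample size drives power, assuming a two-group design, a medium standardized effect (d = 0.5), α = .05 two-tailed, and a normal approximation instead of the exact t-test. The function name is hypothetical.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a standardized effect d.

    Normal approximation; ignores the tiny chance of rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected z-statistic when the effect is real
    return NormalDist().cdf(shift - z_crit)

for n in (10, 20, 50, 100):
    print(f"n = {n:3d} per group -> power = {approx_power(0.5, n):.2f}")
```

With small groups the same real effect is missed most of the time, which is exactly the chain from small samples to Type II errors to low replicability.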
- smallest effect of interest
- desired power and significance level
- distribution of observables
- statistical tests
- one or two-tailed
- anticipated drop-out
- precision of measurements
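Several of these inputs can be plugged into the standard normal-approximation formula for a two-group comparison of means: n per group = 2 × (z_(1−α/2) + z_(power))² / d², using z_(1−α) for a one-tailed test. This is a sketch under that approximation; the function and parameter names are hypothetical, and an exact t-test calculation (e.g. in G*Power) gives a slightly larger n.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80, two_tailed=True, dropout=0.0):
    """Approximate sample size per group for a two-sample comparison of means.

    d: smallest effect size of interest (Cohen's d); dropout: expected proportion lost.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_tailed else z(1 - alpha)
    n = 2 * (z_alpha + z(power)) ** 2 / d ** 2
    # Inflate for anticipated drop-out so the completed sample still reaches n.
    return math.ceil(n / (1 - dropout))

print(n_per_group(0.5))                   # two-tailed
print(n_per_group(0.5, two_tailed=False)) # one-tailed
print(n_per_group(0.5, dropout=0.2))      # two-tailed, 20% drop-out expected
```

For d = 0.5, α = .05 and 80% power this gives 63 per group two-tailed, 50 one-tailed, and 79 if 20% drop-out is expected.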
1. have an (almost) entire population
2. resource constraints
3. power analysis
4. planning for desired accuracy
5. using heuristics
6. acknowledging there is no justification
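For option 4 (planning for desired accuracy), one common sketch picks n so that a confidence interval for a mean has a half-width no larger than a chosen margin: n = (z × σ / margin)². This assumes the standard deviation σ is known (e.g. from a pilot study); the function name and values are hypothetical.

```python
import math
from statistics import NormalDist

def n_for_precision(sd, margin, confidence=0.95):
    """Smallest n whose CI for a mean has half-width <= margin, treating sd as known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / margin) ** 2)

print(n_for_precision(sd=1.0, margin=0.2))  # 97
print(n_for_precision(sd=1.0, margin=0.1))  # 385
```

Halving the margin roughly quadruples the required n, because precision improves with the square root of the sample size.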
bootstrapping
a resampling technique: instead of assuming a normal distribution, it resamples the observed data (with replacement) many times to estimate the sampling distribution of a statistic, and from that its confidence interval, p-value, or (by simulation) power.
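A minimal percentile-bootstrap sketch using only the standard library. The data are hypothetical, right-skewed reaction times, where a normal-theory interval for the median would be awkward. Other bootstrap variants (e.g. BCa intervals) exist; this shows only the basic idea.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.fmean, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample with replacement, recompute the statistic, take quantiles."""
    rng = random.Random(seed)
    boots = sorted(stat(rng.choices(data, k=len(data))) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return boots[lo_idx], boots[hi_idx]

reaction_times = [312, 298, 345, 401, 288, 330, 515, 299, 342, 367]  # hypothetical, in ms
print(bootstrap_ci(reaction_times))                         # CI for the mean
print(bootstrap_ci(reaction_times, stat=statistics.median)) # CI for the median
```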
your study is either wastefully large or underpowered.
Sequential analysis (no fixed sample size, so no resources wasted because of a wrong power analysis)
• Collect data in stages
• Analyse results at intermediate points
• Stop when you have enough evidence
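A sketch of a sequential design under simplifying assumptions: data are analysed after 20, 40 and 60 participants per group, a z-test with known unit variance stands in for a t-test, and α is split evenly over the looks (a conservative Bonferroni-style correction; formal designs usually use Pocock or O'Brien-Fleming boundaries). The simulation shows why some correction is needed: peeking without one inflates the false-positive rate. All names are hypothetical.

```python
import math
import random
import statistics

def z_test_p(a, b):
    """Two-sided p-value for a mean difference, assuming unit variance (a simplification)."""
    se = math.sqrt(1 / len(a) + 1 / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))

def sequential_test(group_a, group_b, looks, alpha=0.05, correct=True):
    """Analyse at each planned look and stop at the first significant result.

    correct=True splits alpha evenly over the looks (Bonferroni-style, conservative).
    Returns (rejected_null, sample_size_per_group_used).
    """
    threshold = alpha / len(looks) if correct else alpha
    for n in looks:
        if z_test_p(group_a[:n], group_b[:n]) < threshold:
            return True, n
    return False, looks[-1]

def false_positive_rate(correct, n_sims=2000, looks=(20, 40, 60), seed=7):
    """Simulate studies where the null is true and count how often the procedure rejects."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(looks[-1])]
        b = [rng.gauss(0, 1) for _ in range(looks[-1])]
        hits += sequential_test(a, b, looks, correct=correct)[0]
    return hits / n_sims

print("uncorrected:", false_positive_rate(correct=False))
print("corrected:  ", false_positive_rate(correct=True))
```

Without correction the simulated rate should come out well above .05 (roughly .10 for three looks); with the split α it stays at or below .05. When the effect is large, the procedure can stop at the first look and save the remaining participants.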
results are beyond the control of the scientist, but are what is most important (for publication).
1. unreviewed
2. reviewed
in-principle acceptance
- hypotheses
- methods
o design
o planned sample
o exclusion criteria
o procedure
- analysis plan
o confirmatory analyses
o contingencies and assumptions
• prioritizing theory and method, rather than just results
• distinguishes confirmatory and exploratory research
• transparency
• reduces positive results bias/ publication bias
• reduces reporting bias
• more thoroughness, review, and input
• less dependence on chance
• more opportunity to show skill
• faster dissemination
• help with research
o more collaboration (adversarial collaboration)
• more work?
• too restrictive? lose flexibility?
• null literature?
• idea theft?
• it does not stop fraud
• difficult to prespecify full analysis plan
• difficult to avoid all ambiguity
• difficult to know what effect size to set the power for
- fully exploratory
- studies seeking to capture the effects of unpredictable events
- students working with deadlines for their thesis
Type I: you want to confirm your hypothesis and act on it, and a false positive could lead to the wrong changes being made
Type II: you are trying to generate new hypotheses, and a false negative means you miss something worth following up
higher power
stricter significance level (0.01)
- transparency (publish both stages)
- standardization
- efficiency
Consistent and repeatable over time
reliability
Measures what you want it to
validity
reliability and validity
within the same researcher
test-retest reliability
consistency across items
internal consistency
between researchers
interrater reliability
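These three kinds of reliability are usually quantified with different statistics: a correlation between sessions for test-retest reliability, Cronbach's α for internal consistency, and (for categorical codes) Cohen's κ for interrater reliability. A standard-library sketch with hypothetical data; other indices (e.g. ICCs, McDonald's ω) are often preferred in practice.

```python
import math
import statistics
from collections import Counter

def pearson_r(x, y):
    """Test-retest: correlation between the same people's scores on two occasions."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cronbach_alpha(items):
    """Internal consistency. items: k lists of item scores, one score per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))

def cohens_kappa(rater1, rater2):
    """Interrater agreement on categorical codes, corrected for chance agreement."""
    n = len(rater1)
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    p_exp = sum(c1[cat] * c2[cat] for cat in c1) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical: total scores of 5 people, measured two weeks apart
print(round(pearson_r([11, 14, 8, 13, 5], [12, 13, 9, 12, 6]), 2))
# Hypothetical: 3 questionnaire items answered by 5 respondents
print(round(cronbach_alpha([[4, 5, 3, 4, 2], [4, 4, 3, 5, 2], [5, 5, 2, 4, 1]]), 2))
# Hypothetical: codes from two raters
print(cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 0.5
```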
results are genuinely caused by the independent variable rather than confounding factors.
determines whether results can be generalized.
- lack of transparency
- ignorance
- negligence
- misrepresentation of evidence
not mentioning why you chose that measure
not researching a validated measure, just making one up
using the measure for an unintended population
claiming a measure is valid when it's not
- what is my construct?
- how do I operationalise my construct?
- why did I select my measure?
- did I modify my measure?
- did I create my measure (on a whim)?
When you test a theory, you're not just testing the theory. You're also relying on assumptions:
• That your manipulation actually manipulates what you think it does
• That your measure actually measures what you think it does
Outcome-neutral tests check those assumptions independently.
Something that should produce an effect if your procedure is working correctly.
It verifies that your measurement or manipulation is sensitive enough to detect a real effect.
If your positive control doesn't work, you know your procedure is broken — not that your hypothesis is wrong.
Something that should not produce an effect if your procedure is working correctly.
It verifies that your measurement isn't picking up noise, confounds, or non-specific responses.
If your negative control shows an effect, something other than your intended manipulation is driving your results.
a scientist's expectations, beliefs, or preferences unintentionally influence the results of a study
o blinding
o maintaining a lab notebook with all choices (and publishing it as well)
o reflexivity – embracing and acknowledging the researcher's influence
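Blinding can be as simple as replacing condition names with neutral codes before data collection or scoring, and keeping the key elsewhere until the analysis is fixed. A hypothetical sketch; the function names and records are invented for illustration.

```python
import random

def blind_conditions(conditions, seed=None):
    """Map each condition name to a neutral code, in random order."""
    rng = random.Random(seed)
    codes = [f"Group {chr(ord('A') + i)}" for i in range(len(conditions))]
    rng.shuffle(codes)  # so "Group A" is not always the first condition listed
    return dict(zip(conditions, codes))

def mask(records, key):
    """Return copies of the records with the condition label replaced by its code."""
    return [{**record, "condition": key[record["condition"]]} for record in records]

# Hypothetical records; the key would be stored away from whoever collects or scores the data.
key = blind_conditions(["drug", "placebo"])
records = [{"id": 1, "condition": "drug"}, {"id": 2, "condition": "placebo"}]
print(mask(records, key))
```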
embracing and acknowledging the researcher's influence
