Numerical - Continuous Data & Discrete Data
Categorical Data - Nominal & Ordinal
Can take any value within a range, no gaps
Only differences have meaning, no fixed zero points
- 20 degrees is not twice as hot as 10 degrees
Does have a zero point
- 2km is twice as long as 1 km
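The difference between the two scales can be checked with a little arithmetic. The sketch below assumes the degrees are Celsius (the notes do not say which scale is meant) and converts them to Kelvin, which does have a true zero point. The variable names are only illustrative.

```python
# Assumption: the temperatures in the notes are in degrees Celsius.
celsius_low, celsius_high = 10.0, 20.0

# The naive ratio on the Celsius scale suggests "twice as hot" ...
naive_ratio = celsius_high / celsius_low

# ... but the Celsius zero is arbitrary. On the Kelvin scale (true zero point)
# the ratio is only about 1.035.
kelvin_ratio = (celsius_high + 273.15) / (celsius_low + 273.15)

# Distances have a true zero point, so their ratio is meaningful.
km_ratio = 2.0 / 1.0

print(naive_ratio, round(kelvin_ratio, 3), km_ratio)
```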
Only certain value (integers), has gaps
No numbers as values
- can sometimes have numbers as values, such as a score of 1-5, but this is not numerical data (you can't calculate with these numbers)
2 or more outcomes that have no natural order
2 or more outcomes that have a natural order
- Getting to know the data before further analysis
- Use graphs
- Generate questions
- Detect errors -> Don't take data for granted
Used to investigate relations
The average of values
- Is sensitive to outliers
Odd number of observations
- The middle
- Not sensitive to outliers
The most frequent value
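A small sketch of the three measures of centre described above, using Python's standard `statistics` module. The numbers are made up for illustration; the second list swaps one value for an outlier to show how the mean reacts while the median does not.

```python
import statistics

# Made-up data with an odd number of observations.
values = [2, 3, 3, 4, 5]
# Same data, but the largest value is replaced by an outlier.
with_outlier = [2, 3, 3, 4, 100]

mean_plain = statistics.mean(values)          # (2+3+3+4+5) / 5 = 3.4
median_plain = statistics.median(values)      # middle value of the sorted data
mode_plain = statistics.mode(values)          # most frequent value

mean_out = statistics.mean(with_outlier)      # pulled up to 22.4 by the outlier
median_out = statistics.median(with_outlier)  # unchanged by the outlier

print(mean_plain, median_plain, mode_plain)
print(mean_out, median_out)
```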
The cut-off point below which P% of the data falls.
> 1st quartile (Q1): cut off point for 25% of data
> 2nd quartile (Q2): cut off point for 50% of data
> 3rd quartile (Q3): cut off point for 75% of data
Max - Min (sensitive to outliers)
Interquartile range (IQR) = Q3 - Q1
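The quartiles, the range and the IQR can be sketched with `statistics.quantiles` from the standard library. Note that several conventions exist for computing quartiles; the `method="inclusive"` choice below is an assumption and other tools may give slightly different cut-off points on the same data.

```python
import statistics

# Made-up data: the integers 1 to 9.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into four parts and returns the three cut-off points.
# Assumption: the "inclusive" method treats the data as the whole population.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

data_range = max(data) - min(data)  # Max - Min
iqr = q3 - q1                       # Q3 - Q1

print(q1, q2, q3)        # 3.0 5.0 7.0
print(data_range, iqr)   # 8 4.0
```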
How much the values differ/vary
- Sensitive to outliers
How values are spread out from the average
- Describes variability
- Sensitive to outliers
Median of the absolute deviation from the median.
- The higher the above statistics are, the more spread/variability there is in the data
Transforms data into a universal statistical unit: the number of standard deviations from the mean.
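The three spread statistics above can be sketched as follows. The notes do not say whether the sample (n - 1) or population (n) formula is meant; the sample versions `statistics.variance` and `statistics.stdev` are used here as an assumption. The helper name `median_abs_deviation` is hypothetical.

```python
import statistics

# Made-up data with mean 5 and median 4.5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Assumption: sample variance and standard deviation (divide by n - 1).
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

def median_abs_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

mad = median_abs_deviation(data)

print(round(variance, 3), round(std_dev, 3), mad)
```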
- Negative z-score: Value is below mean
- Positive z-score: Value is above mean
Observations with an absolute z-score larger than 2.5 (more than 2.5 standard deviations above or below the mean) are considered to be outliers.
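A minimal sketch of the z-score rule: each value is converted with z = (x - mean) / standard deviation, and values whose absolute z-score exceeds 2.5 are flagged. The data is made up, and the sample standard deviation is an assumption since the notes do not specify which version to use.

```python
import statistics

# Made-up data: nine ordinary values and one suspicious one.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]

mean = statistics.mean(data)
std_dev = statistics.stdev(data)  # assumption: sample standard deviation

# z-score: how many standard deviations a value lies from the mean.
z_scores = [(x - mean) / std_dev for x in data]

# Flag values more than 2.5 standard deviations away in either direction.
outliers = [x for x, z in zip(data, z_scores) if abs(z) > 2.5]

print(outliers)  # [50]
```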
Captures how strong the relation between two quantities is.
- Positive association -> Higher budget = Higher profit
- Negative association -> Higher budget = Lower profit
Must be scaled into sample correlation to be useful
- 'No' relation = r(xy) close to 0
- 'Perfect' relation = r(xy) equal to -1 or +1 (values close to -1 or +1 indicate a strong relation)
Only measures the strength of linear relations
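The points above can be illustrated with a hand-written sketch of the sample covariance and the sample correlation r(xy). The function names and the budget/profit numbers are made up for illustration. The example shows that covariance changes with the units of measurement while the correlation does not, and that a clear but non-linear relation can still give r(xy) = 0.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Sample correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

budget = [1, 2, 3, 4, 5]
profit = [2, 4, 6, 8, 10]
profit_rescaled = [p * 1000 for p in profit]  # same data in different units

cov_small = covariance(budget, profit)           # 5.0
cov_large = covariance(budget, profit_rescaled)  # 5000.0, depends on the units
r_small = correlation(budget, profit)            # 1.0
r_large = correlation(budget, profit_rescaled)   # still 1.0

# A perfect but non-linear relation (y = x squared) has no linear association.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
r_curve = correlation(x, y)                      # 0.0

print(cov_small, cov_large, r_small, r_large, r_curve)
```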
1 peak
2 peaks
More than 2 peaks
Mean = Median
mean > median
mean < median
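The three mean/median comparisons above are usually read as symmetric, right-skewed (long tail to the right) and left-skewed (long tail to the left). The sketch below applies that rule of thumb; the function name `skew_hint` and its tolerance parameter are hypothetical, and the comparison is only a rough hint, not a formal test of skewness.

```python
import statistics

def skew_hint(data, tol=1e-9):
    """Rough shape hint based on comparing the mean with the median."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if abs(mean - median) <= tol:
        return "symmetric"
    # A long right tail pulls the mean above the median, and vice versa.
    return "right-skewed" if mean > median else "left-skewed"

print(skew_hint([1, 2, 3, 4, 5]))      # mean 3 = median 3
print(skew_hint([1, 2, 2, 3, 12]))     # mean 4 > median 2
print(skew_hint([1, 10, 11, 11, 12]))  # mean 9 < median 11
```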
- Shows the summary statistics of subsets of the data.
- Great for comparing groups, but does not show the shape of the data
- Combination of boxplot and density plot
- The thickness of the violin shows the amount of data for the corresponding y-value
- Empirical Cumulative Distribution Function
- A function that for a given value (x) returns the fraction of observations that are equal to or smaller than x.
- There are no bins, data speaks for itself
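The ECDF definition above translates almost directly into code. This is a minimal sketch with a hypothetical function name `ecdf` and made-up data; it counts the observations that are equal to or smaller than x and divides by the total number of observations.

```python
def ecdf(data, x):
    """Fraction of observations in data that are equal to or smaller than x."""
    return sum(1 for value in data if value <= x) / len(data)

# Made-up data with five observations.
data = [3, 1, 4, 1, 5]

print(ecdf(data, 0))  # 0.0, nothing is <= 0
print(ecdf(data, 1))  # 0.4, two of five values are <= 1
print(ecdf(data, 3))  # 0.6
print(ecdf(data, 5))  # 1.0, every value is <= 5
```

Because every observation contributes its own step, no bin width has to be chosen, unlike with a histogram.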