Ovido

Data Analytics Chapter 1 - EDA

What are the types of representation?

Numerical data - continuous and discrete
Categorical data - nominal and ordinal

Continuous Data

Can take any value within a range; there are no gaps between possible values

Continuous Data (Interval)

Only differences have meaning; there is no true zero point
- 20 degrees is not twice as hot as 10 degrees

Continuous Data (Ratio)

Has a true zero point, so ratios are meaningful
- 2 km is twice as long as 1 km

Discrete data

Only certain values are possible (typically integers, such as counts); there are gaps between them

Categorical data

No numbers as values
- Can sometimes be coded as numbers, such as a rating score of 1-5, but this is not numerical data (arithmetic on these codes is not meaningful)

Categorical data (Nominal)

2 or more outcomes that have no natural order

Categorical data (Ordinal)

2 or more outcomes that have a natural order

What are the key features of EDA?

- Getting to know the data before further analysis
- Use graphs
- Generate questions
- Detect errors -> Don't take data for granted

Scatter plots

Used to investigate the relation between two numerical variables

Mean

The average of the values: their sum divided by the number of observations
- Is sensitive to outliers

Median

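The middle value of the sorted observations
- Odd number of observations: the single middle value
- Even number of observations: the average of the two middle values
- Not sensitive to outliers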

Mode

The most frequent value
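
The difference in outlier sensitivity between the mean and the median can be checked with a small sketch. The sample values below are made up for illustration, and Python's standard `statistics` module is used only as one convenient way to compute these measures.

```python
import statistics

# Hypothetical sample: five typical values, then the same values plus one outlier.
values = [2, 3, 3, 4, 5]
with_outlier = values + [100]

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)
mode_after = statistics.mode(with_outlier)  # most frequent value

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
print(f"mode with outlier: {mode_after}")
```

In this sample the single outlier pulls the mean from 3.4 to 19.5, while the median only moves from 3 to 3.5 and the mode stays at 3.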

Percentile

The P-th percentile is the cut-off point below which P% of the data fall.
> 1st quartile (Q1): cut off point for 25% of data

> 2nd quartile (Q2): cut off point for 50% of data

> 3rd quartile (Q3): cut off point for 75% of data

Range

Max - Min (sensitive to outliers)

Interquartile range

IQR = Q3 - Q1
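
As a rough sketch of how the quartiles, range and IQR fit together, the snippet below computes them for a made-up sample. Several quartile conventions exist and can give slightly different cut-off points; the "inclusive" method of `statistics.quantiles` is used here purely as an assumption for the example.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample

# n=4 splits the data into four parts and returns the three cut-off points.
# "inclusive" is one of the two conventions the statistics module offers.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

data_range = max(data) - min(data)  # Max - Min
iqr = q3 - q1                       # Q3 - Q1

print(f"Q1={q1}, Q2 (median)={q2}, Q3={q3}")
print(f"range={data_range}, IQR={iqr}")
```

Q2 is the same value as the median, and the IQR only looks at the middle half of the data, so a single extreme value changes the range much more than the IQR.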

Sample Variance

How much the values vary around the mean: the sum of squared deviations from the mean, divided by n - 1
- Sensitive to outliers

Sample Standard Deviation

The square root of the sample variance; how far values are spread out from the average, in the same units as the data
- Describes variability

- Sensitive to outliers

Median Absolute Deviation

Median of the absolute deviations from the median.
- The higher the above statistics are, the more spread/variability there is in the data
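
The three spread measures above can be compared on one small sample. This is a minimal sketch with made-up data; `median_absolute_deviation` is a helper written for this example (the unscaled version, without any consistency constant), not a standard-library function.

```python
import math
import statistics

def median_absolute_deviation(data):
    """Median of the absolute deviations from the median (unscaled)."""
    center = statistics.median(data)
    return statistics.median([abs(x - center) for x in data])

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

variance = statistics.variance(data)   # sample variance, divides by n - 1
std_dev = statistics.stdev(data)       # square root of the sample variance
mad = median_absolute_deviation(data)

print(f"sample variance: {variance:.3f}")
print(f"sample standard deviation: {std_dev:.3f}")
print(f"median absolute deviation: {mad}")
```

The variance is in squared units, whereas the standard deviation and the MAD are in the same units as the data, which is why the latter two are easier to read directly.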

Standardization (= z-score normalization)

Transforms data into a universal statistical unit: the number of standard deviations from the mean, z = (x - mean) / standard deviation.
- Negative z-score: Value is below mean

- Positive z-score: Value is above mean

Standardization - Rule of thumb

Observations with a z-score larger than 2.5 in absolute value are considered to be outliers.
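
A minimal sketch of standardization and the 2.5 rule of thumb follows. The data are invented, the sample standard deviation is used for scaling, and the rule is assumed to apply to the absolute z-score so that unusually low values are flagged as well as unusually high ones.

```python
import statistics

def z_scores(data):
    """Express each value as a number of sample standard deviations from the mean."""
    mean = statistics.mean(data)
    std_dev = statistics.stdev(data)
    return [(x - mean) / std_dev for x in data]

# Hypothetical measurements with one suspicious value at the end.
data = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 40]
THRESHOLD = 2.5  # rule-of-thumb cut-off from the notes

scores = z_scores(data)
outliers = [x for x, z in zip(data, scores) if abs(z) > THRESHOLD]

for x, z in zip(data, scores):
    print(f"value={x:>3}  z={z:+.2f}")
print("flagged as outliers:", outliers)
```

Note that an outlier inflates the mean and standard deviation it is measured against, so in very small samples no value can reach a z-score of 2.5 at all.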

Association statistics: using scatter plots

Captures how strong the relation between two quantities is.
- Positive association -> Higher budget = Higher profit
- Negative association -> Higher budget = Lower profit

Sample Covariance

Measures whether two variables move together, but its size depends on their units; it must be scaled into the sample correlation to be useful

Sample Correlation

- 'No' linear relation = r(xy) close to 0
- 'Perfect' linear relation = r(xy) equal to -1 or +1 (values close to these indicate a strong relation)

Correlation

Only measures the strength of linear relations
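
The points about covariance and correlation can be illustrated with a short sketch. The budget and profit figures are made up, and the two helper functions are written out by hand for this example: correlation is the covariance divided by the product of the two sample standard deviations.

```python
import math

def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Covariance scaled by both sample standard deviations, so -1 <= r <= 1."""
    s_x = math.sqrt(sample_covariance(x, x))
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)

# Hypothetical budget and profit figures.
budget = [10, 20, 30, 40, 50]
profit = [12, 25, 29, 41, 52]
budget_rescaled = [b * 1000 for b in budget]  # same budgets in smaller units

cov_original = sample_covariance(budget, profit)
cov_rescaled = sample_covariance(budget_rescaled, profit)
r_original = sample_correlation(budget, profit)
r_rescaled = sample_correlation(budget_rescaled, profit)

# A perfect but non-linear relation: y = x squared on symmetric x values.
x = [-2, -1, 0, 1, 2]
y_squared = [v ** 2 for v in x]
r_parabola = sample_correlation(x, y_squared)

print(f"covariance:  {cov_original:.1f} vs {cov_rescaled:.1f} after rescaling")
print(f"correlation: {r_original:.3f} vs {r_rescaled:.3f} after rescaling")
print(f"correlation for y = x^2: {r_parabola:.3f}")
```

Changing the units multiplies the covariance by 1000 but leaves the correlation unchanged, and the parabola example gives a correlation of 0 even though y is fully determined by x, because correlation only picks up linear relations.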

How many peaks does a unimodal distribution have?

1 peak

How many peaks does a bimodal distribution have?

2 peaks

How many peaks does a multimodal distribution have?

More than 2 peaks

How to check for a symmetric distribution?

Mean and median are (approximately) equal

When is a distribution right-skewed?

mean > median

When is a distribution left-skewed?

mean < median
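
The mean-versus-median comparison can be turned into a small check. This is only a heuristic sketch: `skew_direction` and its `tolerance` parameter are hypothetical names invented for this example, and the samples are made up.

```python
import statistics

def skew_direction(data, tolerance=0.0):
    """Compare mean and median; differences within `tolerance` count as symmetric."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean - median > tolerance:
        return "right-skewed"
    if median - mean > tolerance:
        return "left-skewed"
    return "roughly symmetric"

right_tail = [1, 2, 2, 3, 3, 3, 4, 20]       # long tail towards high values
left_tail = [1, 17, 18, 18, 19, 19, 19, 20]  # long tail towards low values
balanced = [1, 2, 3, 4, 5]

print(skew_direction(right_tail))
print(skew_direction(left_tail))
print(skew_direction(balanced))
```

The extreme values in the tail drag the mean towards them while the median stays put, which is why the mean ends up on the tail side of the median.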

Box-and-Whisker Plot (Boxplot)

- Shows the summary statistics of subsets of the data.
- Great for comparing groups, but does not show the shape of the data

Violin Plot

- Combination of boxplot and density plot
- The thickness of the violin shows the amount of data for the corresponding y-value

What is the ECDF?

- Empirical Cumulative Distribution Function
- A function that for a given value (x) returns the fraction of observations that are equal to or smaller than x.

- There are no bins, data speaks for itself
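
The ECDF definition above translates almost directly into code. The sketch below uses a made-up sample; `ecdf` and `ecdf_points` are illustrative helper names, with the second one producing the (value, fraction) pairs one would draw as a step plot.

```python
def ecdf(data, x):
    """Fraction of observations that are equal to or smaller than x."""
    return sum(1 for value in data if value <= x) / len(data)

def ecdf_points(data):
    """One (value, cumulative fraction) pair per observation, in sorted order."""
    ordered = sorted(data)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]

data = [3, 1, 4, 1, 5]  # hypothetical sample

print(ecdf(data, 1))   # fraction of observations <= 1
print(ecdf(data, 4))   # fraction of observations <= 4
print(ecdf_points(data))
```

Every observation contributes its own step of height 1/n, so no bin width has to be chosen, unlike with a histogram.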
