Data Analytics Chapter 1 - EDA

What are the types of representation?

Numerical - Continuous Data & Discrete Data
Categorical Data - Nominal & Ordinal

Continuous Data

Data that can take any value within a range; there are no gaps

Continuous Data (Interval)

Only differences have meaning; there is no true zero point
- 20 °C is not twice as hot as 10 °C

Continuous Data (Ratio)

Has a true zero point, so ratios are meaningful
- 2 km is twice as long as 1 km

Discrete data

Can take only certain values (e.g. integers); there are gaps between them

Categorical data

Values are labels, not numbers
- Can sometimes be coded as numbers, such as a rating from 1 to 5, but this is not numerical data (you can't meaningfully calculate with these numbers)

Categorical data (Nominal)

2 or more outcomes that have no natural order

Categorical data (Ordinal)

2 or more outcomes that have a natural order

What are the key features of EDA?

- Getting to know data before future analysis
- Use graphs
- Generate questions
- Detect errors -> Don't take data for granted

Scatter plots

Used to investigate relations between two numerical variables

Mean

The average of values
- Is sensitive to outliers

Median

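The middle value of the sorted observations
- Odd number of observations: the middle value
- Even number of observations: the average of the two middle values
- Not sensitive to outliers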

Mode

The most frequent value
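
To make the three measures of center above concrete, here is a minimal sketch using Python's standard `statistics` module. The sample values are made up for illustration; the point is only to show that one extreme value moves the mean a lot and the median very little.

```python
import statistics

# Hypothetical sample; the last list adds a single extreme value
values = [2, 3, 3, 4, 5]
with_outlier = values + [100]

mean_clean = statistics.mean(values)          # 17 / 5 = 3.4
median_clean = statistics.median(values)      # middle of the sorted values = 3
mode_clean = statistics.mode(values)          # most frequent value = 3

mean_out = statistics.mean(with_outlier)      # 117 / 6 = 19.5
median_out = statistics.median(with_outlier)  # average of 3 and 4 = 3.5

print(mean_clean, median_clean, mode_clean)
print(mean_out, median_out)
```

The mean jumps from 3.4 to 19.5, while the median only moves from 3 to 3.5.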

Percentile

The cut-off point below which P% of the data falls.
> 1st quartile (Q1): cut-off point for 25% of the data
> 2nd quartile (Q2): cut-off point for 50% of the data (the median)
> 3rd quartile (Q3): cut-off point for 75% of the data
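
As a rough illustration of the quartiles above, the sketch below uses `statistics.quantiles` from the Python standard library. The data set and the choice of `method="inclusive"` are assumptions made for this example; other interpolation methods can give slightly different cut-off points on small samples.

```python
import statistics

# Hypothetical data: the integers 1 to 9
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into four equal parts and returns the three cut points
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(q1, q2, q3)  # 3.0 5.0 7.0
```

Here Q2 is the same value as the median of the data.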

Range

Max - Min (sensitive to outliers)

Interquartile range

(IQR) = Q3-Q1
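
A small sketch comparing the range with the IQR on made-up data. The helper names `value_range` and `iqr` are hypothetical, and the quartiles use the `"inclusive"` method of `statistics.quantiles` as an assumption. Replacing the largest value with an extreme one changes the range but leaves the IQR untouched.

```python
import statistics

def value_range(data):
    """Range = Max - Min."""
    return max(data) - min(data)

def iqr(data):
    """Interquartile range = Q3 - Q1 (inclusive quartile method)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # 9 replaced by 90

print(value_range(clean), iqr(clean))                # 8 4.0
print(value_range(with_outlier), iqr(with_outlier))  # 89 4.0
```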

Sample Variance

How much the values vary around the mean: the sum of squared deviations from the mean, divided by n - 1
- Sensitive to outliers

Sample Standard Deviation

How far values are spread out from the mean; the square root of the sample variance
- Describes variability
- Sensitive to outliers
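
The sample variance and sample standard deviation can be computed by hand in a few lines. This is a minimal sketch on made-up values, using the usual sample formula that divides by n - 1; the standard library functions `statistics.variance` and `statistics.stdev` should give the same results.

```python
import math
import statistics

# Hypothetical sample
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = sum(values) / n  # 40 / 8 = 5.0

# Sum of squared deviations from the mean, divided by n - 1
sample_variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # 32 / 7

# The standard deviation is the square root of the variance
sample_std = math.sqrt(sample_variance)

print(round(sample_variance, 3), round(sample_std, 3))  # 4.571 2.138
```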

Median Absolute Deviation

Median of the absolute deviations from the median.
- The higher the above statistics (range, IQR, variance, standard deviation, MAD) are, the more spread/variability there is in the data
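
The median absolute deviation can be sketched directly from its definition. The function name `median_absolute_deviation` and the sample values are assumptions for illustration only.

```python
import statistics

def median_absolute_deviation(data):
    """Median of the absolute deviations from the median."""
    med = statistics.median(data)
    return statistics.median(abs(v - med) for v in data)

# Hypothetical sample with one extreme value
values = [1, 2, 3, 4, 100]

# Median is 3; absolute deviations are 2, 1, 0, 1, 97; their median is 1
mad = median_absolute_deviation(values)
print(mad)  # 1
```

On the same values the sample standard deviation is far larger, because it is pulled up by the extreme value 100.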

Standardization (= z-score normalization)

Transforms data into a universal statistical unit: the number of standard deviations from the mean.
- Negative z-score: Value is below mean
- Positive z-score: Value is above mean

Standardization - Rule of thumb

Observations with an absolute z-score larger than 2.5 are considered to be outliers.
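
A minimal sketch of standardization and the 2.5 rule of thumb, on made-up values. It assumes the sample standard deviation (dividing by n - 1) and flags values in both directions by comparing the absolute z-score with 2.5.

```python
import statistics

# Hypothetical sample with one suspicious value
values = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 40]

mean = statistics.mean(values)
std = statistics.stdev(values)  # sample standard deviation (n - 1)

# z-score: how many standard deviations a value lies from the mean
z_scores = [(v - mean) / std for v in values]

# Rule of thumb: |z| > 2.5 marks an outlier
outliers = [v for v, z in zip(values, z_scores) if abs(z) > 2.5]
print(outliers)  # [40]
```

Note that in very small samples no value can reach a z-score of 2.5, so the rule only becomes useful with roughly ten or more observations.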

Association statistics: using scatter plots

Capture how strong the relation between two quantities is.
- Positive association -> Higher budget = Higher profit
- Negative association -> Higher budget = Lower profit

Sample Covariance

Its size depends on the units of the variables, so it must be scaled into the sample correlation to be useful

Sample Correlation

- 'No' linear relation = r(xy) close to 0
- 'Perfect' linear relation = r(xy) equal to -1 or +1 (close to -1 or +1 means a strong relation)
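
The sketch below computes the sample covariance and sample correlation by hand, to show why covariance needs scaling. The budget and profit numbers and the helper names are hypothetical. Changing the unit of the budget changes the covariance by the same factor, while the correlation stays the same.

```python
import math

def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """Covariance scaled by both sample standard deviations."""
    std_x = math.sqrt(sample_covariance(xs, xs))
    std_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (std_x * std_y)

# Hypothetical data
budget = [1, 2, 3, 4, 5]
profit = [2, 4, 5, 4, 5]
budget_rescaled = [b * 1000 for b in budget]  # same budgets in a smaller unit

cov = sample_covariance(budget, profit)                   # 1.5
cov_rescaled = sample_covariance(budget_rescaled, profit) # 1500.0
r = sample_correlation(budget, profit)                    # about 0.775
r_rescaled = sample_correlation(budget_rescaled, profit)  # same as r

print(cov, cov_rescaled, round(r, 3), round(r_rescaled, 3))
```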

Correlation

Only measures the strength of linear relations
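
To see that correlation only captures linear relations, consider made-up data where y is exactly x squared. The relation is perfect but not linear, and the sample correlation comes out as 0. The helper `correlation` is a hand-written sketch, not a library function.

```python
import math

def correlation(xs, ys):
    """Sample correlation computed from deviations around the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

x = [-2, -1, 0, 1, 2]
y_curved = [v ** 2 for v in x]     # perfect, but non-linear, relation
y_linear = [2 * v + 1 for v in x]  # perfect linear relation

r_curved = correlation(x, y_curved)
r_linear = correlation(x, y_linear)
print(r_curved, r_linear)
```

This is one reason to look at a scatter plot as well as the correlation value.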

How many peaks does a unimodal distribution have?

1 peak

How many peaks does a bimodal distribution have?

2 peaks

How many peaks does a multimodal distribution have?

More than 2 peaks

How to recognize a symmetric distribution

Mean = Median (approximately equal in real data)

When is a graph right-skewed?

mean > median

When is a graph left-skewed?

mean < median
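
The mean-versus-median comparisons above can be sketched as a small helper. The function name `describe_skew` and the three data sets are hypothetical, and the comparison is a rule of thumb rather than a formal test of skewness.

```python
import statistics

def describe_skew(values):
    """Rule of thumb: compare the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "symmetric"

symmetric = [1, 2, 3, 4, 5]                   # mean 3, median 3
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 20]   # mean 4.75, median 3
long_left_tail = [1, 17, 18, 18, 18, 19, 19, 20]  # mean 16.25, median 18

print(describe_skew(symmetric))        # symmetric
print(describe_skew(long_right_tail))  # right-skewed
print(describe_skew(long_left_tail))   # left-skewed
```

The extreme value in the tail pulls the mean toward it, while the median stays near the bulk of the data.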

Box-and-Whisker Plot (Boxplot)

- Shows the summary statistics of subsets of the data.
- Great for comparing groups, but does not show the shape of the data

Violin Plot

- Combination of boxplot and density plot
- The thickness of the violin shows the amount of data for the corresponding y-value

What is ECDF?

- Empirical Cumulative Distribution Function
- A function that for a given value (x) returns the fraction of observations that are equal to or smaller than x.
- There are no bins, data speaks for itself
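
The ECDF definition above translates almost word for word into code. This is a minimal sketch; the function name `ecdf` and the sample values are assumptions for illustration.

```python
def ecdf(data, x):
    """Fraction of observations that are equal to or smaller than x."""
    return sum(1 for value in data if value <= x) / len(data)

# Hypothetical observations
data = [3, 1, 4, 1, 5]

print(ecdf(data, 0))  # 0.0 (no observation is <= 0)
print(ecdf(data, 1))  # 0.4 (two of five observations are <= 1)
print(ecdf(data, 3))  # 0.6
print(ecdf(data, 5))  # 1.0 (every observation is <= 5)
```

No bin width has to be chosen, which is the sense in which the data speaks for itself.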
