itm 618 week 3

What is data mining?

- The process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) knowledge or patterns from data in large databases

What are the objectives of data mining?

- Discover knowledge that characterizes general properties of data
- Discover patterns in previous and current data in order to make predictions about future data

What is an alternative name for data mining?

Knowledge discovery in databases (KDD)

In the CRISP-DM process, what do you do under the business understanding process?

- Determining business objectives: Gathering background information, compiling the business background, and defining business objectives

- Assessing the situation: Identify requirements, assumptions, and constraints. What sort of data are available for analysis? Do you have access to them?

- Determining data science goals: Data science goals, Data science success criteria

In the CRISP-DM process, what do you do under the data understanding process?

- Collect initial data: Existing data, purchased data, and additional data

- Describe data: Amount of data and value types

- Verify data quality: Missing data and data errors

In the CRISP-DM process, what do you do under the data preparation process?

- Select the right data: Select training examples and features. Is a given attribute relevant to your data mining goals?

- Clean data: Fill in missing data, correct data errors

- Format data: Put data in a format for training the model
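
The clean and format steps can be sketched as follows. This is only an illustration: the records, the field names, and the `fill_missing_with_mean` helper are hypothetical, and replacing a missing value with the column mean is just one of several possible cleaning choices.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 30, "plan": "basic"},
    {"age": None, "plan": "premium"},
    {"age": 50, "plan": "basic"},
]

def fill_missing_with_mean(rows, field):
    """Return copies of rows with None in `field` replaced by the mean of the known values."""
    known = [r[field] for r in rows if r[field] is not None]
    mean = sum(known) / len(known)
    return [{**r, field: mean if r[field] is None else r[field]} for r in rows]

# Clean: fill in the missing age.
cleaned = fill_missing_with_mean(records, "age")

# Format: one numeric row per record, with the plan encoded as 0 (basic) or 1 (premium).
formatted = [[r["age"], 1 if r["plan"] == "premium" else 0] for r in cleaned]
print(formatted)
```

The original `records` list is left unchanged, so the raw data stay available if a different cleaning strategy is tried later.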

In the CRISP-DM process, what do you do under the modelling process?

- Select modelling techniques: Select data types available for analysis, select an algorithm or a model, define modelling goals, state specific modeling requirements

- Set up hyperparameters and build the model: Train the model, describe the result

- Assess the model: Check for overfitting and underfitting
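
One way to see overfitting and underfitting is to compare accuracy on the training data with accuracy on data held out from training. The sketch below uses made-up data and three hypothetical toy models: a constant prediction (too simple), a single threshold, and a lookup table that memorizes the training set.

```python
# Hypothetical (usage_hours, churned) pairs; the data are made up.
train = [(1, 1), (2, 1), (3, 1), (7, 0), (8, 0), (9, 0)]
holdout = [(0, 1), (4, 1), (6, 0), (10, 0)]

def fit_constant(data):
    """Underfit model: ignore the input and always predict class 0."""
    return lambda x: 0

def fit_threshold(data):
    """Simple model: predict 1 below the midpoint between the two class means."""
    ones = [x for x, y in data if y == 1]
    zeros = [x for x, y in data if y == 0]
    cut = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
    return lambda x: 1 if x < cut else 0

def fit_lookup_table(data):
    """Overfit model: memorize every training example exactly."""
    table = dict(data)
    return lambda x: table.get(x, 0)  # unseen inputs fall back to class 0

def accuracy(model, data):
    """Fraction of examples whose predicted class matches the true class."""
    return sum(model(x) == y for x, y in data) / len(data)

for name, fit in [("constant", fit_constant),
                  ("threshold", fit_threshold),
                  ("lookup table", fit_lookup_table)]:
    model = fit(train)
    print(name, accuracy(model, train), accuracy(model, holdout))
```

On this toy data the constant model scores 0.5 on both sets (underfitting), the lookup table scores 1.0 on the training set but only 0.5 on the holdout set (overfitting), and the threshold model scores 1.0 on both.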

In the CRISP-DM process, what do you do under the evaluation process?

- Evaluate the results: Are results presented clearly? Are there any novel findings? Are the models and findings applicable to the business goals? How well do the models and findings answer business goals? What additional questions have the modeling results raised?

- Review the process: Did the stage contribute to the value of the results? What went wrong and how can it be fixed? Are there alternative decisions which could have been made?

- Determine the next steps

In the CRISP-DM process, what do you do under the deployment process?

- Planning for deployment: Summarize models and findings. For each model, create a deployment plan. Identify any deployment problems and plan for contingencies.

- Plan monitoring and maintenance: Identify models and findings which require support. How can the accuracy and validity be evaluated? How will you determine that a model has expired? What should be done with expired models?

- Conduct a final project review

What is a model?

A simplified representation of reality created to serve a purpose. Examples include maps, prototypes, the Black-Scholes model, etc.

What is a prediction?

An estimate of an unknown value

What is a predictive model?

- A formula for estimating the unknown value of interest: the target
- The formula can be mathematical or a logical statement (such as a rule)
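
The two forms can be illustrated with a small sketch. Both functions, their names, and every coefficient and cutoff below are made up for illustration; in practice these would be learned from training data.

```python
def predict_usage_minutes(age, monthly_fee):
    """Mathematical formula: a made-up linear model estimating a numeric target."""
    return 120.0 - 0.5 * age + 2.0 * monthly_fee

def predict_will_churn(months_as_customer, complaints):
    """Logical statement: a made-up rule estimating a binary target."""
    return months_as_customer < 6 and complaints >= 2

print(predict_usage_minutes(age=40, monthly_fee=25))           # 150.0
print(predict_will_churn(months_as_customer=3, complaints=2))  # True
```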

What is an instance/example?

- Represents a fact or a data point
- Described by a set of attributes (fields, columns, variables, or features)

What is training data?

The input data to create the model

What are the 2 feature types?

- Numeric: Values with a meaningful order and magnitude, such as numbers and dates
- Categorical: Values drawn from a set of labels with no inherent order, such as text categories

What are some common data mining tasks?

- Classification and class probability estimation
- Regression
- Similarity Matching
- Clustering
- Co-occurrence grouping and association rules

What is an example of a classification model?

Decision tree

What is the purpose of a regression model? Provide examples.

- It finds a function from data which relates a real-valued variable with one or more other variables
- For example, predict daily water demand
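
As a minimal sketch of finding such a function, the code below fits a straight line by ordinary least squares. The choice of a linear model, the `fit_line` helper, and the temperature and demand figures are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: daily high temperature (°C) and daily water demand (megalitres).
temperature = [10, 15, 20, 25, 30]
demand = [100, 110, 120, 130, 140]

intercept, slope = fit_line(temperature, demand)

# Predict the demand for a day with a high of 22 °C.
print(intercept + slope * 22)
```

Here the real-valued target is the daily water demand and the single explanatory variable is the temperature.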

What is the purpose of a clustering model?

- To group data to form classes (clusters)
- Class label is unknown in the training data
- Principle: maximizing the intra-class similarity and minimizing the inter-class similarity
- Applications include market/customer segmentation
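
The grouping idea can be sketched with k-means, which is one common clustering technique among many. The example assumes a single numeric feature, made-up spending figures, and hand-picked starting centers; `k_means_1d` is a hypothetical helper, not a library function.

```python
def k_means_1d(values, centers, iterations=10):
    """Lloyd's algorithm on one numeric feature, starting from the given centers."""
    for _ in range(iterations):
        # Assign every value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend per customer; no class labels are given.
spend = [10, 12, 11, 95, 100, 105]

centers, clusters = k_means_1d(spend, centers=[0, 50])
print(centers)   # [11.0, 100.0]
print(clusters)  # [[10, 12, 11], [95, 100, 105]]
```

Values inside each cluster are close to one another, while the two clusters are far apart, which is the principle stated above applied to a customer segmentation setting.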

What are supervised targets?

- A supervised technique is given a specific purpose for the grouping: predicting the target.
- Supervised tasks require different techniques than unsupervised tasks, and their results are often much more useful

What are the 2 main subclasses of supervised data mining?

- Classification and regression

What are the 2 main subclasses of supervised data mining distinguished by?

- They are distinguished by the type of target

What are the 2 types of target under classification?

- Binary target (two possible values)
- Categorical target (more than two possible values)

What type of supervised data mining might we address the following question with?
"Which service package (S1, S2, or none) will a customer likely purchase if given incentive I?"

This is a classification problem with a three-valued target.

What type of supervised data mining might we address the following question with?
"Will this customer purchase service S1 if given incentive I?"

This is a classification problem because it has a binary target (the customer either purchases or does not).

What type of supervised data mining might we address the following question with?
"How much will this customer use the service?"

This is a regression problem because it has a numeric target. The target variable is the amount of usage (actual or predicted) per customer
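
The three questions above differ only in the type of the target, which the following sketch makes explicit. The `supervised_task_type` helper is hypothetical and rests on a simplifying assumption: targets stored as numbers are treated as numeric targets, and anything else is treated as a class label.

```python
def supervised_task_type(target_values):
    """Name the supervised task implied by a list of observed target values."""
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in target_values
    )
    if is_numeric:
        return "regression"
    distinct = len(set(target_values))
    return "binary classification" if distinct == 2 else "multi-class classification"

print(supervised_task_type(["S1", "S2", "none", "S1"]))  # multi-class classification
print(supervised_task_type([True, False, True]))          # binary classification
print(supervised_task_type([12.5, 40.0, 3.2]))            # regression
```

Class labels that happen to be encoded as numbers (for example 0 and 1) would be misread as a numeric target by this sketch, so in practice the target type is decided from the business question rather than from the storage type.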

Explain how data mining applications can be applied to finance.

- Clustering and classification of customers for targeted marketing
- Identify customer groups or associate a new customer to an appropriate customer group

Explain how data mining applications can be applied to retail

- Discover customer shopping patterns and trends
- Re-arrange store layout
- Purchase recommendation and cross-reference of items

Explain how data mining applications can be applied to DNA analysis.

- Association analysis: identification of co-occurring gene sequences
- Most diseases are not triggered by a single gene but by a combination of genes acting together
- Association analysis may help determine the kinds of genes that are likely to co-occur together in target samples

What is dimensionality of a dataset?

- It is the sum of the dimensions of the features
- It is the sum of the number of numeric features and the number of distinct values of the categorical features
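
A minimal sketch of this count, assuming a small made-up dataset with two numeric and two categorical features (the `dimensionality` helper and the field names are hypothetical):

```python
def dimensionality(rows, numeric_features, categorical_features):
    """Number of numeric features plus the number of distinct values of each categorical feature."""
    distinct_values = sum(
        len({row[f] for row in rows}) for f in categorical_features
    )
    return len(numeric_features) + distinct_values

# Hypothetical dataset.
rows = [
    {"age": 25, "income": 40000, "plan": "basic", "region": "east"},
    {"age": 37, "income": 62000, "plan": "premium", "region": "west"},
    {"age": 52, "income": 58000, "plan": "family", "region": "east"},
]

# 2 numeric features + 3 plan values + 2 region values
print(dimensionality(rows, ["age", "income"], ["plan", "region"]))  # 7
```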

What is association analysis used for?

- It is widely used for market basket or transactional data analysis
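
As a rough illustration of market basket analysis, the sketch below computes support and confidence, two standard measures for association rules that are not defined in these notes. The baskets are made up, and the function names are chosen for this example only.

```python
# Hypothetical transactions: each basket is the set of items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"bread", "butter"}, baskets))       # 0.5
print(confidence({"butter"}, {"bread"}, baskets))  # 1.0
```

In this toy data every basket containing butter also contains bread, so the rule "butter -> bread" has confidence 1.0 even though the pair appears in only half of the baskets.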

Which data mining tasks are supervised methods?

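- Classification
- Regression
- Causal modeling
- Similarity matching (can be supervised or unsupervised)
- Link prediction (can be supervised or unsupervised)
- Data reduction (can be supervised or unsupervised)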

Which data mining tasks are unsupervised methods?

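- Similarity matching (can be supervised or unsupervised)
- Link prediction (can be supervised or unsupervised)
- Data reduction (can be supervised or unsupervised)
- Clustering
- Co-occurrence grouping
- Profiling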

What are some classical pitfalls in data mining setup?
