Free-form data that can't be organized into rows and columns.
Data that's well organized into a fixed format, such as rows and columns, and can be stored in a database (for example, a CSV file).
Data that's partially organized and partially free-form (for example, emails, which have structured header fields but a free-form body).
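The three categories above (commonly called unstructured, structured, and semi-structured data) can be contrasted using only Python's standard library. The sample CSV, email, and review text below are made up for illustration.

```python
import csv
import io
from email import message_from_string

# Structured: every record has the same named columns (CSV).
csv_text = "id,name,city\n1,Ada,London\n2,Linus,Helsinki\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: an email has well-defined header fields
# plus a free-form body.
raw_email = (
    "From: ada@example.com\n"
    "To: linus@example.com\n"
    "Subject: Meeting notes\n"
    "\n"
    "Hi Linus, here are the notes from today's call...\n"
)
msg = message_from_string(raw_email)
headers = {key: msg[key] for key in ("From", "To", "Subject")}
body = msg.get_payload()

# Unstructured: plain text with no fields that can be queried directly.
review = "Great product, but shipping took almost two weeks."

print(rows[0]["city"])      # London
print(headers["Subject"])   # Meeting notes
print(len(review.split()))  # 8
```

The CSV rows can be addressed by column name, the email only partly (headers yes, body no), and the review text not at all without further processing.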
Access and manipulate data from databases.
Develop applications and control their behavior.
Automate repetitive operational tasks.
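Accessing and manipulating database data is typically done with a query language such as SQL. A minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `orders` table and its columns are hypothetical.

```python
import sqlite3

# In-memory database; the "orders" table is a made-up example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ada", 120.0), ("linus", 80.5), ("ada", 40.0)],
)

# Access: total spending per customer.
totals = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()
)

# Manipulate: apply a 10% discount to one customer's orders.
conn.execute("UPDATE orders SET amount = amount * 0.9 WHERE customer = ?", ("linus",))
conn.commit()
discounted = conn.execute("SELECT amount FROM orders WHERE customer = 'linus'").fetchone()[0]

print(totals)  # {'ada': 160.0, 'linus': 80.5}
```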
Data that has been:
- COLLECTED
- ORGANIZED
- ISOLATED
before being used for reporting, analytics, and archival purposes.
- DATABASES (Relational & Non-Relational)
- DATA WAREHOUSES
- DATA MARTS
- DATA LAKES
- BIG DATA STORES
[DATA REPOSITORY] - Defined by:
- Following a set of organizational principles
- Only storing specific data
- Using specific tools to query, organize, and retrieve data
[DATA REPOSITORY] - Consolidates incoming data from multiple sources in one central place.
[DATA REPOSITORY] - Sub-section of a warehouse that isolates data for a specific use case.
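One simple way to picture a data mart is as a subset of warehouse data carved out for a single group of users. The sketch below uses `sqlite3` with hypothetical `warehouse_sales` and `marketing_mart` tables and copies only the marketing department's rows and columns into the mart; it illustrates the isolation idea, not how any particular warehouse product builds marts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding data for the whole company.
conn.execute("CREATE TABLE warehouse_sales (department TEXT, region TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "EU", 500.0),
        ("marketing", "US", 700.0),
        ("finance", "EU", 300.0),
    ],
)

# The mart isolates only the rows and columns the marketing team needs.
conn.execute(
    """
    CREATE TABLE marketing_mart AS
    SELECT region, revenue FROM warehouse_sales
    WHERE department = 'marketing'
    """
)

mart_rows = conn.execute(
    "SELECT region, revenue FROM marketing_mart ORDER BY region"
).fetchall()
print(mart_rows)  # [('EU', 500.0), ('US', 700.0)]
```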
[DATA REPOSITORY] - Stores large amounts of structured, semi-structured, and unstructured data in their native format.
*Often used as staging areas.
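A data lake keeps data in its native format and applies structure only when the data is read. The sketch below imitates that with a temporary local directory; the `raw/<source>/` layout and the `land_raw` function are hypothetical, and production lakes often sit on object storage rather than a local disk.

```python
import json
import tempfile
from pathlib import Path

def land_raw(lake_root: Path, source: str, filename: str, payload: bytes) -> Path:
    """Store a payload byte-for-byte, without parsing or converting it."""
    target_dir = lake_root / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(payload)
    return target

with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    # Structured, semi-structured, and unstructured data side by side.
    land_raw(lake, "crm", "customers.csv", b"id,name\n1,Ada\n")
    json_path = land_raw(lake, "web", "event.json", b'{"user": 1, "action": "click"}')
    land_raw(lake, "support", "ticket.txt", b"The app crashes on login.")

    # Schema-on-read: structure is applied only when the data is used.
    event = json.loads(json_path.read_bytes())
    stored_files = sorted(
        p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file()
    )
    print(event["action"])  # click
```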
[DATA REPOSITORY] - 1. Uses distributed computational and storage infrastructure.
2. Used to store, scale, and process very large data sets.
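The idea of distributing storage and computation can be shown in miniature: records are hash-partitioned across several "nodes", each node aggregates only its own partition, and the partial results are merged. This is a single-process simulation with a hypothetical node count; in a real big data store each partition would live and be processed on a separate machine.

```python
import zlib
from collections import Counter

NUM_NODES = 3  # hypothetical cluster size

def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    # crc32 gives a stable hash across runs (unlike Python's built-in hash for str).
    return zlib.crc32(key.encode("utf-8")) % num_nodes

events = ["login", "click", "click", "purchase", "login", "click"]

# Storage: each record is assigned to exactly one node's partition.
partitions = {node: [] for node in range(NUM_NODES)}
for event in events:
    partitions[node_for(event)].append(event)

# Computation: each node aggregates only its local records.
partial_counts = [Counter(records) for records in partitions.values()]

# Merge the partial results into one global answer.
total = Counter()
for counts in partial_counts:
    total.update(counts)

print(dict(total))
```

Because the same key always hashes to the same node, each node can count its keys independently and the merge step only has to add the partial counts together.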
An automated process that converts raw data into analysis-ready data.
EXTRACT data from a source location.
TRANSFORM raw data by cleaning, enriching, standardizing, and validating it.
LOAD the processed data into a destination system or data repository.
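The three steps above can be sketched end to end in Python. The CSV source, the cleaning rules, the city-to-country lookup, and the `sensor_readings` destination table are all hypothetical; the point is only the extract, transform, load order.

```python
import csv
import io
import sqlite3

RAW_CSV = """sensor_id,temp_f,city
 s1 ,68.0,london
s2,,paris
s3,212.0, BERLIN
s3,212.0, BERLIN
"""

# Hypothetical lookup table used to enrich records.
CITY_TO_COUNTRY = {"London": "UK", "Paris": "FR", "Berlin": "DE"}

def extract(text: str) -> list[dict]:
    # Extract: read raw rows from the source (a CSV string here).
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list[dict]) -> list[tuple]:
    cleaned, seen = [], set()
    for row in rows:
        # Clean and validate: trim whitespace, drop rows missing a reading.
        sensor_id = row["sensor_id"].strip()
        temp = row["temp_f"].strip()
        if not sensor_id or not temp:
            continue
        # Standardize: Fahrenheit to Celsius, consistent city capitalization.
        temp_c = round((float(temp) - 32) * 5 / 9, 2)
        city = row["city"].strip().title()
        # Enrich: add a country derived from the city.
        record = (sensor_id, temp_c, city, CITY_TO_COUNTRY.get(city, "unknown"))
        if record not in seen:  # drop exact duplicates
            seen.add(record)
            cleaned.append(record)
    return cleaned

def load(records: list[tuple], conn: sqlite3.Connection) -> None:
    # Load: write the processed records into the destination table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sensor_readings "
        "(sensor_id TEXT, temp_c REAL, city TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO sensor_readings VALUES (?, ?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT * FROM sensor_readings ORDER BY sensor_id").fetchall()
print(loaded)  # [('s1', 20.0, 'London', 'UK'), ('s3', 100.0, 'Berlin', 'DE')]
```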
Yes. Both encompass the process of moving data from its source to a destination such as a data lake or application.
The vast amount of data being produced by people, tools, apps, and machines.
[V]ELOCITY
[V]OLUME
[V]ARIETY
[V]ERACITY
[V]ALUE
The speed at which data accumulates.
The scale of the data, or the physical size of the stored data.
The diversity of the data, such as its sources and data types.
The quality and origin of the data, including its consistency, completeness, integrity, and ambiguity.
The potential to turn the data into tangible value.
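Some aspects of veracity can be measured directly. The sketch below computes two simple indicators for a small hypothetical dataset: completeness (the share of non-missing field values) and the number of exact duplicate records. These are illustrative metrics chosen for this example, not a standard definition of veracity.

```python
def completeness(records: list[dict]) -> float:
    """Fraction of field values that are present (not None or empty)."""
    total = sum(len(r) for r in records)
    filled = sum(1 for r in records for v in r.values() if v not in (None, ""))
    return filled / total if total else 1.0

def duplicate_count(records: list[dict]) -> int:
    """Number of records that exactly repeat an earlier record."""
    seen, dupes = set(), 0
    for r in records:
        key = tuple(sorted(r.items()))
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return dupes

customers = [
    {"id": 1, "email": "ada@example.com", "country": "UK"},
    {"id": 2, "email": "", "country": "FI"},
    {"id": 1, "email": "ada@example.com", "country": "UK"},
    {"id": 3, "email": "grace@example.com", "country": None},
]

print(round(completeness(customers), 3))  # 0.833
print(duplicate_count(customers))         # 1
```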