"Computer Says No"
about
The growing use of artificial intelligence (AI) systems currently impacts many spheres of contemporary society, from online media to finance, politics, healthcare, and education, leaving few areas untouched. These systems are, for the most part, based on machine learning (ML) algorithms that learn to make decisions towards a given end from vast sets of training data. As such, the development of these techniques grew exponentially with the spread of the Internet and the expansion of social networks. These systems are becoming practically ubiquitous, widening the debate on their implications for society.

As we delegate capacities such as reasoning, learning, pattern recognition, inference, or deduction to these systems, the opacity inherent in their complexity often obscures the bias embedded in their training data or decision-making processes.

The project "Computer Says No" takes as its title an expression that satirizes this opacity, seeking to reveal the hidden biases of machine learning. Developed as an evolving, diagrammatic narrative presented on a website, the project breaks down how human, social, and cultural bias is embedded in training datasets and amplified in the machine learning process.

This approach seeks to promote informed reflection on the "biased data in, biased data out" cycle in the propagation of inequalities and forms of social discrimination, given that, as Kate Crawford explains, "Data and data sets are not objective; they are creations of human design. We give numbers their voice, draw inferences from them, and define their meaning through our interpretations."


+

All contents are extracted from © their respective authors and works, solely for the purpose of exploratory content manipulation and the creation of new publications & objects, as part of an academic experimentation project.


+

Main references:

> https://nooscope.ai/

> https://dl.acm.org/doi/fullHtml/10.1145/3465416.3483305


+

Real-world cases of "human labeling bias" presented:

> "AI is sending people to jail - and getting it wrong - By Karen Hao, 2019"

> "Millions of black people affected by racial bias in health-care algorithms - By Heidi Ledford, 2019"

> "Google apologizes after its Vision AI produced racist results - By Nicolas Kayser-Bril, 2020"

> "Self-driving cars more likely to drive into black people, study claims - By Anthony Cuthbertson, 2019"

> "Healthcare algorithms are biased, and the results can be deadly - By Ben Dickson, 2020"

> "Twitter taught Microsoft's AI chatbot to be a racist asshole in less than a day - By James Vincent, 2016"

> "Why it's totally unsurprising that Amazon's recruitment AI was biased against women - By Isobel Asher Hamilton, 2018"

> "Algorithms that run our lives are racist and sexist - By Eliza Anyangwe, 2020"

> "Google autocomplete still makes vile suggestions - By Issie Lapowsky, 2018"


+

Bias lexicon references:

> [1] https://cpdonline.co.uk/knowledge-base/safeguarding/types-of-bias/

> [2] https://dl.acm.org/doi/fullHtml/10.1145/3465416.3483305

> [3] https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291520-6807%28199604%2933%3A2%3C143%3A%3AAID-PITS7%3E3.0.CO%3B2-S

> [4] https://en.wikipedia.org/


+

!! Not optimized for Safari, please use another browser !!



AUTHOR
Marco Alpoim

PROJECT ADVISORS
Prof. Luísa Ribas
Prof. Pedro Ângelo

DISCIPLINES
Project II
Laboratory II

Master's in Communication Design

Faculty of Fine Arts, University of Lisbon

2021/2022

SPECIAL THANKS
Alexandra Guimarães
Beatriz Querido
Pedro Pereira
The hidden biases of machine learning
Bias Lexicon

WORLD

"Where it all begins and ends"

1. DATA GENERATION
DATA PREPARATION

Depending on the data modality and task, different types of preprocessing may be applied to the dataset before using it. Datasets are usually split into training data used during model development, and test data used during model evaluation.
DATA COLLECTION

The data generation process begins with data collection. This process involves defining a target population and sampling from it, as well as identifying and measuring features and labels.

DATA ANONYMISATION

The process of protecting private or sensitive information by erasing or encrypting identifiers that connect an individual to stored data.
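As a minimal sketch of this idea in plain Python (the record fields and hashing choice are illustrative assumptions, not a complete anonymisation scheme, which would also have to consider re-identification risk):

```python
import hashlib

def anonymise(record, sensitive=("name", "email")):
    """Replace identifying fields with an irreversible hash digest."""
    out = dict(record)                     # leave the original record untouched
    for field in sensitive:
        if field in out:
            digest = hashlib.sha256(out[field].encode()).hexdigest()
            out[field] = digest[:12]       # shortened, non-reversible identifier
    return out

result = anonymise({"name": "Ada", "age": 36})
print(result["age"])   # non-sensitive fields survive unchanged
```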

DATA POISONING

(Capture Process: Sensor)
(Resolution Reduction - Machine Bias)

Involves tampering with machine learning training data to produce undesirable outcomes.
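One common form of such tampering is label flipping. A hypothetical sketch in plain Python (dataset and fraction are made up for illustration):

```python
import random

def poison_labels(dataset, fraction=0.1, seed=0):
    """Flip a fraction of binary labels -- a simple illustration of how
    tampering with training data can steer a model toward bad outcomes."""
    rng = random.Random(seed)
    poisoned = [(x, y) for x, y in dataset]
    for i in rng.sample(range(len(poisoned)), int(len(poisoned) * fraction)):
        x, y = poisoned[i]
        poisoned[i] = (x, 1 - y)           # flip the binary label
    return poisoned

data = [(i, i % 2) for i in range(20)]     # 20 hypothetical labelled samples
flipped = sum(a != b for (_, a), (_, b) in zip(data, poison_labels(data)))
print(flipped)  # 2
```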




2. TRAINING DATASETS
> Population sampling is the process of taking a subset of subjects that is representative of the entire population. If done incorrectly, sampling errors can lead to inaccurate and misleading data.

Population - Sample

> A population refers to any specified collection or group of human beings or of non-human entities, such as objects, educational institutions, time units, or geographical areas.
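The danger of a badly drawn sample can be shown with a toy example in plain Python (the population and attribute are hypothetical): a uniform random sample estimates the population well, while sampling only one subgroup is misleading.

```python
import random

rng = random.Random(1)
population = [0] * 900 + [1] * 100   # 10% of the population has the attribute

fair = rng.sample(population, 100)   # uniform random sample across everyone
skewed = population[900:]            # sampling only one subgroup

print(sum(population) / len(population))  # 0.1  -> the true rate
print(sum(skewed) / len(skewed))          # 1.0  -> misleading estimate
```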

SOURCE SELECTION

"Data is the first source of value and intelligence."

OPERATOR

Training data is prepared by a human operator.





GHOST WORKER

The term ‘ghost work’ refers to the invisible labour that makes AI appear artificially autonomous.







DATA

(Information Reduction - Machine Bias)

Data are a set of values of qualitative or quantitative variables about one or more persons or objects.

SELECTION

> Selection is defined as the process of choosing individuals, groups, or data for analysis (sampling).

OPERATOR

Training data is prepared by a human operator.





GHOST WORKER

The term ‘ghost work’ refers to the invisible labour that makes AI appear artificially autonomous.









DATABASE FORMAT

(Format Framing - Machine Bias)

The Machine Learning Database, or MLDB, is an open-source system aimed at tackling big data machine learning tasks.

LABELLING

> Measurement in machine learning refers to choosing, collecting, or computing the features and labels to use in a prediction problem.









OPERATOR

Training data is prepared by a human operator.





GHOST WORKER

The term ‘ghost work’ refers to the invisible labour that makes AI appear artificially autonomous.







METADATA / LABELS

(Category Reduction - Machine Bias)

Metadata is descriptive data that labels a piece of information and provides meaning to what that piece of information is.

PREPROCESSING

> Data preprocessing involves transforming raw data into well-formed datasets so that data mining and analytics can be applied. Raw data is often incomplete and inconsistently formatted.
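A minimal sketch of such a cleaning step in plain Python (the field names and rules are hypothetical): incomplete records are dropped and inconsistent formatting is normalised.

```python
def preprocess(raw_rows):
    """Normalise inconsistent formatting and drop incomplete rows."""
    clean = []
    for row in raw_rows:
        name, value = row.get("name"), row.get("value")
        if name is None or value is None:
            continue                      # incomplete record: drop it
        clean.append({"name": name.strip().lower(),  # trim and lowercase text
                      "value": float(value)})        # coerce numbers to float
    return clean

raw = [{"name": " Alice ", "value": "3.5"}, {"name": "Bob"}]
print(preprocess(raw))  # [{'name': 'alice', 'value': 3.5}]
```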

BENCHMARKS

TEST DATA

TRAIN-TEST SPLIT

>The train-test split is a technique for evaluating the performance of a machine learning algorithm. It can be used for classification or regression problems and can be used for any supervised learning algorithm.

TRAINING DATA
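The train-test split described above can be sketched in plain Python (the dataset and 80/20 ratio are illustrative assumptions; libraries such as scikit-learn provide production-ready versions):

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle the data and split it into training and test subsets."""
    rng = random.Random(seed)
    shuffled = data[:]                    # copy so the original stays untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

samples = list(range(10))                 # 10 hypothetical samples
train, test = train_test_split(samples)
print(len(train), len(test))  # 8 2
```

Fixing the seed makes the split reproducible, so model evaluation can be repeated on exactly the same held-out data.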

3. MODEL BUILDING
> A machine learning model is an expression of an algorithm that combs through mountains of data to find patterns or make predictions.
> Fueled by data, machine learning (ML) models are the mathematical engines of artificial intelligence.







MODEL

MODEL

MODEL

MODEL

MODEL

MODEL

> After the final model is chosen, the performance of the model on the test data is reported. The test data is not used before this step, to ensure that the model’s performance is a true representation of how it performs on unseen data.
> Aside from the test data, other available datasets — also called benchmark datasets — may be used to demonstrate model robustness or to enable comparison to other existing methods.






EVALUATION

"Where it all begins and ends"






OPERATOR

Training data is prepared by a human operator.





GHOST WORKER

The term ‘ghost work’ refers to the invisible labour that makes AI appear artificially autonomous.













TESTING ENVIRONMENT

Model Evaluation

Model testing is the process in which the performance of a fully trained model is evaluated on a testing set.
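For a classifier, this evaluation often reduces to comparing predictions against the true labels of the testing set. A minimal sketch in plain Python (the predictions and labels are made-up values):

```python
def accuracy(predictions, labels):
    """Fraction of test examples the model classified correctly."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical model outputs vs. ground-truth test labels
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```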













Algorithm Architecture

Topology

Model fitting is a measure of how well a machine learning model generalizes to data similar to that on which it was trained.





STATISTICAL INFERENCE

Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset. More input features often make a predictive modeling task more challenging to model.
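One standard technique for this is principal component analysis (PCA). A small sketch with NumPy (the data is random and the implementation is a bare-bones illustration, not a substitute for a library routine): 5 input variables are projected down to 2.

```python
import numpy as np

def pca_reduce(X, k=1):
    """Project the data onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                    # centre each feature
    cov = np.cov(Xc, rowvar=False)             # feature covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # eigendecomposition (ascending)
    top = vecs[:, np.argsort(vals)[::-1][:k]]  # k largest-variance directions
    return Xc @ top

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # 100 samples, 5 input variables
Z = pca_reduce(X, k=2)
print(Z.shape)  # (100, 2)
```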




Curve fitting is the process of constructing a curve that has the best fit to a series of data points. When we predict values that fall within the range of data points taken it is called interpolation. When we predict values for points outside the range of data taken it is called extrapolation.
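The interpolation/extrapolation distinction can be shown with a least-squares line fit in NumPy (the data points are fabricated to lie on y = 2x + 1):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])      # observed range: 0 to 3
y = 2 * x + 1
coeffs = np.polyfit(x, y, deg=1)        # least-squares fit of a degree-1 curve
fit = np.poly1d(coeffs)

print(fit(1.5))   # interpolation: a point inside the observed range
print(fit(10.0))  # extrapolation: a point outside the observed range
```

Extrapolated values rest entirely on the assumption that the fitted trend continues beyond the data, which is where predictions are most fragile.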
> Learning models are frameworks that define the mechanism of learning, that is, any form of acquiring new skills or information. [These models have sub-categories that further divide into various learning styles.]




4. IMPLEMENTATION

MODEL OUTPUT

A machine learning model is a file that has been trained to recognize certain types of patterns. You train a model over a set of data, providing it an algorithm that it can use to reason over and learn from those data.

BLACK BOX HORIZON

multidimensional vector space

“Black box” is shorthand for models that are sufficiently complex that they are not straightforwardly interpretable to humans.




NATURALIZATION OF BIAS




Pattern Recognition
CLASSIFICATION


Present World
Pattern Generation
PREDICTION


Future World

POSTPROCESS, INTEGRATE INTO SYSTEM
AND HUMAN INTERPRETATION
> Once a model has been trained, there are various post-processing steps that may be needed.
> If the output of a model performing binary classification is a probability, but the desired output to display to users is a categorical answer, there remains a choice of what threshold(s) to use to round the probability to a hard classification.
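This thresholding choice can be sketched in a few lines of plain Python (the probability, labels, and thresholds are hypothetical): the same model output yields different categorical answers depending on the threshold chosen.

```python
def to_label(probability, threshold=0.5):
    """Round a model's probability output to a hard categorical answer."""
    return "positive" if probability >= threshold else "negative"

print(to_label(0.73))                  # positive (default 0.5 threshold)
print(to_label(0.73, threshold=0.8))   # negative (same output, stricter cut)
```

Because the threshold is set by humans after training, it is one more point where judgement, and therefore bias, enters the pipeline.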

REAL WORLD IMPLICATIONS