"Computer Says No"
about
The growing use of artificial intelligence (AI) systems currently impacts many spheres of contemporary society, from online media to finance, politics, healthcare, and education, leaving few areas untouched. These systems are, for the most part, based on machine learning (ML) algorithms that learn to make decisions towards a given end from vast sets of training data. As such, the development of these techniques grew exponentially with the spread of the Internet and the expansion of social networks. These systems are becoming practically ubiquitous, widening the debate on their implications for society.

As we delegate capacities such as reasoning, learning, pattern recognition, inference, or deduction to these systems, the opacity inherent in their complexity often obscures the bias embedded in their training data or decision-making processes.

The project "Computer Says No" takes as its title an expression that satirizes this opacity, seeking to reveal the hidden biases of machine learning. Developed as an evolving, diagrammatic narrative presented on a website, the project breaks down how human, social, and cultural bias is embedded in training datasets and amplified in the machine learning process.

This approach seeks to promote informed reflection on the "biased data in, biased data out" cycle in the propagation of inequalities and forms of social discrimination, given that, as Kate Crawford explains, "Data and data sets are not objective; they are creations of human design. We give numbers their voice, draw inferences from them, and define their meaning through our interpretations."


+

All contents are extracted from © their respective authors and works, solely for the purpose of exploratory content manipulation and the creation of new publications & objects, as part of an academic experimentation project.


+

Main references:

> https://nooscope.ai/

> https://dl.acm.org/doi/fullHtml/10.1145/3465416.3483305


+

Real-world cases of "human labeling bias" presented:

> "AI is sending people to jail - and getting it wrong - By Karen Hao, 2019"

> "Millions of black people affected by racial bias in health-care algorithms - By Heidi Ledford, 2019"

> "Google apologizes after its Vision AI produced racist results - By Nicolas Kayser-Bril, 2020"

> "Self-driving cars more likely to drive into black people, study claims - By Anthony Cuthbertson, 2019"

> "Healthcare algorithms are biased, and the results can be deadly - By Ben Dickson, 2020"

> "Twitter taught Microsoft's AI chatbot to be a racist asshole in less than a day - By James Vincent, 2016"

> "Why it's totally unsurprising that Amazon's recruitment AI was biased against women - By Isobel Asher Hamilton, 2018"

> "Algorithms that run our lives are racist and sexist - By Eliza Anyangwe, 2020"

> "Google autocomplete still makes vile suggestions - By Issie Lapowsky, 2018"


+

Bias lexicon references:

> [1] https://cpdonline.co.uk/knowledge-base/safeguarding/types-of-bias/

> [2] https://dl.acm.org/doi/fullHtml/10.1145/3465416.3483305

> [3] https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291520-6807%28199604%2933%3A2%3C143%3A%3AAID-PITS7%3E3.0.CO%3B2-S

> [4] https://en.wikipedia.org/


+

!! Not optimized for Safari, please use another browser !!



AUTHOR
Marco Alpoim

PROJECT ADVISORS
Prof. Luísa Ribas
Prof. Pedro Ângelo

DISCIPLINES
Project II
Laboratory II

Master's in Communication Design

Faculty of Fine Arts, University of Lisbon

2021/2022

SPECIAL THANKS
Alexandra Guimarães
Beatriz Querido
Pedro Pereira
The hidden biases of machine learning
Bias Lexicon

WORLD

"Where it all begins and ends"

1. DATA GENERATION
DATA PREPARATION

Depending on the data modality and task, different types of preprocessing may be applied to the dataset before using it. Datasets are usually split into training data used during model development, and test data used during model evaluation.
DATA COLLECTION

The data generation process begins with data collection. This process involves defining a target population and sampling from it, as well as identifying and measuring features and labels.

DATA ANONYMISATION

The process of protecting private or sensitive information by erasing or encrypting identifiers that connect an individual to stored data.
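As a minimal sketch of this idea in plain Python (the record fields and hashing choice are illustrative assumptions, not a complete anonymisation scheme, which would also have to consider re-identification risk):

```python
import hashlib

def anonymise(record, sensitive=("name", "email")):
    """Replace identifying fields with an irreversible hash digest."""
    out = dict(record)                     # leave the original record untouched
    for field in sensitive:
        if field in out:
            digest = hashlib.sha256(out[field].encode()).hexdigest()
            out[field] = digest[:12]       # shortened, non-reversible identifier
    return out

result = anonymise({"name": "Ada", "age": 36})
print(result["age"])   # non-sensitive fields survive unchanged
```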

DATA POISONING

(Capture Process: Sensor)
(Resolution Reduction - Machine Bias)

Involves tampering with machine learning training data to produce undesirable outcomes.
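One common form of such tampering is label flipping. A hypothetical sketch in plain Python (dataset and fraction are made up for illustration):

```python
import random

def poison_labels(dataset, fraction=0.1, seed=0):
    """Flip a fraction of binary labels -- a simple illustration of how
    tampering with training data can steer a model toward bad outcomes."""
    rng = random.Random(seed)
    poisoned = [(x, y) for x, y in dataset]
    for i in rng.sample(range(len(poisoned)), int(len(poisoned) * fraction)):
        x, y = poisoned[i]
        poisoned[i] = (x, 1 - y)           # flip the binary label
    return poisoned

data = [(i, i % 2) for i in range(20)]     # 20 hypothetical labelled samples
flipped = sum(a != b for (_, a), (_, b) in zip(data, poison_labels(data)))
print(flipped)  # 2
```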




2. TRAINING DATASETS
> Population sampling is the process of taking a subset of subjects that is representative of the entire population. If done incorrectly, sampling errors can lead to inaccurate and misleading data.

Population - Sample

> A population refers to any specified collection or group of human beings or of non-human entities, such as objects, educational institutions, time units, or geographical areas.
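The danger of a badly drawn sample can be shown with a toy example in plain Python (the population and attribute are hypothetical): a uniform random sample estimates the population well, while sampling only one subgroup is misleading.

```python
import random

rng = random.Random(1)
population = [0] * 900 + [1] * 100   # 10% of the population has the attribute

fair = rng.sample(population, 100)   # uniform random sample across everyone
skewed = population[900:]            # sampling only one subgroup

print(sum(population) / len(population))  # 0.1  -> the true rate
print(sum(skewed) / len(skewed))          # 1.0  -> misleading estimate
```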

SOURCE SELECTION

"Data is the first source of value and intelligence."

OPERATOR

Training data is prepared by a human operator.





GHOST WORKER

The term ‘ghost work’ refers to the invisible labour that makes AI appear artificially autonomous.







DATA

(Information Reduction - Machine Bias)

Data are a set of values of qualitative or quantitative variables about one or more persons or objects.

SELECTION

> Selection is defined as the process of choosing individuals, groups, or data for analysis (sampling).

OPERATOR

Training data is prepared by a human operator.





GHOST WORKER

The term ‘ghost work’ refers to the invisible labour that makes AI appear artificially autonomous.









DATABASE FORMAT

(Format Framing - Machine Bias)

The Machine Learning Database, or MLDB, is an open-source system aimed at tackling big data machine learning tasks.

LABELLING

> Measurement in machine learning refers to choosing, collecting, or computing the features and labels to use in a prediction problem.









OPERATOR

Training data is prepared by a human operator.





GHOST WORKER

The term ‘ghost work’ refers to the invisible labour that makes AI appear artificially autonomous.







METADATA / LABELS

(Category Reduction - Machine Bias)

Metadata is descriptive data that labels a piece of information and provides meaning to what that piece of information is.

PREPROCESSING

> Data preprocessing involves transforming raw data into well-formed datasets so that data mining and analytics can be applied. Raw data is often incomplete and inconsistently formatted.
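A minimal sketch of such a cleaning step in plain Python (the field names and rules are hypothetical): incomplete records are dropped and inconsistent formatting is normalised.

```python
def preprocess(raw_rows):
    """Normalise inconsistent formatting and drop incomplete rows."""
    clean = []
    for row in raw_rows:
        name, value = row.get("name"), row.get("value")
        if name is None or value is None:
            continue                      # incomplete record: drop it
        clean.append({"name": name.strip().lower(),  # trim and lowercase text
                      "value": float(value)})        # coerce numbers to float
    return clean

raw = [{"name": " Alice ", "value": "3.5"}, {"name": "Bob"}]
print(preprocess(raw))  # [{'name': 'alice', 'value': 3.5}]
```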

BENCHMARKS

TEST DATA

TRAIN-TEST SPLIT

>The train-test split is a technique for evaluating the performance of a machine learning algorithm. It can be used for classification or regression problems and can be used for any supervised learning algorithm.

TRAINING DATA
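The train-test split described above can be sketched in plain Python (the dataset and 80/20 ratio are illustrative assumptions; libraries such as scikit-learn provide production-ready versions):

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle the data and split it into training and test subsets."""
    rng = random.Random(seed)
    shuffled = data[:]                    # copy so the original stays untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

samples = list(range(10))                 # 10 hypothetical samples
train, test = train_test_split(samples)
print(len(train), len(test))  # 8 2
```

Fixing the seed makes the split reproducible, so model evaluation can be repeated on exactly the same held-out data.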

3. MODEL BUILDING
> A machine learning model is an expression of an algorithm that combs through mountains of data to find patterns or make predictions.
> Fueled by data, machine learning (ML) models are the mathematical engines of artificial intelligence.







MODEL

MODEL

MODEL

MODEL

MODEL

MODEL

> After the final model is chosen, the performance of the model on the test data is reported. The test data is not used before this step, to ensure that the model’s performance is a true representation of how it performs on unseen data.
> Aside from the test data, other available datasets — also called benchmark datasets — may be used to demonstrate model robustness or to enable comparison to other existing methods.






EVALUATION

"Where it all begins and ends"






OPERATOR

Training data is prepared by a human operator.





GHOST WORKER

The term ‘ghost work’ refers to the invisible labour that makes AI appear artificially autonomous.













TESTING ENVIRONMENT

Model Evaluation

Model testing is the process in which the performance of a fully trained model is evaluated on a testing set.
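For a classifier, this evaluation often reduces to comparing predictions against the true labels of the testing set. A minimal sketch in plain Python (the predictions and labels are made-up values):

```python
def accuracy(predictions, labels):
    """Fraction of test examples the model classified correctly."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical model outputs vs. ground-truth test labels
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```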













Algorithm Architecture

Topology

Model fitting is a measure of how well a machine learning model generalizes to data similar to that on which it was trained.





STATISTICAL INFERENCE

Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset. More input features often make a predictive modeling task more challenging to model.
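One standard technique for this is principal component analysis (PCA). A small sketch with NumPy (the data is random and the implementation is a bare-bones illustration, not a substitute for a library routine): 5 input variables are projected down to 2.

```python
import numpy as np

def pca_reduce(X, k=1):
    """Project the data onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                    # centre each feature
    cov = np.cov(Xc, rowvar=False)             # feature covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # eigendecomposition (ascending)
    top = vecs[:, np.argsort(vals)[::-1][:k]]  # k largest-variance directions
    return Xc @ top

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # 100 samples, 5 input variables
Z = pca_reduce(X, k=2)
print(Z.shape)  # (100, 2)
```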




Curve fitting is the process of constructing a curve that has the best fit to a series of data points. When we predict values that fall within the range of data points taken it is called interpolation. When we predict values for points outside the range of data taken it is called extrapolation.
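The interpolation/extrapolation distinction can be shown with a least-squares line fit in NumPy (the data points are fabricated to lie on y = 2x + 1):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])      # observed range: 0 to 3
y = 2 * x + 1
coeffs = np.polyfit(x, y, deg=1)        # least-squares fit of a degree-1 curve
fit = np.poly1d(coeffs)

print(fit(1.5))   # interpolation: a point inside the observed range
print(fit(10.0))  # extrapolation: a point outside the observed range
```

Extrapolated values rest entirely on the assumption that the fitted trend continues beyond the data, which is where predictions are most fragile.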
> Learning models are frameworks that define the mechanism of learning, that is, any form of acquiring new skills or information. [These models have sub-categories that further divide into various learning styles.]




4. IMPLEMENTATION

MODEL OUTPUT

A machine learning model is a file that has been trained to recognize certain types of patterns. You train a model over a set of data, providing it an algorithm that it can use to reason over and learn from those data.

BLACK BOX HORIZON

multidimensional vector space

“Black box” is shorthand for models that are sufficiently complex that they are not straightforwardly interpretable to humans.




NATURALIZATION OF BIAS




Pattern Recognition
CLASSIFICATION


Present World
Pattern Generation
PREDICTION


Future World

POSTPROCESS, INTEGRATE INTO SYSTEM
AND HUMAN INTERPRETATION
> Once a model has been trained, there are various post-processing steps that may be needed.
> If the output of a model performing binary classification is a probability, but the desired output to display to users is a categorical answer, there remains a choice of what threshold(s) to use to round the probability to a hard classification.
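This thresholding choice can be sketched in a few lines of plain Python (the probability, labels, and thresholds are hypothetical): the same model output yields different categorical answers depending on the threshold chosen.

```python
def to_label(probability, threshold=0.5):
    """Round a model's probability output to a hard categorical answer."""
    return "positive" if probability >= threshold else "negative"

print(to_label(0.73))                  # positive (default 0.5 threshold)
print(to_label(0.73, threshold=0.8))   # negative (same output, stricter cut)
```

Because the threshold is set by humans after training, it is one more point where judgement, and therefore bias, enters the pipeline.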

REAL WORLD IMPLICATIONS