Bias in AI

Sources and mitigation of bias in ML systems.

Definition

Bias in AI refers to systematic errors or unfair outcomes that arise from AI systems and disproportionately affect certain groups of people — typically along lines of race, gender, age, socioeconomic status, or other protected attributes. These biases can produce tangible harms: loan applications unjustly denied, resumes filtered out based on name, medical diagnoses missed for underrepresented populations, or facial recognition failing for darker skin tones. Understanding bias requires looking at the full pipeline, from data collection and labeling through model training to deployment and feedback loops.

Bias enters systems at multiple stages. Historical bias in training data encodes past discrimination — if a company historically hired fewer women in engineering, a model trained on that data will replicate the pattern. Measurement bias occurs when the proxies used in data collection are unequally accurate across groups; for example, using zip code as a proxy for creditworthiness encodes residential segregation. Label bias occurs when human annotators bring their own assumptions to tasks like toxicity detection or sentiment labeling. Representation bias arises when certain groups are simply underrepresented in training data, leading to worse performance for those groups.
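
To make representation bias tangible, an audit of how often each group appears in the data, and how its label rate compares, can be run before any training. The sketch below uses pandas; the column names and the tiny example table are hypothetical, not taken from any particular dataset.

import pandas as pd

def audit_representation(df: pd.DataFrame, group_col: str, label_col: str) -> pd.DataFrame:
    """
    Summarize how often each group appears and its positive-label rate.
    A small share of rows or an unusual label rate flags a group for review.
    """
    summary = df.groupby(group_col).agg(
        n_rows=(label_col, "size"),
        positive_rate=(label_col, "mean"),
    )
    summary["share_of_data"] = summary["n_rows"] / len(df)
    return summary

# Hypothetical example: a tiny hiring dataset with a binary "hired" label
data = pd.DataFrame({
    "gender": ["F", "M", "M", "M", "F", "M", "M", "M"],
    "hired":  [0, 1, 1, 0, 1, 1, 0, 1],
})
print(audit_representation(data, group_col="gender", label_col="hired"))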

Bias sits at the intersection of AI ethics and AI safety. Evaluation metrics provide the quantitative tools to detect and measure bias, while explainable AI can help identify where in a model's reasoning bias manifests. In regulated domains — hiring, lending, healthcare, criminal justice — bias detection and mitigation are legal requirements, not optional best practices. Bias audits should be conducted before deployment and monitored continuously in production.

How it works

Sources of bias

Bias enters pipelines through skewed or unrepresentative training data, proxy variables that correlate with protected attributes, biased human labels, and feedback loops where model outputs influence future data collection. Each source requires different detection and mitigation strategies.
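
One way to surface proxy variables is to measure how strongly each candidate feature tracks the protected attribute itself. The sketch below uses a simple correlation check; the synthetic income-like feature stands in for a real column such as a zip-code-derived variable.

import numpy as np

def proxy_strength(feature: np.ndarray, protected: np.ndarray) -> float:
    """
    Absolute Pearson correlation between a feature and a protected attribute.
    A high value means the feature can stand in for the attribute even if
    the attribute itself is excluded from training.
    """
    return float(abs(np.corrcoef(feature, protected)[0, 1]))

# Synthetic example: an income-like feature that tracks a binary attribute
rng = np.random.default_rng(0)
protected = rng.integers(0, 2, size=500)
income = 30_000 + 20_000 * protected + rng.normal(0, 5_000, size=500)

print(f"Proxy strength of income: {proxy_strength(income, protected):.2f}")

Correlation is only a first-pass heuristic; non-linear proxies need a stronger test, such as trying to predict the protected attribute from the feature with a small model.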

Detection with fairness metrics

Detection starts with stratified evaluation: compute prediction rates and error rates separately for each demographic group and compare them using fairness metrics such as demographic parity difference, equalized odds gaps, and per-group calibration (see the comparison table below). Gaps that exceed an agreed tolerance flag bias for investigation before deployment.
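
To make detection concrete, the sketch below computes per-group true positive and false positive rates and reports the largest gaps, i.e. an equalized odds check. The variable names and random data are illustrative only.

import numpy as np

def equalized_odds_gaps(y_true: np.ndarray, y_pred: np.ndarray, groups: np.ndarray) -> dict:
    """
    Return the largest true positive rate and false positive rate differences
    across groups. Values near 0 mean error rates are similar for every group.
    """
    tpr, fpr = {}, {}
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        tpr[g] = yp[yt == 1].mean()  # fraction of true positives predicted positive
        fpr[g] = yp[yt == 0].mean()  # fraction of true negatives predicted positive
    return {
        "tpr_gap": max(tpr.values()) - min(tpr.values()),
        "fpr_gap": max(fpr.values()) - min(fpr.values()),
    }

# Illustrative usage with random labels and predictions
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=1000)
y_pred = rng.integers(0, 2, size=1000)
groups = rng.choice(["A", "B"], size=1000)
print(equalized_odds_gaps(y_true, y_pred, groups))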

Mitigation strategies

Mitigation strategies fall into three categories. Pre-processing methods modify the training data: reweighting samples, resampling underrepresented groups, or collecting additional representative data. In-processing methods modify the training objective: adding fairness constraints, or using adversarial debiasing, where an auxiliary classifier tries to predict protected attributes from the model's representations and the main model is penalized whenever it succeeds. Post-processing methods adjust model outputs: setting group-specific decision thresholds to equalize rates across groups without retraining.
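
As one pre-processing example, reweighting can assign each (group, label) combination the weight it would have if group membership and the label were statistically independent. The sketch below implements that idea with the common expected-over-observed ratio; it is not a specific library's API, and the example data is synthetic.

import numpy as np

def reweigh(y: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """
    Per-sample weights that make group membership and the label look
    statistically independent: weight(g, y) = P(g) * P(y) / P(g, y).
    Under-represented (group, label) combinations get weights above 1.
    """
    weights = np.ones(len(y), dtype=float)
    for g in np.unique(groups):
        for label in np.unique(y):
            combo = (groups == g) & (y == label)
            observed = combo.mean()
            if observed > 0:
                expected = (groups == g).mean() * (y == label).mean()
                weights[combo] = expected / observed
    return weights

# Illustrative usage: pass the result as sample_weight to any estimator that supports it
rng = np.random.default_rng(2)
groups = rng.choice(["A", "B"], size=1000, p=[0.7, 0.3])
# Historical bias: group A receives positive labels twice as often as group B
y = (rng.uniform(size=1000) < np.where(groups == "A", 0.6, 0.3)).astype(int)
w = reweigh(y, groups)
print("Mean weight of positive samples:",
      {str(g): round(float(w[(groups == g) & (y == 1)].mean()), 2) for g in np.unique(groups)})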

When to use / When NOT to use

Use when:
- Model decisions affect people in regulated or sensitive domains (hiring, lending, healthcare)
- Deploying at scale where small error-rate disparities cause large aggregate harm
- Auditing a model before or after deployment
- Required by regulation to demonstrate non-discrimination

Avoid when:
- The model's output has no impact on people or their opportunities
- You have no access to the demographic data needed for stratified evaluation
- The ground truth labels are themselves too biased to serve as fair references
- All predictions are reviewed by experts who can override incorrect decisions

Comparisons

Fairness metrics compared, by what they measure and when to use them:

- Demographic parity: equal positive prediction rates across groups; use when equal representation is the goal.
- Equalized odds: equal true positive and false positive rates across groups; use when the consequences of errors should be equal.
- Calibration: predicted probabilities match actual outcome rates per group; use when score values are used for decisions.
- Individual fairness: similar individuals get similar predictions; use when case-by-case consistency is required.
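
For calibration in particular, a quick per-group check compares the mean predicted probability with the observed positive rate in each score bin. The sketch below assumes predicted probabilities are available rather than only hard labels, and uses synthetic scores that are calibrated by construction.

import numpy as np

def calibration_by_group(y_true: np.ndarray, y_prob: np.ndarray,
                         groups: np.ndarray, n_bins: int = 5) -> None:
    """
    Compare mean predicted probability with the observed positive rate
    in each score bin, separately for each group. Well-calibrated scores
    track the observed rate similarly for every group.
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    for g in np.unique(groups):
        mask = groups == g
        print(f"Group {g}:")
        for lo, hi in zip(bins[:-1], bins[1:]):
            in_bin = mask & (y_prob >= lo) & (y_prob < hi)
            if not in_bin.any():
                continue
            print(f"  scores [{lo:.1f}, {hi:.1f}): "
                  f"predicted {y_prob[in_bin].mean():.2f}, observed {y_true[in_bin].mean():.2f}")

# Synthetic scores that are calibrated by construction
rng = np.random.default_rng(3)
y_prob = rng.uniform(size=2000)
y_true = (rng.uniform(size=2000) < y_prob).astype(int)
groups = rng.choice(["A", "B"], size=2000)
calibration_by_group(y_true, y_prob, groups)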

Pros and cons

Pros:
- Reduces discriminatory harm to affected groups
- Builds legal and ethical compliance evidence
- Enables proactive detection before deployment
- Supports transparent reporting and accountability

Cons:
- Fairness metrics are mathematically incompatible; satisfying one often violates another
- Mitigation can reduce overall accuracy
- Requires demographic data that may be unavailable or sensitive
- Feedback loop bias is hard to detect without longitudinal monitoring

Code examples

Computing demographic parity difference with NumPy (Python)

import numpy as np

def demographic_parity_difference(y_pred: np.ndarray, groups: np.ndarray) -> float:
    """
    Compute the demographic parity difference across groups.
    Returns the difference in positive prediction rates.
    A value of 0 indicates perfect parity.
    """
    unique_groups = np.unique(groups)
    rates = {}
    for g in unique_groups:
        mask = groups == g
        rates[g] = y_pred[mask].mean()

    values = list(rates.values())
    diff = max(values) - min(values)
    print(f"Positive prediction rates by group: {rates}")
    print(f"Demographic parity difference: {diff:.4f}")
    return diff

# Example usage
np.random.seed(42)
y_pred = np.random.randint(0, 2, size=1000)
groups = np.random.choice(["A", "B"], size=1000, p=[0.6, 0.4])

demographic_parity_difference(y_pred, groups)
