<aside>

WORKING DRAFT FOR PUBLIC FEEDBACK

</aside>




Overview

Given growing concerns across stakeholder groups about algorithmic bias, validity, and reliability, increasing attention is being paid to ensuring that the datasets on which AI systems are built and tested are robust and high quality. Such datasets require both that data be collected from a broad and diverse set of sources and populations (e.g., for facial recognition systems, datasets should cover the full diversity of skin tones and facial features) and that data be annotated in ways that reflect different context-specific value systems and understandings of the world (e.g., for content moderation systems, words should be labeled as “obscene” in ways that reflect region-specific notions of “obscenity”). Stakeholders may be engaged both to serve as data providers (e.g., people knowingly volunteer their personal and user data to help diversify datasets) and to improve data annotation or enrichment (e.g., people from many different contexts are employed to provide annotation services).
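To illustrate what context-specific annotation can look like in practice, here is a minimal sketch in Python of an annotation record that keeps the annotator's regional context alongside each label. The field names, region codes, and example labels are assumptions made for this illustration, not details from any particular system.

```python
# Illustrative sketch only: a minimal annotation record that stores the
# annotator's regional context alongside each label, so that
# region-specific judgments (e.g., of "obscenity") remain traceable.
# All field names and values here are hypothetical.
from dataclasses import dataclass

@dataclass
class Annotation:
    item_id: str           # identifier of the text or image being labeled
    label: str             # e.g., "obscene" or "not_obscene"
    annotator_region: str  # locale whose norms informed the judgment
    rationale: str         # free-text note on the context-specific reasoning

# The same item can legitimately receive different labels in different regions.
examples = [
    Annotation("post-123", "obscene", "en-GB", "slang considered vulgar locally"),
    Annotation("post-123", "not_obscene", "en-US", "same word is innocuous here"),
]
for annotation in examples:
    print(annotation)
```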

Example

A tech company is refining its facial recognition feature (as part of its aim to improve biometric access to devices) and wants to ensure that it works accurately for all types of faces. The development team has found that the training and testing datasets require updating and wants to collect 50 images from 5,000 people with diverse skin tones and facial characteristics to supplement the existing dataset.
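To make the collection target concrete, here is a minimal sketch in Python of how a team might split the 5,000 participants into per-group recruitment quotas. The skin-tone categories (the six Fitzpatrick types), the even-split strategy, and the 50-images-per-person reading of the target are all illustrative assumptions, not details prescribed by the use case.

```python
# Illustrative sketch only: computes per-group participant quotas for the
# collection effort described above. The Fitzpatrick skin-tone categories
# and the even-split strategy are assumptions made for this example; a real
# effort would choose strata with domain experts and affected communities.

TOTAL_PARTICIPANTS = 5_000  # people to recruit (from the example)
IMAGES_PER_PERSON = 50      # images per participant (assumed reading of "50 images")

skin_tone_groups = ["I", "II", "III", "IV", "V", "VI"]  # hypothetical strata

def quota_per_group(total: int, groups: list[str]) -> dict[str, int]:
    """Split a recruitment total evenly across groups, distributing any
    remainder one participant at a time to the earliest groups."""
    base, remainder = divmod(total, len(groups))
    return {g: base + (1 if i < remainder else 0)
            for i, g in enumerate(groups)}

quotas = quota_per_group(TOTAL_PARTICIPANTS, skin_tone_groups)
for group, n_people in quotas.items():
    print(f"Fitzpatrick type {group}: {n_people} participants, "
          f"{n_people * IMAGES_PER_PERSON} images")
```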

Practices





© 2024 Partnership on AI | All Rights Reserved