Measuring Well-being

Well-being

Central to our guiding principles is the use of empathic AI to measure and improve human well-being. We see this as an urgent need that makes the development of empathic AI worthwhile. As AI gets smarter, it becomes increasingly essential to ensure that it learns to accomplish its objectives using methods aligned with human well-being. The principal goal of these guidelines is to ensure that empathic AI is used to improve human emotional well-being (and not the contrary).

Guiding Principles

Given that well-being is key to the ethical deployment of empathic AI, it is critical that we define human well-being and identify ways it can be measured. First, a definition: well-being is the experience of living a good life. It is the experience of a life that is, on balance, characterized by comfort, health, happiness, fulfillment, and a desired level of variety or richness in emotional experience. This brings to the fore three key tenets of measuring and optimizing for well-being:

No single measure of well-being is perfect, but many are adequate.

No single measure of well-being reigns supreme in the eyes of scientists, philosophers, or poets. Yet it is not impossible to measure well-being. There are many valid metrics to choose from, just as there are many ways to be happy. Developers of empathic AI should strive to optimize for the most contextually appropriate measure. They should also ensure that improvements in one measure or facet of well-being do not come at the expense of others. It is therefore essential to use multiple complementary measures, particularly in high-risk applications.
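As a rough illustration of checking that a gain in one measure is not paid for by a loss in another, the sketch below compares mean scores per measure before and after a change and lists any measure that fell. The function name, the measure names, and the scores are hypothetical, and higher scores are assumed to mean better well-being. A difference in means is only a screening step, not a statistical or causal test.

```python
from statistics import mean

def regressed_measures(baseline, treated, tolerance=0.0):
    """Return names of measures whose mean fell by more than `tolerance`.

    `baseline` and `treated` map a measure name to a list of scores,
    where higher is assumed to mean better well-being.
    """
    regressed = []
    for name, base_scores in baseline.items():
        delta = mean(treated[name]) - mean(base_scores)
        if delta < -tolerance:
            regressed.append(name)
    return regressed

# Hypothetical scores, for illustration only.
baseline = {"life_satisfaction": [6, 7, 5, 6], "sleep_quality": [7, 7, 6, 8]}
treated = {"life_satisfaction": [7, 8, 6, 7], "sleep_quality": [5, 6, 5, 6]}

# Life satisfaction rose, but sleep quality fell, so the change is flagged.
print(regressed_measures(baseline, treated))  # ['sleep_quality']
```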

When optimizing for well-being, preserve emotional richness.

Well-being is not just about feeling "positive." We are not always better off smiling. We like feeling different emotions in different contexts: horror felt in response to a horror movie is, to many, a sought-after experience. We also enjoy experiencing a variety of positive emotions, from awe to amusement to love. But this should not discourage developers from training AI to increase positive emotions on balance. Such AI should simply be trained to do so without reducing the overall variety of emotions we experience and express.
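One possible way to put a number on "variety of emotions" is the Shannon entropy of the emotion labels observed for a user over some period. This is an assumption made for illustration, not a measure the guidelines prescribe. The label set, the function names, and the 0.25-bit threshold are all hypothetical.

```python
import math
from collections import Counter

def emotion_entropy(labels):
    """Shannon entropy, in bits, of a sequence of emotion labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p) over labels; 0.0 when only one label occurs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def variety_preserved(before, after, max_drop_bits=0.25):
    """True if entropy fell by no more than `max_drop_bits` (arbitrary threshold)."""
    return emotion_entropy(before) - emotion_entropy(after) <= max_drop_bits

# Hypothetical label sequences, for illustration only.
before = ["joy", "awe", "amusement", "sadness"] * 2
after = ["joy"] * 7 + ["awe"]

# The user reports mostly joy afterwards, so variety has collapsed.
print(variety_preserved(before, after))  # False
```

Entropy treats all emotions as interchangeable, so a check like this would complement, not replace, measures of whether emotions are positive on balance.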

Algorithms should be tested for their causal effects on well-being.

Developers should take observational measures of well-being into consideration to ensure algorithms are designed in a manner likely to improve well-being. But they should be aware that observational measures of well-being do not allow for causal inference. Before deploying an algorithm, standard experimental tests—such as A/B tests—should be used to evaluate its causal effects on well-being. Such tests should adhere to applicable standards of research ethics.
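A minimal sketch of how the results of such an A/B test might be analyzed, assuming users were randomly assigned to a control or treatment condition and each reported a numeric well-being score. A permutation test on the difference in means is used here as one simple choice. The guidelines do not name a specific test, and the function name and scores are hypothetical.

```python
import random

def permutation_test(control, treatment, n_permutations=10_000, seed=0):
    """Return (observed difference in means, two-sided p-value)."""
    rng = random.Random(seed)
    n_control = len(control)
    n_treatment = len(treatment)
    observed = sum(treatment) / n_treatment - sum(control) / n_control
    pooled = list(control) + list(treatment)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign scores to conditions at random, as if the label did not matter.
        rng.shuffle(pooled)
        diff = sum(pooled[n_control:]) / n_treatment - sum(pooled[:n_control]) / n_control
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical self-report scores on a 1-10 scale, for illustration only.
control = [5, 6, 5, 7, 6, 5, 6, 4]
treatment = [7, 8, 6, 7, 8, 7, 6, 8]

effect, p = permutation_test(control, treatment)
print(f"difference in means: {effect:.2f}, p-value: {p:.4f}")
```

Random assignment is what licenses a causal reading of the difference. The same comparison on self-selected groups would be observational.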

With these tenets in mind, here we provide a summary of widely accepted measures of human well-being. Given that well-being is subjective, it is most directly measured using self-report instruments. But there are also many objective proxies of well-being that can be measured reliably and unobtrusively. We summarize a range of widely accepted self-report measures and objective proxies in the tables below.

Self-Report Measures and Objective Proxies

Self-Report Measures

Standards of Measurement

The following are recommended standards for ongoing measurement of well-being within an application or condition of an A/B test, by risk level and number of users. We recommend discontinuing any application of empathic AI whose benefits do not substantially outweigh its costs for the well-being of users and other affected parties.

Low Risk

Low-risk area AND used for <30 min. per day on average.

<100 daily users: None (informed consent is sufficient).

100-1K daily users: Track at least one measure of well-being across a minimum of 20 randomly sampled users on at least a weekly basis.

1K-10K daily users: Track at least one Tier 1 measure of well-being across a minimum of 50 randomly sampled users on at least a weekly basis.

10K-100K daily users: Track at least two measures of well-being, including a Tier 1 measure, across a minimum of 200 randomly sampled users on at least a weekly basis.

100K-1M daily users: Track at least two measures of well-being, including a Tier 1 measure, across a minimum of 400 randomly sampled users on a daily basis.

1M+ daily users: Track at least two Tier 1 measures of well-being across a minimum of 2,000 randomly sampled users on a daily basis. Provide public access to anonymized results.

Medium Risk

Medium-risk area OR used for 30 min. to 1 hour per day on average.

<100 daily users: Track at least one measure of well-being in every user on at least a weekly basis.

100-1K daily users: Track at least one Tier 1 measure of well-being across a minimum of 40 randomly sampled users on at least a weekly basis.

1K-10K daily users: Track at least two measures of well-being, including a Tier 1 measure, across a minimum of 100 randomly sampled users on a daily basis.

10K-100K daily users: Track at least three measures of average well-being, including a Tier 1 measure, across a minimum of 300 randomly sampled users on a daily basis.

100K-1M daily users: Track at least three measures of average well-being, including two Tier 1 measures, across a minimum of 1,000 randomly sampled users on a daily basis. Provide public access to anonymized results.

1M+ daily users: Track at least three measures of average well-being, including two Tier 1 measures, across a minimum of 5,000 randomly sampled users on a daily basis. Provide public access to anonymized results.

High Risk

High-risk area OR used for >1 hour per day on average.

<100 daily users: Track at least one Tier 1 measure of well-being in every user on at least a weekly basis.

100-1K daily users: Track at least two measures of well-being, including a Tier 1 measure, in every user on at least a weekly basis.

1K-10K daily users: Track at least three measures of well-being, including a Tier 1 measure, across a minimum of 500 randomly sampled users on a daily basis.

10K-100K daily users: Track at least four measures of well-being, including two Tier 1 measures, across a minimum of 1,000 randomly sampled users on a daily basis. Provide public access to anonymized results.

100K-1M daily users: Track at least four measures of well-being, including two Tier 1 measures and another Tier 1 or 2 measure, across a minimum of 4,000 randomly sampled users on a daily basis. Provide public access to anonymized results.

1M+ daily users: Track at least four measures of well-being, including two Tier 1 measures and another Tier 1 or 2 measure, across a minimum of 10,000 randomly sampled users on a daily basis. Provide public access to anonymized results.
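The standards above can be read as a grid of risk level by daily-user band, and the sketch below encodes that reading as a lookup. All names are hypothetical. It rests on three assumptions the text does not settle: the overall risk level is the higher of the area risk and the usage risk, usage of exactly 30 or 60 minutes counts as medium, and each band includes its lower bound (so 100 users falls in 100-1K).

```python
from bisect import bisect_right
from typing import NamedTuple, Optional

class Requirement(NamedTuple):
    measures: int            # total measures of well-being to track
    tier1: int               # how many of them must be Tier 1
    sample: Optional[int]    # minimum random sample; None means every user
    cadence: str             # "weekly" (at least weekly) or "daily"
    public: bool = False     # provide public access to anonymized results
    tier1_or_2: int = 0      # further measures that must be Tier 1 or 2

R = Requirement
LEVELS = ["low", "medium", "high"]
# Upper bounds of the daily-user bands, assumed exclusive.
BAND_LIMITS = [100, 1_000, 10_000, 100_000, 1_000_000]

# One list per risk level, one entry per band from <100 up to 1M+.
STANDARDS = {
    "low": [None, R(1, 0, 20, "weekly"), R(1, 1, 50, "weekly"),
            R(2, 1, 200, "weekly"), R(2, 1, 400, "daily"),
            R(2, 2, 2_000, "daily", True)],
    "medium": [R(1, 0, None, "weekly"), R(1, 1, 40, "weekly"),
               R(2, 1, 100, "daily"), R(3, 1, 300, "daily"),
               R(3, 2, 1_000, "daily", True), R(3, 2, 5_000, "daily", True)],
    "high": [R(1, 1, None, "weekly"), R(2, 1, None, "weekly"),
             R(3, 1, 500, "daily"), R(4, 2, 1_000, "daily", True),
             R(4, 2, 4_000, "daily", True, 1),
             R(4, 2, 10_000, "daily", True, 1)],
}

def classify_risk(area_risk: str, minutes_per_day: float) -> str:
    """Combine the application's risk area with average daily usage."""
    if minutes_per_day < 30:
        usage = 0
    elif minutes_per_day <= 60:
        usage = 1
    else:
        usage = 2
    # "OR" in the definitions is read as: the higher of the two levels applies.
    return LEVELS[max(LEVELS.index(area_risk), usage)]

def required_monitoring(risk: str, daily_users: int) -> Optional[Requirement]:
    """Look up the standard; None means informed consent is sufficient."""
    return STANDARDS[risk][bisect_right(BAND_LIMITS, daily_users)]

# 45 minutes per day raises a low-risk area to medium risk.
risk = classify_risk("low", 45)
print(risk, required_monitoring(risk, 5_000))
```

Keeping the grid as data rather than branching logic makes it easier to audit each entry against the written standards.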
