Mission and Outcomes

Impact measures combine two separate estimates: program outcomes and cost. An important challenge is matching the costs incurred to the specific outcome under review, which is especially difficult when a nonprofit intervention generates multiple outcomes.
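To see why cost matching matters, the minimal Python sketch below divides a program's cost by a single outcome count. It then shows how charging the whole budget to one of several outcomes inflates the cost per outcome. The program, its figures, the helper names, and the 60/40 allocation shares are all hypothetical. This section does not prescribe an allocation rule, and a proportional split is only one possible assumption.

```python
def cost_per_outcome(cost: float, outcome_count: float) -> float:
    """Dollars spent per unit of outcome; requires a positive outcome count."""
    if outcome_count <= 0:
        raise ValueError("outcome_count must be positive")
    return cost / outcome_count


def allocate_cost(total_cost: float, shares: dict[str, float]) -> dict[str, float]:
    """Split a shared cost across outcomes using assumed shares that sum to 1."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {name: total_cost * share for name, share in shares.items()}


# Hypothetical program: one $100,000 budget that produces two outcomes.
total_cost = 100_000.0
outcomes = {"graduations": 20, "jobs_placed": 40}

# Naive: charge the entire budget to graduations alone.
naive = cost_per_outcome(total_cost, outcomes["graduations"])

# Assumed split: 60% of costs support graduations, 40% support job placement.
allocated = allocate_cost(total_cost, {"graduations": 0.6, "jobs_placed": 0.4})
matched = {name: cost_per_outcome(allocated[name], outcomes[name]) for name in outcomes}

print(naive)    # 5000.0
print(matched)  # {'graduations': 3000.0, 'jobs_placed': 1000.0}
```

Under these assumptions, the naive figure overstates the cost of a graduation by $2,000.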

Impact analyses are retrospective: they are based on data from the past.


A mission describes the change that a nonprofit seeks to achieve and the population or setting in which it seeks to achieve it. The choice of mission serves as the basis for the impact analysis.


Outcome metrics directly measure a program’s success in achieving its mission. Consider a program to reduce childhood mortality that provides free vaccinations to children in a rural area with poor medical infrastructure. The program might track the number of children vaccinated and the incidence of preventable childhood illnesses in the area it serves. Each of these measures provides information on the results of program activities.

Impact analysis distinguishes between primary and intermediate outcomes.

Primary outcomes are the ultimate target of a nonprofit program. A program’s success is indicated by the desired change in its primary outcomes. In the above example, the free vaccination program attempts to reduce childhood mortality; therefore, its primary outcome is “a reduction in childhood mortality.”

An intermediate outcome captures a step before the primary outcome that is valuable in and of itself. In the example above, reducing the incidence of disease is an intermediate outcome. Philanthropic interventions may generate multiple intermediate outcomes, each working separately to boost primary outcomes. We distinguish intermediate outcomes from outputs (e.g., number of vaccines delivered) and process changes (e.g., take-up of vaccines), although naming conventions here vary.

Outputs track delivery of a treatment (for example, 10 trainees participated in job training). For some program types, outputs are a close proxy for intermediate outcomes — and so merely counting the number of individuals receiving a program, without reference to the primary outcomes, may be an acceptable compromise. For example, we analyze the efficiency with which a soup kitchen delivers meals because there is evidence that meals reduce hunger and save beneficiaries money they would otherwise have to spend on food. In the absence of better data, meals serve as a reasonable measure of success. Vaccines provided, mosquito nets distributed and wells dug are other instances where outputs are clearly linked to improved health. We can often use research literature to model the relationship between outputs and primary outcomes.
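One way to model the link from outputs to outcomes has two steps. First, multiply an output count by a conversion rate drawn from research literature. Then divide cost by the resulting outcome estimate. The minimal Python sketch below assumes exactly that. The helper name and every figure (net count, cases averted per net, budget) are hypothetical placeholders, not published estimates.

```python
def outcome_from_outputs(outputs: float, outcomes_per_output: float) -> float:
    """Estimate outcomes from outputs using an externally sourced conversion rate."""
    return outputs * outcomes_per_output


# Hypothetical inputs: replace with program data and a rate taken from the literature.
nets_distributed = 10_000
cases_averted_per_net = 0.25   # assumed rate, for illustration only
program_cost = 50_000.0

cases_averted = outcome_from_outputs(nets_distributed, cases_averted_per_net)
print(cases_averted)                 # 2500.0
print(program_cost / cases_averted)  # 20.0 (dollars per case averted)
```

The estimate is only as good as the conversion rate, so the rate's source deserves as much scrutiny as the program's own counts.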

Other times, outputs are clearly not sufficient to establish impact. Consider “700 children taught with a new curriculum.” Most observers would not consider switching to a new curriculum “good” in and of itself unless doing so produced some desired change. By contrast, 700 additional children graduating from high school would typically be considered an outcome. In this case, implementing the new curriculum might have been a step toward achieving an outcome, but it is not valuable in and of itself.


Outcomes must be carefully defined and interpretable. A good outcome metric meets five criteria:

1. Quantities must be clear and explicit
“$1,000 reduces mortality for children under 5” is not an acceptable impact statement. “$1,000 saves the life of an additional child under 5” is. The latter clearly states the magnitude of the effect (one child’s life), while the former tells the reader only the direction of the effect, not its size.

2. Quantities must not be described using statistical terms or jargon
While “$200 improves literacy by 0.2 standard deviations for a student” is explicit and would be quite acceptable to report in an academic journal, it is not accessible to a general audience. Such measures can be converted into terms that someone without specialized training can understand: “$200 improves literacy by one primary school grade level for a student.”
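A minimal sketch of that conversion divides the effect size by a benchmark: how much students typically gain, in standard-deviation units, over one school year. The benchmark value below is an assumption. It was chosen only to reproduce the example above, where 0.2 standard deviations is treated as one grade level. Real benchmarks vary by grade, subject, and test, and should come from the research literature. The function name is hypothetical.

```python
def sd_to_grade_levels(effect_sd: float, annual_gain_sd: float) -> float:
    """Express an effect size in standard deviations as school years of learning.

    annual_gain_sd is an assumed benchmark: the average gain, in standard
    deviations, that students make over one school year on the same test.
    """
    if annual_gain_sd <= 0:
        raise ValueError("annual_gain_sd must be positive")
    return effect_sd / annual_gain_sd


# Assumed benchmark matching the example in the text; not a published figure.
print(sd_to_grade_levels(0.2, annual_gain_sd=0.2))  # 1.0
print(sd_to_grade_levels(0.1, annual_gain_sd=0.2))  # 0.5
```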

3. Outcome metrics should typically be presentable on a per-person or per-animal basis, with some exceptions
Often the best way to make an outcome understandable is to convert it into per-person terms. For example, a medical intervention’s effect on the under-5 mortality rate (deaths as a percentage of live births) can be converted into an estimate of the absolute number of lives saved. If an outcome is discrete (e.g., “graduations”), this sort of conversion avoids reporting something like “$10,000 generates an average of 3.458 high school graduations,” and lets us report instead that roughly $2,890 generates one additional graduation. If an outcome is continuous or roughly continuous, it is acceptable to report the averaged effect. For example, “$400 increases earnings by $750 on average” is acceptable because it is intuitive for most readers. Even better might be “$1 increases earnings by about $1.88 on average.”
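The three conversions described above fit in a few lines of Python. The mortality inputs (2,000 live births and a fall from 5.0% to 4.0%) are hypothetical. The graduation and earnings figures are the ones quoted in the text. The function names are illustrative, not part of any established toolkit.

```python
def lives_saved(births: int, baseline_rate_pct: float, treated_rate_pct: float) -> float:
    """Convert a drop in under-5 mortality (deaths per 100 live births) into lives saved."""
    return births * (baseline_rate_pct - treated_rate_pct) / 100


def cost_per_unit(cost: float, units: float) -> float:
    """Dollars required to produce one unit of a discrete outcome."""
    return cost / units


def return_per_dollar(cost: float, gain: float) -> float:
    """Average gain generated by each dollar spent, for continuous outcomes."""
    return gain / cost


# Hypothetical: mortality falls from 5.0% to 4.0% of 2,000 live births.
print(lives_saved(2_000, 5.0, 4.0))         # 20.0

# Figures from the text: $10,000 generates 3.458 graduations.
print(round(cost_per_unit(10_000, 3.458)))  # 2892

# Figures from the text: $400 increases earnings by $750.
print(return_per_dollar(400, 750))          # 1.875
```

Read this way, $10,000 producing 3.458 graduations becomes about $2,892 per additional graduation. Likewise, $400 producing $750 works out to $1.875 per dollar, or about $1.88, or $2 at whole-dollar precision.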

4. Outcome metrics must be singular
As noted earlier, our impact measure handles one outcome at a time. But some outcomes are composed of multiple dimensions. Take “psychosocial wellbeing,” which covers anxiety, depression, the absence of personality disorders, the existence of a robust social support network, and more. We can calculate an average impact across the dimensions, but any such average masks the variation among them. The same average could reflect modest effects on virtually every dimension or large gains on some dimensions offset by losses on others. This issue generally comes up when nonprofits report outcomes on multidimensional scales.
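Made-up numbers make the masking problem easy to see. In this hypothetical sketch, two interventions have the same unweighted composite score. One improves every dimension a little. The other helps two dimensions and harms the other two. The dimension names, the effect sizes, and the choice of an unweighted mean are illustrative assumptions.

```python
# Hypothetical effects of two interventions on four wellbeing dimensions,
# in illustrative scale points where positive values mean improvement.
steady = {"anxiety": 0.25, "depression": 0.25, "personality": 0.25, "social_support": 0.25}
uneven = {"anxiety": 1.0, "depression": -0.5, "personality": 0.75, "social_support": -0.25}


def composite(effects: dict[str, float]) -> float:
    """Unweighted average across dimensions: the kind of score that hides variation."""
    return sum(effects.values()) / len(effects)


for name, effects in [("steady", steady), ("uneven", uneven)]:
    spread = max(effects.values()) - min(effects.values())
    print(name, composite(effects), spread)
# steady 0.25 0.0
# uneven 0.25 1.5
```

Both composites equal 0.25. Only the per-dimension values reveal that the second intervention makes depression and social support worse.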

5. Outcomes must be measured using a reliable format
While some outcomes are easily quantified with simple measures (dollars earned, degrees obtained, diseases averted), others are not. The latter typically include mental states (e.g., “is depressed”) and statuses (“kindergarten ready,” “financially secure”). To quantify these, researchers and practitioners have developed scales, which vary widely in sophistication and extent of use. Because scales can be confusing and leave readers wondering “so what?”, we only accept outcomes presented in scales if:

a. The scale is widely used or known in the field. For example, the Kessler Psychological Distress Scale (in its 6- and 10-item versions, the K6 and K10) is an accepted and widely used method of screening for depression and anxiety in patients. Impacts on average score or on the number of individuals in distress as measured by the scale would be readily interpretable by anyone familiar with mental health work. We would not report impacts on depression as measured by a proprietary scale, that is, a scale not widely known or used outside the nonprofit, even if the scale was developed with admirable statistical sophistication.

b. The scale must be validated: it produces consistent results on retesting, it can be and has been used to improve prediction, and techniques like factor analysis or principal component analysis have shown that it does not include extraneous questions or additional, unwanted dimensions. In general, if a scale is widely used or known, it will have passed the test of validity.

c. The scale measure used expresses a single concept (i.e., “dimension”) rather than multiple concepts averaged together. Some scales (e.g., K6) are single-dimensional by nature. Others intentionally capture multiple dimensions (e.g., kindergarten readiness comprises cognitive, emotional and social readiness). The latter type of scale can be used, but we will report on a single concept or dimension only, not on an average score across all dimensions.

If a scale provides a score on a dimension you have never heard of or that does not appear to capture an outcome with some intuitive meaning, it has likely failed one or more of the above criteria. A widely used, validated scale will give meaning to the dimensions it quantifies.
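Criterion a mentions two ways to report an impact on the K6: the change in average score and the change in the number of people in distress. The minimal sketch below computes both. It assumes the standard K6 format of six items, each scored 0 to 4, giving totals from 0 to 24. It also assumes the commonly used cutoff of 13 or more for serious psychological distress. Confirm the scoring and cutoff against the instrument's documentation before relying on them. The helper names and all responses are hypothetical, and the before/after layout only illustrates the two reporting formats.

```python
K6_ITEMS = 6
K6_ITEM_MAX = 4       # each item runs from 0 (none of the time) to 4 (all of the time)
DISTRESS_CUTOFF = 13  # commonly used cutoff for serious distress; verify for your setting


def k6_total(item_scores: list[int]) -> int:
    """Sum six item responses into a 0-24 K6 total, rejecting malformed input."""
    if len(item_scores) != K6_ITEMS or not all(0 <= s <= K6_ITEM_MAX for s in item_scores):
        raise ValueError("expected six item scores between 0 and 4")
    return sum(item_scores)


def summarize(responses: list[list[int]]) -> tuple[float, int]:
    """Return (average K6 total, number of people at or above the cutoff)."""
    totals = [k6_total(r) for r in responses]
    return sum(totals) / len(totals), sum(t >= DISTRESS_CUTOFF for t in totals)


# Hypothetical responses from the same three participants before and after a program.
before = [[3, 3, 2, 3, 2, 2], [1, 0, 1, 1, 0, 1], [3, 3, 2, 2, 2, 2]]
after = [[1, 2, 1, 2, 1, 1], [0, 0, 1, 0, 0, 1], [2, 2, 1, 1, 1, 1]]

print(summarize(before))  # (11.0, 2)
print(summarize(after))   # (6.0, 0)
```

The two outputs support statements such as “average distress fell by 5 points” or “two fewer participants scored in the serious-distress range.”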