Using Benchmarks to Analyze Cost-Effectiveness
We analyze the impact of nonprofits: the good a nonprofit achieves relative to the cost of achieving it. In this post, we describe how we then interpret that impact by comparing it with a benchmark. We also discuss how we select benchmarks, and some of the critiques of our benchmarks and of benchmarking in general. For a specific example of benchmarking, we recommend our work on food distribution programs, which are assessed against a relatively straightforward benchmark.
Why benchmark? Impact calculations alone don’t tell us whether a nonprofit is making good use of its resources. If a nonprofit is pursuing some mission (e.g., reducing hunger), we should be fairly agnostic about how it does so. Two programs, such as food vouchers and a soup kitchen, may both reduce hunger, but at different costs. We should choose the program that achieves more for each dollar spent.
When we compare two programs directly, judging which is better is fairly straightforward (unless their evidence differs greatly in quality or we suspect there are secondary effects we are not capturing). More generally, though, how do we know whether a single program is “good” or “not good”? Here, we turn to benchmarks. A benchmark is simply a threshold we choose: a program whose cost-effectiveness meets or beats it is deemed good. The choice of a benchmark is usually rooted in real-world data, such as the cost of a meal. However, benchmarks are ultimately value judgments, and reasonable people can disagree over them.
When analyzing nonprofits, we generally believe benchmarks should be set separately for each outcome. We don’t attempt to set a universal benchmark (so many utils per dollar, say). This isn’t because doing so is impossible, but because, as a nonprofit rater, we have decided it isn’t our place to judge the relative value of different causes.1 (This isn’t to say no one should be making those judgments; we admire the work of groups such as GiveWell.)
Below, we describe two strategies for setting benchmarks. First, the “market alternative” strategy: if a nonprofit provides its beneficiaries with a good that fulfills a basic need, the benchmark is set to the price of that good on the market. Second, benchmarks can be set using established norms such as “internal rate of return” for income-generating programs and “cost per disability-adjusted life year averted” for health programs.
The market alternative strategy is appropriate for goods that fulfill basic human needs (e.g., food, water, shelter). People need these goods to survive, whether they obtain them through charity or through market transactions. Consider the following thought experiment: a nonprofit could spend its program resources to provide a meal to a person in need, or it could instead give those resources to the beneficiary, who could use them to buy an equivalent meal on the market. Under the market alternative framework, the nonprofit’s program is cost-effective if its cost to provide a meal is about the same as, or lower than, the price a beneficiary would have to pay for a meal on the market.2
Prices vary by geography. If we compared every program against a single price, we would often find that the least cost-effective programs are in developed countries and, within those, in major metropolitan areas, where prices are highest. There is logic to directing your dollar to where it will do the most good. That said, we see our role as meeting donors’ demand for information, not shaping opinion, and geography clearly matters to donors: few give to a food bank outside their hometown.
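As a rough sketch of the market alternative test, the function below divides a program’s cost per meal by the market price of a meal and maps the result onto the three bands listed in footnote 2. The cutoffs are the “typical” values given there; treating exactly 75 percent as cost-effective and exactly 125 percent as not cost-effective is our assumption, and the function name and inputs are illustrative rather than part of any published tool.

```python
def market_alternative_rating(cost_per_meal: float, market_price: float) -> str:
    """Classify a meal program against its market-price benchmark.

    Cutoffs (75% and 125% of market price) follow the typical values in
    footnote 2; how the exact boundary values are assigned is an assumption.
    """
    if market_price <= 0:
        raise ValueError("market_price must be positive")
    ratio = cost_per_meal / market_price  # program cost as a share of market price
    if ratio < 0.75:
        return "highly cost-effective"
    if ratio < 1.25:
        return "cost-effective"
    return "not cost-effective"


# Hypothetical figures: a program spends $2.10 per meal where a meal costs $3.00.
print(market_alternative_rating(2.10, 3.00))  # highly cost-effective
print(market_alternative_rating(4.00, 3.00))  # not cost-effective
```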
To account for this, we use local prices wherever possible. For meals, we construct a benchmark for each nonprofit that is based on the average price of a meal in the nonprofit’s service area.3
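A minimal sketch of a service-area benchmark, assuming county-level meal prices are already in hand (footnote 3 describes the actual data source). The county names and prices below are made up, and the optional weighting by meals served in each county is a hypothetical variant; the text says only that the benchmark is based on the average price of a meal in the service area.

```python
from typing import Optional


def local_meal_benchmark(
    county_meal_prices: dict[str, float],
    service_area: list[str],
    meals_served: Optional[dict[str, float]] = None,
) -> float:
    """Average meal price across the counties a nonprofit serves.

    Without weights this is a simple mean. Weighting by meals served per
    county is a hypothetical variant, not a documented rule.
    """
    if not service_area:
        raise ValueError("service_area must list at least one county")
    prices = [county_meal_prices[county] for county in service_area]
    if meals_served is None:
        return sum(prices) / len(prices)
    weights = [meals_served[county] for county in service_area]
    return sum(p * w for p, w in zip(prices, weights)) / sum(weights)


# Hypothetical county prices (illustrative only, not Map the Meal Gap values).
prices = {"County A": 3.00, "County B": 3.40, "County C": 2.90}
print(round(local_meal_benchmark(prices, ["County A", "County B"]), 2))  # 3.2
print(round(local_meal_benchmark(prices, ["County A", "County B"],
                                 {"County A": 1000, "County B": 3000}), 2))  # 3.3
```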
We believe the best approach to setting benchmarks is to follow established principles in the sector when possible. Two examples follow: internal rate of return and cost per disability-adjusted life year averted.
Internal Rate of Return (I.R.R.)
Internal rate of return (I.R.R.) is one of the most widely used methods for judging whether an investment of resources designed to produce a financial return is worthwhile. Like its cousin, return on investment (R.O.I.)4, I.R.R. compares expected returns with costs. It answers the question: does this program generate enough value to cover its costs?
I.R.R. can be calculated for any income-boosting program, whether it aims to raise income immediately via a transfer of resources, in the future (for instance, by raising beneficiaries’ earning potential), or both. Consider scholarships for students to pursue postsecondary education. We measure the impact of a scholarship program by how much it boosts recipients’ future income (for more details, read our methodology for scholarship programs).
I.R.R. does not itself point to the right benchmarks, beyond marking the break-even point. It does, however, enable comparison across a wide range of interventions. We set benchmarks such that a program receives 5 stars if its I.R.R. is at least 50 percent (i.e., it boosts income by at least 1.5 times its total program cost). To receive 4 stars, a program must boost income by at least 85 percent of its total program cost (an I.R.R. of at least −15 percent on the same basis). In practice, these benchmarks mean that a nonprofit must generate nearly as much in future income as it spends to run its scholarship program in order to earn 4 stars, and substantially more in future income to earn 5 stars. So, if a nonprofit spends $15,000 on a scholarship that increases the beneficiary’s future earnings by $30,000, that is a 5-star nonprofit.
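Because income gains from a program like a scholarship arrive years after the money is spent, comparing them with costs requires discounting them to present value (the “time discounting” described in footnote 4). The sketch below assumes a hypothetical 5 percent annual discount rate and a made-up stream of earnings gains; the rate ImpactMatters actually applies is set out in its Methodology for Estimating Impact, not here.

```python
def present_value(yearly_amounts: list[float], discount_rate: float) -> float:
    """Discount a stream of yearly amounts back to today.

    yearly_amounts[0] occurs now (year 0), yearly_amounts[1] one year out,
    and so on; each is divided by (1 + discount_rate) ** year.
    """
    return sum(amount / (1 + discount_rate) ** year
               for year, amount in enumerate(yearly_amounts))


# Hypothetical scholarship: a $3,000 earnings gain in each of years 1 through 10,
# discounted at an assumed 5 percent a year.
income_gains = [0.0] + [3000.0] * 10
print(round(present_value(income_gains, 0.05)))  # about 23,165, versus $30,000 undiscounted
```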
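The star thresholds above can be sketched as a small function. It follows the text’s own convention, in which an I.R.R. of 50 percent means income gains equal 1.5 times program cost, and it assumes both inputs are already in present-value terms. The function name and the “below 4 stars” label are illustrative, since the text does not describe lower ratings.

```python
def scholarship_stars(pv_income_gain: float, pv_program_cost: float) -> str:
    """Star band for an income-boosting program, using the thresholds in the text.

    Under the text's convention, I.R.R. = (income gain / cost) - 1, so
    I.R.R. >= 50% means a ratio >= 1.5 and I.R.R. >= -15% means a ratio >= 0.85.
    """
    if pv_program_cost <= 0:
        raise ValueError("pv_program_cost must be positive")
    ratio = pv_income_gain / pv_program_cost
    if ratio >= 1.5:
        return "5 stars"
    if ratio >= 0.85:
        return "4 stars"
    return "below 4 stars"


print(scholarship_stars(30_000, 15_000))  # 5 stars (income gain = 2x cost)
print(scholarship_stars(13_500, 15_000))  # 4 stars (0.9x cost)
```

Comparing the ratio directly, rather than subtracting 1 first, keeps a program sitting exactly at 85 or 150 percent of cost from being misclassified by floating-point rounding.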
Cost per disability-adjusted life year averted
One of the most common methods health economists use to assess cost-effectiveness is cost per disability-adjusted life year (DALY) averted. A DALY is a year of full health lost to disability, poor health or premature death. In a single metric, the DALY captures two dimensions of life affected by poor health: quality of life and life expectancy. By estimating the DALYs associated with various health conditions (ranging in severity from near-sightedness to schizophrenia), health experts create a standardized language for comparing disease burdens across conditions and, in turn, the burdens eased by interventions targeting different conditions. We follow this convention for nonprofit programs that aim to improve health outcomes.
To determine whether a health program is cost-effective, we compare its cost to avert one DALY with industry-standard benchmarks set by the World Health Organization (W.H.O.)5. Following the W.H.O., benchmarks are based on the gross domestic product (G.D.P.) per capita of the country or countries in which the program operates. If a program can avert one DALY for less than three times G.D.P. per capita, it is considered cost-effective (awarded 4 stars); if it averts one DALY for less than G.D.P. per capita, it is considered highly cost-effective (awarded 5 stars). The tacit assumption underlying the W.H.O. guidelines is that willingness to pay for better health is linked to income.6
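In its simplest form (leaving out the discounting and age weighting that some versions apply), a person’s DALYs add years of life lost to premature death (YLL) to years lived with disability (YLD), where each year lived with a condition counts in proportion to that condition’s disability weight DW, running from 0 (full health) to 1 (death). The DALYs a program averts are the difference between the burden without and with the intervention:

```latex
\mathrm{DALY} = \mathrm{YLL} + \mathrm{YLD},
\qquad
\mathrm{YLD} = \mathrm{DW} \times (\text{years lived with the condition})

\text{DALYs averted} = \mathrm{DALY}_{\text{without program}} - \mathrm{DALY}_{\text{with program}}
```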
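The W.H.O.-based thresholds reduce to two comparisons. In this sketch, gdp_per_capita is a single figure supplied by the caller; how figures for several countries should be combined for a multi-country program is not specified in the text and is left open. A cost exactly equal to a threshold is treated as not meeting it, matching the “less than” wording above.

```python
def who_star_rating(cost_per_daly: float, gdp_per_capita: float) -> str:
    """Apply the W.H.O.-based cost-per-DALY thresholds described in the text.

    gdp_per_capita is one figure; combining several countries' figures
    for a multi-country program is left to the caller (unspecified here).
    """
    if cost_per_daly < gdp_per_capita:
        return "5 stars (highly cost-effective)"
    if cost_per_daly < 3 * gdp_per_capita:
        return "4 stars (cost-effective)"
    return "below 4 stars"


print(who_star_rating(2_000, 8_903))   # 5 stars (highly cost-effective)
print(who_star_rating(20_000, 8_903))  # 4 stars (cost-effective)
```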
For example, to evaluate nonprofit programs that prevent or reverse blindness by surgically removing cataracts, we first calculate how many DALYs a successful cataract surgery averts. To do this, we refer to disability weights reported by the Global Burden of Disease. Based on these data, we estimate that a successful cataract surgery improves the quality of each remaining year of a beneficiary’s life by about 15.6 percent.7 Next, we multiply 15.6 percent by the number of years the average cataract patient has left to live (drawn from W.H.O. life tables) to get the total number of DALYs averted by a successful surgery. For a nonprofit operating in Mexico, for instance, each successful surgery averts about 2.886 DALYs (0.156 reduction in disability weight × 18.5 remaining years of life for the average 65-year-old cataract surgery patient in Mexico). We then calculate each nonprofit’s cost to avert one DALY and compare it with the benchmarks described above. Mexico’s G.D.P. per capita is $8,903, so the 5-star benchmark is $8,903 and the 4-star benchmark is $26,709 (three times G.D.P. per capita). A nonprofit that averts a DALY at a cost of $2,000 comes in well under the $8,903 benchmark and is awarded 5 stars.
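The cataract arithmetic can be reproduced in a few lines. This is a sketch: the disability weights (footnote 7), the 18.5 remaining years and Mexico’s G.D.P. per capita come from the text, but cost_per_successful_surgery is a hypothetical figure chosen so that the cost per DALY lands near the $2,000 used above, and the sketch ignores any discounting or surgery failure rate a full model might include.

```python
# Disability weights from footnote 7 (Global Burden of Disease).
DW_BLINDNESS_CATARACT = 0.187
DW_MODERATE_IMPAIRMENT_CATARACT = 0.031


def dalys_averted(weight_reduction: float, remaining_years: float) -> float:
    """DALYs averted when a disability weight falls for the rest of a patient's life."""
    return weight_reduction * remaining_years


weight_reduction = DW_BLINDNESS_CATARACT - DW_MODERATE_IMPAIRMENT_CATARACT  # about 0.156
dalys_per_surgery = dalys_averted(weight_reduction, 18.5)  # about 2.886

# Hypothetical: $5,772 spent per successful surgery (not a figure from the text).
cost_per_successful_surgery = 5_772
cost_per_daly = cost_per_successful_surgery / dalys_per_surgery

gdp_per_capita_mexico = 8_903
five_star_benchmark = gdp_per_capita_mexico      # 1x G.D.P. per capita
four_star_benchmark = 3 * gdp_per_capita_mexico  # 3x G.D.P. per capita = 26,709

print(round(dalys_per_surgery, 3), round(cost_per_daly))  # 2.886 2000
print(cost_per_daly < five_star_benchmark)                # True
```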
1 Unlike the conventional benchmarking practice of the for-profit world, ImpactMatters has chosen not to use “competitive” benchmarks. That is, benchmarks are not based on the average performance of peer nonprofits. Although not without value, competitive benchmarks guarantee that some portion of the sector will fall short, regardless of whether those nonprofits are doing a “good job.” With value benchmarks, nonprofits can do well (or poorly) on their own merits, judged against the viable alternatives in their respective service areas.
2 We set three bands based on a program’s cost per meal relative to the market price: not cost-effective (substantially above the market price, typically 125 percent of it or more), cost-effective (around the market price, typically 75 to 125 percent of it) and highly cost-effective (substantially below the market price, typically under 75 percent of it).
3 County-specific meal prices are drawn from Feeding America’s 2017 Map the Meal Gap dataset. This dataset estimates meal prices by adjusting the national average meal cost with county-specific cost-of-food indices. We then identify the counties within a nonprofit’s service area and apply the appropriate meal prices.
4 Unlike R.O.I., however, I.R.R. bakes in what social scientists call “time discounting”: the idea that a $1 benefit received in year 10 is worth less than a $1 benefit received in year one, and that, likewise, a $1 cost incurred in year 10 counts for only a fraction of a $1 cost incurred today (for more details, read our Methodology for Estimating Impact).
5 While the WHO-CHOICE framework is the industry standard, it is not a perfect measure. Some critics argue that G.D.P.-per-capita thresholds may not fully capture how people actually value health. Others have proposed alternative approaches that account for the limited resources of low- and middle-income countries.
6 However, there is no evidence that the relationship between income and willingness to pay for health care is linear, as is assumed in the WHO-CHOICE standards.
7 The Global Burden of Disease disability weight associated with blindness due to cataract is 0.187. This represents a loss of health of 18.7 percent, where a loss of 100 percent is death and a loss of 0 percent is full health. The disability weight of moderate vision impairment due to cataract is 0.031. We assume that the difference between the two, 0.156, represents the health loss averted by cataract surgery.