Why We Might Be Wrong

Limitations of our analysis

We discuss some of the limitations of our ratings in this section and, where possible, the steps we take to address each one. We encourage you to reach out with recommendations about how to improve our work. We can be reached at info@impactmatters.org.

Complex outcomes

We draw from a standard set of outcome metrics for nonprofit programs of the same, narrowly defined type. We choose an outcome that best suits the common mission of those programs and has data widely available. For example, a typical outcome for a job training program is employment status. However, there can be substantial diversity in the nature of employment. Some jobs pay higher wages than others, provide better job security and offer more generous health insurance. And some job duties simply provide greater satisfaction to workers. As a result, our impact estimates may capture the higher costs associated with “better” jobs but not the corresponding benefit to beneficiaries, such as better health. However, because the large majority of nonprofits do not provide evidence that they are generating superior outcomes compared to their peers, we are generally unable to issue commentary or make rating adjustments on the basis of superior outcomes. This may change as more and better data become available in the sector, including with the help of ImpactMatters’ tools and tips for nonprofits.

Multiple important outcomes

We estimate the impact of a nonprofit’s program on a single outcome that best aligns with its stated mission. However, a nonprofit’s program may be achieving other important outcomes that are not captured by our impact estimate. Consider a job training and placement program for formerly incarcerated people that both increases employment for beneficiaries and reduces their rates of recidivism. It would be impossible to draw a line between the parts of the program that affect employment alone and the parts of the program that affect recidivism alone. More likely, the program as a whole affects both employment and recidivism simultaneously. Therefore, measuring the impact of the program on a single outcome (say, employment) would count the costs but not the corresponding causal change in the second outcome (recidivism). As a result, this program might appear less cost-effective than peer programs that conform more closely to a single-outcome framework.

We adjust upward the star ratings of nonprofits that are evidently achieving other important outcomes outside the norm. What qualifies as important and the magnitude of adjustment vary with each program type (for details, see our program-specific methodologies here).

Multiple important outcomes are not the same as downstream outcomes, which are the second-order effects of first-order outcomes. For example, increased employment (first-order outcome) might allow a beneficiary to afford better health care and live a longer, healthier life (second-order outcome). We do not adjust star ratings for downstream outcomes because, provided the first-order outcome is not oversimplified (see above), they are by definition already encapsulated in the first-order outcome used in our impact estimate.

Cost of reaching special populations

Often the people most in need are the most difficult — and costly — to reach. For example, a nonprofit may incur unusually high costs to search for children in forced labor or women facing domestic abuse because those individuals may not have a safe way to make themselves known to the nonprofit. Or, a nonprofit might have to perform many health screenings to identify just one beneficiary who suffers from a rare but debilitating disease. Many nonprofits and donors are willing to spend more to reach special populations. This may lead to a lower impact estimate relative to nonprofits serving easier-to-reach populations.

In many cases, special populations are especially neglected by regular social safety nets and other available services. They may also face systemic barriers like discrimination that affect their ability to achieve outcomes with or without services. As a result, they face a different counterfactual scenario than other beneficiaries: In the absence of the nonprofit’s program, a person from a special population may be less likely to receive services than her counterpart from the general population, and less likely to achieve outcomes with or without services. All else equal, a nonprofit serving the person from a special population would therefore achieve greater outcomes net of counterfactual. We try our best to take account of differing counterfactuals in our impact estimates. To illustrate, consider two nonprofits that provide identical programs but serve different populations:

Nonprofit A provides college and career counseling to low-income students.

  • The college graduation rate among low-income students in Nonprofit A’s city is 60 percent. On that basis, we assume that in the absence of Nonprofit A, 60 percent of its would-be beneficiaries would have graduated from college.

  • Nonprofit A spends $1,000 to serve a single beneficiary.

  • A crude estimate of its impact yields $2,500 per additional graduate (calculated as $1,000 / (1 - 0.6)).

Nonprofit B provides college and career counseling to low-income students who grew up in foster care.

  • The average college graduation rate among low-income students who grew up in foster care is much lower. In the absence of Nonprofit B, about 30 percent of its would-be beneficiaries would have graduated from college.

  • Nonprofit B spends $2,000 to serve a single beneficiary.

  • A crude estimate of its impact yields about $3,000 per additional graduate (calculated as $2,000 / (1 - 0.3)).
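The arithmetic in the two examples above can be expressed as a small helper: cost per beneficiary divided by the share of beneficiaries whose outcome can be attributed to the program. This is an illustrative sketch only; the function name and input format are ours, not ImpactMatters’ production code.

```python
def cost_per_additional_outcome(cost_per_beneficiary, counterfactual_rate):
    """Crude cost-effectiveness estimate: dollars spent per beneficiary,
    divided by the share of beneficiaries who achieve the outcome only
    because of the program (1 minus the counterfactual success rate)."""
    if not 0.0 <= counterfactual_rate < 1.0:
        raise ValueError("counterfactual_rate must be in [0, 1)")
    return cost_per_beneficiary / (1.0 - counterfactual_rate)

# Nonprofit A: $1,000 per beneficiary; 60% would have graduated anyway.
cost_a = cost_per_additional_outcome(1_000, 0.60)  # $2,500 per additional graduate
# Nonprofit B: $2,000 per beneficiary; 30% counterfactual graduation rate.
cost_b = cost_per_additional_outcome(2_000, 0.30)  # about $2,857, roughly $3,000
```

Note that even though Nonprofit B spends twice as much per beneficiary, its lower counterfactual graduation rate means its cost per additional graduate is only modestly higher.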

Adjusting the counterfactual for special populations improves our estimates. However, even after adjusting, some donors may value the population served more than other populations. We cannot capture this well in our estimates. Instead, we provide commentary that acknowledges that a nonprofit is evidently serving a particularly difficult-to-reach population.

Cost of working in special environments

Certain environments are particularly costly to work in. The most common example: conflict zones, where nonprofits likely need to spend more to protect their field staff and may need to halt activities during active conflict. Other examples include particularly challenging topographical and climate conditions, and major natural disasters.

For nonprofits working in conflict zones, we adjust costs based on the U.S. Department of State’s Danger Pay policy. Danger Pay is additional compensation to U.S. government employees working in designated Danger Pay posts “where civil insurrection, terrorism, or war conditions threaten physical harm or imminent danger.”1 Danger Pay is presented as a percentage of employees’ basic compensation. To adjust for Danger Pay, we first check whether a nonprofit’s program is located near a designated Danger Pay post. If it is, we search for the Danger Pay percentage published by the U.S. Department of State for that post. We then subtract Danger Pay from the nonprofit’s personnel expenses for program-related staff (by multiplying (1 - Danger Pay %) by personnel expenses for program-related staff).2
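Expressed as code, the Danger Pay adjustment described above might look like the following sketch. The formula mirrors the one stated in the text; the function name and dollar figures are hypothetical.

```python
def remove_danger_pay(program_personnel_expenses, danger_pay_rate):
    """Strip the Danger Pay premium from program-related personnel
    expenses by multiplying by (1 - Danger Pay %), per the text above.

    danger_pay_rate: the published percentage for the post, expressed
    as a fraction (e.g. 0.25 for a 25 percent designation).
    """
    return program_personnel_expenses * (1.0 - danger_pay_rate)

# Hypothetical: $400,000 of program staff costs at a 25% Danger Pay post.
adjusted = remove_danger_pay(400_000, 0.25)  # $300,000
```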

Other features of a nonprofit’s environment are less straightforward to adjust for. Consider a nonprofit that aims to increase water access in a rural community by drilling wells. A deeper water table naturally requires deeper, more expensive drilling in order to reach water. In this case, it is difficult to ascertain just how much more the nonprofit had to spend as a result of working in a more challenging environment. We may not be able to adjust for all factors.

Cost of working in special locations

The same goods and services may be priced and taxed differently in different locations, making it more or less costly to run an identical program depending on where the nonprofit chooses to work. Many donors have special ties to certain locations and are willing to overlook cost differences in order to, for instance, support beneficiaries in their hometown. To account for these donor preferences, we make the adjustments explained below.

Where appropriate, we adjust our cost-effectiveness thresholds by relevant price indices from public sources. For instance, we calculate county-by-county cost-effectiveness thresholds for food banks by adjusting the national average meal cost (calculated from Current Population Survey data) by county-specific cost-of-food indices (from Nielsen and Feeding America). This means comparing the impact estimate of a food bank in Clark County, Indiana, to a different threshold than that used for a food bank in New York City.
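As a sketch of this threshold adjustment (the meal cost and index values below are hypothetical, not actual Census, Nielsen or Feeding America figures):

```python
def county_meal_threshold(national_avg_meal_cost, county_food_index):
    """Scale the national average cost of a meal by a county-specific
    cost-of-food index, where 1.0 means the national average."""
    return national_avg_meal_cost * county_food_index

# Hypothetical: a $3.00 national average meal cost.
county_meal_threshold(3.00, 0.90)  # lower-cost county: $2.70 per meal
county_meal_threshold(3.00, 1.25)  # higher-cost city: $3.75 per meal
```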

Specific counterfactuals

To understand the impact of a program, we must ask the counterfactual question: What would have happened to beneficiaries if the program had not, counter to fact, been there to serve them? We then measure the difference between what actually happened and what we think would have happened if the program had not been around. That difference in outcomes can be attributed to the program. Just looking at what actually happened is not sufficient for understanding impact because many factors besides the program could affect how beneficiaries fare over time. For example, an economic boom affects both beneficiaries of a job training program and non-beneficiaries. An observed increase in employment among beneficiaries is insufficient evidence to conclude that the program — and not the economic boom or other factors — caused an increase in employment.

Most communication about impact today inadvertently ignores the counterfactual. But note that ignoring the counterfactual, in effect, assumes the counterfactual to be zero. In other words, it assumes that in the absence of a program, the outcomes of beneficiaries would not have changed at all. This may well be the case for some programs in certain settings. But for many others, it would be erroneous to assume, for instance, that without a program, no children would have graduated from high school or that without a program, the health of a population would have stagnated.

Ideally, we would have counterfactuals at the nonprofit level,3 as each group of beneficiaries served by a nonprofit faces a different counterfactual scenario than beneficiaries served by another nonprofit. But this would require that each nonprofit have conducted an impact evaluation, a study of outcomes that takes account of the counterfactual, such as by constructing a comparison group of people similar to beneficiaries. The vast majority of nonprofits have not conducted impact evaluations, so we need to construct our own using public data sources and the research literature. This often means applying uniform counterfactuals across, for instance, all nonprofits implementing the same program in the same county.

The “level” at which we apply different counterfactuals depends on what data are available and what level of variation we think is consequential. Take water purification programs as an example. The WHO and UNICEF have made available a large dataset of rates of clean water access around the world. Within a country, there tends to be a large disparity in clean water access between rural and urban areas. We therefore search for information on whether a given nonprofit runs its water purification program in a rural or urban part of the country, then apply the appropriate counterfactual based on the public dataset.

By applying uniform counterfactuals, we risk masking variation across nonprofits. Clean water access in a remote, mountainous village may be lower than in a village on the outskirts of a city, where other service providers also work and where the terrain is more traversable. But both villages would be classified as “rural,” thereby obscuring differences in the counterfactuals their respective beneficiaries face. However, lacking better data, we must use counterfactuals that are not specific to the nonprofit.

Data quality

Our estimates rely on data made public by nonprofits on their websites, annual reports, financial statements and Form 990s. There are, of course, ambiguities in the data and our interpretation of the data may not always match the nonprofit’s intention. For instance, we sometimes need to use visual clues from the layout of an annual report to determine whether the total number of meals reportedly distributed by a food bank includes both meals distributed through partner organizations (soup kitchens and food pantries) and meals distributed directly by the food bank to pick-up sites or feeding sites (schools and senior centers). Multiple ambiguities can arise for a single nonprofit, making it difficult to conclude the likely direction of bias overall (i.e., whether the impact estimate is likely higher or lower than what is accurate). Without knowing the direction of bias, we are unable to upgrade or downgrade the nonprofit’s rating to correct the bias. If ambiguities in the data are too large to be resolved with reasonable assumptions, we will not generate an impact estimate for that nonprofit.

Data quality also depends on the nonprofit’s measurement and reporting error (and that of its auditors). We believe the vast majority of nonprofits take great pride and care in measuring and reporting on their program accomplishments, and do not intentionally report inaccurate figures. In this spirit, we generally take their data at face value, then reach out to each nonprofit to review and comment on our work.

Representativeness of (analyzed) programs

We can only analyze the programs for which we have a program methodology, but many nonprofits operate multiple programs. We issue ratings for nonprofits if our analysis covers 15 percent or more of the nonprofit’s total program budget.

For a nonprofit that runs multiple programs, we always rate each program separately. We then average those ratings together, weighted by the relative size of each program (the weights are each program’s respective share of the nonprofit’s total program budget). Estimates and ratings for each individual program are available on the nonprofit’s page.
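The budget-weighted averaging described above can be sketched as follows. The figures are hypothetical, and any rounding of the final star rating is omitted.

```python
def overall_rating(program_ratings, program_budgets):
    """Average per-program star ratings, weighting each program by its
    share of the nonprofit's total program budget."""
    total_budget = sum(program_budgets)
    return sum(rating * budget / total_budget
               for rating, budget in zip(program_ratings, program_budgets))

# Hypothetical: a 5-star program with an $800k budget and a
# 3-star program with a $200k budget.
overall_rating([5, 3], [800_000, 200_000])  # about 4.6 stars
```

The larger program dominates the result: with 80 percent of the budget, the 5-star program pulls the overall rating well above the midpoint of the two.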

This approach means some nonprofits are rated on only some of their programs. The remaining programs, which we could not analyze, could be more or less cost-effective than the programs we analyzed. Although our approach is imperfect, it provides more guidance on nonprofit impact than is currently available.

Explanatory factors

Below, we describe three factors that can explain real differences in cost-effectiveness. Unlike the factors described above, they reflect operational choices that the nonprofit has made rather than extenuating circumstances that call out for correction. Nevertheless, we discuss them to give readers more context when reviewing our impact estimates.

Scale of operation

For certain program types, scale of operation can be a large driver of variation in cost-effectiveness. In cataract surgery, for example, our (limited) data suggest cost-effectiveness may fall when a nonprofit grows from a small, volunteer-run operation to a mid-sized operation that employs staff and maintains offices. Cost-effectiveness may subsequently climb again when the nonprofit starts benefiting from economies of scale.

Stage of development

The stage of a program’s development can also affect its costs. During its first few years, a program may be considered in the “design” stage. The nonprofit’s cost-effectiveness may be low at this stage as it works out kinks in the program. As it grows and formalizes, cost-effectiveness may rise.

Donated labor and goods

Our impact estimates favor nonprofits that have been able to capture “free” resources — resources that have zero or nearly zero opportunity cost in the eyes of a socially minded donor. Here, the example of food distribution programs is illustrative. We generally assign a cost of $0 to donated food on the assumption that it would otherwise have gone to waste. However, if there is reason to believe the food still has market value (i.e., it could still be sold at a non-zero price), we count the fair market value of that food. By not counting in-kind donations, we also reduce the incentive for nonprofits to overstate these donations. Many food distribution programs also benefit from unskilled labor provided for free by volunteers. We generally assign a cost of $0 to unskilled volunteer time because we believe volunteers benefit in important non-monetary ways — fulfillment, for instance, in supporting a cause they are passionate about. Nonprofits that have been able to make smart use of donated labor and goods are likely more cost-effective than otherwise identical nonprofits that have not done so.

1. Frequently Asked Questions: Danger Pay

2. In audited financial statements, this is often listed as “Personnel expense” or a similar line item under the Program Services section of the Statement of Functional Expenses.

3. In reality, each beneficiary faces a different counterfactual: What alternative sources of services are available to her? What is happening in the policy and economic environment that might affect her outcomes? What personal characteristics and social networks does she have that would make her more or less likely to succeed with or without services? Any counterfactual we apply uniformly across groups of beneficiaries is therefore necessarily an approximation.