Alexander Cooley & Jack Snyder. Foreign Affairs. Volume 94, Issue 6. Nov/Dec 2015.
When the Berlin-based group Transparency International released its annual ranking of international corruption levels in December 2014, China’s Ministry of Foreign Affairs responded with a blistering statement. Chinese authorities were upset that their country had sunk from 80th to 100th place on the watchdog’s influential Corruption Perceptions Index, even though Beijing was pursuing a high-profile anticorruption campaign. “As a fairly influential international organization,” a Chinese Foreign Ministry spokesperson said, “Transparency International should seriously examine the objectiveness and impartiality of its Corruption Perceptions Index.”
This wasn’t the first time Beijing had dismissed the results of an international ranking. A year earlier, it had called for the elimination of the World Bank’s annual Ease of Doing Business Index, in which China had similarly underperformed, citing what Chinese officials described as flawed methodologies and assumptions.
China’s anger reveals just how powerful such ratings have become. Today’s ratings, produced by nongovernmental organizations and international agencies alike, score governments on nearly every aspect of a state: democracy, corruption, environmental degradation, friendliness to business, the likelihood of state collapse, the security of nuclear materials, and much more. The ratings’ customers are equally diverse. Government officials and activists refer to these indexes as measures of state performance, and international organizations and domestic bureaucracies use them as comparative benchmarks. Scholars and analysts use them to compare countries, and journalists routinely cite them as authoritative in their stories.
In theory, grading and comparing states should help the public hold governments accountable. In practice, however, ratings are fraught with unexamined assumptions and unintended consequences, limiting their value as tools for improved governance. They often oversimplify complex public policy issues, obscure policy tradeoffs, and invite manipulation by states eager to improve their reputations without undertaking real reform. Without a clearer understanding of these limitations, the ratings craze threatens to dumb down global governance practices and lower the quality of public debate rather than encourage better policy.
Mirror, Mirror, On the Wall
Since the early twentieth century, credit-rating agencies, such as Moody’s and Standard & Poor’s, have assigned scores to states based on evaluations of their sovereign debts. And some governance ratings, such as the measures of democracy produced by Freedom House and the Polity data series, first publicly appeared in the 1970s. But it wasn’t until recent decades that the ratings craze began. Indeed, over two-thirds of the ratings currently in existence were founded after 2001. By our count, there are now some 95 such indexes that receive global media mention.
Why the frenzy? In part, it’s the natural extension of an emerging culture of performance evaluation and accountability. Consumers have long used ratings, scorecards, and benchmarks to make decisions, from which university to attend to which hotel to book, and now the same methodology is being applied to governance, as citizens are encouraged to become discerning policy “consumers.” All types of global organizations and liberal advocacy groups, meanwhile, have discovered that producing ratings can further their political and organizational goals. Many indexes are produced by groups that are advocates for the same causes they judge, and these reformers see the measures as powerful tools for shaming slackers and norm violators, and as a useful way to stand out in the increasingly crowded field of global governance. The rise of ratings also owes much to advances in computing and the availability of data. By compiling and processing open-source information, even small groups can generate indexes without conducting original research, such as labor-intensive surveys.
Ratings can indeed work as designed, pressuring states to improve governance. By comparing states with their rivals and peers, the measures exert social pressure for improved policy. The International Budget Partnership, for instance, convenes regional conferences to mark the publication of its Open Budget Index, a biennial review of budget transparency, and uses them to encourage finance ministers from neighboring countries to compare one another’s performance. And the European Council on Foreign Relations attracted urgent responses to its 2012 European Foreign Policy Scorecard when it added the labels “leaders” and “slackers” to its scores on states’ adherence to EU decision-making procedures and commitments: representatives of some EU states listed as “slackers,” for instance, called the council to dispute the results.
States care even more about ratings that have financial consequences. As the eurozone crisis deepened, for instance, national and EU officials lashed out at international credit-rating agencies for downgrading the sovereign credit rating of some EU states, including Greece and Portugal. Georgia and Rwanda have used their “most improved” awards on the World Bank’s Ease of Doing Business Index as centerpieces of campaigns to attract investment and to bolster domestic support for their governments.
Some ratings play a direct role in public and corporate policy. International banking and financial standards, such as the Basel Accords, for instance, have long used credit ratings to measure risk and capital reserves. U.S. federal and state regulations bar some pension funds from buying low-rated investments. Indexes that measure the fragility of states are now used by international organizations and state agencies to assess risks for humanitarian emergencies and to help allocate development assistance. Corporations have incorporated governance ratings into their due diligence procedures to avoid transacting with governments at a high risk for corruption or money laundering. The Millennium Challenge Corporation, a pioneering U.S. foreign aid program, relies on up to 20 third-party indicators, including indexes produced by Freedom House and the Heritage Foundation, to assess whether candidate states have reached “good governance” thresholds that unlock American assistance.
As ratings have grown in influence, states have begun to practice what might be called “ratings diplomacy,” whereby they dispatch delegations to learn how the ratings are created and directly lobby rating organizations for better scores. Although some of this lobbying is formal and institutionalized, such as the many delegations hosted by the World Bank’s Doing Business division, much of it is ad hoc and informal. The Heritage Foundation, for example, reported that it received a visit from Bahrain’s finance minister during the country’s 2011 crackdown on antigovernment protests in Manama; Bahrain, which had been highly rated on the foundation’s Index of Economic Freedom, wanted to assure the think tank that it would maintain its economic commitments despite its political troubles. And after their ratings were included in the Millennium Challenge Corporation’s indicators, organizations such as Freedom House and the Heritage Foundation reported a sharp increase in the number of national delegations that visited them to discuss and dispute their scores.
Ratings Run Amok
Ratings are meant to diagnose policy ills and bring about improvements. All too often, however, they produce unintended consequences that hinder analysis and worsen policy outcomes. The problem usually derives from consumers’ fixation on parsimony, on a single number that reveals, for instance, whether a country is free, whether a government adheres to the rule of law, or whether an investment is safe. But the lack of complexity comes at a cost. Too often, oversimplified ratings bury crucial assumptions and hide value judgments about the policies and states they describe.
For example, the World Health Organization’s ranking of national health-care systems, which was discontinued following its inaugural release in 2000, assigned “equity of access” the same weight as “responsiveness,” despite the deliberate choice by different states, such as France and the United States, to prioritize these occasionally conflicting goals differently. Rather than investigate the reasons behind these varying priorities and their public policy consequences, the WHO made the arbitrary choice to weight them equally, itself a value-laden move.
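To see why a weighting choice is itself a value judgment, consider a minimal sketch of a two-component composite score. Everything in it is hypothetical: the systems "X" and "Y", their component scores, and both weighting schemes are invented for illustration and are not drawn from the WHO ranking or any other real index.

```python
# Hypothetical illustration only: the systems, scores, and weights below
# are invented and do not come from the WHO ranking or any real index.

def composite(scores, weights):
    """Weighted sum of component scores, each on a 0-1 scale."""
    return sum(weights[name] * scores[name] for name in weights)

# Two imaginary health systems with different priorities.
systems = {
    "X": {"equity": 0.9, "responsiveness": 0.5},
    "Y": {"equity": 0.6, "responsiveness": 0.9},
}

def rank(weights):
    """Return system names ordered from highest to lowest composite score."""
    return sorted(
        systems,
        key=lambda name: composite(systems[name], weights),
        reverse=True,
    )

equal_weights = {"equity": 0.5, "responsiveness": 0.5}
equity_first = {"equity": 0.8, "responsiveness": 0.2}

print("equal weights:", rank(equal_weights))
print("equity first: ", rank(equity_first))
```

With these invented numbers, the equal-weight scheme places Y ahead of X, while the equity-first scheme reverses the order. Nothing about either system changed; only the rater’s priorities did.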
Arbitrary simplifications such as these not only hide value judgments. They can also produce mystifying variations in the outcomes they describe. Groups that produce ratings often evaluate complex concepts such as democracy or media freedom by adding together loosely related components that can vary independently. Democracy, for instance, can mean strong civil liberties, regular turnover in office, separation of powers, or high voter turnout, attributes to which democrats may attach different intensities of preference. Such values should be measured and reported separately, not lumped together into a single score. Indeed, some governments have already contested the results of this kind of simplification. In April 2013, for instance, officials in Kyrgyzstan objected when Freedom House rated the country’s media “not free,” as it had the year before, for reasons including the shuttering of many of the country’s Uzbek-language newspapers, radio stations, and television channels in the wake of ethnic violence. Kyrgyz officials argued that broader trends in the country were going in the other direction, with a wave of online media outlets flourishing after the ouster of the country’s autocratic president in 2010.
State fragility indexes, such as Foreign Policy and the Fund for Peace’s prominent Fragile States Index, similarly mix together a jumble of variables. Some of these components measure state policies, such as commitment to economic reform, whereas others judge state capacities, such as the quality of infrastructure. Still others present social statistics that governments have little to do with, such as demographic trends. Even when an index focuses on a state’s actions, it can conflate disparate objectives. When assessing corruption, for example, sometimes the point is to evaluate an outcome (corruption got worse), sometimes it is to evaluate a public policy (raising police salaries failed to reduce corruption), and sometimes it is to hold some authority accountable (the justice ministry refused to investigate corruption). But composite measures fail to specify who or what is responsible for the state of affairs they describe.
Finally, ratings of individual countries often ignore the international actors and networks that enable local misbehavior. Transparency International’s Corruption Perceptions Index, for instance, spotlights domestic bribery but downplays the transnational banking links that abet large-scale corruption. In this sense, China’s ranking reveals little about the Western companies that facilitate graft, the offshore financial vehicles (many in Western jurisdictions) that conceal illicit transactions, the overseas real estate holdings where Chinese officials store their money, or the investor residency and citizenship policies that allow corrupt officials to flee to Western countries.
International raters are no doubt aware of these complexities. But rather than ground their evaluations in nuanced theories of conditional and interactive effects, they often take a shortcut: they assume that the outcomes of interest to them represent syndromes in which all good things (or all bad things) go together. Of course, that is not the case: the many components of a single score can easily undercut one another, and external variables often play hidden roles. The various criteria that produce a country’s media freedom rating, for instance, can move in opposite directions without affecting that state’s final score, as when increases in the availability of information motivate leaders to crack down on free expression in response.
The problems created by such simplifications cannot be solved by carefully weighting the components that produce ratings, since factors such as repression of speech and the availability of political information are interactive, rather than additive, variables, meaning that they can dampen or multiply the overall effect. So what’s needed is not a single metric of state behavior but a better understanding of the interactions that produce the outcomes being studied.
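The difference between additive and interactive variables can be made concrete with a small sketch. It rests on loud assumptions: the two countries and their scores are invented, and modeling interaction as a simple product is only one of many possible forms, chosen here for brevity rather than taken from any actual rating methodology.

```python
# Hypothetical illustration only: the countries and scores are invented,
# and the product form is just one simple way to model an interaction.

def additive_score(info, openness, w_info=0.5, w_open=0.5):
    """Weighted sum: a gain in one component offsets a loss in the other."""
    return w_info * info + w_open * openness

def interactive_score(info, openness):
    """Product: a low value on either component dampens the whole score."""
    return info * openness

# (availability of information, freedom from repression of speech), 0-1 scale
country_a = (0.5, 0.5)  # moderate on both
country_b = (0.9, 0.1)  # information expanded, then speech was repressed

for name, (info, openness) in {"A": country_a, "B": country_b}.items():
    print(
        name,
        "additive:", round(additive_score(info, openness), 2),
        "interactive:", round(interactive_score(info, openness), 2),
    )
```

Under equal weights, the additive index cannot tell the two invented countries apart, even though their components moved in opposite directions. The product form separates them sharply, which illustrates why the choice between summing components and modeling their interaction matters more than fine-tuning the weights.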
Thumbs on the Scales
The more that ratings are used to allocate resources and inform global governance, the more incentives governments have to game the system, taking small actions to change their scores instead of reforming their underlying behavior. Some have already done so. Georgia, one of the “most improved” countries on the World Bank’s Ease of Doing Business Index, manipulated the relevant indicators by creating cross-ministerial working groups to rapidly pass laws and promulgate administrative rules. As a result, the country rose from 112th place in 2006 to 37th place in 2007. But Georgia’s reforms failed to address some major inefficiencies, such as the country’s suboptimal tax auditing procedures.
Its “most improved” status gave Georgia plenty to flaunt, but the evidence suggests that the change was more cosmetic than structural: Georgia’s spectacular and much-publicized improvement on the Ease of Doing Business Index was not matched by similar improvements on comparable indexes, such as the World Economic Forum’s Global Competitiveness Index, nor did it produce a sustained increase in foreign investment. Similar leaps by Rwanda in 2010 and Azerbaijan in 2009 were likewise the results of limited legislative acts rather than substantive regulatory reform. These states gamed the system, but the ranking organization allowed the system to be gamed in the first place.
Indeed, opportunism on the part of states is not the only problem. Rating organizations can be blamed, too. Most of them have political motivations, and because they serve as judges, sources of policy advice, advocates, and self-promoters, they have conflicting interests. Advocacy organizations, for example, often attempt to mobilize activists and browbeat the noncompliant by judging their subjects against an aspirational ideal, such as a certain standard of state investment in education. Although rating countries against an ideal can gratify activists, the practice can alienate the states it is intended to help by casting them as irredeemably backward.
This can be counterproductive. Ratings detract from their informational and advocacy roles when they assign disparaging labels that prompt government officials to challenge them and the organization that produced them rather than engage in substantive dialogue about the underlying issues. Authoritarians would rather pick a fight with a Western-backed organization such as Freedom House than publicly defend a dismal civil liberties record. Rating organizations make it too easy for them to do so. International media outlets, which tend to rapidly reproduce and disseminate sensationalist ratings without questioning their validity, contribute to this problem.
For ratings to become effective policy tools, they should be based on proven causal relationships and clearly stated assumptions, not ideal standards. Thus, aggregate ratings should be replaced with indexes focused on a narrower set of subjects, such as the performance of specific institutions. Some organizations have already moved in this direction. In 2011, for example, the anticorruption watchdog Global Integrity dropped its annual corruption index to concentrate instead on the evaluation of anticorruption bodies in a limited number of countries. Likewise, in 2013, an independent review panel at the World Bank recommended that the aggregate state rankings in the Ease of Doing Business Index be eliminated in favor of data reflecting each country’s performance on specific indicators, a recommendation that unfortunately went unheeded.
With greater nuance, ratings could become useful policy tools for governments instead of battlefronts in public diplomacy campaigns. Those who peddle slick ratings are doing a disservice to the very causes they wish to promote. If advocates want indexes to actually help diagnose and cure states’ ills, they will need to sharpen their ratings’ analytic precision and tone down their shock value.