by Aniket Bhushan
The forthcoming edition of the SAIS Review published by Johns Hopkins University Press includes an article titled “Fast Data, Slow Policy: Making the Most of Disruptive Innovation in International Affairs”. In it I argue:
International affairs policy and practice are particularly ripe for disruptive innovation fueled by the rise of big data and open data. This paper demonstrates how the speed and time dimension in policy-relevant research in international affairs and international development are being disrupted.
Three illustrative case studies—real-time macroeconomic analysis, humanitarian response, and poverty measurement—are discussed. The concluding section explores how successful policy entrepreneurs can make the most of disruptive innovation in the age of big data.
Most trends are accompanied by a familiar cycle of hype. After the initial trigger come inflated expectations, then a trough of disillusionment followed by an upward slope of enlightenment, before finally resting on a mainstream plateau. The heightened pace of open data and big data, and their potential impact on international affairs, is following a similar pattern. Whether one chooses to defend the hype or challenge “data fundamentalism,” enough fodder exists to fuel both sides of the debate.
Proponents argue the rise of big data and open data fundamentally changes the way we think about the world. The sheer volume, velocity, variety, and veracity of big data mean we can worry less about the quality issues associated with narrower information sources. We can reframe our methodological orientation to focus on iterative learning and correlations rather than obsessing over causality. Doing so allows us to embrace a plethora of untapped (and growing) data feeds to address challenges not yet fully articulated. It also means holding in abeyance a host of new dilemmas in areas such as privacy, autonomy, and asymmetric coverage.
Detractors on the other hand are quick to point out that data is not objective (indeed the term “raw data” is an oxymoron). Data cannot “speak for itself,” as the proponents of big data would have us believe. There are biases at all stages, from collection to analysis to presentation. Big data may be unbeatable when it comes to forecasting, but it is “dumb” when it “comes to doing science,” as it is not underpinned by sophisticated research designs that aim to identify causal relationships. The bigger the data, the more we are prone to lull ourselves into a false sense of confidence in predictive analytics. Indeed, big data accentuates the “signal to noise” problem.
There are a multitude of other issues associated with the use of data. Big data has its roots in the commercial sector. The main intention behind generating sharper insights into customer behavior and profiles is to achieve better targeting and segmentation; in other words, smarter discrimination to ultimately drive profitability. When examined from a public policy perspective, this could be highly problematic. The kinds of targeting and discrimination taken for granted in many commercial sectors, like advertising, would be expressly forbidden in more regulated industries, like the insurance industry, and may be contrary to the aims of public policy and public service delivery.
What does one mean by disruptive innovation in international affairs? The disruptive innovation paradigm argues that small, speculative innovations at the base of the pyramid can often leapfrog and disrupt established domains because top-tier players pursue incremental innovation with their most important, but also most change-resistant, clients. There are several examples from big business: Amazon’s business model disrupted brick-and-mortar retail, Skype disrupted long-distance telephony, and Netflix is disrupting cable broadcasting.
The paradigm is also applicable to international affairs. The main clients of policy analysis and research are bureaucrats or political decision makers whether in government or international institutions. In this context, the potential disrupters are analysts who can make the most of the rise of big and open data.
How Big Data and Open Data are Disrupting International Affairs
Real-time Economic Analysis
Current or near real-time economic analysis is a highly data-dependent enterprise. It is also highly conservative, in that it is dominated by central banks, ministries of finance, and large private financial institutions. The more timely, accurate, and relevant the data, the better the current assessment and the more valuable it is from a policy perspective. Big data is already disrupting how we collect, compute, and project basic real-time macroeconomic indicators, ranging from GDP and inflation to financial, housing, and labor market indicators.
Recently, many central banks, including the Bank of England, the European Central Bank, the Bank of Japan, and the Bank of Canada, have looked into the possibility of leveraging big data to enhance the timeliness of current economic analysis. An interesting innovation in Canada is the use of big data to fill gaps in the timeliness of official GDP statistics by developing a new short-term GDP indicator that provides daily updates of real GDP growth forecasts. Existing monthly data is combined with big data to predict GDP growth before official national accounts data are released for a given quarter, thus bridging the gap period. The example also demonstrates how big data traverses the “official” and “unofficial” domains.
In the case of Japan, the Abe government needed immediate information on time-sensitive policy changes, such as a major increase in the sales tax. What analysts found was that under the existing system there was no way to assess the situation until household survey or sales data was released and analyzed months later—an eternity in terms of real-time economic analysis. In response, the government proposed the development of a new composite index drawing on big data, including online searches and point-of-sale records, to shed immediate light on the impact of policies, albeit not without significant methodological challenges.
Similarly, the Billion Prices Project (BPP) at the Massachusetts Institute of Technology (MIT) demonstrates how big data can be leveraged to provide a real-time gauge of inflation. BPP uses web-scrapers (a relatively simple approach, but one that is highly extensible and adaptable to several uses) to scour websites of online retailers for real-time prices on an enormous range of products. After the collapse of Lehman Brothers in 2008, BPP data showed how businesses started cutting prices immediately. In contrast, official inflation figures did not show deflationary pressures until November. Given the importance of inflation and timely assessment of inflationary expectations from the perspective of monetary policy response, this information represents a significant improvement in response time.
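The core idea behind a scraped price index can be sketched in a few lines. BPP’s actual methodology is considerably more sophisticated; the sketch below simply chains a geometric mean of day-over-day price relatives (a Jevons-style index, a common unweighted formula) over prices that a scraper might collect, and all data here is illustrative.

```python
from math import prod

def daily_index(prev_prices, curr_prices):
    """Geometric mean of price relatives (Jevons-style index) over
    products observed on both days. A simplified stand-in for BPP's
    actual methodology; prices are keyed by product identifier."""
    common = prev_prices.keys() & curr_prices.keys()
    if not common:
        return 1.0  # no overlap: carry the index level forward
    relatives = [curr_prices[p] / prev_prices[p] for p in common]
    return prod(relatives) ** (1 / len(relatives))

def chained_index(daily_prices):
    """Chain day-over-day indexes into a cumulative series (base = 100)."""
    levels = [100.0]
    for prev, curr in zip(daily_prices, daily_prices[1:]):
        levels.append(levels[-1] * daily_index(prev, curr))
    return levels

# Illustrative scraped prices for two consecutive days:
days = [
    {"bread": 2.00, "milk": 1.50},
    {"bread": 2.10, "milk": 1.50},  # bread up 5 percent, milk flat
]
```

Because the index only needs overlapping product observations from one day to the next, it tolerates products entering and leaving the scraped sample, which is what makes the approach extensible across retailers and countries.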
To assume that these innovations are limited to advanced economies would be a mistake. The UN Global Pulse initiative has partnered with BPP and Price Stats to apply the same web-scraping approach in six Latin American countries, specifically to monitor the price of bread and calculate a new eBread Index. Nascent results from the project show the approach can be extended to developing country contexts, and that, in general, the eBread Index is highly correlated with the official consumer price index for the food basket in these countries. However, unlike official inflation data, which is available monthly, the eBread Index is available daily. This again is a major improvement in country contexts where inflation and inflationary expectations can change rapidly.
Big data has also been successfully leveraged for a range of other macro indicators. For instance, online search data from Google has been successfully used to predict initial claims for unemployment benefits, consumer sentiment indexes in the United States and United Kingdom, and even car sales down to specific brands. These trends show that companies like Google, Facebook, and Twitter are as important to the future data flow that will fuel policy-relevant international affairs research as any national official statistical agency.
The implication is that these companies may be far more important than multilateral data clearing houses such as the World Bank, OECD, or UN bodies, on whose highly questionable traditional data—in terms of quality, coverage, granularity, and timeliness—much of the current research and analysis in international affairs and development depends. While academics have often pontificated about new and alternative measures of progress, such as the “happiness index,” firms far less well known than Google or Facebook, such as Jana, are experimenting with SMS-based surveys on a global scale that are able to deliver a real-time snapshot of societal well-being.
Humanitarian Crises and Disaster Relief 2.0
The tragic earthquake off the coast of Haiti’s capital in January 2010 marked a watershed moment for the impact of big data and open data on disaster relief. The earthquake “created a chasm between what the international humanitarian community knew about Haiti prior to the quake and the reality it faced in the immediate aftermath.” The response in Haiti demonstrated an important change in how the huge information gap between damage assessment and response planning was filled. For the first time, two new data inflows were added to the typical crisis response data: one from volunteer and technical communities around the world (principally open source mapping communities like OpenStreetMap, Sahana, CrisisMappers and Ushahidi), and one directly from the affected community of Haitians.
The experience in Haiti showed that the international humanitarian community was not equipped to handle these new information channels, in terms of both speed and complexity. The volunteer technical communities approached the problems in ways that fundamentally challenged the status quo, in which large humanitarian agencies lead recovery efforts while smaller groups follow.
Criticism of the Haiti experience revolves around the overflowing information pipeline. Yet, this is a far better problem than the opposite situation. The ability to learn and rapidly apply lessons in future crises, as discussed below, demonstrates the benefit of having “too much” information. Before focusing on the lessons, it is important to emphasize that the Haitian response proved the rise of big data and open data is not simply about data or technical sophistication. One of the most useful roles played by volunteers was language translation of a huge volume of SMS and other messaging through social media channels. The disruptive innovation was that a highly networked and highly technical, yet contextually aware, virtual community emerged organically. Arguably, the creation of such a community may not have been possible, no matter how many pilot projects were funded by well-meaning donor agencies. One reason is that the problem-solving, transparency-driven, open source mindset that underpins much of the virtual community is not always shared by big bureaucracies and senior policymakers.
Lessons from the Haitian earthquake have been applied in other contexts. User generated crisis maps have saved lives in subsequent disasters. Volunteers involved in the Haiti mapping project have supported other crowd-sourced mapping initiatives, including projects that emerged in the wake of the earthquake in Chile, floods in Pakistan, the crisis in Libya, the earthquake and tsunami in Japan, and the typhoon in the Philippines. With each experience, the work has gotten better as lessons are rapidly shared within a likeminded, highly motivated, and well organized community. The process of interlinking real-time, geo-spatial crisis data with other relevant data feeds, such as traditional media, has grown exponentially in the past few years. The time taken between crisis impact and information generation has shrunk dramatically compared to historical response times. In the case of Japan, within two hours after the earthquake and tsunami, real-time witness reports were being mapped and shared. In a context where seconds and minutes can determine the difference between life and death, the rise of big and open data and their associated communities has disrupted how society plans humanitarian responses, ensuring such tools will be leveraged in future crises.
At the other end of the velocity spectrum are data on typically slow moving measures like poverty. Not only are poverty trends relatively slow moving, at least in comparison to the examples discussed above, but the reporting lags are enormous. The significant lag time of the data bears repeating: when the World Bank announced in 2012 that 22 percent of the world’s population lived on less than $1.25 a day—and, consequently, that the first Millennium Development Goal had been achieved—the underlying data was already four years old, dating from 2008.
The data is poorest where it matters most. Recent analysis of the state of widely used economic indicators, such as GDP in sub-Saharan Africa, raises serious issues. While international databases like the World Bank report time-series data for many countries, the countries themselves were found to have not published their own data for many of the years covered. Many countries in the region have updated, or are in the process of updating, their national income accounts methodology to make it more consistent with what most countries use. In so doing, many are finding a very different picture than they had been led to believe.
For instance, Ghana’s 2010 revision showed that GDP was 60 percent higher than expected, instantly catapulting a low income country to middle income status. Research comparing GDP data from country sources with GDP data from the World Bank is alarming. GDP estimates according to national sources in some countries, like Burundi (2007), were found to be 32 percent higher than the figures reported by the World Bank. However, in other cases the reverse was true: for Guinea-Bissau in 2006, the World Bank’s estimate was 43 percent higher than that of the national authority.
It is important to understand that the problems underpinning these data challenges are not merely an issue of technical capacity, competence, or cost of collection. A far greater problem is perceived or actual interference, whether from political authorities, donors, or other actors. These issues have been aptly termed “the political economy of bad data,” which neatly describes the situation in many developing countries. Huge incentives to misreport plague administrative data systems on many levels. For example, when Kenya decided to abolish fees in primary school, this radically changed the incentives for reporting by school administrators, as schools are allocated more teachers and funding if they attract more students. While administrative data from the Ministry of Education shows a steady increase in primary school enrollment rates, demographic survey and national statistical data fails to confirm the trend and instead indicates enrollment rates have been flat over the same time period.
These findings, while extremely troubling, are made worse by added issues that complicate incentives. For instance, a fast growing trend among donors is cash-on-delivery or performance-based aid, a trend based on the idea of paying for results instead of paying for inputs. Whatever one may think about this conceptually as an aid modality, the fact is that these approaches greatly increase the data burden. In this approach donors pay for development results or outcomes such as increased educational enrollment and improved performance. For performance-based measures to work, organizations need better, more timely, and more granular data. The more ingenuity society can throw at the problem the better.
How are big data and open data disrupting this landscape? Given the context described above, tapping into passively generated and proxy data, if only to triangulate results or provide baseline referential information, could be a welcome innovation. Big data approaches have thrown up three interesting possibilities. The first is analysis of anonymized call detail records (CDRs). A recent project in Cote d’Ivoire, using five million anonymized CDRs from Orange telecommunications customers collected over a five month period, analyzed both the level and location of activity. The analysis indicated that a wider range of calls and longer durations were good proxies for wealth. Using this data, researchers were able to create a granular geospatial estimate of poverty in Cote d’Ivoire, the first such nationwide data since the late 1990s, as political strife and economic turmoil had hampered traditional survey methods in the intervening years.
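The aggregation step behind such a study can be sketched simply. The snippet below is a minimal, hypothetical illustration, not the researchers’ actual pipeline: the record layout (region, caller, callee, duration) is an assumption, and the two proxies computed, mean call duration and mean number of distinct contacts per subscriber, correspond to the "longer durations" and "wider range of calls" signals the study found to track wealth.

```python
from collections import defaultdict

def wealth_proxies(records):
    """Aggregate anonymized call detail records into per-region proxies.
    Each record is a dict with hypothetical keys: 'region', 'caller',
    'callee', 'duration_s'. Returns, per region, mean call duration and
    mean number of distinct contacts per subscriber."""
    durations = defaultdict(list)
    contacts = defaultdict(lambda: defaultdict(set))
    for r in records:
        durations[r["region"]].append(r["duration_s"])
        contacts[r["region"]][r["caller"]].add(r["callee"])
    out = {}
    for region, durs in durations.items():
        per_sub = contacts[region]
        out[region] = {
            "mean_duration_s": sum(durs) / len(durs),
            "mean_distinct_contacts": sum(len(s) for s in per_sub.values()) / len(per_sub),
        }
    return out

# Illustrative (made-up) records for two regions:
records = [
    {"region": "A", "caller": "u1", "callee": "u2", "duration_s": 120},
    {"region": "A", "caller": "u1", "callee": "u3", "duration_s": 60},
    {"region": "B", "caller": "u4", "callee": "u5", "duration_s": 30},
]
```

Mapping each region’s proxy scores onto a grid of cell-tower locations is what yields the granular geospatial poverty estimate described above.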
Another interesting innovation in small scale poverty measurement and prediction is an approach using night light illumination. This approach rests on the assumption that poorer places are quite literally in the dark. Using geospatial, night light, and census data for Bangladesh in 2001 and 2005, researchers showed that a regression model combining the data was able to predict poverty at a granular level. The cost effective and non-intrusive nature of this approach makes it a useful source of proxy poverty data, helping to offset its potentially lower accuracy. The concept is also being extended to other geographic regions, including Africa.
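The regression at the heart of this approach can be illustrated in miniature. The Bangladesh study combined several data sources; the sketch below reduces it to a single predictor, fitting an ordinary least squares line from mean night-light intensity to a poverty rate, with entirely made-up numbers standing in for real satellite and census data.

```python
def fit_ols(x, y):
    """Ordinary least squares for a single predictor: y ~ a + b*x.
    Here x would be mean night-light intensity per area and y the
    poverty headcount rate from census data (illustrative only)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

def predict(a, b, x):
    """Predicted poverty rates for new areas from their light intensity."""
    return [a + b * xi for xi in x]

# Made-up training data: brighter areas tend to be less poor.
light = [10, 20, 30, 40]        # mean night-light intensity per area
poverty = [0.7, 0.6, 0.5, 0.4]  # poverty headcount rate per area
a, b = fit_ols(light, poverty)  # expect a negative slope b
```

Once fitted on areas where census data exists, the model can score every lit (or unlit) grid cell in a country, which is what makes the method cheap and non-intrusive relative to household surveys.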
A third avenue is high-frequency micro-surveys conducted using mobile phones and other platforms. The World Bank’s Listening to Latin America (L2LAC) project was launched out of policymakers’ frustration at the lack of timely information on the impact of the 2008 economic crisis in Latin America. Typically this sort of analysis depends on household survey data collected and reported over years—and at high cost. The L2LAC pilot covered nationally representative samples in Peru and Honduras and demonstrated that by using mobile platforms, small versions of wider household surveys can be conducted on a monthly basis and at a fraction of the cost. This provides much closer to “real-time” insights into poverty, employment, inequality, and other trends essential for effective responses to fast moving crises. L2LAC also provides a useful gauge of poverty dynamics and trends between official reporting periods, which can be years apart. The model has since been extended to pilot projects in Africa.
Anonymized CDR analysis, proxy light source data, and mobile phone based micro-surveys are big data innovations that are disrupting how we measure and respond to poverty at various levels. Aspects of each approach have the potential to be “mainstreamed,” which would have been unthinkable just a few years ago.