navigating the data swamp

Last week, I joined a session of Charles Martin-Shields’ course on technology and conflict response. The course is offered by TechChange, a Washington, DC-based start-up that develops education resources for peacebuilding practitioners, technologists, and policymakers. The company’s competitive edge is its distance-learning platform, and the course boasted participants–on the ground, at an organization’s headquarters, and somewhere in between–from several countries. The discussion was based on my recent back-and-forth with Charles and Christopher Neu, a TechChange operations guru, on navigating data about violent conflict and mass atrocities.

The participants’ questions provoked a constructive discussion about the mass atrocity data “swamp,” its information and security risks, and how practitioners can navigate both. Participants agreed that the best information exists between big, computed data and small, user-generated data. This agreement, however, opens new dilemmas: how peacebuilding organizations balance the moral act of “bearing witness” with the no-less-moral act of protecting their local officers and sources, and how analysts assess conflict amid small amounts of low-quality information. Below, I summarize my initial thoughts on these two dilemmas, based on the TechChange discussion and relevant reading since.

Proprietary data aren’t private, and open data aren’t public: Better data, big or small, can only emerge from stronger computer and human information networks. For big data practitioners, this means expanding systems like the recently-suspended Global Database of Events, Language, and Tone (GDELT), which I discussed in my last post as an example of useful machine-coded datasets. Before the platform’s suspension, GDELT’s data scientists viewed–and may still view–its future in these terms: the platform’s value grows as it acquires more reliable and diverse news sources.

For small data practitioners, a “network” refers to the human relationships, bolstered by communication technologies, that transfer information from local sources to global headquarters. This information, about where a conflict occurs, which populations are vulnerable, and what their needs are, often informs the distribution of peacebuilding resources. Additionally, organizations carry the amorphous public responsibility of “bearing witness” to ongoing abuses. These dual hats create internal contradictions between an organization’s public face and its private needs.

Many practitioners, however, view this dilemma as an unresolvable dichotomy rather than what it is: a give-and-take. Peacebuilding data are effective when an organization shares them–internally, but also with others. Small data are the property of an organization and its sources, and not the private confidence of a tiny group of people. Peacebuilding organizations should weigh the burden of risky information, but also grant their local sources the agency to shape, if not determine, how the data are used.

The best analysis is transparent, not definitive: Analysis is never an independent affair. An analyst’s client may be a practitioner, a policymaker, another, more senior analyst, or the general public, but the relationship is consistent: the client clarifies expectations that an analyst uses to determine priorities, hone datasets, and frame conclusions. In our discussion, several practitioners lamented their clients’ demand for certain assessments amid uncertain data. I cited a scene from Zero Dark Thirty, Kathryn Bigelow’s dramatic rendering of the manhunt for Al Qaeda chief Osama bin Laden, that resonates with my own, brief exposure to the U.S. intelligence community. In the scene, then-CIA director Leon Panetta asks a cohort of senior analysts whether bin Laden is located in a compound in Abbottabad, Pakistan, where Navy SEAL Team 6 later killed him. The CIA’s deputy director, presumably for intelligence, suggests that bin Laden is more than likely located at the compound; Panetta, visibly disgruntled, presses his junior colleague for a more confident response. The deputy director sighs, “We don’t deal in certainty, we deal in probability. I’d say there’s a sixty percent probability he’s there.”

As with the information security problem, peacebuilding experiences mirror their national security counterparts. If an analyst says, “according to qualitative and quantitative tools, a conflict in a local village in northern Kenya will probably emerge over the next six months,” the client–a UN agency, a private foundation, or a humanitarian aid group–may request a more definitive response. In these circumstances, given poor-quality data, the analyst’s best option is transparency–about the quantity, quality, and limited reach of the data and the conclusions drawn from them. Better peacebuilding emerges from an acceptance of uncertainty, rather than the creation of certainty where it cannot exist.


Published by Daniel Solomon
