navigating the data swamp

Last week, I joined a session of Charles Martin-Shields’ course on technology and conflict response. The course is offered by TechChange, a Washington, DC-based start-up that develops education resources for peacebuilding practitioners, technologists, and policymakers. The company’s competitive edge is its distance-learning platform, and the course boasted participants–on-the-ground, at an organization’s headquarters, and somewhere in between–from several countries. The discussion was based on my recent back-and-forth with Charles and Christopher Neu, a TechChange operations guru, on navigating data about violent conflict and mass atrocities.

The participants’ questions provoked a constructive discussion about the mass atrocity data “swamp,” its information and security risks, and how practitioners can navigate both. Participants agreed that the best information exists between big, computed data and small, user-generated data. This agreement, however, opens new dilemmas: how peacebuilding organizations balance the moral act of “bearing witness” with the no-less-moral act of protecting their local officers and sources; and how analysts assess conflict amid small amounts of low-quality information. Below, I summarize my initial thoughts on these two dilemmas, based on the TechChange discussion and relevant reading since.

Proprietary data aren’t private, and open data aren’t public: Better data, big or small, can only emerge from stronger computer and human information networks. For big data practitioners, this means expanding systems like the recently-suspended Global Database of Events, Language, and Tone (GDELT), which I discussed in my last post as an example of useful machine-coded datasets. Before the platform’s suspension, GDELT’s data scientists viewed–and may still view–its future in these terms: the platform’s value grows as it acquires more reliable and diverse news sources.

For small data practitioners, a “network” refers to the human relationships, bolstered by communication technologies, that transfer information from local sources to global headquarters. This information, about where a conflict occurs, which populations are vulnerable, and what their needs are, often informs the distribution of peacebuilding resources. Additionally, organizations carry the amorphous public responsibility of “bearing witness” to ongoing abuses. These dual hats create internal contradictions between an organization’s public face and its private needs.

Many practitioners, however, view this dilemma as an unresolvable dichotomy, rather than, more accurately, a give-and-take. Peacebuilding data are effective when an organization shares them–within its organization, but also with others. Small data are the property of an organization and its sources, and not the private confidence of a tiny group of people. Peacebuilding organizations should weigh the burden of risky information, but also grant their local sources the agency to shape, if not determine, how the data are used.

The best analysis is transparent, not definitive: Analysis is never an independent affair. An analyst’s client may be a practitioner, a policymaker, another, more senior analyst, or the general public, but the relationship is consistent: the client clarifies expectations that an analyst uses to determine priorities, hone datasets, and frame conclusions. In our discussion, several practitioners lamented their clients’ demand for certain assessments amid uncertain data. I cited a scene from Zero Dark Thirty, Kathryn Bigelow’s dramatic rendering of the manhunt for Al Qaeda chief Osama bin Laden, that resonates with my own, brief exposure to the U.S. intelligence community. In the scene, then-CIA director Leon Panetta asks a cohort of senior analysts whether bin Laden is located in a compound in Abbottabad, Pakistan, where Navy SEAL Team 6 later killed him. The CIA’s deputy director, presumably for intelligence, suggests that bin Laden is more than likely located at the compound; Panetta, visibly disgruntled, presses his junior colleague for a more confident response. The deputy director sighs, “We don’t deal in certainty, we deal in probability. I’d say there’s a sixty percent probability he’s there.”

As in the information security problem, peacebuilding experiences mirror their national security counterparts. If an analyst says, “according to qualitative and quantitative tools, a conflict in a local village in northern Kenya will probably emerge over the next six months,” the client–a UN agency or a private foundation or a humanitarian aid group–may request a more definitive response. In these circumstances, given poor-quality data, the analyst’s best option is transparency–about the quantity, quality, and limited reach of the data and the conclusions drawn from them. Better peacebuilding emerges from an acceptance of uncertainty, rather than the creation of certainty where it cannot exist.

freedom, control, and the act of (mass) killing

On December 15, South Sudan’s presidential guard began shooting. South Sudanese officials loyal to President Salva Kiir exited a late-afternoon meeting of the Sudan People’s Liberation Movement (SPLM), the country’s ruling party; hours later, elite soldiers entered communities nearby. Survivor testimonies, compiled by Human Rights Watch, describe the melee. “I stopped to pick [up] my son but he was heavy and dead,” said one Juba bricklayer.

Throughout the history of mass violence, uncounted fathers have stopped to pick up their sons, only to find them heavy. Picking up the dead, especially the familiar dead, is the most immediate, intimate act of remembrance. Memories of violence pass through osmosis; the physical gap between past life and present death, just previously the sole territory of the dead, closes. After the dead are retrieved, mass death is public, known. The act of killing–who is responsible, and why they killed–however, remains little more than an imagined truth.

The mystery of mass violence is, of course, the central question of Joshua Oppenheimer’s The Act of Killing, the play-within-a-play-qua-documentary about the ringleaders of an Indonesian militia under the murderous, repressive Suharto regime. That question reappears, perhaps more subtly, in 12 Years a Slave, the cinematic retelling of Solomon Northup’s liberation from antebellum slavery. In tandem, the two reveal the layered relationship between memory and the politics of control, separately and together, and responsibility for mass violence.

Joshua Oppenheimer, who directed The Act of Killing, is not its central storyteller. Anwar Congo, the head gangster of a former Indonesian death squad, claims that role, alongside several others: documentarian, performer, memorialist, memorialized. The film introduces Anwar and his cohort during their first local casting call for a dramatic commemoration of Indonesia’s mass violence, which killed hundreds of thousands of Anwar’s fellow countrymen between 1965 and 1966. Of course, they were never countrymen of Anwar’s, nor of the sprawling paramilitary organization to which Anwar still belongs. They were killed, by Anwar’s militia and similar ones, as communists and dissidents and civilians and terrorists and opposition leaders and insurgents, even if they died as humans.

Whether the militia’s fictive victims are also humans, as opposed to communists, is unclear. The humanity of their mass murderers exits repeatedly, like the comic relief in a poorly-scripted Beckett play. In one scene, Anwar’s cohort acknowledges their collective crimes. “The key,” one starts, “is to find a way not to feel guilty, find the right excuse.” In contemporary Indonesia, as in its past, guiltlessness is a simple task. Many beneficiaries of Suharto’s regime herald the survival of Indonesia’s gangster culture, of which Anwar is an archetype. Officially, an Indonesian gangster is a “free man”–a human, living among communists.

At first, a gangster’s freedom is an impression, an act of cultural mimicry: aggressive pastels, based loosely on Fredo Corleone’s wardrobe; across town, fascistic rallies where slouching (but patriotic) militiamen chew on dwindling cigarette butts. The scale and consequence of Anwar’s freedom, however, come quickly into view. As young gangsters, Anwar and his militia stood outside the local movie theater most nights, scalping tickets and harassing passersby; one recalls Malcolm X’s description, in his autobiography, of Harlem’s gangsters, or the youthful rabble of an S.E. Hinton novel. But Anwar’s freedom was no such casual nuisance. After their quick fun, his crew exited the street, and ascended a ramshackle citadel of rudimentary terror. There, they interrogated and often killed Indonesia’s most wanted. These actions occurred at the state’s behest, but far beyond its control.

Solomon Northup’s story is about a different freedom, and a very different violence. Unusually, Solomon was born a free man, and became a literate member of the Saratoga Springs, NY, educated class. One weekend, while his wife and children are out of town, two hucksters invite Solomon, a proficient violinist, to Washington, DC, a frequent crossroads for the domestic U.S. slave trade. After a night of drinking, Solomon awakes in a dungeon cell, visibly enslaved. He remains in chains, physically and figuratively, for the ensuing majority of the movie, until a court order grants his freedom.

If Solomon sketches his own slow, terrible crawl from slavery, his story also describes the disparate freedom of those around him. Freedom is an imperfect privilege–in the antebellum United States, as elsewhere, the freest humans are always bound to someone by fealty or debt or, among the less free, weaker chains. Early in the movie, Solomon’s well-spoken companion Clemens Ray is released in the New Orleans port, from which Solomon will travel inland. Despite his release, Clemens is in debt: “Ironically,” John Ridley’s scripted stage directions read, “his master now represents ‘freedom.’”

Solomon’s drivers are also free, often at the expense of their slaves. Tibeats, Solomon’s first encounter with the slave plantation’s regular cruelty, is an aggressive overseer with unusual control over the disciplinary affairs of Ford’s plantation. Tibeats is a feeble character, physically meek and psychologically unstable. After Solomon reciprocates Tibeats’ violent provocations, the overseer corrals a lynch mob–he cannot punish Solomon single-handedly. Ford, the plantation master, repays Solomon’s previous obedience with generous mercy, though barely enough to protect his slave.

The episode is an ugly waltz of violent control, between Ford’s repression and Tibeats’ quick, brutal rage. Their dance, a brief sliver of Solomon’s suffering, is a synecdoche of slavery’s slow, massive violence. When the system worked, its slaves suffered, though always at their master’s hand; when it withered, its masters beat and murdered and starved and tortured their victims at overwhelming scale. Though we remember slavery as an institution of totalitarian control, its worst abuses occurred under Tibeats, not Ford–that is, where control was weakest. Solomon’s later masters mimic Tibeats’ cowardly violence amid the anarchic authority in which that violence thrives: the cotton farmer Edwin Epps whips the crumbling body of Patsey, a rightly spiteful target of his sexual desire, while Mary, Epps’ wife, subjects Patsey to a jealous, tyrannical cruelty. Of course, the untold epilogue of Solomon’s story–the American Civil War–is also the indirect consequence of slavery’s weakening hold.

The dilemma of control sharpens the two films’ divergent understanding of who is responsible for mass violence, and how that responsibility shapes our memory of its event. Just as Solomon’s violence is a small lens into slavery’s greater toils, so too are Anwar’s gangsters a microcosm of Indonesia’s rapid mass atrocity. In the movie’s opening credits, Oppenheimer quotes the quantitative range of violence–one million dead. One understands that this one million, like the Holocaust’s six, is an unfathomable number, and that Anwar’s story grants subtle texture to its massive scale. The movie, however, is as much about avoiding responsibility as 12 Years a Slave is about claiming it. Anwar’s true victims are anonymous ghosts of a suppressed past; his actors, meanwhile, exist out of space and time–except for the viewer’s foreknowledge, the fictive victims might as well be civilians in Cambodia, under the insurgents of the Khmer Rouge, or at My Lai, under U.S. Army Company C. In Indonesia, many were killed; Anwar’s victims, however, are countless and unknown.

As the act of killing concludes, Anwar retches. He retches into the trough, and against the bullet-pocked concrete column, and onto the dusty floor, stained by deep-crimson memories of Indonesia’s retold death. In between heaves, he picks up a ragged canvas sack. This was the sack, he sighs, with which his gangster squad discarded “the human beings we killed.” He returns to his retching spot, where they died.

what is america’s foremost public intellectual?

Earlier this week, following a too familiar media scuffle over Mitt Romney, the American family, and race, Ta-Nehisi Coates published an article that described MSNBC anchor Melissa Harris-Perry as America’s “foremost public intellectual.” Coates’ Harris-Perry is a brave and unusual public voice, in large part because of her identity: an educated black woman in an overwhelmingly white, male public discourse. Coates’ hyperbole–an intellectual’s “foremost”-ness ranks that which defies measurement–provoked a predictable firestorm. According to these comments, Harris-Perry cannot be a “foremost” public intellectual because she is not Noam Chomsky, the profession’s prototype, or because she is not often cited in the halls of power, or because she does not shape the common structure of our public sphere.

These objections contain subtle half-truths, and little more. What a public intellectual is–professionally, socially, politically, and otherwise–rests entirely on two dilemmas, awkwardly combined: what a “public” is, and whose ideas the public accepts. These questions are as fresh in our contemporary discourse as they are age-old–in fact, they underpin the very contours of our political conversation.

Over the past few years, I have engaged in a slow, plodding biographical study of Tony Judt, the late historian for whose profound prose Coates and I share a recent admiration. For a short two decades between 1992, when Judt published a controversial study of postwar French intellectuals, and 2010, when he died, very slowly, of ALS, Judt was a “public intellectual,” by many definitions. His biography, a very interesting one, offers three preliminary insights into what a public intellectual is, and how we should think about the pillars of such a person’s identity.

Ideology: An intellectual’s ideas are not independent–they exist in an ideology, the society of ideas with which they publicly and privately cohere and differ. An intellectual can proclaim themselves “non-ideological,” as a politician proclaims themselves “non-partisan,” but the social act of discussing ideas requires that they agree with some similar idea, whether in structure, logic, or outcome. Even the discussions of Socrates, Western history’s most storied gadfly, created ideologies–modes of thought with which his students disagreed, and against which they defined the substance of their own ideas. Judt was also ideological, despite Coates’ insistence to the contrary: in refuting the totalitarian-lite preferences of postwar French intellectuals, in Past Imperfect, he associated with a particular (Isaiah Berlin-inspired) definition of “freedom”; in dismissing the Zionism of the hawkish American left, he adopted a particular concept of political Jewishness. Judt’s intellectual milieu made much of Vaclav Havel’s trope, “speaking truth to power.” But his ideas, like those of all public intellectuals, were just another power, crafting another truth, however more righteous.

Access: When Coates described Harris-Perry’s qualifications, he was quick to cite her credentials: degrees from top universities, a prominent job. Of course, these are metrics of influence, but they also indicate access, a feature not necessarily driven by the quality of an intellectual’s ideas. As social justice advocates often observe, access is more reliably an indication of privilege, less of inherent intellectual value. Harris-Perry may be morally correct on some topics, and wrong on others, but it usually doesn’t matter: her influence rests on her credentials, the currency of our current meritocracy. Like Harris-Perry, who brushed against a public racial taboo, Judt’s post-Zionism ruffled the delicate pro-Israel opinions of a liberal intelligentsia. He faced significant objections, and some alleged efforts to restrict his public commentary, but remained tied to NYU, his home institution, and the New York Review of Books, which first published his critical commentary on Israel in 2003. Judt could afford to be a public intellectual because his public offered few sustained consequences for his dissidence.

Audience: When Coates asserted that Harris-Perry was the foremost public intellectual in America, he took two statements for granted: that “America” is a unitary public, and that its publicness carries social and moral meaning. Surveying the American media landscape, the most reliable proxy for our national public sphere, both statements appear patently, unavoidably false. “America” is surely a salient concept and identity for many, including those who live within the borders of the United States, but its salience, symbols, and virtues diverge along that stretch of highway between Chicago’s South Side and the plains of Oklahoma. Logically, its publics must also split. There are those who view a stretch of 57th Street as their common ground, and others who view morning-time Fox News talk-shows as a vessel for their cultural norms. Judt also faced these divided publics. Left-wing French intellectuals incited literary riots over his damning historical portrait of their predecessors; the New York Times Book Review, in contrast, featured a generous front-page essay on Past Imperfect, but few would describe its American reception as “publicly prominent.”

the murky swamp of mass atrocity data

Evangelists of “big data,” the possibility of computed knowledge at unprecedented scale, often describe our contemporary world as a “sea” of information. Data scientists have more and better knowledge of how humans behave, how they interact, how they cooperate, and how they conflict, generated as much by our own actions–through the Internet, mostly–as by those who surveil us. For some problems, the dataset is a near perfect match. Commercial airlines use “frequent flyer” programs to track when their customers fly, and to where; electoral strategists manipulate marketing information to infer norms, cultural preferences, and political opinions among likely voters. Amid an unfathomable sea, these data are intimate and human. Sgt. Pepper’s “day in the life,” once framed by a cup of coffee, is now an ever-present data-stream. We wake up, we create data; we go to the bodega, we create data; we set up shop in a six-by-six cubicle. We create data.

Violent conflict, especially on a mass scale, is never so neat. Acts of violence don’t create data, but rather destroy them. Both local and global information economies suffer during conflict, as warring rumors proliferate and trickle into the exchange of information–knowledge, data–beyond a community’s borders. Observers create complex categories to simplify events, and to (barely) fathom violence as it scales and fragments and coheres and collapses. A “mass atrocity” is a fiction; an analytically and morally useful one, but a fiction nonetheless. We expect system to follow scale, but it rarely does. So rarely, in fact, that observers identify little more than one hundred mass atrocity events since the end of the cataclysmic Second World War. One hundred is a large number, but it’s a negligible fraction of the individual violence that comprises its subjects.

Mass atrocity data have improved in fits and starts. The Global Database of Events, Language, and Tone (GDELT), a massive open-source computing effort, uses an automated, iterative data-stream to collect events. GDELT ingests information, imperfectly, to create a more perfect portrait of where events, including violence, globally occur. John Beieler, a political science PhD student at Penn State, recently experimented with the GDELT dataset of violent events in the Central African Republic (CAR) and South Sudan, both of which are embroiled in ongoing mass atrocities. Beieler used the dataset to assess the likelihood of future mass atrocities in either country, but came up short. Local and international media sources feature both conflicts–gruesome portraits grace A1, and prominent global officials publish opinion pieces to “bear witness” to CAR and South Sudan’s respective horrors. But media publications cover these events as “mass atrocities,” and not as a sequential series of individual violent events. In a coda, Beieler contrasts this to Egypt, which, because of a glut of foreign journalism, the availability of citizen reporting tools like Twitter, and robust foreign diplomatic engagement, appears as both “mass repression” and a sequential series. Our understanding of the conflict’s progression throughout time–what it is, as a global event–determines its media coverage, and therefore its usefulness as a big data subject.

The convergence of scarce media, knowledge, and data is not unique to massive datasets, nor to time-bounded events. The information that local aid groups use to assist conflict-affected communities is small, in comparison. Small data are complementary, not subordinate, to their massive counterparts. Humanitarian networks, mediators, and civil society organizations want to know where violence occurred and, consequently, where vulnerabilities persist. While time is a useful data point, location is essential. Without location, aid groups won’t know where to go or how far to extend their operations. As Christopher Neu, a peacebuilding technologist, observes, the usefulness of public small data rests on an ethical quandary: In a live conflict, do humanitarian small data expose the same vulnerabilities they aim to fix? Where GDELT’s big data are open-source, small data are inherently proprietary–they’re generated by a user, one who sometimes risks physical safety to report a violent event’s location. Proximity, so praised among peacebuilders as big data’s lacking nuance, also muddies the data pond it aspires to clarify.

why did mass killing increase in 2013?

For Joseph Brodsky, mass violence was a close, unwanted companion. The Russian poet’s career began in earnest under repression’s shadow, in the exiled cold of the northern Arkhangelsk region. In 1964, Brodsky wrote “Spring Season of Muddy Roads,” a quaint and subtly tragic pastoral. The verse portrays a weary road, recently muddied by a spring rain. The road’s new season, so often refreshing, is now uncertain: “It’s not quite spring, but some- / thing like it. / The world is scattered now, / and crooked. / The ragged villages / are limping. / There’s straightness only in / bored glances.”

The year 2013 was scattered, too, and crooked. For all that went well, much also went poorly. In the wake of South Sudan’s horrific violence, Jay Ulfelder reports, “2013 may become worst year for onsets of state-led mass killing since early 1990s.” By Ulfelder’s count, mass killing began, nationally, in South Sudan, the Central African Republic, Nigeria, and Egypt, and continued in Syria and Iraq. That these events–“mass killing in South Sudan,” for example–are statistical inventions does not lessen the human significance of their politics. Mass killing is a very particular form of conflict, with particularly grave consequences for its victims. Its persistence–in those four new events, and in the thousands of microscopic conflicts that comprise them–may explain a tragic feature of our present politics.

Since the regional Arab uprisings of 2011, the “global upsurge” in protest has become an accepted proxy for our present era of instability. Global civil society appears, often in tandem, to carve new political space against a backdrop of regime repression. These civil society actions represent an increasing fraction of anti-state activity, writ large. Beyond them, the Central African Republic’s anti-balaka militias now violently contest the Seleka movement’s revolutionary rule; until last spring, when a heavy-handed counterinsurgency pushed them out, Boko Haram’s implants secured near-complete authority over Nigeria’s northeastern Borno State. In this context, mass killing is often the state’s last barricade–an extreme measure in the most desperate times.

Anti-state activity was not necessarily more frequent in 2013, nor will it always precipitate mass killing. But as protest and insurgency reemerge, the type of extraordinary politics that mass killing represents may prove more frequent. As Brodsky writes, there’s straightness only in bored glances.

Update: In the comments, Jay Ulfelder offered the following clarification of the above-mentioned data:

One point of clarification: I don’t see *state-led* mass killing in all of the cases you list, and I do see state-led and non-state mass killing in some cases not listed here. Here’s a quick list that will probably turn into a blog post before year’s end:

* Ongoing episodes of state-led mass killing: Syria (opposition), Sudan (multiple), Egypt (Islamists), North Korea (gulags), Myanmar (Kachin); now maybe also Nigeria (anti-Boko Haram), South Sudan (anti-Machar faction/Nuer); and maybe still DRC (eastern)

* Other conflicts producing episodes of mass killing that aren’t state-led: Iraq, Pakistan, Nigeria (Boko Haram), Mexico, now also CAR, and surely others I’m forgetting

I look forward to Ulfelder’s post.