Chapter 20. Monitoring the Syrian War with Natural Language Processing
—Rahul Dodhia and Michael Scholtens
Executive Summary
The civil war in Syria may no longer be in the public eye, but The Carter Center has been monitoring it since 2013. The Carter Center collects reports of incidents in the war, conducts detailed analyses, and reports these to organizations like the United Nations, the European Union, and interested non-governmental organizations.
The Center manually classifies conflict events into 13 incident types. However, manual classification is time-consuming and hard to scale. To address this, we fine-tuned an existing language model on a sample of The Carter Center's Syrian conflict data. The resulting model achieved 96 percent accuracy on held-out test data and 90 percent on out-of-sample data. When experts reviewed its output, they found that the model could also identify events that should be classified as multiple incident types, such as events categorized as clashes that might also be categorized as shelling. The previous methodology had allowed only a single classification per event.
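The difference between single and multiple classification can be illustrated with a small sketch. The summary does not say how the model assigns more than one incident type, so the approach below (an independent sigmoid score per label with a cutoff) is an assumption, chosen because it is a common way to do multi-label classification. The label list is also hypothetical: only "clashes" and "shelling" are named in the text, "other" stands in for the remaining incident types, and the scores are made up for illustration.

```python
import math

# Hypothetical label subset: only "clashes" and "shelling" appear in the text;
# "other" is a placeholder for the remaining incident types.
LABELS = ["clashes", "shelling", "other"]


def sigmoid(x: float) -> float:
    """Map a raw model score (logit) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def single_label(logits):
    """Pick only the highest-scoring label (one classification per event)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return LABELS[best]


def multi_label(logits, threshold=0.5):
    """Return every label whose independent probability reaches the threshold."""
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Made-up scores for one event report, for illustration only.
example_logits = [2.1, 0.8, -3.0]
print(single_label(example_logits))  # clashes
print(multi_label(example_logits))   # ['clashes', 'shelling']
```

Under these assumptions, the single-label rule discards the second plausible incident type, while the thresholded rule keeps both. The 0.5 threshold is an arbitrary default; in practice it would be tuned against expert-reviewed examples.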
Overall, our language model reduced the time needed to transform conflict data, giving The Carter Center the ability to produce reports in a timelier manner and scale the number of reports that could be processed. It also made a breadth of conflict datasets accessible by automating cumbersome manual transformations. Our work contributes to incorporating technology into the peace-building ...