A Progressive Supervised-learning Approach to Generating Rich Civil Strife Data

Submitted by kalthaus on

“Big data” in the form of unstructured text pose challenges and opportunities to social scientists committed to advancing research frontiers. Because machine-based and human-centric approaches to content analysis have different strengths for extracting information from unstructured text, the authors argue for a collaborative, hybrid approach that combines their comparative advantages. The notion of a progressive supervised-learning approach that combines data science techniques and human coders is developed and illustrated using the Social, Political and Economic Event Database (SPEED) project’s Societal Stability Protocol. SPEED’s rich event data on civil strife reveal that conventional machine-based approaches for generating event data miss a great deal of within-category variance, while conventional human-based efforts to categorize periods of civil war or political instability routinely misspecify periods of calm and unrest. To demonstrate the potential of hybrid data collection methods, SPEED data on event intensities and origins are used to trace the changing role of political, socioeconomic, and sociocultural factors in generating global civil strife in the post–World War II era.
