A Framework for Narrative Structure Analysis Using Automated Semantic Role Labeling in Korean Text
Korean Journal of Sociology (한국사회학), 2025, Vol. 59, no. 3, pp. 101-146
Eun Rang Kwon, Junmo Song, Donggeon Seo, Kangmin Lee, Taeuk Kim, Jeong-Han Kang
One-Line Summary
An interdisciplinary framework that applies automated semantic role labeling (SRL) to Korean text, enabling large-scale extraction and quantitative analysis of narrative structures — "who did what to whom" — and demonstrating its utility through sociological case studies on news corpora covering topics such as immigration and labor disputes.
Background & Motivation
Narrative analysis is a core method in sociology for understanding how individuals, organizations, and societies construct meaning through stories. Since the "narrative turn" in social science, scholars have studied narratives by manually identifying actors, actions, and targets in text — a tradition rooted in structural narratology (Propp, Greimas) and refined by sociologists such as Franzosi, who formalized the Subject-Verb-Object (SVO) triple as the atomic unit of narrative. However, traditional narrative analysis is labor-intensive, relying on close reading by trained researchers, which severely limits the scale of corpora that can be studied.
Key Challenges Addressed:
- Scalability gap: Manual narrative analysis can handle only tens to hundreds of documents, while sociological questions often require analysis of thousands or tens of thousands of texts (e.g., years of newspaper coverage on immigration policy or labor disputes).
- Korean-specific linguistic challenges: Korean's agglutinative morphology (e.g., a single eojeol like "정부에서는" combines the noun 정부, "government," with the source/locative particle 에서 and the topic marker 는), flexible word order (canonically SOV but with frequent scrambling), and pervasive pro-drop (subject or object arguments are omitted in 30-50% of clauses) make standard English-trained NLP pipelines unsuitable.
- Conceptual bridge needed: While NLP researchers develop SRL systems using PropBank-style argument labeling and sociologists conduct narrative analysis using SVO triples, there has been no systematic framework connecting SRL's predicate-argument structures to sociological narrative elements.
- Reproducibility: Qualitative narrative coding is inherently subjective — inter-coder reliability between human annotators is often moderate at best; a computational approach offers reproducible, consistent extraction across large corpora.
Semantic role labeling (SRL) — the task of identifying "who did what to whom, when, where, and how" in a sentence — offers a natural computational analog to sociological narrative analysis. Both decompose sentences into structured actor-action-target relationships. This interdisciplinary collaboration between sociologists (Kwon, Kang) and NLP researchers (Song, Seo, Lee, Kim) bridges the two fields by formalizing the mapping from SRL outputs to Franzosi-style narrative structures, thereby automating what has traditionally been a manual sociological method.
Proposed Method
The framework consists of four stages, from raw Korean text to quantitative narrative analysis:
1. Korean SRL Pipeline
A deep learning-based semantic role labeling system processes Korean text through a multi-stage pipeline. First, morphological analysis segments each eojeol (spacing unit) into morphemes, separating content words from grammatical particles (josa) and verbal endings (eomi) — critical because Korean case markers (이/가, 을/를, 에게) directly encode semantic roles. Next, predicate identification detects both verbal predicates (e.g., "발표하다," "비판하다") and nominalized predicates (e.g., "발표," "비판") that frequently serve as event anchors in Korean news. Finally, argument extraction assigns PropBank-style labels (ARG0: agent, ARG1: patient, ARG2: instrument/benefactive, ARGM-TMP: temporal, ARGM-LOC: locative, ARGM-CAU: causal) to each predicate's arguments, producing structured predicate-argument tuples.
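To make concrete why case markers matter for role assignment, the following is a toy sketch, not the paper's system. It strips a trailing particle from each eojeol by suffix matching and attaches a rough role hint. The names `PARTICLE_HINTS` and `split_eojeol` and the particle-to-role table are assumptions made for illustration. A real pipeline would use a trained morphological analyzer and SRL model, since suffix matching misfires on nouns that merely end in one of these syllables, and 이/가 marks nominative case rather than agency as such.

```python
# Toy illustration only: a hand-written particle table standing in for a
# trained morphological analyzer. The role hints are simplifying assumptions.
PARTICLE_HINTS = {
    "에게": "ARG2",      # dative marker, often a recipient or benefactive
    "에서": "ARGM-LOC",  # locative/source marker
    "이": "ARG0",        # nominative marker (after a consonant)
    "가": "ARG0",        # nominative marker (after a vowel)
    "을": "ARG1",        # accusative marker (after a consonant)
    "를": "ARG1",        # accusative marker (after a vowel)
}


def split_eojeol(eojeol):
    """Return (stem, particle, role_hint); particle and hint are None if no suffix matches."""
    # Try longer particles first so two-syllable markers are not shadowed.
    for particle in sorted(PARTICLE_HINTS, key=len, reverse=True):
        if eojeol.endswith(particle) and len(eojeol) > len(particle):
            return eojeol[: -len(particle)], particle, PARTICLE_HINTS[particle]
    return eojeol, None, None


tokens = "정부가 이민자에게 새로운 정책을 발표했다".split()
analysis = [split_eojeol(token) for token in tokens]
for stem, particle, hint in analysis:
    print(stem, particle, hint)
```

The modifier "새로운" and the inflected verb "발표했다" pass through unsplit, which is exactly the gap that predicate identification and verbal-ending analysis fill in a real system.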
2. SRL-to-Narrative Mapping
A formal mapping translates SRL outputs into sociological narrative elements aligned with Franzosi's SVO framework. The core mapping is: ARG0 → Subject/Actor (who performs the action), Predicate → Verb/Action (what is done), ARG1 → Object/Target (who or what is affected). Additional arguments enrich the narrative context: ARG2 maps to Instrument/Benefactive, ARGM-LOC to Setting, ARGM-TMP to Temporal Anchor, and ARGM-CAU to Cause/Motivation. For example, from the sentence "정부가 이민자에게 새로운 정책을 발표했다" (The government announced a new policy to immigrants), the system extracts: Actor=정부, Action=발표하다, Target=정책, Benefactive=이민자.
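The mapping described above is essentially a relabeling of each predicate-argument frame. A minimal sketch, assuming a frame arrives as a plain dictionary keyed by SRL label; the `"PRED"` key, the narrative field names, and the function `to_narrative_event` are hypothetical choices for this example, not the paper's data format:

```python
# Label correspondence taken from the mapping described in the text.
# The key "PRED" and the snake_case field names are assumptions.
SRL_TO_NARRATIVE = {
    "ARG0": "actor",
    "PRED": "action",
    "ARG1": "target",
    "ARG2": "instrument_or_benefactive",
    "ARGM-TMP": "temporal_anchor",
    "ARGM-LOC": "setting",
    "ARGM-CAU": "cause",
}


def to_narrative_event(frame):
    """Rename the SRL labels of one frame to narrative elements, dropping unmapped labels."""
    return {
        SRL_TO_NARRATIVE[label]: text
        for label, text in frame.items()
        if label in SRL_TO_NARRATIVE
    }


# Frame for "정부가 이민자에게 새로운 정책을 발표했다", as given in the text.
frame = {"PRED": "발표하다", "ARG0": "정부", "ARG1": "정책", "ARG2": "이민자"}
event = to_narrative_event(frame)
print(event)
```

Keeping the correspondence in a single table means a researcher can adjust the sociological interpretation of a label (for instance, splitting ARG2 into separate instrument and benefactive fields) without touching the extraction step.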
3. Aggregation & Quantification
Individual sentence-level narrative events are aggregated at the document and corpus level through three complementary analytical methods. Frequency analysis computes distributions of actors, actions, and actor-action-target triples across the corpus. Narrative network construction builds actor-action co-occurrence graphs where nodes represent actors/actions and weighted edges capture their association strength, enabling visualization of dominant narrative structures. Role distribution analysis tracks how often specific actors (e.g., "government," "citizens," "corporations") appear in agentive (ARG0) versus patient (ARG1) positions, revealing systematic asymmetries in media portrayal.
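The three aggregation methods can be sketched with nothing more than counters. The events below are invented toy data, and using raw co-occurrence counts as edge weights and a simple agent share as the asymmetry measure are assumptions for illustration; the source does not specify the weighting or the exact statistic.

```python
from collections import Counter, defaultdict

# Hypothetical narrative events, invented for this example only.
events = [
    {"actor": "정부", "action": "발표하다", "target": "정책"},
    {"actor": "정부", "action": "규제하다", "target": "기업"},
    {"actor": "시민단체", "action": "비판하다", "target": "정부"},
]

# 1. Frequency analysis: actors and full actor-action-target triples.
actor_counts = Counter(e["actor"] for e in events if "actor" in e)
triple_counts = Counter(
    (e.get("actor"), e["action"], e.get("target")) for e in events
)

# 2. Narrative network: weighted actor -> action edges.
#    Edge weight here is the raw co-occurrence count (an assumption).
edge_weights = Counter((e["actor"], e["action"]) for e in events if "actor" in e)


# 3. Role distribution: how often each entity is agent vs. patient.
def role_distribution(events):
    roles = defaultdict(Counter)
    for e in events:
        if "actor" in e:
            roles[e["actor"]]["agent"] += 1
        if "target" in e:
            roles[e["target"]]["patient"] += 1
    return roles


def agent_share(roles, name):
    """Fraction of an entity's mentions that are agentive; None if never mentioned."""
    counts = roles[name]
    total = counts["agent"] + counts["patient"]
    return counts["agent"] / total if total else None


roles = role_distribution(events)
print(actor_counts.most_common(1))
print(agent_share(roles, "정부"))
```

An agent share near 1 means the entity is almost always portrayed as acting, and a share near 0 means it is almost always acted upon. Comparing shares across actor categories is one simple way to surface the asymmetries described above.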
4. Sociological Case Study Application
The framework is applied to Korean news corpora to demonstrate its practical utility. The study analyzes how different actors (government agencies, civic groups, corporations, individual citizens) are portrayed through their associated actions and targets across topics such as immigration, labor, and social policy. By examining actor-action distributions and narrative networks, the study reveals systematic patterns in media framing — such as which actors are consistently depicted as agents of change versus passive recipients — patterns that would remain invisible through small-scale manual analysis.
SRL-to-Narrative Mapping Table
| SRL Label | Narrative Element | Sociological Role | Example (Korean) |
| --- | --- | --- | --- |
| ARG0 | Subject / Actor | Who initiates the action | 정부 (government) |
| Predicate | Verb / Action | What is done | 발표하다 (to announce) |
| ARG1 | Object / Target | What/who is affected | 정책 (policy) |
| ARG2 | Instrument / Benefactive | Means or beneficiary | 이민자 (immigrants) |
| ARGM-TMP | Temporal Anchor | When it happened | 어제 (yesterday) |
| ARGM-LOC | Setting | Where it happened | 국회에서 (at the National Assembly) |
| ARGM-CAU | Cause / Motivation | Why it happened | 인구 감소로 (due to population decline) |
Experimental Results
The framework is validated both quantitatively (SRL system accuracy) and qualitatively (narrative analysis utility):
SRL System Performance
| Evaluation Aspect | Description | Key Considerations |
| --- | --- | --- |
| Predicate Identification | Accurate detection of verbal and nominalized predicates in Korean sentences | Handles both "발표하다" (verbal) and "발표" (nominal) as event anchors |
| Argument Extraction | Correct identification and labeling of core arguments (ARG0, ARG1) and modifiers | Disambiguates case markers (이/가 vs. 을/를 vs. 에게) to determine role assignments |
| Pro-Drop Handling | Robust performance even when subjects or objects are omitted | Critical for Korean, where 30-50% of clauses lack overt subjects or objects |
| Morphological Processing | Correct segmentation of agglutinative word forms into meaningful argument units | Separates content morphemes from grammatical particles within single eojeol units |
Narrative Analysis Findings
Key Analytical Dimensions Demonstrated:
- Actor frequency ranking: Identification of the most frequently mentioned actors across a corpus, revealing whose voices dominate media narratives
- Action verb clustering: Grouping of semantically related actions (e.g., "criticize/condemn/oppose" vs. "support/endorse/advocate") to characterize narrative tone
- Agency asymmetry detection: Measurement of how often specific actors appear as agents (ARG0) vs. patients (ARG1), exposing systematic portrayal biases
- Narrative network topology: Analysis of graph density, centrality, and community structure in actor-action networks to identify narrative coalitions and oppositions
- Consistency with expert analysis: The automatically extracted narrative structures align well with manually coded narratives by trained sociologists, validating the SRL-to-narrative mapping and demonstrating that computational extraction preserves sociologically meaningful patterns.
- Scale advantage demonstrated: The framework processes thousands of documents in minutes, compared to weeks or months of manual coding, enabling corpus-level narrative analysis previously infeasible — expanding the analytical horizon from hundreds to tens of thousands of articles.
- Actor-action pattern discovery: Quantitative aggregation reveals systematic patterns in how different actors are portrayed — for instance, government actors disproportionately appear in agentive positions with action verbs like "announce," "regulate," and "implement," while citizen groups are more frequently positioned as patients of actions like "affect" and "burden."
- Temporal narrative tracking: The framework enables tracking how narrative structures evolve over time, revealing shifts in dominant storylines and actor portrayals across different time periods — such as how the framing of immigration shifts from economic discourse to security discourse over successive years.
- Korean-specific robustness: Despite challenges such as argument omission, flexible word order, and complex morphological agglutination, the framework maintains meaningful narrative extraction quality, demonstrating that SRL-based narrative analysis can succeed beyond English-centric approaches.
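The summary does not state how agreement with manual coding was measured. One common option, shown here purely as an assumption, is to treat each document's actor-action-target triples as a set and compute precision, recall, and F1 of the automatic output against the expert-coded triples. The function name `triple_agreement` and the sample triples are hypothetical.

```python
def triple_agreement(auto_triples, manual_triples):
    """Exact-match precision, recall, and F1 between two collections of triples."""
    auto, manual = set(auto_triples), set(manual_triples)
    matched = len(auto & manual)
    precision = matched / len(auto) if auto else 0.0
    recall = matched / len(manual) if manual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented example: two of three automatic triples match the manual coding.
auto = [
    ("정부", "발표하다", "정책"),
    ("정부", "규제하다", "기업"),
    ("기업", "반대하다", "규제"),
]
manual = [
    ("정부", "발표하다", "정책"),
    ("정부", "규제하다", "기업"),
    ("시민단체", "비판하다", "정부"),
]
precision, recall, f1 = triple_agreement(auto, manual)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Exact matching is strict: a triple with a correct actor and action but a slightly different target span counts as a miss, so a real evaluation might relax the comparison to head words or partial overlap.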
Why It Matters
This work contributes to both computational linguistics and sociology:
- New methodology for social sciences: Sociologists gain a reproducible, scalable tool for narrative analysis that can process thousands of texts while maintaining analytical rigor — opening the door to quantitative narrative research at unprecedented scale. This directly addresses a long-standing bottleneck identified by Franzosi and others in computational narrative analysis.
- Real-world NLP application: For NLP researchers, the work demonstrates a compelling application of semantic role labeling beyond standard benchmarks, showing how SRL outputs can directly serve domain-specific analytical needs in the social sciences — a domain where NLP impact has been limited.
- Korean NLP advancement: The framework addresses a significant gap in Korean-language computational tools for social scientific text analysis, which have been far scarcer than their English counterparts. By tackling Korean-specific challenges (agglutination, pro-drop, SOV word order), it establishes a foundation for Korean computational social science.
- Cross-disciplinary bridge: Published in the Korean Journal of Sociology (a top-tier KCI-indexed journal, Vol. 59, No. 3), the work introduces computational NLP methods to a social science audience, establishing a shared vocabulary and methodology between the two fields and demonstrating that NLP tools can be rigorously applied within sociological research frameworks.
- Generalizable framework: The SRL-to-narrative mapping is not limited to news analysis — it can be extended to literary texts, political speeches, legal documents, court rulings, parliamentary transcripts, and other narrative-rich corpora in Korean, with potential adaptation to other agglutinative languages such as Japanese and Turkish.