
X-SNS: Cross-Lingual Transfer Prediction through Sub-Network Similarity

EMNLP 2023 Findings
Taejun Yun, Jinhyeon Kim, Deokyeong Kang, Seong Hoon Lim, Jihoon Kim, Taeuk Kim

One-Line Summary

A model-oriented method that predicts cross-lingual transfer performance by measuring how much two languages share internal sub-network structure within a multilingual model, achieving a 4.6% average improvement in source language ranking without requiring any external linguistic resources.

Paper overview
Figure 1. X-SNS overview: For each language, a binary sub-network is extracted based on Fisher Information scores, and the Jaccard similarity between sub-networks serves as a proxy for cross-lingual transfer compatibility.

Background & Motivation

Cross-lingual transfer (XLT) enables multilingual language models to perform well on tasks in unseen languages without target-language labeled data. While English is typically used as the default source language, recent evidence shows that choosing the most suitable source language for a given target can substantially improve transfer performance -- indeed, X-SNS finds that non-English sources outperform English in 11 of 15 cases, with an average gain of 1.8 points.

Limitations of Existing Approaches: Prior methods for predicting transfer compatibility depend on external resources: Lang2Vec requires typological features from the WALS database, lexical divergence uses subword distribution statistics, and embedding similarity only captures surface-level representation overlap. None of these directly examine how the model internally processes different languages. X-SNS addresses this gap with a model-oriented approach that peers inside the network to measure structural language similarity.

Proposed Method

X-SNS proposes using sub-network similarity between language pairs as a proxy for predicting cross-lingual transfer compatibility. The core idea is that if two languages activate similar parameters within a multilingual model, knowledge learned from one should transfer well to the other.

1. Fisher-Based Sub-Network Extraction
For each language, compute the approximated Fisher Information for every model parameter using raw text. The Fisher score quantifies how sensitive each parameter is to a given language's data. Parameters in the top p% (default: 15%, matching the masked language modeling ratio) are selected to form a binary sub-network vector where 1 indicates an important parameter and 0 otherwise.
2. Jaccard Similarity Computation
Measure the structural overlap between two languages' binary sub-networks using the Jaccard similarity coefficient: |s_source ∩ s_target| / |s_source ∪ s_target|. Higher Jaccard similarity indicates that the model processes both languages through largely the same internal pathways.
3. Source Language Ranking
Given a target language, rank all candidate source languages by their sub-network similarity score. The top-ranked source is predicted to yield the best zero-shot transfer performance. This ranking can also be used in multi-source settings by selecting the top-k languages for disjoint multilingual training.
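The three steps above can be sketched in a few lines of pure Python. The per-parameter Fisher scores, languages, and tiny parameter count below are made-up toy values (the real method estimates Fisher Information from masked-language-modeling gradients over raw text); the top-p% masking and Jaccard comparison are the mechanism the paper describes.

```python
def top_p_mask(scores, p=0.15):
    """Keep the top p fraction of parameters (1 = important, 0 = not)."""
    k = max(1, int(len(scores) * p))
    threshold = sorted(scores, reverse=True)[k - 1]
    return [1 if s >= threshold else 0 for s in scores]

def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| between two binary masks."""
    inter = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    union = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return inter / union if union else 0.0

# Hypothetical Fisher scores for 10 parameters in three languages.
fisher = {
    "de": [0.9, 0.1, 0.8, 0.2, 0.7, 0.1, 0.1, 0.1, 0.1, 0.1],
    "fr": [0.8, 0.1, 0.9, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1, 0.1],
    "ja": [0.1, 0.9, 0.1, 0.8, 0.1, 0.1, 0.7, 0.1, 0.1, 0.1],
}

# Binary sub-network per language (p = 0.3 here only because the toy
# model has just 10 parameters; the paper's default is 15%).
masks = {lang: top_p_mask(s, p=0.3) for lang, s in fisher.items()}

# Rank candidate sources for the target by sub-network similarity.
target = "de"
ranking = sorted(
    (lang for lang in masks if lang != target),
    key=lambda lang: jaccard(masks[lang], masks[target]),
    reverse=True,
)
```

With these toy scores, "fr" ranks above "ja" as a source for "de" because its important-parameter mask overlaps the target's (Jaccard 0.5 vs. 0.0), mirroring the intuition that shared internal pathways predict better transfer.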

Key advantage: The method requires only a moderate amount of raw text (as few as 256 examples suffice for near-optimal performance) from candidate languages -- no labeled data, external linguistic databases, or typological annotations needed. The sub-networks are extracted using masked language modeling, making the approach fully unsupervised.

Experimental Results

X-SNS is evaluated on five tasks from the XTREME benchmark using XLM-RoBERTa Base, covering 7 to 20 languages per task. NDCG@3 measures how well each method ranks source languages for zero-shot transfer.

Task (Dataset)              Lang2Vec   Embedding   X-SNS
NER (WikiANN, 17 langs)     62.35      76.06       78.12
POS (UD 2.8, 20 langs)      78.06      74.65       83.73
NLI (XNLI, 15 langs)        59.77      63.15       68.73
PI (PAWS-X, 7 langs)        86.81      83.51       89.82
QA (TyDiQA, 8 langs)        84.52      86.00       87.95
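The NDCG@3 metric used in this evaluation rewards rankings that place truly good source languages near the top. The sketch below uses one common formulation with linear gains and a log2 position discount; the candidate languages and transfer scores are hypothetical, chosen only to illustrate the computation.

```python
import math

def dcg_at_k(relevances, k=3):
    """Discounted cumulative gain over the top-k ranked items."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(predicted_order, true_scores, k=3):
    """NDCG@k: DCG of the predicted ranking, normalized by the ideal DCG."""
    gains = [true_scores[lang] for lang in predicted_order]
    ideal = sorted(true_scores.values(), reverse=True)
    return dcg_at_k(gains, k) / dcg_at_k(ideal, k)

# Hypothetical zero-shot transfer scores for four candidate sources.
true_scores = {"en": 70.0, "de": 75.0, "fr": 73.0, "ja": 60.0}

perfect = ndcg_at_k(["de", "fr", "en", "ja"], true_scores)  # ideal order
swapped = ndcg_at_k(["en", "fr", "de", "ja"], true_scores)  # best source demoted
```

A perfect ranking scores 1.0; demoting the best source below weaker ones lowers the score, which is why NDCG@3 is a natural fit for the source-selection problem.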

In the regression framework, X-SNS as a single feature outperforms multiple linguistic features from typological databases:

Feature Set                 NER (RMSE)   QA (RMSE)
X-POS + MER (linguistic)    7.18         7.40
X-SNS + MER (ours)          5.12         5.80
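The regression framing can be illustrated with a single-feature least-squares fit: predict transfer performance from the sub-network similarity score alone and report RMSE. All numbers below are hypothetical, and the paper's actual regression combines X-SNS with further features (e.g. MER); this is only a minimal sketch of the evaluation protocol.

```python
import math

def fit_line(xs, ys):
    """Closed-form least-squares fit y ≈ a * x + b for one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def rmse(xs, ys, a, b):
    """Root-mean-square error of the fitted line on (xs, ys)."""
    return math.sqrt(sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical (sub-network similarity, zero-shot transfer F1) pairs.
sims = [0.20, 0.35, 0.50, 0.65, 0.80]
f1s  = [55.0, 61.0, 66.0, 72.0, 79.0]

a, b = fit_line(sims, f1s)
err = rmse(sims, f1s, a, b)
```

A positive slope with low RMSE is what a useful predictive feature looks like in this framing; lower RMSE than a feature set built from typological databases is the result the table above reports for X-SNS.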

Why It Matters

X-SNS provides a practical, model-grounded mechanism for source language selection in cross-lingual transfer. Unlike methods that depend on external typological knowledge (which may be incomplete or unavailable for many languages), X-SNS looks directly at how the model internally represents languages, making it applicable to any language with raw text data. The method's data efficiency (near-optimal with just 256 examples) and fully unsupervised nature make it particularly valuable for deploying multilingual systems to low-resource languages, where choosing the right source language can make the difference between successful and failed cross-lingual transfer. The findings also provide a deeper understanding of how multilingual models organize linguistic knowledge internally -- languages that share more sub-network structure genuinely transfer knowledge more effectively.
