A model-oriented method that predicts cross-lingual transfer performance by measuring how much two languages share internal sub-network structure within a multilingual model, achieving a 4.6% average improvement in source-language ranking without requiring any external linguistic resources.
Cross-lingual transfer (XLT) enables multilingual language models to perform well on tasks in unseen languages without target-language labeled data. While English is typically used as the default source language, recent evidence shows that choosing the most appropriate source language for a given target can substantially improve transfer -- in fact, X-SNS demonstrates that non-English sources outperform English in 11 of 15 cases, with an average improvement of 1.8 points.
Limitations of Existing Approaches: Prior methods for predicting transfer compatibility depend on external resources: Lang2Vec requires typological features from the WALS database, lexical divergence uses subword distribution statistics, and embedding similarity only captures surface-level representation overlap. None of these directly examine how the model internally processes different languages. X-SNS addresses this gap with a model-oriented approach that peers inside the network to measure structural language similarity.
X-SNS proposes using sub-network similarity between language pairs as a proxy for predicting cross-lingual transfer compatibility. The core idea is that if two languages activate similar parameters within a multilingual model, knowledge learned from one should transfer well to the other.
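To make this concrete, here is a minimal sketch of the overlap computation, assuming each language's sub-network is represented as a binary mask obtained by keeping the top-p fraction of per-parameter importance scores, with the Jaccard coefficient as the overlap measure (function names and the default p are illustrative, not the paper's exact configuration):

```python
import torch

def binarize_top_p(scores: torch.Tensor, p: float = 0.15) -> torch.Tensor:
    """Keep the top-p fraction of parameters (by importance score) as the sub-network."""
    k = max(1, int(p * scores.numel()))
    threshold = torch.topk(scores.flatten(), k).values.min()
    return scores >= threshold

def subnetwork_similarity(scores_a: torch.Tensor, scores_b: torch.Tensor,
                          p: float = 0.15) -> float:
    """Jaccard overlap |A ∩ B| / |A ∪ B| between two languages' sub-network masks."""
    mask_a, mask_b = binarize_top_p(scores_a, p), binarize_top_p(scores_b, p)
    intersection = (mask_a & mask_b).sum().item()
    union = (mask_a | mask_b).sum().item()
    return intersection / union if union > 0 else 0.0
```

Ranking candidate sources for a target language then amounts to sorting languages by this similarity against the target's scores.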
Key advantage: The method requires only raw text from candidate languages (as few as 256 examples suffice for near-optimal performance) -- no labeled data, external linguistic databases, or typological annotations are needed. Sub-networks are extracted using masked language modeling, making the approach fully unsupervised.
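As an illustration of what "extracted using masked language modeling" can look like in practice, the sketch below accumulates squared gradients of the MLM loss over raw sentences as per-parameter importance scores (an empirical Fisher proxy). Treat it as one plausible reading of the procedure rather than the paper's exact recipe; `mlm_importance_scores` and its defaults are hypothetical.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

def mlm_importance_scores(texts, model_name="xlm-roberta-base", mask_prob=0.15):
    """Per-parameter importance for one language: accumulated squared gradients
    of the MLM loss over raw text (an empirical Fisher proxy)."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.train()
    for text in texts:
        enc = tok(text, truncation=True, max_length=128, return_tensors="pt")
        input_ids, labels = enc["input_ids"].clone(), enc["input_ids"].clone()
        # Mask ~15% of non-special tokens, as in standard MLM pre-training.
        special = torch.tensor(tok.get_special_tokens_mask(
            input_ids[0].tolist(), already_has_special_tokens=True), dtype=torch.bool)
        to_mask = ~special & (torch.rand(input_ids.shape[1]) < mask_prob)
        if not to_mask.any():
            continue
        labels[0, ~to_mask] = -100                 # loss only on masked positions
        input_ids[0, to_mask] = tok.mask_token_id
        model.zero_grad()
        out = model(input_ids=input_ids,
                    attention_mask=enc["attention_mask"], labels=labels)
        out.loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2
    # Flatten into one vector so masks can be compared across languages.
    return torch.cat([s.flatten() for s in scores.values()])
```

Passing the flattened score vectors of two languages to `subnetwork_similarity` above yields the similarity used to rank candidate sources.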
X-SNS is evaluated on five tasks from the XTREME benchmark using XLM-RoBERTa Base, covering 7 to 20 languages per task. NDCG@3 measures how well each method ranks candidate source languages for zero-shot transfer (a minimal implementation of the metric follows the results table).
| Task (Dataset) | Lang2Vec | Embedding | X-SNS |
|---|---|---|---|
| NER (WikiANN, 17 langs) | 62.35 | 76.06 | 78.12 |
| POS (UD 2.8, 20 langs) | 78.06 | 74.65 | 83.73 |
| NLI (XNLI, 15 langs) | 59.77 | 63.15 | 68.73 |
| PI (PAWS-X, 7 langs) | 86.81 | 83.51 | 89.82 |
| QA (TyDiQA, 8 langs) | 84.52 | 86.00 | 87.95 |
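For reference, a small self-contained implementation of the ranking metric; the linear-gain formulation and the example numbers are illustrative assumptions (some NDCG variants use exponential gains):

```python
import numpy as np

def ndcg_at_k(pred_scores, true_scores, k=3):
    """NDCG@k: does ranking sources by predicted score recover the ranking
    by actual zero-shot transfer performance? (linear-gain formulation)"""
    pred_scores, true_scores = np.asarray(pred_scores), np.asarray(true_scores)
    order = np.argsort(pred_scores)[::-1][:k]        # top-k sources by prediction
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = np.sum(true_scores[order] * discounts)
    idcg = np.sum(np.sort(true_scores)[::-1][:k] * discounts)
    return float(dcg / idcg) if idcg > 0 else 0.0

# Hypothetical example: four candidate sources for one target language.
similarities = [0.42, 0.61, 0.35, 0.58]              # predicted (e.g. X-SNS scores)
transfer_f1 = [55.0, 71.2, 48.3, 69.9]               # actual zero-shot task scores
print(ndcg_at_k(similarities, transfer_f1))          # 1.0: top-3 ranking matches
```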
In the regression framework, X-SNS as a single feature outperforms combinations of linguistic features drawn from typological databases (lower RMSE is better; a toy sketch of the setup follows the table):
| Feature Set | NER (RMSE) | QA (RMSE) |
|---|---|---|
| X-POS + MER (linguistic) | 7.18 | 7.40 |
| X-SNS + MER (ours) | 5.12 | 5.80 |
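A toy version of that regression setup, with synthetic data standing in for real (source, target) pairs; the gradient-boosted regressor and all numbers are assumptions, used only to show how X-SNS enters as a single numeric feature alongside MER:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 200                                               # hypothetical (source, target) pairs
x_sns = rng.uniform(0, 1, n)                          # sub-network similarity feature
mer = rng.uniform(0, 1, n)                            # stand-in for the MER feature
X = np.column_stack([x_sns, mer])
y = 60 + 25 * x_sns + 5 * mer + rng.normal(0, 3, n)   # synthetic transfer scores

preds = cross_val_predict(GradientBoostingRegressor(random_state=0), X, y, cv=5)
print(f"cross-validated RMSE: {np.sqrt(mean_squared_error(y, preds)):.2f}")
```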
X-SNS provides a practical, model-grounded mechanism for source language selection in cross-lingual transfer. Unlike methods that depend on external typological knowledge (which may be incomplete or unavailable for many languages), X-SNS looks directly at how the model internally represents languages, making it applicable to any language with raw text data. The method's data efficiency (near-optimal with just 256 examples) and fully unsupervised nature make it particularly valuable for deploying multilingual systems to low-resource languages, where choosing the right source language can make the difference between successful and failed cross-lingual transfer. The findings also provide a deeper understanding of how multilingual models organize linguistic knowledge internally -- languages that share more sub-network structure genuinely transfer knowledge more effectively.