
ENGinius: A Bilingual LLM Optimized for Plant Construction Engineering

ACL 2025 Industry
Wooseong Lee, Minseo Kim, Taeil Hur, Gyeong Hwan Jang, Woncheol Lee, Maro Na, Taeuk Kim

One-Line Summary

ENGinius is the first large language model specifically designed for plant construction engineering (PCE), built on SOLAR-10.7B with a four-stage bilingual training pipeline and 16.5B tokens of domain data, achieving 67.5% on the Professional Engineer benchmark -- surpassing GPT-4 (64.0%) -- and scoring 58.91 on KOPIA, outperforming all open-source and proprietary baselines by 3-17%.

Figure 1. General LLMs (top) often struggle with domain-specific terminology and knowledge -- e.g., ChatGPT misinterprets "NGS" as "Next-Generation Sequencing" instead of "Natural Gas System." ENGinius (bottom) correctly handles PCE-specific acronyms and delivers optimized responses.

Background & Motivation

Recent advances in large language models have drawn attention for their potential to automate and optimize processes across various sectors. However, the adoption of LLMs in plant construction engineering (PCE) -- covering oil refineries, power plants, chemical facilities, and large-scale infrastructure -- remains severely limited, mainly due to its highly specialized nature and the lack of resources for domain-specific training and evaluation.

Key Challenges in Plant Construction Engineering:

  • Highly specialized domain: PCE involves complex technical terminology across mechanical, electrical, piping, civil, architectural, and instrumentation disciplines that general-purpose LLMs fail to handle accurately. For example, ChatGPT's accuracy on PCE-specific acronyms is only 48.4-55.6%, compared to 86-100% on medical, financial, and legal terms.
  • Lack of training resources: Unlike medicine or law, PCE has virtually no publicly available domain-specific corpora or instruction datasets for LLM training. Authoritative information is often copyrighted by professional associations and accessible only through subscription-based text search services.
  • No evaluation benchmarks: Prior to this work, there were no benchmarks tailored to assess LLM performance on PCE tasks, making it impossible to measure domain competency.
  • Bilingual requirements: Korean engineering firms operate globally, requiring seamless Korean-English communication for technical documents, specifications, and cross-border collaboration. Domain-specific language often appears in multilingual or code-switching environments.

ENGinius addresses all of these challenges by presenting end-to-end procedures for domain data construction (16.5B tokens), a multi-stage model training pipeline scaling SOLAR-10.7B to 14.4B parameters, and the first benchmarks (KOPIA and PE) tailored to the plant construction engineering domain.

Proposed Method: Four-Stage Training Pipeline

Figure 2. Training procedure of ENGinius: (1) SOLAR-10.7B is expanded to 14.4B using WECHSEL and LLaMA PRO (ENGinius-BasePT), (2) Domain-Adaptive Pre-Training on PCE corpora (ENGinius-PlantPT), (3) Instruction tuning with ENGine-QA (ENGinius-PlantFT), (4) DPO alignment to produce the final ENGinius-14.4B.

ENGinius employs a four-stage training pipeline that transforms SOLAR-10.7B -- selected after evaluating Llama-2 13B and Mistral 7B for its best balance of model size and cross-lingual adaptability -- into a domain-specialized bilingual model, ultimately producing ENGinius-14.4B.

1. Bilingual Base Model Expansion (WECHSEL + LLaMA PRO)
SOLAR-10.7B's vocabulary and architecture are expanded to accommodate Korean language capabilities. WECHSEL integrates new Korean tokens by initializing their embeddings from semantically similar English tokens, while LLaMA PRO augments the model with additional transformer blocks, scaling from 10.7B to 14.4B parameters without degrading existing English capabilities. Continued pre-training on a Korean-English bilingual corpus produces ENGinius-BasePT, which achieves 78.09 on a Korean benchmark (vs. 59.57 for SOLAR-10.7B) while maintaining English performance.
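The WECHSEL-style initialization described above can be sketched in a few lines. This is a minimal illustration under assumptions -- the inputs (aligned similarity scores and a toy embedding table) are hypothetical, and the real method operates over aligned static word vectors for the full vocabularies, not the authors' actual code:

```python
import math

def init_new_embedding(eng_embs, sims, k=3):
    """WECHSEL-style initialization (sketch): a new Korean token's embedding
    is a similarity-weighted average of the embeddings of its k most similar
    English tokens. `sims` maps English token -> similarity score derived
    from aligned static word vectors (inputs here are illustrative)."""
    # pick the k English tokens most similar to the new Korean token
    top = sorted(sims, key=sims.get, reverse=True)[:k]
    # softmax over similarity scores -> convex combination weights
    exps = {t: math.exp(sims[t]) for t in top}
    z = sum(exps.values())
    dim = len(next(iter(eng_embs.values())))
    return [sum(exps[t] / z * eng_embs[t][d] for t in top) for d in range(dim)]
```

Because the new embeddings start in the region of the English embedding space occupied by semantically related tokens, continued pre-training converges without erasing the base model's English knowledge.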
2. Domain-Adaptive Pre-Training (DAPT)
ENGinius-BasePT undergoes continual pre-training on a curated corpus of 16.5 billion tokens covering PCE documents: plant journals (7.75M tokens), engineering books on civil/architectural/electrical/mechanical/piping/HVAC topics (89M-173M tokens each), plant commercial materials (14.2M tokens), regulation and standard handbooks (41.4M tokens), national competency standards (160.5M tokens), news articles (1.52B tokens), research papers (5.53B tokens), and plant articles (8.87B tokens) -- all in English and/or Korean. This produces ENGinius-PlantPT, which consistently outperforms ENGinius-BasePT on both KOPIA and PE benchmarks.
3. Instruction Tuning with ENGine-QA
The domain-adapted model is fine-tuned on ENGine-QA, a suite of 93,662 instruction-response pairs spanning six task types: Plant Expert QA from ENG-TIPS forum discussions (58,834 KO + 29,417 EN, with the Korean and English sets counted as separate task types), Plant Discipline Classification (595 EN/KO), Plant Multiple Choice (1,002 KO), Plant Terminology Dictionaries (3,276 EN), and Deviation Report Generation (538 EN/KO). This is supplemented with a Korean-translated Alpaca-GPT4 dataset to preserve general fluency, producing ENGinius-PlantFT.
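To make the data concrete, here is a hypothetical illustration of how one instruction-response pair in the style of ENGine-QA might be serialized for fine-tuning. The field names and content are assumptions for illustration, not taken from the released dataset:

```python
import json

# Illustrative only: schema and values are assumed, not the paper's format.
pair = {
    "task_type": "plant_terminology",   # one of the six ENGine-QA task types
    "language": "en",
    "instruction": "What does the acronym 'NGS' mean in plant construction engineering?",
    "response": "In plant construction engineering, NGS refers to the Natural Gas System.",
}
print(json.dumps(pair, ensure_ascii=False))
```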
4. DPO Alignment
Direct Preference Optimization (DPO) is applied using Q&As from ENG-TIPS with two alternative responses per question generated via GPT-4o and Mixture-of-Experts prompting. Three senior specialists across mechanical, piping, electrical, and architectural disciplines evaluated response pairs and assigned preference scores, labeling responses as "Chosen" or "Rejected." This produces the final ENGinius-14.4B, trained to generate answers aligned with expert expectations.
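The per-pair DPO objective behind this step can be sketched as follows. This is the standard DPO loss formulation (with `beta` as the usual temperature hyperparameter), shown for illustration rather than the authors' training code:

```python
import math

def dpo_loss(pol_chosen_logp, pol_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss (standard formulation, illustrative).

    The policy is rewarded for raising the log-probability of the
    expert-labeled "Chosen" response relative to the "Rejected" one,
    measured against a frozen reference model.
    """
    margin = beta * ((pol_chosen_logp - pol_rejected_logp)
                     - (ref_chosen_logp - ref_rejected_logp))
    # -log sigmoid(margin): small when the policy already prefers "Chosen"
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree, the margin is zero and the loss is log 2; as the policy learns to prefer the expert-chosen answer, the loss falls toward zero.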

Domain-Specific Benchmarks: KOPIA and PE

The paper introduces two novel multiple-choice question (MCQ) benchmarks -- the first evaluation tools for plant construction engineering:

  • KOPIA Benchmark (Korean): Developed in collaboration with the Korea Plant Industries Association (KOPIA). It covers mechanical and piping engineering with 1,000 expert-validated test questions on terminology, technical standards, and process knowledge. Planned for public release.
  • Professional Engineer (PE) Benchmark (English): Based on actual PE certification exams, comprising 80 questions across three categories: PE Code knowledge, PE Calculation (advanced engineering calculations), and PE General (conceptual understanding). A score of ~65 is generally regarded as the passing threshold.
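Since both benchmarks are multiple-choice, scoring reduces to exact-match accuracy over the selected options. A generic sketch (not the paper's evaluation code):

```python
def mcq_accuracy(preds, golds):
    """Exact-match accuracy (in %) over multiple-choice answers."""
    assert len(preds) == len(golds)
    return 100.0 * sum(p == g for p, g in zip(preds, golds)) / len(golds)
```

On the PE benchmark, a result above roughly 65 by this measure would clear the human passing threshold mentioned above.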

Experimental Results

The authors evaluate ENGinius against general-purpose LLMs using the LLM-as-a-judge framework (LLaMA3-70B as judge), conducting 20 independent runs per model and averaging the top 5 for final scores.
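The top-5-of-20 scoring protocol is simple to state in code; a minimal sketch:

```python
def final_score(run_scores, top_k=5):
    """Average the top-k scores across independent evaluation runs,
    as in the paper's 20-run, top-5 protocol."""
    assert len(run_scores) >= top_k
    return sum(sorted(run_scores, reverse=True)[:top_k]) / top_k
```

Averaging only the best runs reduces sensitivity to sampling noise in generation while still rewarding consistently strong models.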

KOPIA Benchmark (Korean, Plant Engineering)

| Model | Mech. | Pipe | Avg. | Diff. from ENGinius |
|---|---|---|---|---|
| Gemma2-9B-it | 58.64 | 59.39 | 57.89 | -2.13 (-3.6%) |
| Orion-14B-Chat | 51.96 | 52.32 | 51.61 | -8.41 (-15.0%) |
| SOLAR-10.7B | 50.65 | 53.13 | 48.17 | -10.12 (-17.2%) |
| ENGinius-14.4B | 60.77 | 62.63 | 58.91 | - |

Professional Engineer (PE) Benchmark (English)

| Model | PE Code | PE Cal | PE General | Average | Diff. from ENGinius |
|---|---|---|---|---|---|
| Orion-14B-Chat | 41.33 | 20.00 | 52.26 | 36.50 | -31.0 (-45.9%) |
| GPT-3.5-turbo | 60.00 | 47.06 | 45.16 | 48.75 | -18.75 (-27.8%) |
| Gemma2-9B-it | 72.00 | 34.71 | 59.99 | 51.50 | -16.0 (-23.7%) |
| SOLAR-10.7B | 72.00 | 40.59 | 54.83 | 52.00 | -15.5 (-23.0%) |
| GPT-4 | 66.67 | 52.94 | 74.84 | 64.00 | -3.5 (-5.2%) |
| ENGinius-14.4B | 100.00 | 46.47 | 74.84 | 67.50 | - |

Real-World Applications

ENGinius is actively deployed by a major company across real-world PCE workflows, demonstrating tangible industrial impact.

Why It Matters

ENGinius represents a pioneering effort to bring large language model capabilities to the plant construction engineering industry, an economically significant but technically underrepresented domain in NLP research. Its contributions -- the domain corpus, the training pipeline, and the KOPIA and PE benchmarks -- extend beyond a single model.
