C-MoELM: Contrastive Mixture of Experts Language Model with Negative Sample Learning for Fact-Checking

C-MoELM is a parameter-efficient sentence representation model for Natural Language Inference (NLI) and Fact-Checking. It integrates a dynamic Top-k Mixture-of-Experts (MoE) routing mechanism into Quantized Low-Rank Adaptation (QLoRA), trained with a Fusion Negative Sample Learning strategy that combines semantic hard-negative mining, synthetic negative generation, and a weak-positive formulation for neutral pairs.
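
To make the routing idea concrete, here is a minimal, dependency-free sketch of Top-k routing over low-rank (LoRA-style) experts. It is an illustration under stated assumptions, not the authors' implementation. The function name `top_k_lora_moe`, the fixed `k`, the softmax renormalisation over selected experts, and all toy weights are hypothetical. The frozen quantized base layer used in QLoRA is omitted, so only the routed low-rank update is computed.

```python
import math

def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_lora_moe(x, router, experts, k=2, scaling=1.0):
    """Return (delta, chosen): the routed low-rank update and the expert indices used.

    router  : one weight row per expert, giving one logit per expert
    experts : list of (A, B) pairs; A maps d_in -> r, B maps r -> d_out
    """
    logits = matvec(router, x)
    # Keep the k experts with the highest router logits.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise gate weights over the selected experts only (an assumption).
    gates = softmax([logits[i] for i in chosen])
    delta = [0.0] * len(experts[0][1])
    for gate, idx in zip(gates, chosen):
        A, B = experts[idx]
        update = matvec(B, matvec(A, x))  # low-rank path: B(Ax)
        delta = [d + gate * scaling * u for d, u in zip(delta, update)]
    return delta, chosen

# Toy setup: 3 rank-1 experts, 2-dimensional input and output (values invented).
x = [1.0, 0.0]
router = [[2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
experts = [
    ([[1.0, 0.0]], [[1.0], [0.0]]),
    ([[1.0, 0.0]], [[0.0], [1.0]]),
    ([[1.0, 0.0]], [[5.0], [5.0]]),
]
delta, chosen = top_k_lora_moe(x, router, experts, k=2)
print(chosen, delta)
```

In this toy case the router selects experts 0 and 1 and ignores expert 2, so only two of the three low-rank updates are evaluated. In a full layer, `delta` would be added to the output of the frozen base projection.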

📄 Paper: Coming soon
💻 Code & Usage: github.com/tinvanhuynh/C-MoELM


Performance

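C-MoELM outperforms all 19 baselines on each of the three fact-checking benchmarks (ViNumFCR, ViFactCheck, ViWikiFC) across the three evaluation settings: fine-tuned PLMs, fine-tuned LLMs, and prompted LLMs. Scores are Macro F1 (%) on the test sets.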

| Model | ViNumFCR | ViFactCheck | ViWikiFC | Avg. F1 |
|---|---|---|---|---|
| Fine-tuning PLM | | | | |
| mBERT | 83.43 | 69.94 | 76.01 | 76.46 |
| XLM-R Large | 90.06 | 88.02 | 85.15 | 87.74 |
| CafeBERT | 89.35 | 87.45 | 85.24 | 87.35 |
| PhoBERT Large | 87.56 | 79.76 | 81.62 | 82.98 |
| ViCLSR | 90.04 | 88.78 | 86.57 | 88.46 |
| InfoXLM | 85.00 | 83.27 | 86.51 | 84.93 |
| NLIMoE | 88.67 | 87.37 | 84.96 | 87.00 |
| Fine-tuning LLM | | | | |
| Mistral 7B | 71.57 | 88.63 | 85.36 | 81.85 |
| Llama-3 8B | 92.73 | 88.67 | 89.19 | 90.20 |
| Qwen3-4B | 92.44 | 90.31 | 88.96 | 90.57 |
| Phi-mini-MoE | 60.60 | 70.63 | 74.88 | 68.70 |
| LLaMA-MoE-v2 | 81.29 | 76.68 | 75.66 | 77.88 |
| Qwen1.5-MoE-A2.7B | 89.45 | 86.66 | 85.52 | 87.21 |
| Prompting LLM | | | | |
| Mistral 7B | 44.90 | 57.31 | 47.51 | 49.91 |
| Llama-3 8B | 44.69 | 63.10 | 53.07 | 53.62 |
| Qwen3-8B | 36.85 | 76.82 | 75.83 | 63.17 |
| Mixtral 8x7B | 43.01 | 59.81 | 62.77 | 55.20 |
| Llama-4-Maverick | 36.81 | 72.25 | 77.53 | 62.20 |
| Qwen3-30B-A3B | 36.44 | 71.04 | 72.96 | 60.15 |
| C-MoELM (Ours) | 93.53 | 91.75 | 90.08 | 91.79 |
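
For readers unfamiliar with the metric, the sketch below shows how a Macro F1 score is commonly computed (the unweighted mean of per-class F1) and checks that the Avg. F1 column is consistent with the arithmetic mean of the three benchmark scores. The helper `macro_f1` and the toy labels are hypothetical and are not taken from the benchmarks. The exact evaluation script used for the results above may differ.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no hits scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels, invented for illustration only.
y_true = ["SUPPORTS", "REFUTES", "NEI", "SUPPORTS"]
y_pred = ["SUPPORTS", "REFUTES", "SUPPORTS", "SUPPORTS"]
toy_score = macro_f1(y_true, y_pred)

# Avg. F1 for the C-MoELM row, recomputed from its three benchmark scores.
cmoelm_scores = [93.53, 91.75, 90.08]
cmoelm_avg = round(sum(cmoelm_scores) / len(cmoelm_scores), 2)
print(toy_score, cmoelm_avg)
```

Because every class counts equally, a model that never predicts a minority class is penalised heavily, as the missed `NEI` example in the toy data shows.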

Quick Start

For installation, detailed usage, and training code, please refer to the GitHub repository:

👉 github.com/tinvanhuynh/C-MoELM

Safetensors model size: 3B params
Tensor types: F32, BF16, U8

Model tree for huynhtin/C-MoELM: fine-tuned from Qwen/Qwen3-4B