RAL-RAG Concept-Map Proposition Scorer

Frank–Hall ordinal ensemble (XGBoost + LightGBM) for automated scoring of student concept-map propositions, compared under three conditions: No-RAG, RAG-Standard, and RAL (Rubric-Aligned Retrieval-Augmented).

Performance summary (5-fold GroupKFold by student UID, out-of-fold)

Condition      QWK (mean)  QWK (sd)  Accuracy (mean)  MAE (mean)  RMSE (mean)
No-RAG         0.4514      0.0199    0.6193           0.4964      0.8814
RAG-Standard   0.6333      0.0357    0.6903           0.3777      0.7325
RAL            0.7098      0.0413    0.7149           0.3242      0.6413
  • Hierarchy check (RAL > RAG-Standard > No-RAG): MET
  • Target QWK(RAL) >= 0.81: NOT YET MET (actual: 0.7098)
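For readers unfamiliar with the headline metric, the following is a minimal sketch of quadratic weighted kappa (QWK) in plain Python. It assumes integer score labels 0..n_classes-1; the function name `quadratic_weighted_kappa` and the four-level scale in the usage lines are illustrative assumptions, not taken from the source notebook, which may compute the metric differently (for example via scikit-learn).

```python
from collections import Counter


def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic weights for integer labels 0..n_classes-1."""
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    observed = Counter(zip(y_true, y_pred))  # joint counts of (true, predicted)
    true_hist = Counter(y_true)              # marginal counts of true labels
    pred_hist = Counter(y_pred)              # marginal counts of predictions
    max_dist = (n_classes - 1) ** 2

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / max_dist  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * true_hist[i] * pred_hist[j] / (n * n)

    if weighted_expected == 0.0:
        # Only happens when both label sets sit on one identical class;
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Toy labels on a hypothetical 4-level scale (not data from this repo).
perfect = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], n_classes=4)
reversed_binary = quadratic_weighted_kappa([0, 1], [1, 0], n_classes=2)
```

Because the penalty grows with the squared distance between true and predicted score, QWK punishes a two-level miss four times as hard as an adjacent-level miss, which is why it is reported here alongside plain accuracy.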

Model architecture

  • Decomposition: Frank–Hall ordinal decomposition (N_CLASSES - 1 binary "greater-than-cutoff" classifiers)
  • Base learners per cutoff: XGBoost + LightGBM, blended via validation-tuned weight, isotonic-calibrated
  • Feature selection: mutual-information + max-correlation redundancy filter (max_corr=0.85) for RAL-only features
  • Cross-validation: GroupKFold (5 folds) grouped by student UID — prevents student-level leakage
  • Retrieval stack (for RAG-Standard/RAL features): SBERT dense embeddings + FAISS, BM25 sparse, CrossEncoder reranking
  • LLM-judge cascade (RAL only): multi-provider (OpenRouter / Groq / HuggingFace) ordinal judge features
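The Frank–Hall decomposition listed above can be illustrated with a short sketch of its final step: turning the N_CLASSES - 1 cutoff probabilities P(y > k) into per-class probabilities. The function name `frank_hall_class_proba` is hypothetical, and the clip-and-renormalize handling of non-monotone cutoffs is an assumption of this sketch; the notebook's own `cutoffs_to_class_proba` may behave differently.

```python
def frank_hall_class_proba(cutoff_probs):
    """Convert cutoff probabilities into class probabilities.

    cutoff_probs[k] is P(y > k) for k = 0..K-2, so the result has K entries:
      P(y = 0)   = 1 - P(y > 0)
      P(y = k)   = P(y > k-1) - P(y > k)
      P(y = K-1) = P(y > K-2)
    """
    if len(cutoff_probs) == 0:
        raise ValueError("need at least one cutoff probability")

    upper = [1.0] + list(cutoff_probs)  # P(y > -1) is 1 by definition
    lower = list(cutoff_probs) + [0.0]  # P(y > K-1) is 0 by definition

    # Independently trained binary models can yield non-monotone cutoffs,
    # which would give negative differences; clip them to zero (assumption).
    raw = [max(u - l, 0.0) for u, l in zip(upper, lower)]

    # After clipping the sum is at least 1, so renormalizing is safe.
    total = sum(raw)
    return [p / total for p in raw]


# Three cutoffs imply a 4-class scale in this toy example.
probs = frank_hall_class_proba([0.9, 0.6, 0.2])
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

Each cutoff classifier only has to answer a binary question, which is what lets ordinary binary learners such as XGBoost and LightGBM be reused for an ordinal target.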

Files in this repo

  • ral_ensemble_final.pkl — final RAL production model (cutoff-wise XGBoost+LightGBM dict), refit on all data
  • results/fold_results.csv — per-fold metrics for all three conditions
  • results/scoring_summary.csv — aggregated mean/std metrics per condition
  • results/predictions_store.pkl — out-of-fold predictions & class probabilities per condition
  • results/feature_matrices.pkl — final feature matrices (X_norag, X_rag, X_ral) + labels/folds/groups
  • results/*.png — publication figures (performance bars, SHAP summaries, master figure, etc.)
  • results/publication_manifest.txt — list of all generated artifacts
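As a rough sketch of how the per-fold file relates to the aggregated summary, the snippet below groups fold-level rows by condition and computes the mean and standard deviation of a metric. The column names `condition`, `fold`, and `qwk` are assumptions (the actual headers in results/fold_results.csv are not documented here), the CSV content is toy data rather than values from this repo, and the use of the sample standard deviation is likewise an assumption.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize_folds(rows, group_key="condition", metric_keys=("qwk",)):
    """Return {condition: {metric: (mean, sd)}} from per-fold rows (dicts)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for metric in metric_keys:
            grouped[row[group_key]][metric].append(float(row[metric]))

    summary = {}
    for condition, metrics in grouped.items():
        summary[condition] = {
            # Sample sd needs at least two folds; fall back to 0.0 otherwise.
            name: (
                statistics.mean(values),
                statistics.stdev(values) if len(values) > 1 else 0.0,
            )
            for name, values in metrics.items()
        }
    return summary


# Toy stand-in for a per-fold CSV; the header names are hypothetical.
toy_csv = """condition,fold,qwk
A,0,0.50
A,1,0.70
B,0,0.40
B,1,0.40
"""
rows = list(csv.DictReader(io.StringIO(toy_csv)))
summary = summarize_folds(rows)
```

To run this against the real file, replace the `io.StringIO` source with an opened results/fold_results.csv and adjust `group_key` and `metric_keys` to the headers found there.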

Intended use

Research artifact accompanying a Scopus Q2 manuscript on rubric-aligned RAG for automated concept-map proposition scoring. Not validated for high-stakes grading without further review.

How to load the model

import pickle
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="Maskur1109/ral-rag-concept-map-scorer", filename="ral_ensemble_final.pkl")
with open(path, "rb") as f:
    artifact = pickle.load(f)

cutoff_models = artifact["cutoff_models"]
feature_columns = artifact["feature_columns"]
# Use predict_proba_cutoffs_v16 + cutoffs_to_class_proba from the source notebook
# to score new feature rows built with the same RAL feature pipeline.
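
Before passing a new row to the scoring functions, its values presumably need to be arranged in the same order as `feature_columns`. A minimal sketch of that alignment step follows; the helper name `build_feature_row` and the column names in the usage lines are hypothetical, and the sketch assumes every feature is numeric.

```python
def build_feature_row(features, feature_columns):
    """Order a {name: value} mapping to match the model's feature_columns."""
    missing = [name for name in feature_columns if name not in features]
    if missing:
        # Fail loudly rather than silently filling defaults.
        raise KeyError(f"missing features: {missing}")
    # Extra keys in `features` are ignored; only known columns are emitted.
    return [float(features[name]) for name in feature_columns]


# Hypothetical column names, for illustration only.
example_columns = ["rubric_sim", "bm25_score"]
example_features = {"bm25_score": 3.5, "rubric_sim": 0.82, "unused": 1}
row = build_feature_row(example_features, example_columns)
```

Raising on missing columns keeps a mismatch between the feature pipeline and the pickled model visible instead of producing scores from a misaligned vector.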