ForesightLM Core DistilGPT-2

This repository contains the Core ForesightLM checkpoint based on distilgpt2.

ForesightLM studies whether a token-level autoregressive language model can acquire sentence-level foresight through an auxiliary objective, applied at sentence boundaries, that predicts the semantic embedding of a future sentence. The model keeps standard next-token generation and adds a learned projection head for future sentence embedding prediction.

Model components

  • Base language model: distilgpt2
  • Core checkpoint: ForesightLM seed 42
  • Sentence encoder used during training/evaluation: sentence-transformers/all-MiniLM-L6-v2
  • Future objective: sentence-boundary contrastive future embedding prediction
  • Future-loss weight: lambda_future = 0.08
  • Contrastive temperature: tau = 0.07
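
As a rough illustration of how the last three components could fit together, the sketch below computes an InfoNCE-style contrastive loss at temperature tau = 0.07 and adds it to a next-token loss with weight lambda_future = 0.08. It is a minimal sketch under assumptions, not the repository's training code: the use of in-batch negatives, cosine similarity, and the function names `cosine`, `info_nce`, and `total_loss` are all hypothetical choices made for this example.

```python
import math

LAMBDA_FUTURE = 0.08  # future-loss weight from the component list
TAU = 0.07            # contrastive temperature from the component list


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(predicted, targets, tau=TAU):
    """Mean InfoNCE loss where predicted[i] should match targets[i].

    Assumption: the other targets in the batch act as negatives.
    """
    total = 0.0
    for i, pred in enumerate(predicted):
        logits = [cosine(pred, t) / tau for t in targets]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_norm = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        total += log_norm - logits[i]  # negative log-softmax of the positive
    return total / len(predicted)


def total_loss(lm_loss, future_loss, lambda_future=LAMBDA_FUTURE):
    """Next-token loss plus the weighted auxiliary future loss."""
    return lm_loss + lambda_future * future_loss


# Toy 2-d embeddings: each prediction is closest to its own target.
predicted = [[1.0, 0.0], [0.0, 1.0]]
targets = [[0.9, 0.1], [0.1, 0.9]]
future = info_nce(predicted, targets)
combined = total_loss(lm_loss=3.0, future_loss=future)
```

With a temperature this low, similarity differences are sharply amplified, so a correctly matched pair contributes almost no loss while a mismatched pair is penalized heavily. The small weight keeps the auxiliary term from dominating the language-modeling loss.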

Intended use

This checkpoint is intended for research on:

  • autoregressive language modeling
  • sentence-level semantic planning
  • discourse coherence diagnostics
  • semantic reranking
  • future-representation calibration
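
To make the semantic reranking use case concrete, the following minimal sketch orders candidate continuations by cosine similarity to a reference embedding. It is an assumption-laden illustration rather than the project's evaluation code: the `rerank` helper, the toy 3-d vectors, and the choice of reference are hypothetical. In practice the reference might be a sentence-encoder embedding or the future head's predicted embedding, and the candidate vectors would come from the sentence encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rerank(candidates, reference):
    """Sort (text, embedding) pairs by similarity to `reference`, best first."""
    return sorted(candidates, key=lambda c: cosine(c[1], reference), reverse=True)


# Hypothetical toy embeddings standing in for sentence-encoder outputs.
reference = [1.0, 0.0, 0.0]
candidates = [
    ("off-topic continuation", [0.0, 1.0, 0.0]),
    ("on-topic continuation", [0.9, 0.1, 0.0]),
    ("partly related continuation", [0.5, 0.5, 0.0]),
]
ranked = rerank(candidates, reference)
best_text = ranked[0][0]
```

Generation itself is untouched by this step: candidates are sampled as usual and only their order changes, which is why reranking can be applied to a baseline model as well as to the ForesightLM checkpoint.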

Important limitations

This model is a small research prototype. It should not be treated as a production-quality text generator.

Automatic metrics show that semantic reranking is a strong component by itself. Foresight training improves several diagnostics but does not uniformly dominate a reranked baseline. Direct future-head reranking exposes a calibration gap.

Human evaluation protocol files are released in the GitHub repository, but human judgments are still being collected and will be added in a later revision.

Reproducibility

Code, SLURM scripts, evaluation summaries, compute-cost accounting, bootstrap confidence intervals, qualitative examples, and reproducibility manifests are available at:

https://github.com/Ahmet2001/foresightLM

Large generation JSONL files and training data are not included in this model repository.

Citation

If you use this checkpoint, please cite the ForesightLM project repository until a paper DOI/arXiv identifier is available.
