Serve an LLM (vLLM) 🚧 not trained yet

Deploy a (fine-tuned) LLM for fast, batched, OpenAI-compatible serving.

Status β€” documented recipe (placeholder). A production-grade pipeline from Ropedia Academy for an advanced, GPU-heavy task. Everything below β€” base model, objective, dataset, config, the exact evaluation β€” is specified; the weights / metrics / figures land here automatically when you run the notebook on a GPU (one click below). Try the trained models live in the Ropedia demos Space.

At a glance

Base model Any (fine-tuned) LLM you serve
Task fast LLM serving / deployment
Training objective High-throughput batched inference (PagedAttention) β€” no training.
Track LM Β· Language & multimodal
Built on vllm-project/vllm
Notebook Open In Colab
Compute / storage / time GPU required β€” see the Compute Β· storage Β· time table in the notebook

Dataset

  • Source: n/a (serving).

Training config

GPU-scale β€” the notebook ships a demo profile (free Colab T4) and a full profile, with an exact Compute Β· storage Β· time table. Hyperparameters (optimizer, steps, batch, LoRA rank, …) are in the training cell.

Evaluation results

⏳ Pending β€” run the notebook on a GPU to fill this in. This lab reports throughput (tok/s) Β· latency on a held-out split (see its Evaluate cell).

Inference example

No weights are published yet. After a GPU run, load the checkpoint/adapter the notebook saves (it also has a ready inference cell). Base model: Any (fine-tuned) LLM you serve.

How to fill this repo

  1. Open the notebook in Colab β†’ Runtime β†’ GPU β†’ Run all (runs the real pipeline).
  2. Run its Publish to the Hugging Face Hub step (or HfApi().upload_folder(...)) β€” the checkpoint + metrics.json + figures replace this placeholder.
  • Train / run on a GPU Β· [ ] upload weights Β· [ ] add metrics.json Β· [ ] add figures Β· [ ] swap in the real results card

Limitations

Not yet trained β€” no numbers to report. The pipeline is GPU-heavy (see the compute table); on free Colab use the demo-scale settings. This is an educational, reproducible recipe, not a tuned production release.

License

Code: MIT (this repository). The base model (vllm-project/vllm) and dataset are each under their own licenses β€” check the upstream source before redistribution.

Citation

@misc{ropedia_academy,
  title  = {Ropedia Academy: an interactive course on embodied & spatial AI},
  author = {Ropedia Academy},
  year   = {2026},
  howpublished = {\url{https://chaoyue0307.github.io/ropedia-academy/}}
}

Method / original work: Kwon et al., vLLM / PagedAttention, SOSP 2023.

Related assets


Documented placeholder in the Ropedia Academy collection β€” train it on a GPU to publish the real model. Contributions welcome on GitHub.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Collection including cy0307/lm-vllm-serving