mlabonne committed · verified
Commit d300d6a · 1 Parent(s): ab26029

Update README.md

Files changed (1)
  1. README.md +48 -16
README.md CHANGED
@@ -74,7 +74,7 @@ tags:
 
 LFM2 is a new generation of hybrid models developed by [Liquid AI](https://www.liquid.ai/), specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency.
 
-We're releasing the weights of three post-trained checkpoints with 350M, 700M, and 1.2B parameters. They provide the following key features to create AI-powered edge applications:
+We're releasing the weights of four post-trained checkpoints with 350M, 700M, 1.2B, and 2.6B parameters. They provide the following key features to create AI-powered edge applications:
 
 * **Fast training & inference** – LFM2 achieves 3x faster training compared to its previous generation. It also benefits from 2x faster decode and prefill speed on CPU compared to Qwen3.
 * **Best performance** – LFM2 outperforms similarly-sized models across multiple benchmark categories, including knowledge, mathematics, instruction following, and multilingual capabilities.
@@ -89,15 +89,15 @@ Due to their small size, **we recommend fine-tuning LFM2 models on narrow use ca
 They are particularly suited for agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations.
 However, we do not recommend using them for tasks that are knowledge-intensive or require programming skills.
 
-| Property | [**LFM2-350M**](https://huggingface.co/LiquidAI/LFM2-350M) | [**LFM2-700M**](https://huggingface.co/LiquidAI/LFM2-700M) | [**LFM2-1.2B**](https://huggingface.co/LiquidAI/LFM2-1.2B) |
-| ------------------- | ----------------------------- | ----------------------------- | ----------------------------- |
-| **Parameters** | 354,483,968 | 742,489,344 | 1,170,340,608 |
-| **Layers** | 16 (10 conv + 6 attn) | 16 (10 conv + 6 attn) | 16 (10 conv + 6 attn) |
-| **Context length** | 32,768 tokens | 32,768 tokens | 32,768 tokens |
-| **Vocabulary size** | 65,536 | 65,536 | 65,536 |
-| **Precision** | bfloat16 | bfloat16 | bfloat16 |
-| **Training budget** | 10 trillion tokens | 10 trillion tokens | 10 trillion tokens |
-| **License** | LFM Open License v1.0 | LFM Open License v1.0 | LFM Open License v1.0 |
+| Property | [**LFM2-350M**](https://huggingface.co/LiquidAI/LFM2-350M) | [**LFM2-700M**](https://huggingface.co/LiquidAI/LFM2-700M) | [**LFM2-1.2B**](https://huggingface.co/LiquidAI/LFM2-1.2B) | [**LFM2-2.6B**](https://huggingface.co/LiquidAI/LFM2-2.6B) |
+| ------------------- | ----------------------------- | ----------------------------- | ----------------------------- | ----------------------------- |
+| **Parameters** | 354,483,968 | 742,489,344 | 1,170,340,608 | 2,569,272,320 |
+| **Layers** | 16 (10 conv + 6 attn) | 16 (10 conv + 6 attn) | 16 (10 conv + 6 attn) | 30 (22 conv + 8 attn) |
+| **Context length** | 32,768 tokens | 32,768 tokens | 32,768 tokens | 32,768 tokens |
+| **Vocabulary size** | 65,536 | 65,536 | 65,536 | 65,536 |
+| **Precision** | bfloat16 | bfloat16 | bfloat16 | bfloat16 |
+| **Training budget** | 10 trillion tokens | 10 trillion tokens | 10 trillion tokens | 10 trillion tokens |
+| **License** | LFM Open License v1.0 | LFM Open License v1.0 | LFM Open License v1.0 | LFM Open License v1.0 |
 
 **Supported languages**: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
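The parameter counts in the table are exact totals, so they double as a quick sanity check after downloading a checkpoint. Below is a minimal sketch, not part of this commit, assuming the checkpoint loads with a recent `transformers`:

```python
# Sketch (not from the commit): verify a checkpoint's parameter count
# against the "Parameters" row of the table above.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2-350M")
total = sum(p.numel() for p in model.parameters())
print(f"{total:,}")  # expected: 354,483,968
```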
@@ -152,13 +152,11 @@ The candidate with ID 12345 is currently in the "Interview Scheduled" stage for
 
 ## 🏃 How to run LFM2
 
-You can run LFM2 with transformers and llama.cpp. vLLM support is coming.
-
 ### 1. Transformers
 
-To run LFM2, you need to install Hugging Face [`transformers`](https://github.com/huggingface/transformers) v4.55 or more recent as follows:
+To run LFM2, you need to install Hugging Face [`transformers`](https://github.com/huggingface/transformers) v4.55 or a more recent version as follows:
 
-```python
+```bash
 pip install -U transformers
 ```
 
@@ -168,7 +166,7 @@ Here is an example of how to generate an answer with transformers in Python:
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 # Load model and tokenizer
-model_id = "LiquidAI/LFM2-700M"
+model_id = "LiquidAI/LFM2-1.2B"
 model = AutoModelForCausalLM.from_pretrained(
     model_id,
     device_map="auto",
@@ -206,7 +204,41 @@ print(tokenizer.decode(output[0], skip_special_tokens=False))
 
 You can directly run and test the model with this [Colab notebook](https://colab.research.google.com/drive/1_q3jQ6LtyiuPzFZv7Vw8xSfPU5FwkKZY?usp=sharing).
 
-### 2. Llama.cpp
+### 2. vLLM
+
+You need to install [`vLLM`](https://github.com/vllm-project/vllm) v0.10.2 or a more recent version as follows:
+
+```bash
+uv pip install vllm==0.10.2 --extra-index-url https://wheels.vllm.ai/0.10.2/ --torch-backend=auto
+```
+
+Here is an example of how to use it for inference:
+
+```python
+from vllm import LLM, SamplingParams
+
+prompts = [
+    "What is C. elegans?",
+    "Say hi in JSON format",
+    "Define AI in Spanish"
+]
+sampling_params = SamplingParams(
+    temperature=0.3,
+    min_p=0.15,
+    repetition_penalty=1.05
+)
+
+llm = LLM(model="LiquidAI/LFM2-700M")
+
+outputs = llm.generate(prompts, sampling_params)
+
+for output in outputs:
+    prompt = output.prompt
+    generated_text = output.outputs[0].text
+    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+```
+
+### 3. llama.cpp
 
 You can run LFM2 with llama.cpp using its [GGUF checkpoint](https://huggingface.co/LiquidAI/LFM2-700M-GGUF). Find more information in the model card.
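Beyond the offline `LLM` API added above, the same checkpoint can also be served over HTTP with vLLM's OpenAI-compatible server. A hedged sketch, not part of this commit (the server defaults to port 8000; flags may vary by vLLM version):

```bash
# Sketch (not from the commit): serve the model with an OpenAI-compatible API.
vllm serve LiquidAI/LFM2-700M

# In another terminal, query the chat completions endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "LiquidAI/LFM2-700M", "messages": [{"role": "user", "content": "What is C. elegans?"}]}'
```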
 
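For the llama.cpp path referenced in the last hunk, a minimal invocation could look like the following. This is a sketch, not part of the commit, and assumes a llama.cpp build recent enough to support both the LFM2 architecture and the `-hf` flag for pulling a GGUF directly from a Hugging Face repo:

```bash
# Sketch (not from the commit): fetch the GGUF from Hugging Face and run one prompt.
llama-cli -hf LiquidAI/LFM2-700M-GGUF -p "What is C. elegans?"
```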