Correct the spelling error in the ReadMe file: change "guild" to "guide"

#11
by QscQ - opened
Files changed (1)
  1. README.md +14 -5
README.md CHANGED
@@ -60,7 +60,7 @@ pipeline_tag: image-text-to-text
  # MiniMax-VL-01
 
  ## 1. Introduction
- We are delighted to introduce our **MiniMax-VL-01** model. It adopts the ViT-MLP-LLM framework, which is a commonly used technique in the field of multimodal large language models. The model is initialized and trained with three key parts: a 303-million-parameter Vision Transformer (ViT) for visual encoding, a randomly initialized two-layer MLP projector for image adaptation, and the MiniMax-Text-01 as the base LLM.
+ We are delighted to introduce our **MiniMax-VL-01** model. It adopts the "ViT-MLP-LLM" framework, which is a commonly used technique in the field of multimodal large language models. The model is initialized and trained with three key parts: a 303-million-parameter Vision Transformer (ViT) for visual encoding, a randomly initialized two-layer MLP projector for image adaptation, and the MiniMax-Text-01 as the base LLM.
  MiniMax-VL-01 has a notable dynamic resolution feature. Input images are resized per a pre-set grid, with resolutions from 336×336 to 2016×2016, keeping a 336×336 thumbnail. The resized images are split into non-overlapping patches of the same size. These patches and the thumbnail are encoded separately and then combined for a full image representation.
  The training data for MiniMax-VL-01 consists of caption, description, and instruction data. The Vision Transformer (ViT) is trained on 694 million image-caption pairs from scratch. Across four distinct stages of the training pipeline, a total of 512 billion tokens are processed, leveraging this vast amount of data to endow the model with strong capabilities.
  Finally, MiniMax-VL-01 has reached top-level performance on multimodal leaderboards, demonstrating its edge and dependability in complex multimodal tasks.
@@ -190,9 +190,18 @@ For production deployment, we recommend using [vLLM](https://docs.vllm.ai/en/lat
  ⚡ Efficient and intelligent memory management
  📦 Powerful batch request processing capability
  ⚙️ Deeply optimized underlying performance
- For detailed deployment instructions, please refer to our [vLLM Deployment Guide](https://github.com/MiniMax-AI/MiniMax-01/blob/main/docs/vllm_deployment_guild.md).
+ For detailed deployment instructions, please refer to our [vLLM Deployment Guide](https://github.com/MiniMax-AI/MiniMax-01/blob/main/docs/vllm_deployment_guide.md).
 
- # 5. Citation
+ ## 5. Function Calling
+ MiniMax-VL-01 supports Function Calling capability, enabling the model to intelligently identify when external functions need to be called and output parameters in structured JSON format. With Function Calling, you can:
+
+ - Let the model recognize implicit function call needs in user requests
+ - Receive structured parameter outputs for seamless application integration
+ - Support various complex parameter types, including nested objects and arrays
+
+ Function Calling supports standard OpenAI-compatible format definitions and integrates seamlessly with the Transformers library. For detailed usage instructions, please refer to our [Function Call Guide](./MiniMax-VL-01_Function_Call_Guide.md) or [Chinese Guide](./MiniMax-VL-01_Function_Call_Guide_CN.md).
+
+ ## 6. Citation
 
  ```
  @misc{minimax2025minimax01scalingfoundationmodels,
@@ -206,9 +215,9 @@ For detailed deployment instructions, please refer to our [vLLM Deployment Guide
  }
  ```
 
- ## 5. Chatbot & API
+ ## 7. Chatbot & API
  For general use and evaluation, we provide a [Chatbot](https://www.hailuo.ai/) with online search capabilities and the [online API](https://www.minimax.io/platform) for developers. For general use and evaluation, we provide the [MiniMax MCP Server](https://github.com/MiniMax-AI/MiniMax-MCP) with video generation, image generation, speech synthesis, and voice cloning for developers.
 
 
- ## 6. Contact Us
+ ## 8. Contact Us
  Contact us at [model@minimaxi.com](mailto:model@minimaxi.com).
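As background for the introduction paragraph touched in the first hunk: the "ViT-MLP-LLM" wiring it names is conventionally a vision encoder feeding a small projector that maps image features into the LLM's embedding space. The sketch below is only a schematic under that reading; the dimensions, module names, and forward signature are placeholders, not MiniMax-VL-01's actual implementation.

```python
# Schematic of the ViT-MLP-LLM layout described in the README introduction:
# a vision encoder, a randomly initialized two-layer MLP projector, and a
# language model that consumes projected image tokens alongside text embeddings.
# All sizes and call conventions here are illustrative placeholders.
import torch
import torch.nn as nn

class VisionLanguageSketch(nn.Module):
    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 6144):
        super().__init__()
        self.vision_encoder = vision_encoder  # stands in for the ~303M-parameter ViT
        self.projector = nn.Sequential(       # two-layer MLP projector for image adaptation
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        self.llm = llm                        # stands in for MiniMax-Text-01 as the base LLM

    def forward(self, pixel_values: torch.Tensor, text_embeds: torch.Tensor):
        image_tokens = self.vision_encoder(pixel_values)   # (B, N_img, vision_dim)
        image_embeds = self.projector(image_tokens)        # (B, N_img, llm_dim)
        inputs = torch.cat([image_embeds, text_embeds], dim=1)
        # Assumes the injected LLM accepts pre-computed embeddings, as HF-style models do.
        return self.llm(inputs_embeds=inputs)
```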
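The dynamic-resolution paragraph (unchanged context in the first hunk) is easier to picture with the arithmetic written out. A minimal sketch, using simple rounding as a stand-in for the model's pre-set grid selection; it is not the repository's preprocessing code.

```python
# Rough illustration of the dynamic-resolution idea in the README: an input image
# is resized onto a 336-pixel grid (up to 2016x2016), split into non-overlapping
# 336x336 patches, and a 336x336 thumbnail is kept alongside them.
TILE = 336
MAX_SIDE = 2016

def grid_resolution(width: int, height: int) -> tuple[int, int]:
    """Snap a resolution onto the 336-pixel grid, capped at 2016 per side (illustrative rule)."""
    def snap(x: int) -> int:
        return min(MAX_SIDE, max(TILE, round(x / TILE) * TILE))
    return snap(width), snap(height)

def crop_count(width: int, height: int) -> int:
    """Non-overlapping 336x336 patches plus the one global thumbnail."""
    w, h = grid_resolution(width, height)
    return (w // TILE) * (h // TILE) + 1  # +1 for the 336x336 thumbnail

# A 1000x700 photo would snap to 1008x672 -> 3x2 = 6 patches, 7 encoded crops in total.
print(grid_resolution(1000, 700), crop_count(1000, 700))
```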
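For the newly added Function Calling section: "standard OpenAI-compatible format" in practice means a JSON-Schema tool description like the one sketched below. Everything here is illustrative — the tool name and its fields are made up, the repo id is assumed, and the linked Function Call Guide remains the authoritative recipe for how the tools list is actually fed to the model.

```python
# Illustrative only: an OpenAI-compatible tool definition of the kind the new
# "Function Calling" section describes, passed through the Transformers chat template.
from transformers import AutoTokenizer

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",  # hypothetical tool name
            "description": "Look up the current weather for a city.",
            "parameters": {                 # JSON Schema describing the arguments
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

# Recent Transformers versions let apply_chat_template receive a tools list directly;
# whether MiniMax-VL-01's template uses this path should follow the linked guide.
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-VL-01", trust_remote_code=True)  # repo id assumed
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the weather in Shanghai?"}],
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```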