SweepGPM
SweepGPM is a multimodal dialogue model for sweeping robots in home scenarios, fine-tuned from VisualGLM-6B. The language model is based on ChatGLM-6B (6.2B parameters) and the image encoder is CLIP ViT-L/14; both are kept frozen. Only the Q-Former, the fully connected projection layer, and the LoRA adapters (rank 4, applied to the last 2 layers only) are trained, which adapts the model to the domain knowledge of sweeping robots.
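To give a feel for how small the LoRA part of the trainable set is, the following sketch counts the parameters a rank-4 adapter adds. The rank and the "last 2 layers" figure come from the description above. The target module is an assumption: the model card does not say which weights carry the adapters, so the sketch supposes the fused `query_key_value` projection of ChatGLM-6B (4096 to 12288). The helper name `lora_param_count` is hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter on a d_in x d_out weight.

    LoRA learns an update B @ A, with A of shape (rank, d_in)
    and B of shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


# Assumption (not stated in the model card): adapters are attached to the
# fused query_key_value projection of ChatGLM-6B, which maps 4096 -> 3 * 4096.
HIDDEN = 4096
QKV_OUT = 3 * HIDDEN

RANK = 4        # from the model card
NUM_LAYERS = 2  # "last 2 layers only", from the model card

per_layer = lora_param_count(HIDDEN, QKV_OUT, RANK)
total = per_layer * NUM_LAYERS
print(f"per layer: {per_layer}, total: {total}")
```

Under these assumptions the adapters add on the order of 10^5 parameters, a tiny fraction of the 6.2B frozen language-model weights. The Q-Former and projection layer are not included in this count.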
Performance
| Downstream Task | Metric | SweepGPM |
|---|---|---|
| Room Type Classification | Mean Accuracy | 84.3% |
| Obstacle Detection | mAP@0.5 | 86.1% |
| Lost Item Search | Mean Recall | 80.2% |
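For readers unfamiliar with the mAP@0.5 metric in the table, a predicted box counts as a correct detection only when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows that matching criterion only. The box format `(x_min, y_min, x_max, y_max)` and the function names are assumptions for illustration, and the evaluation code actually used for SweepGPM is not described in this card.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes yield no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """True when the prediction overlaps the ground truth enough to count at this threshold."""
    return iou(pred_box, gt_box) >= threshold
```

A full mAP computation additionally ranks predictions by confidence, builds a precision-recall curve per class, and averages the resulting average-precision values across classes.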
Usage
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bazaar-research/sweepgpm", trust_remote_code=True)
model = AutoModel.from_pretrained("bazaar-research/sweepgpm", trust_remote_code=True).half().cuda()
image_path = "your_image.jpg"
response, history = model.chat(tokenizer, image_path, "Give the room type in the image.", history=[])
print(response)
response, history = model.chat(tokenizer, image_path, "Provide fine-grained bounding boxes for all objects in the image.", history=history)
print(response)
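The two calls above form a multi-turn dialogue because the `history` returned by the first call is passed into the second. A minimal sketch of that pattern as a reusable helper is shown below. `run_dialogue` is a hypothetical name, not part of the model's API, and it assumes only the `model.chat(tokenizer, image_path, query, history=...)` call shape used in the snippet above.

```python
def run_dialogue(model, tokenizer, image_path, prompts):
    """Ask several questions about one image, carrying the chat history forward.

    Returns the list of responses and the final history object.
    """
    history = []
    responses = []
    for prompt in prompts:
        # Each turn receives the history produced by the previous turn.
        response, history = model.chat(tokenizer, image_path, prompt, history=history)
        responses.append(response)
    return responses, history
```

With the `model` and `tokenizer` loaded as above, it could be called as `run_dialogue(model, tokenizer, "your_image.jpg", ["Give the room type in the image.", "Provide fine-grained bounding boxes for all objects in the image."])`.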
Dependencies
pip install "SwissArmyTransformer>=0.3.6" "torch>=2.0.1" torchvision "transformers>=4.31.0" cpm_kernels "peft>=0.4.0"