Fine-tune Vision AI Model for Volume Recognition
This project demonstrates how to fine-tune a vision AI model for recognizing fluid volumes in test tubes, with applications across medical, laboratory, and industrial settings.
Prerequisites
1. HuggingFace Setup (Required)
- Create an account at huggingface.co
- Go to Settings → Access Tokens
- Create a new token (write access, which is required to upload your model in Step 3)
- Copy and save your token - you'll need it later
Quick Start
Open a terminal in your JarvisLabs workspace:
File > New Launcher > Terminal
Clone the repository:
git clone https://github.com/ictBioRtc/finetune_florence2_vision_language_model.git
Navigate to the project directory (git names it after the repository):
cd finetune_florence2_vision_language_model
Install dependencies:
pip install -r requirements.txt
Run the application:
python app.py
Copy the public URL provided (e.g., https://ff20bc33e416f3319f.gradio.live)
Open in a new browser tab
Using the Application
Step 1: Test Initial Model (Inference Tab)
- Unzip the provided test_images.zip
- Go to the "Inference" tab
- Upload a test image
- Leave other settings at default
- Click "Run Inference"
- Observe how the untrained model performs
Step 2: Train the Model (Training Tab)
- Dataset: ictbiortc/beaker-volume-recognition-dataset
- Change epochs to 15 (for workshop purposes)
- Click "Start Training"
- Note: Training with this 15-epoch setting could take ~5 hours
Step 3: Upload Model to HuggingFace
- After training completes, click "Upload to Hub"
- Enter your model name (e.g., your-username/beaker-volume-recognition-model)
- Paste your HuggingFace token
- Click "Upload"
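If the upload button fails, the same upload can be attempted from a terminal. The following is a minimal sketch, not the code that app.py uses: the helper names and the local folder argument are hypothetical, the two checks are deliberately loose (they assume user tokens start with `hf_` and that a model name has the form `namespace/name`), and it assumes the `huggingface_hub` package is installed. Uploading generally requires a token with write permission.

```python
def looks_like_repo_id(repo_id: str) -> bool:
    """Loose check for the 'namespace/name' shape (not the Hub's full rules)."""
    parts = repo_id.split("/")
    has_whitespace = any(ch.isspace() for ch in repo_id)
    return len(parts) == 2 and all(parts) and not has_whitespace


def looks_like_hf_token(token: str) -> bool:
    """Loose check: assumes user access tokens start with 'hf_'."""
    has_whitespace = any(ch.isspace() for ch in token)
    return token.startswith("hf_") and len(token) > len("hf_") and not has_whitespace


def upload_model_folder(folder: str, repo_id: str, token: str) -> None:
    """Upload a local model folder to the Hub (hypothetical helper)."""
    if not looks_like_repo_id(repo_id):
        raise ValueError(f"Expected 'namespace/name', got: {repo_id!r}")
    if not looks_like_hf_token(token):
        raise ValueError("Token does not look like a HuggingFace access token")

    # Imported lazily so the checks above work without the package installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # Create the model repo if it does not exist yet, then push the folder.
    api.create_repo(repo_id=repo_id, exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id)
```

The checks run before any network call, so a mistyped model name or a pasted token with stray whitespace is reported immediately.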
Step 4: Important Configuration Update
- Go to your model on HuggingFace
- Navigate to "Files and versions"
- Find config.json
- Edit line 165 from:
"model_type": "",
to:
"model_type": "davit",
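The same change can be made on a local copy of config.json instead of in the web editor. This is a minimal sketch under assumptions: the function names are hypothetical, and rather than relying on line 165 it looks for any `model_type` key with an empty string value in nested objects and sets it to `davit`. Check the result before uploading the file back.

```python
import json
from pathlib import Path


def fix_empty_model_type(config: dict, replacement: str = "davit") -> int:
    """Set every empty "model_type" in nested dicts to `replacement`.

    Returns the number of values changed. Lists are not traversed.
    """
    changed = 0
    for key, value in config.items():
        if key == "model_type" and value == "":
            config[key] = replacement
            changed += 1
        elif isinstance(value, dict):
            changed += fix_empty_model_type(value, replacement)
    return changed


def patch_config_file(path: str) -> int:
    """Rewrite a local config.json in place; return the number of fixes."""
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    changed = fix_empty_model_type(config)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return changed
```

Calling `patch_config_file("config.json")` returns 0 when nothing needed fixing, which is a quick way to confirm that the edit has already been applied.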
Step 5: Evaluate Your Model
- Return to the app
- Go to "Evaluate" tab
- Upload a test image
- Use your trained model
- Compare results with the initial inference
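To make the comparison more than a visual impression, the two models' outputs can be scored against known volumes. The sketch below is an illustration under assumptions: it assumes each model output is text containing the volume as a number (for example "The volume is 35 ml"), which may not match the app's actual output format, and the function names are hypothetical.

```python
import re
from statistics import mean

# Assumption: the first number in the output text is the predicted volume.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_volume(text: str):
    """Return the first number found in `text` as a float, or None."""
    match = _NUMBER.search(text)
    return float(match.group()) if match else None


def mean_absolute_error(outputs, true_volumes):
    """Average |predicted - true| over the outputs that contain a number.

    Returns None if no output could be parsed.
    """
    errors = []
    for output, true_volume in zip(outputs, true_volumes):
        predicted = parse_volume(output)
        if predicted is not None:
            errors.append(abs(predicted - true_volume))
    return mean(errors) if errors else None
```

Running `mean_absolute_error` once on the base model's outputs and once on the fine-tuned model's outputs for the same test images gives two numbers that can be compared directly; a lower value means predictions closer to the true volumes.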
Applications
This volume recognition model has potential applications in:
- IV Fluid Monitoring
- Laboratory Automation
- Medication Dosing
- Urine Monitoring
- Manufacturing Quality Control
- Chemical Processing
- Beverage Industry
- Petroleum Industry
Training Notes
- Full training typically takes days
- Workshop version uses 15 epochs (~5 hours)
- More epochs generally yield better results, at the cost of longer training time
- GPU acceleration is recommended
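The workshop figure above (15 epochs in roughly 5 hours) works out to about 20 minutes per epoch. As a rough planning aid, the sketch below scales that figure linearly to other epoch counts. Linear scaling is an assumption: actual time depends on the GPU, batch size, and dataset, and the function name is hypothetical.

```python
def estimate_training_hours(
    epochs: int,
    reference_epochs: int = 15,      # workshop setting from the notes above
    reference_hours: float = 5.0,    # approximate duration from the notes above
) -> float:
    """Estimate training time by scaling the reference run linearly."""
    if epochs < 0 or reference_epochs <= 0:
        raise ValueError("epochs must be >= 0 and reference_epochs > 0")
    return epochs * reference_hours / reference_epochs
```

For example, `estimate_training_hours(30)` gives 10 hours under this assumption, which helps decide whether a longer run fits into the time available on a rented GPU.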
Troubleshooting
Common issues:
- "Model not loading": Check your internet connection
- "Training too slow": Verify GPU availability
- "Upload failed": Verify your HuggingFace token
- "Config error": Double-check the davit model_type update
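For the "Training too slow" case, GPU availability can be checked from Python before starting a run. This is a minimal sketch that assumes an NVIDIA GPU and asks the `nvidia-smi` command-line tool for device names; the function name is hypothetical, and an empty result only means that no GPU was found this way.

```python
import shutil
import subprocess


def nvidia_gpu_names() -> list:
    """Return GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # tool not installed or not on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    names = nvidia_gpu_names()
    print("GPUs found:", names if names else "none")
```

If the list is empty inside a workspace that should have a GPU, the instance type is worth checking before training is restarted.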
Next Steps
After successful training:
- Experiment with different epochs
- Try different image types
- Test various fluid volumes
- Integrate with your specific use case
Congratulations! You've successfully:
- Tested a base vision model
- Fine-tuned it for volume recognition
- Uploaded it to HuggingFace
- Created a practical AI solution for real-world applications
This workshop demonstrates how vision language models can be adapted for specific industrial and medical applications.