---
license: mit
base_model:
- mistralai/Pixtral-12B-2409
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- lora
datasets:
- Multimodal-Fatima/FGVC_Aircraft_train
- takara-ai/FloodNet_2021-Track_2_Dataset_HF
---
# pixtral_aerial_VQA_adapter
## Model Details
- **Type**: LoRA Adapter
- **Total Parameters**: 6,225,920
- **Memory Usage**: 23.75 MB
- **Precisions**: torch.float32
- **Layer Types**:
  - lora_A: 40
  - lora_B: 40
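
The memory figure above follows directly from the parameter count and precision. A minimal sketch of that arithmetic, assuming 4 bytes per `torch.float32` value and that "MB" here means mebibytes (2^20 bytes); the variable names are illustrative only:

```python
# Reproduce the reported memory usage from the parameter count.
# Assumption: every adapter weight is stored as float32 (4 bytes).
TOTAL_PARAMETERS = 6_225_920
BYTES_PER_FLOAT32 = 4

size_bytes = TOTAL_PARAMETERS * BYTES_PER_FLOAT32
size_mib = size_bytes / 2**20  # bytes -> mebibytes

print(f"{size_bytes} bytes = {size_mib} MiB")
```

Storing the same weights in a 16-bit format would halve this footprint, since only the bytes-per-value term changes.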
## Intended Use
- **Primary intended use**: Processing aerial footage of construction sites for structural and construction surveying.
- **Other uses**: Any detailed visual question answering (VQA) task on aerial footage.
## Training Data
- **Datasets**:
  1. FloodNet Track 2 dataset
  2. Subset of the FGVC Aircraft dataset
  3. Custom dataset of 10 image-caption pairs created using Pixtral
## Training Procedure
- **Training method**: LoRA (Low-Rank Adaptation)
- **Base model**: Ertugrul/Pixtral-12B-Captioner-Relaxed
- **Training hardware**: Nebius-hosted NVIDIA H100 machine
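
For readers unfamiliar with the `lora_A` and `lora_B` layers listed under Model Details, the sketch below illustrates the general LoRA idea in plain Python: a frozen weight matrix `W` is augmented by a low-rank product `B @ A` scaled by `alpha / r`. The matrix sizes, rank, and `alpha` value are toy assumptions chosen for readability, not this adapter's actual configuration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha, r):
    """Compute W x + (alpha / r) * B (A x) for a single input vector x."""
    base = matvec(W, x)               # frozen base-model projection
    update = matvec(B, matvec(A, x))  # low-rank path: down-project, then up-project
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy shapes (assumed for illustration): d_in = 3, d_out = 2, rank r = 1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # frozen weight, 2 x 3
A = [[1.0, 1.0, 1.0]]                   # lora_A, r x d_in
B_zero = [[0.0], [0.0]]                 # lora_B at initialisation, d_out x r
B_trained = [[0.5], [-1.0]]             # lora_B after some hypothetical training
x = [1.0, 2.0, 3.0]

out_init = lora_forward(x, W, A, B_zero, alpha=2.0, r=1)
out_trained = lora_forward(x, W, A, B_trained, alpha=2.0, r=1)
print(out_init, out_trained)
```

With `lora_B` initialised to zeros, the adapted layer reproduces the base model's output exactly, so training starts from the base model's behaviour. Each adapted layer adds only `r * (d_in + d_out)` trainable values, which is why the adapter stays small relative to the 12B-parameter base.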
## Citation
```bibtex
@misc{rahnemoonfar2020floodnet,
title={FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding},
author={Maryam Rahnemoonfar and Tashnim Chowdhury and Argho Sarkar and Debvrat Varshney and Masoud Yari and Robin Murphy},
year={2020},
eprint={2012.02951},
archivePrefix={arXiv},
primaryClass={cs.CV},
doi={10.48550/arXiv.2012.02951}
}
```