Edit model card

1S-Lab, Nanyang Technological University  2Microsoft Research, Redmond

This weight is for initilizing training for Otter. It's directly converted from Openflamingo.

You can load and try this model using

model = OtterForConditionalGeneration.from_pretrained("luodian/OTTER-MPT7B-Init", device_map="sequential")
model.text_tokenizer.padding_side = "left"
tokenizer = model.text_tokenizer
image_processor = transformers.CLIPImageProcessor()
model.eval()

You can also start training Otter via the commands

python -m accelerate.commands.launch --config_file=./pipeline/accelerate_configs/accelerate_config_fsdp.yaml \
pipeline/train/instruction_following.py \
--pretrained_model_name_or_path=luodian/OTTER-MPT7B-Init \
--mimicit_path=/data/azure_storage/otter/mimicit/xx/xx_instructions.json \
--images_path=/data/azure_storage/otter/mimicit/xx/xx.json \
--batch_size=4 --num_epochs=1 --report_to_wandb \
--wandb_entity=ntu-slab \
--external_save_dir=/data/bli/checkpoints \
--save_hf_model \
--run_name=OTTER-MPT1B \
--wandb_project=OTTER-MPT1B \
--workers=4 \
--lr_scheduler=cosine \
--learning_rate=1e-5 \
--warmup_steps_ratio=0.01

If you wish to init a video instruction tuning, you should add

"max_num_frames": 128

to config.json inside the folder.

Downloads last month
132
Inference API
Model is too large to load in Inference API (serverless). To try the model, launch it on Inference Endpoints (dedicated) instead.