Edit model card

Model Details

We have developed and released the family llama3s. This family is natively understanding audio and text input.

We continual pretrain on the expanded vocabulary homebrewltd/llama3.1-s-whispervq-init with 900M tokens from homebrewltd/raw-speech-whispervq-v1 dataset.

Model developers Homebrew Research.

Input Text and sound.

Output Text.

Model Architecture Llama-3.

Language(s): English.

Intended Use

Intended Use Cases This family is primarily intended for research applications. This version aims to further improve the LLM on sound understanding capabilities.

Out-of-scope The use of llama3-s in any manner that violates applicable laws or regulations is strictly prohibited.

Training process

Training Metrics Image: Below is a snapshot of the training loss curve visualized.

train_log

Hardware

GPU Configuration: Cluster of 10x NVIDIA A6000-48GB.

GPU Usage:

  • Continual Training: 30 hours.

Training Arguments

We utilize torchtune library for the latest FSDP2 training code implementation.

Parameter Continual Training
Epoch 1
Global batch size 480
Learning Rate 2e-4
Learning Scheduler Cosine with warmup
Optimizer AdamW fused
Warmup Steps 50
Weight Decay 0.01
Max Sequence Length 512

Citation Information

BibTeX:

@article{Llama3-S: Sound Instruction Language Model 2024,
  title={Llama3-S},
  author={Homebrew Research},
  year=2024,
  month=August},
  url={https://huggingface.co/homebrewltd/llama3.1-s-2024-08-15}

Acknowledgement

Downloads last month
14
Safetensors
Model size
8.03B params
Tensor type
BF16
·
Inference API
Unable to determine this model's library. Check the docs .

Dataset used to train homebrewltd/llama3-s-base-v0.2