metadata

title: README
emoji: 👀
colorFrom: purple
colorTo: indigo
sdk: static
pinned: false
short_description: VLM assets for Llama-3.2-11B-Vision-Instruct

🇳🇴🇩🇰 Open Source Vision Language Model assets

Building on the philosophy of open source with the Llama-models 🦙, this repo is an effort to support development of small VLM's in the Scandinavian languages. Aa we are only fluent in Norwegian and Danish, we have focused on these two languages. However, we encourgage the community (🇫🇮🇸🇪🇫🇴🇮🇸🇬🇱Sami) to help build on our work and extend the coverage.

The current models and data focus on transcription and annotiation of documents in Norwegian and Danish, going beyond the limitations of OCR.

We expect this line of work to help businesses, government institutions and citizens alike. Please se for how to run inference on the final models.

In these collections you will find:

💽 Datasets for fine-tuning VLM
- 🇳🇴 See collection: https://huggingface.co/collections/MykMaks/datasets-nb-679f081d89be13de6a9fe71b
- 🇩🇰 See collection: https://huggingface.co/collections/MykMaks/datasets-da-679f07b68e587e67bba71fdd
💾 Training code
- Approach: We trained every epoch with a different prompt, stored the adapter as a checkpoint and continued to next prompt-dataset pair.
- MM checkpoints: https://github.com/Mikeriess/llama33_resources/tree/MM-models
- V-I checkpoints: https://github.com/Mikeriess/llama33_resources/tree/v-i-models
🤖 Model LORA-adapter checkpoints for Llama-3.2-11B-Vision-Instruct
- The model is iteratively trained over all datasets:
  - The suffix of each file denotes the order of the checkpoint, along with the dataset that it was fine-tuned on
  - Prompts can be tracked in the respective experiment.json files in the MM and V-I code repositories
💸 Final full-precision merged models:
- See collection: 🦙 https://huggingface.co/collections/MykMaks/models-679f08ab3ea3e21df62c87e8
  - MykMaks/llama-3.2-11B-MM-20-MykMaks_da-wit-merged
  - MykMaks/llama-3.2-11B-V-I_39_MykMaks_NorwegianDataset-compressed-pt2-merged