---
language:
- multilingual
- en
- sw
- ha
- yo
- ig
- zu
- sn
- ar
- am
- fr
- pt
tags:
- zero-shot-image-classification
- image generation
- visual qa
- text-image embedding
- image-text embedding
- pytorch
- sartify
- visual conversational ai
- image semantic retrieval
- african low resourced languages
- safetensors
- vision-text-dual-encoder
license: apache-2.0
library_name: transformers
---

# AViLaMa: African Vision-Languages Alignment Pre-Training Model
Learning Visual Concepts Directly From African Languages Supervision. (Paper coming soon.)

## Model Details
AViLaMa is a large open-source text-vision alignment pre-training model for African languages. It learns visual concepts directly from African-language supervision. It is inspired by OpenAI's CLIP but relies more heavily on African-language data, so that it captures the nuances, cultural context, and social usage of our languages that machine translation alone cannot provide. It includes techniques such as language-agnostic encoding and a data filtering network. It covers more than 12 African languages and is trained on the #AViLaDa-2B dataset of filtered image-text pairs.

- **Developed by:** Sartify LLC (www.sartify.com)
- **Authors:** Innocent Charles
- **Funded by:** Sartify LLC, the open-source community, and others (we always welcome other donors)
- **Model type:** multilingual, multimodal transformer
- **Language(s):** en (English), sw (Swahili), ha (Hausa), yo (Yoruba), ig (Igbo), zu (Zulu), sn (Shona), ar (Arabic), am (Amharic), fr (French), pt (Portuguese)
- **License:** Apache 2.0
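
Since the model is described as CLIP-inspired, the sketch below illustrates the kind of symmetric contrastive objective that CLIP-style models typically optimise. It is an illustration only: the function names, the temperature of `0.07`, and the pure-Python implementation are assumptions made here, and the actual AViLaMa training code may differ.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_z for x in row]


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss for a batch where image i matches text i.

    `temperature` is a hypothetical default, not a value taken from AViLaMa.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] is the scaled cosine similarity between image i and text j.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: row i should put its mass on column i.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: column j should put its mass on row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Toy 2-D embeddings (made up for illustration).
aligned_loss = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With these toy vectors, the correctly paired batch yields a much smaller loss than the mismatched one, which is the behaviour such an objective rewards during alignment pre-training.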

## Load the model from Hugging Face
```python
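import torch
from transformers import AutoModel, AutoTokenizer

# Download the checkpoint and its tokenizer from the Hugging Face Hub.
model = AutoModel.from_pretrained("sartifyllc/AViLaMa")
tokenizer = AutoTokenizer.from_pretrained("sartifyllc/AViLaMa")

# Switch to inference mode (disables dropout and other training-only behaviour).
model.eval()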
```
## Model Sources
- **Repository:** [AViLaMa-Sources](https://github.com/Sartify/AViLaMa-Sources)
- **Datasets:** Coming...
- **Paper:** Coming...
- **Demo:** Coming...

## Direct & Downstream Use in African Languages
1. Zero-shot semantic image retrieval and ranking tasks.
2. Zero-shot image classification tasks.
3. Visual QA tasks with African languages.
4. Visual conversational GenAI tasks.
5. Image captioning tasks.
6. Image and art generation guiding and conditioning tasks.
7. Text-image analysis tasks.
8. Content moderation tasks, etc.

## Citation

**BibTeX:**
```bibtex
% AViLaMa paper
@article{sartifyllc2023africanvision,
  title={AViLaMa: Learning Visual Concepts Directly From African Languages Supervision},
  author={Innocent Charles},  
  journal={To be inserted},
  year={2024}
}
```