	---
	language:
	- multilingual
	- en
	- sw
	- ha
	- yo
	- ig
	- zu
	- sn
	- ar
	- am
	- fr
	- pt
	tags:
	- zero-shot-image-classification
	- image generation
	- visual qa
	- text-image embedding
	- image-text embedding
	- pytorch
	- sartify
	- visual conversional ai
	- image semantic retrival
	- african raw resourced languages
	- safetensors
	- vision-text-dual-encoder
	license: apache-2.0
	library_name: transformers
	---

	# AViLaMa: African Vision-Languages Alignment Pre-Training Model
	Learning Visual Concepts Directly From African Languages Supervision. (Paper is coming.)

	## Model Details
	AViLaMa is a large open-source vision-text alignment pre-training model for African languages. It learns visual concepts directly from African-language supervision. The model is inspired by OpenAI's CLIP but is grounded in African languages, so that it can capture the nuances, cultural context, and social usage of these languages that machine translation alone cannot provide. It uses techniques such as language-agnostic encoding and a data filtering network. It covers more than 12 African languages and was trained on AViLaDa-2B, a dataset of filtered image-text pairs.

	- Developed by : Sartify LLC (www.sartify.com)
	- Authors : Innocent Charles
	- Funded by : Sartify LLC, the open-source community, and others (additional donors are always welcome)
	- Model type : multilingual, multimodal transformer
	- Language(s) : en (English), sw (Swahili), ha (Hausa), yo (Yoruba), ig (Igbo), zu (Zulu), sn (Shona), ar (Arabic), am (Amharic), fr (French), pt (Portuguese)
	- License : Apache 2.0
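The card says AViLaMa is inspired by CLIP but does not spell out the training objective. As a rough illustration only, the sketch below shows the symmetric contrastive loss that CLIP-style models commonly use, written in plain Python on toy vectors. The function names, the `temperature` value, and the assumption that AViLaMa uses this exact loss are all hypothetical and not confirmed by the source.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy(logits, target):
    """Cross-entropy of one row of logits against the index of the true match."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target]

def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image-text pairs.

    Pair i is (image_embs[i], text_embs[i]). The temperature is an
    illustrative value, not one taken from the AViLaMa card.
    """
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of image i and text j, divided by temperature.
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
              for img in images]
    n = len(logits)
    # Each image should pick out its own text among all texts in the batch...
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # ...and each text should pick out its own image.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2

# Toy 2-dimensional embeddings: correctly paired vs. deliberately swapped.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is small when each image embedding sits closest to its own caption and large when the pairs are mismatched, which is what pushes the two encoders toward a shared embedding space.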

	## Load the Model From Hugging Face
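	```python
	import torch
	from transformers import AutoModel, AutoTokenizer

	# Download the checkpoint and its tokenizer from the Hugging Face Hub.
	model = AutoModel.from_pretrained("sartifyllc/AViLaMa")
	tokenizer = AutoTokenizer.from_pretrained("sartifyllc/AViLaMa")
	# Switch to evaluation mode for inference (disables dropout).
	model = model.eval()
	```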
	## Model Sources
	- Repository : [AViLaMa-Sources](https://github.com/Sartify/AViLaMa-Sources)
	- Datasets : Coming...
	- Paper : Coming...
	- Demo : Coming...

	## Direct & Downstream Use In African Languages
	1. Zero-shot semantic image retrieval and ranking.
	2. Zero-shot image classification.
	3. Visual QA in African languages.
	4. Visual conversational GenAI.
	5. Image captioning.
	6. Guiding and conditioning image and art generation.
	7. Text-image analysis.
	8. Content moderation, among others.
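To make the zero-shot classification use case concrete, here is a minimal sketch of the usual CLIP-style recipe: embed the image and each candidate label, compare them by cosine similarity, and turn the scaled similarities into probabilities. The toy 3-dimensional vectors stand in for real encoder outputs, and `zero_shot_classify` and the `logit_scale` value are hypothetical names and settings chosen for illustration, not part of the AViLaMa API.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_embedding, label_embeddings, logit_scale=100.0):
    """Return one probability per label for a single image embedding.

    logit_scale is an illustrative value; CLIP-style models usually learn it.
    """
    image = l2_normalize(image_embedding)
    sims = []
    for label in label_embeddings:
        text = l2_normalize(label)
        sims.append(sum(a * b for a, b in zip(image, text)))
    return softmax([logit_scale * s for s in sims])

# Toy embeddings standing in for real text and image encoder outputs.
labels = ["paka", "mbwa", "ndege"]  # Swahili: cat, dog, bird
label_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_embedding = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_embedding, label_embeddings)
best = labels[max(range(len(labels)), key=probs.__getitem__)]
print(best, [round(p, 3) for p in probs])
```

Semantic image retrieval follows the same pattern with the roles reversed: embed one text query, score it against many image embeddings, and sort the images by similarity.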

	## Citation

	BibTeX:
	```bibtex
	AViLaMa paper
	@article{sartifyllc2023africanvision,
	title={AViLaMa: Learning Visual Concepts Directly From African Languages Supervision},
	author={Innocent Charles},
	journal={To be inserted},
	year={2024}
	}
	```