---
license: cc-by-4.0
language:
- en
- tr
tags:
- VLM
- image2text
- lm
---
# TeLVE: Turkish efficient Language Vision Engine 🧿
[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![Models: v1.0](https://img.shields.io/badge/Models-v1.0%2c%20v1.0dep-blue)](https://huggingface.co/outsu/TeLVE)
## First Turkish VLM ever!
TeLVE is the first vision-language model (VLM) designed specifically for Turkish language understanding and image description generation: given an image, it produces a description in Turkish. It is built on pre-trained Vision Transformer (ViT) and BERT encoders and aims to fill the gap in Turkish vision-language processing.
![TeLVE logo](<teLVE_logo.png>)
## Model Description
TeLVE combines:
- 🖼️ Vision Transformer (ViT-base-patch16-224)
- 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
- 🔄 Cross-attention mechanism for vision-language fusion
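To make the fusion step concrete, here is a minimal sketch of scaled dot-product cross-attention in plain Python, in which text-token queries attend over image-patch features. It is illustrative only and is not taken from the TeLVE source: the function names, the single-head layout, and the toy 2-dimensional vectors are assumptions, and the real model may use multiple heads and learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_queries, image_keys, image_values):
    """Single-head cross-attention (hypothetical helper, not TeLVE's code).

    Each text query scores every image key, the scores are normalized with
    softmax, and the image values are averaged using those weights.
    """
    dim = len(image_keys[0])
    outputs, all_weights = [], []
    for query in text_queries:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in image_keys
        ]
        weights = softmax(scores)
        fused = [
            sum(w * value[j] for w, value in zip(weights, image_values))
            for j in range(len(image_values[0]))
        ]
        outputs.append(fused)
        all_weights.append(weights)
    return outputs, all_weights


# Toy data: 2 text-token queries attending over 3 image-patch features.
text_queries = [[1.0, 0.0], [0.0, 1.0]]
image_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = cross_attention(text_queries, image_patches, image_patches)
```

Each row of `weights` sums to 1, so every fused text vector is a weighted average of the image-patch features, which is how visual information reaches the text side.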
### Version Logs
- **TeLVE v1.0**: Trained on Unsplash Lite dataset
- **TeLVE v1.0dep**: Dataset extended with selected images from Pexels; the encoder problem with the letter "ü" was fixed. *(Deprecated: performance decreased because of a dataset addressing problem. Not recommended for use.)*
## Usage
The model can be used in two ways:
### Inference (imagine.py)
```bash
# Generate captions for images
python imagine.py
```
This script:
- Loads a trained TeLVE model
- Takes images from `images` directory
- Generates Turkish captions for each image
- Outputs the results to console
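The loop described above can be sketched roughly as follows. This is not the contents of `imagine.py`: the helper names, the accepted file extensions, and the `caption_fn` callback (a stand-in for the loaded TeLVE model) are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; the real script may accept others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images(directory):
    """Return image files in `directory`, sorted by path."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def caption_directory(directory, caption_fn):
    """Caption every image and print each result to the console.

    `caption_fn` is a hypothetical callable that maps an image path to a
    caption string; in practice it would wrap the trained model.
    """
    results = {}
    for path in list_images(directory):
        caption = caption_fn(path)
        print(f"{path.name}: {caption}")
        results[path.name] = caption
    return results


# Demo with empty placeholder files and a dummy captioner instead of a model.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.PNG", "notes.txt"):
        (Path(tmp) / name).touch()
    demo = caption_directory(tmp, lambda p: f"<caption for {p.stem}>")
```

Passing the captioner as a callback keeps the directory handling independent of how the model is loaded.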
### Training (main.py)
Users can train their own models with ViT and BERT encoders.
```bash
# Train a new model
python main.py
```
This script:
- Loads and preprocesses image-caption pairs
- Initializes ViT and BERT encoders
- Trains the combined model
- Saves the model and tokenizer
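As an illustration of the first step only, the sketch below loads image-caption pairs from a tab-separated listing and drops malformed rows. The file layout, the `load_pairs` helper, and the sample rows are assumptions; `main.py` may store and preprocess its data differently.

```python
import csv
import io


def load_pairs(handle):
    """Read (image_name, caption) pairs from a tab-separated text stream.

    Hypothetical format: one `image_name<TAB>caption` pair per line.
    Rows that do not have exactly two non-empty fields are skipped.
    """
    pairs = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue
        image_name, caption = row[0].strip(), row[1].strip()
        if image_name and caption:
            pairs.append((image_name, caption))
    return pairs


# Made-up sample rows, including a blank line and a row without a caption.
sample = io.StringIO(
    "kedi.jpg\tKoltukta uyuyan bir kedi.\n"
    "\n"
    "bozuk_satir\n"
    "sahil.jpg\tGün batımında sakin bir sahil.\n"
)
pairs = load_pairs(sample)
```

Filtering incomplete rows before training avoids pairing an image with an empty or missing caption.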
## Performance
Performance scores have not been evaluated yet.
<!--
| Model Version | Dataset | BLEU-4 | METEOR | CIDEr |
|--------------|---------|---------|---------|--------|
| TeLVE v1.0 | Unsplash | *TBD* | *TBD* | *TBD* |
| TeLVE v1.1 | Unsplash+Pexels | *TBD* | *TBD* | *TBD* |-->
## Citation
```bibtex
@software{telve2024,
author = {Öğüt Su Karagün},
title = {TeLVE: Turkish efficient Language Vision Engine},
year = {2024},
url = {https://huggingface.co/outsu/TeLVE}
}
```
## License
<p xmlns:cc="http://creativecommons.org/ns#" xmlns:dct="http://purl.org/dc/terms/"><a property="dct:title" rel="cc:attributionURL" href="https://huggingface.co/outsu/TeLVE">TeLVE</a> © 2024 by <a rel="cc:attributionURL dct:creator" property="cc:attributionName" href="https://outsu.github.io">Öğüt Su Karagün</a> is licensed under <a href="https://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;">Creative Commons Attribution 4.0 International</a></p>