  ---
## Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation

Note: This is a Hugging Face-compatible version of the model converted by [leloy](https://huggingface.co/leloy). It currently works only with [this branch of the Transformers library](https://github.com/leloykun/transformers/tree/fc--anole).

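Since this checkpoint only loads with that fork of Transformers, you would install it from the branch first. A minimal sketch, assuming a standard pip and git setup:

```shell
# Install the fork of the Transformers library that supports this
# checkpoint (the fc--anole branch linked above).
pip install git+https://github.com/leloykun/transformers.git@fc--anole
```

After installing, the checkpoint should load through the usual `from_pretrained` API; the exact model classes to use depend on what that branch exposes, so consult the branch itself for details.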
Anole is the first open-source, autoregressive, and natively trained large multimodal model capable of interleaved image-text generation (without relying on Stable Diffusion). While it builds upon the strengths of Chameleon, Anole excels at the complex task of generating coherent sequences of alternating text and images. Through an innovative fine-tuning process using a carefully curated dataset of approximately 6,000 images, Anole achieves remarkable image generation and understanding capabilities with minimal additional training. This efficient approach, combined with its open-source nature, positions Anole as a catalyst for accelerated research and development in multimodal AI. Preliminary tests demonstrate Anole's exceptional ability to follow nuanced instructions, producing high-quality images and interleaved text-image content that closely aligns with user prompts.