How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites • Paper • 2404.16821 • Published 13 days ago • 47
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset • Paper • 2403.09029 • Published Mar 14 • 52
NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data • Paper • 2402.15343 • Published Feb 23 • 7
Linear Transformers with Learnable Kernel Functions are Better In-Context Models • Paper • 2402.10644 • Published Feb 16 • 72
Saiga • Collection • LLaMA-based or Mistral-based Russian chat models • 5 items • Updated 19 days ago • 11
LLaMA Beyond English: An Empirical Study on Language Capability Transfer • Paper • 2401.01055 • Published Jan 2 • 50
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions • Paper • 2311.12793 • Published Nov 21, 2023 • 18
Instruct datasets in Russian • Collection • All datasets have been translated using Google Translate • 9 items • Updated 14 days ago • 4
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model • Paper • 2311.13231 • Published Nov 22, 2023 • 25
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline • Paper • 2311.13073 • Published Nov 22, 2023 • 53
Tulu V2 Suite • Collection • The set of models associated with the paper "Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2" • 19 items • Updated Feb 1 • 43
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion • Paper • 2310.03502 • Published Oct 5, 2023 • 74