notune (Noah)

liked a model about 1 month ago

Wan-AI/Wan2.1-T2V-14B

Text-to-Video • Updated 25 days ago • 99.6k • • 1.15k

reacted to freddyaboulton's post with 🔥 4 months ago

Post

2400

Version 0.0.21 of gradio-pdf now properly loads chinese characters!

liked a dataset 9 months ago

XAI/vlmsareblind

Viewer • Updated Nov 22, 2024 • 8.02k • 335 • 23

liked a Space 10 months ago

240

Convert to Safetensors

🐶

Convert and PR models to Safetensors

liked 2 models 10 months ago

Corcelio/mobius

Text-to-Image • Updated Jun 1, 2024 • 27.5k • • 230

mistralai/Codestral-22B-v0.1

Text Generation • Updated Jul 31, 2024 • 11.7k • 1.24k

liked a Space 11 months ago

690

Open VLM Leaderboard

🌎

VLMEvalKit Evaluation Results Collection

liked 2 models 11 months ago

cognitivecomputations/dolphin-2.9.1-yi-1.5-34b

Text Generation • Updated May 20, 2024 • 449 • 35

Salesforce/xgen-mm-phi3-mini-instruct-r-v1

Image-Text-to-Text • Updated Feb 3 • 971 • 186

liked 2 models 12 months ago

nvidia/canary-1b

Automatic Speech Recognition • Updated Feb 25 • 50.5k • 418

ByteDance/SDXL-Lightning

Text-to-Image • Updated Apr 3, 2024 • 109k • • 2.01k

New activity in databricks/dbrx-instruct about 1 year ago

How to Fine Tune DBRX-Instruct?

7

#18 opened about 1 year ago by

elysiia

liked a Space about 1 year ago

52

Arena Hard

🦾

Compare model answers to questions

reacted to vikhyatk's post with ❤️ about 1 year ago

Post

3776

Just released a notebook showing how to finetune moondream: https://github.com/vikhyat/moondream/blob/main/notebooks/Finetuning.ipynb

reacted to akhaliq's post with 🚀 about 1 year ago

Post

2256

Mora

Enabling Generalist Video Generation via A Multi-Agent Framework

Mora: Enabling Generalist Video Generation via A Multi-Agent Framework (2403.13248)

Sora is the first large-scale generalist video generation model that garnered significant attention across society. Since its launch by OpenAI in February 2024, no other video generation models have paralleled {Sora}'s performance or its capacity to support a broad spectrum of video generation tasks. Additionally, there are only a few fully published video generation models, with the majority being closed-source. To address this gap, this paper proposes a new multi-agent framework Mora, which incorporates several advanced visual AI agents to replicate generalist video generation demonstrated by Sora. In particular, Mora can utilize multiple visual agents and successfully mimic Sora's video generation capabilities in various tasks, such as (1) text-to-video generation, (2) text-conditional image-to-video generation, (3) extend generated videos, (4) video-to-video editing, (5) connect videos and (6) simulate digital worlds. Our extensive experimental results show that Mora achieves performance that is proximate to that of Sora in various tasks. However, there exists an obvious performance gap between our work and Sora when assessed holistically. In summary, we hope this project can guide the future trajectory of video generation through collaborative AI agents.

liked a Space about 1 year ago

4.24k

Chatbot Arena Leaderboard

🏆

Display chatbot leaderboard results

reacted to akhaliq's post with ❤️ about 1 year ago

Post

VisionLLaMA

A Unified LLaMA Interface for Vision Tasks

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks (2403.00522)

Large language models are built on top of a transformer-based architecture to process textual inputs. For example, the LLaMA stands out among many open-source implementations. Can the same transformer be used to process 2D images? In this paper, we answer this question by unveiling a LLaMA-like vision transformer in plain and pyramid forms, termed VisionLLaMA, which is tailored for this purpose. VisionLLaMA is a unified and generic modelling framework for solving most vision tasks. We extensively evaluate its effectiveness using typical pre-training paradigms in a good portion of downstream tasks of image perception and especially image generation. In many cases, VisionLLaMA have exhibited substantial gains over the previous state-of-the-art vision transformers. We believe that VisionLLaMA can serve as a strong new baseline model for vision generation and understanding.

reacted to smangrul's post with 🤯 about 1 year ago

Post

🚨 Now you can run Starcoder- 2 models locally on your Mac M1 Pro Apple Silicon with 16GB memory! 🧑🏽‍💻 ⚡️✨

Below is the UX with Twinny extension using bigcode/starcoder2-3b for FIM and codellama/CodeLlama-7b-Instruct-hf for chat. Dev tools is showing the prompt being sent to ollama server.

Starcoder-2 is now supported in llama.cpp https://github.com/ggerganov/llama.cpp/pull/5795!

cd llama.cpp
python convert-hf-to-gguf.py ../starcoder2-3b/ --outfile models/starcoder2-3b.gguf --outtype "f16"
./quantize models/starcoder2-3b.gguf models/starcoder2-3b-Q4_K_M.gguf Q4_K_M

For more details, please go through the following tweet thread: https://x.com/sourab_m/status/1764583139798823235?s=20

liked a model about 1 year ago

bigcode/starcoder2-15b

Text Generation • Updated Jun 5, 2024 • 15.3k • • 598

reacted to vladbogo's post with 👍 about 1 year ago

Post

Genie is a new method from Google DeepMind that generates interactive, action-controllable virtual worlds from unlabelled internet videos using.

Keypoints:
* Genie leverages a spatiotemporal video tokenizer, an autoregressive dynamics model, and a latent action model to generate controllable video environments.
* The model is trained on video data alone, without requiring action labels, using unsupervised learning to infer latent actions between frames.
* The method restricts the size of the action vocabulary to 8 to ensure that the number of possible latent actions remains small.
* The dataset used for training is generated by filtering publicly available internet videos with specific criteria related to 2D platformer games for a total of 6.8M videos used for training.

Paper: Genie: Generative Interactive Environments (2402.15391)
Project page: https://sites.google.com/view/genie-2024/
More detailed overview in my blog: https://huggingface.co/blog/vladbogo/genie-generative-interactive-environments

Congrats to the authors for their work!