✨ Unified 3D generation & text understanding.
✨ 3D meshes as plain text for seamless LLM integration.
✨ High-quality 3D outputs rivaling specialized models.
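To make "meshes as plain text" concrete, here's an illustrative sketch only; the post doesn't name its exact serialization, but OBJ-style text is one common plain-text mesh format, and it's exactly the kind of token stream an LLM can read or generate:

```python
# Hypothetical illustration: a tetrahedron serialized as OBJ-style text.
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
faces = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]  # 1-indexed, per OBJ

lines = [f"v {x} {y} {z}" for x, y, z in vertices]      # vertex lines
lines += [f"f {a} {b} {c}" for a, b, c in faces]        # face lines
mesh_as_text = "\n".join(lines)  # plain text, ready for an LLM prompt
print(mesh_as_text)
```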
Very few people realize that most successful AI startups became successful because they focused on open science and open source for at least their first few years. To name but a few: OpenAI (GPT and GPT-2 were open-source), Runway & Stability (Stable Diffusion), Cohere, Mistral, and of course Hugging Face!
The reasons are not purely altruistic: sharing your science and your models pushes you to build AI faster (key in a fast-moving field like AI), attracts the best scientists and engineers, and generates far more visibility, usage, and community contributions than staying 100% closed-source. The same applies to big tech companies, as we're seeing with Meta and Google!
More startups and companies should release research and open-source AI; it's not just good for the world, it also increases their probability of success!
We conducted an experiment to revive LLaMA 1 33B, since it had unique prose, a lack of "GPT-isms" and "slop" in its pretraining data, and was one of the community favorites at the time. Over multiple finetune runs, we extended the model from its pretrained base of 2,048 tokens to ~12,000 tokens, adding approximately 500M training tokens in the process. The effective length is 16,384, but it's better to stay at the lower end of that range. It writes well and in multiple formats. In the future, we have some ideas such as implementing GQA. Please take a look; we would love to hear your feedback!
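The post doesn't say which context-extension method was used; one common recipe for stretching a 2,048-token LLaMA 1 base toward ~12k is linear RoPE position interpolation followed by long-sequence finetuning. A minimal sketch with Hugging Face `transformers` (the checkpoint name and scaling factor are assumptions, not the authors' recipe):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-30b"  # assumed LLaMA 1 33B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Pretrained context is 2048; a linear factor of 6 stretches RoPE
# positions to ~12k (Chen et al., 2023, position interpolation).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 6.0},
    torch_dtype="auto",
)
# After patching RoPE, finetune on long sequences so the model
# adapts to the interpolated positions.
```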
With the recent rise of interest in Vision Language Models (VLMs), we decided to make a push to include an ImageField within Argilla! This means any open-source developer can now work on better models for vision ML tasks too, and we would like to show you how.
We would love to introduce this new feature to you, so we've prepared a set of notebooks covering some common image scenarios:
- fine-tune a CLIP retrieval model with Sentence Transformers
- use ColPali + Qwen VL for RAG and log the results to Argilla
- image-generation preference: create multi-modal preference datasets for free using Hugging Face inference endpoints
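For a taste of the new field, here is a minimal, hedged sketch of logging an image record with the Argilla 2.x Python SDK (the dataset name, question, and URLs are illustrative, not from the post):

```python
import argilla as rg

# Connect to a running Argilla instance (URL and key are placeholders).
client = rg.Argilla(api_url="http://localhost:6900", api_key="argilla.apikey")

# Define a dataset whose records carry an image plus a label question.
settings = rg.Settings(
    fields=[rg.ImageField(name="image")],
    questions=[rg.LabelQuestion(name="quality", labels=["good", "bad"])],
)
dataset = rg.Dataset(name="image-preferences-demo", settings=settings)
dataset.create()

# ImageField accepts an image URL (or data URI) per record.
dataset.records.log([{"image": "https://example.com/sample.png"}])
```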
💾🧠 How much VRAM will you need for training your AI model? 💾🧠 Check out this app, where you convert:
PyTorch/TensorFlow summary -> needed VRAM
Parameter count -> needed VRAM
And everything is open source! Ask for new functionality or contribute at https://github.com/AlexBodner/How_Much_VRAM. If it's useful to you, leave a star ⭐ and share it with someone who will find the tool useful!
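As a sanity check on the parameter-count path, here's a back-of-the-envelope estimate (my own rough accounting, not necessarily the app's formula): mixed-precision Adam training needs roughly 16 bytes per parameter, and fp16 inference roughly 2, before counting activations, which grow with batch size and sequence length.

```python
# Rough VRAM estimate from a parameter count alone. Ignores activations,
# CUDA context, and framework overhead, which vary with batch size.

def estimate_vram_gb(n_params: float, training: bool = True) -> float:
    if training:
        # Mixed-precision Adam: fp16 weights (2 B) + fp16 grads (2 B)
        # + fp32 master weights (4 B) + Adam m and v states (4 B + 4 B).
        bytes_per_param = 16
    else:
        # fp16 inference: weights only.
        bytes_per_param = 2
    return n_params * bytes_per_param / 1024**3

# A 7B-parameter model: ~104 GB to train with Adam, ~13 GB for fp16 inference.
print(f"train: {estimate_vram_gb(7e9):.0f} GB")
print(f"infer: {estimate_vram_gb(7e9, training=False):.0f} GB")
```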