Full Name PRO

Gatozu35

AI & ML interests

Text-to-Speech, Voice Conversion

Recent Activity

liked a Space 15 minutes ago

LibreChat/LibreChat

liked a model 3 days ago

pcunwa/Kim-Mel-Band-Roformer-FT

liked a model 3 days ago

jarredou/aufr33-viperx-karaoke-melroformer-model

Gatozu35's activity

upvoted a paper 12 days ago

Tackling the Generative Learning Trilemma with Denoising Diffusion GANs

Paper • 2112.07804 • Published Dec 15, 2021 • 1

upvoted a collection 17 days ago

ModernBERT

Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 17 days ago • 113

upvoted 2 papers about 1 month ago

Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation

Paper • 2411.18447 • Published Nov 27, 2024 • 1

Scaling Transformers for Low-Bitrate High-Quality Speech Coding

Paper • 2411.19842 • Published Nov 29, 2024 • 10

upvoted a collection about 2 months ago

Cosmos Tokenizer

A suite of image and video tokenizers • 12 items • Updated 1 day ago • 29

upvoted a collection 2 months ago

Molmo

Artifacts for open multimodal language models. • 5 items • Updated Nov 27, 2024 • 290

upvoted 3 papers 2 months ago

Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis

Paper • 2410.23320 • Published Oct 30, 2024 • 8

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization

Paper • 2403.12422 • Published Mar 19, 2024 • 1

Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

Paper • 2410.11081 • Published Oct 14, 2024 • 19

upvoted a collection 3 months ago

LAION Audio

9 items • Updated Sep 30, 2024 • 1

upvoted a paper 4 months ago

BigVGAN: A Universal Neural Vocoder with Large-Scale Training

Paper • 2206.04658 • Published Jun 9, 2022 • 3

upvoted a collection 4 months ago

Moshi v0.1 Release

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024 • 225

upvoted 3 papers 4 months ago

Parallelizing Linear Transformers with the Delta Rule over Sequence Length

Paper • 2406.06484 • Published Jun 10, 2024 • 3

Gated Linear Attention Transformers with Hardware-Efficient Training

Paper • 2312.06635 • Published Dec 11, 2023 • 6

Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Paper • 2409.07146 • Published Sep 11, 2024 • 19

upvoted 3 papers 5 months ago

Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis

Paper • 2401.10460 • Published Jan 19, 2024 • 1

EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks

Paper • 2402.00892 • Published Jan 31, 2024 • 13

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Paper • 2105.13626 • Published May 28, 2021 • 3

upvoted a collection 5 months ago

Parler-TTS: fully open-source high-quality TTS

If you want to find out more about how these models were trained and even fine-tune them yourself, check out the Parler-TTS repository on GitHub. • 8 items • Updated Dec 2, 2024 • 49

upvoted an article 5 months ago

Mixture of Depth is Vibe

Article • Apr 22, 2024 • 44