dan su

sudanenator

AI & ML interests

None yet

Recent Activity


Organizations

Project Fluently

sudanenator's activity

reacted to nyuuzyou's post with 🔥 9 days ago
🇷🇺 Russian Forum Messages Dataset - nyuuzyou/ruforum

Collection of approximately 58 million Russian forum messages featuring:

- Complete message content from Russian online forums spanning 2010-2025
- Comprehensive metadata including unique message IDs and timestamps
- Full text content preserving original user discussions and interactions
- Monolingual dataset focused exclusively on Russian language content

This dataset offers a unique textual archive of Russian online conversations suitable for text generation, sentiment analysis, and language modeling research. Released to the public domain under CC0 1.0 license.
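For text-generation or sentiment work, a first preprocessing step is usually filtering messages by date. Below is a minimal sketch of that step over stand-in records; the field names (`id`, `timestamp`, `text`) are illustrative assumptions, not the dataset's confirmed schema.

```python
from datetime import datetime

# Stand-in records mimicking an assumed schema (message ID, timestamp, text);
# the real dataset would be streamed, e.g. via the `datasets` library.
messages = [
    {"id": 1, "timestamp": "2012-03-01T10:00:00+00:00", "text": "Привет, форум!"},
    {"id": 2, "timestamp": "2024-07-15T18:30:00+00:00", "text": "Кто пробовал новую модель?"},
]

def filter_by_year(records, start_year, end_year):
    """Keep messages whose timestamp falls within [start_year, end_year]."""
    out = []
    for rec in records:
        year = datetime.fromisoformat(rec["timestamp"]).year
        if start_year <= year <= end_year:
            out.append(rec)
    return out

recent = filter_by_year(messages, 2020, 2025)
print(len(recent))  # 1
```

The same predicate would work as a `filter` callback when streaming the full 58M-message corpus instead of loading it into memory.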
reacted to Fishtiks's post with 👍 11 days ago
I'm looking for a YouTube video summarizer to run locally. I did a search, but none of the models and Spaces I was able to find here worked, which I find surprising, since it's a great tool I already use. Perhaps one of you can provide a better option, or just tell me what this actually is so I can get it: https://dev.gptcall.pages.dev/chat#id=&contactName=Youtube+summarizer

Other functionality I'd like to see is a genre-based music creation and alteration model: "make it country" or "do a freestyle rap," for example. I'm willing to work with someone on this, because I'd need help understanding it. I'd also like to build a medical AI, like Dr. Samantha, that functions well as a PDR and doesn't get confused by drug names.
replied to ajibawa-2023's post 13 days ago

Is this dataset human-recorded or generated by text-to-speech synthesis?

reacted to ajibawa-2023's post with 👍 13 days ago
Hi all, I recently released two audio datasets, generated using my earlier released dataset ajibawa-2023/Children-Stories-Collection.

First audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection-Large has 5,600+ stories in .mp3 format.

Second audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection has 600 stories in .mp3 format.
reacted to mrs83's post with 😎 29 days ago
🚀 Just released a PoC: Kurtis-E1 MLX Voice Agent

An offline, privacy-first voice assistant built for macOS (Apple Silicon), designed for empathetic, short-form interactions.

🧠 Powered by:
- Whisper (via MLX) for speech-to-text: https://pypi.org/project/mlx-whisper/
- Kurtis-E1 (a custom SmolLM2 LLM) via Ollama
- Coqui-TTS XTTSv2 for multilingual TTS
- Optional translation layer via TowerInstruct-13B-v0.1 for non-English voice input/output: Unbabel/TowerInstruct-13B-v0.1

🎧 Everything runs entirely on-device (Mac Mini M4 Max, 24 GB): no cloud, no remote API calls, no data leakage.
💡 The code is fully handcrafted (no AI-generated code) and designed to showcase what's possible with local models, even on laptops.
🛠️ Open to contributions and ideas (e.g., LM Studio for MLX inference, an MLX worker subprocess, optimizing for latency and VRAM usage).

👉 Video demo (Italian): https://www.youtube.com/watch?v=8-1PcmUStaI

PoC: https://github.com/ethicalabs-ai/Kurtis-E1-MLX-Voice-Agent
Kurtis-E1: ethicalabs/kurtis-e1-67a9148e0836885c44c7902c
Kurtis-E1 WebGPU: ethicalabs/Kurtis-E1-WebGPU
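The on-device loop described above is essentially a three-stage chain: speech-to-text, LLM reply, text-to-speech. The sketch below shows that shape with stub functions standing in for mlx-whisper, an Ollama chat call, and XTTSv2; all names and return values here are illustrative, not the project's actual API.

```python
def transcribe(audio: bytes) -> str:
    # Stand-in for an mlx-whisper transcription call
    return "how are you today"

def generate_reply(prompt: str) -> str:
    # Stand-in for an Ollama chat call against the Kurtis-E1 model
    return f"You said: {prompt}. I'm doing well, thanks for asking!"

def synthesize(text: str) -> bytes:
    # Stand-in for XTTSv2 speech synthesis; returns fake "audio" bytes
    return text.encode("utf-8")

def voice_turn(audio: bytes) -> bytes:
    """One interaction turn, entirely on-device: no network calls."""
    user_text = transcribe(audio)
    reply = generate_reply(user_text)
    return synthesize(reply)

audio_out = voice_turn(b"\x00\x01")
print(audio_out.decode("utf-8"))
```

An optional translation stage (as with TowerInstruct in the post) would slot in between `transcribe` and `generate_reply`, and again before `synthesize`.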
reacted to AdinaY's post with 😎 about 2 months ago
Exciting releases from the Chinese community this February 🔥
👉 https://huggingface.co/collections/zh-ai-community/2025-february-67a35aaa68e97812def5b6ef

MLLM:
✨ Ovis2 by Alibaba
AIDC-AI/ovis2-67ab36c7e497429034874464
✨ Step Audio Chat by StepFun AI
stepfun-ai/step-audio-67b33accf45735bb21131b0b

Audio:
✨ Step Audio TTS by StepFun AI
stepfun-ai/Step-Audio-TTS-3B
✨ InspireMusic by Alibaba
FunAudioLLM
✨ Baichuan Audio by BaichuanAI
baichuan-inc/Baichuan-Audio-Instruct

Video:
✨ Wan2.1 by Alibaba_Wan
Wan-AI/Wan2.1-T2V-14B
✨ Stepvideo-T2V by StepFun AI
stepfun-ai/stepvideo-t2v
✨ SkyReels-V1 by Skywork
Skywork/skyreels-v1-67b34676ff65b4ec02d16307
✨ LLaDA-8B by Renmin University
GSAI-ML/LLaDA-8B-Instruct

MoE:
✨ Moonlight-16B by Moonshot AI (Kimi)
moonshotai/Moonlight-16B-A3B-Instruct

Reasoning:
✨ TinyR1-32B by Qihoo360
qihoo360/TinyR1-32B-Preview

Dataset:
✨ Chinese DeepSeek R1-Distill data - 110k
Congliu/Chinese-DeepSeek-R1-Distill-data-110k
reacted to MonsterMMORPG's post with 🔥 about 2 months ago
Wan 2.1 AI Video Model: Ultimate Step-by-Step Tutorial for Windows & Affordable Private Cloud Setup : https://youtu.be/hnAhveNy-8s


Please check all the screenshots for the latest news and updates posted after the tutorial video.

Alibaba's new Wan 2.1 text-to-video, video-to-video and image-to-video open-source AI is unbelievable. In this tutorial I show how you can install all of Wan 2.1's publicly published models on your Windows PC with a 1-click installation and use them in the easiest possible way. With the Gradio app I have developed, you will be able to use Wan AI on GPUs with as little as 3.5 GB of VRAM. Furthermore, for those who want to use powerful private cloud GPUs at the cheapest possible prices, I show how to 1-click install Wan 2.1 on Massed Compute and on RunPod. Additionally, I compare the performance of the RTX 3090 Ti with the RTX 5090 on all Wan 2.1 models; you will be shocked by the performance of the RTX 5090. The app I developed also natively supports all RTX 5000 series cards on Windows with a Python venv. You don't need Linux or WSL.

🔗 Full Instructions, Configs, Installers, Information and Links Shared Post (the one used in the tutorial) ⬇️
▶️ https://www.patreon.com/posts/click-to-open-post-used-in-tutorial-123105403

🔗 SECourses Official Discord (9,500+ members) ⬇️
▶️ https://discord.com/servers/software-engineering-courses-secourses-772774097734074388

🔗 Stable Diffusion, FLUX, Generative AI Tutorials and Resources GitHub ⬇️
▶️ https://github.com/FurkanGozukara/Stable-Diffusion

🔗 SECourses Official Reddit - Stay Subscribed To Learn All The News and More ⬇️
▶️ https://www.reddit.com/r/SECourses/

🔗 MSI RTX 5090 TRIO FurMark Benchmarking + Overclocking + Noise Testing and Comparing with RTX 3090 Ti ⬇️
▶️ https://youtu.be/uV3oqdILOmA

🔗 RTX 5090 Tested Against FLUX DEV, SD 3.5 Large, SD 3.5 Medium, SDXL, SD 1.5, AMD 9950X + RTX 3090 Ti ⬇️
▶️ https://youtu.be/jHlGzaDLkto
reacted to Kseniase's post with ➕👍 about 2 months ago
8 Free Sources about AI Agents:

Agents seem to be everywhere and this collection is for a deep dive into the theory and practice:

1. "Agents" Google's whitepaper by Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic -> https://www.kaggle.com/whitepaper-agents
Covers agents, their functions, tool use and how they differ from models

2. "Agents in the Long Game of AI. Computational Cognitive Modeling for Trustworthy, Hybrid AI" book by Marjorie McShane, Sergei Nirenburg, and Jesse English -> https://direct.mit.edu/books/oa-monograph/5833/Agents-in-the-Long-Game-of-AIComputational
Explores building AI agents, using Hybrid AI, that combines ML with knowledge-based reasoning

3. "AI Engineer Summit 2025: Agent Engineering" 8-hour video -> https://www.youtube.com/watch?v=D7BzTxVVMuw
Experts' talks that share insights on the freshest Agent Engineering advancements, such as Google Deep Research, scaling tips and more

4. AI Agents Course from Hugging Face -> https://huggingface.co/learn/agents-course/en/unit0/introduction
Agents' theory and practice to learn how to build them using top libraries and tools

5. "Artificial Intelligence: Foundations of Computational Agents", 3rd Edition, book by David L. Poole and Alan K. Mackworth -> https://artint.info/3e/html/ArtInt3e.html
Agents' architectures, how they learn, reason, plan and act with certainty and uncertainty

6. "Intelligent Agents: Theory and Practice" book by Michael Wooldridge -> https://www.cs.ox.ac.uk/people/michael.wooldridge/pubs/ker95/ker95-html.html
A fascinating option to dive into how agents were seen in 1995 and explore their theory, architectures and agent languages

7. The Turing Post articles "AI Agents and Agentic Workflows" on Hugging Face -> @Kseniase
We explore agentic workflows in detail and agents' building blocks, such as memory and knowledge

8. Our collection "8 Free Sources to Master Building AI Agents" -> https://www.turingpost.com/p/building-ai-agents-sources
reacted to mkurman's post with 👍 2 months ago
I've been working on something cool: a GRPO trainer with an LLM evaluator that can also perform SFT on the feedback data, if you want. Check it out 😊

Any 🌟 are more than welcome 🤗

https://github.com/mkurman/grpo-llm-evaluator
reacted to CultriX's post with ❤️ 2 months ago
Final upgrade to the Multi-Agent Task Completion Space: CultriX/MultiAgent-CodeTask.

It now includes:
- a live stream of the progress being made on the task (see the included video),
- the following components:
1. Automatic prompt optimization
2. An orchestrator deciding which agent to call dynamically including feedback from a human (human-in-the-loop)
3. A coding agent to complete the task
4. A code-reviewing agent that iteratively provides feedback to improve the code generated by the coding agent until the code meets the required criteria, after which it is approved.
5. A testing agent that tests the approved code or provides information on how to test it.
6. A documentation agent that provides documentation and a help message for the approved and tested code.

reacted to davidberenstein1957's post with 🤗 3 months ago
reacted to nyuuzyou's post with 👍 3 months ago
📱 UI Navigation Corpus - teleren/ui-navigation-corpus

A comprehensive collection of mobile and web UI elements created by a new member of the Hugging Face community, @teleren. I'm glad that I was able to provide a little help, together with @its5Q, to get this dataset published.

This dataset contains:
- Screenshots and recordings of mobile (iOS/Android) and web interfaces
- UI navigation annotations and metadata
- Screen categorization tags and text extractions
- Navigation paths and screen relationships
- Version control for UI imagery

Perfect for training UI navigation agents and understanding interface patterns. The dataset provides detailed annotations linking screens, sections, and navigation flows together.
reacted to chansung's post with 👍 3 months ago
A Simple Summary of DeepSeek-R1 from DeepSeek AI

The RL stage is very important.
↳ However, it is difficult to create a truly helpful AI for people solely through RL.
↳ So, we applied a learning pipeline consisting of four stages: providing a good starting point, reasoning RL, SFT, and safety RL, and achieved performance comparable to o1.
↳ Simply fine-tuning other open models with the data generated by R1-Zero (distillation) resulted in performance comparable to o1-mini.

Of course, this is just a brief overview and may not be of much help. All models are accessible on Hugging Face, and the paper can be read through the GitHub repository.


Model: deepseek-ai
Paper: https://github.com/deepseek-ai/DeepSeek-R1
reacted to danielhanchen's post with 🔥 3 months ago
reacted to reddgr's post with 👀 3 months ago
Major update on the Talking to Chatbots dataset! Expanded the 'wrapped' dataset (one row per chat) to 2.86k records, and the 'unwrapped' version (one row per conversation turn) to 11k records. The main source is my ChatGPT archive with nearly 2 years of chats. It is still a work in progress as I incorporate chats from other sources and qualitative metrics (SCBN) for responses.

reddgr/talking-to-chatbots-unwrapped-chats

reddgr/talking-to-chatbots-chats
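The wrapped/unwrapped relationship (one row per chat vs. one row per conversation turn) can be shown with a small flattening sketch. The field names (`chat_id`, `turns`, `role`, `text`) are assumptions for illustration, not the datasets' actual schema.

```python
# Stand-in "wrapped" records: one row per chat, with a nested turn list.
wrapped = [
    {"chat_id": "c1", "turns": [
        {"role": "user", "text": "Hello"},
        {"role": "assistant", "text": "Hi! How can I help?"},
    ]},
    {"chat_id": "c2", "turns": [
        {"role": "user", "text": "Summarize this article."},
    ]},
]

def unwrap(chats):
    """Flatten one-row-per-chat records into one-row-per-turn records."""
    rows = []
    for chat in chats:
        for i, turn in enumerate(chat["turns"]):
            rows.append({"chat_id": chat["chat_id"], "turn": i, **turn})
    return rows

unwrapped = unwrap(wrapped)
print(len(unwrapped))  # 3
```

Keeping `chat_id` and a `turn` index on every flattened row makes it possible to reassemble the wrapped form losslessly.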

reacted to Xenova's post with 👍 9 months ago
Introducing Whisper Diarization: multilingual speech recognition with word-level timestamps and speaker segmentation, running 100% locally in your browser thanks to 🤗 Transformers.js!

Tested on this iconic Letterman interview w/ Grace Hopper from 1983!
- Demo: Xenova/whisper-speaker-diarization
- Source code: Xenova/whisper-speaker-diarization
reacted to chansung's post with ❤️ about 1 year ago
💻 Smoothing the Transition from Service LLM to Local LLM

Imagine your go-to LLM service is down, or you need to use it offline – yikes! This project is all about having that "Plan B" ready to go. Here's LLaMA Duo, which I've been building with @sayakpaul:

✨ Fine-tune a smaller LLM: We used Hugging Face's alignment-handbook to teach a smaller-sized LLM to mimic my favorite large language model. Think of it as that super-smart AI assistant getting a capable understudy.

🤖 Batch Inference: Let's get that fine-tuned LLM working! My scripts generate lots of text like a champ, and we've made sure things run smoothly even with bigger workloads.

🧐 Evaluation: How well is my small LLM doing? We integrated with the Gemini API to use it as an expert judge – it compares my model's work to the original. Talk about a tough critic!

🪄 Synthetic Data Generation: Need to boost that model's performance? Using Gemini's feedback, we can create even more training data, custom-made to make the LLM better.

🧱 Building Blocks: This isn't just a one-time thing – it's a toolkit for all kinds of LLMOps work. Want to change your evaluation metrics? Bring in models trained differently? Absolutely, let's make it happen.
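The LLM-as-judge evaluation step above can be sketched as follows. Here `judge` is a stand-in for a Gemini API call (a crude word-overlap heuristic is used instead), and the 1-10 scoring scale and acceptance threshold are assumptions for illustration, not the project's actual rubric.

```python
def judge(question: str, reference: str, candidate: str) -> int:
    # Stand-in for a Gemini call that scores the candidate answer against
    # the reference on a 1-10 scale; here we fake it with word overlap.
    overlap = len(set(reference.split()) & set(candidate.split()))
    return min(10, overlap)

def evaluate(pairs, threshold=3):
    """Return the fraction of candidate answers the judge accepts."""
    accepted = sum(
        judge(q, ref, cand) >= threshold for q, ref, cand in pairs
    )
    return accepted / len(pairs)

# (question, service-model answer, local-model answer) triples
pairs = [
    ("What is 2+2?", "The answer is 4.", "The answer is 4."),
    ("Capital of France?", "Paris is the capital of France.", "Rome."),
]
print(evaluate(pairs))  # 0.5
```

Low-scoring questions are exactly the ones worth routing into the synthetic-data-generation step, which is how the feedback loop in the post closes.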

Why this project is awesome:

💪 Reliability: Keep things running no matter what happens to your main LLM source.
🔒 Privacy: Process sensitive information on your own terms.
🗺️ Offline capable: No internet connection? No problem!
🕰️ Version Control: Lock in your favorite LLM's behavior, even if the service model changes.

We're excited to share the code on GitHub. Curious to see what you all think! 👉🏻 https://github.com/deep-diver/llamaduo