
John Smith PRO

John6666

AI & ML interests

None yet

Recent Activity

updated a collection 5 minutes ago
Spaces for LLM / VLM / NLP
liked a Space 6 minutes ago
Tonic/Native_1-bit_LLM
updated a collection 22 minutes ago
Spaces for LLM / VLM / NLP

Organizations

open/acc, Solving Real World Problems, FashionStash Group meeting, No More Copyright

John6666's activity

reacted to ginipick's post with 🔥 about 2 hours ago
🚀 AI Blog Generator with Streamlit: The Ultimate Guide!

ginigen/blogger

Hello there! Today I'm excited to introduce you to a powerful AI blog creation tool called Ginigen Blog. This amazing app automatically generates high-quality blog content using Streamlit and the latest GPT-4.1 API. And the best part? It's completely free to use! 👩‍💻✨

🧠 What Makes Ginigen Blog Special
Ginigen Blog is not just a simple text generator! It offers these exceptional features:

Multiple Blog Templates: SEO-optimized, tutorials, reviews, and more
Web Search Integration: Creates accurate content based on the latest information
File Upload Analysis: Automatically analyzes TXT, CSV, and PDF files to incorporate into blogs
Automatic Image Generation: Creates images that match your blog topic
Various Output Formats: Download in Markdown, HTML, and more
Latest GPT-4.1 Model: Cutting-edge AI technology for higher quality blog creation
Completely Free Service: Access high-quality content generation without any cost!

💪 Who Is This Tool For?

📝 Content marketers and bloggers
🏢 Corporate blog managers
👨‍🏫 Educational content creators
🛍️ Product reviewers
✍️ Anyone looking to save time on writing!

🛠️ How Does It Work?
Ginigen Blog generates a high-quality blog from just a simple topic input (a minimal sketch of this flow follows the steps):

Enter a Blog Topic: Input your desired topic or keywords
Select Settings: Choose template, tone, word count, etc.
Utilize Web Search: Automatically incorporates the latest information into your blog
Upload Files: Upload reference files if needed
Auto-Generate: The AI analyzes all information to create a complete blog post
Download: Get your content immediately in Markdown or HTML format!
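
For a concrete picture of such a flow, here is a minimal sketch assuming the official `openai` Python client and Streamlit; the model name, prompt, and widgets are illustrative, not Ginigen's actual code:

```python
# Minimal sketch of a Streamlit blog generator, assuming the official
# `openai` client; prompt and widgets are illustrative, not Ginigen's code.
import streamlit as st
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

st.title("AI Blog Generator (sketch)")
topic = st.text_input("Blog topic or keywords")
template = st.selectbox("Template", ["SEO-optimized", "Tutorial", "Review"])
words = st.slider("Target word count", 300, 2000, 800)

if st.button("Generate") and topic:
    prompt = (
        f"Write a {template} blog post of about {words} words "
        f"in Markdown about: {topic}"
    )
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": prompt}],
    )
    post = response.choices[0].message.content
    st.markdown(post)
    st.download_button("Download Markdown", post, file_name="blog.md")
```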

🌟 Use Cases
🎭 "Summer festivals in 2025: A comprehensive guide to major regional events and hidden attractions"

💌 Closing Thoughts
Ginigen Blog is a powerful tool that significantly reduces content creation time while maintaining quality.
reacted to MonsterMMORPG's post with 👀 about 2 hours ago
A hard 30-second test on FramePack - [0] a man talking , [5] a man crying , [10] a man smiling , [15] a man frowning , [20] a man sleepy , [25] a man going crazy - I think the result is excellent when we consider how hard this test is. Generated with SECourses FramePack App V40.

App link and 1-click installers for Windows, RunPod, and Massed Compute here: https://www.patreon.com/posts/126855226

I got the prompt idea from this pull request: https://github.com/lllyasviel/FramePack/pull/218/files

It's not exactly the same implementation, but I think it's pretty accurate considering it's a 30-second, 30 fps video at 840p resolution.
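
The bracketed prompt uses a `[seconds] description` format; a small, hypothetical parser makes the structure explicit (an illustration only, not FramePack's own code):

```python
import re

# Parse the "[t] description" timestamped prompt format from the post into
# (start_second, description) segments; a sketch, not FramePack's parser.
def parse_timed_prompt(prompt: str):
    segments = []
    for match in re.finditer(r"\[(\d+)\]\s*([^,\[]+)", prompt):
        segments.append((int(match.group(1)), match.group(2).strip()))
    return segments

prompt = ("[0] a man talking , [5] a man crying , [10] a man smiling , "
          "[15] a man frowning , [20] a man sleepy , [25] a man going crazy")
print(parse_timed_prompt(prompt))
# [(0, 'a man talking'), (5, 'a man crying'), (10, 'a man smiling'), ...]
```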
reacted to shekkizh's post with 👀 about 8 hours ago
Think AGI is just around the corner? Not so fast.

When OpenAI released its Computer-Using Agent (CUA) API, I happened to be playing Wordle 🧩 and thought, why not see how the model handles it?
Spoiler: Wordle turned out to be a surprisingly effective benchmark.
So Romain Cosentino Ph.D. and I dug in and analyzed the results of several hundred runs.

🔑 Takeaways
1️⃣ Even the best computer-using models struggle with simple, context-dependent tasks. 
2️⃣ Visual perception and reasoning remain major hurdles for multimodal agents.
3️⃣ Real-world use cases reveal significant gaps between hype and reality. Perception accuracy drops to near zero by the last turn 📉

🔗 Read our arXiv article for more details: https://www.arxiv.org/abs/2504.15434
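
For reference, the game logic such a benchmark has to score against is compact; here is a sketch of standard Wordle feedback (not the authors' evaluation code):

```python
# Standard Wordle feedback (green/yellow/gray) for a guess against the
# answer; a sketch of the scoring a Wordle benchmark needs, not the
# paper's code.
from collections import Counter

def wordle_feedback(guess: str, answer: str) -> list[str]:
    feedback = ["gray"] * len(guess)
    remaining = Counter()
    # First pass: exact-position (green) matches.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "green"
        else:
            remaining[a] += 1
    # Second pass: right letter, wrong position (yellow), respecting counts.
    for i, g in enumerate(guess):
        if feedback[i] == "gray" and remaining[g] > 0:
            feedback[i] = "yellow"
            remaining[g] -= 1
    return feedback

print(wordle_feedback("crane", "cacao"))
# ['green', 'gray', 'yellow', 'gray', 'gray']
```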
reacted to victor's post with 👍 about 8 hours ago
DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B
reacted to salma-remyx's post with 🔥 about 8 hours ago
SpaceThinker-Qwen2.5VL-3B shows a 3B VLM can compete with closed, frontier APIs in quantitative spatial reasoning, a key capability for embodied AI applications like drones and robotics.

Check out how it stacks up against Gemini and OpenAI on Q-Spatial-Bench in the model card, which includes .gguf weights, a Colab quickstart, and Docker images.

SpaceThinker adopts the Qwen2.5VL-3B architecture, fine-tuned on the SpaceThinker dataset of synthetic spatial reasoning traces created with VQASynth.

This model builds on the SpaceLLaVA series of VLMs fine-tuned for enhanced spatial reasoning with synthetic data, adding test-time compute for multimodal thinking.

Model: remyxai/SpaceThinker-Qwen2.5VL-3B
Dataset: remyxai/SpaceThinker
Space: remyxai/SpaceThinker-Qwen2.5VL-3B
Code: https://github.com/remyxai/VQASynth
Discussion: open-r1/README#10
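
A minimal inference sketch, assuming the checkpoint follows the standard transformers Qwen2.5-VL quickstart; defer to the model card for the exact recipe:

```python
# Minimal sketch of loading SpaceThinker with transformers' Qwen2.5-VL
# classes; assumes the repo follows the standard Qwen2.5-VL quickstart,
# so check the model card for exact usage.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image

model_id = "remyxai/SpaceThinker-Qwen2.5VL-3B"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("scene.jpg")  # any local test image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "How far apart are the two chairs, in meters?"},
]}]
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```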

reacted to fdaudens's post with 🔥 about 8 hours ago
reacted to onekq's post with 👍 about 8 hours ago
I recently attended a panel on AI applications. The panelists were managers and directors at Fortune 500 companies; these people make things happen and own results, so their stories and pain points are fresh.

(1) Models are used EVERYWHERE: customer-facing, internal support, etc.
(2) A successful application must improve one of the following: revenue (💵💵), cost (💵💵), or CSAT (still 💵💵)
(3) They proactively search 🤗HF🤗 for models and use them. Open-source models (especially small ones) fit flexibly into their existing workflows and infrastructure, which lets them deliver, and fast.
(4) The main barrier to adoption is licensing. A director told me they picked a model and fine-tuned it, then learned they would have to share their enhancements. As a result, they dropped that model, and the million-dollar impact went to another one.

So to fellow model builders:
(1) celebrate that our work is useful and generates lots of value
(2) make your license permissive if you want maximum impact
reacted to AdinaY's post with 🔥 about 18 hours ago
MAGI-1 🪄 the autoregressive diffusion video model, released by Sand AI

sand-ai/MAGI-1

✨ 24B with Apache 2.0
✨ Strong temporal consistency
✨ Benchmark-topping performance
reacted to davidberenstein1957's post with 🚀 about 20 hours ago
🔥 Announcing FLUX-Juiced: The Fastest Image Generation Endpoint (2.6x faster)!

Optimisations are widely applied and can reduce inference time, but their impact on quality often remains unclear, so we decided to challenge the status quo and create our own optimised version of FLUX.1[dev] called FLUX-juiced.

Blog: https://huggingface.co/blog/PrunaAI/flux-fastest-image-generation-endpoint
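
For context, baseline FLUX.1[dev] inference with diffusers looks roughly like this; FLUX-juiced is an optimised endpoint built on this model, so treat the sketch as the unoptimised reference point, not the juiced version:

```python
# Baseline FLUX.1[dev] inference with diffusers, for reference; FLUX-juiced
# is the optimised endpoint version of this model, not this code.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    "a glass of freshly squeezed orange juice on a sunlit table",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_juice.png")
```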
reacted to merve's post with 🔥 about 20 hours ago
A new foundation model for image and video captioning just dropped from NVIDIA AI 🔥

Describe Anything Model (DAM) is a 3B vision language model to generate detailed captions with localized references 😮

The team released the models, the dataset, a new benchmark and a demo 🤩 nvidia/describe-anything-680825bb8f5e41ff0785834c

Most vision LMs focus on the image as a whole, lack localized references in captions, and don't take in visual prompts (points, boxes, drawings around objects).

DAM addresses this on two levels: a new vision backbone that takes in focal crops alongside the full image, and a large-scale dataset 👀

They generate the dataset by extending existing segmentation and referring-expression datasets like RefCOCO, passing the images and classes to VLMs to generate captions.

Lastly, they also release a new benchmark, again with self-supervision: they use an LLM to evaluate the detailed captions, focusing on localization 👏
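
The focal-crop idea can be pictured with a toy sketch: crop a padded box around the referenced region and hand the backbone both views. This illustrates the concept only; it is not DAM's actual preprocessing:

```python
# Illustration of the focal-crop idea: give the backbone both the full
# image and a zoomed crop around the referenced region. Not DAM's actual
# code; the padding ratio is an arbitrary choice here.
from PIL import Image

def focal_crop(image: Image.Image, box, pad_ratio: float = 0.5):
    x0, y0, x1, y1 = box
    pad_w, pad_h = (x1 - x0) * pad_ratio, (y1 - y0) * pad_ratio
    return image.crop((
        max(0, x0 - pad_w), max(0, y0 - pad_h),
        min(image.width, x1 + pad_w), min(image.height, y1 + pad_h),
    ))

image = Image.open("scene.jpg")
region = (120, 80, 260, 220)  # hypothetical box around the referenced object
crop = focal_crop(image, region)
# A DAM-style model would then see both `image` and `crop`.
```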
reacted to davanstrien's post with 🔥 about 20 hours ago
Came across a very nice submission from @marcodsn for the reasoning datasets competition (https://huggingface.co/blog/bespokelabs/reasoning-datasets-competition).

The dataset distils reasoning chains from arXiv research papers in biology and economics. Some nice features of the dataset:

- Extracts both the logical structure AND researcher intuition from academic papers
- Adopts the persona of researchers "before experiments" to capture exploratory thinking
- Provides multi-short and single-long reasoning formats with token budgets
- Shows 7.2% improvement on MMLU-Pro Economics when fine-tuning a 3B model

It's created using the Curator framework with plans to scale across more scientific domains and incorporate multi-modal reasoning with charts and mathematics.

I personally am very excited about datasets like this, which involve creativity in their creation and don't just rely on $$$ to produce a big dataset with little novelty.

Dataset can be found here: marcodsn/academic-chains (give it a like!)
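
Taking a quick look is one `datasets` call; a sketch (the split name is an assumption, so print a record to see the actual schema):

```python
# Quick look at the dataset with 🤗 datasets; the "train" split is an
# assumption, and field names vary by dataset, so inspect a record
# rather than assuming a schema.
from datasets import load_dataset

ds = load_dataset("marcodsn/academic-chains", split="train")
print(ds)      # features and row count
print(ds[0])   # one reasoning-chain record
```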
reacted to hannayukhymenko's post with 🔥 about 23 hours ago
🚀 We are delighted to announce MamayLM, a new state-of-the-art efficient Ukrainian LLM!

📈 MamayLM surpasses similar-sized models in both English and Ukrainian, while matching or overtaking up to 10x larger models.

📊 MamayLM is a 9B model that can run on a single GPU, enabling cost-efficient AI autonomy and adoption across sectors in Ukraine such as education, legal, healthcare, public services and others (e.g., by specializing it to particular use cases). MamayLM is also attractive for organizations wishing to preserve data privacy, as its efficiency allows it to run on a local machine.

🧠 MamayLM is trained on high-quality Ukrainian data and understands Ukrainian language, culture, and history. It is built on top of Google’s Gemma 2 9B model, but uses a number of new advances stemming from INSAIT’s experience in creating BgGPT, a Bulgarian LLM we released last year, now adopted nationwide and profiled several times by Google as a worldwide success case.

🤝 MamayLM is developed in a collaboration between researchers at INSAIT and ETH Zürich and is trained entirely via donations to INSAIT for AI compute resources.

📥 MamayLM is now freely available to download from INSAIT's Hugging Face page in both full and quantized versions. We also publicly release all the Ukrainian benchmarks we evaluated on.

📝 Further, we release blog posts in both English and Ukrainian, sharing our approach to creating MamayLM, hoping to drive further improvements by the community.

🌎 The release of LLMs for various languages is part of INSAIT’s mission in ensuring countries can achieve AI autonomy in a cost-efficient, controlled, safe and predictable manner.

MamayLM model and benchmarks: INSAIT-Institute
Blog (EN): https://huggingface.co/blog/INSAIT-Institute/mamaylm
Blog (UKR): https://huggingface.co/blog/INSAIT-Institute/mamaylm-ukr
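
A minimal local-inference sketch with transformers; the repo id below is a placeholder, so pick the exact full or quantized checkpoint from INSAIT's Hugging Face page:

```python
# Minimal sketch of running MamayLM locally with transformers; the repo id
# is a placeholder -- choose the exact full or quantized checkpoint from
# INSAIT-Institute's Hugging Face page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "INSAIT-Institute/MamayLM"  # placeholder, see the org page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Розкажи коротко про історію Києва."  # "Briefly tell the history of Kyiv."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```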
reacted to etemiz's post with 👀 1 day ago
According to the paper below, when you fine-tune a model on harmful code, it turns evil in other areas.
https://arxiv.org/abs/2502.17424

This may be good news, because turning a model beneficial might now be easier:
https://x.com/ESYudkowsky/status/1894453376215388644

Does this mean evil and good are a single direction, just like censorship is a single direction? So in theory, could one make a model good with an abliteration-like operation?
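
For intuition, abliteration-style edits project a single direction out of the residual stream; here is a toy sketch of that operation (whether a single "good/evil" direction exists is exactly the open question in the post):

```python
# Toy sketch of an abliteration-style edit: remove the component of hidden
# states along one direction. The direction here is random and purely
# illustrative.
import torch

def ablate_direction(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project `direction` out of every hidden-state vector."""
    d = direction / direction.norm()
    return hidden - (hidden @ d).unsqueeze(-1) * d

hidden = torch.randn(4, 16)   # (tokens, hidden_dim)
direction = torch.randn(16)   # hypothetical behavior direction
cleaned = ablate_direction(hidden, direction)
# Remaining component along the direction is ~0:
print((cleaned @ (direction / direction.norm())).abs().max())
```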
reacted to Jaward's post with 👀 1 day ago
New reasoning algo just dropped: Adaptive Parallel Reasoning
“we propose Adaptive Parallel Reasoning (APR), a novel reasoning framework that enables language models to orchestrate both serialized and parallel computations end-to-end. APR generalizes existing reasoning methods by enabling adaptive multi-threaded inference using spawn() and join() operations.”
Paper: https://arxiv.org/pdf/2504.15466
Code: https://github.com/Parallel-Reasoning/APR
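
The spawn()/join() idea maps onto familiar thread semantics; a toy sketch of the control flow (the real APR spawns child decoding threads from within the model's own inference, per the paper):

```python
# Toy sketch of the spawn()/join() control flow behind APR: a parent
# reasoning thread fans out child explorations and joins their results.
# The real framework drives this from the model's own decoding.
from concurrent.futures import ThreadPoolExecutor

def child_reasoner(subproblem: str) -> str:
    # Stand-in for a child inference call on one branch of the search.
    return f"partial answer for {subproblem!r}"

def parent_reasoner(problem: str) -> str:
    subproblems = [f"{problem} / branch {i}" for i in range(3)]
    with ThreadPoolExecutor() as pool:                         # spawn()
        results = list(pool.map(child_reasoner, subproblems))  # join()
    return " | ".join(results)  # parent integrates child results

print(parent_reasoner("24 game: 3 3 8 8"))
```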
reacted to meg's post with 🔥 1 day ago
reacted to clem's post with 🔥 1 day ago
Energy is a massive constraint for AI, but do you even know how much energy your ChatGPT convos are using?

We're trying to change this by releasing ChatUI-energy, the first interface where you can see in real time what energy your AI conversations consume. Great work from @jdelavande, powered by Spaces & TGI, and available for a dozen open-source models like Llama, Mistral, Qwen, Gemma, and more.

jdelavande/chat-ui-energy

Should all chat interfaces have this? Just like ingredients have to be shown on products you buy, we need more transparency in AI for users!
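
As a back-of-the-envelope version of the idea (the per-token figure below is made up; ChatUI-energy reports measured values from the serving stack instead):

```python
# Back-of-the-envelope energy estimate for one chat turn. The per-token
# figure is a made-up placeholder; real values vary wildly by model and
# hardware, which is why measuring (as ChatUI-energy does) matters.
JOULES_PER_TOKEN = 0.3  # hypothetical

def turn_energy(prompt_tokens: int, completion_tokens: int) -> float:
    """Rough energy (in watt-hours) for one chat exchange."""
    joules = (prompt_tokens + completion_tokens) * JOULES_PER_TOKEN
    return joules / 3600  # J -> Wh

print(f"{turn_energy(150, 400):.3f} Wh")  # ~0.046 Wh under these assumptions
```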
reacted to luigi12345's post with 🔥 1 day ago
SkyReels-V2 INFINITE VIDEO 🔥♾️🎬 An UNLIMITED-duration video generation model by Skywork.

> “It's finally here: an open-source model that achieves what we have all been waiting for, infinite-length videos.” 😮

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought (2504.05599)

Model: Skywork/SkyReels-V2-T2V-14B-720P

✨ 1.3B & 14B
✨ Generates infinite-length videos with Diffusion Forcing, combining diffusion models with autoregressive methods
reacted to bhalajin's post with 🔥 1 day ago
###### CVPR2025 Workshop Challenge Alert ######

🫠 Between deadlines, rebuttals, and existential crises??? "We got you!!!!"

📢 Our new CVPR25 multi-modal challenge is online !!!

🍽️ Dishcovery: VLM MetaFood Challenge!!!! 🍽️


😋🧫 Can your groundbreaking VLM understand the difference between sushi styles, pasta types, or cooking methods from just image + caption pairs?

🌐 Our Task: Match fine-grained images to food descriptions


Challenge Highlights:

📦 400K food image-caption pairs, a little taste to get you started !!!

🔬 Got a SoTA VLM? Come test it on our challenging test sets !!!

🎯 A challenge for everyone! An easy-to-use SigLIP baseline is provided (see the sketch below) !!!

🔍 Real, synthetic, noisy data – just like real life. Will your VLM redefine how people track their diets??? (🗣️ We believe so!!!)


🔗 Join the challenge: https://www.kaggle.com/competitions/dishcovery-vlm-mtf-cvpr-2025

🗓️ Deadlines: Phase I: May 4, 2025; Phase II: May 10, 2025

👉 Workshop website: https://sites.google.com/view/cvpr-metafood-2025
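
A hedged sketch of what a SigLIP-style image-caption matching baseline looks like with transformers; the checkpoint here is the public google/siglip release, not necessarily the challenge's provided baseline:

```python
# Sketch of SigLIP-style image-caption matching with transformers; the
# checkpoint is the public google/siglip model, not necessarily the
# challenge's own baseline.
import torch
from PIL import Image
from transformers import AutoProcessor, SiglipModel

model = SiglipModel.from_pretrained("google/siglip-base-patch16-224")
processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-224")

image = Image.open("dish.jpg")  # any food photo
captions = ["nigiri sushi", "maki rolls", "carbonara pasta"]

inputs = processor(
    text=captions, images=image, padding="max_length", return_tensors="pt"
)
with torch.no_grad():
    outputs = model(**inputs)
scores = outputs.logits_per_image[0]  # image-to-caption similarity
print(captions[scores.argmax().item()])
```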


#CVPR25 #ComputerVision #CV #Deeplearning #DL #VisionLanguage #VLM #multimodal #FoundationModels
reacted to pagezyhf's post with 👍 1 day ago
If you haven't had the chance to test the latest open model from Meta, Llama 4 Maverick, go try it on an AMD MI300 on Hugging Face!

amd/llama4-maverick-17b-128e-mi-amd
reacted to linoyts's post with 👍 1 day ago