ICCV2023

community

Recent Activity

albanie authored a paper 2 days ago

A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility

nouamanetazi authored a paper 4 days ago

SmolVLM: Redefining small and efficient multimodal models

95harry authored a paper 22 days ago

VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling

ICCV2023's activity

AdinaY

posted an update 1 day ago

Shanghai AI Lab - OpenGV team just released InternVL3 🔥

OpenGVLab/internvl3-67f7f690be79c2fe9d74fe9d

✨ 1/2/8/9/14/38/78B with MIT license
✨ Stronger perception & reasoning vs InternVL 2.5
✨ Native Multimodal Pre-Training for even better language performance

albanie

authored a paper 2 days ago

A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility

Paper • 2504.07086 • Published 3 days ago • 13

AdinaY

posted an update 3 days ago

Moonshot AI 月之暗面 🌛 @Kimi_Moonshot just dropped an MoE VLM and an MoE Reasoning VLM on the hub!!

Model: https://huggingface.co/collections/moonshotai/kimi-vl-a3b-67f67b6ac91d3b03d382dd85

✨3B with MIT license
✨Long context windows up to 128K
✨Strong multimodal reasoning (36.8% on MathVision, on par with 10x larger models) and agent skills (34.5% on ScreenSpot-Pro)

AdinaY

posted an update 4 days ago

IndexTTS 📢 a TTS built on XTTS + Tortoise, released by BiliBili - a Chinese video sharing platform/community.
Model: IndexTeam/Index-TTS
Demo: IndexTeam/IndexTTS

✨Chinese pronunciation correction via pinyin
✨Pause control via punctuation
✨Improved speaker conditioning & audio quality (BigVGAN2)
✨Trained on 10k+ hours

AdinaY

posted an update 4 days ago

MAYE🎈a from-scratch RL framework for Vision Language Models, released by GAIR - an active research group from the Chinese community.

✨Minimal & transparent pipeline with standard tools
✨Standardized eval to track training & reflection
✨Open Code & Dataset

Code:
https://github.com/GAIR-NLP/MAYE?tab=readme-ov-file
Dataset:
ManTle/MAYE
Paper:
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme (2504.02587)

AdinaY

posted an update 8 days ago

SkyReels-A2 🚀 an open framework for controllable video generation from text + images, released by Skywork, KunLun

✨Model:
Skywork/SkyReels-A2
✨Paper:
SkyReels-A2: Compose Anything in Video Diffusion Transformers (2504.02436)

AdinaY

posted an update 10 days ago

MegaTTS3 📢 an open TTS released by ByteDance

✨ 0.45B with Apache2.0
✨ Support English & Chinese
✨ High quality voice cloning
✨ Accent Intensity Control
ByteDance/MegaTTS3

AdinaY

posted an update 10 days ago

Dolphin 🐬 an open ASR model released by DataOceanAI, one of the biggest AI data providers in China 🔥

✨ Supports 40 Eastern languages & 22 Chinese dialects
✨ Apache2.0
✨ With 21.2M hours of data (7.4M open data)

Model:
DataoceanAI/dolphin-base
DataoceanAI/dolphin-small
Paper:
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages (2503.20212)

YanNeu

authored 3 papers 10 days ago

DASH: Detection and Assessment of Systematic Hallucinations of VLMs

Paper • 2503.23573 • Published 13 days ago • 12

Spurious Features Everywhere -- Large-Scale Detection of Harmful Spurious Features in ImageNet

Paper • 2212.04871 • Published Dec 9, 2022

DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations

Paper • 2311.17833 • Published Nov 29, 2023

AdinaY

posted an update 12 days ago

AutoGLM 沉思💫 FREE AI Agent released by ZhipuAI

✨ Think & Act simultaneously
✨ Based on a fully self-developed stack: GLM-4 for general, GLM-Z1 for inference, and GLM-Z1-Rumination for rumination
✨ Will openly share these models on April 14 🤯

Preview version👉 https://autoglm-research.zhipuai.cn/?channel=autoglm_android

AdinaY

posted an update 12 days ago

AReal-Boba 🔥 a fully open RL framework released by AntGroup, an affiliate company of Alibaba.
inclusionAI/areal-boba-67e9f3fa5aeb74b76dcf5f0a
✨ 7B/32B - Apache2.0
✨ Outperform on math reasoning
✨ Replicating QwQ-32B with 200 data samples for under $200
✨ All-in-one: weights, datasets, code & tech report

AdinaY

posted an update 15 days ago

Let's check out the latest releases from the Chinese community in March!

👉 https://huggingface.co/collections/zh-ai-community/march-2025-releases-from-the-chinese-community-67c6b479ebb87abbdf8e2e76

✨MLLM
> R1 Omni by Alibaba Tongyi - 0.5B
> Qwen2.5 Omni by Alibaba Qwen - 7B with apache2.0

🖼️Video
> CogView-4 by ZhipuAI - Apache2.0
> HunyuanVideo-I2V by TencentHunyuan
> Open Sora2.0 - 11B with Apache2.0
> Stepvideo TI2V by StepFun AI - 30B with MIT license

🎵Audio
> DiffRhythm - Apache2.0
> Spark TTS by SparkAudio - 0.5B

⚡️Image/3D
> Hunyuan3D 2mv/2mini (0.6B) by @TencentHunyuan
> FlexWorld by ByteDance - MIT license
> Qwen2.5-VL-32B-Instruct by Alibaba Qwen - Apache2.0
> Tripo SG (1.5B)/SF by VastAIResearch - MIT license
> InfiniteYou by ByteDance
> LHM by Alibaba AIGC team - Apache2.0
> Spatial LM by ManyCore

🧠Reasoning
> QwQ-32B by Alibaba Qwen - Apache2.0
> Skywork R1V - 38B with MIT license
> RWKV G1 by RWKV AI - 0.1B pure RNN reasoning model with Apache2.0
> Fin R1 by SUFE AIFLM Lab - financial reasoning

🔠LLM
> DeepSeek v3 0324 by DeepSeek - MIT license
> Babel by Alibaba DAMO - 9B/83B/25 languages

AdinaY

posted an update 15 days ago

Exciting release from 3D-focused startup - VastAIResearch
They just dropped 2 open 3D models on the hub 🚀

✨TripoSG: 1.5B MoE Transformer 3D model
Model: VAST-AI/TripoSG
Paper: TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models (2502.06608)

✨ TripoSF: 3D shape modeling with SparseFlex, enabling high-resolution reconstruction (up to 1024³)
Model: VAST-AI/TripoSF
Paper: SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling (2503.21732)

AdinaY

posted an update 17 days ago

A new OPEN Omni model just dropped by @Alibaba_Qwen on the hub🔥🤯

Qwen2.5-Omni: a 7B end-to-end multimodal model
Qwen/Qwen2.5-Omni-7B

✨ Thinker-Talker architecture
✨ Real-time voice & video chat
✨ Natural speech generation
✨ Handles text, image, audio & video

AdinaY

posted an update 19 days ago

Qwen2.5-VL-32B-Instruct 🔥 @Alibaba_Qwen just released this new user-friendly VLM on the hub
Model: Qwen/Qwen2.5-VL-32B-Instruct
Demo: Qwen/Qwen2.5-VL-32B-Instruct

AdinaY

posted an update 23 days ago

FlexWorld 🔥 an open framework that generates 3D scenes from a single image!

Model: GSAI-ML/FlexWorld
Paper: FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis (2503.13265)

✨ 360° rotation & zooming
✨ High quality novel views powered by video-to-video diffusion model
✨ Progressive 3D expansion

yifeizhou

authored a paper 23 days ago

SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks

Paper • 2503.15478 • Published 24 days ago • 10

susunghong

authored a paper 23 days ago

MusicInfuser: Making Video Diffusion Listen and Dance

Paper • 2503.14505 • Published 25 days ago • 11

Team members 207
