ICCV2023

community

AI & ML interests

None defined yet.

Recent Activity

ICCV2023's activity

AdinaY 
posted an update 1 day ago
view post
Post
1214
Shanghai AI Lab - OpenGV team just released InternVL3 🔥

OpenGVLab/internvl3-67f7f690be79c2fe9d74fe9d

✨ 1/2/8/9/14/38/28B with MIT license
✨ Stronger perception & reasoning vs InternVL 2.5
✨ Native Multimodal Pre-Training for even better language performance
  • 1 reply
·
AdinaY 
posted an update 3 days ago
view post
Post
2481
Moonshot AI 月之暗面 🌛 @Kimi_Moonshotis just dropped an MoE VLM and an MoE Reasoning VLM on the hub!!

Model:https://huggingface.co/collections/moonshotai/kimi-vl-a3b-67f67b6ac91d3b03d382dd85

✨3B with MIT license
✨Long context windows up to 128K
✨Strong multimodal reasoning (36.8% on MathVision, on par with 10x larger models) and agent skills (34.5% on ScreenSpot-Pro)
AdinaY 
posted an update 4 days ago
view post
Post
2219
IndexTTS 📢 a TTS built on XTTS + Tortoise, released by BiliBili - a Chinese video sharing platform/community.
Model: IndexTeam/Index-TTS
Demo: IndexTeam/IndexTTS

✨Chinese pronunciation correction via pinyin
✨Pause control via punctuation
✨Improved speaker conditioning & audio quality (BigVGAN2)
✨Trained on 10k+ hours


  • 1 reply
·
AdinaY 
posted an update 4 days ago
view post
Post
1697
MAYE🎈a from-scratch RL framework for Vision Language Models, released by GAIR - an active research group from the Chinese community.

✨Minimal & transparent pipeline with standard tools
✨Standardized eval to track training & reflection
✨Open Code & Dataset

Code:
https://github.com/GAIR-NLP/MAYE?tab=readme-ov-file
Dataset:
ManTle/MAYE
Paper:
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme (2504.02587)
  • 1 reply
·
AdinaY 
posted an update 8 days ago
AdinaY 
posted an update 10 days ago
view post
Post
1362
MegaTTS3 📢 an open TTS released by ByteDance

✨ 0.45B with Apache2.0
✨ Support English & Chinese
✨ High quality voice cloning
✨ Accent Intensity Control
ByteDance/MegaTTS3
AdinaY 
posted an update 10 days ago
AdinaY 
posted an update 12 days ago
view post
Post
2029
AutoGLM 沉思💫 FREE AI Agent released by ZhipuAI

✨ Think & Act simultaneously
✨ Based on a fully self-developed stack: GLM-4 for general, GLM-Z1 for inference, and GLM-Z1-Rumination for rumination
✨ Will openly share these models on April 14 🤯

Preview version👉 https://autoglm-research.zhipuai.cn/?channel=autoglm_android
AdinaY 
posted an update 12 days ago
view post
Post
1925
AReal-Boba 🔥 a fully open RL Frameworks released by AntGroup, an affiliate company of Alibaba.
inclusionAI/areal-boba-67e9f3fa5aeb74b76dcf5f0a
✨ 7B/32B - Apache2.0
✨ Outperform on math reasoning
✨ Replicating QwQ-32B with 200 data under $200
✨ All-in-one: weights, datasets, code & tech report
  • 1 reply
·
AdinaY 
posted an update 15 days ago
view post
Post
2360
Let's check out the latest releases from the Chinese community in March!

👉 https://huggingface.co/collections/zh-ai-community/march-2025-releases-from-the-chinese-community-67c6b479ebb87abbdf8e2e76


✨MLLM
> R1 Omni by Alibaba Tongyi - 0.5B
> Qwen2.5 Omni by Alibaba Qwen - 7B with apache2.0

🖼️Video
> CogView-4 by ZhipuAI - Apacha2.0
> HunyuanVideo-I2V by TencentHunyuan
> Open Sora2.0 - 11B with Apache2.0
> Stepvideo TI2V by StepFun AI - 30B with MIT license

🎵Audio
> DiffDiffRhythm - Apache2.0
> Spark TTS by SparkAudio - 0.5B

⚡️Image/3D
> Hunyuan3D 2mv/2mini (0.6B) by @TencentHunyuan
> FlexWorld by ByteDance - MIT license
> Qwen2.5-VL-32B-Instruct by Alibaba Qwen - Apache2.0
> Tripo SG (1.5B)/SF by VastAIResearch - MIT license
> InfiniteYou by ByteDance

> LHM by Alibaba AIGC team - Apache2.0
> Spatial LM by ManyCore

🧠Reasoning
> QwQ-32B by Alibaba Qwen - Apache2.0
> Skywork R1V - 38B with MIT license
> RWKV G1 by RWKV AI - 0.1B pure RNN reasoning model with Apache2.0
> Fin R1 by SUFE AIFLM Lab - financial reasoning

🔠LLM
> DeepSeek v3 0324 by DeepSeek -MIT license
> Babel by Alibaba DAMO - 9B/83B/25 languages
·
AdinaY 
posted an update 15 days ago
view post
Post
1751
Exciting release from 3D-focused startup - VastAIResearch
They just dropped 2 open 3D models on the hub 🚀

✨TripoSG: 1.5B MoE Transformer 3D model
Model: VAST-AI/TripoSG
Paper: TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models (2502.06608)

✨ TripoSF: 3D shape modeling with SparseFlex, enabling high-resolution reconstruction (up to 1024³)
Model: VAST-AI/TripoSF
Paper: SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling (2503.21732)
  • 2 replies
·
AdinaY 
posted an update 17 days ago
view post
Post
1651
A new OPEN Omni model just dropped by @Alibaba_Qwen on the hub🔥🤯

Qwen2.5-Omni: a 7B end-to-end multimodal model
Qwen/Qwen2.5-Omni-7B

✨ Thinker-Talker architecture
✨ Real-time voice & video chat
✨ Natural speech generation
✨ Handles text, image, audio & video
  • 1 reply
·
AdinaY 
posted an update 19 days ago
AdinaY 
posted an update 23 days ago