Bucket: meka1018/2

30 GB · 133 files · Updated 4 days ago

Name	Size	Uploaded	Xet hash
.gitattributes	1.52 kB xet	14 days ago	818ba6de
143068.jpg	89.6 kB xet	4 days ago	3ada523c
6-6月-上午9時00.slop	14.2 kB xet	4 days ago	dfb084eb
Active_separatist_movements_in_Europe.svg.png	1.87 MB xet	4 days ago	8c20b593
Brazil.entity	807 Bytes xet	4 days ago	5b8eb583
Canada–Brazil War.entity	101 Bytes xet	4 days ago	ee4c58db
Disability_symbols.svg.png	652 kB xet	4 days ago	5b46da23
DopplegangerTowerV0.3.zip	32.5 MB xet	4 days ago	a8eafaf6
Egypt_Sudan_claims.svg.png	436 kB xet	4 days ago	faf43c3c
Europe_subregion_map_world_factbook.svg.png	1.05 MB xet	4 days ago	fbb2a628
Farmland (1).entity	121 Bytes xet	4 days ago	283b90b1
Farmland.entity	161 Bytes xet	4 days ago	93e4520f
IPC_logo_(2019).svg.png	166 kB xet	4 days ago	6da4d9df
Inglehart_Values_Map.svg.png	1.11 MB xet	4 days ago	4afcddcf
KPRF_Flag.svg.png	318 kB xet	4 days ago	09b7a706
KarteWEUStaaten.png	41.4 kB xet	4 days ago	1638d555
Ledra_Street.jpg	1.39 MB xet	4 days ago	dc784765
Linguistics_stub.svg.png	106 kB xet	4 days ago	3b130e8e
Logo_of_the_Communist_Party_of_the_Russian_Federation.svg.png	224 kB xet	4 days ago	f8b5e4c7
Logowvs.jpg	6.3 kB xet	4 days ago	0c6a1273
Microsoft.Services.Store.winmd	4.61 kB xet	4 days ago	299adfb0
Mount Ah.entity	104 Bytes xet	4 days ago	25ced511
N._America_separatism.svg.png	1.27 MB xet	4 days ago	fd9a1ed2
PARADE_DES_CHAMPIONS_PARIS_2024_CHAMPS_ELYSEES_(53997937113).jpg	1.07 MB xet	4 days ago	eed00159
Political_Compass_standard_model.svg.png	107 kB xet	4 days ago	9c28e1fd
Political_spectrum_Eysenck.svg.png	193 kB xet	4 days ago	9e57cf14
PolyTrack-0.6.0-win32-x64.zip	144 MB xet	4 days ago	a9e460ec
Quebec_referendum,_1995_-_Results_By_Riding.svg.png	907 kB xet	4 days ago	74d7e800
README.md	6.79 kB xet	14 days ago	045410e5
Red_Bull.svg.png	258 kB xet	4 days ago	7b569beb
Retour_des_medaillés_de_Tokyo_2020_au_trocadero_(51367935546).jpg	1.53 MB xet	4 days ago	3e5f45d7
Separatismos_na_Espanha.svg.png	727 kB xet	4 days ago	d2765459
Separatist_movements_in_Africa.png	578 kB xet	4 days ago	ed397441
Speaker_Icon.svg.png	121 kB xet	4 days ago	c9e54d90
Spotify - Music and Podcasts Installer.exe	1.29 MB xet	4 days ago	a1edb58b
SpotifySetup.exe	1.3 MB xet	4 days ago	027ed02b
Stadium (1).entity	180 Bytes xet	4 days ago	c4957f1b
Stadium.entity	163 Bytes xet	4 days ago	1e543817
Sweden.entity	694 Bytes xet	4 days ago	2e5fbc7b
Trucudgeh.entity	168 Bytes xet	4 days ago	f7ab073c
Tussol e II.entity	122 Bytes xet	4 days ago	629930d5
Tussol e-104.planet	43.3 kB xet	4 days ago	375a5e3a
Tussol e-251.planet	51.3 kB xet	4 days ago	3b0918d5
Tussol e-351.planet	57.6 kB xet	4 days ago	fab048dc
Tussol e-364.png	45 kB xet	4 days ago	f88f16f8
Tussol.entity	96 Bytes xet	4 days ago	d4b5668c
UNpeacekeeping.svg.png	622 kB xet	4 days ago	bd0903c2
Vantablack_02.jpeg	1.68 MB xet	4 days ago	4983de99
West Land.entity	567 Bytes xet	4 days ago	c5eb4b96
World_ocean_map.gif	69.3 kB xet	4 days ago	13fb7b75
ae.safetensors	335 MB xet	14 days ago	f73eecf7
alberville_japanese_ink_20260531_123035.png	7.86 MB xet	4 days ago	52551612
baden-baden_japanese_ink_20260531_121931.png	12.4 MB xet	4 days ago	725208c5
bonsai-2026-05-30T05-35-31-137Z.png	329 kB xet	4 days ago	63adabf1
bonsai-2026-05-30T05-36-04-776Z.png	368 kB xet	4 days ago	ae2c76b0
bonsai-2026-05-30T05-36-31-002Z.png	376 kB xet	4 days ago	c5e09f57
bonsai-2026-05-30T05-37-00-430Z.png	373 kB xet	4 days ago	9100c0f9
bonsai-2026-05-30T05-37-29-443Z.png	418 kB xet	4 days ago	553b3229
busan_japanese_ink_20260531_120720.png	7.89 MB xet	4 days ago	26ecbe3e
chamonix_japanese_ink_20260531_122531.png	4.96 MB xet	4 days ago	bb730353
config.json	1.44 kB xet	14 days ago	03fe3025
doha_japanese_ink_20260531_114557.png	7.6 MB xet	4 days ago	4e7ff2f0
ema.safetensors	29.2 GB xet	14 days ago	990ed24a
generation_config.json	243 Bytes xet	14 days ago	c6387bc6
geneva_blueprint_20260531_115240.png	14.3 MB xet	4 days ago	bdbca9be
hong_kong_neon_cyberpunk_20260531_115953.png	4.53 MB xet	4 days ago	db879263
htu_autobackup_20260530_incremental.tsv	29.3 kB xet	4 days ago	4cff3d7e
htu_backup_20260530_141732.tsv	26.7 MB xet	4 days ago	129ff93b
i2v_20260525_060405_1779681845.mp4	1.14 MB xet	4 days ago	86c173c9
i2v_20260525_061150_1779682310.mp4	377 kB xet	4 days ago	8db58306
i2v_20260528_113107_1779960667.mp4	810 kB xet	4 days ago	af3bb260
i2v_20260529_130131_1780052491.mp4	608 kB xet	4 days ago	1932b5ed
i2v_20260530_154451_1780148691.mp4	755 kB xet	4 days ago	55c677ae
i2v_20260606_035257_1780710777.mp4	723 kB xet	4 days ago	eca84015
image (1).jpg	126 kB xet	4 days ago	007460ff
image (10).jpg	47.1 kB xet	4 days ago	c8afa3e5
image (10).png	631 kB xet	4 days ago	8889c768
image (11).jpg	123 kB xet	4 days ago	4c5fe144
image (11).png	885 kB xet	4 days ago	0990a8aa
image (12).jpg	144 kB xet	4 days ago	b54bb781
image (12).png	626 kB xet	4 days ago	a2d29b1a
image (13).jpg	21 kB xet	4 days ago	6ac0aa29
image (13).png	744 kB xet	4 days ago	182d133f
image (14).jpg	66.2 kB xet	4 days ago	27610866
image (14).png	3.45 MB xet	4 days ago	0595c95b
image (15).png	3.55 MB xet	4 days ago	cad8f155
image (16).png	3.77 MB xet	4 days ago	ce0218be
image (17).png	3.96 MB xet	4 days ago	f5ff360e
image (18).png	3.98 MB xet	4 days ago	f6b2d286
image (19).png	4.29 MB xet	4 days ago	1a8a1d12
image (2).jpg	109 kB xet	4 days ago	dbf2daf8
image (20).png	3.98 MB xet	4 days ago	5a6f4a1d
image (21).png	3.67 MB xet	4 days ago	061556b8
image (22).png	3.29 MB xet	4 days ago	d7617fbd
image (23).png	3.55 MB xet	4 days ago	80b3e1a1
image (24).png	4.12 MB xet	4 days ago	f3046284
image (25).png	3.99 MB xet	4 days ago	47a7fb42
image (3).jpg	14.2 kB xet	4 days ago	00afb296
image (4).jpg	43.7 kB xet	4 days ago	94a60023
image (4).png	3.48 MB xet	4 days ago	68cc8120

README.md

🥯 BAGEL • Unified Model for Multimodal Understanding and Generation

We present BAGEL, an open‑source multimodal foundation model with 7B active parameters (14B total) trained on large‑scale interleaved multimodal data. BAGEL outperforms current top‑tier open‑source VLMs such as Qwen2.5-VL and InternVL-2.5 on standard multimodal understanding leaderboards, and delivers text‑to‑image quality competitive with strong specialist generators such as SD3. BAGEL also shows better qualitative results in classical image‑editing scenarios than the leading open-source models. More importantly, it extends to free-form visual manipulation, multiview synthesis, and world navigation, capabilities that constitute "world-modeling" tasks beyond the scope of previous image-editing models.

This repository hosts the model weights for BAGEL. For installation, usage instructions, and further documentation, please visit our GitHub repository.
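The weights are stored as `.safetensors` files (`ema.safetensors` and `ae.safetensors` in the file listing). As a minimal sketch, assuming a file has already been downloaded locally, the following reads only the JSON header that the safetensors format places at the start of the file (an 8-byte little-endian length followed by that many bytes of JSON). This lets tensor names, dtypes, and shapes be checked without loading the tensor data. The function names are illustrative and not part of the BAGEL codebase.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON mapping tensor name -> metadata.
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(header):
    """Return (tensor count, total payload bytes) computed from data_offsets."""
    # "__metadata__" is an optional free-form entry, not a tensor.
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    total_bytes = sum(
        v["data_offsets"][1] - v["data_offsets"][0] for v in tensors.values()
    )
    return len(tensors), total_bytes
```

For example, `summarize(read_safetensors_header("ae.safetensors"))` would report how many tensors the file holds and how many bytes they occupy, assuming the file sits in the current directory.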

🧠 Method

BAGEL adopts a Mixture-of-Transformer-Experts (MoT) architecture to maximize the model’s capacity to learn from richly diverse multimodal information. Following the same principle of capacity maximization, it utilizes two separate encoders to capture pixel-level and semantic-level features of an image. The overall framework follows a Next Group of Token Prediction paradigm, where the model is trained to predict the next group of language or visual tokens as a compression target.

BAGEL scales MoT’s capacity through Pre-training, Continued Training, and Supervised Finetuning on trillions of interleaved multimodal tokens spanning language, image, video, and web data. It surpasses open models on standard understanding and generation benchmarks and demonstrates advanced in-context multimodal abilities like free-form image editing, future frame prediction, 3D manipulation, world navigation, and sequential reasoning.
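The idea of routing tokens to separate transformer experts can be illustrated with a toy sketch. It assumes two experts selected by a hard, per-token tag, with a shared mixing step standing in for self-attention over the whole sequence. The text above does not specify the number of experts or the routing rule, so the tags (`"und"`, `"gen"`), the expert functions, and the mixing step are all hypothetical and do not reproduce BAGEL's implementation.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Token = Tuple[str, Vector]  # (hypothetical route tag, hidden state)


def shared_mix(states: List[Vector]) -> List[Vector]:
    """Stand-in for shared self-attention: add the sequence mean to every token."""
    dim = len(states[0])
    mean = [sum(s[i] for s in states) / len(states) for i in range(dim)]
    return [[s[i] + mean[i] for i in range(dim)] for s in states]


def understanding_expert(x: Vector) -> Vector:
    """Toy expert: doubles each feature."""
    return [2.0 * v for v in x]


def generation_expert(x: Vector) -> Vector:
    """Toy expert: negates each feature."""
    return [-v for v in x]


# Assumed hard routing table: one expert per tag.
EXPERTS: Dict[str, Callable[[Vector], Vector]] = {
    "und": understanding_expert,
    "gen": generation_expert,
}


def mot_layer(tokens: List[Token]) -> List[Token]:
    """One toy layer: shared mixing over all tokens, then a per-token expert."""
    mixed = shared_mix([vec for _, vec in tokens])
    return [(tag, EXPERTS[tag](vec)) for (tag, _), vec in zip(tokens, mixed)]


sequence = [("und", [1.0, 0.0]), ("gen", [0.0, 1.0])]
print(mot_layer(sequence))  # [('und', [3.0, 1.0]), ('gen', [-0.5, -1.5])]
```

The point of the sketch is only that every token sees the full sequence in the shared step, while the expert parameters applied afterwards depend on the token's tag.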

🌱 Emerging Properties

As we scale up BAGEL’s pretraining with more multimodal tokens, we observe consistent performance gains across understanding, generation, and editing tasks. Different capabilities emerge at distinct training stages—multimodal understanding and generation appear early, followed by basic editing, while complex, intelligent editing emerges later. This staged progression suggests an emergent pattern, where advanced multimodal reasoning builds on well-formed foundational skills. Ablation studies further show that combining VAE and ViT features significantly improves intelligent editing, underscoring the importance of visual-semantic context in enabling complex multimodal reasoning and further supporting its role in the emergence of advanced capabilities.

📊 Benchmarks

1. Visual Understanding

Model	MME ↑	MMBench ↑	MMMU ↑	MM-Vet ↑	MathVista ↑
Janus-Pro-7B	–	79.2	41.0	50.0	–
Qwen2.5-VL-7B	2347	83.5	58.6	67.1	68.2
BAGEL	2388	85.0	55.3	67.2	73.1
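
To make the comparison in the table above easier to read, here is a small sketch that finds the leading model per benchmark and the margin between two models. The scores are copied from the table. The helper names are illustrative, and missing entries are represented as `None`.

```python
# Scores copied from the Visual Understanding table; None marks a missing entry.
BENCHMARKS = ["MME", "MMBench", "MMMU", "MM-Vet", "MathVista"]
SCORES = {
    "Janus-Pro-7B": [None, 79.2, 41.0, 50.0, None],
    "Qwen2.5-VL-7B": [2347, 83.5, 58.6, 67.1, 68.2],
    "BAGEL": [2388, 85.0, 55.3, 67.2, 73.1],
}


def best_per_benchmark(scores):
    """Return {benchmark: model with the highest reported score}."""
    best = {}
    for i, name in enumerate(BENCHMARKS):
        reported = {m: row[i] for m, row in scores.items() if row[i] is not None}
        best[name] = max(reported, key=reported.get)
    return best


def margin(scores, a, b):
    """Return {benchmark: score(a) - score(b)} where both models report a score."""
    return {
        name: round(scores[a][i] - scores[b][i], 1)
        for i, name in enumerate(BENCHMARKS)
        if scores[a][i] is not None and scores[b][i] is not None
    }


print(best_per_benchmark(SCORES))
print(margin(SCORES, "BAGEL", "Qwen2.5-VL-7B"))
```

On these numbers BAGEL leads on four of the five benchmarks, while Qwen2.5-VL-7B leads on MMMU by 3.3 points.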

2. Text-to-Image Generation · GenEval

Model	Overall ↑
FLUX-1-dev	0.82
SD3-Medium	0.74
Janus-Pro-7B	0.80
BAGEL	0.88

3. Image Editing

Model	GEdit-Bench-EN (SC) ↑	GEdit-Bench-EN (PQ) ↑	GEdit-Bench-EN (O) ↑	IntelligentBench ↑
Step1X-Edit	7.09	6.76	6.70	14.9
Gemini-2-exp.	6.73	6.61	6.32	57.6
BAGEL	7.36	6.83	6.52	44.0
BAGEL+CoT	–	–	–	55.3

License

BAGEL is licensed under the Apache 2.0 license. It is finetuned from the Qwen2.5-7B-Instruct and siglip-so400m-14-384-flash-attn2 models and uses the FLUX.1-schnell VAE model, all of which are released under Apache 2.0.

✍️ Citation

@article{deng2025bagel,
  title   = {Emerging Properties in Unified Multimodal Pretraining},
  author  = {Deng, Chaorui and Zhu, Deyao and Li, Kunchang and Gou, Chenhui and Li, Feng and Wang, Zeyu and Zhong, Shu and Yu, Weihao and Nie, Xiaonan and Song, Ziang and Shi, Guang and Fan, Haoqi},
  journal = {arXiv preprint arXiv:2505.14683},
  year    = {2025}
}

Last updated: Jun 6

Pre-warmed CDN: US, EU