# vision_papers / pages/15_CuMo.py (uploaded by lbourdois, commit 94e735e)


import streamlit as st
from streamlit_extras.switch_page_button import switch_page

st.title("CuMo")

st.success("""[Original tweet](https://twitter.com/mervenoyann/status/1790665706205307191) (May 15, 2024)""", icon="ℹ️")
st.markdown(""" """)

st.markdown("""
It's raining vision language models ☔️
CuMo is a new vision language model that uses mixture-of-experts (MoE) blocks in every stage of the VLM (image encoder, MLP connector and text decoder) and uses Mistral-7B for the decoder part 🤓
""")
st.markdown(""" """)

st.image("pages/CuMo/image_1.jpg", use_column_width=True)
st.markdown(""" """)
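st.markdown("""
To make "MoE block" concrete, here is a minimal NumPy sketch of a top-K sparsely-gated MoE MLP. A router scores every expert for each token, only the K highest-scoring experts run, and their outputs are mixed with softmax weights computed over the selected scores. This is not CuMo's code (see the GitHub repository in the resources below). The ReLU experts, top-2 routing and toy sizes are assumptions made for illustration.

```python
import numpy as np

def mlp(x, w1, w2):
    # One expert: a two-layer MLP with ReLU (an assumption for this sketch).
    return np.maximum(x @ w1, 0.0) @ w2

def route(logits, k):
    # Keep the k largest router logits and softmax over just those.
    top = np.argsort(logits)[-k:]
    z = np.exp(logits[top] - logits[top].max())
    return top, z / z.sum()

def topk_moe(x, router_w, experts, k=2):
    # x: (tokens, d_in); router_w: (d_in, n_experts); experts: list of (w1, w2).
    logits = x @ router_w
    out = np.zeros((x.shape[0], experts[0][1].shape[1]))
    for t in range(x.shape[0]):
        top, gates = route(logits[t], k)
        for g, e in zip(gates, top):
            # Only the k selected experts are evaluated for this token.
            out[t] += g * mlp(x[t], *experts[e])
    return out

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n_experts = 8, 16, 4, 4
experts = [(rng.normal(size=(d_in, d_hidden)), rng.normal(size=(d_hidden, d_out)))
           for _ in range(n_experts)]
router_w = rng.normal(size=(d_in, n_experts))
tokens = rng.normal(size=(5, d_in))
print(topk_moe(tokens, router_w, experts, k=2).shape)  # (5, 4)
```

Real MoE layers group tokens per expert instead of looping over them, and usually add a load-balancing loss so that the router does not send everything to a few experts.
""")
st.markdown(""" """)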

st.markdown("""
The authors first pre-train the MLP while keeping the image encoder and text decoder frozen. They then warm up the whole network by unfreezing and fine-tuning it (the pre-finetuning stage), which they state stabilizes visual instruction tuning once the experts are brought in.
""")
st.markdown(""" """)
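st.markdown("""
The two stages above boil down to choosing which modules receive gradient updates. The framework-free toy below freezes a module by skipping it in the optimizer step. The module names, scalar weights and `STAGE_PLAN` are hypothetical. In PyTorch, the usual equivalent is calling `module.requires_grad_(False)` on the frozen parts.

```python
from dataclasses import dataclass, field

@dataclass
class Module:
    name: str
    weights: dict = field(default_factory=dict)  # parameter name -> toy scalar weight
    trainable: bool = True

# Hypothetical stage plan: which modules are unfrozen in each stage.
STAGE_PLAN = {
    "pre-training": {"mlp"},                             # encoder and decoder frozen
    "pre-finetuning": {"vision_encoder", "mlp", "llm"},  # whole network unfrozen
}

def configure_stage(modules, stage):
    # Freeze every module the stage does not list and unfreeze the rest.
    unfrozen = STAGE_PLAN[stage]
    for m in modules:
        m.trainable = m.name in unfrozen
    return [m.name for m in modules if m.trainable]

def sgd_step(modules, grads, lr=0.1):
    # Frozen modules are skipped, so their weights stay exactly as they were.
    for m in modules:
        if m.trainable:
            for key, g in grads[m.name].items():
                m.weights[key] -= lr * g

modules = [Module("vision_encoder", {"w": 1.0}), Module("mlp", {"w": 1.0}), Module("llm", {"w": 1.0})]
grads = {m.name: {"w": 1.0} for m in modules}

print(configure_stage(modules, "pre-training"))  # ['mlp']
sgd_step(modules, grads)
print([m.weights["w"] for m in modules])  # [1.0, 0.9, 1.0]
```
""")
st.markdown(""" """)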

st.image("pages/CuMo/image_2.jpg", use_column_width=True)
st.markdown(""" """)

st.markdown("""
The mixture-of-experts MLP blocks above are simply copies of the single MLP that was trained during pre-training and fine-tuned during pre-finetuning: every expert is initialized from those weights 👇
""")
st.markdown(""" """)

st.image("pages/CuMo/image_3.jpg", use_column_width=True)
st.markdown(""" """)
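st.markdown("""
One consequence of this initialization can be checked in a few lines. Right after upcycling, all experts are identical and the gate weights of the selected experts sum to 1, so the MoE block computes exactly what the dense MLP did, whatever the freshly initialized router decides. The NumPy sketch below assumes a random stand-in for the trained MLP, ReLU experts and top-2 routing. It is an illustration, not CuMo's implementation.

```python
import copy
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n_experts, k = 8, 16, 4, 4, 2

def mlp(x, w):
    # Dense two-layer MLP with ReLU; w is a (w1, w2) tuple.
    return np.maximum(x @ w[0], 0.0) @ w[1]

# Random stand-in for the single MLP trained in pre-training and pre-finetuning.
dense = (rng.normal(size=(d_in, d_hidden)), rng.normal(size=(d_hidden, d_out)))

# Upcycling: every expert starts as an independent deep copy of the dense MLP;
# only the router is new and gets a small random initialization.
experts = [copy.deepcopy(dense) for _ in range(n_experts)]
router_w = rng.normal(scale=0.02, size=(d_in, n_experts))

def moe(x):
    # Top-k routing per token, softmax over the k selected router logits.
    logits = x @ router_w
    top = np.argsort(logits, axis=1)[:, -k:]
    sel = np.take_along_axis(logits, top, axis=1)
    gates = np.exp(sel - sel.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)
    out = np.zeros((x.shape[0], d_out))
    for t in range(x.shape[0]):
        for g, e in zip(gates[t], top[t]):
            out[t] += g * mlp(x[t], experts[e])
    return out

x = rng.normal(size=(5, d_in))
print(np.allclose(moe(x), mlp(x, dense)))  # True
```

Because each expert is a deep copy, the experts can drift apart once training updates them independently.
""")
st.markdown(""" """)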

st.markdown("""
It works very well (I also tested it myself) and outperforms <a href='LLaVA-NeXT' target='_self'>LLaVA-NeXT</a>, the previous SOTA of its size! 😍
I wonder how it would compare to IDEFICS2-8B. You can try it yourself [here](https://t.co/MLIYKVh5Ee).
""", unsafe_allow_html=True)
st.markdown(""" """)

st.image("pages/CuMo/image_4.jpg", use_column_width=True)
st.markdown(""" """)

st.info("""
Resources:
[CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts](https://arxiv.org/abs/2405.05949)
by Jiachen Li, Xinyao Wang, Sijie Zhu, Chia-Wen Kuo, Lu Xu, Fan Chen, Jitesh Jain, Humphrey Shi, Longyin Wen (2024)
[GitHub](https://github.com/SHI-Labs/CuMo)""", icon="📚")

st.markdown(""" """)
st.markdown(""" """)
st.markdown(""" """)
# Navigation buttons to the neighbouring paper pages.
col1, col2, col3 = st.columns(3)
with col1:
    if st.button('Previous paper', use_container_width=True):
        switch_page("PLLaVA")
with col2:
    if st.button('Home', use_container_width=True):
        switch_page("Home")
with col3:
    if st.button('Next paper', use_container_width=True):
        switch_page("DenseConnector")