
Kuldeep Singh Sidhu

singhsidhukuldeep

Posts 11

Post
Good folks at @Meta have introduced Chameleon 🦎 (who names these things? 🤷‍♂️)

Chameleon is an AI model that can work with multiple types of data, like text and images, all at once. 🖼️📝

Before you start searching: as of this post, neither the model nor the code has been open-sourced, and there is no commitment to do so... sorry! 🚫🔓

Still, here is the technical stuff:

👉 Challenges with Current Systems:

📉 Fragmentation: Current multimodal models are often specialized for either text or image tasks, lacking unified approaches.

📊 Scalability: Existing systems struggle with scaling to handle complex, mixed-modal tasks without significant performance degradation.

🔄 Alignment: Aligning textual and visual modalities remains a technical challenge, often requiring separate processing pipelines.

👉 Objective:

🎯 Unified Modeling: Develop a single model capable of handling various multimodal tasks (text generation, image generation, image captioning, visual question answering) seamlessly.

👉 How It's Done 📘

Early-Fusion Architecture 🧠: Utilizes an early-fusion token-based approach to integrate text and image data from the beginning.

Stable Training 💪: Implements a tailored alignment recipe and specific architectural parameterization to ensure stability in mixed-modal settings.

Broad Evaluation 📊: Assesses the model across various tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed-modal generation.
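
To make the early-fusion idea a bit more concrete, here is a toy sketch (not Meta's code, which is not public as of this post). It assumes that images get turned into discrete codes and shifted into the same id space as text tokens, so one model sees a single flat sequence. Every name, size, and sentinel id below is hypothetical, and the "quantizer" is a trivial stand-in for a learned image tokenizer.

```python
# Toy sketch of early-fusion tokenization; every name and size here is hypothetical.
TEXT_VOCAB_SIZE = 1000      # assumed size of the text vocabulary
IMAGE_CODEBOOK_SIZE = 512   # assumed number of discrete image codes
BOI = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE  # begin-of-image sentinel id
EOI = BOI + 1                                # end-of-image sentinel id


def tokenize_text(text, vocab):
    """Map whitespace-separated words to ids, growing a toy vocab on the fly."""
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


def tokenize_image(pixels, levels=IMAGE_CODEBOOK_SIZE):
    """Stand-in for a learned image quantizer: bucket 0-255 pixel values into codes."""
    codes = [p * levels // 256 for p in pixels]
    # Shift image codes past the text ids so both live in one shared vocabulary.
    return [TEXT_VOCAB_SIZE + c for c in codes]


def fuse(segments, vocab):
    """Interleave text and image segments into a single flat token sequence."""
    sequence = []
    for kind, payload in segments:
        if kind == "text":
            sequence.extend(tokenize_text(payload, vocab))
        elif kind == "image":
            sequence.extend([BOI, *tokenize_image(payload), EOI])
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return sequence


vocab = {}
tokens = fuse(
    [("text", "a photo of"), ("image", [0, 128, 255]), ("text", "a lizard")],
    vocab,
)
print(tokens)
```

The point of the sketch is only that text and image tokens end up in one sequence before any modeling happens, instead of being processed by separate pipelines and merged later.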

👉 Results: (Fun fact: they mention LLaVA-1.5 as a comparison but never really share its results)

🏆 Performance: Chameleon achieves state-of-the-art results in image captioning and outperforms models like Llama-2 in text-only tasks.

⚖️ Competitiveness: It shows competitive performance with models such as Mixtral 8x7B and Gemini Pro.

👩‍⚖️ Human Judgments: Matches or exceeds the performance of larger models, including Gemini Pro and GPT-4V.

Paper: Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818)
Post
Tired of writing Pandas code? 😩

If you are using VS Code, now you can use Data Wrangler from @Microsoft! 🚀

It turns your Pandas DataFrame into a rich, interactive UI where you can view and analyze your data 📊, see insightful column statistics and visualizations 📈, and have Pandas code generated automatically as you clean and transform the data. 🧹🔄

Supports everything you can think of...

- Data view 👀
- Data cleaning 🧼
- Data filtering 🔍
- Data summary/statistics 📊
- Data transformation 🔄
- Missing-value treatment ❓
- Adding new fields ➕
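
For a feel of what such cleaning steps look like as plain Pandas, here is a small hand-written sketch. It is not actual Data Wrangler output; the sample data, the column names, and the `clean_data` function are all made up for illustration. It covers missing-value treatment, filtering, and adding a new field from the list above.

```python
import pandas as pd

# Hypothetical sample data; the column names are made up for illustration.
df = pd.DataFrame(
    {
        "city": ["Pune", "Delhi", None, "Pune"],
        "temp_c": [31.0, None, 28.0, 35.0],
    }
)


def clean_data(df):
    """Hand-written cleaning steps of the sort a wrangling UI might record."""
    # Missing values: drop rows without a city, fill missing temperatures with the mean.
    df = df.dropna(subset=["city"])
    df = df.assign(temp_c=df["temp_c"].fillna(df["temp_c"].mean()))
    # Filtering: keep only rows at or above 30 degrees Celsius.
    df = df[df["temp_c"] >= 30]
    # New field: derive Fahrenheit from Celsius.
    df = df.assign(temp_f=df["temp_c"] * 9 / 5 + 32)
    return df.reset_index(drop=True)


df_clean = clean_data(df)
print(df_clean)
print(df_clean.describe())  # summary statistics for the numeric columns
```

Keeping the steps in one function that returns a new DataFrame leaves the original `df` untouched, so you can rerun or tweak the steps safely.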

Everything in a simple open-source extension 🌟

https://marketplace.visualstudio.com/items?itemName=ms-toolsai.datawrangler

PS: I love Pandas 🐼, never tired of it... still, this is cool! 😎
