lamhieu (Hieu Lam)

reacted to m-ric's post with 🔥 2 months ago

Post

1168

Emu3: Next-token prediction conquers multimodal tasks 🔥

This is the most important research in months: we’re now very close to having a single architecture to handle all modalities. The folks at Beijing Academy of Artificial Intelligence (BAAI) just released Emu3, a single model that handles text, images, and videos all at once.

𝗪𝗵𝗮𝘁'𝘀 𝘁𝗵𝗲 𝗯𝗶𝗴 𝗱𝗲𝗮𝗹?
🌟 Emu3 is the first model to truly unify all these different types of data (text, images, video) using just one simple trick: predicting the next token.
And it’s only 8B, but really strong:
🖼️ For image generation, it's matching the best specialized models out there, like SDXL.
👁️ In vision tasks, it's outperforming top models like LLaVA-1.6-7B, which is a big deal for a model that wasn't specifically designed for this.
🎬 It's the first to nail video generation without using complicated diffusion techniques.

𝗛𝗼𝘄 𝗱𝗼𝗲𝘀 𝗶𝘁 𝘄𝗼𝗿𝗸?
🧩 Emu3 uses a special tokenizer (SBER-MoVQGAN) to turn images and video clips into sequences of 4,096 tokens.
🔗 Then, it treats everything - text, images, and videos - as one long series of tokens to predict.
🔮 During training, it just tries to guess the next token, whether that's a word, part of an image, or a video frame.
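
To make the three steps above concrete, here is a minimal sketch of next-token prediction over one mixed sequence. It is not Emu3's code: the vocabulary sizes, the `visual_to_shared` offset scheme, and the `uniform_model` stand-in are all assumptions made up for illustration.

```python
import math

# Hypothetical sizes for illustration only, not Emu3's real values.
TEXT_VOCAB = 1000
VISUAL_CODEBOOK = 500
VOCAB = TEXT_VOCAB + VISUAL_CODEBOOK


def visual_to_shared(code):
    """Map a visual codebook index into the shared vocabulary (assumed scheme)."""
    return TEXT_VOCAB + code


def build_sequence(text_ids, visual_codes):
    """Concatenate text tokens and visual tokens into one flat sequence."""
    return list(text_ids) + [visual_to_shared(c) for c in visual_codes]


def next_token_loss(sequence, predict_logits):
    """Average cross-entropy of predicting token t+1 from tokens 0..t."""
    total = 0.0
    for t in range(len(sequence) - 1):
        logits = predict_logits(sequence[: t + 1])
        peak = max(logits)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_norm - logits[sequence[t + 1]]
    return total / (len(sequence) - 1)


def uniform_model(prefix):
    """Stand-in model: gives every token in the shared vocabulary the same score."""
    return [0.0] * VOCAB


seq = build_sequence([5, 17, 42], [3, 250, 499])
loss = next_token_loss(seq, uniform_model)
print(seq, round(loss, 3))
```

The same loss applies whether the target is a word or a piece of an image, which is the point of the approach: the model never needs to know which modality a token came from.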

𝗖𝗮𝘃𝗲𝗮𝘁𝘀 𝗼𝗻 𝘁𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁𝘀:
👉 In image generation, Emu3 beats SDXL, but it’s also much bigger (8B vs 3.5B). It would be more difficult to beat the real diffusion GOAT FLUX-dev.
👉 In vision, the authors also don’t show a comparison against current SOTA models like Qwen-VL or Pixtral.

This approach is exciting because it's simple (next token prediction) and scalable (handles all sorts of data)!

Read the paper 👉 Emu3: Next-Token Prediction is All You Need (2409.18869)

reacted to singhsidhukuldeep's post with 👍 3 months ago

Post

1630

Just wrapped up a deep dive into the latest lecture on building LLMs, such as ChatGPT, from the @Stanford CS229 course. Here are my top takeaways:

🔍 Understanding the Components: LLMs like ChatGPT, Claude, and others are more than just neural networks; they are a complex blend of architecture, training loss, data, evaluation, and systems. Knowing how these components work together is key to improving and scaling these models.

📊 Scaling Matters: Performance improves predictably with more data, bigger models, and greater computational power. However, balancing these factors is crucial to avoid overfitting and resource waste.

📈 Data is King: LLMs are trained on trillions of tokens scraped from the internet, but the quality of this data matters immensely. Rigorous filtering and deduplication processes are essential to maintaining data integrity.

🏗️ Pre-Training vs. Post-Training: While pre-training equips the model with general knowledge, post-training (like RLHF) fine-tunes it to follow human-like responses, reducing toxic outputs and improving alignment with human values.

🌐 Reinforcement Learning from Human Feedback (RLHF): This technique allows LLMs to maximize outputs that align with human preferences, making models more reliable and accurate.

💡 Why It Matters: Understanding these processes not only helps us appreciate the complexity behind our everyday AI tools but also highlights the challenges and opportunities in the ever-evolving field of AI.

Whether you’re in tech, data science, or just AI-curious, staying updated on these advancements is crucial. LLMs are not just transforming industries; they’re redefining the future of human-computer interaction!

I just realized this was almost 2 hours long...

Link: https://www.youtube.com/watch?v=9vM4p9NN0Ts

3 replies

·

replied to m-ric's post 3 months ago

Sounds interesting but I think there will be a big breakthrough, a new "architecture/methodology/factor/rethinking" for developing large models. That's what I think, I don't know what it is yet, haha.

reacted to m-ric's post with 👍 3 months ago

Post

842

🚀 𝗪𝗵𝗲𝗿𝗲 𝘀𝗰𝗮𝗹𝗶𝗻𝗴 𝗹𝗮𝘄𝘀 𝗮𝗿𝗲 𝘁𝗮𝗸𝗶𝗻𝗴 𝘂𝘀 : 𝗯𝘆 𝟮𝟬𝟮𝟴, 𝗔𝗜 𝗖𝗹𝘂𝘀𝘁𝗲𝗿𝘀 𝘄𝗶𝗹𝗹 𝗿𝗲𝗮𝗰𝗵 𝘁𝗵𝗲 𝗽𝗼𝘄𝗲𝗿 𝗰𝗼𝗻𝘀𝘂𝗺𝗽𝘁𝗶𝗼𝗻 𝗼𝗳 𝗲𝗻𝘁𝗶𝗿𝗲 𝗰𝗼𝘂𝗻𝘁𝗿𝗶𝗲𝘀

Reminder : “Scaling laws” are empirical laws saying that if you keep multiplying your compute by x10, your models will mechanically keep getting better and better.

To give you an idea, GPT-3 can barely write sentences, and GPT-4, which only used x15 its amount of compute, already sounds much smarter than some of my friends (although it's not really - or at least I haven't tested them side by side). So you can imagine how far a x100 over GPT-4 can take us.

🏎️ As a result, tech titans are racing to build the biggest models, and for this they need gigantic training clusters.

The picture below shows the growth of training compute: it is increasing at a steady exponential rate of a x10 every 2 years. So let’s take this progress a bit further:
- 2022: starting training for GPT-4 : 10^26 FLOPs, cost of $100M
- 2024: today, companies start training on much larger clusters like the “super AI cluster” of Elon Musk’s xAI, 10^27 FLOPs, $1B
- 2026 : by then clusters will require 1GW, i.e. around the full power generated by a nuclear reactor
- 2028: we reach cluster prices in the range of 100 billion dollars, using 10GW, more than the most powerful power stations currently in use in the US. This last size seems crazy, but Microsoft and OpenAI are already planning one.
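
The x10-every-2-years rule behind this timeline can be checked with a few lines of arithmetic. This is only a sketch: the function name `project` is made up, the starting points are the figures quoted in the list, and the 2026 and 2028 FLOPs values are plain extrapolations rather than numbers from the post.

```python
def project(base_year, base_value, year, factor=10, period=2):
    """Extrapolate a quantity that grows by `factor` every `period` years."""
    return base_value * factor ** ((year - base_year) / period)


# Starting points quoted in the timeline above.
FLOPS_2022 = 1e26      # training compute
COST_2022 = 100e6      # dollars
POWER_2026_GW = 1.0    # gigawatts

for year in (2022, 2024, 2026, 2028):
    flops = project(2022, FLOPS_2022, year)
    cost = project(2022, COST_2022, year)
    print(f"{year}: {flops:.0e} FLOPs, ${cost:,.0f}")

power_2028 = project(2026, POWER_2026_GW, 2028)
print(f"2028 power: {power_2028:.0f} GW")
```

Under that single assumption, the $100M of 2022 becomes $100B by 2028 and 1GW in 2026 becomes 10GW in 2028, which matches the list.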

Will AI clusters effectively reach these crazy sizes where they consume as much as entire countries?
➡️ Three key ingredients of training might be a roadblock to scaling up :
💸 Money: but it’s very unlikely, given the potential market size for AGI, that investors lose interest.
⚡️ Energy supply at a specific location
📚 Training data: we’re already using 15 trillion tokens for Llama-3.1 when the Internet has something like 60 trillion.

🤔 I’d be curious to hear your thoughts: do you think we’ll race all the way there?

3 replies

·

posted an update 3 months ago

Post

1692

🎯 Ghost 8B Beta 1608: Empowering Your AI Assistant
📦 Unlock the Power of Ghost 8B Beta 1608: Build Your Personal AI Companion
Ghost 8B Beta 1608 empowers you to create a safe and multilingual AI assistant tailored to your needs, directly on your personal computer. 🧑‍💻 Leverage AI's capabilities within your own space! 🚀 Ghost 8B Beta 1608 is ready to become your AI companion.
~
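📦 Use Ghost 8B Beta 1608 as your personal AI assistant!
With Ghost 8B Beta 1608, you can build your own AI assistant that uses the power of AI to provide safe, personalized language support. 🧑‍💻 Enjoy the benefits of AI on your personal computer! 🚀 Ghost 8B Beta 1608 is ready to become your AI partner.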
lamhieu/ghost-8b-beta-8k
ghost-x/ghost-8b-beta-668ead6179f93be717db4542

posted an update 4 months ago

Post

3209

🚀 We’re excited to launch Ghost 8B Beta (1608), a top-performing language model with unmatched multilingual support and cost efficiency.

Key Highlights:
- Superior Performance: Outperforms Llama 3.1 8B Instruct, GPT-3.5 Turbo, Claude 3 Opus, GPT-4, and more in winrate scores.
- Expanded Language Support: Now supports 16 languages, including English, Vietnamese, Spanish, Chinese, and more.
- Enhanced Capabilities: Improved math, reasoning, and instruction-following for better task handling.

With two context options (8k and 128k), Ghost 8B Beta is perfect for complex, multilingual applications, balancing power and cost-effectiveness.

🔗 Learn More: https://ghost-x.org/docs/models/ghost-8b-beta
ghost-x/ghost-8b-beta-668ead6179f93be717db4542

reacted to Xenova's post with 🔥 4 months ago

Post

14896

I'm excited to announce that Transformers.js V3 is finally available on NPM! 🔥 State-of-the-art Machine Learning for the web, now with WebGPU support! 🤯⚡️

Install it from NPM with:
npm i @huggingface/transformers

or via CDN, for example: https://v2.scrimba.com/s0lmm0qh1q

Segment Anything demo: webml-community/segment-anything-webgpu

5 replies

·

replied to their post 4 months ago

thanks @danielus 🤗

replied to their post 4 months ago

@Dihelson @llama-anon @AIWizard76 @danielus
🎉 Ghost 8B Beta Released: Game-Changing Language Model

Ghost 8B Beta is a groundbreaking language model developed with a clear vision: to deliver exceptional multilingual support, superior knowledge capabilities, and all while remaining cost-effective. This model comes in two context length variations, 8k and 128k, ensuring flexibility for various tasks. Moreover, it boasts built-in multilingual functionality, making it a powerful tool for global communication and understanding.

See detailed article: https://huggingface.co/blog/lamhieu/ghost-8b-beta-released-game-changing-language-mode
Model card: https://huggingface.co/ghost-x/ghost-8b-beta
Official website: https://ghost-x.org/docs/models/ghost-8b-beta

posted an update 4 months ago

Post

2105

🎉 Ghost 8B Beta Released: Game-Changing Language Model
--
Ghost 8B Beta is a groundbreaking language model developed with a clear vision: to deliver exceptional multilingual support, superior knowledge capabilities, and all while remaining cost-effective. This model comes in two context length variations, 8k and 128k, ensuring flexibility for various tasks. Moreover, it boasts built-in multilingual functionality, making it a powerful tool for global communication and understanding.
--
* See detailed article: https://huggingface.co/blog/lamhieu/ghost-8b-beta-released-game-changing-language-mode
* Model card: ghost-x/ghost-8b-beta
* Official website: https://ghost-x.org/docs/models/ghost-8b-beta

posted an update 5 months ago

Post

2120

🤯 Ghost 8B Beta emerges as a clear leader, surpassing even larger and proprietary models like xAI Grok 1, OpenAI GPT 3.5, and Mistral Mixtral 8x7B. It also reaches parity with Mistral Medium, further solidifying its position as a top-tier language model. Furthermore, Ghost 8B Beta stands out as one of only three models employing the zero-shot method for evaluation, alongside Claude 2 and Claude 3, showcasing its unique capabilities and potential for groundbreaking applications.
---
💬 Chat with the model here:
- Playground with Ghost 8B Beta (β, 8k): lamhieu/ghost-8b-beta-8k
- Playground with Ghost 8B Beta (β, 128k): lamhieu/ghost-8b-beta-128k
- Official website: https://ghost-x.org/docs/models/ghost-8b-beta/

2 replies

·

replied to their post 5 months ago

Thank you for your dedication, it sounds great. Here I would like to share some additional information and perspectives so that everyone can better understand the issues we address:

With language models, in practice we only need the model to understand about 80% of the task, or to give a good overview; combining it with RAG then brings better accuracy. So what we need is a model that is good at telling the truth and very good at understanding and working with RAG.
As for Italian, I'm very happy that it speaks well. It shows that my training method and source code were correct, because what is actually live is the d0x5 version. Italian was only added later (at the same time as German), which is why its answers can sometimes read like a translation.
As for reasoning, I hope you don't misunderstand. It still works well; it's just that compared to some current superior models like GPT 4o or Claude 3, there will be some cases where it will "lose". It still outperforms a lot of other, much larger models. For example, take the question "Andrew is free from 11 am to 3 pm, Joanne is free from noon to 2 pm and then 3:30 pm to 5 pm. Hannah is available at noon for half an hour, and then 4 pm to 6 pm. What are some options for start times for a 30 minute meeting for Andrew, Hannah, and Joanne?" from the OpenAI GPT4 home page.
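
For reference, the expected answer to that scheduling question can be worked out by intersecting the three availability lists. This is a small sketch written for this thread, not output from the model; the helper names and the 30-minute start grid are assumptions.

```python
def hm(hour, minute=0):
    """Convert a clock time to minutes since midnight."""
    return hour * 60 + minute


def intersect(a, b):
    """Intersect two lists of (start, end) intervals given in minutes."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            start, end = max(s1, s2), min(e1, e2)
            if start < end:
                out.append((start, end))
    return sorted(out)


def start_times(windows, duration, step=30):
    """List start times (on an assumed `step`-minute grid) that fit a meeting."""
    starts = []
    for start, end in windows:
        t = start
        while t + duration <= end:
            starts.append(t)
            t += step
    return starts


andrew = [(hm(11), hm(15))]
joanne = [(hm(12), hm(14)), (hm(15, 30), hm(17))]
hannah = [(hm(12), hm(12, 30)), (hm(16), hm(18))]

common = intersect(intersect(andrew, joanne), hannah)
options = start_times(common, duration=30)
print([f"{t // 60:02d}:{t % 60:02d}" for t in options])
```

The only window all three share is noon to 12:30 pm, so a 30 minute meeting can only start at noon.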

One note: in reasoning tests, models are often run with the temperature set to 0; with Ghost 8B Beta we always set it to 0.1 at the lowest. The reason is simple: if the model still reasons well at this level, then at 0.4 (the default of the current chat) it will often achieve the same results, and we want to aim for practical efficiency rather than scores. Try lowering the temperature on some reasoning questions to experiment.
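
To show what the temperature setting changes, here is a generic sketch of temperature-scaled softmax. It is not the sampling code of Ghost 8B Beta, and the logits are made-up numbers; only the 0.1 and 0.4 values come from the note above.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        # Temperature 0 is usually handled as greedy argmax, not by dividing.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.5, 0.5]  # hypothetical scores for three candidate tokens

p_low = softmax_with_temperature(logits, 0.1)   # lowest setting mentioned above
p_chat = softmax_with_temperature(logits, 0.4)  # chat default mentioned above
p_one = softmax_with_temperature(logits, 1.0)

for name, probs in (("0.1", p_low), ("0.4", p_chat), ("1.0", p_one)):
    print(name, [round(p, 3) for p in probs])
```

At 0.1 almost all the probability sits on the top-scoring token, while at 0.4 the alternatives still get sampled now and then, which is why a result that holds at 0.1 is not guaranteed at the chat default.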

After all, you guys are great, thank you so much everyone.

An example of reasoning about time:

An example of a long context with extensive summary capabilities: Paper: Point out the highlights and identify the ideal people to apply it.

replied to their post 5 months ago

@Dihelson It's probably because you told the model to do it again. Try telling the model to change each word. Of course, it could still be because the model misunderstood.

replied to their post 5 months ago

Try the following conversation: (1) ask to write an article -> (2) ask to translate the article into the languages you want.

replied to their post 5 months ago

@AIWizard76 It hasn't gone through any real eval tests to be able to compare, but if we're just talking about ghost 8b beta, it has good translation capabilities for supported languages. It works well for translating long texts and also translating into multiple languages simultaneously.

replied to their post 5 months ago

It's simple: currently the base version will not try to lengthen the text and is more "obedient". Maybe tomorrow or the next day I'll put it up for everyone to try.
Note that the current version is running everything from version "disl-0x5"; the new version will improve a lot, but it may not be ready right now.

replied to their post 5 months ago

thank you for your comments and encouragement 🤗
another question, how do you feel when conversing in Italian?

replied to their post 5 months ago

@danielus let me ask, is this what you want?

replied to their post 5 months ago

@danielus I noticed that the model adds explanations. That is what the chat version (fine-tuned from the Ghost 8B Beta base) does for the chat task; the base version will not try to explain and will follow the system prompt more strictly. The goal of answering with more information is to save users from having to look things up or ask follow-up questions after a single question. Of course, this can sometimes be a hassle, so we'll try to balance it out.
