Tiezhen WANG

xianbao

AI & ML interests

This is my personal account

xianbao's activity

posted an update 6 days ago
Why Apache 2.0 Matters for LLMs 🤔

@01AI_Yi recently switched from a permissive & commercially friendly license to Apache 2.0, and the community loved it! 🚀

@JustinLin610 also ran a poll on model licenses, and the majority voted for Apache 2.0.

Why is it a big deal? ⬇️

📚 Legal Simplicity: Custom licenses need costly & time-consuming legal review. Apache 2.0 is well-known & easier for legal teams to handle.

👩‍💻 Developer-Friendly: Legal docs are a pain for devs! Apache 2.0 is well-known and tech-friendly, making it easier for developers who aren't native English speakers to understand the implications too.

🔗 Easier Integration: Apache 2.0 is compatible with many other licenses, simplifying tasks like merging models that come with different licensing requirements.

🚫 No Permission Needed: Custom licenses often require explicit permission and extra paperwork such as filling out forms, creating barriers. Apache 2.0 removes this hurdle, letting devs focus on innovation.

There are a lot of interesting discussions under @JustinLin610's poll (https://x.com/JustinLin610/status/1793559737482764375), which inspired this thread.

Any other thoughts? Let me know ^^
posted an update 6 days ago
DeepSeekV2 is a big deal, and not only because of its significant improvements to both key components of the Transformer: the attention layer and the FFN layer.

It has also completely disrupted the Chinese LLM market, forcing competitors to drop their prices to 1% of what they were.

---

There are two key components in the Transformer architecture: the self-attention layer, which captures relationships between tokens in context, and the feed-forward network (FFN) layer, which stores knowledge.
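To make that division of labor concrete, here's a minimal, illustrative PyTorch sketch of a standard pre-norm Transformer block; the dimensions and names are placeholders of mine, not taken from any specific model.

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Illustrative pre-norm block: attention mixes tokens, the FFN transforms each token on its own."""
    def __init__(self, d_model=1024, n_heads=16, d_ff=4096):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x):                                    # x: (batch, seq_len, d_model)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)  # tokens exchange information here (masking omitted)
        x = x + attn_out
        x = x + self.ffn(self.norm2(x))                       # per-token transformation; most parameters live here
        return x
```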

DeepSeek V2 introduces optimizations to both:

The attention layer normally uses a KV cache to avoid repeated computation, but the cache consumes significant GPU RAM, limiting the number of concurrent requests. DeepSeek V2 introduces Multi-head Latent Attention (MLA), which caches only a small latent representation per token, resulting in substantial RAM savings (a rough sketch of the idea follows below).
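Here's that sketch. It's my own simplification under assumed shapes, not DeepSeek's implementation (real MLA also handles rotary position embeddings with a separate decoupled branch, among other details):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    """Sketch of the MLA caching trick: cache one small latent per token and expand it
    into keys/values on the fly, instead of caching full K and V."""
    def __init__(self, d_model=1024, n_heads=16, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress hidden state -> small latent
        self.k_up = nn.Linear(d_latent, d_model)      # expand latent -> keys
        self.v_up = nn.Linear(d_latent, d_model)      # expand latent -> values
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):          # x: (batch, new_tokens, d_model)
        B, T, _ = x.shape
        latent = self.kv_down(x)                      # only this (B, T, d_latent) tensor gets cached
        if latent_cache is not None:
            latent = torch.cat([latent_cache, latent], dim=1)
        S = latent.size(1)

        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(B, S, self.n_heads, self.d_head).transpose(1, 2)

        out = F.scaled_dot_product_attention(q, k, v)   # causal masking omitted for brevity
        out = out.transpose(1, 2).reshape(B, T, self.n_heads * self.d_head)
        # The cache grows by d_latent floats per token instead of 2 * d_model for full K and V.
        return self.o_proj(out), latent
```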

DeepSeek V2 utilizes 162 experts instead of the usual 8 as in Mixtral. This segments experts into finer granularity for higher specialization and more accurate knowledge acquisition, and activating only a small subset of experts for each token keeps processing efficient.
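A minimal sketch of this fine-grained routing pattern; the expert count, top-k, and hidden sizes below are illustrative assumptions of mine, not DeepSeek V2's actual configuration (which also includes shared experts and other tricks):

```python
import torch
import torch.nn as nn

class FineGrainedMoE(nn.Module):
    """Sketch of fine-grained expert routing: many small experts, only the top-k run per token."""
    def __init__(self, d_model=1024, n_experts=160, top_k=6, d_expert=512):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_expert), nn.GELU(), nn.Linear(d_expert, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                 # x: (num_tokens, d_model)
        scores = self.router(x).softmax(dim=-1)           # each token's affinity to every expert
        weights, idx = scores.topk(self.top_k, dim=-1)    # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique().tolist():      # run each selected expert on its tokens only
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        # Real implementations typically renormalize the selected weights and add load-balancing losses.
        return out
```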

It disrupted the market by dropping API prices to $0.14 per 1M tokens. This dramatic reduction forced competitors like GLM, Ernie, and Qwen to follow suit, lowering their prices to 1% of their original offerings. Now, users can access these APIs at roughly 1/35th the cost of GPT-4o.
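A quick sanity check on that last ratio, assuming GPT-4o's input price was roughly $5 per 1M tokens at the time (my assumption, not a figure from the post):

```python
deepseek_v2_price = 0.14  # USD per 1M tokens (from the post)
gpt_4o_price = 5.00       # USD per 1M input tokens -- assumed figure for GPT-4o at the time
print(gpt_4o_price / deepseek_v2_price)  # ~35.7, i.e. about 1/35th of the cost
```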
posted an update about 1 month ago
posted an update 4 months ago
Welcome Bunny! A family of lightweight but powerful multimodal models from BAAI

With detailed work on dataset curation, the Bunny-3B model built upon SigLIP and Phi-2 achieves performance on par with 13B models.

Model: BAAI/bunny-phi-2-siglip-lora

posted an update 4 months ago
There appears to be a huge misunderstanding regarding the licensing requirements for open-source Chinese-speaking LLMs on @huggingface.

I initially shared this misconception too, but after conducting some research, I came up with the list below.

Very impressive!

replied to victor's post 4 months ago
posted an update 4 months ago
Vision LLM for #edgecomputing?

@openbmb, who open-sourced the UltraFeedback dataset before, released a series of eco-friendly yet powerful LLMs:

- MiniCPM: 2B model that competes with Mistral-7B

- MiniCPM-V: 3B vision LLM on edge!
replied to osanseviero's post 4 months ago

Worth trying out!

Learning a language from another language family might improve a model's capability in seemingly unrelated aspects.

Studying French improved my grammar, and I wish I could master another language like Arabic or Hindi to see the world from a different angle.