Excited to share insights from Walmart's groundbreaking semantic search system that revolutionizes e-commerce product discovery!
The team at Walmart Global Technology (the team I am part of) has developed a hybrid retrieval system that combines traditional inverted index search with neural embedding-based search to tackle the challenging problem of tail queries in e-commerce.
Key Technical Highlights:
• The system uses a two-tower BERT architecture where one tower processes queries and another processes product information, generating dense vector representations for semantic matching.
• Product information is enriched by combining titles with key attributes like category, brand, color, and gender using special prefix tokens to help the model distinguish different attribute types.
• The neural model leverages DistilBERT with 6 layers and projects the 768-dimensional embeddings down to 256 dimensions using a linear layer, achieving optimal performance while reducing storage and computation costs.
• To improve model training, they implemented innovative negative sampling techniques combining product category matching and token overlap filtering to identify challenging negative examples.
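To make the attribute enrichment and negative sampling ideas concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the team's implementation: the prefix token strings, the `Product` fields, the whitespace tokenizer, the `min_overlap` threshold, and the choice to keep same-category, high-overlap products as hard negatives are all hypothetical. The post does not say exactly how the two filters are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Product:
    product_id: str
    title: str
    category: str
    brand: str = ""
    color: str = ""
    gender: str = ""


# Hypothetical prefix tokens; the real token strings are not given in the post.
ATTRIBUTE_PREFIXES = {
    "category": "[CATEGORY]",
    "brand": "[BRAND]",
    "color": "[COLOR]",
    "gender": "[GENDER]",
}


def build_product_text(product: Product) -> str:
    """Join the title with prefixed attributes, skipping empty attributes."""
    parts = [product.title]
    for name, prefix in ATTRIBUTE_PREFIXES.items():
        value = getattr(product, name)
        if value:
            parts.append(f"{prefix} {value}")
    return " ".join(parts)


def token_overlap(query: str, text: str) -> float:
    """Fraction of query tokens (lowercased, whitespace-split) found in text."""
    query_tokens = set(query.lower().split())
    if not query_tokens:
        return 0.0
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens) / len(query_tokens)


def select_hard_negatives(
    query: str,
    positive: Product,
    catalog: List[Product],
    min_overlap: float = 0.5,  # assumed threshold, for illustration only
    limit: int = 5,
) -> List[Product]:
    """Pick products in the positive's category whose titles overlap the query."""
    scored = []
    for candidate in catalog:
        if candidate.product_id == positive.product_id:
            continue  # never use the positive as its own negative
        if candidate.category != positive.category:
            continue  # category matching keeps negatives "close" to the positive
        overlap = token_overlap(query, candidate.title)
        if overlap >= min_overlap:
            scored.append((overlap, candidate))
    # Highest-overlap candidates first; ties keep catalog order (stable sort).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored[:limit]]
```

In a real pipeline the enriched text would be fed to the product tower's tokenizer, and the selected negatives would be paired with the query in the training batch.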
Production Implementation Details:
• The system uses a managed ANN (Approximate Nearest Neighbor) service to enable fast retrieval, achieving 99% recall@20 with just 13ms latency.
• Query embeddings are cached with preset TTL (Time-To-Live) to reduce latency and costs in production.
• The model is exported to ONNX format and served in Java, with custom optimizations like fixed input shapes and GPU acceleration using NVIDIA T4 processors.
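The query-embedding cache with a preset TTL can be sketched roughly as follows. This is a minimal in-process illustration, not the production design: the class name `TTLEmbeddingCache`, the `embed_fn` callback, and the injectable `clock` are hypothetical, and a production system would more likely use a shared cache service with eviction.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLEmbeddingCache:
    """Cache query embeddings and recompute them once the TTL has elapsed."""

    def __init__(
        self,
        embed_fn: Callable[[str], List[float]],  # hypothetical model call
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._embed_fn = embed_fn
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps query text to (expiry timestamp, embedding).
        self._entries: Dict[str, Tuple[float, List[float]]] = {}

    def get(self, query: str) -> List[float]:
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the entry has not expired yet
        # Cache miss or expired entry: run the model and store a fresh entry.
        embedding = self._embed_fn(query)
        self._entries[query] = (now + self._ttl, embedding)
        return embedding
```

Repeated queries within the TTL window skip the model call entirely, which is where the latency and cost savings would come from.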
Results: The system showed significant improvements in both offline metrics and live experiments, with:
- +2.84% improvement in NDCG@10 for human evaluation
- +0.54% lift in Add-to-Cart rates in live A/B testing
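For readers less familiar with the metric, NDCG@10 can be computed along these lines. This is a generic sketch using the linear-gain form of DCG. The post does not say which gain variant or relevance scale the human evaluation used, so treat the function names and the graded labels as assumptions.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so position 1 divides by log2(2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant results at all
    return dcg_at_k(relevances, k) / ideal
```

Here `relevances` holds the graded relevance label of each returned product in ranked order; a ranking that already places the most relevant items first scores 1.0.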
This is a fantastic example of how modern NLP techniques can be successfully deployed at scale to solve real-world e-
After some heated discussion, we want to clarify our intent regarding storage limits on the Hub.
TL;DR:
- public storage is free, and (barring blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)
We optimize our infrastructure continuously to scale our storage for the coming years of growth in machine learning, to the benefit of the community.
Multimodal
> Google shipped PaliGemma 2, a new iteration of PaliGemma with more sizes: 3B, 10B and 28B, with pre-trained and captioning variants
> OpenGVLab released InternVL2, seven new vision LMs in different sizes, with a SOTA checkpoint under the MIT license
> Qwen team at Alibaba released the base models of Qwen2VL with 2B, 7B and 72B checkpoints

LLMs
> Meta released a new iteration of Llama 70B, Llama3.2-70B trained further
> EuroLLM-9B-Instruct is a new multilingual LLM for European languages with Apache 2.0 license
> Dataset: CohereForAI released GlobalMMLU, a multilingual version of MMLU covering 42 languages, with Apache 2.0 license
> Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models
> Dataset: FineWeb2 just landed with a multilinguality update! Nearly 8TB of pretraining data in many languages!

Image/Video Generation
> Tencent released HunyuanVideo, a new photorealistic video generation model
> OminiControl is a new editing/control framework for image generation models like Flux

Audio
> Indic-Parler-TTS is a new text2speech model made by the community
Keeping up with open-source AI in 2024 = overwhelming.
Here's help: We're launching our Year in Review on what actually matters, starting today!
Fresh content dropping daily until year end. Come along for the ride - first piece out now with @clem's predictions for 2025.
Think of it as your end-of-year AI chocolate calendar.
Kudos to @BrigitteTousi @clefourrier @Wauplin @thomwolf for making it happen. We teamed up with aiworld.eu for awesome visualizations to make this digestible; it's a charm to work with their team.
It's the 2nd of December, here's your Cyber Monday present!
We're cutting our prices on Hugging Face Inference Endpoints and Spaces!
Our folks at Google Cloud are treating us to a 40% price cut on GCP Nvidia A100 GPUs for the next 3 months. We also have reductions ranging from 20 to 50% on all other instances.