Kuldeep Singh Sidhu

singhsidhukuldeep

https://singhsidhukuldeep.github.io

AI & ML interests

😃 TOP 3 on HuggingFace for posts 🤗 Seeking contributors for a completely open-source 🚀 Data Science platform! singhsidhukuldeep.github.io

singhsidhukuldeep's activity

posted an update 2 days ago

Groundbreaking Research Alert: Revolutionizing Document Ranking with Long-Context LLMs

Researchers from Renmin University of China and Baidu Inc. have introduced a novel approach to document ranking that challenges conventional sliding window methods. Their work demonstrates how long-context Large Language Models can process up to 100 documents simultaneously, achieving superior performance while reducing API costs by 50%.

Key Technical Innovations:
- Full ranking strategy enables processing all passages in a single inference
- Multi-pass sliding window approach for comprehensive listwise label construction
- Importance-aware learning objective that prioritizes top-ranked passage IDs
- Support for context lengths up to 128k tokens using models like LLaMA 3.1-8B-Instruct

Performance Highlights:
- 2.2 point improvement in NDCG@10 metrics
- 29.3% reduction in latency compared to traditional methods
- Significant API cost savings through elimination of redundant passage processing
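
For readers less familiar with the metric behind the first highlight, here is a minimal sketch of NDCG@10. It uses the linear-gain variant (some evaluations use 2^rel - 1 as the gain instead), and the relevance labels are made-up values for illustration, not numbers from the paper.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k graded relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG of the given order divided by the DCG of the ideal (sorted) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded labels (higher = more relevant), in the order a ranker returned them.
ranked_labels = [3, 2, 3, 0, 1, 2]
print(round(ndcg_at_k(ranked_labels), 4))
```

A ranking that already lists the most relevant passages first scores 1.0, and every misplaced passage pulls the score down, with mistakes near the top costing the most.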

Under the hood, the system leverages advanced long-context LLMs to perform global interactions among passages, enabling more nuanced relevance assessment. The architecture incorporates a novel importance-aware loss function that assigns differential weights based on passage ranking positions.

The research team's implementation demonstrated remarkable versatility across multiple datasets, including TREC DL and BEIR benchmarks. Their fine-tuned model, RankMistral, showcases the practical viability of full ranking approaches in production environments.

This advancement marks a significant step forward in information retrieval systems, offering both improved accuracy and computational efficiency. The implications for search engines and content recommendation systems are substantial.

posted an update 7 days ago

Exciting News in AI: JinaAI Releases JINA-CLIP-v2!

The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:

🚀 Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage

⚡️ Under The Hood:
- Multi-stage training process with progressive resolution scaling (224→384→512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample
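
To make "InfoNCE loss in both directions" concrete, here is a minimal pure-Python sketch of a symmetric contrastive loss over a batch of paired text and image embeddings. The temperature of 0.07 and the toy 2-D vectors are assumptions for illustration, not values from the Jina release, and this is not their training code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy where queries[i] should match keys[i]; other keys act as negatives."""
    losses = []
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)

def symmetric_info_nce(text_emb, image_emb, temperature=0.07):
    """Average of the text-to-image and image-to-text InfoNCE losses."""
    text_emb = [normalize(v) for v in text_emb]
    image_emb = [normalize(v) for v in image_emb]
    return 0.5 * (info_nce(text_emb, image_emb, temperature)
                  + info_nce(image_emb, text_emb, temperature))

# Toy batch: matching pairs share an index.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[1.0, 0.0], [0.0, 1.0]]
shuffled_images = [[0.0, 1.0], [1.0, 0.0]]
print(symmetric_info_nce(texts, aligned_images) < symmetric_info_nce(texts, shuffled_images))
```

The loss is near zero when each text sits closest to its own image and grows when pairs are mismatched, which is the signal that pulls the two encoders into a shared space.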

📊 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)
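
The 1024D to 256D reduction mentioned above comes from Matryoshka-style embeddings, where the leading coordinates are trained to be useful on their own. A minimal sketch of the truncation step, assuming you already have 1024-dimensional vectors (the random vectors below are stand-ins and do not come from a Matryoshka-trained model):

```python
import math
import random

def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates and rescale the result to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    # Both inputs are unit length, so the dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(0)
full_a = [rng.gauss(0, 1) for _ in range(1024)]
full_b = [a + 0.1 * rng.gauss(0, 1) for a in full_a]  # a noisy copy, so the pair is similar

small_a = truncate_embedding(full_a, 256)
small_b = truncate_embedding(full_b, 256)
print(len(small_a), round(cosine(small_a, small_b), 3))
```

Truncation like this only preserves quality when the model was trained with a Matryoshka objective; for an ordinary embedding model the first 256 coordinates carry no special meaning.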

🎯 Key Innovation:
The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!

Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!

posted an update 8 days ago

Fascinating insights from @Pinterest's latest research on improving feature interactions in recommendation systems!

Pinterest's engineering team has tackled a critical challenge in their Homefeed ranking system that serves 500M+ monthly active users. Here's what makes their approach remarkable:

>> Technical Deep Dive

Architecture Overview
• The ranking model combines dense features, sparse features, and embedding features to represent users, Pins, and context
• Sparse features are processed using learnable embeddings with size based on feature cardinality
• User sequence embeddings are generated using a transformer architecture processing past engagements

Feature Processing Pipeline
• Dense features undergo normalization for numerical stability
• Sparse and embedding features receive L2 normalization
• All features are concatenated into a single feature embedding

Key Innovations
• Implemented parallel MaskNet layers with 3 blocks
• Used projection ratio of 2.0 and output dimension of 512
• Stacked 4 DCNv2 layers on top for higher-order interactions

Performance Improvements
• Achieved +1.42% increase in Homefeed Save Volume
• Boosted Overall Time Spent by +0.39%
• Limited the memory consumption increase to just 5%

>> Industry Constraints Addressed

Memory Management
• Optimized for 60% GPU memory utilization
• Prevented OOM errors while maintaining batch size efficiency

Latency Optimization
• Removed input-output concatenation before MLP
• Reduced hidden layer sizes in MLP
• Achieved zero latency increase while improving performance

System Stability
• Ensured reproducible results across retraining
• Maintained model stability across different data distributions
• Successfully deployed in production environment

This work brilliantly demonstrates how to balance academic innovations with real-world industrial constraints. Kudos to the Pinterest team!

posted an update 10 days ago

Exciting breakthrough in AI: @Meta's new Byte Latent Transformer (BLT) revolutionizes language models by eliminating tokenization!

The BLT architecture introduces a groundbreaking approach that processes raw bytes instead of tokens, achieving state-of-the-art performance while being more efficient and robust. Here's what makes it special:

>> Key Innovations
Dynamic Patching: BLT groups bytes into variable-sized patches based on entropy, allocating more compute power where the data is more complex. This results in up to 50% fewer FLOPs during inference compared to traditional token-based models.
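
A minimal sketch of the dynamic patching idea, assuming a simple global-threshold rule: start a new patch wherever the estimated next-byte entropy exceeds a threshold. In BLT the entropies come from a small byte-level language model; here they are hand-written hypothetical values, and the threshold of 2.0 is likewise an assumption.

```python
def patch_boundaries(entropies, threshold):
    """Return patch start indices: position 0 plus every position above the threshold."""
    starts = [0]
    for i in range(1, len(entropies)):
        if entropies[i] > threshold:
            starts.append(i)
    return starts

def split_into_patches(data, entropies, threshold):
    """Cut the byte string at the boundaries chosen by patch_boundaries."""
    starts = patch_boundaries(entropies, threshold)
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends)]

data = b"hello world"
# Hypothetical per-byte entropies: high at the start of each word, low inside it.
entropies = [3.0, 0.5, 0.4, 0.3, 0.2, 2.5, 2.8, 0.6, 0.5, 0.4, 0.3]
patches = split_into_patches(data, entropies, threshold=2.0)
print(patches)
```

Predictable stretches end up in long patches and cost one pass through the large transformer, while hard-to-predict bytes get short patches and therefore more compute per byte.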

Three-Component Architecture:
• Lightweight Local Encoder that converts bytes to patch representations
• Powerful Global Latent Transformer that processes patches
• Local Decoder that converts patches back to bytes

>> Technical Advantages
• Matches performance of Llama 3 at 8B parameters while being more efficient
• Superior handling of non-English languages and rare character sequences
• Remarkable 99.9% accuracy on spelling tasks
• Better scaling properties than token-based models

>> Under the Hood
The system uses an entropy model to determine patch boundaries, cross-attention mechanisms for information flow, and hash n-gram embeddings for improved representation. The architecture allows simultaneous scaling of both patch and model size while maintaining fixed inference costs.

This is a game-changer for multilingual AI and could reshape how we build future language models. Excited to see how this technology evolves!

posted an update 18 days ago

Groundbreaking Research Alert: The 'H' in HNSW Stands for "Hubs", Not "Hierarchy"!

Fascinating new research reveals that the hierarchical structure in the popular HNSW (Hierarchical Navigable Small World) algorithm - widely used for vector similarity search - may be unnecessary for high-dimensional data.

🔬 Key Technical Findings:

• The hierarchical layers in HNSW can be completely removed for vectors with dimensionality > 32, with no performance loss

• Memory savings of up to 38% achieved by removing the hierarchy

• Performance remains identical in both median and tail latency cases across 13 benchmark datasets

🛠️ Under The Hood:
The researchers discovered that "hub highways" naturally form in high-dimensional spaces. These hubs are well-connected nodes that are frequently traversed during searches, effectively replacing the need for explicit hierarchical layers.

The hub structure works because:
• A small subset of nodes appear disproportionately in nearest neighbor lists
• These hub nodes form highly connected subgraphs
• Queries naturally traverse through these hubs early in the search process
• The hubs efficiently connect distant regions of the graph
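
One way to see the first bullet for yourself is to count how often each point appears in other points' nearest-neighbor lists (often called k-occurrence). The sketch below uses brute-force search on random Gaussian vectors; the dataset size, dimensionality, and k are arbitrary assumptions, and this is not the FlatNav code.

```python
import random
from collections import Counter

def knn_lists(points, k):
    """Brute-force k nearest neighbours (squared Euclidean distance) for every point."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def k_occurrence(points, k):
    """Count how often each point shows up in other points' k-NN lists."""
    counts = Counter(j for nbrs in knn_lists(points, k) for j in nbrs)
    return [counts.get(i, 0) for i in range(len(points))]

rng = random.Random(0)
points = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(200)]
counts = k_occurrence(points, k=10)
print("mean:", sum(counts) / len(counts), "max:", max(counts))
```

The mean is always exactly k, so a maximum far above it indicates that a few points are acting as hubs.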

💡 Industry Impact:
This finding has major implications for vector databases and similarity search systems. Companies can significantly reduce memory usage while maintaining performance by implementing flat navigable small world graphs instead of hierarchical ones.

🚀 What's Next:
The researchers have released FlatNav, an open-source implementation of their flat navigable small world approach, enabling immediate practical applications of these findings.

posted an update 19 days ago

Fascinating new research alert! Just read a groundbreaking paper on understanding Retrieval-Augmented Generation (RAG) systems and their performance factors.

Key insights from this comprehensive study:

>> Architecture Deep Dive
The researchers analyzed RAG systems across 6 datasets (3 code-related, 3 QA-focused) using multiple LLMs. Their investigation revealed critical insights into four key design factors:

Document Types Impact:
• Oracle documents (ground truth) aren't always optimal
• Distracting documents significantly degrade performance
• Surprisingly, irrelevant documents boost code generation by up to 15.6%

Retrieval Precision:
• Performance varies dramatically by task
• QA tasks need 20-100% retrieval recall
• Perfect retrieval still fails up to 12% of the time on previously correct instances

Document Selection:
• More documents ≠ better results
• Adding documents can cause errors on previously correct samples
• Performance degradation increases ~1% per 5 additional documents in code tasks

Prompt Engineering:
• Most advanced prompting techniques underperform simple zero-shot prompts
• Technique effectiveness varies significantly across models and tasks
• Complex prompts excel at difficult problems but struggle with simple ones

>> Technical Implementation
The study utilized:
• Multiple retrievers including BM25, dense retrievers, and specialized models
• Comprehensive corpus of 70,956 unique API documents
• Over 200,000 API calls and 1,000+ GPU hours of computation
• Sophisticated evaluation metrics tracking both correctness and system confidence

💡 Key takeaway: RAG system optimization requires careful balancing of multiple factors - there's no one-size-fits-all solution.

posted an update 20 days ago

Exciting new research alert! 🚀 A groundbreaking paper titled "Understanding LLM Embeddings for Regression" has just been released, and it's a game-changer for anyone working with large language models (LLMs) and regression tasks.

Key findings:

1. LLM embeddings outperform traditional feature engineering in high-dimensional regression tasks.

2. LLM embeddings preserve Lipschitz continuity over feature space, enabling better regression performance.

3. Surprisingly, factors like model size and language understanding don't always improve regression outcomes.

Technical details:

The researchers used both T5 and Gemini model families to benchmark embedding-based regression. They employed a key-value JSON format for string representations and used average-pooling to aggregate Transformer outputs.

The study introduced a novel metric called Normalized Lipschitz Factor Distribution (NLFD) to analyze embedding continuity. This metric showed a strong inverse relationship between the skewness of the NLFD and regression performance.
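
As a rough illustration of what goes into such a distribution, the sketch below computes raw pairwise Lipschitz factors, |y_i - y_j| / ||phi(x_i) - phi(x_j)||, for a handful of embedded inputs. The paper's normalization step is not reproduced here, and the toy embeddings and targets are hypothetical.

```python
import math
from itertools import combinations

def lipschitz_factors(embeddings, targets):
    """Pairwise ratios of target difference to embedding distance, skipping identical embeddings."""
    factors = []
    for (e1, y1), (e2, y2) in combinations(zip(embeddings, targets), 2):
        dist = math.dist(e1, e2)
        if dist > 0:
            factors.append(abs(y1 - y2) / dist)
    return factors

# Hypothetical 1-D embeddings and regression targets.
embeddings = [[0.0], [1.0], [3.0]]
targets = [0.0, 2.0, 3.0]
print(lipschitz_factors(embeddings, targets))
```

Small, evenly spread factors mean nearby embeddings have similar targets, which is the kind of smoothness a downstream regressor can exploit.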

Interestingly, the paper reveals that applying forward passes of pre-trained models doesn't always significantly improve regression performance for certain tasks. In some cases, using only vocabulary embeddings without a forward pass yielded comparable results.

The research also demonstrated that LLM embeddings are dimensionally robust, maintaining strong performance even with high-dimensional data where traditional representations falter.

This work opens up exciting possibilities for using LLM embeddings in various regression tasks, particularly those with high degrees of freedom. It's a must-read for anyone working on machine learning, natural language processing, or data science!

posted an update 22 days ago

Exciting breakthrough in E-commerce Recommendation Systems!

Just read a fascinating paper from @eBay's research team on "LLM-PKG" - a novel approach that combines Large Language Models with Product Knowledge Graphs for explainable recommendations.

Here's what makes it groundbreaking:

>> Technical Architecture
- The system uses a two-module approach: offline construction and online serving
- LLM generates initial product relationships and rationales, which are transformed into RDF triplets (Subject, Predicate, Object) to build the knowledge graph
- The system employs rigorous validation using LLM-based scoring (1-10 scale) to evaluate recommendation quality and prune low-quality nodes (score < 6)
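
A minimal sketch of the offline step described above: store each LLM-generated relationship as a (Subject, Predicate, Object) triplet with its quality score, then prune anything scoring below 6. The `Triplet` class, the predicate names, and the product examples are hypothetical; the post describes pruning nodes, and attaching the score to each triplet is a simplifying assumption.

```python
from typing import NamedTuple

class Triplet(NamedTuple):
    subject: str
    predicate: str
    obj: str
    score: int  # LLM-assigned quality score on a 1-10 scale

def prune(triplets, min_score=6):
    """Keep only triplets whose score is at least min_score (i.e. drop score < 6 by default)."""
    return [t for t in triplets if t.score >= min_score]

# Hypothetical LLM output, already parsed into triplets.
candidates = [
    Triplet("tent", "is_used_with", "sleeping bag", 9),
    Triplet("tent", "is_used_with", "phone case", 3),
    Triplet("sleeping bag", "has_rationale", "keeps campers warm overnight", 7),
]
graph = prune(candidates)
print([(t.subject, t.predicate, t.obj) for t in graph])
```

Because the rationale travels with the relationship, the serving layer can show why an item was recommended, not just which item.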

>> Under the Hood
- Product mapping uses BERT embeddings and KNN indexing for semantic matching between LLM recommendations and actual inventory
- The system caches graph triplets in key-value databases for lightning-fast retrieval during online serving
- Supports both item-centric and user-centric recommendation scenarios

>> Real-World Impact
The A/B testing results are impressive:
- 5.19% increase in clicks
- 7.59% boost in transactions
- 8.56% growth in Gross Merchandise Bought
- 10.84% increase in ad revenue

This is a game-changer for e-commerce platforms looking to provide transparent, explainable recommendations while maintaining high performance at scale.

posted an update 24 days ago

Exciting breakthrough in AI Recommendation Systems! Just read a fascinating paper from Meta AI and UW-Madison researchers on unifying generative and dense retrieval methods for recommendations.

The team introduced LIGER (LeveragIng dense retrieval for GEnerative Retrieval), a novel hybrid approach that combines the best of both worlds:

Key Technical Innovations:
- Integrates semantic ID-based generative retrieval with dense embedding methods
- Uses a T5 encoder-decoder architecture with 6 layers, 6 attention heads, and 128-dim embeddings
- Processes item attributes through sentence-T5-XXL for text representations
- Employs a dual-objective training approach combining cosine similarity and next-token prediction
- Implements beam search with size K for candidate generation
- Features an RQ-VAE with 3-layer MLP for semantic ID generation

Performance Highlights:
- Significantly outperforms traditional methods on cold-start recommendations
- Achieves state-of-the-art results on major benchmark datasets (Amazon Beauty, Sports, Toys, Steam)
- Reduces computational complexity from O(N) to O(tK) where t is semantic ID count
- Maintains minimal storage requirements while improving recommendation quality

The most impressive part? LIGER effectively solves the cold-start problem that has long plagued recommendation systems while maintaining computational efficiency.

This could be a game-changer for e-commerce platforms and content recommendation systems!

What are your thoughts on hybrid recommendation approaches?

posted an update 25 days ago

Exciting breakthrough in Search Engine Technology! Just read a fascinating paper on "Best Practices for Distilling Large Language Models into BERT for Web Search Ranking" from @TencentGlobal.

Game-Changing Innovation: DisRanker
A novel distillation pipeline that combines the power of Large Language Models with BERT's efficiency for web search ranking - now deployed in commercial search engines!

Key Technical Highlights:
• Implements domain-specific Continued Pre-Training using clickstream data, treating queries as inputs to generate clicked titles and summaries
• Uses an end-of-sequence token to represent query-document pairs during supervised fine-tuning
• Employs hybrid Point-MSE and Margin-MSE loss for knowledge distillation, optimizing both absolute scores and relative rankings
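
The hybrid distillation loss in the last bullet can be sketched as follows: Point-MSE matches the student's scores to the teacher's, and Margin-MSE matches the gap between a clicked (positive) and a non-clicked (negative) document. The equal weighting `alpha=0.5` and the example scores are assumptions for illustration, not values from the paper.

```python
def point_mse(student, teacher):
    """Mean squared error between student and teacher scores (absolute scores)."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)

def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher positive-minus-negative margins."""
    diffs = [
        ((sp - sn) - (tp - tn)) ** 2
        for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg)
    ]
    return sum(diffs) / len(diffs)

def hybrid_loss(student_pos, student_neg, teacher_pos, teacher_neg, alpha=0.5):
    """Weighted sum of Point-MSE and Margin-MSE; alpha is a hypothetical mixing weight."""
    point = point_mse(student_pos + student_neg, teacher_pos + teacher_neg)
    margin = margin_mse(student_pos, student_neg, teacher_pos, teacher_neg)
    return alpha * point + (1 - alpha) * margin

# One query with one positive and one negative document.
loss = hybrid_loss([2.0], [1.0], [3.0], [1.0])
print(loss)
```

The point term keeps the student's scores calibrated to the teacher's, while the margin term focuses on getting the ordering of document pairs right.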

Under the Hood:
- The system first pre-trains on massive clickstream data (59M+ query-document pairs)
- Transfers ranking expertise from a 7B parameter LLM to a compact BERT model
- Reduces inference latency from ~100ms to just 10ms while maintaining performance
- Achieves significant improvements:
• +0.47% PageCTR
• +0.58% UserCTR
• +1.2% Dwell Time

Real-World Impact:
Successfully integrated into production search systems as of February 2024, demonstrating that academic research can translate into practical industry solutions.

What are your thoughts on this breakthrough?

posted an update 26 days ago

Exciting Research Alert: Revolutionizing Recommendation Systems with PSL (Pairwise Softmax Loss)!

I just read a fascinating paper that introduces PSL - a groundbreaking approach to improve recommendation systems. Here's why this matters:

>> Key Innovations

Core Concept: PSL reimagines the traditional Softmax Loss by viewing it through a pairwise perspective, addressing two critical limitations of current systems:
- The loose connection between Softmax Loss and ranking metrics like DCG
- High sensitivity to false negative instances

Technical Implementation:
- Replaces exponential functions with alternative activation functions (Tanh, Atan, ReLU)
- Reformulates loss calculation from a pairwise perspective
- Integrates Distributionally Robust Optimization (DRO) principles

>> Real-World Impact

Enhanced Performance:
- Tighter surrogate for ranking metrics
- Better balance in data contribution weights
- Improved robustness against false negatives
- Superior handling of out-of-distribution scenarios

Practical Applications:
- E-commerce recommendations
- Content discovery systems
- Personalized service platforms

>> Implementation Benefits

The beauty of PSL lies in its simplicity - it requires minimal code modifications while delivering significant improvements in:
- Recommendation accuracy
- System robustness
- Training stability
- Distribution shift handling

This research opens new possibilities for building more reliable and accurate recommendation systems. The code is available on GitHub for those interested in implementation.

What are your thoughts on this approach? Have you encountered similar challenges in recommendation systems?

posted an update 27 days ago

Exciting breakthrough in Document AI! Researchers from UNC Chapel Hill and Bloomberg have developed M3DocRAG, a revolutionary framework for multi-modal document understanding.

The innovation lies in its ability to handle complex document scenarios that traditional systems struggle with:
- Process 40,000+ pages across 3,000+ documents
- Answer questions requiring information from multiple pages
- Understand visual elements like charts, tables, and figures
- Support both closed-domain (single document) and open-domain (multiple documents) queries

Under the hood, M3DocRAG operates through three sophisticated stages:

>> Document Embedding:
- Converts PDF pages to RGB images
- Uses ColPali to project both text queries and page images into a shared embedding space
- Creates dense visual embeddings for each page while maintaining visual information integrity

>> Page Retrieval:
- Employs MaxSim scoring to compute relevance between queries and pages
- Implements inverted file indexing (IVFFlat) for efficient search
- Reduces retrieval latency from 20s to under 2s when searching 40K+ pages
- Supports approximate nearest neighbor search via Faiss
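
MaxSim is the late-interaction score popularized by ColBERT: for every query token vector, take its best match among the page's vectors, then sum those maxima. A minimal sketch with toy 2-D vectors (real ColPali embeddings are much larger, and production search would use an index such as Faiss rather than this exhaustive loop):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_vecs, page_vecs):
    """Sum over query vectors of the maximum dot product with any page vector."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

def rank_pages(query_vecs, pages):
    """Return page names ordered by descending MaxSim score."""
    return sorted(pages, key=lambda name: maxsim(query_vecs, pages[name]), reverse=True)

# Hypothetical embeddings: two query tokens, two pages with two patch vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query tokens
    "page_b": [[1.0, 0.0], [1.0, 0.0]],  # covers only the first query token
}
print(rank_pages(query, pages))
```

Because each query token is matched independently, a page only scores highly if it contains evidence for every part of the question.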

>> Question Answering:
- Leverages Qwen2-VL 7B as the multi-modal language model
- Processes retrieved pages through a visual encoder
- Generates answers considering both textual and visual context

The results are impressive:
- State-of-the-art performance on MP-DocVQA benchmark
- Superior handling of non-text evidence compared to text-only systems
- Significantly better performance on multi-hop reasoning tasks

This is a game-changer for industries dealing with large document volumes—finance, healthcare, and legal sectors can now process documents more efficiently while preserving crucial visual context.

posted an update 28 days ago

Exciting breakthrough in multimodal search technology! @nvidia researchers have developed MM-Embed, a groundbreaking universal multimodal retrieval system that's changing how we think about search.

Key innovations:
• First-ever universal multimodal retriever that excels at both text and image searches across diverse tasks
• Leverages advanced multimodal LLMs to understand complex queries combining text and images
• Implements novel modality-aware hard negative mining to overcome modality bias issues
• Achieves state-of-the-art performance on M-BEIR benchmark while maintaining superior text retrieval capabilities

Under the hood:
The system uses a sophisticated bi-encoder architecture with LLaVa-Next (based on Mistral 7B) as its backbone. It employs a unique two-stage training approach: first with random negatives, then with carefully mined hard negatives to improve cross-modal understanding.

The real magic happens in the modality-aware negative mining, where the system learns to distinguish between incorrect modality matches and unsatisfactory information matches, ensuring retrieved results match both content and format requirements.

What sets it apart is its ability to handle diverse search scenarios - from simple text queries to complex combinations of images and text, all while maintaining high accuracy across different domains.

posted an update 30 days ago

Excited to share @LinkedIn's innovative approach to evaluating semantic search quality! As part of the Search AI team, we've developed a groundbreaking evaluation pipeline that revolutionizes how we measure search relevance.

>> Key Innovation: On-Topic Rate (OTR)
This novel metric measures the semantic match between queries and search results, going beyond simple keyword matching. The system evaluates whether content is truly relevant to the query's intent, not just matching surface-level terms.

>> Technical Implementation Details
Query Set Construction
• Golden Set: Contains curated top queries and complex topical queries
• Open Set: Includes trending queries and random production queries for diversity

Evaluation Pipeline Architecture
1. Query Processing:
- Retrieves top 10 documents per query
- Extracts post text and article information
- Processes both primary content and reshared materials

2. GAI Integration:
- Leverages GPT-3.5 with specialized prompts
- Produces three key outputs:
• Binary relevance decision
• Relevance score (0-1 range)
• Decision reasoning
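
A minimal sketch of how the three outputs could roll up into an On-Topic Rate, assuming OTR is simply the share of judged query-document pairs marked relevant. The record layout and the example judgments are hypothetical; the actual LinkedIn pipeline and prompts are not shown here.

```python
def on_topic_rate(judgments):
    """Fraction of judged (query, document) pairs whose binary decision is relevant."""
    if not judgments:
        return 0.0
    return sum(1 for j in judgments if j["relevant"]) / len(judgments)

# Hypothetical LLM judgments for the top results of one query.
judgments = [
    {"relevant": True, "score": 0.92, "reasoning": "Post discusses the queried topic directly."},
    {"relevant": True, "score": 0.71, "reasoning": "Article covers a closely related subtopic."},
    {"relevant": False, "score": 0.15, "reasoning": "Matches a keyword but not the intent."},
    {"relevant": True, "score": 0.88, "reasoning": "Reshared content answers the query."},
]
print(on_topic_rate(judgments))
```

Keeping the reasoning next to each decision makes it easier to audit disagreements with human annotators.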

Quality Assurance
• Validation achieved 94.5% accuracy on a test set of 600 query-post pairs
• Human evaluation showed 81.72% consistency with expert annotators

>> Business Impact
This system now serves as LinkedIn's benchmark for content search experiments, enabling:
• Weekly performance monitoring
• Rapid offline testing of new ML models
• Systematic identification of improvement opportunities

What are your thoughts on semantic search evaluation?

posted an update about 1 month ago

Good folks at Google have released a paper on CAT4D, a cutting-edge framework that's pushing the boundaries of multi-view video generation. Probably coming to Google Photos near you!

This innovative approach introduces a novel way to create dynamic 4D content with unprecedented control and quality.

Key Technical Innovations:
- Multi-View Video Diffusion Model (MVVM) architecture that handles both spatial and temporal dimensions simultaneously
- Zero-shot text-to-4D generation pipeline
- Temporal-aware attention mechanisms for consistent motion synthesis
- View-consistent generation across multiple camera angles

Technical Deep Dive:
The framework employs a sophisticated cascade of diffusion models that work in harmony to generate consistent content across both space and time. The architecture leverages view-dependent rendering techniques while maintaining temporal coherence through specialized attention mechanisms.

What sets CAT4D apart:
- Real-time view synthesis capabilities
- Seamless integration of temporal and spatial information
- Advanced motion handling through specialized temporal encoders
- Robust view consistency preservation across generated frames

Thoughts on how this could transform content creation in your industry?

posted an update about 1 month ago

Exciting breakthrough in AI Hallucination Detection & Mitigation! THaMES (Tool for Hallucination Mitigations and EvaluationS) is a groundbreaking end-to-end framework tackling one of AI's biggest challenges: hallucination in Large Language Models.

Key Technical Features:

• Automated QA Testset Generation using weighted sampling and batch processing
- Implements VectorStoreIndex for knowledge base construction
- Uses text-embedding-3-large for semantic similarity
- Generates 6 question types: simple, reasoning, multi-context, situational, distracting, and double

• Advanced Hallucination Detection
- Utilizes fine-tuned NLI (deberta-v3-base-tasksource-nli)
- Implements HHEM-2.1-Open for factual consistency scoring
- Combines entailment and factual consistency for ensemble scoring

• Multiple Mitigation Strategies
- In-Context Learning with Chain-of-Verification (CoVe)
- Retrieval-Augmented Generation (RAG)
- Parameter-Efficient Fine-Tuning (PEFT) using LoRA

Real-world Results:
- GPT-4o showed significant improvement with RAG
- Llama-3.1 performed better with In-Context Learning
- PEFT significantly improved Llama-3.1's hallucination metrics

Why it matters:
This framework sets a new standard for reliable AI development by providing comprehensive tools to evaluate and mitigate hallucinations in LLMs. Perfect for AI researchers, developers, and organizations focused on building trustworthy AI systems.

posted an update about 1 month ago

Good folks from @amazon, @Stanford, and other great institutions have released “A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models”!

This comprehensive survey examines over 32 cutting-edge techniques to combat hallucination in Large Language Models (LLMs). As LLMs become increasingly integral to our daily operations, addressing their tendency to generate ungrounded content is crucial.

Retrieval-Augmented Generation (RAG) Innovations:
- Pre-generation retrieval using LLM-Augmenter with Plug-and-Play modules
- Real-time verification through the EVER framework implementing three-stage validation
- Post-generation refinement via the RARR system for automated attribution

Advanced Decoding Strategies:
- Context-Aware Decoding (CAD) utilizing contrastive output distribution
- DoLa's innovative approach of contrasting the logits of later and earlier transformer layers
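For intuition on the contrastive idea behind Context-Aware Decoding, here is a rough sketch. It assumes you already have next-token logits from two forward passes, one with the retrieved context in the prompt and one without. The function names and the default `alpha` are illustrative assumptions, and this is a simplification rather than the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def context_aware_distribution(logits_with_context, logits_without_context, alpha=0.5):
    """Amplify what the context adds by contrasting the two logit vectors.

    adjusted = (1 + alpha) * logits_with_context - alpha * logits_without_context
    alpha = 0 recovers ordinary decoding with the context in the prompt.
    """
    if len(logits_with_context) != len(logits_without_context):
        raise ValueError("both logit vectors must cover the same vocabulary")
    adjusted = [
        (1.0 + alpha) * with_ctx - alpha * without_ctx
        for with_ctx, without_ctx in zip(logits_with_context, logits_without_context)
    ]
    return softmax(adjusted)
```

Tokens that become more likely once the context is present get boosted, while tokens the model would have produced from its prior alone get pushed down.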

Knowledge Integration Methods:
- The RHO framework leveraging entity representations and relation predicates
- FLEEK's intelligent fact verification system using curated knowledge graphs

Novel Loss Functions:
- Text Hallucination Regularization (THR) derived from mutual information
- The mFACT metric for evaluating faithfulness in multilingual contexts

This research provides a structured taxonomy for categorizing these mitigation techniques, offering valuable insights for practitioners and researchers working with LLMs.

What are your thoughts on hallucination mitigation in LLMs?

replied to their post about 1 month ago

https://huggingface.co/posts/singhsidhukuldeep/479387783609004

posted an update about 1 month ago

Excited to share my analysis of the groundbreaking DCN-V2 paper from @Google, which introduces significant improvements to deep learning recommendation systems!

Key technical highlights:

>> Core Architecture
- Starts with an embedding layer that handles both sparse categorical and dense features
- Unique capability to handle variable embedding sizes from small to large vocabulary sizes
- Cross network creates explicit bounded-degree feature interactions
- Deep network complements with implicit feature interactions
- Two combination modes: stacked and parallel architectures

>> Key Technical Innovations
- Enhanced cross layers with full matrix-based feature interaction learning instead of vector-based
- Mixture of Low-Rank architecture with:
* Multiple expert networks learning in different subspaces
* Dynamic gating mechanism to adaptively combine experts
* Lower cost than a full-rank cross layer when the rank and number of experts are small relative to the input dimension
* Support for non-linear transformations in projected spaces
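To make the matrix-based cross layer and its low-rank variant concrete, here is a small pure-Python sketch. It follows the commonly cited form x_{l+1} = x_0 ⊙ (W x_l + b) + x_l, with W replaced by U Vᵀ in the low-rank case. The helper names are hypothetical, and the expert gating and non-linear projections are left out to keep it short.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def cross_layer_v2(x0, xl, weight, bias):
    """Full-matrix cross layer: x0 * (W @ xl + b) + xl, with an elementwise product."""
    projected = matvec(weight, xl)
    return [a * (p + b) + x for a, p, b, x in zip(x0, projected, bias, xl)]


def low_rank_cross_layer(x0, xl, u, v_t, bias):
    """Low-rank cross layer: W is approximated by U @ V^T.

    u has shape d x r and v_t has shape r x d, so the two products cost
    O(d * r) instead of O(d * d) for the full matrix.
    """
    compressed = matvec(v_t, xl)       # project down to r dimensions
    projected = matvec(u, compressed)  # project back up to d dimensions
    return [a * (p + b) + x for a, p, b, x in zip(x0, projected, bias, xl)]
```

A mixture of low-rank experts would run several such (U, V) pairs and blend their outputs with a learned gate.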

>> Production Optimizations
- Low-rank matrix approximation leveraging singular value decay patterns
- Mixture-of-Experts decomposition into smaller subspaces
- Efficient parameter allocation between cross and deep networks
- Automatic feature interaction learning for higher-order interactions in multi-layered networks
- Support for both homogeneous and heterogeneous polynomial patterns

>> Real-World Impact
- Successfully deployed across Google's recommendation systems
- Significant gains in both offline accuracy and online metrics
- Better performance-latency tradeoffs through low-rank approximations
- Proven effectiveness on large-scale data with billions of training examples

This represents a major leap forward in making deep learning recommendation systems more practical and efficient at scale.

Thoughts? Would love to hear your experiences implementing similar architectures in production!

posted an update about 1 month ago

It's always exciting to revisit Google's DCN paper - impractical but good!

Deep & Cross Network (DCN) - a groundbreaking approach to click-through rate prediction that's revolutionizing digital advertising!

Key Innovation:
DCN introduces a novel cross-network architecture that automatically learns feature interactions without manual engineering. What sets it apart is its ability to explicitly model bounded-degree feature crossings while maintaining the power of deep neural networks.

Technical Deep Dive:
- The architecture combines a cross network with a deep network in parallel.
- The cross network performs automatic feature crossing at each layer.
- The embedding layer transforms sparse categorical features into dense vectors.
- Cross layers use a unique formula that enables efficient high-degree polynomial feature interactions.
- Memory-efficient design with linear complexity O(d) in the input dimension.
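Here is a minimal sketch of the cross layer described above, written in plain Python so the O(d) cost is easy to see. It uses the original DCN form x_{l+1} = x_0 (x_lᵀ w) + b + x_l, where w and b are length-d vectors. The function names are my own, and the weights in any usage would be learned rather than hand-set.

```python
def cross_layer(x0, xl, weight, bias):
    """One DCN cross layer: x0 * (xl . w) + b + xl.

    The dot product collapses xl to a single scalar, so each layer needs
    only O(d) work and 2 * d parameters.
    """
    scalar = sum(x * w for x, w in zip(xl, weight))  # xl^T w
    return [a * scalar + b + x for a, b, x in zip(x0, bias, xl)]


def cross_network(x0, layers):
    """Stack cross layers; `layers` is a list of (weight, bias) pairs."""
    x = x0
    for weight, bias in layers:
        x = cross_layer(x0, x, weight, bias)
    return x
```

Each extra layer multiplies by x_0 once more, which is why the depth of the stack bounds the polynomial degree of the interactions.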

Performance Highlights:
- Outperforms traditional DNN models with 60% less memory usage.
- Achieved 0.4419 logloss on the Criteo Display Ads dataset.
- Consistently performs better than state-of-the-art models like Deep Crossing and Factorization Machines.
- Exceptional performance on non-CTR tasks like Forest Covertype (97.40% accuracy).

Under the Hood:
- Uses embedding vectors of dimension 6 × (category cardinality)^(1/4).
- Implements batch normalization and the Adam optimizer.
- The cross network depth determines the highest polynomial degree of feature interactions.
- An efficient projection mechanism reduces cubic computational cost to linear.
- Parameter sharing enables better generalization to unseen feature interactions.
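As a quick worked example of the embedding-size rule above, here is a tiny helper. Rounding to the nearest integer is my assumption, since the rule itself produces a non-integer for most cardinalities, and the function name is hypothetical.

```python
def embedding_dim(cardinality):
    """Embedding size from the rule 6 * cardinality ** (1/4).

    Rounding to the nearest integer is an assumption made for illustration.
    """
    if cardinality < 1:
        raise ValueError("cardinality must be at least 1")
    return round(6 * cardinality ** 0.25)
```

So a feature with 10,000 distinct categories would get an embedding of roughly 60 dimensions, while one with 16 categories would get about 12.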

Key Advantages:
1. No manual feature engineering required.
2. Explicit feature crossing at each layer.
3. Highly memory-efficient.
4. Scalable to web-scale data.
5. Robust performance across different domains.

Thoughts on how this could transform digital advertising?
