Alex

alt2023

AI & ML interests

None yet

Recent Activity

reacted to singhsidhukuldeep's post with 👍 4 days ago
liked a dataset 9 months ago
froggeric/creativity

Organizations

None yet

alt2023's activity

reacted to singhsidhukuldeep's post with 👍 4 days ago
Excited to share groundbreaking research from @Baidu_Inc on enterprise information search! The team has developed EICopilot, a revolutionary agent-based solution that transforms how we explore enterprise data in large-scale knowledge graphs.

>> Technical Innovation
EICopilot uses Large Language Models to interpret natural language queries and automatically generate Gremlin scripts for enterprise data exploration. The system processes hundreds of millions of nodes and billions of edges in real time, handling complex enterprise relationships with remarkable precision.

Key Technical Components:
- Advanced data pre-processing pipeline that builds vector databases of representative queries
- Novel query masking strategy that significantly improves intent recognition
- Comprehensive reasoning pipeline combining Chain-of-Thought with In-context learning
- Named Entity Recognition and Natural Language Processing Customization for precise entity matching
- Schema Linking Module for efficient graph database query generation
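
As a rough illustration of the query masking and representative-query retrieval ideas listed above, the sketch below masks entity mentions and then picks the closest stored example. It is not EICopilot's implementation: the example queries, the Gremlin templates, the `[ENTITY]` placeholder, the function names, and the token-overlap score (standing in for a vector database lookup) are all assumptions made for illustration.

```python
import re

# Hypothetical representative queries paired with illustrative Gremlin templates.
# Neither the queries nor the templates are taken from EICopilot.
EXAMPLES = [
    ("who are the shareholders of [ENTITY]",
     "g.V().has('company','name',NAME).in('invests_in').values('name')"),
    ("which companies does [ENTITY] invest in",
     "g.V().has('company','name',NAME).out('invests_in').values('name')"),
]


def mask_entities(query, entities):
    """Lowercase the query and replace each known entity mention with [ENTITY]."""
    masked = query.lower()
    for entity in entities:
        masked = masked.replace(entity.lower(), "[ENTITY]")
    return masked


def tokens(text):
    """Split text into a set of word tokens, keeping [ENTITY] as one token."""
    return set(re.findall(r"\[ENTITY\]|\w+", text))


def jaccard(a, b):
    """Token-overlap similarity, a simple stand-in for embedding similarity."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query, entities):
    """Return the stored (masked query, template) pair closest to the masked input."""
    masked = mask_entities(query, entities)
    return max(EXAMPLES, key=lambda example: jaccard(masked, example[0]))


if __name__ == "__main__":
    example, template = retrieve("Who are the shareholders of Acme Corp?", ["Acme Corp"])
    print(example)
    print(template)
```

Masking before matching means the comparison depends on the wording of the question rather than on which company is named, which is one plausible reading of why masking would help intent recognition.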

>> Performance Metrics
The results are impressive: EICopilot achieves a syntax error rate as low as 10% and execution correctness up to 82.14%. The system handles 5000+ daily active users, demonstrating its robustness in real-world applications.

>> Implementation Details
The system uses Apache TinkerPop for graph database construction and employs sophisticated disambiguation processes, including anaphora resolution and entity retrieval. The architecture includes both offline and online phases, with continuous learning from user interactions to improve query accuracy.

Kudos to the research team from Baidu Inc., South China University of Technology, and other collaborating institutions for this significant advancement in enterprise information retrieval technology.
reacted to Severian's post with 👍 9 months ago
Create and Train Your Own Expert LLM: Generating Synthetic, Fact-Based Datasets with LMStudio/Ollama and then fine-tuning with MLX and Unsloth

Hey everyone!

I know there are tons of videos and tutorials out there already, but I've noticed a lot of questions popping up in community posts about using synthetic datasets for creative projects and how to transform personal content into more factual material. In my own work doing enterprise-level SFT and crafting my open-source models, I've enhanced a Python framework originally shared by the creator of the Tess models. This improved stack uses local language models and also integrates the Wikipedia dataset to keep the generated content as accurate and reliable as possible.

I've been thinking of putting together a comprehensive, step-by-step course/guide on creating your own Expert Language Model, from dataset preparation and training to deployment on Hugging Face, and even using something like AnythingLLM for user interaction. I'll walk you through each phase, clarifying complex concepts and troubleshooting common pitfalls.

Let me know if this interests you!

Most of the datasets and models I've made have been using these scripts and my approach
upvoted an article 10 months ago
Welcome Llama 3 - Meta's new open LLM

• 282
reacted to Xenova's post with 🔥 10 months ago
Introducing MusicGen Web: AI-powered music generation directly in your browser, built with 🤗 Transformers.js! 🎵

Everything runs 100% locally, meaning there are no calls to an API! 🤯 Since it's served as a static HF space, it costs $0 to host and run! 🔥

We also added the ability to share your generated music to the discussion tab, so give it a try! 👇
Xenova/musicgen-web