Sorbobot: Expert Finder Chatbot Documentation
Overview
Sorbobot is a chatbot designed for Sorbonne Université to help its administration locate academic experts within the university. This document outlines the structure, functionality, and implementation details of Sorbobot.
Context
Sorbobot centers around identifying experts with precision, avoiding confusion with individuals sharing similar names. It leverages HAL unique identifiers to distinguish between experts.
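To illustrate why identifiers matter, the following minimal sketch groups publications first by author name and then by identifier. The records, the field names (`hal_id`, `name`, `title`), and the identifier values are hypothetical and are not taken from Sorbobot's code or from HAL.

```python
from collections import defaultdict

# Hypothetical records: two different researchers share the same name.
publications = [
    {"hal_id": "author-001", "name": "Marie Dupont", "title": "Graph neural networks"},
    {"hal_id": "author-002", "name": "Marie Dupont", "title": "Medieval manuscripts"},
    {"hal_id": "author-001", "name": "Marie Dupont", "title": "Node embeddings"},
]


def group_titles(records, key):
    """Group publication titles by the given record field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["title"])
    return dict(groups)


by_name = group_titles(publications, "name")      # merges both researchers
by_hal_id = group_titles(publications, "hal_id")  # keeps them apart
```

Grouping by name yields a single merged profile, while grouping by identifier keeps the two researchers separate.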
System Architecture
Sorbobot is built as a Retrieval-Augmented Generation (RAG) system with two main steps:
- Retrieval: Identifies publications most similar to the user queries.
- Generation: Produces responses based on the context extracted from relevant publications.
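The two steps can be sketched as a small pipeline in which retrieval and generation are interchangeable functions. This is an illustrative skeleton only: `rag_answer`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the stand-in functions replace the real database search and language model.

```python
from typing import Callable, List


def rag_answer(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Run retrieval first, then pass the retrieved context to generation."""
    context = retrieve(query)
    return generate(query, context)


# Stand-ins for the real components (hypothetical, for illustration only).
def fake_retrieve(query: str) -> List[str]:
    return ["Abstract about " + query]


def fake_generate(query: str, context: List[str]) -> str:
    return f"{len(context)} publication(s) matched '{query}'."


reply = rag_answer("quantum optics", fake_retrieve, fake_generate)
```

Keeping the two steps behind plain function interfaces makes it possible to test each one separately.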
Implementation Details
Programming Language and Libraries
- Language: Python
- Frontend: Streamlit
- Database: PostgreSQL with pgvector for similarity search
- NLP Processing: langchain and GPT4all libraries
Database
- Postgres with pgvector: Used for storing data and performing similarity searches based on cosine similarity metrics.
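For reference, cosine distance is one minus cosine similarity, which is also what pgvector's `<=>` operator computes. The pure-Python function below is a sketch of the metric itself, not of Sorbobot's database code.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])   # close to 0
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])       # close to 1
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])        # close to 2
```

The distance depends only on the angle between the vectors, so vector length does not affect the ranking.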
Natural Language Processing
- Abstracts as Data Source: The chatbot utilizes publication abstracts to identify experts.
- GPT4all for Text Embedding: Converts publication abstracts into embedding vectors so that they can be compared with the user query by similarity search.
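As a rough sketch of the embedding step, the `gpt4all` Python package provides an `Embed4All` class whose `embed` method returns a vector for a piece of text. Whether Sorbobot calls it directly or through langchain, and which embedding model it uses, is not stated here, so the wrapper `embed_abstract` and the use of the default model are assumptions.

```python
from typing import List


def embed_abstract(text: str) -> List[float]:
    """Return an embedding vector for one abstract.

    Assumption: uses Embed4All's default model, which is downloaded on first use.
    """
    # Imported lazily so this module loads even where gpt4all is not installed.
    from gpt4all import Embed4All

    # A new embedder per call keeps the sketch short; real code would reuse one.
    embedder = Embed4All()
    return embedder.embed(text)
```

In practice the embedder would be created once and reused, since loading the model is the expensive part.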
Retrieval Process
- Query Processing: User queries are processed to extract key terms.
- Similarity Search: The system searches the database using pgvector to find publications with low cosine distance to the query.
- Expert Identification: The system identifies authors of these publications, ensuring unique identification of experts.
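The retrieval steps above can be sketched with an in-memory list standing in for the database. The table and column names in `HYPOTHETICAL_SQL`, the sample records, and the function `find_experts` are all assumptions made for illustration; only the `<=>` cosine-distance operator comes from pgvector itself.

```python
import math

# Hypothetical query shape; table and column names are not from Sorbobot.
HYPOTHETICAL_SQL = (
    "SELECT title, hal_id FROM publications "
    "ORDER BY embedding <=> %s LIMIT %s"
)

# In-memory stand-in for the publications table (made-up data).
PUBLICATIONS = [
    {"title": "Deep learning for protein folding", "hal_id": "author-001",
     "embedding": [0.9, 0.1, 0.0]},
    {"title": "Roman trade routes", "hal_id": "author-002",
     "embedding": [0.0, 0.2, 0.9]},
    {"title": "Neural networks in genomics", "hal_id": "author-003",
     "embedding": [0.8, 0.3, 0.1]},
]


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def find_experts(query_embedding, publications, k=2):
    """Rank publications by cosine distance and group the top k by author id."""
    ranked = sorted(
        publications,
        key=lambda pub: cosine_distance(query_embedding, pub["embedding"]),
    )
    experts = {}
    for pub in ranked[:k]:
        # Keyed by identifier, so authors with the same name stay separate.
        experts.setdefault(pub["hal_id"], []).append(pub["title"])
    return experts


experts = find_experts([1.0, 0.2, 0.0], PUBLICATIONS, k=2)
```

Here the two machine-learning publications are closest to the query vector, so their authors are returned and the unrelated publication is left out.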
Generation Process
- Context Extraction: Relevant information is extracted from the identified publications.
- Response Generation: Uses an LLM to generate informative responses based on the extracted context.
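One common way to pass extracted context to a language model is to place the retrieved abstracts in the prompt ahead of the question. The sketch below shows that idea only; the prompt wording, the `build_prompt` helper, and the `max_chars` truncation limit are assumptions, not Sorbobot's actual prompt.

```python
def build_prompt(query, abstracts, max_chars=500):
    """Assemble a prompt that lists numbered, truncated abstracts before the question."""
    context = "\n\n".join(
        f"[{i}] {abstract[:max_chars]}"
        for i, abstract in enumerate(abstracts, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


prompt = build_prompt(
    "Who works on graph learning?",
    ["We study graph neural networks.", "A" * 1000],
    max_chars=50,
)
```

Truncating each abstract keeps the prompt within the model's context window, at the cost of dropping detail from long abstracts.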
User Interaction Flow
- Query Submission: Users submit queries related to their expert search.
- Chatbot Processing: Sorbobot processes the query, retrieves relevant publications, and identifies experts.
- Response Presentation: The system presents a list of experts, including unique identifiers and relevant publication abstracts.
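The presentation step could be sketched as a small formatter that lists each expert with an identifier and the matching abstracts. The input structure, the field names, and the layout of the output are hypothetical; the real interface is rendered with Streamlit and may look different.

```python
def format_experts(experts):
    """Render a numbered list of experts with their ids and matching abstracts."""
    lines = []
    for rank, expert in enumerate(experts, start=1):
        lines.append(f"{rank}. {expert['name']} (HAL id: {expert['hal_id']})")
        for abstract in expert["abstracts"]:
            lines.append(f"   - {abstract}")
    return "\n".join(lines)


# Made-up example data.
sample = [
    {
        "name": "Marie Dupont",
        "hal_id": "author-001",
        "abstracts": ["We study graph neural networks."],
    }
]
report = format_experts(sample)
```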
Conclusion
Sorbobot streamlines the process of finding academic experts at Sorbonne Université. It combines embeddings of publication abstracts, a PostgreSQL database with pgvector similarity search, and a retrieval-generation pipeline to identify experts and distinguish them by their HAL identifiers.