
Sorbobot: Expert Finder Chatbot Documentation

Overview

Sorbobot is a chatbot designed for Sorbonne Université to help its administration locate academic experts within the university. This document outlines the structure, functionality, and implementation details of Sorbobot.

Context

Sorbobot's main requirement is to identify experts precisely, without confusing individuals who share the same or similar names. To do so, it relies on unique HAL identifiers to distinguish between experts.

System Architecture

Sorbobot operates on a Retrieval Augmented Generation (RAG) system, composed of two primary steps:

  1. Retrieval: Identifies publications most similar to the user queries.
  2. Generation: Produces responses based on the context extracted from relevant publications.
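The two steps above can be sketched as a single function that chains retrieval and generation. This is a minimal illustration, not Sorbobot's actual code: the names `answer`, `fake_retrieve`, and `fake_generate` are hypothetical, and the two stand-in functions replace the real database search and language model so that the sketch runs on its own.

```python
from typing import Callable, List


def answer(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Run the two RAG steps in order: retrieval, then generation."""
    # Step 1: fetch the publication abstracts most similar to the query.
    abstracts = retrieve(query)
    # Step 2: produce a response grounded in those abstracts.
    return generate(query, abstracts)


# Toy stand-ins (assumptions) so the sketch runs without a database or a model.
def fake_retrieve(query: str) -> List[str]:
    return [f"Abstract mentioning {query}"]


def fake_generate(query: str, abstracts: List[str]) -> str:
    return f"{len(abstracts)} publication(s) found for '{query}'."


result = answer("graph neural networks", fake_retrieve, fake_generate)
print(result)
```

Passing the two steps as callables keeps them independent, so either one could be swapped (for example, a different vector store or model) without touching the other.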

Implementation Details

Programming Language and Libraries

  • Language: Python
  • Frontend: Streamlit
  • Database: PostgreSQL with pgvector for similarity search
  • NLP: LangChain and GPT4All libraries

Database

  • Postgres with pgvector: Used for storing data and performing similarity searches based on cosine similarity metrics.
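As an illustration of what a cosine-based search looks like with pgvector, the sketch below builds a parameterized query using pgvector's `<=>` operator, which returns cosine distance (lower means more similar). The table name `publications` and the columns `id`, `abstract`, and `embedding` are assumptions, since the actual schema is not described here.

```python
from typing import Sequence, Tuple

# Hypothetical table and column names; the real schema is not shown in this document.
# `<=>` is pgvector's cosine distance operator, so ascending order = most similar first.
SIMILARITY_SQL = """
SELECT id, abstract
FROM publications
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_pgvector_literal(embedding: Sequence[float]) -> str:
    """Format a Python sequence in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_query_params(embedding: Sequence[float], top_k: int = 5) -> Tuple[str, int]:
    """Return the parameters to pass alongside SIMILARITY_SQL."""
    return (to_pgvector_literal(embedding), top_k)


params = build_query_params([0.1, 0.25, 1], top_k=3)
print(params)
```

With a DB-API driver that uses `%s` placeholders (psycopg, for instance), the query would then be run as `cursor.execute(SIMILARITY_SQL, params)`.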

Natural Language Processing

  • Abstracts as Data Source: The chatbot utilizes publication abstracts to identify experts.
  • GPT4All for Embeddings: Converts the text of authors' publications into embedding vectors, so that publications can be compared with a query by meaning rather than by exact wording.

Retrieval Process

  1. Query Processing: User queries are processed to extract key terms.
  2. Similarity Search: The system searches the database using pgvector to find publications with low cosine distance to the query.
  3. Expert Identification: The system identifies authors of these publications, ensuring unique identification of experts.
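The retrieval steps can be illustrated in pure Python. The sketch below ranks a few in-memory publications by cosine distance to a query vector, then collects their authors keyed by identifier rather than by name. It is a simplified stand-in for the database search: the record layout, the `hal_id` field name, the identifier values, and the two-dimensional vectors are all made up for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def find_experts(query_vec: Sequence[float], publications: List[dict], top_k: int) -> Dict[str, str]:
    """Map identifier -> name for the authors of the top_k closest publications."""
    ranked = sorted(publications, key=lambda p: cosine_distance(query_vec, p["embedding"]))
    experts: Dict[str, str] = {}
    for pub in ranked[:top_k]:
        for author in pub["authors"]:
            # Keying on the identifier keeps homonyms apart and merges
            # repeated appearances of the same person.
            experts.setdefault(author["hal_id"], author["name"])
    return experts


# Made-up data: two different people share the name "J. Martin".
publications = [
    {"embedding": [1.0, 0.0], "authors": [{"hal_id": "id-001", "name": "J. Martin"}]},
    {"embedding": [0.0, 1.0], "authors": [{"hal_id": "id-002", "name": "J. Martin"}]},
    {"embedding": [0.9, 0.1], "authors": [{"hal_id": "id-001", "name": "J. Martin"},
                                          {"hal_id": "id-003", "name": "A. Durand"}]},
]

experts = find_experts([1.0, 0.0], publications, top_k=3)
print(experts)
```

Because the result is keyed by identifier, the two authors named "J. Martin" stay separate entries, while the author who appears on two publications is listed only once.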

Generation Process

  1. Context Extraction: Relevant information is extracted from the identified publications.
  2. Response Generation: Uses an LLM to generate informative responses based on the extracted context.
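One common way to pass extracted context to a language model is to place the retrieved abstracts in the prompt ahead of the question. The sketch below shows that pattern only. The prompt wording, the `build_prompt` name, and the `max_chars` truncation limit are assumptions, not Sorbobot's actual prompt.

```python
from typing import List


def build_prompt(question: str, abstracts: List[str], max_chars: int = 500) -> str:
    """Assemble a prompt that asks the model to answer only from the abstracts."""
    context_lines = []
    for i, abstract in enumerate(abstracts, start=1):
        # Truncate long abstracts so the prompt stays within a size budget.
        context_lines.append(f"[{i}] {abstract[:max_chars]}")
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the publication abstracts below.\n\n"
        f"Abstracts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "Who works on protein folding?",
    ["Study of protein folding dynamics.", "Deep learning for structure prediction."],
)
print(prompt)
```

Numbering the abstracts makes it possible for a response to refer back to a specific publication, and the resulting string can be handed to whichever model performs the generation step.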

User Interaction Flow

  1. Query Submission: Users submit queries related to their expert search.
  2. Chatbot Processing: Sorbobot processes the query, retrieves relevant publications, and identifies experts.
  3. Response Presentation: The system presents a list of experts, including unique identifiers and relevant publication abstracts.

Conclusion

Sorbobot streamlines the search for academic experts at Sorbonne Université. It combines embedding-based retrieval over publication abstracts, stored in PostgreSQL with pgvector, with LLM-based response generation, and it relies on HAL identifiers to keep experts with similar names distinct.