---
title: Diabetes Assistant
emoji: 😻
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 4.26.0
app_file: app.py
pinned: false
license: cc-by-4.0
short_description: Multi-lingual Diabetes chatbot. Responses in text and audio.
---

# Project Title: Diabetes Assistant

## Objective

The objective of this project was to showcase what we each learned about large language models, translation applications, chatbots, Gradio, and Hugging Face.

## Sources

- ChatGPT
- Copilot
- Hugging Face
- Gradio
- OpenAI Whisper (https://openai.com/research/whisper)
- LangChain (https://www.langchain.com/)
- Amazon Polly (https://docs.aws.amazon.com/polly/latest/dg/what-is.html)
- Helsinki-NLP/opus-mt models (https://huggingface.co/Helsinki-NLP)

## Citations

This project utilizes models from the OPUS-MT project. We thank Jörg Tiedemann and Santhosh Thottingal for their work:

- Tiedemann, J., & Thottingal, S. (2020). OPUS-MT – Building open translation services for the World. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (pp. 479–480). European Association for Machine Translation. [https://aclanthology.org/2020.eamt-1.61](https://aclanthology.org/2020.eamt-1.61)
- Tiedemann, J. (2020). The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT. In Proceedings of the Fifth Conference on Machine Translation (pp. 1174–1182). Association for Computational Linguistics. [https://aclanthology.org/2020.wmt-1.139](https://aclanthology.org/2020.wmt-1.139)

## Method

L3-AI created an assistant that answers diabetes questions and, when needed, translates its responses into another language.

1. Transcription: Users can ask a diabetes-related question by speaking into the microphone, uploading an MP3 recording of the question, or typing it into the Hugging Face application.
   For questions submitted by voice or MP3 upload, we used openai/whisper-large to transcribe the audio into text.
2. LLM: Using LangChain's WikipediaLoader, we set up a large language model pipeline that draws on Wikipedia, retrieving the content related to the user's diabetes question.
3. Chatbot Response and Voice Over: L3-AI added a feature that lets the Hugging Face application read the LLM's response aloud as well as display it in writing. We used Amazon Polly to convert the written response to speech.
4. Translation: Helsinki-NLP opus-mt models were used to translate the response provided by the LLM.
5. Gradio: L3-AI used Gradio to organize each stage of the pipeline and present the responses of the four models in a single interface.
6. Hugging Face: Finally, L3-AI deployed the application to Hugging Face Spaces for speed and production hosting.

## Interface

https://huggingface.co/spaces/L3-AI/diabetes_assistant

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604cd9fda664781b225e0b6/jP1tHn0iF6NVWChSTxcr9.png)

## Learnings

Natural Language Processing (NLP):
- Gained insights into the NLP techniques and methodologies used to build our conversational agent.
- Learned about tokenization, language modeling, and how to improve speed during chatbot development.

Model Selection and Evaluation:
- Evaluated the models we used, such as the LLM and Amazon Polly, on how human-like their responses were.
- Compared model capabilities, including coherence, fluency, and the ability to stay on topic.
- Came to understand the strengths and limitations of each model in different conversational contexts.

Fine-tuning:
- Addressed issues such as speed and translation accuracy by tuning model parameters and configurations.
- Implemented strategies to mitigate challenges such as text truncation and limited language support, improving the overall user experience.
- Iterated on model architecture, hyperparameters, and data preprocessing techniques to reach the desired outcomes and user satisfaction.

Hugging Face:
- Learned the importance of a comprehensive requirements document that lists the dependencies, libraries, and configurations needed for Hugging Face model integration.
- Learned to avoid relying on Jupyter notebooks for production-level deployment because of their limitations in scalability, version control, and reproducibility.

Streamlit vs. Gradio:
- Recognized Streamlit's appeal for deployment, particularly its polished visuals and user interface elements.
- Chose Gradio for deployment because it fit the core functionality and focus of our model, putting model performance and functionality ahead of visual aesthetics.

## Opportunities and Next Steps

For the L3-AI concept design we focused on diabetes; in future work, expanding to other disease states would build on what we started. API restrictions limited which source material we could pull from.

## Credits

We would like to thank our pets, who kept us company as we coded and built this application.
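
## Appendix: Translation Sketch

The Translation step in the Method section and the text-truncation issue noted under Learnings can be illustrated with a small sketch. This is not the code in `app.py`; it is a minimal example under stated assumptions. The language table, the 400-character limit, and the helper names (`opus_mt_model_name`, `split_into_chunks`, `translate_long_text`) are all hypothetical. The only detail taken from the Helsinki-NLP collection is the `opus-mt-{source}-{target}` model naming pattern. The idea is to split a long response on sentence boundaries so that no chunk handed to the translation model is cut off partway.

```python
import re
from typing import Callable, Dict, List

# Hypothetical language table; the deployed Space may support a different set.
SUPPORTED_TARGETS: Dict[str, str] = {
    "Spanish": "es",
    "French": "fr",
    "German": "de",
}


def opus_mt_model_name(target_language: str, source_code: str = "en") -> str:
    """Build a model id following the Helsinki-NLP opus-mt-{src}-{tgt} pattern."""
    if target_language not in SUPPORTED_TARGETS:
        raise ValueError(f"Unsupported target language: {target_language}")
    return f"Helsinki-NLP/opus-mt-{source_code}-{SUPPORTED_TARGETS[target_language]}"


def split_into_chunks(text: str, max_chars: int = 400) -> List[str]:
    """Group whole sentences into chunks of up to max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk
    rather than being cut in the middle.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_long_text(
    text: str,
    translate_chunk: Callable[[str], str],
    max_chars: int = 400,
) -> str:
    """Translate text chunk by chunk and join the translated pieces."""
    return " ".join(translate_chunk(c) for c in split_into_chunks(text, max_chars))


if __name__ == "__main__":
    # Stand-in translator so the sketch runs without downloading a model.
    print(opus_mt_model_name("Spanish"))
    print(translate_long_text("Eat well. Stay active. Check glucose.", str.upper, 20))
```

In a real application, `translate_chunk` would wrap whichever translation model is loaded for the name returned by `opus_mt_model_name`; the stand-in `str.upper` above only demonstrates the chunking flow.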