metadata

title: KITT
emoji: 🦀
colorFrom: red
colorTo: gray
sdk: gradio
sdk_version: 4.36.1
app_file: space.py
pinned: false
license: mit

KITT: Knowledge-based Intelligence for Transportation Technologies

Presented at the IEEE Vehicular Networking Conference (VNC) 2024 as a demo titled "Demo: Towards a Conversational LLM-Based Voice Assistant for Transportation Applications".

Abstract

Conversational assistants based on large language models (LLMs) have spread widely across many domains, and the automotive industry is keen to follow suit.
However, current LLMs lack sufficient understanding of geospatial data; in addition, timely information, such as weather and traffic conditions, is inaccessible to them. In this demo, we present an in-car assistant that communicates verbally with the driver and, by using external APIs, answers questions about routing and points of interest while staying aware of local weather and traffic conditions. The assistant, including a customizable speech synthesizer, is accessible through a graphical user interface that facilitates experimentation by simulating changes in the time, origin, destination, and location of the car.

Description

This project integrates speech-to-text and text-to-speech into a car's infotainment system, using the LLaMA 3 model to process and respond to spoken queries from users. It uses Gradio for the user interface, NexusRaven for function calling, and external APIs to fetch real-time information, so the assistant can answer with current data instead of relying only on what the language model already knows.
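The flow described above (speech in, text through the language model, speech out) can be pictured as three chained stages. The following is a minimal sketch, not KITT's actual code: the `VoiceAssistant` class and the three `fake_*` functions are hypothetical stand-ins for the real speech recognizer, LLM, and speech synthesizer, so the example runs without any models installed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceAssistant:
    """Chains three stages; each stage is an injected callable (hypothetical design)."""
    transcribe: Callable[[bytes], str]   # speech-to-text
    respond: Callable[[str], str]        # language model, possibly with tool calls
    synthesize: Callable[[str], bytes]   # text-to-speech

    def handle(self, audio: bytes) -> bytes:
        question = self.transcribe(audio)
        answer = self.respond(question)
        return self.synthesize(answer)


# Stand-in stages (assumptions): they only shuffle bytes and text around.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(question: str) -> str:
    return f"You asked: {question}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


assistant = VoiceAssistant(fake_transcribe, fake_respond, fake_synthesize)
reply_audio = assistant.handle(b"What is the weather in Luxembourg?")
```

Keeping each stage behind a plain callable would make it possible to swap one model for another without touching the rest of the pipeline.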

Features

• Speech-to-Text and Text-to-Speech: Enables the car assistant to listen to spoken questions and respond audibly, providing a hands-free experience for drivers and passengers.
• Intelligent Function Calling with NexusRaven: Translates user queries into function calls that execute commands and retrieve information, working alongside the LLaMA 3 model.
• Dynamic Model Integration: Incorporates multiple models for language recognition, speech processing, and text generation.
• User-Friendly Gradio Interface: Provides an easy-to-use interface for testing and deploying the speaking assistant within the car's infotainment system.
• Real-Time Information Retrieval: Capable of integrating with various APIs to provide up-to-date information on weather, routes, points of interest, and more.
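To illustrate the function-calling feature, here is a minimal sketch of how a call produced by a function-calling model could be dispatched to a Python function. It is not KITT's implementation: the tool names `get_weather` and `find_route`, the `TOOLS` registry, and the assumption that the model emits a call string such as `find_route(origin="...", destination="...")` are all hypothetical. The tools return placeholder strings instead of querying a real API.

```python
import ast
from typing import Any, Callable, Dict


# Hypothetical tools; real ones would query weather or routing APIs.
def get_weather(location: str) -> str:
    return f"(weather for {location} would be fetched here)"


def find_route(origin: str, destination: str) -> str:
    return f"(route from {origin} to {destination} would be fetched here)"


TOOLS: Dict[str, Callable[..., Any]] = {
    "get_weather": get_weather,
    "find_route": find_route,
}


def dispatch(call_text: str) -> Any:
    """Parse 'name(key=literal, ...)' and invoke the matching registered tool."""
    node = ast.parse(call_text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple function call: {call_text!r}")
    name = node.func.id
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    if node.args or any(kw.arg is None for kw in node.keywords):
        raise ValueError("only explicit keyword arguments are supported in this sketch")
    # literal_eval accepts only Python literals, so no arbitrary code is run.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return TOOLS[name](**kwargs)


result = dispatch('find_route(origin="Luxembourg", destination="Paris")')
```

Parsing the call with `ast` and restricting it to a registry of known tools avoids passing model output to `eval`, which matters when the text comes from an LLM.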

Authors

• Sasan Jafarnejad
• Abigail Berthe--Pardo

License

KITT is released under the MIT License.