---
title: Language Identification
emoji: 🔥
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 3.44.4
app_file: app.py
pinned: false
---

This repository contains the code for audio transcription and language identification. The two tasks are combined in a single pipeline that stacks two models:
* XLM-RoBERTa (https://huggingface.co/dominguesm/xlm-roberta-base-lora-language-detection): language detection
* Whisper (https://huggingface.co/openai/whisper-large): transcription
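
As a rough illustration of how the two stages could be chained, here is a minimal sketch. It assumes that audio is transcribed first and that the language detector then runs on the transcript; this ordering is an assumption, not taken from `app.py`. The names `transcribe_then_identify` and `make_whisper_transcriber` are hypothetical helpers introduced for this example. The language detector is passed in as a plain callable because the linked checkpoint is a LoRA adapter and its loading code is not shown here.

```python
from typing import Callable, Dict


def transcribe_then_identify(
    audio_path: str,
    transcribe: Callable[[str], str],
    detect_language: Callable[[str], str],
) -> Dict[str, str]:
    """Run the two stages in sequence: audio -> text -> language label."""
    text = transcribe(audio_path)      # stage 1: speech to text (Whisper in this repo)
    language = detect_language(text)   # stage 2: classify the language of the transcript
    return {"text": text, "language": language}


def make_whisper_transcriber(
    model_name: str = "openai/whisper-large",
) -> Callable[[str], str]:
    """Wrap a Hugging Face ASR pipeline as a plain callable.

    Hypothetical helper: the model is downloaded on first use.
    """
    # Imported lazily so the rest of the sketch runs without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_name)
    return lambda path: asr(path)["text"]


if __name__ == "__main__":
    # Stand-in stages so the example runs offline; swap in real models in practice.
    def fake_transcribe(path: str) -> str:
        return "bonjour tout le monde"

    def fake_detect(text: str) -> str:
        return "French" if "bonjour" in text else "unknown"

    print(transcribe_then_identify("sample.wav", fake_transcribe, fake_detect))
```

Keeping each stage behind a simple callable means either model can be swapped or stubbed out for testing without touching the glue code.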

The Common-Language dataset (https://huggingface.co/datasets/common_language) was used for both tasks.

References to the specific code are included in the main `app.py` file.


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference