📖 Code Explanation: Text Language Detector

This document explains the Text Language Detector app, detailing each part of the provided code and intended use cases.


πŸ“ Overview

Purpose
Detect the language of input text (among the 20 languages the model supports) and return the full language name, the two-letter ISO 639-1 code, and a confidence score.

Tech Stack

  • Model: papluca/xlm-roberta-base-language-detection (Hugging Face Transformers)
  • Model Precision: torch_dtype=torch.bfloat16 for reduced memory usage
  • Language Mapping: pycountry to convert ISO codes to full language names
  • Interface: Gradio Blocks + Buttons

βš™οΈ Setup & Dependencies

Install required libraries:

pip install transformers gradio torch pycountry
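
To confirm that the four packages are present after installation, a small check like the one below can help. It is a sketch that uses only the Python standard library; the helper name `installed_versions` is hypothetical and not part of the app.

```python
from importlib import metadata

# Packages named in the pip command above
REQUIRED = ("transformers", "gradio", "torch", "pycountry")

def installed_versions(packages=REQUIRED):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```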

πŸ” Detailed Block-by-Block Code Explanation

import torch
import gradio as gr
from transformers import pipeline
import pycountry

# Load the language-detection pipeline with bfloat16 precision
language_detector = pipeline(
    "text-classification",
    model="papluca/xlm-roberta-base-language-detection",
    torch_dtype=torch.bfloat16
)

def detect_language(text: str) -> str:
    result = language_detector(text)[0]
    code = result["label"]      # e.g. "en", "ta", "fr"
    score = result["score"]

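    # Map the ISO 639-1 code to a full language name using pycountry.
    # Depending on the pycountry version, an unknown code either returns
    # None or raises KeyError, so fall back to the upper-cased code.
    try:
        lang = pycountry.languages.get(alpha_2=code).name
    except (AttributeError, LookupError):
        lang = code.upper()

    return f"{lang} ({code}) - {score:.2f}"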

# Build the Gradio interface
with gr.Blocks(theme=gr.themes.Default()) as demo:
    gr.Markdown("## 🌐 Text Language Detector")
    gr.Markdown("Type or paste text below to detect its language (name + code + confidence).")

    with gr.Row():
        text_input = gr.Textbox(label="📝 Input Text", placeholder="Type or paste text here...", lines=4)
        lang_output = gr.Textbox(label="✅ Detected Language", placeholder="Language & confidence", lines=1, interactive=False)

    detect_btn = gr.Button("🔍 Detect Language")
    detect_btn.click(fn=detect_language, inputs=text_input, outputs=lang_output)

    gr.Markdown("---")
    gr.Markdown("Built with 🤗 Transformers (`papluca/xlm-roberta-base-language-detection`), `pycountry`, and 🚀 Gradio")

demo.launch()
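
For a single input string, the pipeline returns a list holding one dictionary with a `label` (the language code) and a `score`. The sketch below isolates the formatting step of `detect_language` so it can be tried without downloading the model. The `FALLBACK_NAMES` table and the `format_detection` helper are illustrative assumptions rather than part of the app, which resolves names through `pycountry`; the sample scores are made up, and a plain hyphen is used as the separator.

```python
# Hypothetical subset of language names; the real app asks pycountry instead.
FALLBACK_NAMES = {"en": "English", "fr": "French", "de": "German"}

def format_detection(result: dict, names: dict = FALLBACK_NAMES) -> str:
    """Format one pipeline result such as {"label": "en", "score": 0.98}."""
    code = result["label"]
    # Fall back to the upper-cased code when no name is known,
    # mirroring what the app does when the name lookup fails.
    lang = names.get(code, code.upper())
    return f"{lang} ({code}) - {result['score']:.2f}"

# Shaped like the pipeline output for one input (scores are invented)
sample = [{"label": "fr", "score": 0.9931}]
print(format_detection(sample[0]))                      # French (fr) - 0.99
print(format_detection({"label": "sw", "score": 0.5}))  # SW (sw) - 0.50
```

Keeping the formatting separate from the model call also makes the output string easy to unit-test.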

🚀 Core Concepts

| Concept | Why It Matters |
|---|---|
| Hugging Face Pipeline | One-line model loading & inference |
| bfloat16 Precision | Lower memory usage, faster inference on supported hardware |
| pycountry Mapping | Converts ISO codes to human-readable language names |
| Gradio Blocks | Builds interactive web apps with pure Python |
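
The memory saving from bfloat16 follows from simple arithmetic: each weight occupies 2 bytes instead of the 4 used by float32. The sketch below works this out for an assumed parameter count of roughly 278 million, which is an approximation for an XLM-RoBERTa base checkpoint and not a figure stated in this document. It covers the weights only, not activations or framework overhead.

```python
# Bytes needed to store one parameter in each precision
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

def weight_memory_mib(num_params: int, dtype: str) -> float:
    """Approximate memory for the model weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)

ASSUMED_PARAMS = 278_000_000  # assumption, not measured from the checkpoint

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_mib(ASSUMED_PARAMS, dtype):.0f} MiB")
```

Whatever the true parameter count is, the bfloat16 figure is half the float32 one. Whether inference is also faster depends on hardware support for bfloat16.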

🔄 Intended Uses & Limitations

You can directly use this model as a language detector for sequence classification tasks. Currently, it supports the following 20 languages:

  • Arabic (ar)
  • Bulgarian (bg)
  • German (de)
  • Modern Greek (el)
  • English (en)
  • Spanish (es)
  • French (fr)
  • Hindi (hi)
  • Italian (it)
  • Japanese (ja)
  • Dutch (nl)
  • Polish (pl)
  • Portuguese (pt)
  • Russian (ru)
  • Swahili (sw)
  • Thai (th)
  • Turkish (tr)
  • Urdu (ur)
  • Vietnamese (vi)
  • Chinese (zh)
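
Because the classifier always picks one of these 20 labels, text in any other language is still assigned to one of them. One possible mitigation, which is not part of the app above, is to treat low-confidence predictions as unknown. The sketch below assumes the scores for all labels are available as a list of `label`/`score` dictionaries (the text-classification pipeline can return these when called with `top_k=None`). The `pick_language` helper, the 0.5 threshold, and the sample scores are illustrative choices.

```python
from typing import Optional

# The 20 language codes listed above
SUPPORTED = {
    "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja",
    "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh",
}

def pick_language(scores: list, threshold: float = 0.5) -> Optional[str]:
    """Return the top label, or None when confidence is below the threshold."""
    if not scores:
        return None
    best = max(scores, key=lambda item: item["score"])
    if best["label"] not in SUPPORTED or best["score"] < threshold:
        return None
    return best["label"]

# Made-up scores for illustration only
confident = [{"label": "en", "score": 0.97}, {"label": "nl", "score": 0.02}]
uncertain = [{"label": "pt", "score": 0.38}, {"label": "es", "score": 0.35}]
print(pick_language(confident))  # en
print(pick_language(uncertain))  # None
```

A threshold is only a heuristic: a classifier can still be confident on text in a language it was never trained on, so the cut-off would need tuning on real inputs.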