Legal-embedding-v1 (Vietnamese Legal Domain)

This model is a parameter-efficient fine-tuned (PEFT) version of Qwen/Qwen3-Embedding-8B specifically adapted for the Vietnamese Legal Domain. It uses LoRA (Low-Rank Adaptation) to capture the nuances of legal terminology and semantics in Vietnamese statutory documents.

Model Details

Model Description

  • Model type: LLM-based text embedding model (PEFT/LoRA adapter)
  • Language(s) (NLP): Vietnamese (vi)
  • Finetuned from model: Qwen/Qwen3-Embedding-8B
  • Domain: Law / Legal Systems of Vietnam

Uses

Direct Use

  • Semantic Search: Searching for relevant legal articles based on natural language queries.
  • RAG (Retrieval-Augmented Generation): Serving as the retrieval component for legal chatbots or AI assistants.
  • Legal Document Clustering: Grouping similar court cases or regulatory documents.
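
As an illustration of the semantic search use case, the sketch below ranks articles by cosine similarity to a query embedding. It is a minimal example, not the author's retrieval pipeline: the function names `cosine_similarity` and `rank_articles` are hypothetical, and the 3-dimensional vectors are toy values standing in for real model embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_articles(
    query_vec: Sequence[float],
    article_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (article index, score) pairs for the top_k most similar articles."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(article_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors for illustration only; real embeddings come from the model.
query = [1.0, 0.0, 0.0]
articles = [
    [0.0, 1.0, 0.0],  # unrelated article
    [0.9, 0.1, 0.0],  # closely related article
    [0.5, 0.5, 0.0],  # partially related article
]
top = rank_articles(query, articles, top_k=2)
```

In a RAG setup, the indices returned by such a ranking step would select the legal passages passed to the generator. For large corpora, a vector index would normally replace this linear scan.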

Out-of-Scope Use

  • General-purpose English text embedding (not optimized).
  • Direct text generation (this is an embedding model, not a chat model).

How to Get Started with the Model

You can use this model with the transformers and peft libraries as follows:

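from transformers import AutoModel, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

# Repository path on the Hugging Face Hub
model_id = "ngovanphuoc2006/Legal-embedding"

# Load the adapter config, then the base model and tokenizer it points to
config = PeftConfig.from_pretrained(model_id)
base_model = AutoModel.from_pretrained(config.base_model_name_or_path, trust_remote_code=True)
# Left padding keeps each sequence's final real token at position -1
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path, padding_side="left")

# Attach the LoRA adapter to the base model
# (call model.merge_and_unload() afterwards to actually merge the weights)
model = PeftModel.from_pretrained(base_model, model_id)
model.eval()

# Usage example. The inputs stay in Vietnamese, the model's target language:
# "Provisions on the crime of murder", "Penalties for intentionally causing injury"
sentences = ["Quy định về tội giết người", "Các hình phạt đối với hành vi cố ý gây thương tích"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    # Last-token pooling, as used by the Qwen3-Embedding base model. In a causal
    # model the first position sees only the first token, so [:, 0, :] is not a
    # sentence embedding. Assumption: the adapter was trained with this pooling.
    embeddings = outputs.last_hidden_state[:, -1, :]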
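
The snippet above encodes two sentences in a single call. With an 8B-parameter base model, tokenizing and encoding a whole corpus at once may exhaust memory, so a common pattern is to feed the texts in small batches. The helper below is a minimal sketch of that pattern; `iter_batches` is a hypothetical name and the article list is placeholder data.

```python
from typing import Iterator, List, Sequence


def iter_batches(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


# Placeholder corpus for illustration only.
articles = [f"Article {i}" for i in range(1, 8)]
batches = list(iter_batches(articles, batch_size=3))
```

Each yielded batch can be passed to the tokenizer and model in place of `sentences`, and the per-batch embeddings concatenated afterwards. A suitable batch size depends on available memory and sequence length.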