Legal-embedding-v1 (Vietnamese Legal Domain)

This model is a parameter-efficient fine-tuned (PEFT) version of Qwen/Qwen3-Embedding-8B, adapted to the Vietnamese legal domain. It uses LoRA (Low-Rank Adaptation) to capture legal terminology and semantics in Vietnamese statutory documents.

Model Details

Model Description

Model type: LLM-based text embedding model (PEFT/LoRA adapter)
Language(s) (NLP): Vietnamese (vi)
Finetuned from model: Qwen/Qwen3-Embedding-8B
Domain: Law / Legal Systems of Vietnam

Model Sources

Repository: https://huggingface.co/ngovanphuoc2006/Legal-embedding
Base Model Architecture: Qwen3 (8B)

Uses

Direct Use

Semantic Search: Searching for relevant legal articles based on natural language queries.
RAG (Retrieval-Augmented Generation): Serving as the retrieval component for legal chatbots or AI assistants.
Legal Document Clustering: Grouping similar court cases or regulatory documents.
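
As a rough illustration of the semantic search and RAG retrieval use cases, the sketch below ranks a small corpus against a query by cosine similarity. The vectors are toy 3-dimensional stand-ins for real embeddings, and the names cosine_similarity, top_k, and the article labels are hypothetical, not part of this repository.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, corpus, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors standing in for embeddings of legal articles (hypothetical labels).
corpus = {
    "article_a": [0.9, 0.1, 0.0],
    "article_b": [0.1, 0.9, 0.0],
    "article_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]  # stand-in for an embedded natural language query

results = top_k(query, corpus, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

In a real pipeline the corpus vectors would be computed once from the legal articles and stored in a vector index, with only the query embedded at search time.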

Out-of-Scope Use

General-purpose English text embedding (not optimized).
Direct text generation (this is an embedding model, not a chat model).

How to Get Started with the Model

You can use this model with the transformers and peft libraries as follows:

from transformers import AutoModel, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

# Repository path
model_id = "ngovanphuoc2006/Legal-embedding"

# Load the adapter config, the base model, and the tokenizer
config = PeftConfig.from_pretrained(model_id)
base_model = AutoModel.from_pretrained(config.base_model_name_or_path, trust_remote_code=True)
# Left padding keeps each sequence's final token at position -1, which last-token pooling relies on
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path, padding_side="left")

# Attach the LoRA adapter to the base model (this does not merge the weights)
model = PeftModel.from_pretrained(base_model, model_id)

# Usage example
# (English: "Regulations on the crime of murder", "Penalties for intentionally causing injury")
sentences = ["Quy định về tội giết người", "Các hình phạt đối với hành vi cố ý gây thương tích"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    # The base Qwen3-Embedding model pools the last token's hidden state, not a CLS token or the mean
    embeddings = outputs.last_hidden_state[:, -1, :]
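
The base Qwen3-Embedding models pool the hidden state of each sequence's final real token, so the pooling step has to account for padding. The following is a minimal sketch of that pooling which works for either padding side, assuming this adapter keeps the base model's pooling scheme. The helper last_token_pool and the toy tensors are illustrative and not part of this repository.

```python
import torch
import torch.nn.functional as F

def last_token_pool(last_hidden_state, attention_mask):
    """Pick the hidden state of each sequence's last non-padding token."""
    # With left padding, every sequence ends at the final position.
    left_padded = bool(attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padded:
        return last_hidden_state[:, -1]
    # With right padding, index each row at (number of real tokens - 1).
    lengths = attention_mask.sum(dim=1) - 1
    rows = torch.arange(last_hidden_state.shape[0])
    return last_hidden_state[rows, lengths]

# Toy batch: 2 sequences, 3 positions, hidden size 2 (second row is right-padded).
hidden = torch.tensor([[[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]],
                       [[1.0, 1.0], [0.0, 2.0], [9.0, 9.0]]])
mask = torch.tensor([[1, 1, 1],
                     [1, 1, 0]])

pooled = last_token_pool(hidden, mask)
embeddings = F.normalize(pooled, p=2, dim=1)  # unit length, so dot product equals cosine
similarity = embeddings @ embeddings.T
```

With real model outputs, `hidden` would be `outputs.last_hidden_state` and `mask` would be `inputs["attention_mask"]`. Normalizing first means the matrix product gives pairwise cosine similarities directly.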
