Feature Extraction
PEFT
Safetensors
sentence-transformers
Vietnamese
legal
vietnamese
sentence-similarity
lo-ra
Legal-embedding-v1 (Vietnamese Legal Domain)
This model is a parameter-efficient fine-tuned (PEFT) version of Qwen/Qwen3-Embedding-8B, adapted to the Vietnamese legal domain. It uses LoRA (Low-Rank Adaptation) to capture legal terminology and semantics in Vietnamese statutory documents.
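For readers new to LoRA, the general formulation of the method (not something taken from this repository) keeps each adapted weight matrix frozen and learns a low-rank correction on top of it. The rank r and scaling factor alpha used for this adapter are not stated in the card, so they stay symbolic here:

```latex
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)

% Trainable parameters per adapted matrix: r(d + k) instead of dk.
% Illustrative values only (assumed, not read from this adapter): d = k = 4096, r = 16
%   r(d + k) = 16 \cdot 8192 = 131{,}072
%   dk       = 4096 \cdot 4096 = 16{,}777{,}216
%   ratio    = 131{,}072 / 16{,}777{,}216 \approx 0.78\%
```

Only A and B are stored in the adapter, which is why the repository is much smaller than the 8B base model and must be loaded on top of it.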
Model Details
Model Description
- Model type: Large Language Model based Embedding (PEFT/LoRA)
- Language(s) (NLP): Vietnamese (vi)
- Finetuned from model: Qwen/Qwen3-Embedding-8B
- Domain: Law / Legal Systems of Vietnam
Model Sources
- Repository: https://huggingface.co/ngovanphuoc2006/Legal-embedding
- Base Model Architecture: Qwen 3 (8B)
Uses
Direct Use
- Semantic Search: Searching for relevant legal articles based on natural language queries.
- RAG (Retrieval-Augmented Generation): Serving as the retrieval component for legal chatbots or AI assistants.
- Legal Document Clustering: Grouping similar court cases or regulatory documents.
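As a rough illustration of the semantic search and RAG retrieval uses listed above, the sketch below ranks documents by cosine similarity to a query. The three-dimensional toy vectors stand in for real embeddings, and `cosine_similarity` and `top_k` are hypothetical helper names introduced for this example, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return (document index, score) pairs for the k most similar documents."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]

# Toy vectors standing in for embeddings of three legal articles (assumed data).
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.8, 0.3, 0.1],
]
# Toy vector standing in for the embedding of a user query (assumed data).
query_vec = [1.0, 0.2, 0.0]

ranking = top_k(query_vec, doc_vecs, k=2)
print(ranking)
```

In a real pipeline the vectors would come from the model, and a vector index would typically replace the linear scan once the corpus grows beyond a few thousand articles.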
Out-of-Scope Use
- General-purpose English text embedding (not optimized).
- Direct text generation (this is an embedding model, not a chat model).
How to Get Started with the Model
You can use this model with the transformers and peft libraries as follows:
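from transformers import AutoModel, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

# Repository ID of the adapter
model_id = "ngovanphuoc2006/Legal-embedding"

# Load the adapter config, then the base model and tokenizer it points to
config = PeftConfig.from_pretrained(model_id)
base_model = AutoModel.from_pretrained(config.base_model_name_or_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Attach the LoRA adapter to the base model. This does not merge the weights;
# call model.merge_and_unload() afterwards if a merged model is needed.
model = PeftModel.from_pretrained(base_model, model_id)

# Example usage. The sentences mean "Regulations on the crime of murder" and
# "Penalties for intentionally causing injury".
sentences = ["Quy định về tội giết người", "Các hình phạt đối với hành vi cố ý gây thương tích"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Take the embedding from the last hidden state (here the first token; mean pooling is a
# common alternative). Note: the base Qwen3-Embedding usage example pools the last
# non-padding token instead, so confirm which pooling this adapter expects.
embeddings = outputs.last_hidden_state[:, 0, :]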
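The pooling step deserves a closer look. The published usage example for the base Qwen3-Embedding models pools the hidden state of the last non-padding token and L2-normalizes it. Whether this adapter was trained with the same pooling is not stated here, so the following is a sketch to verify against your own retrieval results rather than the author's method. `last_token_pool` is a helper name chosen for illustration, and the tensors are synthetic stand-ins for real model outputs.

```python
import torch
import torch.nn.functional as F

def last_token_pool(last_hidden_state, attention_mask):
    """Return the hidden state of the last non-padding token of each sequence."""
    # With left padding, every sequence ends at the final position.
    left_padded = bool(attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padded:
        return last_hidden_state[:, -1]
    # With right padding, index each row at (number of real tokens - 1).
    last_index = attention_mask.sum(dim=1) - 1
    rows = torch.arange(last_hidden_state.shape[0], device=last_hidden_state.device)
    return last_hidden_state[rows, last_index]

# Synthetic stand-ins: batch of 2 sequences, length 3, hidden size 4.
hidden = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)
mask = torch.tensor([[1, 1, 0],   # right-padded: last real token at position 1
                     [1, 1, 1]])  # no padding: last real token at position 2

pooled = last_token_pool(hidden, mask)
# Unit-length vectors, so a dot product equals cosine similarity.
embeddings = F.normalize(pooled, p=2, dim=1)
similarity = embeddings @ embeddings.T
```

With real outputs, the same helper would be called as `last_token_pool(outputs.last_hidden_state, inputs["attention_mask"])`.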