Text Classification
Transformers
Joblib
Safetensors
multilingual
binary-classification
amis
agriculture
AMIS Commodity Classifier
This model repository contains artifacts from an AMIS commodity relevance classifier training run. It includes the Transformer model, any configured TF-IDF or sentence-embedding baselines, prediction files, and the training report.
- Dataset: `faodl/amis-agri-trade-pri-sec`
- Dataset subset: (empty)
- Text column: `chunk_text`
- Label column: `label`
- Transformer: `FacebookAI/xlm-roberta-base`
- Generated at: 2026-05-18T17:47:01.228362+00:00
Dataset Summary
| Split | Rows | Label 0 | Label 1 | Unique groups | Mean text length |
|---|---|---|---|---|---|
| train | 4799 | 2363 | 2436 | 2483 | 695.5 |
| validation | 1009 | 462 | 547 | 532 | 698.1 |
| test | 1017 | 529 | 488 | 533 | 694.6 |
Threshold Comparison on Test Split
| Model | Threshold | Accuracy | Precision | Recall | F1 | ROC AUC | Average precision |
|---|---|---|---|---|---|---|---|
| logistic_tfidf | 0.500 | 0.738 | 0.736 | 0.709 | 0.722 | 0.838 | 0.815 |
| logistic_tfidf | 0.396 | 0.744 | 0.674 | 0.904 | 0.772 | 0.838 | 0.815 |
| xgboost_tfidf | 0.500 | 0.762 | 0.786 | 0.693 | 0.736 | 0.847 | 0.816 |
| xgboost_tfidf | 0.305 | 0.752 | 0.685 | 0.895 | 0.776 | 0.847 | 0.816 |
| embedding-logistic_sentence_embeddings | 0.500 | 0.790 | 0.750 | 0.842 | 0.793 | 0.881 | 0.863 |
| embedding-logistic_sentence_embeddings | 0.315 | 0.771 | 0.698 | 0.922 | 0.794 | 0.881 | 0.863 |
| embedding-svm_sentence_embeddings | 0.500 | 0.788 | 0.742 | 0.855 | 0.794 | 0.883 | 0.865 |
| embedding-svm_sentence_embeddings | 0.453 | 0.796 | 0.735 | 0.900 | 0.809 | 0.883 | 0.865 |
| embedding-lightgbm_sentence_embeddings | 0.500 | 0.782 | 0.744 | 0.832 | 0.785 | 0.880 | 0.867 |
| embedding-lightgbm_sentence_embeddings | 0.148 | 0.759 | 0.685 | 0.922 | 0.786 | 0.880 | 0.867 |
| transformer | 0.500 | 0.837 | 0.786 | 0.906 | 0.842 | 0.919 | 0.913 |
| transformer | 0.383 | 0.837 | 0.771 | 0.939 | 0.847 | 0.919 | 0.913 |
Confusion Matrices on Test Split
Rows are true labels and columns are predicted labels.
logistic_tfidf at threshold 0.500
| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 405 | 124 |
| RELEVANT | 142 | 346 |
logistic_tfidf at threshold 0.396
| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 316 | 213 |
| RELEVANT | 47 | 441 |
xgboost_tfidf at threshold 0.500
| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 437 | 92 |
| RELEVANT | 150 | 338 |
xgboost_tfidf at threshold 0.305
| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 328 | 201 |
| RELEVANT | 51 | 437 |
embedding-logistic_sentence_embeddings at threshold 0.500
| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 392 | 137 |
| RELEVANT | 77 | 411 |
embedding-logistic_sentence_embeddings at threshold 0.315
| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 334 | 195 |
| RELEVANT | 38 | 450 |
embedding-svm_sentence_embeddings at threshold 0.500
| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 384 | 145 |
| RELEVANT | 71 | 417 |
embedding-svm_sentence_embeddings at threshold 0.453
| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 371 | 158 |
| RELEVANT | 49 | 439 |
embedding-lightgbm_sentence_embeddings at threshold 0.500
| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 389 | 140 |
| RELEVANT | 82 | 406 |
embedding-lightgbm_sentence_embeddings at threshold 0.148
| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 322 | 207 |
| RELEVANT | 38 | 450 |
transformer at threshold 0.500
| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 409 | 120 |
| RELEVANT | 46 | 442 |
transformer at threshold 0.383
| True / Predicted | NOT_RELEVANT | RELEVANT |
|---|---|---|
| NOT_RELEVANT | 393 | 136 |
| RELEVANT | 30 | 458 |
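The accuracy, precision, recall, and F1 values in the threshold comparison table follow directly from these confusion matrices. The sketch below recomputes them for the two `transformer` matrices. The helper name `metrics_from_confusion` is illustrative and not part of the repository; the counts are copied from the tables above.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Return metrics for the positive (RELEVANT) class from raw counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Rows are true labels, columns are predicted labels:
# [[TN, FP], [FN, TP]] with NOT_RELEVANT first.
transformer_default = metrics_from_confusion(tn=409, fp=120, fn=46, tp=442)  # threshold 0.500
transformer_tuned = metrics_from_confusion(tn=393, fp=136, fn=30, tp=458)    # threshold 0.383

for name, metrics in [("0.500", transformer_default), ("0.383", transformer_tuned)]:
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Lowering the threshold moves 16 NOT_RELEVANT rows into the RELEVANT column and recovers 16 RELEVANT rows, so accuracy is unchanged while recall rises and precision falls.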
Validation-Tuned Thresholds
- `logistic_tfidf`: threshold 0.396 (validation F1 0.811); test F1 change vs 0.5: +0.050.
- `xgboost_tfidf`: threshold 0.305 (validation F1 0.813); test F1 change vs 0.5: +0.040.
- `embedding-logistic_sentence_embeddings`: threshold 0.315 (validation F1 0.859); test F1 change vs 0.5: +0.001.
- `embedding-svm_sentence_embeddings`: threshold 0.453 (validation F1 0.861); test F1 change vs 0.5: +0.015.
- `embedding-lightgbm_sentence_embeddings`: threshold 0.148 (validation F1 0.866); test F1 change vs 0.5: +0.001.
- `transformer`: threshold 0.383 (validation F1 0.874); test F1 change vs 0.5: +0.005.
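The report does not spell out how each threshold was selected beyond reporting its validation F1. A minimal sketch of one common approach, assuming a simple grid search that keeps the threshold with the highest validation F1, is shown below. The function names, the candidate grid, and the toy labels and probabilities are all made up for illustration and are not taken from the training code or the dataset.

```python
def f1_at_threshold(labels, probabilities, threshold):
    """F1 for the positive class when predicting 1 if probability >= threshold."""
    tp = sum(1 for y, p in zip(labels, probabilities) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(labels, probabilities) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probabilities) if y == 1 and p < threshold)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def tune_threshold(labels, probabilities, candidates=None):
    """Return (threshold, f1) for the first candidate with the highest F1."""
    if candidates is None:
        # Assumed grid: 0.001, 0.002, ..., 0.999.
        candidates = [step / 1000 for step in range(1, 1000)]
    best_threshold, best_f1 = 0.5, -1.0
    for threshold in candidates:
        score = f1_at_threshold(labels, probabilities, threshold)
        if score > best_f1:
            best_threshold, best_f1 = threshold, score
    return best_threshold, best_f1


# Toy validation data (hypothetical, for illustration only).
val_labels = [0, 0, 1, 0, 1, 1, 0, 1]
val_probabilities = [0.10, 0.35, 0.40, 0.45, 0.55, 0.70, 0.20, 0.90]

default_f1 = f1_at_threshold(val_labels, val_probabilities, 0.5)
best_threshold, best_f1 = tune_threshold(val_labels, val_probabilities)
print({"default_f1": default_f1, "best_threshold": best_threshold, "best_f1": best_f1})
```

Because the threshold is chosen on the validation split, the gain on the test split can be smaller than the validation gain, which matches the small test F1 changes reported for the embedding and Transformer models.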
Artifacts
- `logistic_tfidf`: `/content/agri-trade-classifier/baselines/logistic`
- `xgboost_tfidf`: `/content/agri-trade-classifier/baselines/xgboost`
- `embedding-logistic_sentence_embeddings`: `/content/agri-trade-classifier/baselines/embedding-logistic`
- `embedding-svm_sentence_embeddings`: `/content/agri-trade-classifier/baselines/embedding-svm`
- `embedding-lightgbm_sentence_embeddings`: `/content/agri-trade-classifier/baselines/embedding-lightgbm`
- `transformer`: `/content/agri-trade-classifier/transformer`
Inference
Install the runtime dependencies:
```shell
pip install transformers torch huggingface_hub pandas joblib scikit-learn xgboost sentence-transformers lightgbm
```
Transformer
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "faodl/agri-trade-classifier"

texts = [
    "Rice export prices increased after new procurement rules were announced.",
    "The finance ministry released its monthly fuel tax bulletin.",
]

# The fine-tuned Transformer is stored in the "transformer" subfolder of the repository.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, subfolder="transformer")
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, subfolder="transformer")

# Use the threshold stored in the model config if one is present; otherwise fall back to 0.5.
threshold = float(getattr(model.config, "threshold", 0.5))

encoded = tokenizer(
    texts,
    truncation=True,
    padding=True,
    max_length=256,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**encoded).logits

# Probability of the positive class (index 1).
probabilities = torch.softmax(logits, dim=-1)[:, 1].tolist()

for text, probability in zip(texts, probabilities):
    label = model.config.id2label[int(probability >= threshold)]
    print({"text": text, "probability_positive": probability, "label": label})
```
TF-IDF Baselines
Available baseline names in this run: "logistic", "xgboost".
```python
import json

import joblib
from huggingface_hub import hf_hub_download

MODEL_ID = "faodl/agri-trade-classifier"
BASELINE = "logistic"

texts = [
    "Maize production forecasts were revised after delayed rains.",
    "The central bank published new exchange rate statistics.",
]

# Download the fitted TF-IDF pipeline and the machine-readable report.
model_path = hf_hub_download(
    repo_id=MODEL_ID,
    repo_type="model",
    filename=f"baselines/{BASELINE}/{BASELINE}_tfidf.joblib",
)
report_path = hf_hub_download(
    repo_id=MODEL_ID,
    repo_type="model",
    filename="report.json",
)

pipeline = joblib.load(model_path)
with open(report_path, encoding="utf-8") as handle:
    report = json.load(handle)

# Look up the validation-tuned threshold for the selected baseline.
threshold = next(
    result["validation_best_threshold"]["threshold"]
    for result in report["results"]
    if result["model_type"] == f"{BASELINE}_tfidf"
)

probabilities = pipeline.predict_proba(texts)[:, 1]

for text, probability in zip(texts, probabilities):
    label = "RELEVANT" if probability >= threshold else "NOT_RELEVANT"
    print({"text": text, "probability_positive": float(probability), "label": label})
```
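The threshold lookup above uses `next(...)` over `report["results"]`, which raises `StopIteration` when no entry matches. The sketch below shows a more defensive variant that falls back to a default. The helper `threshold_for` and the `example_report` dictionary are hypothetical: the dictionary only mirrors the fields used in the snippet above (`results`, `model_type`, `validation_best_threshold`), and the real `report.json` may contain additional fields. Using `"transformer"` as a `model_type` value is an assumption based on the model names in the comparison table.

```python
def threshold_for(report, model_type, default=0.5):
    """Return the validation-tuned threshold for model_type, or default if absent."""
    for result in report.get("results", []):
        if result.get("model_type") == model_type:
            return float(result["validation_best_threshold"]["threshold"])
    return default


# Hypothetical stand-in for the parsed report.json; thresholds copied from this card.
example_report = {
    "results": [
        {"model_type": "logistic_tfidf", "validation_best_threshold": {"threshold": 0.396}},
        {"model_type": "transformer", "validation_best_threshold": {"threshold": 0.383}},
    ]
}

print(threshold_for(example_report, "logistic_tfidf"))
print(threshold_for(example_report, "transformer"))
print(threshold_for(example_report, "missing_model"))
```

The same helper could be applied to the `report` object loaded above, for example `threshold_for(report, f"{BASELINE}_tfidf")`.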
Sentence-Embedding Baselines
Available embedding baseline names in this run: "embedding-logistic", "embedding-svm", "embedding-lightgbm".
```python
import joblib
from huggingface_hub import hf_hub_download
from sentence_transformers import SentenceTransformer

MODEL_ID = "faodl/agri-trade-classifier"
BASELINE = "embedding-logistic"

texts = [
    "Wheat export inspections rose as demand from importers increased.",
    "The sports ministry announced a new stadium renovation plan.",
]

# Download the baseline artifact: a dict holding the classifier and its embedding settings.
model_path = hf_hub_download(
    repo_id=MODEL_ID,
    repo_type="model",
    filename=f"baselines/{BASELINE}/{BASELINE}.joblib",
)
artifact = joblib.load(model_path)

# Encode the texts with the embedding model recorded in the artifact.
embedding_model = SentenceTransformer(artifact["embedding_model_name"])
embeddings = embedding_model.encode(
    texts,
    batch_size=artifact.get("embedding_batch_size", 64),
    convert_to_numpy=True,
    normalize_embeddings=artifact.get("normalize_embeddings", True),
)

probabilities = artifact["classifier"].predict_proba(embeddings)[:, 1]
threshold = artifact["validation_best_threshold"]["threshold"]

for text, probability in zip(texts, probabilities):
    label = "RELEVANT" if probability >= threshold else "NOT_RELEVANT"
    print({"text": text, "probability_positive": float(probability), "label": label})
```
Files
- `REPORT.md`: Markdown report for this training run.
- `report.json`: Machine-readable report containing metrics and thresholds.
- `transformer/`: Fine-tuned Transformer artifacts, when Transformer training is enabled.
- `baselines/`: TF-IDF and sentence-embedding baseline artifacts, when baseline training is enabled.
- `*/validation_predictions.csv` and `*/test_predictions.csv`: Split-level predictions.