RF 48 Sectors Classification Model
Overview
This model is a Random Forest classifier that assigns a dataset to one of 48 predefined sectors based on its column names. The column names are embedded with BERT (bert-base-uncased), and the resulting vectors are passed to the Random Forest for prediction.
Model Details
- Model Type: Random Forest Classifier
- Embedding Method: BERT (bert-base-uncased)
- Number of Sectors: 48
- Classification Approach: Column name embedding and prediction
48 Supported Sectors
The model can classify datasets into the following 48 sectors, listed here under 11 broader groups:
Agriculture Sector
- Crop Production
- Livestock Farming
- Agricultural Equipment
- Agri-tech
Banking & Finance Sector
- Retail Banking
- Corporate Banking
- Investment Banking
- Digital Banking
- Asset Management
- Securities & Investments
- Financial Planning & Advice
Construction & Infrastructure
- Residential Construction
- Commercial Construction
- Industrial Construction
- Infrastructure
Consulting Sector
- Management Consulting
- IT Consulting
- Human Resources Consulting
- Legal Consulting
Education Sector
- Early Childhood Education
- Primary & Secondary Education
- Higher Education
- Adult Education & Vocational Training
Engineering Sector
- Civil Engineering
- Mechanical Engineering
- Electrical Engineering
- Chemical Engineering
Entertainment & Media
- Film & Television
- Music Industry
- Video Games
- Live Events
Environmental Sector
- Environmental Protection
- Waste Management
- Renewable Energy
- Wildlife Conservation
Insurance Sector
- General Insurance Services
- Life Insurance
- Health Insurance
- Property & Casualty Insurance
- Reinsurance
Food Industry
- Food Processing
- Food Retail
- Food Services
- Food Safety & Quality Control
Healthcare Sector
- Hospitals
- Clinics & Outpatient Care
- Pharmaceuticals
- Medical Equipment & Supplies
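The grouping above can be captured as a lookup table, which is handy for rolling a predicted sector up to its broader group. The following is a minimal sketch: the names `SECTOR_GROUPS`, `PARENT_OF`, and `parent_group` are hypothetical, and it assumes the labels returned by the model's label encoder match the strings listed above exactly, which has not been verified.

```python
# Groups and their sectors, copied from the list above.
SECTOR_GROUPS = {
    "Agriculture Sector": [
        "Crop Production", "Livestock Farming", "Agricultural Equipment", "Agri-tech",
    ],
    "Banking & Finance Sector": [
        "Retail Banking", "Corporate Banking", "Investment Banking", "Digital Banking",
        "Asset Management", "Securities & Investments", "Financial Planning & Advice",
    ],
    "Construction & Infrastructure": [
        "Residential Construction", "Commercial Construction",
        "Industrial Construction", "Infrastructure",
    ],
    "Consulting Sector": [
        "Management Consulting", "IT Consulting",
        "Human Resources Consulting", "Legal Consulting",
    ],
    "Education Sector": [
        "Early Childhood Education", "Primary & Secondary Education",
        "Higher Education", "Adult Education & Vocational Training",
    ],
    "Engineering Sector": [
        "Civil Engineering", "Mechanical Engineering",
        "Electrical Engineering", "Chemical Engineering",
    ],
    "Entertainment & Media": [
        "Film & Television", "Music Industry", "Video Games", "Live Events",
    ],
    "Environmental Sector": [
        "Environmental Protection", "Waste Management",
        "Renewable Energy", "Wildlife Conservation",
    ],
    "Insurance Sector": [
        "General Insurance Services", "Life Insurance", "Health Insurance",
        "Property & Casualty Insurance", "Reinsurance",
    ],
    "Food Industry": [
        "Food Processing", "Food Retail", "Food Services",
        "Food Safety & Quality Control",
    ],
    "Healthcare Sector": [
        "Hospitals", "Clinics & Outpatient Care", "Pharmaceuticals",
        "Medical Equipment & Supplies",
    ],
}

# Reverse index: sector name -> group name.
PARENT_OF = {
    sector: group
    for group, sectors in SECTOR_GROUPS.items()
    for sector in sectors
}


def parent_group(sector):
    """Return the group a sector label belongs to, or None if it is unknown."""
    return PARENT_OF.get(sector.strip())


print(len(PARENT_OF))               # 48
print(parent_group("Reinsurance"))  # Insurance Sector
```

If the encoder's labels turn out to be spelled differently, only the strings in `SECTOR_GROUPS` need adjusting.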
Installation
pip install transformers torch joblib scikit-learn huggingface_hub
Usage
from huggingface_hub import hf_hub_download
from transformers import BertTokenizer, BertModel
import joblib
import torch
# Initialize model
bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_model = BertModel.from_pretrained('bert-base-uncased', ignore_mismatched_sizes=True)
# Download and load the Random Forest model
model_path = hf_hub_download(repo_id="Mageswaran/rf_48_sectors", filename="model_48_sectors.pkl")
label_encoder_path = hf_hub_download(repo_id="Mageswaran/rf_48_sectors", filename="label_encoder_48_sectors.pkl")
rf = joblib.load(model_path)
label_encoder = joblib.load(label_encoder_path)
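def get_bert_embeddings(texts):
    # NOTE: this helper was not defined in the original snippet. Mean pooling
    # over the last hidden state is an ASSUMPTION; the pooling must match
    # whatever was used when the Random Forest was trained.
    inputs = bert_tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        outputs = bert_model(**inputs)
    # Average token vectors, ignoring padding positions
    mask = inputs['attention_mask'].unsqueeze(-1).float()
    pooled = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
    return pooled.numpy()

def predict_sector(column_names):
    # Convert the comma-separated column names to a BERT embedding
    embeddings = get_bert_embeddings([column_names])
    # Predict the encoded class, then map it back to the sector name
    prediction = rf.predict(embeddings)
    return label_encoder.inverse_transform(prediction)[0]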
# Example
column_names = "clinical_trail_duration, computer_analysis_score, customer_feedback_score"
sector = predict_sector(column_names)
print(f"Predicted Sector: {sector}")
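`predict_sector` expects a single comma-separated string of column names. When the names come from a file, a sketch like the following could build that string from a CSV header using only the standard library. The helper name `column_names_from_csv` is hypothetical, and joining with ", " simply mirrors the example above; whether the model was trained on that exact format is an assumption.

```python
import csv
import io


def column_names_from_csv(file_obj):
    """Read the header row of a CSV file object and join it into one string."""
    reader = csv.reader(file_obj)
    header = next(reader, [])  # empty list if the file has no rows
    # Strip surrounding whitespace and drop blank header cells.
    names = [name.strip() for name in header if name.strip()]
    return ", ".join(names)


# Usage with an in-memory CSV; a real file opened with
# open(path, newline="") can be passed in the same way.
sample = io.StringIO("patient_id,diagnosis_code,length_of_stay\n1,A10,3\n")
column_string = column_names_from_csv(sample)
print(column_string)  # patient_id, diagnosis_code, length_of_stay
```

The resulting string can then be passed to `predict_sector` in place of the hand-written example.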
Model Performance
- Embedding Technique: BERT embeddings from 'bert-base-uncased'
- Classification Algorithm: Random Forest
- Unique Feature: Sector classification based on column name semantics
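To gauge how well the classifier works on your own data, one option is to score it against a small hand-labeled sample. The sketch below assumes a `predict` callable with the same shape as `predict_sector` (a string of column names in, a sector name out). The `evaluate` function, the stub predictor, and the sample rows are all hypothetical; they illustrate the bookkeeping only and do not represent real model results.

```python
from collections import Counter


def evaluate(predict, labeled_examples):
    """Return overall accuracy and a Counter of (expected, predicted) mistakes."""
    correct = 0
    mistakes = Counter()
    for column_names, expected in labeled_examples:
        predicted = predict(column_names)
        if predicted == expected:
            correct += 1
        else:
            mistakes[(expected, predicted)] += 1
    total = len(labeled_examples)
    accuracy = correct / total if total else 0.0
    return accuracy, mistakes


# Stub standing in for predict_sector so the example runs without the model.
def stub_predict(column_names):
    return "Hospitals" if "patient" in column_names else "Retail Banking"


# Hypothetical hand-labeled sample: (column names, expected sector).
examples = [
    ("patient_id, ward, admission_date", "Hospitals"),
    ("account_id, branch_code, balance", "Retail Banking"),
    ("policy_id, premium, claim_amount", "Life Insurance"),
]

accuracy, mistakes = evaluate(stub_predict, examples)
print(f"accuracy={accuracy:.2f}")
for (expected, predicted), count in mistakes.most_common():
    print(f"expected {expected!r}, got {predicted!r}: {count}")
```

Swapping `stub_predict` for the real `predict_sector` gives a quick accuracy figure and shows which sectors are most often confused.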
Limitations
- Model performance depends on the semantic similarity of column names to training data
- Works best with column names that clearly represent the dataset's domain
- Requires careful preprocessing of column names
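As an illustration of the preprocessing mentioned above, the sketch below normalizes raw column names to lowercase snake_case before joining them into the input string. The specific steps and the function names are assumptions, chosen to mirror the snake_case names in the usage example; they are not the preprocessing used to train the model, so the right choice depends on how the training column names were formatted.

```python
import re


def normalize_column_name(name):
    """Convert a raw column name to lowercase snake_case."""
    # Insert an underscore at lower-to-upper boundaries:
    # "clinicalTrialDuration" -> "clinical_Trial_Duration".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse every run of non-alphanumeric characters into one underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    return name.strip("_").lower()


def build_model_input(raw_names):
    """Normalize each name, drop empty results, and join with ', '."""
    cleaned = [normalize_column_name(n) for n in raw_names]
    return ", ".join(c for c in cleaned if c)


print(build_model_input(
    ["Customer Feedback Score", "clinicalTrialDuration", " Premium-Amount "]
))
# customer_feedback_score, clinical_trial_duration, premium_amount
```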
Contributing
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
License and Usage Restrictions
Proprietary Usage Policy
IMPORTANT: This model is NOT freely available for unrestricted use.
Usage Restrictions
- Prior written permission is REQUIRED before using this model
- Commercial use is strictly prohibited without explicit authorization
- Academic or research use requires formal permission from the model's creator
- Unauthorized use, distribution, or reproduction is prohibited
Licensing Terms
- This model is protected under proprietary intellectual property rights
- Any use of the model requires a formal licensing agreement
- Contact the model's creator for licensing inquiries and permissions
Permissions and Inquiries
To request permission for model usage, please contact:
- Email: [Your Contact Email]
- Hugging Face Profile: [Your Hugging Face Profile URL]
Unauthorized use will result in legal action.
Contact
Citing this Model
If you use this model in your research, please cite it using the following BibTeX entry:
@misc{mageswaran_rf_48_sectors,
title = {Random Forest 48 Sectors Classification Model},
author = {Mageswaran},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Mageswaran/rf_48_sectors}}
}
Additional Resources
Acknowledgments
- Hugging Face Transformers