Model Card for RoBERTa-OrgCulture-Classifier
Fischer et al. (2014) showed that organizational practices can be measured along three dimensions: employee orientation, formalization practices, and innovation practices.
Employee orientation assesses the balance between employees' interests and the organization's interests. Formalization practices capture the balance between employees' independence in organizing their own work and the organization's need for control and centralization. Innovation practices assess the balance between stability and change.
This model is a RoBERTa-based multi-label classifier for identifying organizational practices in text. It predicts the salience of these three types of organizational practices.
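Because the task is multi-label, each practice is scored independently: the example code in "How to Get Started with the Model" applies a sigmoid to each logit rather than a softmax across logits, so a sentence can be flagged for none, one, two, or all three practices. A minimal sketch of that decoding step follows; the label names and logits are hypothetical placeholders, not the model's actual label file or output.

```python
import math

# Hypothetical label names; the released model reads its names from label_names.json.
LABELS = ["employee_orientation", "formalization", "innovation"]


def sigmoid(x: float) -> float:
    """Logistic function: turns one logit into an independent probability."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: list[float]) -> list[float]:
    """Multi-class alternative: probabilities compete and always sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def decode_multilabel(logits: list[float], threshold: float = 0.5) -> dict[str, bool]:
    """Threshold each label's sigmoid score on its own; any subset can be active."""
    return {label: sigmoid(z) > threshold for label, z in zip(LABELS, logits)}


# Illustrative logits (not real model output) for a sentence touching two practices.
logits = [2.0, -1.5, 1.2]
print(decode_multilabel(logits))
# {'employee_orientation': True, 'formalization': False, 'innovation': True}
print(softmax(logits))  # sums to 1, so the practices would compete for probability mass
```

With a softmax, raising one practice's score necessarily lowers the others; independent sigmoids avoid that coupling, which suits text that can describe several practices at once.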
Paper
Fischer, R., Ferreira, M. C., Assmar, E. M. L., Baris, G., Berberoglu, G., Dalyan, F., Wong, C. C., Hassan, A., Hanke, K., & Boer, D. (2014). Organizational practices across cultures: An exploration in six cultural contexts. International Journal of Cross Cultural Management, 14(1), 105-125. https://doi.org/10.1177/1470595813510644
Model Details
Model Description
- Developed by: M. Murat Ardag
- Shared via: Hugging Face
- Model type: Multi-label Text Classification
- Language(s) (NLP): English
- License: GPL-3.0
- Finetuned from model: roberta-base
Uses
Direct Use
The model can be used to analyze text data (e.g., company reviews, internal documents, company mission and vision statements) and identify the types of organizational practices mentioned.
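Because the model was trained on sentences, one way to analyze a longer document (for example, a company review) is to classify it sentence by sentence and summarize the results. The sketch below assumes that approach; `practice_profile`, the label order, and the probabilities are hypothetical and illustrate one possible aggregation, not a procedure documented for this model.

```python
from typing import Sequence

# Hypothetical label order; in practice, use the names loaded from label_names.json.
LABELS = ["employee_orientation", "formalization", "innovation"]


def practice_profile(
    sentence_probs: Sequence[Sequence[float]],
    labels: Sequence[str] = LABELS,
    threshold: float = 0.5,
) -> dict[str, float]:
    """Share of sentences in which each practice is predicted (probability > threshold).

    sentence_probs holds one row of per-label probabilities per sentence,
    e.g. the `probabilities` array returned by the card's predict() function.
    """
    if not sentence_probs:
        return {label: 0.0 for label in labels}
    n = len(sentence_probs)
    return {
        label: sum(row[i] > threshold for row in sentence_probs) / n
        for i, label in enumerate(labels)
    }


# Illustrative per-sentence probabilities for a three-sentence review (made-up numbers).
review_probs = [
    [0.91, 0.10, 0.05],
    [0.20, 0.75, 0.60],
    [0.88, 0.05, 0.40],
]
print(practice_profile(review_probs))
```

Reporting the share of flagged sentences keeps the output comparable across documents of different lengths; averaging the raw probabilities is a reasonable alternative when a softer score is preferred.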
Downstream Use
This model could be integrated into larger HR analytics or organizational culture assessment tools.
Out-of-Scope Use
The model is not designed for sentiment analysis, topic modeling, or other NLP tasks outside of multi-label classification of organizational practices.
This model should not be used for:
- Classifying text in languages other than English
- Making decisions about individuals or organizations without human oversight
Bias, Risks, and Limitations
The model's performance may vary across different industries, company sizes, and cultural contexts. It may also be sensitive to the specific wording used in the text. Additionally, the model could perpetuate biases present in the training data.
Recommendations
Users should exercise caution when interpreting the model's predictions and consider the potential biases and limitations. It is recommended to use the model as one tool in a broader assessment of organizational culture, alongside other qualitative and quantitative methods.
How to Get Started with the Model
Example usage
import json

import torch
from huggingface_hub import hf_hub_download
from transformers import RobertaTokenizer, RobertaForSequenceClassification

# Load the model and tokenizer from the Hugging Face Hub
model_path = "MMADS/RoBERTa-OrgCulture-Classifier"
model = RobertaForSequenceClassification.from_pretrained(model_path)
tokenizer = RobertaTokenizer.from_pretrained(model_path)
model.eval()  # inference mode (disables dropout)

# Load label names. model_path is a Hub repo ID, not a local folder,
# so download the file rather than opening f"{model_path}/label_names.json"
label_file = hf_hub_download(repo_id=model_path, filename="label_names.json")
with open(label_file, "r") as f:
    label_names = json.load(f)

# Function to make predictions
def predict(text):
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    # Make prediction without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
    # Apply sigmoid to get one independent probability per label (multi-label)
    probabilities = torch.sigmoid(outputs.logits).squeeze(0).numpy()
    # Get predictions (1 if probability > 0.5, else 0)
    predictions = (probabilities > 0.5).astype(int)
    # Create a dictionary of label predictions
    result = {label: int(pred) for label, pred in zip(label_names, predictions)}
    return result, probabilities

# Example usage
text_to_predict = "Testing model predictions for organizational practices."
prediction, probabilities = predict(text_to_predict)
print("Predictions:")
for label, pred in prediction.items():
    print(f"{label}: {'Yes' if pred == 1 else 'No'}")
print("\nProbabilities:")
for label, prob in zip(label_names, probabilities):
    print(f"{label}: {prob:.4f}")
Training Details
Training Data
The model was trained on sentences labeled with three types of organizational practices (employee orientation, formalization practices, and innovation practices). The data was preprocessed to remove missing values and convert text to strings.
The data is a subset of >1.3M sentences from employee reviews and >16K sentences from company mission and vision statements.
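The cleaning step described above (removing missing values, then converting text to strings) can be sketched in plain Python. The field names and the choice to drop rows with a missing label as well as a missing text are assumptions; the card does not document the dataset schema or the exact tooling used.

```python
import math
from typing import Any, Iterable

# Hypothetical field names; the card does not document the dataset schema.
TEXT_FIELD = "text"
LABEL_FIELDS = ["employee_orientation", "formalization", "innovation"]


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_records(records: Iterable[dict]) -> list[dict]:
    """Drop rows with a missing text or label, then coerce the text to str."""
    cleaned = []
    for row in records:
        if any(is_missing(row.get(field)) for field in [TEXT_FIELD, *LABEL_FIELDS]):
            continue  # remove the row if any required value is missing
        cleaned.append({**row, TEXT_FIELD: str(row[TEXT_FIELD])})
    return cleaned


rows = [
    {"text": "Managers listen to staff.", "employee_orientation": 1, "formalization": 0, "innovation": 0},
    {"text": None, "employee_orientation": 0, "formalization": 1, "innovation": 0},
    {"text": 2024, "employee_orientation": 0, "formalization": 0, "innovation": 1},
    {"text": "We try new tools.", "employee_orientation": 0, "formalization": 0, "innovation": float("nan")},
]
print(clean_records(rows))  # keeps rows 1 and 3; the numeric text becomes the string '2024'
```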
Training Procedure
Preprocessing
- Sentences were tokenized using the RoBERTa tokenizer
- Texts were truncated and padded to a fixed length
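The truncation and padding step can be illustrated with RoBERTa's special tokens (`<s>` = 0, `<pad>` = 1, `</s>` = 2 in the roberta-base vocabulary). The fixed length used in training is not stated in this card, so `max_length` below is a placeholder and `pad_and_truncate` is a hypothetical helper; with the Transformers tokenizer, the same effect comes from calling it with `padding="max_length"`, `truncation=True`, and a `max_length` value.

```python
# Special-token IDs from the roberta-base vocabulary.
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2


def pad_and_truncate(token_ids: list[int], max_length: int) -> tuple[list[int], list[int]]:
    """Wrap subword IDs in <s> ... </s>, truncate, and right-pad to max_length.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens and 0 for padding.
    """
    if max_length < 2:
        raise ValueError("max_length must leave room for <s> and </s>")
    body = token_ids[: max_length - 2]  # keep room for the two special tokens
    ids = [BOS_ID, *body, EOS_ID]
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    return ids + [PAD_ID] * n_pad, mask + [0] * n_pad


# Made-up subword IDs, for illustration only.
ids, mask = pad_and_truncate([100, 200, 300], max_length=8)
print(ids)   # [0, 100, 200, 300, 2, 1, 1, 1]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

The attention mask lets the model ignore padding positions, so padding every sentence to one length changes batch shape but not the prediction for a given sentence.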
Training Hyperparameters
- epochs: 10
- batch_size: 8
- warmup_steps: 500
- weight_decay: 0.1
- learning_rate: Not specified (the AdamW optimizer's default learning rate was used)
- label_smoothing: 0.1
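The card lists label smoothing of 0.1 without saying how it was applied to multi-label targets. One common formulation for independent binary labels, shown below as an assumption rather than the documented training code, moves each target from 1 to 1 - ε/2 and from 0 to ε/2 before computing binary cross-entropy on the logits.

```python
import math


def smooth_targets(targets: list[int], epsilon: float = 0.1) -> list[float]:
    """Binary label smoothing: 1 -> 1 - epsilon/2 and 0 -> epsilon/2, per label."""
    return [y * (1.0 - epsilon) + epsilon / 2.0 for y in targets]


def bce_with_logits(logits: list[float], targets: list[float]) -> float:
    """Mean binary cross-entropy over labels, computed directly from logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


targets = smooth_targets([1, 0, 1], epsilon=0.1)  # approximately [0.95, 0.05, 0.95]
logits = [8.0, -8.0, 8.0]  # very confident, correct predictions (illustrative)
print(bce_with_logits(logits, [1.0, 0.0, 1.0]))  # close to 0 with hard targets
print(bce_with_logits(logits, targets))  # noticeably larger: over-confidence is penalized
```

Under smoothing, the loss is minimized at probabilities of 0.95 and 0.05 rather than 1 and 0, which discourages extremely confident outputs.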
Evaluation
Testing Data, Factors & Metrics
The model was evaluated on a held-out test set (20% of the original data) using the following metrics:
- Accuracy
- F1-score
- Precision
- Recall
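For multi-label output these metrics can be computed in more than one way, and the card does not state which averaging was used. The sketch below shows two readings of accuracy (exact match of all three labels, and per-label agreement) alongside micro-averaged precision, recall, and F1; `multilabel_metrics` is a hypothetical helper and the toy labels are made up. In scikit-learn, `accuracy_score` on multi-hot arrays returns subset accuracy, and `precision_recall_fscore_support` accepts `average="micro"`, `"macro"`, or `"samples"`.

```python
def multilabel_metrics(y_true: list[list[int]], y_pred: list[list[int]]) -> dict[str, float]:
    """Accuracy variants and micro-averaged precision/recall/F1 for multi-hot labels."""
    tp = fp = fn = 0
    exact = 0
    correct_cells = 0
    total_cells = 0
    for t_row, p_row in zip(y_true, y_pred):
        exact += t_row == p_row  # subset accuracy: every label must match
        for t, p in zip(t_row, p_row):
            correct_cells += t == p
            total_cells += 1
            tp += t == 1 and p == 1
            fp += t == 0 and p == 1
            fn += t == 1 and p == 0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "subset_accuracy": exact / len(y_true),
        "label_accuracy": correct_cells / total_cells,
        "precision_micro": precision,
        "recall_micro": recall,
        "f1_micro": f1,
    }


# Toy labels for four sentences: [employee orientation, formalization, innovation].
y_true = [[1, 0, 0], [0, 1, 1], [1, 0, 1], [0, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
print(multilabel_metrics(y_true, y_pred))
```

On these toy labels, subset accuracy is 0.5 while per-label accuracy is about 0.83, which shows why the averaging choice matters when comparing reported scores.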
Results
- Accuracy: 0.98
- F1-score: 0.97
- Precision: 0.98
- Recall: 0.97
Environmental Impact
Carbon emissions are expected to be minimal and can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: Google Colab GPU
- Hours used: 8
- Cloud Provider: Google
- Compute Region: South Carolina
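The basic calculation behind calculators such as the one by Lacoste et al. (2019) multiplies energy use by the carbon intensity of the regional power grid. A rough sketch follows; the card reports 8 hours but not the GPU's power draw or the grid intensity, so those are required inputs to the hypothetical `estimate_emissions_kg`, and the values in the usage line are placeholders, not measurements for this model.

```python
def estimate_emissions_kg(
    gpu_power_watts: float,
    carbon_intensity_kg_per_kwh: float,
    hours: float = 8.0,  # "Hours used" reported in this card
) -> float:
    """Rough CO2-equivalent estimate: energy (kWh) times grid carbon intensity."""
    energy_kwh = gpu_power_watts / 1000.0 * hours
    return energy_kwh * carbon_intensity_kg_per_kwh


print(estimate_emissions_kg(250, 0.5))  # placeholder inputs -> 1.0
```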
Model Card Authors
M. Murat Ardag
Model Card Contact
Via the author's personal website.
Citation
If you use this model in your research or applications, please cite it as follows:
Ardag, M. M. (2024). RoBERTa-OrgCulture-Classifier (Revision 94b6fdd). Hugging Face. https://doi.org/10.57967/hf/2774