Model Card for RoBERTa-OrgCulture-Classifier

Fischer et al. (2014) showed that organizational practices are best measured along three dimensions: employee orientation, formalization practices, and innovation practices.

Employee orientation assesses the balance between employees' interests and those of the organization. Formalization practices reflect the balance between employees' independence in organizing their work and the organization's need for control and centralization. Innovation practices assess the balance between stability and change.

This model is a RoBERTa-based multi-label classifier for identifying organizational practices in text. It predicts the salience of these three types of organizational practices.
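Because the task is multi-label rather than multi-class, each sentence maps to a binary vector with one entry per practice, and any number of entries (including none) can be 1. The following is a minimal sketch of that encoding. The label names `employee_orientation`, `formalization`, and `innovation` are hypothetical; the model's actual names come from its `label_names.json`.

```python
# Hypothetical label names; the published model defines its own in label_names.json.
LABELS = ["employee_orientation", "formalization", "innovation"]


def to_multi_hot(active_labels):
    """Encode a set of practice labels as a 0/1 vector in LABELS order."""
    unknown = set(active_labels) - set(LABELS)
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    return [1 if label in active_labels else 0 for label in LABELS]


def from_multi_hot(vector):
    """Decode a 0/1 vector back into the list of active labels."""
    return [label for label, flag in zip(LABELS, vector) if flag == 1]


# A sentence can touch more than one practice at once...
print(to_multi_hot({"formalization", "innovation"}))  # [0, 1, 1]
# ...or none at all.
print(to_multi_hot(set()))  # [0, 0, 0]
```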

Paper

Fischer, R., Ferreira, M. C., Assmar, E. M. L., Baris, G., Berberoglu, G., Dalyan, F., Wong, C. C., Hassan, A., Hanke, K., & Boer, D. (2014). Organizational practices across cultures: An exploration in six cultural contexts. International Journal of Cross Cultural Management, 14(1), 105-125. https://doi.org/10.1177/1470595813510644

Model Details

Model Description

Developed by: M. Murat Ardag
Shared via: Hugging Face
Model type: Multi-label Text Classification
Language(s) (NLP): English
License: GPL-3.0
Finetuned from model: roberta-base

Uses

Direct Use

The model can be used to analyze text data (e.g., company reviews, internal documents, company mission and vision statements) and identify the types of organizational practices mentioned.

Downstream Use

This model could be integrated into larger HR analytics or organizational culture assessment tools.

Out-of-Scope Use

The model is not designed for sentiment analysis, topic modeling, or other NLP tasks outside of multi-label classification of organizational practices.

This model should not be used for:

Classifying text in languages other than English
Making decisions about individuals or organizations without human oversight

Bias, Risks, and Limitations

The model's performance may vary across different industries, company sizes, and cultural contexts. It may also be sensitive to the specific wording used in the text. Additionally, the model could perpetuate biases present in the training data.

Recommendations

Users should exercise caution when interpreting the model's predictions and consider the potential biases and limitations. It is recommended to use the model as one tool in a broader assessment of organizational culture, alongside other qualitative and quantitative methods.

How to Get Started with the Model

Example usage

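import json

import torch
from huggingface_hub import hf_hub_download
from transformers import RobertaForSequenceClassification, RobertaTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model_path = "MMADS/RoBERTa-OrgCulture-Classifier"
model = RobertaForSequenceClassification.from_pretrained(model_path)
tokenizer = RobertaTokenizer.from_pretrained(model_path)
model.eval()  # inference mode (disables dropout)

# Load label names. model_path is a Hub repo ID, not a local folder,
# so download the file first instead of opening f"{model_path}/label_names.json"
label_file = hf_hub_download(repo_id=model_path, filename="label_names.json")
with open(label_file, "r") as f:
    label_names = json.load(f)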

# Function to make predictions
def predict(text):
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    
    # Make prediction
    with torch.no_grad():
        outputs = model(**inputs)
    
    # Apply sigmoid to get probabilities
    probabilities = torch.sigmoid(outputs.logits).squeeze().numpy()
    
    # Get predictions (1 if probability > 0.5, else 0)
    predictions = (probabilities > 0.5).astype(int)
    
    # Create a dictionary of label predictions
    result = {label: pred for label, pred in zip(label_names, predictions)}
    
    return result, probabilities

# Example usage
text_to_predict = "Testing model predictions for organizational practices."
prediction, probabilities = predict(text_to_predict)

print("Predictions:")
for label, pred in prediction.items():
    print(f"{label}: {'Yes' if pred == 1 else 'No'}")

print("\nProbabilities:")
for label, prob in zip(label_names, probabilities):
    print(f"{label}: {prob:.4f}")
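The `predict` function above turns each logit into an independent probability with a sigmoid and applies the same 0.5 cut-off to every label. If one cut-off proves too strict or too lenient for a particular practice, you can apply per-label thresholds to the probabilities instead. The following is a minimal, dependency-free sketch of that post-processing. The threshold values, logits, and label names are hypothetical, not values tuned for this model.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def apply_thresholds(logits, label_names, thresholds, default=0.5):
    """Map raw logits to per-label (probability, decision) pairs.

    Each label is scored independently (multi-label), so the probabilities
    need not sum to 1. `thresholds` may override the default cut-off
    for individual labels.
    """
    results = {}
    for name, logit in zip(label_names, logits):
        prob = sigmoid(logit)
        cutoff = thresholds.get(name, default)
        results[name] = (prob, int(prob > cutoff))
    return results


# Hypothetical logits for one sentence and a hypothetical stricter cut-off for one label.
labels = ["employee_orientation", "formalization", "innovation"]
out = apply_thresholds([2.0, 0.3, -1.5], labels, {"formalization": 0.6})
for name, (prob, decision) in out.items():
    print(f"{name}: {prob:.4f} -> {'Yes' if decision else 'No'}")
```

With the real model, pass the raw logits for one sentence (for example `outputs.logits[0].tolist()`) together with `label_names`.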

Training Details

Training Data

The model was trained on sentences labeled with three types of organizational practices (employee orientation, formalization practices, and innovation practices). The data was preprocessed to remove missing values and convert text to strings.

The data is a subset of >1.3M sentences from employee reviews and >16K sentences from company mission and vision statements.
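The card does not publish the preprocessing script. The following is a minimal sketch of the two steps described above: dropping rows with missing values, then coercing the text to strings. It is written in plain Python over a list of records, and the field names `text` and `labels` are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN (as produced by CSV/spreadsheet loaders) as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_records(records):
    """Drop records with a missing text or label field, then coerce text to str."""
    cleaned = []
    for record in records:
        if is_missing(record.get("text")) or is_missing(record.get("labels")):
            continue
        cleaned.append({"text": str(record["text"]), "labels": record["labels"]})
    return cleaned


raw = [
    {"text": "Managers involve staff in decisions.", "labels": [1, 0, 0]},
    {"text": None, "labels": [0, 1, 0]},                  # missing text -> dropped
    {"text": 12345, "labels": [0, 0, 1]},                 # non-string text -> "12345"
    {"text": "New ideas are rewarded.", "labels": None},  # missing labels -> dropped
]
print(clean_records(raw))
```

In pandas, the same steps would typically be `df.dropna()` followed by `df["text"].astype(str)`.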

Training Procedure

Preprocessing

Sentences were tokenized using the RoBERTa tokenizer
Texts were truncated and padded to a fixed length
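Truncating and padding to a fixed length gives every encoded sentence exactly the same number of token IDs, plus an attention mask that marks which positions hold real tokens. The tokenizer does this itself (for example with `truncation=True`, `padding="max_length"`, and a `max_length`). The following sketch only illustrates the logic on already-encoded IDs. It assumes roberta-base's special-token IDs (`<s>` = 0, `<pad>` = 1, `</s>` = 2) and a hypothetical `max_length` of 8, since the card does not state the length used.

```python
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2  # roberta-base IDs for <s>, <pad>, </s>


def truncate_and_pad(token_ids, max_length):
    """Wrap content IDs in <s> ... </s>, cut to max_length, and right-pad.

    Returns (input_ids, attention_mask), both exactly max_length long.
    """
    if max_length < 2:
        raise ValueError("max_length must leave room for <s> and </s>")
    body = token_ids[: max_length - 2]  # keep room for the two special tokens
    input_ids = [BOS_ID] + body + [EOS_ID]
    attention_mask = [1] * len(input_ids)
    padding = max_length - len(input_ids)
    input_ids += [PAD_ID] * padding
    attention_mask += [0] * padding
    return input_ids, attention_mask


# Hypothetical content token IDs (not real roberta-base vocabulary lookups).
short_ids, short_mask = truncate_and_pad([100, 200, 300], max_length=8)
long_ids, long_mask = truncate_and_pad(list(range(100, 112)), max_length=8)
print(short_ids, short_mask)
print(long_ids, long_mask)
```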

Training Hyperparameters

epochs: 10
batch_size: 8
warmup_steps: 500
weight_decay: 0.1
learning_rate: Not specified (using default AdamW optimizer)
label_smoothing: 0.1
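The card lists a label-smoothing factor of 0.1 but does not say how it was applied to the three binary targets. One common formulation for sigmoid (multi-label) outputs pulls each 0/1 target toward 0.5, so that a target of 1 becomes 1 - ε/2 and a target of 0 becomes ε/2, before computing binary cross-entropy. The following is a minimal sketch of that formulation. It is an assumption, not necessarily what the original training run used.

```python
import math


def smooth_targets(targets, epsilon=0.1):
    """Pull binary targets toward 0.5: 1 -> 1 - eps/2, 0 -> eps/2."""
    return [t * (1.0 - epsilon) + 0.5 * epsilon for t in targets]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)).
    """
    losses = [
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


labels = [1, 0, 1]                  # multi-hot gold labels for one sentence
soft = smooth_targets(labels, 0.1)  # targets become approximately 0.95 / 0.05
logits = [4.0, -4.0, 4.0]           # hypothetical confident, correct predictions
print(soft)
print(bce_with_logits(logits, labels), bce_with_logits(logits, soft))
```

With smoothed targets, even confident correct predictions keep a non-zero loss, which discourages over-confident logits. In PyTorch, the same idea corresponds to passing smoothed targets to `torch.nn.BCEWithLogitsLoss`.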

Evaluation

Testing Data, Factors & Metrics

The model was evaluated on a held-out test set (20% of the original data) using the following metrics:

Accuracy
F1-score
Precision
Recall
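For multi-label outputs, these metrics depend on how decisions are aggregated, and the card does not state which scheme was used. The following is a minimal sketch of one common choice: subset (exact-match) accuracy, plus micro-averaged precision, recall, and F1 pooled over all label decisions. Other schemes (macro, per-label, or sample-wise averaging) would give different numbers. The example labels are made up for illustration.

```python
def multilabel_metrics(y_true, y_pred):
    """Subset accuracy plus micro-averaged precision/recall/F1.

    y_true, y_pred: lists of equal-length 0/1 vectors (one per sentence).
    """
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": exact / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up gold labels and predictions for four sentences.
y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 1]]
print(multilabel_metrics(y_true, y_pred))
```

In scikit-learn, `accuracy_score` computes subset accuracy on multi-label indicator arrays, and `precision_recall_fscore_support(..., average="micro")` gives the micro-averaged scores.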

Results

Accuracy: 0.98
F1-score: 0.97
Precision: 0.98
Recall: 0.97

Environmental Impact

Carbon emissions were minimal. They can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: Google Colab GPU
Hours used: 8
Cloud Provider: Google
Compute Region: South Carolina
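The Machine Learning Impact calculator essentially multiplies hardware power draw by runtime and by the carbon intensity of the local electricity grid. The following is a minimal sketch of that arithmetic. Only the 8-hour runtime comes from this card; the GPU power draw and grid carbon intensity are hypothetical placeholders, not measured or official values for this training run.

```python
def estimate_co2_kg(gpu_power_watts, hours, grid_kg_co2_per_kwh, num_gpus=1):
    """Rough CO2-equivalent estimate: energy (kWh) x grid carbon intensity."""
    energy_kwh = gpu_power_watts * num_gpus * hours / 1000.0
    return energy_kwh * grid_kg_co2_per_kwh


# HYPOTHETICAL inputs: only the 8-hour runtime comes from this card.
hours_used = 8
assumed_gpu_watts = 300       # placeholder power draw, not a measured value
assumed_grid_intensity = 0.4  # placeholder kg CO2eq per kWh, not an official figure
estimate = estimate_co2_kg(assumed_gpu_watts, hours_used, assumed_grid_intensity)
print(f"~{estimate:.2f} kg CO2eq under the assumed inputs")
```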

Model Card Authors

M. Murat Ardag

Model Card Contact

Via my personal website.

Citation

If you use this model in your research or applications, please cite it as follows:

Ardag, M. M. (2024). RoBERTa-OrgCulture-Classifier (Revision 94b6fdd). Hugging Face. https://doi.org/10.57967/hf/2774

https://doi.org/10.57967/hf/2794