This language model is designed to assess the attitude expressed in texts about climate change. It categorizes the attitude into three types: risk, neutral, and opportunity. These categories correspond to the negative, neutral, and positive classifications commonly used in sentiment analysis.
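The correspondence between class indices and these labels can be checked directly from the model configuration; the short sketch below assumes the `id2label` field of the published config is populated with these names.

```python
# Sketch: inspect which class index maps to which attitude label.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("KevSun/climate-attitude-LM")
print(config.id2label)  # e.g. {0: 'risk', 1: 'neutral', 2: 'opportunity'} if custom names are stored;
                        # it may read {0: 'LABEL_0', ...} otherwise.
```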
Compared with similar existing models, such as "climatebert/distilroberta-base-climate-sentiment" and "XerOpred/twitter-climate-sentiment-model," which typically achieve accuracies ranging from 10% to 30% and F1 scores around 15%, our model performs substantially better: evaluated on the test dataset of "climatebert/climate_sentiment," it reaches an accuracy of 89% and an F1 score of 89%.
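For reference, the sketch below shows one way such an evaluation could be reproduced with the `datasets` and `scikit-learn` libraries. It is a minimal sketch, not the exact evaluation script: it assumes the test split exposes `text` and `label` columns, that the integer labels follow the order risk=0, neutral=1, opportunity=2, and it uses macro-averaged F1 since the averaging method for the reported score is not specified.

```python
# Sketch: evaluate the model on the climatebert/climate_sentiment test split.
# Assumes 'text'/'label' columns and a label order matching this model's outputs;
# verify against model.config.id2label before relying on the numbers.
import torch
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "KevSun/climate-attitude-LM"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()

test_split = load_dataset("climatebert/climate_sentiment", split="test")

predictions = []
with torch.no_grad():
    for example in test_split:
        inputs = tokenizer(example["text"], return_tensors="pt", truncation=True, max_length=512)
        logits = model(**inputs).logits
        predictions.append(int(logits.argmax(dim=-1)))

print("Accuracy:", accuracy_score(test_split["label"], predictions))
print("Macro F1:", f1_score(test_split["label"], predictions, average="macro"))
```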
Note that the input should be a text about climate change, whether you paste it into the API input bar or run the test code below; on unrelated texts the model may not perform well. An example input is: "Major oil companies have misled Americans for decades about the threat of human-caused climate change, according to a new report released Tuesday by Democrats in Congress. The 65-page report was the result of a three-year investigation and was made public hours before a Senate Budget Committee hearing about the role that oil and gas companies have played in global warming."
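If you just want a quick check from Python rather than the hosted API, the `transformers` pipeline can be used as sketched below; note that the label strings it prints depend on the `id2label` mapping stored in the model config.

```python
# Sketch: quick one-off classification via the transformers pipeline.
from transformers import pipeline

classifier = pipeline("text-classification", model="KevSun/climate-attitude-LM")

text = (
    "Major oil companies have misled Americans for decades about the threat of "
    "human-caused climate change, according to a new report released Tuesday by "
    "Democrats in Congress."
)
print(classifier(text))
# Prints something like [{'label': ..., 'score': ...}]; the label may read
# LABEL_0/1/2 rather than risk/neutral/opportunity if no custom names are stored.
```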
Please cite the following if you use this model: Sun, K., and Wang, R. 2024. "The fine-tuned language model for detecting human attitudes to climate changes."
The project (including training code) is available on GitHub at: https://github.com/fivehills/climate_attitude_LM/
The following code shows how to run the model on your own text.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model_path = "KevSun/climate-attitude-LM"  # Ensure this path points to the correct model
model = AutoModelForSequenceClassification.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Define the path to your text file
file_path = 'yourtext.txt'

# Read the content of the file
with open(file_path, 'r', encoding='utf-8') as file:
    new_text = file.read()

# Encode the text with the tokenizer used during training
encoded_input = tokenizer(new_text, return_tensors='pt', padding=True, truncation=True, max_length=64)

# Move the model and inputs to the correct device (GPU if available, otherwise CPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
encoded_input = {k: v.to(device) for k, v in encoded_input.items()}

model.eval()  # Set the model to evaluation mode

# Perform the prediction
with torch.no_grad():
    outputs = model(**encoded_input)

# Convert the logits to probabilities
predictions = outputs.logits.squeeze()
probabilities = torch.softmax(predictions, dim=0)

# Map each class index to its label
labels = ["risk", "neutral", "opportunity"]
predicted_index = torch.argmax(probabilities).item()  # Index of the highest probability
predicted_label = labels[predicted_index]
predicted_probability = probabilities[predicted_index].item()

# Print the predicted label and its probability
print(f"Predicted Label: {predicted_label}, Probability: {predicted_probability:.4f}")
# Example output: Predicted Label: neutral, Probability: 0.8377
```
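To label several documents at once, the same steps can be wrapped in a small helper. The sketch below reuses the `model`, `tokenizer`, `device`, and `labels` defined in the block above and assumes the same risk/neutral/opportunity label order; it is an illustrative addition, not part of the original test script.

```python
# Sketch: batch classification helper reusing model, tokenizer, device, and labels from above.
def classify_texts(texts, batch_size=16):
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        encoded = tokenizer(batch, return_tensors='pt', padding=True, truncation=True, max_length=64)
        encoded = {k: v.to(device) for k, v in encoded.items()}
        with torch.no_grad():
            probs = torch.softmax(model(**encoded).logits, dim=-1)
        for row in probs:
            idx = int(row.argmax())
            results.append((labels[idx], float(row[idx])))
    return results

# Example usage:
# classify_texts(["Rising sea levels threaten coastal cities.",
#                 "The company sees new markets in green energy."])
```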