KenyaESG-RoBERTa-env

Binary sentence classifier for environmental (env) ESG communication in the corporate reports of firms listed on the Nairobi Securities Exchange (NSE). It is one of three independent pillar models (env, soc, gov) and follows the sentence-level design of Schimanski et al. (2024).

Intended use

Classify a report sentence as env=1 (environmental content present) or env=0 (absent). Aggregating the predictions over all sentences in a report yields a firm-year environmental disclosure score, defined as the proportion of sentences classified as environmental. The three pillar classifiers are applied independently, so a sentence may be positive on more than one pillar.
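As a minimal sketch of the aggregation described above, the snippet below turns per-sentence positive-class probabilities into a firm-year disclosure score. The function name `disclosure_score`, the example probabilities, and the choice to return 0.0 for an empty report are illustrative assumptions, not part of the released model; the 0.5 default matches the decision threshold stated in this card.

```python
from typing import Sequence


def disclosure_score(probs: Sequence[float], threshold: float = 0.5) -> float:
    """Share of sentences whose env probability meets the threshold.

    `probs` holds one positive-class probability per report sentence.
    Returning 0.0 for an empty report is an assumption made for this sketch.
    """
    if not probs:
        return 0.0
    positives = sum(1 for p in probs if p >= threshold)
    return positives / len(probs)


# Hypothetical probabilities for a five-sentence report (not real model output).
example_probs = [0.91, 0.12, 0.50, 0.03, 0.77]
score = disclosure_score(example_probs)
print(score)  # 3 of 5 sentences are at or above 0.5, so 0.6
```

Because the pillars are scored independently, the same function could be applied separately to soc and gov probabilities to obtain one score per pillar.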

Training data

3,900 unique sentences: reviewed NSE/Kenyan sentences plus reference sentences from Schimanski et al. (2024). The 100-sentence human evaluation set is held out. Kenyan labels were assigned by keyword filtering refined by a single-reviewer pass, not by full manual annotation. Base model: roberta-base. Maximum sequence length: 256. Decision threshold: 0.5.

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo = "josephagossa/KenyaESG-RoBERTa-env"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

text = "The company reduced its greenhouse gas emissions by 20% and expanded renewable energy use in 2021."
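# Truncate to the model's maximum sequence length of 256 tokens.
inputs = tok(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    # Softmax over the logits; index 1 is the positive (env) class.
    prob = torch.softmax(model(**inputs).logits, dim=-1)[0, 1].item()
label = int(prob >= 0.5)   # 1 = environmental content present (threshold 0.5)
print(label, round(prob, 3))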

Evaluation

Held-out human benchmark, n = 100 sentences, scored against a two-annotator adjudicated ground truth that is independent of the keyword training labels.

Metric	Value
F1	0.917
Precision	0.880
Recall	0.957
Inter-annotator kappa	0.940
Label-vs-human kappa	0.840
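
The F1 value in the table can be checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below does that check and also shows the standard Cohen's kappa formula for two label lists. That the kappa rows were computed as Cohen's kappa is an assumption (the card does not name the variant); the function names and the toy label lists are illustrative and are not the benchmark data.

```python
from typing import Sequence


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the raters' marginal class frequencies.
    classes = set(a) | set(b)
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in classes)
    if expected == 1.0:
        return 1.0  # assumption: convention when both raters use one identical class
    return (observed - expected) / (1 - expected)


# Check the reported F1 against the reported precision and recall.
print(round(f1_from_pr(0.880, 0.957), 3))  # 0.917

# Toy labels only, to show the call shape.
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is reported alongside the classification metrics.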

Limitations

The score captures disclosure intensity, not substantive quality. Of the three pillars, env is the best measured (highest annotator agreement and label quality). The model was trained on English-language NSE reports; cross-market and cross-language generalisation is untested. Training labels derive from a keyword filter plus a single-reviewer pass rather than full manual annotation.

Citation

If you use this model, please cite the accompanying paper and the reference dataset.

Paper

Agossa, J. (2026). Pricing the Cost of Compliance: Equity Reactions to Mandatory ESG Disclosure in a Frontier Market. Working paper. SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6966682

@unpublished{agossa2026compliance,
  author = {Agossa, Joseph},
  title  = {Pricing the Cost of Compliance: Equity Reactions to
            Mandatory ESG Disclosure in a Frontier Market},
  year   = {2026},
  note   = {Working paper, SSRN 6966682},
  url    = {https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6966682}
}

Reference sentences

Schimanski, T., Reding, A., Reding, N., Bingler, J., Kraus, M., & Leippold, M. (2024). Bridging the gap in ESG measurement: Using NLP to quantify environmental, social, and governance communication. Finance Research Letters, 61, 104979.

@article{schimanski2024bridging,
  author  = {Schimanski, Tobias and Reding, Andrin and Reding, Nico and
             Bingler, Julia and Kraus, Mathias and Leippold, Markus},
  title   = {Bridging the gap in {ESG} measurement: Using {NLP} to quantify
             environmental, social, and governance communication},
  journal = {Finance Research Letters},
  volume  = {61},
  pages   = {104979},
  year    = {2024}
}
