---
language: en
tags: Text Classification
license: apache-2.0
datasets:
- batterydata/paper-abstracts
metrics: glue
---

# BatterySciBERT-cased for Battery Abstract Classification

**Language model:** batteryscibert-cased  
**Language:** English  
**Downstream-task:** Text Classification  
**Training data:** training\_data.csv  
**Eval data:** val\_data.csv  
**Code:** See [example](https://github.com/ShuHuang/batterybert)  
**Infrastructure:** 8x DGX A100

## Hyperparameters

```
batch_size = 32
n_epochs = 11
base_LM_model = "batteryscibert-cased"
learning_rate = 2e-5
```

## Performance

```
"Validation accuracy": 97.06,
"Test accuracy": 97.19,
```

## Usage

### In Transformers

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_name = "batterydata/batteryscibert-cased-abstract"

# a) Get predictions
nlp = pipeline('text-classification', model=model_name, tokenizer=model_name)
# The pipeline expects a string (or a list of strings), not a set.
text = 'The typical non-aqueous electrolyte for commercial Li-ion cells is a solution of LiPF6 in linear and cyclic carbonates.'
res = nlp(text)

# b) Load model & tokenizer
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

## Authors

Shu Huang: `sh2009 [at] cam.ac.uk`

Jacqueline Cole: `jmc61 [at] cam.ac.uk`

## Citation

BatteryBERT: A Pre-trained Language Model for Battery Database Enhancement
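
## Manual Inference Sketch

Option b) in the Usage section loads the model and tokenizer but stops short of producing a prediction. The following is a minimal sketch of how the raw logits could be turned into a label and score, roughly mirroring what the `text-classification` pipeline does for a single-label model. The helper names `softmax`, `top_prediction`, and `classify` are hypothetical and not part of the BatteryBERT code; the label names are read from the model's own config rather than assumed.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, id2label):
    """Return the highest-probability label and its score, pipeline-style."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


def classify(text, model_name="batterydata/batteryscibert-cased-abstract"):
    """Run one abstract through the model (downloads weights on first call)."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    # id2label comes from the model config, so no label names are assumed here.
    return top_prediction(logits, model.config.id2label)
```

Calling `classify("...abstract text...")` should return a dictionary with `label` and `score` keys. For more than a handful of abstracts, loading the model once and reusing it (or using the pipeline with a list of strings) avoids repeated initialization.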