metadata

language: en
tags:
  - text-classification
  - albert

Model Card for albert-base-rci-wikisql-col

Model Details

Model Description

More information needed

Developed by: Michael Glass
Shared by [Optional]: Michael Glass
Model type: Text Classification (sequence classification)
Language(s) (NLP): English
License: More information needed
Parent Model: ALBERT Base v2
Resources for more information:
- ALBERT Base GitHub Repo
- ALBERT Base Paper

Uses

Direct Use

This model can be used for the task of text classification.

The parent ALBERT Base v2 model is primarily intended to be fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification, or question answering.

See ALBERT Base v2 model card for more information.

Downstream Use [Optional]

More information needed.

Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

For tasks such as text generation, use a generative model such as GPT-2 instead.

Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

Training Details

Training Data

The ALBERT model was pretrained on BookCorpus, a dataset consisting of 11,038 unpublished books, and English Wikipedia (https://en.wikipedia.org/wiki/English_Wikipedia), excluding lists, tables, and headers. See ALBERT Base v2 model card for more information.

Training Procedure

Preprocessing

The texts are lowercased and tokenized using SentencePiece with a vocabulary size of 30,000. The model inputs then take the form:

[CLS] Sentence A [SEP] Sentence B [SEP]

See ALBERT Base v2 model card for more information.
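The pair format above can be illustrated with a small, self-contained sketch. It is a simplification under stated assumptions: whitespace splitting after lowercasing stands in for SentencePiece (the real tokenizer produces subword pieces), and `build_pair_input` is a hypothetical helper, not part of any library. The segment ids follow the BERT-style convention that Hugging Face ALBERT tokenizers use: 0 for [CLS], sentence A, and the first [SEP]; 1 for sentence B and the final [SEP].

```python
from typing import List, Tuple


def build_pair_input(text_a: str, text_b: str) -> Tuple[List[str], List[int]]:
    """Hypothetical sketch of the [CLS] A [SEP] B [SEP] layout with segment ids."""
    # Simplified stand-in for SentencePiece: lowercase, then split on whitespace.
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()

    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, token_type_ids


tokens, types = build_pair_input("Who won?", "Team Lions")
print(" ".join(tokens))  # [CLS] who won? [SEP] team lions [SEP]
print(types)             # [0, 0, 0, 0, 1, 1, 1]
```

With the real tokenizer, `tokenizer(text_a, text_b)` produces the same layout as `input_ids` and `token_type_ids`, but over SentencePiece pieces rather than whitespace-split words.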

Speeds, Sizes, Times

More information needed

Evaluation

Testing Data, Factors & Metrics

Testing Data

More information needed

Factors

More information needed

Metrics

More information needed

Results

More information needed

Model Examination

More information needed

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: More information needed
Hours used: More information needed
Cloud Provider: More information needed
Compute Region: More information needed
Carbon Emitted: More information needed
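The Machine Learning Impact calculator mentioned above estimates emissions from the hardware's power draw, the running time, and the carbon intensity of the compute region's grid. A minimal sketch of that arithmetic follows. The function name `estimate_emissions_kg`, its parameters, and the example inputs are illustrative assumptions, not measurements for this model.

```python
def estimate_emissions_kg(
    power_kw: float,
    hours: float,
    grid_kg_per_kwh: float,
    num_devices: int = 1,
) -> float:
    """Estimate emissions (kg CO2eq) as energy used times grid carbon intensity.

    power_kw:        average power draw of one device, in kilowatts (illustrative)
    hours:           wall-clock running time
    grid_kg_per_kwh: carbon intensity of the compute region's grid (kg CO2eq per kWh)
    num_devices:     number of identical devices running in parallel
    """
    if power_kw < 0 or hours < 0 or grid_kg_per_kwh < 0 or num_devices < 0:
        raise ValueError("inputs must be non-negative")
    energy_kwh = power_kw * hours * num_devices  # kW * h = kWh
    return energy_kwh * grid_kg_per_kwh
```

For instance, a hypothetical single 300 W device (`power_kw=0.3`) running for 10 hours on a grid emitting 0.4 kg CO2eq/kWh gives roughly 1.2 kg CO2eq. Filling in the Hardware Type, Hours used, and Compute Region fields above supplies exactly these inputs.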

Technical Specifications [optional]

Model Architecture and Objective

More information needed

Compute Infrastructure

More information needed

Hardware

More information needed

Software

More information needed.

Citation

BibTeX:

@article{DBLP:journals/corr/abs-1909-11942,
  author    = {Zhenzhong Lan and
               Mingda Chen and
               Sebastian Goodman and
               Kevin Gimpel and
               Piyush Sharma and
               Radu Soricut},
  title     = {{ALBERT:} {A} Lite {BERT} for Self-supervised Learning of Language
               Representations},
  journal   = {CoRR},
  volume    = {abs/1909.11942},
  year      = {2019},
  url       = {http://arxiv.org/abs/1909.11942},
  archivePrefix = {arXiv},
  eprint    = {1909.11942},
  timestamp = {Fri, 27 Sep 2019 13:04:21 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1909-11942.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

APA:

More information needed

Glossary [optional]

More information needed

More Information [optional]

More information needed

Model Card Authors [optional]

Michael Glass in collaboration with Ezi Ozoani and the Hugging Face team

Model Card Contact

More information needed

How to Get Started with the Model

Use the code below to get started with the model.


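from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Download the tokenizer (SentencePiece vocabulary) from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("michaelrglass/albert-base-rci-wikisql-col")

# Download the ALBERT model with its sequence-classification head.
model = AutoModelForSequenceClassification.from_pretrained("michaelrglass/albert-base-rci-wikisql-col")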
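This card does not document the expected input text or the label mapping. The sketch below assumes, as the "col" and "wikisql" parts of the model name suggest, that the model scores a question paired with a text rendering of one table column. The column format in `serialize_column`, the choice of label index 1 as the positive class (`POSITIVE_LABEL`), and all helper names are hypothetical. Check `model.config.id2label` before relying on them. `torch` is imported inside `score_columns` so the pure-Python helpers work without it installed.

```python
import math
from typing import List, Sequence

# ASSUMPTION: index 1 is the "relevant column" class; verify with model.config.id2label.
POSITIVE_LABEL = 1


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def serialize_column(header: str, cells: Sequence[str]) -> str:
    """HYPOTHETICAL column-to-text format: header, then cell values, pipe-separated."""
    return " | ".join([header, *cells])


def score_columns(question: str, column_texts: Sequence[str], tokenizer, model) -> List[float]:
    """Return the assumed positive-class probability for each (question, column) pair."""
    import torch  # deferred so the helpers above work without torch installed

    model.eval()
    # Two parallel lists yield one "[CLS] question [SEP] column [SEP]" pair per column.
    inputs = tokenizer(
        [question] * len(column_texts),
        list(column_texts),
        padding=True,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return [softmax(row)[POSITIVE_LABEL] for row in logits.tolist()]


def best_column(scores: Sequence[float]) -> int:
    """Index of the highest-scoring column."""
    return max(range(len(scores)), key=lambda i: scores[i])
```

With the `tokenizer` and `model` loaded by the snippet above, a call such as `score_columns("Which team won?", [serialize_column("Team", ["Lions", "Bears"]), serialize_column("Score", ["21", "14"])], tokenizer, model)` returns one score per column, and `best_column` picks the top one. Passing the tokenizer and model in, rather than loading them inside the function, avoids reloading the weights on every query.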