# Model Details

The XLM model was proposed in *Cross-lingual Language Model Pretraining* by Guillaume Lample and Alexis Conneau. `xlm-clm-ende-1024` is a transformer pretrained on English and German with a causal language modeling (CLM) objective, i.e. next-token prediction.
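To make the CLM objective concrete, here is a minimal sketch of next-token cross-entropy on toy data. It is an illustration only, not the training code from the paper: the helper names (`softmax`, `clm_loss`), the three-token vocabulary, and the logits are all assumptions made up for this example.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities (numerically stable)."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def clm_loss(logits_per_position, token_ids):
    """Average next-token cross-entropy for one sequence.

    logits_per_position[t] holds the vocabulary scores produced after
    reading token_ids[: t + 1]; the target at step t is token_ids[t + 1].
    The final position has no next token, so it is not scored.
    """
    losses = []
    for t in range(len(token_ids) - 1):
        probs = softmax(logits_per_position[t])
        target = token_ids[t + 1]
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Hypothetical toy sequence over a 3-token vocabulary.
token_ids = [0, 2, 1]

# A model with no preference: every token is equally likely at each step.
uniform_logits = [[0.0, 0.0, 0.0] for _ in token_ids]
uniform_loss = clm_loss(uniform_logits, token_ids)

# A model that scores the correct next token (2, then 1) much higher.
confident_logits = [[0.0, 0.0, 5.0], [0.0, 5.0, 0.0], [0.0, 0.0, 0.0]]
confident_loss = clm_loss(confident_logits, token_ids)

print(uniform_loss, confident_loss)
```

With uniform scores the loss equals the log of the vocabulary size, and it drops as the model puts more probability on the token that actually comes next.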

# Uses

## Direct Use

The model is a language model and can be used directly for causal language modeling, that is, predicting the next token given the preceding text.

## Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

# Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).

## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

# Training

See the associated paper for details on the training data and training procedure.

# Evaluation

## Testing Data, Factors & Metrics

See the associated paper for details on the testing data, factors and metrics.

## Results

For xlm-clm-ende-1024 results, see Table 2 of the associated paper.

# Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
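As a rough illustration of the kind of arithmetic behind such an estimate, the sketch below multiplies hardware power draw, runtime, and grid carbon intensity. This is a simplified assumption about how the estimate can be formed, not the calculator's actual implementation, and every input value is a placeholder rather than a measurement for this model.

```python
def estimate_co2_kg(gpu_power_watts, hours, num_gpus, carbon_intensity_kg_per_kwh):
    """Estimate emissions as energy used (kWh) times grid carbon intensity.

    All arguments are supplied by the caller; none are taken from the paper.
    """
    energy_kwh = (gpu_power_watts / 1000.0) * hours * num_gpus
    return energy_kwh * carbon_intensity_kg_per_kwh

# Placeholder inputs chosen only to show the calculation.
example_kg = estimate_co2_kg(
    gpu_power_watts=250,              # hypothetical per-GPU power draw
    hours=100,                        # hypothetical training time
    num_gpus=8,                       # hypothetical GPU count
    carbon_intensity_kg_per_kwh=0.5,  # hypothetical grid carbon intensity
)
print(f"Estimated emissions: {example_kg:.1f} kg CO2eq")
```

A real estimate would need the actual hardware, runtime, and compute region, which this card does not report.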

# Technical Specifications

The model developers write:

> We implement all our models in PyTorch (Paszke et al., 2017), and train them on 64 Volta GPUs for the language modeling tasks, and 8 GPUs for the MT tasks. We use float16 operations to speed up training and to reduce the memory usage of our models.

See the associated paper for further details.
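The memory side of the float16 remark can be illustrated with a small standard-library sketch. It only compares storage size and rounding for 16-bit and 32-bit floats; it says nothing about training speed, and the parameter count is a hypothetical round number, not the size of this model.

```python
import struct

# Storage size in bytes for one value in each format.
BYTES_FP32 = struct.calcsize("f")  # IEEE 754 single precision
BYTES_FP16 = struct.calcsize("e")  # IEEE 754 half precision

def weights_size_mib(num_parameters, bytes_per_value):
    """Memory needed to store the weights alone, in MiB."""
    return num_parameters * bytes_per_value / (1024 ** 2)

NUM_PARAMETERS = 100_000_000  # hypothetical parameter count for illustration

fp32_mib = weights_size_mib(NUM_PARAMETERS, BYTES_FP32)
fp16_mib = weights_size_mib(NUM_PARAMETERS, BYTES_FP16)
print(f"float32: {fp32_mib:.1f} MiB, float16: {fp16_mib:.1f} MiB")

# The saving comes at the cost of precision: round-tripping a value
# through half precision changes it slightly.
value = 0.1
as_fp16 = struct.unpack("e", struct.pack("e", value))[0]
print(value, as_fp16)
```

Halving the bytes per value halves the storage for weights, which is one reason half precision is attractive for large models.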

# Citation

BibTeX:

# Model Card Authors

# How to Get Started with the Model

Use the code below to get started with the model.

    import torch
    from transformers import XLMTokenizer, XLMWithLMHeadModel

    tokenizer = XLMTokenizer.from_pretrained("xlm-clm-ende-1024")
    model = XLMWithLMHeadModel.from_pretrained("xlm-clm-ende-1024")

    input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")])  # batch size of 1

    # Look up the language ID for English and repeat it once per input token
    language_id = tokenizer.lang2id["en"]
    langs = torch.tensor([language_id] * input_ids.shape[1])

    # Reshape to (batch_size, sequence_length)
    langs = langs.view(1, -1)  # shape [1, sequence_length] for a batch size of 1

    outputs = model(input_ids, langs=langs)
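The `langs` argument in the snippet above carries one language ID per input token. The sketch below shows the same idea for a batch that mixes English and German sequences, using plain Python lists. The helper `build_langs`, the `lang2id` mapping, and the token IDs are all hypothetical placeholders; in practice the mapping should be read from `tokenizer.lang2id` and the IDs should come from the tokenizer.

```python
def build_langs(batch_input_ids, lang_codes, lang2id):
    """Return one language ID per token, mirroring the shape of the input batch.

    batch_input_ids: list of token-ID lists, one per sequence.
    lang_codes: one language code per sequence, e.g. "en" or "de".
    lang2id: mapping from language code to integer ID.
    """
    if len(batch_input_ids) != len(lang_codes):
        raise ValueError("Need exactly one language code per sequence")
    return [
        [lang2id[code]] * len(sequence)
        for sequence, code in zip(batch_input_ids, lang_codes)
    ]

# Placeholder mapping and token IDs, for illustration only.
lang2id = {"de": 0, "en": 1}
batch_input_ids = [[5, 17, 42, 8], [5, 99, 8]]

langs = build_langs(batch_input_ids, ["en", "de"], lang2id)
print(langs)
```

Sequences of different lengths would still need padding to a common length before being converted to a tensor and passed to the model.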
