---
library_name: transformers
tags: []
---

# Model Card for Model ID

```python
import random

import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

# load the model and tokenizer (repo id taken from the benchmark table below)
model_id = "crumb/92d52f-ame-full-7B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# make sure the model doesn't generate mask tokens:
# the vocabulary has 34,048 ids (32,000 regular tokens + 2,048 extra mask tokens),
# so push the logits of every mask id down with a large negative bias
bias = torch.zeros(34048)
bias[32000:] = -100
model.lm_head.bias = nn.Parameter(bias)

# --------------------------------------------------------------------------------
# Generation without masking
input_ids = tokenizer("Once upon a time, in a land far far away...", return_tensors='pt').input_ids
print(input_ids)
# tensor([[    1,  5713,  3714,   264,   727, 28725,   297,   264,  2533,  2082,
#           2082,  1753,  1101]])

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0]))
# ' Once upon a time, in a land far far away...\n\nThere was a magical place called Disneyland.\n\nIt was a place where dreams came true, where fairy tales became reality, and where magic was all around.\n\nBut one day, something terrible happened.\n\nThe magic began to fade.\n\nThe fairy tales became dull, the'

# --------------------------------------------------------------------------------
# Replace the two "far" tokens (id 2082) with a mask id instead (any id from 32,000 to 34,047 works).
# The model should pick up that two repeating words after "Once upon a time, in a land-"
# and before "away" would probably be "far far".
input_ids[input_ids == 2082] = 32_001
print(input_ids)
# tensor([[    1,  5713,  3714,   264,   727, 28725,   297,   264,  2533, 32001,
#          32001,  1753,  1101]])

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0]))
# ' Once upon a time, in a land away...\n\nOnce upon a time, in a land far, far away, there was a magical kingdom called Flanders. It was a peaceful land, where everyone lived happily ever after.\n\nBut one day, a terrible thing happened. A terrible, terrible thing.\n\nA terrible, terrible thing happened.'

# --------------------------------------------------------------------------------
# We can also get rid of everything except "<s>", "Once", "upon", "away", "..."
def create_masked_ids(input_ids, token_offset, ids_to_mask):
    # assign every unique id in the sequence a random mask id,
    # then keep only the ids listed in ids_to_mask unmasked
    unique_ids = torch.unique(input_ids).tolist()
    unique_id_map = random.sample(range(2048), len(unique_ids))
    id_to_shuffled = {id: shuffled for id, shuffled in zip(unique_ids, unique_id_map)}

    def map_to_shuffled(id):
        return id_to_shuffled[id] + token_offset

    shuffled_ids = input_ids.clone().apply_(map_to_shuffled)

    mask = torch.zeros_like(input_ids, dtype=torch.bool)
    for id_to_mask in ids_to_mask:
        mask |= (input_ids == id_to_mask)

    masked_ids = torch.where(mask, input_ids, shuffled_ids)
    return masked_ids

masked_ids = create_masked_ids(input_ids, 32_000, [1, 5713, 3714, 1753, 1101])
print(masked_ids)
# tensor([[    1,  5713,  3714, 33048, 34032, 32238, 32016, 33048, 33013, 33299,
#          33299,  1753,  1101]])

output = model.generate(masked_ids, max_new_tokens=64)
print(tokenizer.decode(output[0]))
# ' Once upon away...\n\nOnce upon a time, there was a young man named Alex. He was a very curious young man, and loved to explore the world around him. One day, he stumbled upon a magical book called "The Book of Secrets." This book contained all sorts of secrets about the world, and Alex was fasc'
```
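The single-token replacement above can also be wrapped into a small helper. This is only a minimal sketch: the function name and defaults are illustrative, not part of this repo's code, and any of the 2,048 extra ids should work equally well as the mask.

```python
import random

import torch

def mask_out_token(input_ids: torch.Tensor, token_id: int,
                   mask_offset: int = 32_000, n_mask_tokens: int = 2_048) -> torch.Tensor:
    # pick one of the extra ids at random and substitute it for every
    # occurrence of `token_id` in the sequence
    mask_id = mask_offset + random.randrange(n_mask_tokens)
    out = input_ids.clone()
    out[out == token_id] = mask_id
    return out

# e.g. hide "far" (id 2082) behind a random mask id, as in the example above:
# masked = mask_out_token(input_ids, 2082)
```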
This model isn't really made for benchmarks; it's worse than the base model on everything besides ARC-C and TruthfulQA:

| Model | ARC-C | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8k |
| --- | --- | --- | --- | --- | --- | --- |
| [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 59.98 | **83.31** | **64.16** | 42.15 | **78.37** | **37.83** |
| [crumb/92d52f-ame-full-7B](https://hf.co/crumb/92d52f-ame-full-7B) | **61.18** | 81.52 | 63.44 | **42.39** | 77.58 | 35.41 |

The model has 2,048 extra tokens (ids 32000 through 34047) which can all equally be used as masks: you can replace all instances of one token in context with any one of the extra tokens to give the model an extra hard time. It was trained with a context length of 2,048 on three separate replacement techniques applied through a schedule, with 80% of all sequences being completely replaced with mask tokens near the end of training, over ~0.5B tokens in total.

> what? how is that useful?

I'm hoping to finetune it further while replacing the entire tokenizer with any number of other tokenizers, all utilizing the unique mask ids, to hopefully build a causal model of any sufficiently long artifact from any domain, for example the Voynich manuscript or an alien artifact.
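As a rough illustration of that plan, here is a minimal sketch of how a foreign tokenizer's output (or any stream of symbols) might be mapped onto the 2,048 mask ids before further finetuning. Everything here is a hypothetical assumption rather than this repo's actual training code: the constants, the helper name, and the lazy vocabulary map are illustrative, and the approach only works while the number of distinct symbols stays below 2,048.

```python
import torch

MASK_OFFSET = 32_000    # first extra/mask token id
N_MASK_TOKENS = 2_048   # number of extra ids available

def remap_to_mask_ids(symbols, vocab_map):
    # lazily assign each new symbol (or foreign token id) one of the mask ids,
    # so the whole sequence lives entirely in mask-token space
    out = []
    for s in symbols:
        if s not in vocab_map:
            if len(vocab_map) >= N_MASK_TOKENS:
                raise ValueError("ran out of mask ids for this vocabulary")
            vocab_map[s] = MASK_OFFSET + len(vocab_map)
        out.append(vocab_map[s])
    return out

# the symbols could come from any other tokenizer or from a transliteration of
# an artifact; the words below are just an illustrative stand-in
vocab_map = {}
symbols = "daiin okeey qokeedy daiin".split()
input_ids = torch.tensor([remap_to_mask_ids(symbols, vocab_map)])
print(input_ids)  # tensor([[32000, 32001, 32002, 32000]])
```

After remapping, the resulting ids could be fed to the model exactly like the masked sequences in the example at the top of this card.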
## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

[More Information Needed]

### Training Procedure

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Factors

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]