Model Card for crumb/nano-mistral
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- Developed by: me
- Model type: Mistral
- Language(s) (NLP): en
- License: apache
Uses
General web-text completion with very low resource requirements.
Out-of-Scope Use
This is a base model, not an instruction-tuned one; it is not intended for chat or instruction following.
Bias, Risks, and Limitations
The model was trained on web text. The data was filtered, but there is no guarantee that it contains no toxic content.
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the pretrained weights and the matching tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("crumb/nano-mistral")
tokenizer = AutoTokenizer.from_pretrained("crumb/nano-mistral")
# Tokenize a batch of prompts and move the tensors to the model's device
inputs = tokenizer(["Once upon a time,"], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
# generate() takes the tensors as keyword arguments, so unpack the dict
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.7, top_k=20, do_sample=True)
outputs = tokenizer.batch_decode(outputs)
for i in outputs:
    print(i)
Training Details
Training Data
Training Procedure
Parameter | Value |
---|---|
Context Length | 2048 |
Batch Size | 128 |
Learning Rate | 6e-4 |
Scheduler | One-Cycle |
Adam eps | 1e-8 |
Adam beta1 | 0.9 |
Adam beta2 | 0.95 |
Weight Decay | 0.1 |
Max Grad Norm | 1.0 |
Optimizer | adamw_torch |
Tokens | 3,401,640,960 |
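As a rough sanity check on the table above, the number of optimizer steps can be estimated from the token count. This is only a sketch: it assumes the batch size is counted in sequences and that every sequence is packed to the full context length, neither of which is stated in this card.

```python
# Values copied from the hyperparameter table
context_length = 2048
batch_size = 128              # assumption: sequences per optimizer step
total_tokens = 3_401_640_960

# Tokens consumed by one optimizer step under the assumptions above
tokens_per_step = context_length * batch_size

# Estimated number of optimizer steps (not necessarily an integer)
estimated_steps = total_tokens / tokens_per_step

print(f"tokens per step: {tokens_per_step:,}")
print(f"estimated steps: {estimated_steps:,.1f}")
```

Under these assumptions the run works out to roughly 13,000 steps. The result is not a whole number, which suggests a partial final batch or that one of the assumptions does not hold exactly.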
Training Hyperparameters
- Training regime: bf16 non-mixed precision
Speeds, Sizes, Times [optional]
- train_runtime: 62541.9424 seconds (about 17.4 hours of wall-clock time)
- train_samples_per_second: 26.557
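These two figures can be cross-checked against the rest of the card. The sketch below assumes that train_runtime is in seconds, that both A6000s listed under Hardware ran for the whole job, and that each sample is one 2048-token sequence. These are assumptions, not statements taken from the training logs.

```python
train_runtime_s = 62541.9424      # assumption: unit is seconds
samples_per_second = 26.557
num_gpus = 2                      # from the Hardware section (2x A6000)
context_length = 2048             # from the hyperparameter table

# Wall-clock time, then GPU-hours summed over both devices
wall_clock_hours = train_runtime_s / 3600
gpu_hours = wall_clock_hours * num_gpus

# Throughput expressed in tokens instead of samples
total_samples = samples_per_second * train_runtime_s
approx_tokens = total_samples * context_length
tokens_per_second = samples_per_second * context_length

print(f"wall-clock hours: {wall_clock_hours:.2f}")
print(f"GPU-hours:        {gpu_hours:.2f}")
print(f"approx. tokens:   {approx_tokens:,.0f}")
print(f"tokens/second:    {tokens_per_second:,.0f}")
```

With these assumptions the run comes to roughly 34.7 GPU-hours and about 3.40 billion tokens, close to the 34.74 hours and 3,401,640,960 tokens reported elsewhere in this card.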
Evaluation
Testing Data, Factors & Metrics
Testing Data
A held-out split of crumb/askmistral-pile-2-15.
Metrics
The Open LLM Leaderboard evaluation datasets and settings.
Results
Open LLM Leaderboard mean score ± stderr: 29.30 ± 0.42
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
arc_challenge | 1 | none | 25 | acc | 0.1843 | ± | 0.0113 |
arc_challenge | 1 | none | 25 | acc_norm | 0.2167 | ± | 0.0120 |
truthfulqa_mc2 | 2 | none | 0 | acc | 0.4719 | ± | 0.0156 |
winogrande | 1 | none | 5 | acc | 0.517 | ± | 0.014 |
hellaswag | 1 | none | 10 | acc | 0.2803 | ± | 0.0045 |
hellaswag | 1 | none | 10 | acc_norm | 0.2886 | ± | 0.0045 |
gsm8k | 3 | strict-match | 5 | exact_match | 0.0008 | ± | 0.0008 |
gsm8k | 3 | flexible-extract | 5 | exact_match | 0.0099 | ± | 0.0027 |
MMLU
Aggregate accuracy ± stderr: 0.2540 ± 0.0044 (unrounded: 0.253980701754386, 0.004428598058450528)
Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
world_religions | 0 | none | 5 | acc | 0.2222 | ± | 0.0319 |
virology | 0 | none | 5 | acc | 0.2711 | ± | 0.0346 |
us_foreign_policy | 0 | none | 5 | acc | 0.3300 | ± | 0.0473 |
sociology | 0 | none | 5 | acc | 0.2388 | ± | 0.0301 |
security_studies | 0 | none | 5 | acc | 0.2367 | ± | 0.0272 |
public_relations | 0 | none | 5 | acc | 0.2273 | ± | 0.0401 |
professional_psychology | 0 | none | 5 | acc | 0.2484 | ± | 0.0175 |
professional_medicine | 0 | none | 5 | acc | 0.4596 | ± | 0.0303 |
professional_law | 0 | none | 5 | acc | 0.2464 | ± | 0.0110 |
professional_accounting | 0 | none | 5 | acc | 0.2021 | ± | 0.0240 |
prehistory | 0 | none | 5 | acc | 0.2130 | ± | 0.0228 |
philosophy | 0 | none | 5 | acc | 0.2219 | ± | 0.0236 |
nutrition | 0 | none | 5 | acc | 0.2157 | ± | 0.0236 |
moral_scenarios | 0 | none | 5 | acc | 0.2380 | ± | 0.0142 |
moral_disputes | 0 | none | 5 | acc | 0.2486 | ± | 0.0233 |
miscellaneous | 0 | none | 5 | acc | 0.2516 | ± | 0.0155 |
medical_genetics | 0 | none | 5 | acc | 0.3000 | ± | 0.0461 |
marketing | 0 | none | 5 | acc | 0.2265 | ± | 0.0274 |
management | 0 | none | 5 | acc | 0.1748 | ± | 0.0376 |
machine_learning | 0 | none | 5 | acc | 0.3125 | ± | 0.0440 |
logical_fallacies | 0 | none | 5 | acc | 0.2393 | ± | 0.0335 |
jurisprudence | 0 | none | 5 | acc | 0.2315 | ± | 0.0408 |
international_law | 0 | none | 5 | acc | 0.3140 | ± | 0.0424 |
human_sexuality | 0 | none | 5 | acc | 0.2519 | ± | 0.0381 |
human_aging | 0 | none | 5 | acc | 0.3049 | ± | 0.0309 |
high_school_world_history | 0 | none | 5 | acc | 0.2658 | ± | 0.0288 |
high_school_us_history | 0 | none | 5 | acc | 0.2451 | ± | 0.0302 |
high_school_statistics | 0 | none | 5 | acc | 0.4722 | ± | 0.0340 |
high_school_psychology | 0 | none | 5 | acc | 0.1963 | ± | 0.0170 |
high_school_physics | 0 | none | 5 | acc | 0.3046 | ± | 0.0376 |
high_school_microeconomics | 0 | none | 5 | acc | 0.2773 | ± | 0.0291 |
high_school_mathematics | 0 | none | 5 | acc | 0.2667 | ± | 0.0270 |
high_school_macroeconomics | 0 | none | 5 | acc | 0.2667 | ± | 0.0224 |
high_school_government_and_politics | 0 | none | 5 | acc | 0.2591 | ± | 0.0316 |
high_school_geography | 0 | none | 5 | acc | 0.2424 | ± | 0.0305 |
high_school_european_history | 0 | none | 5 | acc | 0.2242 | ± | 0.0326 |
high_school_computer_science | 0 | none | 5 | acc | 0.2800 | ± | 0.0451 |
high_school_chemistry | 0 | none | 5 | acc | 0.2857 | ± | 0.0318 |
high_school_biology | 0 | none | 5 | acc | 0.3129 | ± | 0.0264 |
global_facts | 0 | none | 5 | acc | 0.1500 | ± | 0.0359 |
formal_logic | 0 | none | 5 | acc | 0.1905 | ± | 0.0351 |
elementary_mathematics | 0 | none | 5 | acc | 0.2513 | ± | 0.0223 |
electrical_engineering | 0 | none | 5 | acc | 0.2759 | ± | 0.0372 |
econometrics | 0 | none | 5 | acc | 0.2456 | ± | 0.0405 |
conceptual_physics | 0 | none | 5 | acc | 0.2638 | ± | 0.0288 |
computer_security | 0 | none | 5 | acc | 0.1800 | ± | 0.0386 |
college_physics | 0 | none | 5 | acc | 0.2549 | ± | 0.0434 |
college_medicine | 0 | none | 5 | acc | 0.2023 | ± | 0.0306 |
college_mathematics | 0 | none | 5 | acc | 0.2900 | ± | 0.0456 |
college_computer_science | 0 | none | 5 | acc | 0.2700 | ± | 0.0446 |
college_chemistry | 0 | none | 5 | acc | 0.2500 | ± | 0.0435 |
college_biology | 0 | none | 5 | acc | 0.2222 | ± | 0.0348 |
clinical_knowledge | 0 | none | 5 | acc | 0.2377 | ± | 0.0262 |
business_ethics | 0 | none | 5 | acc | 0.2100 | ± | 0.0409 |
astronomy | 0 | none | 5 | acc | 0.1776 | ± | 0.0311 |
anatomy | 0 | none | 5 | acc | 0.2593 | ± | 0.0379 |
abstract_algebra | 0 | none | 5 | acc | 0.2200 | ± | 0.0416 |
Summary
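The headline mean and its standard error appear to be reproducible from the per-task numbers in the Results section. The sketch below assumes a six-task Open LLM Leaderboard convention (acc_norm for arc_challenge and hellaswag, acc for winogrande and truthfulqa_mc2, the MMLU aggregate, and the flexible-extract score for gsm8k). It also assumes the per-task errors are independent, so the standard error of the mean is the root-sum-of-squares divided by the number of tasks. The choice of metrics is inferred from the reported 29.30; this card does not state it.

```python
import math

# (value, stderr) per task, copied from the results in this card
scores = {
    "arc_challenge (acc_norm)": (0.2167, 0.0120),
    "hellaswag (acc_norm)": (0.2886, 0.0045),
    "truthfulqa_mc2 (acc)": (0.4719, 0.0156),
    "winogrande (acc)": (0.517, 0.014),
    "gsm8k (flexible-extract)": (0.0099, 0.0027),
    "mmlu (acc)": (0.253980701754386, 0.004428598058450528),
}

def aggregate(pairs):
    """Unweighted mean and its stderr, assuming independent errors."""
    n = len(pairs)
    mean = sum(value for value, _ in pairs) / n
    stderr = math.sqrt(sum(se ** 2 for _, se in pairs)) / n
    return mean, stderr

mean, stderr = aggregate(list(scores.values()))
print(f"mean = {100 * mean:.2f}, stderr = {100 * stderr:.2f}")
```

Run as written, this prints a mean of 29.30 and a stderr of 0.42, matching the reported pair. Swapping in the strict-match gsm8k score instead gives a mean of about 29.15, which is why the flexible-extract score is assumed here.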
Model Examination [optional]
It's OK.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: A6000
- Hours used: 34.74
- Cloud Provider: n/a
- Compute Region: Iowa
- Carbon Emitted: 4.5 kg CO2eq
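An estimate of this kind is essentially power draw times runtime times grid carbon intensity. As a back-of-the-envelope check, the sketch below assumes each A6000 draws its 300 W board power for the full duration (an assumption; the card does not report measured power) and solves for the carbon intensity implied by the reported figures. It is not a reconstruction of the calculator's actual inputs.

```python
gpu_hours = 34.74        # "Hours used" from this card
carbon_kg = 4.5          # "Carbon Emitted" from this card
gpu_power_kw = 0.300     # assumption: 300 W board power per A6000

# Energy = power x time; emissions = energy x carbon intensity
energy_kwh = gpu_power_kw * gpu_hours
implied_intensity = carbon_kg / energy_kwh   # kg CO2eq per kWh

print(f"energy: {energy_kwh:.2f} kWh")
print(f"implied carbon intensity: {implied_intensity:.3f} kg CO2eq/kWh")
```

Under the 300 W assumption the run used about 10.4 kWh, which implies a grid intensity of roughly 0.43 kg CO2eq/kWh. If the real draw was lower than board power, the implied intensity would be correspondingly higher.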
Technical Specifications [optional]
Model Architecture and Objective
Mistral architecture, trained with a causal language modelling objective.
Compute Infrastructure
Hardware
Lambda Vector workstation with 2x A6000 GPUs
Software
Hugging Face Transformers, PyTorch, and a custom trainer