Update README.md

README.md CHANGED

Removed (previous README):

---
base_model: jetmoe/jetmoe-8b-sft
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: jetmoe-8b-chat
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# jetmoe-8b-chat

This model is a fine-tuned version of [jetmoe/jetmoe-8b-sft](https://huggingface.co/jetmoe/jetmoe-8b-sft) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set:
- Rewards/chosen: -0.0901
- Rewards/rejected: -0.2250
- Rewards/accuracies: 0.7148
- Rewards/margins: 0.1349
- Logps/rejected: -289.3396
- Logps/chosen: -286.2378
- Logits/rejected: -2.9020
- Logits/chosen: -2.9443
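
The Rewards/* values above follow the usual DPO logging convention: each completion gets an implicit reward equal to beta times the difference between its log-probability under the trained policy and under the reference model, margins are the chosen-minus-rejected gap, and accuracies are the fraction of pairs where the chosen completion scores higher. The sketch below only illustrates that convention; the beta value and the helper function are assumptions, not taken from this card.

```python
# Sketch of the standard DPO reward bookkeeping behind the Rewards/* columns.
# beta and the function signature are assumptions; this is not the original logging code.
import torch

def dpo_reward_stats(policy_chosen_logps: torch.Tensor,
                     policy_rejected_logps: torch.Tensor,
                     ref_chosen_logps: torch.Tensor,
                     ref_rejected_logps: torch.Tensor,
                     beta: float = 0.1) -> dict:
    """Compute DPO implicit rewards from summed sequence log-probabilities."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": (chosen_rewards - rejected_rewards).mean().item(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
    }
```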

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
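
As a rough illustration of how these values map onto code, here is a sketch using `transformers.TrainingArguments`. It is not the original alignment-handbook training script; `output_dir` and `bf16` are assumptions. The totals are consistent with the list above: 4 per-device train batch x 8 GPUs x 4 accumulation steps = 128, and 8 per-device eval batch x 8 GPUs = 64.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# This is a sketch, not the original alignment-handbook training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="jetmoe-8b-chat-dpo",  # placeholder path (assumption)
    learning_rate=5e-07,
    per_device_train_batch_size=4,    # train_batch_size: 4 (per GPU)
    per_device_eval_batch_size=8,     # eval_batch_size: 8 (per GPU)
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,                   # optimizer: Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                        # assumed mixed-precision setting
)
# With 8 GPUs: 4 x 8 x 4 = 128 total train batch, 8 x 8 = 64 total eval batch.
```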

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6664        | 0.42  | 200  | 0.6622          | -0.0185        | -0.0869          | 0.6997             | 0.0684          | -275.5274      | -279.0778    | -2.9127         | -2.9572       |
| 0.6428        | 0.84  | 400  | 0.6372          | -0.0901        | -0.2250          | 0.7148             | 0.1349          | -289.3396      | -286.2378    | -2.9020         | -2.9443       |

### Framework versions

- Transformers 4.39.0.dev0
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.2

Added (new README):

# JetMoE-8B-chat: Efficient and High-Performance LLM

Welcome to the official repository of JetMoE-8B-chat, a language model that combines cost-efficiency with high performance, making state-of-the-art language modeling accessible to a broader audience, including academia and small-scale industry players.

## Key Highlights

- **Cost-Effective Training**: Trained for less than $0.1 million, JetMoE-8B significantly lowers the barrier to entry for training large language models (LLMs), demonstrating that high-quality LLM training can be far more economical than widely assumed.
- **Academia-Friendly**: By relying exclusively on public datasets and open-sourcing our code, JetMoE-8B is highly accessible for educational and research purposes. It is designed to be fine-tuned even on consumer-grade GPUs, making it feasible for most academic labs.
- **Efficiency at Scale**: With only 2.2B active parameters during inference, JetMoE-8B provides an optimal balance between computational cost and performance, outperforming similarly sized models such as Gemma-2B across various benchmarks.
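
To give a concrete starting point, here is a minimal inference sketch using the Hugging Face `transformers` library. The repo id `jetmoe/jetmoe-8b-chat`, the dtype, and the device settings are assumptions; depending on your `transformers` version you may also need `trust_remote_code=True` for the JetMoE architecture.

```python
# Minimal inference sketch. The repo id, dtype, and device settings are assumptions;
# older transformers releases may additionally need trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jetmoe/jetmoe-8b-chat"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # keep memory use modest
    device_map="auto",           # spread layers over available devices
)

# Format a single-turn conversation with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain mixture-of-experts in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because only around 2.2B of the 8B parameters are active per token, generation cost stays close to that of a roughly 2B dense model even though the full checkpoint must be loaded.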

JetMoE-8B-chat has been evaluated with MT-Bench. Here is how JetMoE-8B-chat compares with other models:

| Model              | Score     |
|--------------------|-----------|
| GPT-4              | 9.014     |
| GPT-3.5-turbo      | 7.995     |
| Claude-v1          | 7.923     |
| **JetMoE-8B-chat** | **6.681** |
| Llama-2-13b-chat   | 6.650     |
| Vicuna-13b-v1.3    | 6.413     |
| Wizardlm-13b       | 6.353     |
| Llama-2-7b-chat    | 6.269     |