<p align="center">
<img src="https://huggingface.co/HyperbeeAI/Tulpar-7b-v0/resolve/main/tulpar.png" width="360" height="360" >
</p>

# Model Description
Tulpar-7b is a Llama2-7b-based model trained by Hyperbee.ai. It was trained on a filtered and preprocessed instruction-finetuning dataset that combines GPT-4-generated data with curated datasets such as Airoboros and Platypus.

# Example Usage
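The card does not include an official usage snippet or prompt template, so the following is a minimal sketch using the Hugging Face `transformers` API. The model ID `HyperbeeAI/Tulpar-7b-v0` is taken from the image URL above; the instruction-style prompt format is an assumption, not a documented template.

```python
# Minimal sketch, assuming standard transformers causal-LM usage.
# The prompt format below is an assumption; adjust it if an official
# template is published for this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HyperbeeAI/Tulpar-7b-v0"  # taken from the image URL above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = (
    "### Instruction:\n"
    "Explain instruction finetuning in two sentences.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```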

# Evaluation
Our offline HF Leaderboard evaluation results:

|**Task**|**Metric**|**Value**|
|:------:|:--------:|:-------:|
|*arc_challenge*|acc_norm|0.5614|
|*hellaswag*|acc_norm|0.7901|
|*mmlu*|acc_norm|0.5242|
|*truthfulqa_mc*|mc2|0.5160|
|**Average**|-|**0.5979**|

Other GPT4All evaluation results:

|**Task**|**Metric**|**Value**|
|:------:|:--------:|:-------:|
|boolq|acc|0.8306|
|piqa|acc|0.7905|
| |acc_norm|0.7884|
|winogrande|acc|0.7159|
|openbookqa|acc|0.356|
| |acc_norm|0.448|
|**Average** (including HF leaderboard datasets)| |0.6468|

BigBenchHard results:

|**Task**|**Metric**|**Value**|
|:------:|:--------:|:-------:|
|bigbench_causal_judgement|multiple_choice_grade|0.6105|
|bigbench_date_understanding|multiple_choice_grade|0.6423|
|bigbench_disambiguation_qa|multiple_choice_grade|0.3643|
|bigbench_dyck_languages|multiple_choice_grade|0.2000|
|bigbench_formal_fallacies_syllogisms_negation|multiple_choice_grade|0.5002|
|bigbench_geometric_shapes|multiple_choice_grade|0.0000|
| |exact_str_match|0.0000|
|bigbench_hyperbaton|multiple_choice_grade|0.6754|
|bigbench_logical_deduction_five_objects|multiple_choice_grade|0.2700|
|bigbench_logical_deduction_seven_objects|multiple_choice_grade|0.1929|
|bigbench_logical_deduction_three_objects|multiple_choice_grade|0.4133|
|bigbench_movie_recommendation|multiple_choice_grade|0.3000|
|bigbench_navigate|multiple_choice_grade|0.5000|
|bigbench_reasoning_about_colored_objects|multiple_choice_grade|0.5750|
|bigbench_ruin_names|multiple_choice_grade|0.3281|
|bigbench_salient_translation_error_detection|multiple_choice_grade|0.2976|
|bigbench_snarks|multiple_choice_grade|0.6022|
|bigbench_sports_understanding|multiple_choice_grade|0.5122|
|bigbench_temporal_sequences|multiple_choice_grade|0.1450|
|bigbench_tracking_shuffled_objects_five_objects|multiple_choice_grade|0.1976|
|bigbench_tracking_shuffled_objects_seven_objects|multiple_choice_grade|0.1440|
|bigbench_tracking_shuffled_objects_three_objects|multiple_choice_grade|0.4133|
|**Average**| |0.3754|

# Ethical Considerations and Limitations
Tulpar is a technology with potential risks and limitations. The model was finetuned only on English data, so scenarios in other languages are not covered. As Hyperbee.ai, we neither guarantee ethical, accurate, unbiased, or objective responses nor endorse the model's outputs. Before deploying this model, you are advised to run safety tests for your use case.