Spestly committed · Commit e53f901 · verified · 1 Parent(s): 96a305b

Update README.md

Files changed (1): README.md (+99 −1)
---
base_model:
- Qwen/Qwen2.5-3B-Instruct
library_name: transformers
---
![Header](Maverick.png)

# **Maverick-1-3B Model Card**

## **Model Overview**

**Maverick-1-3B** is a 3.09-billion-parameter causal language model fine-tuned from Qwen2.5-3B-Instruct. It is designed for a broad range of natural language processing tasks, with enhanced reasoning and instruction-following capabilities over the base model.

## **Model Details**

- **Model Developer:** Aayan Mishra
- **Model Type:** Causal Language Model
- **Architecture:** Transformer with Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, attention QKV bias, and tied word embeddings
- **Parameters:** 3.09 billion total (2.77 billion non-embedding)
- **Layers:** 36
- **Attention Heads:** 16 for query and 2 for key-value (Grouped Query Attention)
- **Vocabulary Size:** 151,646 tokens
- **Context Length:** Up to 32,768 tokens
- **Languages Supported:** Primarily English, with basic support for other languages
- **License:** MIT

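Several of these figures can be checked directly against the repository's configuration file. A minimal sketch using the standard `transformers` `AutoConfig` API (the field names are the usual Qwen2-style config keys; the printed values should match the list above, though `vocab_size` reflects the embedding matrix and may differ slightly from the tokenizer's token count):

```python
from transformers import AutoConfig

# Fetch only the configuration; no model weights are downloaded.
config = AutoConfig.from_pretrained("Spestly/Maverick-1-3B")

print(config.num_hidden_layers)        # expected: 36
print(config.num_attention_heads)      # expected: 16 (query heads)
print(config.num_key_value_heads)      # expected: 2 (grouped-query attention)
print(config.max_position_embeddings)  # maximum context length
print(config.vocab_size)               # embedding size; close to the tokenizer vocabulary
```
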
## **Training Details**

Maverick-1-3B was fine-tuned using the Unsloth framework on a single NVIDIA A100 GPU. The fine-tuning process spanned approximately 90 minutes over 60 epochs, utilizing a curated dataset focused on instruction-following and general NLP tasks. This approach aimed to enhance the model's performance in complex reasoning and academic tasks.

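The exact training script, dataset, and hyperparameters are not published, but for orientation, a minimal hypothetical sketch of a LoRA fine-tune with Unsloth over this base model might look like the following (the dataset file, LoRA rank, batch size, and learning rate are illustrative assumptions, and the `SFTTrainer` argument names follow older `trl` releases):

```python
# Hypothetical sketch only -- not the actual Maverick-1-3B training recipe.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model through Unsloth (4-bit loading is an assumption).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and target modules are common defaults, not published values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder instruction-following dataset.
dataset = load_dataset("json", data_files="instruction_data.jsonl")["train"]

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=60,  # epoch count quoted in the section above
        learning_rate=2e-4,
        output_dir="maverick-1-3b-sft",
    ),
)
trainer.train()
```
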
## **Intended Use**

Maverick-1-3B is designed for a range of applications, including but not limited to:

- **General NLP Tasks:** Text completion, summarization, and question answering.
- **Academic Assistance:** Support for tutoring, essay composition, and research inquiries.
- **Data Analysis:** Insights and interpretations of data-centric queries.

While Maverick-1-3B is a powerful tool for many applications, it is not intended for real-time, safety-critical systems or for processing sensitive personal information.

## **How to Use**

To use Maverick-1-3B, make sure you have an up-to-date version of the `transformers` library installed:

```bash
pip install -U transformers
```

Here's an example of how to load Maverick-1-3B and generate a response:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Spestly/Maverick-1-3B"

# Load the model and tokenizer; device_map="auto" places weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of entropy in thermodynamics."
messages = [
    {"role": "system", "content": "You are Maverick, an AI assistant designed to be helpful."},
    {"role": "user", "content": prompt},
]

# Render the chat messages with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# Strip the prompt tokens so only the newly generated text remains.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
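Unless the repository's generation config specifies otherwise, `generate` decodes greedily by default. Sampling can be enabled through the standard `transformers` generation arguments; the values below are illustrative, not settings recommended by the model author:

```python
# Illustrative sampling settings; tune for your use case.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,   # sample instead of greedy decoding
    temperature=0.7,  # lower values make output more deterministic
    top_p=0.9,        # nucleus-sampling cutoff
)
```
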
## **Limitations**

Users should be aware of the following limitations:

- **Biases:** Maverick-1-3B may exhibit biases present in its training data. Users should critically assess outputs, especially in sensitive contexts.
- **Knowledge Cutoff:** The model's knowledge is current up to August 2024; it may not be aware of events or developments after this date.
- **Language Support:** The model is trained primarily on English data, so performance in other languages may be inconsistent.

## **Acknowledgements**

Maverick-1-3B builds upon the work of the Qwen team. Thanks also go to the open-source AI community, whose tools and frameworks made the development of Maverick-1-3B possible.

## **License**

Maverick-1-3B is released under the MIT License, permitting broad usage with proper attribution.

## **Contact**

- Email: maverick@aayanmishra.com