Edit model card

Jamba-Nexus-IKM

This model has been trained on the Internal Knowledge Map dataset.


Example Output:

<|startoftext|><|im_start|>system 
You are a helpful AI assistant.
<|im_end|> 
<|im_start|>user
How could we use cheese to reignite the sun?
<|im_end|> 
<|im_start|>assistant
The idea of using cheese to reignite the sun may sound strange,
but it's actually quite plausible. Here's how it works:

1. Nuclear fusion: Cheese contains a high concentration of phosphorus and calcium, which can react with each other under intense heat and pressure. When this reaction occurs, it produces large amounts of energy in the form of nuclear fusion, just like what happens inside our sun.
2. Reigniting the sun: To reignite the sun, we need to introduce a source of energy that can trigger nuclear fusion in its core. The high phosphorus and calcium content in cheese makes it an ideal candidate for this purpose. By injecting a large quantity of cheese into the sun's core, we can initiate nuclear fusion reactions and provide the sun with a new source of energy to reignite itself.<|im_end|>
<|endoftext|>

Inference

!pip install -qqq transformers>=4.39.0 mamba-ssm causal-conv1d>=1.2.0 accelerate bitsandbytes --progress-bar off
!pip install flash-attn --no-build-isolation

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

double_quant_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_use_double_quant=True,
   bnb_4bit_compute_dtype=torch.float16
)

model = AutoModelForCausalLM.from_pretrained(
    "Severian/Jamba-Nexus-IKM-v1",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    quantization_config=double_quant_config,
)
tokenizer = AutoTokenizer.from_pretrained("Severian/Jamba-Nexus-IKM-v1")

input_text = """<|im_start|>system 
You are a helpful AI assistant.
<|im_end|> 
<|im_start|>user
How could we use cheese to reignite the sun?
<|im_end|> 
<|im_start|>assistant
"""

input_ids = tokenizer(input_text, return_tensors='pt').to(model.device)["input_ids"]

outputs = model.generate(input_ids, max_new_tokens=1024, temperature=0.0, repetition_penalty=1.1)

print(tokenizer.batch_decode(outputs)[0])
# <|startoftext|><|im_start|>system 
# You are a helpful AI assistant.
# <|im_end|> 
# <|im_start|>user
# How could we use cheese to reignite the sun?
# <|im_end|> 
# <|im_start|>assistant
# The idea of using cheese to reignite the sun may sound strange, but it's actually quite plausible. Here's how it works: 1. Nuclear fusion: Cheese contains a high concentration of phosphorus and calcium, which can react with each other under intense heat and pressure. When this reaction occurs, it produces large amounts of energy in the form of nuclear fusion, just like what happens inside our sun. 2. Reigniting the sun: To reignite the sun, we need to introduce a source of energy that can trigger nuclear fusion in its core. The high phosphorus and calcium content in cheese makes it an ideal candidate for this purpose. By injecting a large quantity of cheese into the sun's core, we can initiate nuclear fusion reactions and provide the sun with a new source of energy to reignite itself.<|im_end|>
# <|endoftext|>
[383/1171 33:25 < 1:09:07, 0.19 it/s, Epoch 0.33/1]
Step	Training Loss
1	10.680900
2	10.793200
3	8.870600
4	8.817300
5	13.537700
6	14.457900
7	14.419900
8	13.235300
9	10.764000
10	10.614000
11	12.617900
12	11.241100
13	10.644600
14	11.787900
15	11.430500
16	11.913600
17	10.418000
18	9.867500
19	9.392300
20	8.825400
21	8.238000
22	8.030900
23	7.902800
24	8.247100
25	7.871800
26	7.040200
27	8.326700
28	7.478000
29	6.724300
30	6.646100
31	6.375500
32	6.677100
33	7.157500
34	5.913300
35	6.432800
36	6.342500
37	5.987400
38	5.893300
39	5.194400
40	5.260600
41	5.697200
42	5.065100
43	4.868600
44	5.102600
45	4.660700
46	6.133700
47	4.706000
48	4.598300
49	4.569700
50	4.546100
51	4.799700
52	4.632400
53	4.342000
54	4.338600
55	5.103600
56	5.415300
57	5.488200
58	6.379000
59	4.440300
60	5.374200
61	5.150200
62	4.162400
63	4.020500
64	3.953600
65	4.621100
66	3.870800
67	4.863500
68	4.967800
69	3.887500
70	3.848400
71	3.681100
72	3.571800
73	3.585700
74	4.433200
75	4.752700
76	4.151600
77	3.193300
78	4.800000
79	3.036500
80	2.827300
81	4.570700
82	2.903900
83	5.724400
84	5.984600
85	4.146200
86	2.905400
87	3.950700
88	2.650200
89	3.064800
90	3.072800
91	3.083100
92	2.970900
93	4.492900
94	2.664900
95	2.507200
96	2.549800
97	2.476700
98	2.548200
99	3.978200
100	2.654500
101	2.478400
102	4.039500
103	2.201600
104	2.030600
105	1.993000
106	1.773600
107	4.248400
108	1.777600
109	3.311100
110	1.720900
111	5.827900
112	1.679600
113	3.789200
114	1.593900
115	1.241600
116	1.306900
117	5.464400
118	1.536000
119	1.328700
120	1.132500
121	1.144900
122	0.923600
123	0.690700
124	1.142500
125	5.850100
126	1.102200
127	0.939700
128	0.727700
129	3.941400
130	0.791900
131	0.662900
132	3.319800
133	0.623900
134	0.521800
135	0.375600
136	0.302900
137	0.225400
138	2.994300
139	0.214300
140	0.229000
141	2.751600
142	0.298000
143	0.227500
144	2.300500
145	0.180900
146	0.629700
147	0.420900
148	2.648600
149	1.837600
150	0.524800
...
1148 0.004700
Downloads last month
12
Safetensors
Model size
17.7B params
Tensor type
BF16
·
F32
·
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Finetuned from

Dataset used to train Severian/Jamba-Nexus-4xMoE

Collection including Severian/Jamba-Nexus-4xMoE