---
datasets:
- jan-hq/instruction-speech-v1.5
language:
- en
license: apache-2.0
tags:
- sound language model
---

## Model Details

We have developed and released the [llama3s](https://huggingface.co/collections/homebrew-research/llama3-s-669df2139f0576abc6eb7405) model family, which natively understands both audio and text input.

This checkpoint continues training from our previous checkpoint, [jan-hq/Jan-Llama3s-0708](https://huggingface.co/jan-hq/Jan-Llama3s-0708), on 1.3B tokens from the [Instruction Speech v1.5](https://huggingface.co/datasets/homebrew-research/instruction-speech-v1.5) dataset.

**Model developers** Homebrew Research.

**Input** Text and sound.

**Output** Text.

**Model Architecture** Llama-3.

**Language(s):** English.

## Intended Use

**Intended Use Cases** This family is primarily intended for research applications. This version aims to further improve the LLM's sound understanding capabilities.

**Out-of-scope** The use of llama3-s in any manner that violates applicable laws or regulations is strictly prohibited.

## How to Get Started with the Model

First, convert the audio file to sound tokens:

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

def audio_to_sound_tokens(audio_path, target_bandwidth=1.5, device="cuda"):
    # Initialize Encodec
    model = EncodecModel.encodec_model_24khz()
    model.set_target_bandwidth(target_bandwidth)
    model.to(device)

    # Load and preprocess audio
    wav, sr = torchaudio.load(audio_path)
    wav = convert_audio(wav, sr, model.sample_rate, model.channels)
    wav = wav.unsqueeze(0).to(device)

    # Encode audio
    with torch.no_grad():
        encoded_frames = model.encode(wav)
    codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1)

    # Flatten codes
    audio_code1, audio_code2 = codes[0][0], codes[0][1]
    flatten_tokens = torch.stack((audio_code1, audio_code2), dim=1).flatten().tolist()

    # Convert to sound tokens
    result = ''.join(f'<|sound_{num:04d}|>' for num in flatten_tokens)
    return f'<|sound_start|>{result}<|sound_end|>'

# Usage
sound_tokens = audio_to_sound_tokens("/path/to/your/audio/file")
```
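
To make the token layout concrete, the sketch below reproduces the interleaving and formatting steps on plain Python lists, without loading Encodec. The helper names `interleave_codes`, `format_sound_tokens`, and `estimate_token_count` are illustrative and not part of the model's code. The length estimate assumes the 24 kHz Encodec model's 75 frames per second, which at a 1.5 kbps target bandwidth gives two codebooks per frame.

```python
def interleave_codes(code1, code2):
    """Interleave two equal-length codebook streams frame by frame.

    Mirrors torch.stack((code1, code2), dim=1).flatten() on plain lists:
    [a0, a1, ...], [b0, b1, ...] -> [a0, b0, a1, b1, ...]
    """
    if len(code1) != len(code2):
        raise ValueError("codebook streams must have the same number of frames")
    return [code for pair in zip(code1, code2) for code in pair]


def format_sound_tokens(flat_codes):
    """Render codes in the <|sound_XXXX|> format used above."""
    body = "".join(f"<|sound_{num:04d}|>" for num in flat_codes)
    return f"<|sound_start|>{body}<|sound_end|>"


def estimate_token_count(duration_seconds, frame_rate=75, num_codebooks=2):
    """Rough count of sound tokens for a clip, including the start and end markers.

    Assumes 75 frames per second and 2 codebooks (Encodec 24 kHz at 1.5 kbps).
    """
    return int(duration_seconds * frame_rate) * num_codebooks + 2


# Toy example with three frames (the code values are made up)
flat = interleave_codes([12, 7, 1023], [5, 900, 0])
print(flat)
print(format_sound_tokens(flat))
print(estimate_token_count(10))
```

Under these assumptions a 10-second clip comes to roughly 1,500 sound tokens, which is worth comparing against the context length you plan to use.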

Then run inference on the model as you would with any other LLM.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

def setup_pipeline(model_path, use_4bit=False, use_8bit=False):
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    model_kwargs = {"device_map": "auto"}

    if use_4bit:
        model_kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
        )
    elif use_8bit:
        # 8-bit loading has no compute-dtype or double-quant options;
        # those settings exist only for 4-bit (bnb_4bit_*) quantization
        model_kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_8bit=True,
        )
    else:
        model_kwargs["torch_dtype"] = torch.bfloat16

    model = AutoModelForCausalLM.from_pretrained(model_path, **model_kwargs)

    return pipeline("text-generation", model=model, tokenizer=tokenizer)

def generate_text(pipe, messages, max_new_tokens=64, temperature=0.0, do_sample=False):
    generation_args = {
        "max_new_tokens": max_new_tokens,
        "return_full_text": False,
        "temperature": temperature,
        "do_sample": do_sample,
    }

    output = pipe(messages, **generation_args)
    return output[0]['generated_text']

# Usage
llm_path = "jan-hq/Jan-Llama3s-0719"
pipe = setup_pipeline(llm_path, use_4bit=True)
messages = [
    {"role": "user", "content": sound_tokens},
]
generated_text = generate_text(pipe, messages)
print(generated_text)
```
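
Because the prompt is a long machine-generated string, it can help to check it before spending GPU time on generation. The sketch below is one way to do that; `build_messages` is a hypothetical helper, not part of the model's code, and it only checks the `<|sound_start|>` / `<|sound_XXXX|>` / `<|sound_end|>` format produced by `audio_to_sound_tokens` above.

```python
import re

# Matches <|sound_start|>, one or more 4-digit sound tokens, then <|sound_end|>
SOUND_PROMPT_PATTERN = re.compile(
    r"<\|sound_start\|>(?:<\|sound_\d{4}\|>)+<\|sound_end\|>"
)


def build_messages(sound_tokens):
    """Validate a sound-token string and wrap it as a single user turn."""
    if not SOUND_PROMPT_PATTERN.fullmatch(sound_tokens):
        raise ValueError(
            "expected <|sound_start|> followed by <|sound_XXXX|> tokens and <|sound_end|>"
        )
    return [{"role": "user", "content": sound_tokens}]


# Usage with a made-up two-token clip
demo = "<|sound_start|><|sound_0012|><|sound_0005|><|sound_end|>"
print(build_messages(demo))
```

The returned list has the same shape as `messages` in the usage example, so it can be passed to `generate_text` unchanged.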

## Training process
**Training Metrics Image**: Below is a snapshot of the training loss curve.

![train_loss_curve](https://cdn-uploads.huggingface.co/production/uploads/65713d70f56f9538679e5a56/JYUOqGs1c-7vXtJt06KH5.png)

### Hardware

**GPU Configuration**: Cluster of 8x NVIDIA H100-SXM-80GB.
**GPU Usage**:
  - **Continual Training**: 14 hours.

### Training Arguments

| Parameter                  | Continual Training      | 
|----------------------------|-------------------------|
| **Epoch**                  | 1                       | 
| **Global batch size**      | 128                     | 
| **Learning Rate**          | 1.5e-4                  | 
| **Learning Scheduler**     | Cosine with warmup      | 
| **Optimizer**              | Adam torch fused        | 
| **Warmup Ratio**           | 0.01                    | 
| **Weight Decay**           | 0.01                    |
| **beta1**                  | 0.9                     | 
| **beta2**                  | 0.98                    | 
| **epsilon**                | 1e-6                    | 
| **Gradient Clipping**      | 1.0                     | 
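
As a rough illustration of what these settings imply, the sketch below computes the learning rate over training and the per-GPU share of the global batch. It assumes "Cosine with warmup" means linear warmup followed by cosine decay to zero, which is how the Hugging Face cosine scheduler behaves; the exact scheduler used is not stated here. The total step count is hypothetical, and the function names are illustrative.

```python
import math


def lr_at(step, total_steps, peak_lr=1.5e-4, warmup_ratio=0.01):
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def samples_per_device_per_step(global_batch_size=128, num_devices=8):
    """Per-GPU share of the global batch (micro-batch size x accumulation steps)."""
    return global_batch_size // num_devices


total_steps = 1000  # hypothetical; the actual number of optimizer steps is not stated
for step in (0, 5, 10, 500, 1000):
    print(step, lr_at(step, total_steps))
print(samples_per_device_per_step())
```

With a global batch size of 128 on 8 GPUs, each device contributes 16 samples per optimizer step, split between micro-batch size and gradient accumulation in whatever way fits in memory.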

### Accelerate FSDP Config

```
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
enable_cpu_affinity: true
fsdp_config:
  fsdp_activation_checkpointing: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

## Examples

1. Good example:

<details>
<summary>Click to toggle Example 1</summary>

```
# Question:
Is the Paris Climate Change agreement going to help slow pollution?

# Expected Output:
Yes, the Paris Climate Agreement is designed to help slow pollution and combat climate change on a global scale. The main goal of the agreement is to limit global warming to well below 2 degrees Celsius, preferably to 1.5 degrees Celsius, compared to pre-industrial levels. To achieve this, countries that have signed the agreement commit to reducing their greenhouse gas emissions.

The agreement encourages countries to move towards renewable energy sources and away from fossil fuels, which are a major source of pollution. It also promotes sustainable practices in various sectors such as agriculture, forestry, and transportation, which can help reduce pollution.

However, the effectiveness of the Paris Agreement in slowing pollution depends on the commitment and action of each participating country. It's also important to note that while the agreement can help slow pollution, it's just one part of a much larger effort needed to fully address climate change and environmental pollution.

# Model's Output:
Yes, the Paris Climate Agreement is a global commitment to limit global warming to well below 2 degrees Celsius above pre-industrial levels and pursue efforts to limit it to 1.5 degrees Celsius. The agreement was signed by 195 countries and entered into force on November 4, 2016. 

The agreement sets out a framework for countries to reduce greenhouse gas emissions and adapt to the impacts of climate change. It also encourages countries to increase their ambition and to support developing countries in their efforts to reduce emissions. 

The Paris Agreement is a significant step towards addressing climate change, but it is not a guarantee of success. The agreement requires countries to submit Nationally Determined Contributions (NDCs) outlining their plans to reduce emissions and adapt to the impacts of climate change. The success of the agreement will depend on the ability of countries to meet their NDCs and to work together to address the global nature of climate change.
```
</details>

<details>
<summary>Click to toggle Example 2</summary>

```
# Question:
Write an article about the Instruction of Dorastine 1500-2000 words in chemical industry
# Expected Output:
Title: The Instruction of Dorastine in the Chemical Industry

Introduction

The chemical industry has seen a significant evolution in the past few decades, with the introduction of new substances and compounds that have revolutionized various sectors. One such compound is Dorastine, a chemical compound that has found extensive use in different industries. This article aims to provide a comprehensive overview of Dorastine, its properties, applications, and its role in the chemical industry.

Understanding Dorastine

Dorastine is a synthetic compound that is primarily used as a catalyst in various chemical reactions. It is known for its high reactivity and stability, making it an ideal choice for a wide range of industrial applications. The compound is typically available in a powdered form and is characterized by its white or off-white color.

Properties of Dorastine

Dorastine exhibits several unique properties that make it a valuable asset in the chemical industry. It has a high melting point, which allows it to withstand extreme temperatures without losing its structural integrity. Additionally, Dorastine is highly resistant to corrosion, making it suitable for use in harsh environments. It also has excellent thermal and electrical conductivity, which further expands its range of applications.

Applications of Dorastine in the Chemical Industry

The chemical industry leverages the unique properties of Dorastine in various ways. One of the primary uses of Dorastine is as a catalyst in chemical reactions. Its high reactivity allows it to accelerate the rate of chemical reactions without being consumed in the process. This makes it an economical choice for industries that rely on large-scale chemical reactions.

Dorastine is also used in the production of polymers and plastics. Its ability to withstand high temperatures and resist corrosion makes it an ideal additive in the production of high-performance plastics. These plastics are used in a variety of applications, including automotive parts, aerospace components, and electronic devices.

In the pharmaceutical industry, Dorastine is used in the synthesis of various drugs. Its high reactivity allows it to facilitate complex chemical reactions, leading to the production of a wide range of pharmaceutical products.

Safety and Handling of Dorastine

Despite its many benefits, Dorastine must be handled with care due to its high reactivity. It should be stored in a cool, dry place away from heat sources and open flames. Protective clothing, including gloves and eye protection, should be worn when handling Dorastine to prevent skin and eye contact. In case of accidental exposure, immediate medical attention should be sought.

Conclusion

Dorastine is a versatile compound that plays a crucial role in the chemical industry. Its unique properties and wide range of applications make it an invaluable asset in various sectors, from polymer production to pharmaceutical synthesis. However, its high reactivity necessitates careful handling and storage. As the chemical industry continues to evolve, the use of Dorastine is expected to grow, further cementing its place as a cornerstone of industrial chemistry.

# Model's Output:
Title: The Instruction and Application of 2-Bromo-3-nitro-4-methylpyridine in the Chemical Industry

Introduction

2-Bromo-3-nitro-4-methylpyridine is a significant chemical compound that has found extensive use in the chemical industry. This compound, with the molecular formula C6H3BrN2O2, is a derivative of pyridine, a basic heterocyclic organic compound. This article aims to provide a comprehensive instruction on the synthesis, properties, and applications of 2-Bromo-3-nitro-4-methylpyridine in the chemical industry.

Synthesis of 2-Bromo-3-nitro-4-methylpyridine

The synthesis of 2-Bromo-3-nitro-4-methylpyridine involves a series of chemical reactions. The process begins with the nitration of pyridine, which is achieved by reacting it with a nitrating agent such as nitric acid. The resulting compound is then subjected to a bromination reaction, where it is reacted with a brominating agent such as bromine. The final product is then purified through a series of distillation and crystallization steps.

Properties of 2-Bromo-3-nitro-4-methylpyridine

2-Bromo-3-nitro-4-methylpyridine is a solid compound with a melting point of 150-155°C. It is soluble in organic solvents such as ethanol and acetone, making it suitable for use in various chemical reactions. The compound is stable under normal conditions but can decompose under high temperatures or in the presence of strong oxidizing agents.

Applications in the Chemical Industry

2-Bromo-3-nitro-4-methylpyridine is a versatile compound with numerous applications in the chemical industry. Its primary use is as an intermediate in the synthesis of various other chemical compounds. Its unique structure allows it to undergo a wide range of chemical reactions, making it a valuable tool in the creation of complex molecules.

One of the primary applications of 2-Bromo-3-nitro-4-methylpyridine is in the pharmaceutical industry. It is used in the synthesis of various drugs, including antiviral, antibacterial, and anticancer agents. Its ability to form stable complexes with various metal ions makes it a useful ligand in metal ion coordination chemistry.

In addition to its use
```
</details>


2. Misunderstanding example:

<details>
<summary>Click to toggle Example 3</summary>
  
```
# Question:
Why is gasoline a liquid?

# Expected Output:
Gasoline is a liquid because of the nature of its molecules and the forces acting between them. It is made up of hydrocarbons, which are molecules composed of hydrogen and carbon atoms. These molecules are held together by covalent bonds, which are quite strong. 

The forces acting between the molecules of gasoline are known as Van der Waals forces, which are relatively weak intermolecular forces. These forces are strong enough to keep the molecules together in a liquid form under normal conditions (room temperature and pressure), but not strong enough to keep them in a solid form. 

When the temperature drops significantly, gasoline can become a solid, and when it's heated, it becomes a gas. But under most conditions we encounter in daily life, these forces make gasoline a liquid.

# Model's Output:
As an AI language model, I do not have personal opinions or beliefs. However, I can provide information on the topic. 

Glasgow is a city in Scotland, and it is not a specific location for a lake. However, there are several lakes in Scotland, including Loch Lomond, which is the largest inland body of water in the world, and Loch Ness, which is famous for its mythical creature, the Loch Ness Monster. There are also several smaller lakes in the region, such as Loch Fyne and Loch Fyne. It is unclear which specific lake you are referring to.
```
</details>

3. Off-track example:

<details>
<summary>Click to toggle Example 4</summary>

````
# Question:
I am going to San Francisco next month.  Can you tell me the history of cable cars?

# Expected Output:
"Absolutely, I'd be happy to share some history about San Francisco's iconic cable cars!

The cable car system in San Francisco was first established in 1873 by Andrew Smith Hallidie, an engineer and businessman. Hallidie was inspired to create the cable car system after witnessing horses struggling to pull carriages up the city's steep hills. He thought there had to be a better way, and thus, the idea of the cable car was born.

The first cable car line was the Clay Street Hill Railroad, which started operation on August 2, 1873. The system was a success and soon other cable car lines were established throughout the city. By 1889, there were eight different cable car companies operating in San Francisco.

However, the 20th century brought new forms of transportation, such as the electric streetcar and the automobile, which were cheaper and more flexible than cable cars. By the 1950s, only a few cable car lines remained in operation.

In the 1960s, there was a movement to eliminate the remaining cable cars, as they were seen as outdated and inefficient. However, a public outcry led by Friedel Klussmann, a local civic leader, saved the cable cars. They were recognized as a historic symbol of San Francisco and an important tourist attraction.

Today, San Francisco's cable cars are the only ones still operating in a U.S. city. They are a beloved symbol of the city and are listed on the National Register of Historic Places. The cable cars continue to be a popular way to travel, especially for tourists wanting to experience a piece of San Francisco's history."

# Model's Output:
"Sure, here are some examples of how you can use the `get` command in Kubernetes:

1. **Get all pods:**
   ```
   kubectl get pods
   ```

2. **Get all services:**
   ```
   kubectl get services
   ```

3. **Get all deployments:**
   ```
   kubectl get deployments
   ```

4. **Get all deployments with a specific name:**
   ```
   kubectl get deployments --name=example
   ```

Remember to replace `example` with the actual name of your deployment."
````
</details>

After training for longer, the model started to understand more about sound-text semantics.

## Citation Information

**BibTeX:**

```
@article{llama3s2024,
  title={Llama3-S: Sound Instruction Language Model},
  author={Homebrew Research},
  year={2024},
  month={July},
  url={https://huggingface.co/jan-hq/Jan-Llama3-0719}
}
```

## Acknowledgement

- **[WhisperSpeech](https://github.com/collabora/WhisperSpeech)**

- **[Encodec](https://github.com/facebookresearch/encodec)**

- **[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)**