HaNguyen committed on
Commit
ced68aa
1 Parent(s): 07ef9f3
Files changed (8)
  1. README.md +134 -0
  2. config.json +23 -0
  3. custom.py +70 -0
  4. hyperparams.yaml +44 -0
  5. llama2_model.ckpt +3 -0
  6. tokenizer.json +0 -0
  7. tokenizer.model +3 -0
  8. tokenizer_config.json +33 -0
README.md ADDED
@@ -0,0 +1,134 @@
+ ---
+ language:
+ - en
+ thumbnail: null
+ tags:
+ - response-generation
+ - llama2
+ - pytorch
+ - speechbrain
+ license: apache-2.0
+ datasets:
+ - multiwoz
+ metrics:
+ - name: Test PPL
+   type: ppl
+   value: '2.90'
+ ---
+
+ <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
+ <br/><br/>
+
+ # Llama2 trained on MultiWOZ 2.1
+ ### Notice: “Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.” ###
+
+ This repository provides all the necessary tools to perform response generation from an end-to-end system within
+ SpeechBrain. For a better experience, we encourage you to learn more about
+ [SpeechBrain](https://speechbrain.github.io).
+ The performance of the model is the following:
+
+ | Release | Test PPL | Test BLEU-4 | GPUs |
+ |:-------------:|:--------------:|:--------------:|:--------:|
+ | 2023-10-15 | 2.90 | 7.45e-04 | 1xV100 32GB |
+
+ ## Credits
+ The model is provided by [vitas.ai](https://www.vitas.ai/).
+
+ ## Pipeline description
+ This dialogue system is composed of two different but linked blocks:
+
+ - A pretrained Llama2 tokenizer that transforms words into subwords.
+ - Llama2, which generates the next sentence given the history of the dialogue.
+
+ The system is trained on dialogues from the MultiWOZ corpus.
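+
+ As an illustration of the first block, the minimal sketch below (not part of the SpeechBrain recipe) loads the tokenizer shipped in this repository with the `transformers` library and prints the subwords produced for a user turn; it assumes the repository id resolves to the tokenizer files stored here.
+
+ ```python
+ # Sketch: inspect the Llama2 subword tokenization used by this model.
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     "speechbrain/MultiWOZ-Llama2-Response_Generation"
+ )
+ print(tokenizer.tokenize("I need a cheap hotel in the centre of town."))
+ # A list of subword pieces such as ['▁I', '▁need', '▁a', '▁cheap', ...]
+ ```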
+
+ ## Install SpeechBrain
+ First, please install SpeechBrain with the following commands:
+
+ ```bash
+ git clone https://github.com/speechbrain/speechbrain
+ cd speechbrain
+ pip install -r requirements.txt
+ pip install -e .
+ pip install -r recipes/MultiWOZ/response_generation/llama2/extra_requirements.txt
+ ```
+
+ Please note that we encourage you to read our tutorials and learn more about
+ [SpeechBrain](https://speechbrain.github.io).
+
+ ### Generating your Own Dialogue
+
+ ```python
+ from speechbrain.inference.text import Llama2ResponseGenerator
+
+ res_gen_model = Llama2ResponseGenerator.from_hparams(
+     source="speechbrain/MultiWOZ-Llama2-Response_Generation",
+     savedir="pretrained_models/MultiWOZ-Llama2-Response_Generation",
+     pymodule_file="custom.py",
+ )
+ print("Hi, how can I help you today?", end="\n")
+ while True:
+     turn = input()
+     response = res_gen_model.generate_response(turn)
+     print(response, end="\n")
+ ```
+
+ ### Inference on GPU
+
+ To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
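+ For example, a minimal sketch (it assumes a CUDA-capable GPU is available):
+
+ ```python
+ # Same call as above, but placing the model on a CUDA device.
+ from speechbrain.inference.text import Llama2ResponseGenerator
+
+ res_gen_model = Llama2ResponseGenerator.from_hparams(
+     source="speechbrain/MultiWOZ-Llama2-Response_Generation",
+     savedir="pretrained_models/MultiWOZ-Llama2-Response_Generation",
+     pymodule_file="custom.py",
+     run_opts={"device": "cuda"},
+ )
+ ```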
+
+ ## Parallel Inference on a Batch
+
+ Please [see this Colab notebook](https://colab.research.google.com/drive/1hX5ZI9S4jHIjahFCZnhwwQmFoGAi3tmu?usp=sharing) to learn how to generate responses in parallel for a batch of input sentences using a pre-trained model.
+
+ ### Training
+
+ The model was trained with SpeechBrain (commit 986a2175).
+ To train it from scratch, follow these steps:
+
+ 1. Clone SpeechBrain:
+
+ ```bash
+ git clone https://github.com/speechbrain/speechbrain/
+ ```
+
+ 2. Install it:
+
+ ```bash
+ cd speechbrain
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+
+ 3. Run Training:
+
+ ```bash
+ cd recipes/MultiWOZ/response_generation/llama2
+ pip install -r extra_requirements.txt
+ python train_with_llama2.py hparams/train_llama2.yaml --data_folder=your_data_folder
+ ```
+
+ You can find our training results (models, logs, etc.) [here](https://www.dropbox.com/sh/d093vsje1d7ijj9/AAA-nHEd_MwNEFJfBGLmXxJra?dl=0).
+
+ ### Limitations
+
+ The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
+
+ # **About SpeechBrain**
+
+ - Website: https://speechbrain.github.io/
+ - Code: https://github.com/speechbrain/speechbrain/
+ - HuggingFace: https://huggingface.co/speechbrain/
+
+ # **Citing SpeechBrain**
+
+ Please cite SpeechBrain if you use it for your research or business.
+
+ ```bibtex
+ @misc{speechbrain,
+   title={{SpeechBrain}: A General-Purpose Speech Toolkit},
+   author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
+   year={2021},
+   eprint={2106.04624},
+   archivePrefix={arXiv},
+   primaryClass={eess.AS},
+   note={arXiv:2106.04624}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 11008,
+   "max_position_embeddings": 2048,
+   "model_type": "llama",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "pad_token_id": 0,
+   "rms_norm_eps": 1e-06,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.29.0.dev0",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
+
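These values describe the underlying Llama-2-7B architecture. As a quick sanity check (a sketch, not part of the recipe), the configuration can be inspected with the `transformers` library, assuming the repository id below resolves to this file:

```python
# Sketch: load and inspect the model configuration shipped with this repo.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("speechbrain/MultiWOZ-Llama2-Response_Generation")
print(cfg.model_type, cfg.hidden_size, cfg.num_hidden_layers, cfg.vocab_size)
# Expected: llama 4096 32 32000
```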
custom.py ADDED
@@ -0,0 +1,70 @@
+ """This lobe enables the integration of the HuggingFace pretrained Llama2 model, plus an expanded embedding layer for the additional PAD token.
+
+ Transformers from HuggingFace needs to be installed:
+ https://huggingface.co/transformers/installation.html
+
+ Authors
+  * Pooneh Mousavi 2023
+ """
+
+ import logging
+
+ import torch
+
+ from speechbrain.lobes.models.huggingface_transformers.llama2 import LLAMA2
+
+ logger = logging.getLogger(__name__)
+
+
+ class LLAMA2_expanded(LLAMA2):
+     """This lobe enables the integration of the HuggingFace pretrained LLAMA2 model.
+
+     Source paper LLAMA2:
+     https://arxiv.org/abs/2307.09288
+
+     Transformers from HuggingFace needs to be installed:
+     https://huggingface.co/transformers/installation.html
+
+     The model can be fine-tuned. It will automatically download the model from
+     HuggingFace or use a local path.
+
+     Arguments
+     ---------
+     source : str
+         HuggingFace hub name, e.g. "meta-llama/Llama-2-7b-chat-hf"
+     save_path : str
+         Path (dir) of the downloaded model.
+     freeze : bool (default: False)
+         If True, the model is frozen. If False, the model will be trained
+         alongside the rest of the pipeline.
+
+     Example
+     -------
+     >>> model_hub = "meta-llama/Llama-2-7b-chat-hf"
+     >>> save_path = "savedir"
+     >>> model = LLAMA2_expanded(model_hub, save_path)
+     >>> tokens = torch.tensor([[1, 1]])
+     >>> attention_mask = torch.tensor([[1, 1]])
+     >>> outputs = model(tokens, attention_mask)
+     """
+
+     def __init__(self, *args, **kwargs) -> None:
+         super().__init__(*args, **kwargs)
+         # Add the PAD special token to the tokenizer and resize the model
+         # embedding matrix accordingly.
+         self.add_special_tokens_({"pad_token": "<pad>"})
+
+     def add_special_tokens_(self, attr_to_special_token) -> None:
+         """Add special tokens to the tokenizer and, if any were actually
+         added, resize the model's token embeddings to match."""
+         orig_num_tokens = len(self.tokenizer)
+         num_added_tokens = self.tokenizer.add_special_tokens(
+             attr_to_special_token  # type: ignore
+         )  # does not re-add tokens that are already present
+         if num_added_tokens > 0:
+             self.model.resize_token_embeddings(
+                 new_num_tokens=orig_num_tokens + num_added_tokens
+             )
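For context, a hypothetical usage sketch of the class above (argument values are placeholders; it assumes this file is importable as `custom` and that you have access to the gated meta-llama weights):

```python
# Sketch: instantiating the expanded model adds a <pad> token to the
# tokenizer and resizes the embedding matrix accordingly.
from custom import LLAMA2_expanded

model = LLAMA2_expanded(
    source="meta-llama/Llama-2-7b-chat-hf",
    save_path="savedir",
    freeze=True,
)
print(model.tokenizer.pad_token)  # "<pad>"
print(len(model.tokenizer))       # original vocabulary size + 1
```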
hyperparams.yaml ADDED
@@ -0,0 +1,44 @@
+ # ################################
+ # Model: Llama2 Model + NLL
+ # Authors:
+ # Pooneh Mousavi 2023
+ # ################################
+
+
+ # HuggingFace hub name for the Llama2 model
+ model_hub: meta-llama/Llama-2-7b-chat-hf
+ llama2_folder: recipes/MultiWOZ/response_generation/llama2/results/train_with_llama2/1995/save/llama2_checkpoint/
+
+
+ # history_window, i.e. how many user-system exchanges are considered as context.
+ max_history: 2
+
+ # decoder settings
+ freeze_model: True
+ num_beams: 8
+ max_new_tokens: 50
+ top_k: 45
+ top_p: 0.9
+
+ # LLAMA2 model
+ model: !new:custom.LLAMA2_expanded
+     source: !ref <model_hub>
+     freeze: !ref <freeze_model>
+     save_path: !ref <llama2_folder>
+     max_new_tokens: !ref <max_new_tokens>
+     num_beams: !ref <num_beams>
+     top_k: !ref <top_k>
+     top_p: !ref <top_p>
+     with_peft: True
+
+
+ # Masks
+ padding_mask: !name:speechbrain.lobes.models.transformer.Transformer.get_key_padding_mask
+
+ pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
+     loadables:
+         model: !ref <model>
+
+ modules:
+     model: !ref <model>
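For reference, the sketch below (not part of the recipe) shows how a hyperparameter file like this is typically consumed: `load_hyperpyyaml` instantiates the declared objects, and the `pretrainer` is what later loads `llama2_model.ckpt`. It assumes `hyperpyyaml` is installed, `custom.py` is importable from the working directory, and you have access to the underlying Llama 2 weights; in normal use, `Llama2ResponseGenerator.from_hparams` (see the README) performs all of this for you.

```python
# Sketch: instantiate the objects declared in hyperparams.yaml.
from hyperpyyaml import load_hyperpyyaml

with open("hyperparams.yaml") as f:
    hparams = load_hyperpyyaml(f)

model = hparams["model"]  # custom.LLAMA2_expanded instance
print(hparams["num_beams"], hparams["top_k"], hparams["top_p"])  # decoding settings
```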
llama2_model.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:78575f6cd2df7db958e2bd696fcd535322664e222a499f740560de8fdf6411a9
+ size 4827154534
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "bos_token": {
+     "__type": "AddedToken",
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "clean_up_tokenization_spaces": false,
+   "eos_token": {
+     "__type": "AddedToken",
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": null,
+   "sp_model_kwargs": {},
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": {
+     "__type": "AddedToken",
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }