---
license: cc-by-sa-4.0
inference: false
datasets:
- PengQu/langchain-MRKL-finetune
- fnlp/moss-003-sft-data
---


**NOTE: This "delta model" cannot be used directly.**  
Users must apply it on top of the original LLaMA weights to obtain the actual vicuna-13b-finetuned-langchain-MRKL weights.  
See https://github.com/rinnakk/vicuna-13b-delta-finetuned-langchain-MRKL#model-weights for instructions.

# vicuna-13b-finetuned-langchain-MRKL

## Model details

**Model type:**
vicuna-13b-finetuned-langchain-MRKL is an open-source chatbot trained by fine-tuning vicuna-13b on 15 examples in the langchain-MRKL format.

**Model Usage:**

To obtain the correct model, please run apply_delta.py (https://github.com/rinnakk/vicuna-13b-delta-finetuned-langchain-MRKL/blob/main/model/apply_delta.py) first. See the instructions at https://github.com/rinnakk/vicuna-13b-delta-finetuned-langchain-MRKL#model-weights
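
For orientation, applying a delta generally means adding each delta tensor to the corresponding base-model tensor. The sketch below only illustrates that idea: the paths are placeholders, it skips details a real script handles (for example tokenizer and vocab-size alignment), and the linked apply_delta.py remains the authoritative procedure.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Paths are placeholders -- point them at your local base and delta weights.
base = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-13b", torch_dtype=torch.float16, low_cpu_mem_usage=True
)
delta = AutoModelForCausalLM.from_pretrained(
    "path/to/vicuna-13b-delta-finetuned-langchain-MRKL", torch_dtype=torch.float16, low_cpu_mem_usage=True
)

# Add each delta tensor to the matching base tensor to recover the fine-tuned weights.
delta_state = delta.state_dict()
for name, param in base.state_dict().items():
    param.data += delta_state[name]

base.save_pretrained("path/to/vicuna-13b-finetuned-langchain-MRKL")
tokenizer = AutoTokenizer.from_pretrained("path/to/vicuna-13b-delta-finetuned-langchain-MRKL")
tokenizer.save_pretrained("path/to/vicuna-13b-finetuned-langchain-MRKL")
```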

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("path/to/vicuna-13b-finetuned-langchain-MRKL")
model = AutoModelForCausalLM.from_pretrained("path/to/vicuna-13b-finetuned-langchain-MRKL")
model.cuda()

prompt = """Answer the following questions as best you can. You have access to the following tools:

Search: useful for when you need to answer questions about current events
Calculator: useful for when you need to answer questions about math

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Search, Calculator]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: The current age of the President of the United States multiplied by 0.5.
Thought:"""

# Sample a continuation of the agent prompt on the GPU.
input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to("cuda")
tokens = model.generate(input_ids, min_length=5, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```

Output (the tokens generated after "Thought:"):<br>
```sh
I need to find the current age of the President and then multiply it by 0.5
Action: Search
Action Input: Who is the President of the United States?
```
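
In a MRKL loop the "Observation:" line should come from the tool, not the model, so it is common to stop generation as soon as the model starts to emit it. Below is a minimal sketch using transformers' StoppingCriteria, reusing `tokenizer`, `model`, and `input_ids` from the example above; the StopOnText helper is our illustration, not part of this repository.

```python
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnText(StoppingCriteria):
    """Stop generation once the decoded continuation contains a given string."""

    def __init__(self, tokenizer, prompt_len, stop_text="\nObservation:"):
        self.tokenizer = tokenizer
        self.prompt_len = prompt_len
        self.stop_text = stop_text

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the newly generated part and check for the stop string.
        new_text = self.tokenizer.decode(input_ids[0][self.prompt_len:], skip_special_tokens=True)
        return self.stop_text in new_text

stopping = StoppingCriteriaList([StopOnText(tokenizer, input_ids.shape[1])])
tokens = model.generate(input_ids, max_new_tokens=128, do_sample=True,
                        temperature=0.7, top_p=0.9, stopping_criteria=stopping)
```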

If you launch an HTTP server serving the model and install langchain (https://github.com/hwchase17/langchain), you can edit demo.py (https://github.com/rinnakk/vicuna-13b-delta-finetuned-langchain-MRKL/blob/main/demo.py) with your server's IP, port, and your SERPAPI_API_KEY, then run it.<br>
You can also try this in a Jupyter notebook: https://github.com/rinnakk/vicuna-13b-delta-finetuned-langchain-MRKL/blob/main/demo.ipynb
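
For orientation, the sketch below shows roughly what the LangChain side of such a demo looks like: a thin LLM wrapper around your HTTP endpoint plus a zero-shot-react-description agent. The VicunaHTTP class, its endpoint URL, and the JSON request/response schema are illustrative assumptions rather than the actual demo.py code, and the langchain API shown is the 0.0.x style in use at the time.

```python
import os
from typing import List, Optional

import requests
from langchain.llms.base import LLM
from langchain.agents import initialize_agent, load_tools

class VicunaHTTP(LLM):
    """Illustrative wrapper that forwards prompts to the model served over HTTP."""
    endpoint: str = "http://<server-ip>:<port>/generate"  # placeholder -- your server's address

    @property
    def _llm_type(self) -> str:
        return "vicuna-13b-finetuned-langchain-MRKL"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Assumed JSON schema; adapt to whatever API your server actually exposes.
        resp = requests.post(self.endpoint, json={"prompt": prompt, "stop": stop})
        return resp.json()["text"]

os.environ["SERPAPI_API_KEY"] = "<your-serpapi-key>"  # placeholder
llm = VicunaHTTP()
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
agent.run("The current age of the President of the United States multiplied by 0.5.")
```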

**Where to send questions or comments about the model:**

https://github.com/rinnakk/vicuna-13b-delta-finetuned-langchain-MRKL/issues

## Training dataset
Trained for only one epoch on mixed data (sharegpt + 32*my.json + moss-003-sft-data).

## Evaluation
- demo for langchain-MRKL: https://github.com/rinnakk/vicuna-13b-delta-finetuned-langchain-MRKL/blob/main/demo.ipynb
- No evaluation set, because we do not claim to have improved the model's general ability; we only made the model follow the langchain-MRKL format strictly.
- We simply want to demonstrate vicuna-13b's strong ability to think and act.

## Major Improvement
- Supports langchain-MRKL (agent="zero-shot-react-description")
- Very fast thanks to the strict output format (it does not generate redundant tokens)

## Author
Qu Peng (https://huggingface.co/PengQu)