Spaces:
Running
Running
Florian Leuerer
commited on
Commit
•
510e903
1
Parent(s):
9631045
README
Browse files
README.md
CHANGED
@@ -10,4 +10,44 @@ pinned: false
|
|
10 |
license: mit
|
11 |
---
|
12 |
|
13 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
10 |
license: mit
|
11 |
---
|
12 |
|
13 |
+
# Dataset
|
14 |
+
|
15 |
+
The dataset usesd is https://huggingface.co/datasets/lmsys/chatbot_arena_conversations
|
16 |
+
|
17 |
+
Preprocessing:
|
18 |
+
- filtered german conversations
|
19 |
+
- took first user prompt
|
20 |
+
- deleted short prompts (less than 70 chars)
|
21 |
+
|
22 |
+
```python
|
23 |
+
dataset = load_dataset('lmsys/chatbot_arena_conversations')
|
24 |
+
|
25 |
+
def get_message(x):
|
26 |
+
x['message'] = [x['conversation_a'][0]]
|
27 |
+
return x
|
28 |
+
|
29 |
+
dataset = dataset.filter(lambda x: x['language'] == 'German')
|
30 |
+
dataset = dataset['train'].map(get_message)
|
31 |
+
dataset = dataset.filter(lambda x: len(x['message'][0]['content']) > 70)
|
32 |
+
```
|
33 |
+
|
34 |
+
# Generation
|
35 |
+
|
36 |
+
I rely on the huggingface `conversational` pipeline to generate the outputs. There are some issues with the chat template (esp. for the non-instruction tuned models) i'll fix later.
|
37 |
+
|
38 |
+
```python
|
39 |
+
messages = json.loads(Path('messages.json').read_text())
|
40 |
+
outputs = []
|
41 |
+
pipe = pipeline(
|
42 |
+
"conversational",
|
43 |
+
model=model_name,
|
44 |
+
torch_dtype="auto",
|
45 |
+
device_map=device,
|
46 |
+
max_new_tokens=1024,
|
47 |
+
trust_remote_code=True
|
48 |
+
)
|
49 |
+
|
50 |
+
for message in tqdm(messages):
|
51 |
+
output = pipe([message])
|
52 |
+
outputs.append(output)
|
53 |
+
```
|
app.py
CHANGED
@@ -26,7 +26,7 @@ with gr.Blocks() as iface:
|
|
26 |
drop_model1 = gr.Dropdown(models, label='Model 1', value=random.choice(models))
|
27 |
drop_model2 = gr.Dropdown(models, label='Model 2', value=random.choice(models))
|
28 |
with gr.Row():
|
29 |
-
btn = gr.Button("
|
30 |
with gr.Row():
|
31 |
out_message = gr.TextArea(label='Prompt')
|
32 |
with gr.Row():
|
|
|
26 |
drop_model1 = gr.Dropdown(models, label='Model 1', value=random.choice(models))
|
27 |
drop_model2 = gr.Dropdown(models, label='Model 2', value=random.choice(models))
|
28 |
with gr.Row():
|
29 |
+
btn = gr.Button("Show Outputs")
|
30 |
with gr.Row():
|
31 |
out_message = gr.TextArea(label='Prompt')
|
32 |
with gr.Row():
|