RichardErkhov committed
Commit 94fc084 • 1 Parent(s): 34f11af

uploaded readme

Files changed (1): README.md added (+113 lines)

Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

bloom-560m-finetuned-totto-table-to-text - bnb 8bits
- Model creator: https://huggingface.co/Narrativaai/
- Original model: https://huggingface.co/Narrativaai/bloom-560m-finetuned-totto-table-to-text/

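These weights are the original model quantized to 8-bit with bitsandbytes. A minimal loading sketch with `transformers` is shown below; the repo id is a placeholder (this card does not state it), and `bitsandbytes` plus a CUDA GPU are assumed.

```py
# Minimal sketch: loading a bnb 8-bit quantized checkpoint with transformers.
# The repo id below is a placeholder, not confirmed by this card.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

quant_repo = "<this-8bit-quant-repo-id>"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(quant_repo)
# A repo that already ships bnb 8-bit weights carries its quantization config,
# so a plain from_pretrained with device_map="auto" loads it in 8-bit on GPU.
model = AutoModelForCausalLM.from_pretrained(quant_repo, device_map="auto")
```
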
Original model description:
---
language:
- en
tags:
- table-to-text
- tabular
- Narratable
datasets:
- totto
widget:
- text: "<s><page_title> John Higgins </page_title> <section_title> Minor-ranking finals: 6 (3 titles, 3 runners-up) </section_title> <table> <row> <c> Outcome </c> <c> No. <row_header> Outcome </row_header> </c> <c> Year <row_header> Outcome </row_header> <row_header> No. </row_header> </c> <c> Championship <row_header> Outcome </row_header> <row_header> No. </row_header> <row_header> Year </row_header> </c> <c> Opponent in the final <row_header> Outcome </row_header> <row_header> No. </row_header> <row_header> Year </row_header> <row_header> Championship </row_header> </c> <c> Score <row_header> Outcome </row_header> <row_header> No. </row_header> <row_header> Year </row_header> <row_header> Championship </row_header> <row_header> Opponent in the final </row_header> </c> </row> <row> <c> Winner <col_header> Outcome </col_header> </c> <c> 1. <col_header> No. </col_header> </c> <c> 2010 <col_header> Year </col_header> </c> <c> Ruhr Championship <col_header> Championship </col_header> </c> <c> England Shaun Murphy <col_header> Opponent in the final </col_header> </c> <c> 4–2 <col_header> Score </col_header> </c> </row> <row> <c> Runner-up <col_header> Outcome </col_header> </c> <c> 1. <col_header> No. </col_header> </c> <c> 2010 <col_header> Year </col_header> </c> <c> Prague Classic <col_header> Championship </col_header> </c> <c> England Michael Holt <col_header> Opponent in the final </col_header> </c> <c> 3–4 <col_header> Score </col_header> </c> </row> <row> <c> Runner-up <col_header> Outcome </col_header> </c> <c> 2. <col_header> No. </col_header> </c> <c> 2011 <col_header> Year </col_header> </c> <c> Players Tour Championship – Event 5 <col_header> Championship </col_header> </c> <c> England Andrew Higginson <col_header> Opponent in the final </col_header> </c> <c> 1–4 <col_header> Score </col_header> </c> </row> <row> <c> Winner <col_header> Outcome </col_header> </c> <c> 2. <col_header> No. </col_header> </c> <c> 2012 <col_header> Year </col_header> </c> <c> Kay Suzanne Memorial Trophy <col_header> Championship </col_header> </c> <c> England Judd Trump <col_header> Opponent in the final </col_header> </c> <c> 4–2 <col_header> Score </col_header> </c> </row> <row> <c> Runner-up <col_header> Outcome </col_header> </c> <c> 3. <col_header> No. </col_header> </c> <c> 2012 <col_header> Year </col_header> </c> <c> Bulgarian Open <col_header> Championship </col_header> </c> <c> England Judd Trump <col_header> Opponent in the final </col_header> </c> <c> 0–4 <col_header> Score </col_header> </c> </row> <row> <highlighted_cell> Winner <col_header> Outcome </col_header> </highlighted_cell> <c> 3. <col_header> No. </col_header> </c> <highlighted_cell> 2013 <col_header> Year </col_header> </highlighted_cell> <highlighted_cell> Bulgarian Open <col_header> Championship </col_header> </highlighted_cell> <highlighted_cell> Australia Neil Robertson <col_header> Opponent in the final </col_header> </highlighted_cell> <highlighted_cell> 4–1 <col_header> Score </col_header> </highlighted_cell> </row> </table>\n\n"

inference:
  parameters:
    max_length: 500

---

# BLOOM (0.56B) fine-tuned on ToTTo for Table-to-text 📋 ➡️ 🔀 aka NARRATABLE

This model is a fine-tuned version of [bigscience/bloom-560m](https://huggingface.co/bigscience/bloom-560m) on the **ToTTo** [dataset](https://huggingface.co/datasets/totto).


## The model 🧠

It is a 560M-parameter version of [**BLOOM** 🌸](https://bigscience.huggingface.co/blog/bloom).

## The dataset 📚

**ToTTo** is an open-domain English table-to-text dataset with over 120,000 training examples. It poses a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description.

During the dataset creation process, tables from English Wikipedia are matched with (noisy) descriptions. Each table cell mentioned in a description is highlighted, and the descriptions are iteratively cleaned and corrected so that they faithfully reflect the content of the highlighted cells.

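To make the task concrete, the sketch below loads one validation example and prints the table title, the highlighted cells and the reference annotations. The field names are assumed from the ToTTo release as exposed by the `totto` loader on the Hub; check `valid_dataset.features` if they differ.

```py
# Sketch: inspecting one ToTTo example (field names assumed from the ToTTo release).
from datasets import load_dataset

valid_dataset = load_dataset('totto', split='validation')
example = valid_dataset[1]

print(example['table_page_title'])      # page the table comes from
print(example['highlighted_cells'])     # (row, col) indices of the highlighted cells
print(example['sentence_annotations'])  # reference one-sentence description(s)
```
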

### Evaluation results

| Metric    | Value |
|:---------:|:-----:|
| rouge1    | 0.56  |
| rouge2    | 0.33  |
| rougeL    | 0.48  |
| rougeLsum | 0.48  |
| sacrebleu | 20.87 |
| meteor    | 0.49  |

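The card does not spell out how these scores were computed. The sketch below shows how comparable numbers could be obtained with the `evaluate` library, assuming `preds` and `refs` are parallel lists of generated and reference sentences for the validation split.

```py
# Sketch: scoring generated sentences with the `evaluate` library.
# `preds` and `refs` are assumed to be parallel lists of strings.
import evaluate

rouge = evaluate.load("rouge")
sacrebleu = evaluate.load("sacrebleu")
meteor = evaluate.load("meteor")

preds = ["a model-generated sentence about the highlighted cells"]
refs = ["the reference sentence from ToTTo"]

print(rouge.compute(predictions=preds, references=refs))                     # rouge1/rouge2/rougeL/rougeLsum
print(sacrebleu.compute(predictions=preds, references=[[r] for r in refs]))  # corpus BLEU ("score")
print(meteor.compute(predictions=preds, references=refs))                    # {"meteor": ...}
```
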


## Usage

```py
from datasets import load_dataset
from transformers import BloomTokenizerFast, BloomForCausalLM

from preprocess import preprocess  # This file is included in the repo

valid_dataset = load_dataset('totto', split='validation')

# Now we linearize the tables
valid_dataset = valid_dataset.map(preprocess)

model_ckpt = "mrm8488/bloom-560m-finetuned-totto-table-to-text"

tokenizer = BloomTokenizerFast.from_pretrained(model_ckpt)
model = BloomForCausalLM.from_pretrained(model_ckpt).to("cuda")


def explain_hl_cells(text):
    inputs = tokenizer(text, return_tensors='pt')
    input_ids = inputs.input_ids.to("cuda")
    attention_mask = inputs.attention_mask.to("cuda")
    output = model.generate(input_ids, attention_mask=attention_mask, max_length=2048, eos_token_id=tokenizer.eos_token_id)

    return tokenizer.decode(output[0], skip_special_tokens=False)


example = valid_dataset[1]

print(explain_hl_cells(example['linearized_table']))
```
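
Instead of running `preprocess.py`, a prompt can also be assembled by hand in the linearized format used by the widget example above (page title, section title, then the table with `<highlighted_cell>` markers). The tiny table below is purely illustrative and reuses `explain_hl_cells` from the snippet above.

```py
# Sketch: hand-building a linearized-table prompt in the widget's tag format.
# The table content here is illustrative only, not taken from ToTTo.
prompt = (
    "<s><page_title> Example City </page_title> "
    "<section_title> Population history </section_title> "
    "<table> <row> <c> Year </c> <c> Population <row_header> Year </row_header> </c> </row> "
    "<row> <highlighted_cell> 2020 <col_header> Year </col_header> </highlighted_cell> "
    "<highlighted_cell> 123,456 <col_header> Population </col_header> </highlighted_cell> </row> "
    "</table>\n\n"
)

print(explain_hl_cells(prompt))  # reuses the function defined in the Usage snippet
```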

<video loop="" autoplay="" controls="" src="https://huggingface.co/Narrativaai/bloom-560m-finetuned-totto-table-to-text/resolve/main/video_totto.mp4"></video>


### Framework versions

- Transformers 4.21.2
- Pytorch 1.12.1+cu113
- Datasets 2.4.0
- Tokenizers 0.12.1


Created by: [Narrativa](https://www.narrativa.com/)

#### About Narrativa

Narrativa is an internationally recognized **content services company** that uses its proprietary **artificial intelligence** and **machine learning** platforms to build and deploy **digital content solutions** for enterprises. Its technology suite, consisting of data extraction, data analysis, natural language processing (NLP) and natural language generation (NLG) tools, works together seamlessly to power a lineup of smart content creation, automated business intelligence reporting and process optimization products for a variety of industries.

Contact us to learn more about our solutions!