theblackcat102 committed
Commit f1bb80a
1 Parent(s): ef72ee3
Update README.md

README.md CHANGED
@@ -1,8 +1,8 @@
 ---
-license:
+license: mit
 ---
 
-A asymmetric version of bigscience's mt0 large model, which trim down like [Meena chatbot](https://arxiv.org/pdf/2001.09977.pdf) from Google. An interesting aspect of Meena is that they use a small encoder, big decoder architecture.
+An asymmetric version of [bigscience's mt0 large model](https://huggingface.co/bigscience/mt0-large), trimmed down in the spirit of Google's [Meena chatbot](https://arxiv.org/pdf/2001.09977.pdf). An interesting aspect of Meena is its small-encoder, big-decoder architecture.
 
 > Meena has a single Evolved Transformer encoder block and 13 Evolved Transformer decoder blocks, as illustrated below. The encoder is responsible for processing the conversation context to help Meena understand what has already been said in the conversation. The decoder then uses that information to formulate an actual response. Through tuning the hyper-parameters, we discovered that a more powerful decoder was the key to higher conversational quality.
 
@@ -57,4 +57,4 @@ input :こんにちは!お元気</s>
 trimmed output : !,
 ```
 
-note that it's impossible to have the performance of the orignal model since roughly 30% of the weights were trimmed away.
+Note that it is impossible to match the performance of the original model, since roughly 30% of the weights were trimmed away.
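The small-encoder/big-decoder trim the README describes can be sketched in code. This is a minimal, hypothetical sketch, assuming the trim simply keeps a prefix of the encoder's transformer blocks; to stay self-contained it builds a tiny randomly initialized T5-style model (the same `T5Stack` layout that mt0-large uses) instead of downloading the real checkpoint, and the layer counts here are illustrative, not the actual model's.

```python
# Sketch of Meena-style asymmetric trimming of an encoder-decoder model.
# Assumption: trimming = keeping only a prefix of the encoder's blocks
# while leaving the decoder intact (small encoder, big decoder).
import torch
from transformers import T5Config, T5ForConditionalGeneration

# Tiny stand-in config; mt0-large shares this T5Stack architecture.
config = T5Config(
    vocab_size=256, d_model=64, d_ff=128, num_heads=4,
    num_layers=4,          # encoder blocks before trimming
    num_decoder_layers=4,  # decoder blocks (kept as-is)
)
model = T5ForConditionalGeneration(config)

# T5Stack stores its transformer layers in the `block` ModuleList,
# so the trim is a slice. Block 0 holds the relative-attention bias,
# so keeping a prefix (rather than an arbitrary subset) is safest.
model.encoder.block = torch.nn.ModuleList(model.encoder.block[:1])
model.config.num_layers = 1  # keep the config consistent

# The trimmed model still runs end to end.
ids = torch.randint(0, 256, (1, 8))
out = model(input_ids=ids, decoder_input_ids=ids)
print(len(model.encoder.block), len(model.decoder.block))  # 1 4
print(out.logits.shape)  # torch.Size([1, 8, 256])
```

A trim like this removes the sliced blocks' weights outright, which is why the README warns that the result cannot match the original model's performance without further fine-tuning.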