
schuler/experimental-JP47D62B
┌─────────────────────────┐
│ Input Layer             │
├─────────────────────────┤
│ Token & Positional      │
│ Embedding               │
├─────────────────────────┤
│ 12x Transformer         │
│ Blocks                  │
│ - 12 heads              │
│ - 768 hidden dims       │
│ - 3072 intermediate     │
├─────────────────────────┤
│ Output Layer            │
└─────────────────────────┘
for CntLayer := 1 to {Layers=}12 do
begin
  Result.AddTransformerBlockCAI(
    {Heads=}12,
    {intermediate dimensions=}4*768,
    {NoForward=}true,
    {HasNorm=}true,
    false
  );
end;
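As a rough sanity check on the sizes above, the following standalone Free Pascal sketch derives the per-head dimension and a baseline weight count from the numbers in the diagram. It assumes a standard dense transformer block (four dense attention projections plus a two-layer feed-forward network, all with biases) and ignores layer normalization and embeddings. It does not use the neural network library and does not account for the optimization discussed in this card, so the real model's count may differ. The program and variable names are illustrative only.

```pascal
program TransformerDims;
{$mode objfpc}

const
  Layers = 12;                      // number of transformer blocks (from the diagram)
  Heads = 12;                       // attention heads per block
  HiddenDim = 768;                  // hidden dimensions
  IntermediateDim = 4 * HiddenDim;  // 3072, as passed to the block constructor

var
  HeadDim: integer;
  AttnParams, FfnParams, BlockParams: Int64;
begin
  // Each head works on an equal slice of the hidden dimensions.
  HeadDim := HiddenDim div Heads;

  // Assumption: dense Q, K, V and output projections, each with a bias vector.
  AttnParams := 4 * (Int64(HiddenDim) * HiddenDim + HiddenDim);

  // Assumption: two dense layers (expand to IntermediateDim, project back), with biases.
  FfnParams := Int64(HiddenDim) * IntermediateDim + IntermediateDim
             + Int64(IntermediateDim) * HiddenDim + HiddenDim;

  BlockParams := AttnParams + FfnParams;

  WriteLn('Head dimension: ', HeadDim);
  WriteLn('Intermediate dimension: ', IntermediateDim);
  WriteLn('Baseline parameters per block: ', BlockParams);
  WriteLn('Baseline parameters in ', Layers, ' blocks: ', Layers * BlockParams);
end.
```

Under these assumptions the head dimension is 64, the intermediate dimension is 3072, and an unoptimized block holds 7,084,800 weights, which gives a reference point when comparing against an optimized variant.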
If you run into any roadblock while modifying an existing model with this optimization so that you can train the optimized model from scratch, please feel free to ask for help.