On October 2nd, a really cool paper was released called "Were RNNs All We Needed?" https://arxiv.org/abs/2410.01201
This paper introduces the MinGRU model, a simplified version of the traditional Gated Recurrent Unit (GRU) designed to enhance efficiency by removing hidden state dependencies from its gates. This allows for parallel training, making it significantly faster than conventional GRUs. Additionally, MinGRU eliminates non-linear activations like tanh, streamlining computations.
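To make the idea concrete, here is a minimal sketch of the minGRU recurrence as I understand it from the paper, written sequentially in PyTorch. The class and variable names are my own; the paper actually trains with a log-space parallel scan, which isn't shown here.

```python
import torch
import torch.nn as nn

class MinGRU(nn.Module):
    """Minimal sketch of the minGRU cell (sequential form, for illustration).

    The gate z_t and candidate state h~_t depend only on the input x_t, not
    on h_{t-1}, which is what lets the paper train the recurrence in parallel.
    """
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.linear_z = nn.Linear(input_dim, hidden_dim)  # gate
        self.linear_h = nn.Linear(input_dim, hidden_dim)  # candidate state (no tanh)

    def forward(self, x: torch.Tensor, h0: torch.Tensor | None = None) -> torch.Tensor:
        # x: (batch, seq_len, input_dim)
        batch, seq_len, _ = x.shape
        z = torch.sigmoid(self.linear_z(x))   # gates for every step at once
        h_tilde = self.linear_h(x)            # candidates for every step at once
        h = h0 if h0 is not None else torch.zeros(batch, z.size(-1), device=x.device)
        outputs = []
        for t in range(seq_len):
            # h_t = (1 - z_t) * h_{t-1} + z_t * h~_t
            h = (1 - z[:, t]) * h + z[:, t] * h_tilde[:, t]
            outputs.append(h)
        return torch.stack(outputs, dim=1)    # (batch, seq_len, hidden_dim)
```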
So I read the paper and tried training this model, and it seems to be doing quite well. You can check out the pre-trained model on Hugging Face Spaces:
- damerajee/mingru-stories