|
--- |
|
library_name: keras |
|
license: apache-2.0 |
|
tags: |
|
- seq2seq |
|
- translation |
|
language: |
|
- en |
|
- fr |
|
--- |
|
|
|
## Keras Implementation of a Character-Level Recurrent Sequence-to-Sequence Model
|
|
|
This repo contains the model and the notebook for the Keras example [Character-level recurrent sequence-to-sequence model](https://keras.io/examples/nlp/lstm_seq2seq/).
|
|
|
Full credits to: [fchollet](https://twitter.com/fchollet)
|
|
|
Model reproduced by: [Sumedh](https://huggingface.co/sumedh)
|
|
|
## Intended uses & limitations |
|
|
|
This model implements a basic character-level recurrent sequence-to-sequence network that translates short English sentences into short French sentences, character by character. Note that character-level machine translation is fairly unusual; word-level models are far more common in this domain. The model works best on input text of at most 15 characters.
|
|
|
## Training and evaluation data |
|
English-to-French translation data from https://www.manythings.org/anki/
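
A minimal sketch of how these sentence pairs are loaded, following the linked Keras example (it assumes the tab-separated `fra.txt` file from the Anki download):

```python
# Sketch of loading the tab-separated English-French pairs
# (assumes "fra.txt" from https://www.manythings.org/anki/, as in the Keras example).
num_samples = 10000  # Number of samples to train on.

with open("fra.txt", encoding="utf-8") as f:
    lines = f.read().split("\n")

input_texts, target_texts = [], []
for line in lines[: min(num_samples, len(lines) - 1)]:
    input_text, target_text, *_ = line.split("\t")
    # "\t" is used as the start-of-sequence character and "\n" as end-of-sequence.
    input_texts.append(input_text)
    target_texts.append("\t" + target_text + "\n")
```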
|
|
|
## Training procedure |
|
- We start with input sequences from one domain (e.g. English sentences) and corresponding target sequences from another domain (e.g. French sentences).

- An encoder LSTM turns each input sequence into 2 state vectors (we keep the last LSTM state and discard the outputs).

- A decoder LSTM is trained to turn the target sequences into the same sequences, but offset by one timestep in the future, a training process called "teacher forcing" in this context. It uses the state vectors from the encoder as its initial state. Effectively, the decoder learns to generate `targets[t+1...]` given `targets[...t]`, conditioned on the input sequence. (See the model sketch after this list.)

- In inference mode, when we want to decode unknown input sequences, we (see the decoding sketch after this list):
  - encode the input sequence into state vectors;
  - start with a target sequence of size 1 (just the start-of-sequence character);
  - feed the state vectors and the 1-character target sequence to the decoder to produce predictions for the next character;
  - sample the next character using these predictions (we simply use argmax);
  - append the sampled character to the target sequence;
  - repeat until we generate the end-of-sequence character or hit the character limit.
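
Following the linked Keras example, the training-time model wires the encoder's final states into the decoder roughly as sketched below. The character-vocabulary sizes are computed during vectorization; the values shown are those the example reports for the first 10,000 sentence pairs:

```python
import keras

latent_dim = 256  # Latent dimensionality of the encoding space.
num_encoder_tokens = 71  # Unique input characters (as reported in the example).
num_decoder_tokens = 93  # Unique target characters (as reported in the example).

# Encoder: keep only the final hidden/cell states, discard per-timestep outputs.
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
encoder = keras.layers.LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: trained with teacher forcing, initialized with the encoder states.
decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
decoder_lstm = keras.layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = keras.layers.Dense(num_decoder_tokens, activation="softmax")
decoder_outputs = decoder_dense(decoder_outputs)

model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
```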
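At inference time, the trained layers from the sketch above are rewired into standalone encoder and decoder models, and decoding proceeds one character at a time with greedy (argmax) sampling. Again a sketch following the example; `target_token_index`, `reverse_target_char_index`, and `max_decoder_seq_length` are the lookup tables and length limit built during vectorization:

```python
import numpy as np

# Encoder model: maps an input sequence to the initial decoder states.
encoder_model = keras.Model(encoder_inputs, encoder_states)

# Decoder model: runs one step at a time, carrying the LSTM states explicitly.
decoder_state_input_h = keras.Input(shape=(latent_dim,))
decoder_state_input_c = keras.Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs
)
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = keras.Model(
    [decoder_inputs] + decoder_states_inputs, [decoder_outputs, state_h, state_c]
)


def decode_sequence(input_seq):
    # Encode the input sentence into the initial decoder states.
    states_value = encoder_model.predict(input_seq)

    # Seed the decoder with the start-of-sequence character ("\t").
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, target_token_index["\t"]] = 1.0

    decoded_sentence = ""
    while True:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Greedy sampling: take the argmax over the predicted distribution.
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Stop on the end-of-sequence character ("\n") or at the length limit.
        if sampled_char == "\n" or len(decoded_sentence) > max_decoder_seq_length:
            break

        # Feed the sampled character and the updated states back in.
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.0
        states_value = [h, c]

    return decoded_sentence
```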
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
|
|
| optimizer | learning_rate | decay | rho | momentum | epsilon | centered | training_precision |
|-----------|---------------|-------|-----|----------|---------|----------|--------------------|
| RMSprop | 0.0010000000474974513 | 0.0 | 0.8999999761581421 | 0.0 | 1e-07 | False | float32 |
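
The long decimals in the table are just float32 renderings of the usual RMSprop defaults (0.001 and 0.9). Constructing the equivalent optimizer explicitly would look like:

```python
import keras

# Equivalent optimizer to the configuration in the table above.
optimizer = keras.optimizers.RMSprop(
    learning_rate=0.001,  # float32 rendering of 0.0010000000474974513
    rho=0.9,              # float32 rendering of 0.8999999761581421
    momentum=0.0,
    epsilon=1e-07,
    centered=False,
)
```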
|
|
|
```python |
|
batch_size = 64 # Batch size for training. |
|
epochs = 100 # Number of epochs to train for. |
|
latent_dim = 256 # Latent dimensionality of the encoding space. |
|
num_samples = 10000 # Number of samples to train on. |
|
``` |
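
Compilation and training follow the linked Keras example; `encoder_input_data`, `decoder_input_data`, and `decoder_target_data` are the one-hot-encoded arrays built during vectorization:

```python
# Train with teacher forcing: the decoder receives the target sequence as
# input and predicts the same sequence shifted one timestep into the future.
model.compile(
    optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"]
)
model.fit(
    [encoder_input_data, decoder_input_data],
    decoder_target_data,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.2,
)
```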
|
|
|
## Model Plot |
|
|
|
<details> |
|
<summary>View Model Plot</summary> |
|
|
|
![Model Image](./model.png) |
|
|
|
</details> |