---
license: mit
language:
- en
tags:
- gpu
---

# Text Summarization Model with Seq2Seq and LSTM

This is a sequence-to-sequence (seq2seq) model for text summarization. It uses a bidirectional LSTM encoder and an LSTM decoder to generate summaries from input articles. The model was trained on a dataset with sequences up to 800 tokens long.

## Dataset

CNN-DailyMail News Text Summarization dataset, from Kaggle.

## Model Architecture

### Encoder

- **Input Layer:** Takes input sequences of length `max_len_article`.
- **Embedding Layer:** Converts input sequences into dense vectors of size 100.
- **Bidirectional LSTM Layer:** Processes the embedded input, capturing dependencies in both the forward and backward directions, and outputs the hidden and cell states from both directions.
- **State Concatenation:** Concatenates the forward and backward hidden and cell states to form the final encoder states.

### Decoder

- **Input Layer:** Takes target sequences of variable length.
- **Embedding Layer:** Converts target sequences into dense vectors of size 100.
- **LSTM Layer:** Processes the embedded target sequences with an LSTM whose initial states are set to the encoder states.
- **Dense Layer:** Applies a Dense layer with softmax activation to produce a probability distribution over the vocabulary.

A minimal Keras sketch of this encoder-decoder graph is included after the training results below.

### Model Summary

| Layer (type)                  | Output Shape                                                       | Param #    | Connected to                                              |
|-------------------------------|--------------------------------------------------------------------|------------|-----------------------------------------------------------|
| input_1 (InputLayer)          | [(None, 800)]                                                      | 0          | -                                                         |
| embedding (Embedding)         | (None, 800, 100)                                                   | 47,619,900 | input_1[0][0]                                             |
| bidirectional (Bidirectional) | [(None, 200), (None, 100), (None, 100), (None, 100), (None, 100)]  | 160,800    | embedding[0][0]                                           |
| input_2 (InputLayer)          | [(None, None)]                                                     | 0          | -                                                         |
| embedding_1 (Embedding)       | (None, None, 100)                                                  | 15,515,800 | input_2[0][0]                                             |
| concatenate (Concatenate)     | (None, 200)                                                        | 0          | bidirectional[0][1], bidirectional[0][3]                  |
| concatenate_1 (Concatenate)   | (None, 200)                                                        | 0          | bidirectional[0][2], bidirectional[0][4]                  |
| lstm (LSTM)                   | [(None, None, 200), (None, 200), (None, 200)]                      | 240,800    | embedding_1[0][0], concatenate[0][0], concatenate_1[0][0] |
| dense (Dense)                 | (None, None, 155158)                                               | 31,186,758 | lstm[0][0]                                                |

- Total params: 94,724,060
- Trainable params: 94,724,058
- Non-trainable params: 0

## Training

The model was trained on sequences up to 800 tokens long using the following configuration:

- **Optimizer:** Adam
- **Loss Function:** Categorical Crossentropy
- **Metrics:** Accuracy

### Training Loss and Validation Loss

| Epoch | Training Loss | Validation Loss | Time per Epoch (s) |
|-------|---------------|-----------------|--------------------|
| 1     | 3.9044        | 0.4543          | 3087               |
| 2     | 0.3429        | 0.0976          | 3091               |
| 3     | 0.1054        | 0.0427          | 3096               |
| 4     | 0.0490        | 0.0231          | 3099               |
| 5     | 0.0203        | 0.0148          | 3098               |

### Test Loss

| Test Loss            |
|----------------------|
| 0.014802712015807629 |
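For reference, the encoder-decoder graph described above can be expressed with the Keras functional API. The snippet below is a minimal sketch reconstructed from the layer shapes and parameter counts in the model summary table; the vocabulary sizes and variable names (`x_vocab_size`, `y_vocab_size`, `latent_dim`) are inferred assumptions, not values shipped with this model.

```python
# Minimal sketch of the encoder-decoder architecture described above.
# Vocabulary sizes are inferred from the parameter counts in the model summary
# (embedding params / embedding dim); treat them as assumptions, not exact values.
from tensorflow.keras.layers import (Input, Embedding, LSTM, Bidirectional,
                                     Concatenate, Dense)
from tensorflow.keras.models import Model

max_len_article = 800    # maximum article length in tokens
embedding_dim = 100      # embedding size for both encoder and decoder
latent_dim = 100         # per-direction LSTM units in the encoder
x_vocab_size = 476_199   # assumed source vocabulary (47,619,900 / 100)
y_vocab_size = 155_158   # assumed target vocabulary (15,515,800 / 100)

# Encoder: bidirectional LSTM, keeping only the final hidden and cell states
encoder_inputs = Input(shape=(max_len_article,))
enc_emb = Embedding(x_vocab_size, embedding_dim)(encoder_inputs)
_, fwd_h, fwd_c, bwd_h, bwd_c = Bidirectional(
    LSTM(latent_dim, return_state=True))(enc_emb)
state_h = Concatenate()([fwd_h, bwd_h])  # (None, 200)
state_c = Concatenate()([fwd_c, bwd_c])  # (None, 200)

# Decoder: LSTM initialised with the concatenated encoder states
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(y_vocab_size, embedding_dim)(decoder_inputs)
decoder_outputs, _, _ = LSTM(2 * latent_dim, return_sequences=True,
                             return_state=True)(dec_emb,
                                                initial_state=[state_h, state_c])
outputs = Dense(y_vocab_size, activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```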
## Usage

(I will update this section soon.)

To use this model, load it with the Hugging Face Transformers library:

```python
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

# Load the tokenizer and model (replace 'your-model-name' with the actual repository ID)
tokenizer = AutoTokenizer.from_pretrained('your-model-name')
model = TFAutoModelForSeq2SeqLM.from_pretrained('your-model-name')

article = "Your input text here."

# Tokenize the article, truncating to the 800-token limit used during training
inputs = tokenizer.encode("summarize: " + article, return_tensors="tf",
                          max_length=800, truncation=True)

# Generate a summary with beam search
summary_ids = model.generate(inputs, max_length=150, min_length=40,
                             length_penalty=2.0, num_beams=4, early_stopping=True)

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```
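If the checkpoint is published as a plain Keras/TensorFlow saved model rather than a converted Transformers checkpoint, loading it directly with Keras is an alternative. The snippet below is a sketch under that assumption; the file name `text_summarization_model.h5` is illustrative, not an actual artifact in this repository.

```python
# Sketch: loading a raw Keras checkpoint instead of the Transformers wrapper.
# The file name below is an assumption; replace it with the actual saved artifact.
from tensorflow.keras.models import load_model

model = load_model("text_summarization_model.h5")
model.summary()  # should list the layers shown in the model summary table above
```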