---
title: Repetivec
emoji: 📚
colorFrom: blue
colorTo: blue
sdk: gradio
sdk_version: 4.25.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Generate coherent sentences from .txt files
---

# Overview:
Repetivec is a tool designed to generate coherent sentences from a provided starting word and a corpus of text (mainly .txt files). It combines data preprocessing, model training, and post-processing to produce text that is sometimes reasonably coherent and sometimes not.

# Model Details:
Model Type: Natural Language Processing (NLP)

Model Architecture:
- Data preprocessing using NLTK for tokenization and normalization.
- Word2Vec model training for word embeddings.
- Language model training using a Markov chain approach.

Input: Starting word and uploaded corpus file.

Output: Generated sentence with improved readability and coherence.

Techniques Used:
- Tokenization: NLTK's word_tokenize function is used to split the text into words.
- Word Embeddings: A Word2Vec model is trained to generate word embeddings.
- Markov Chain: The language model is trained using a Markov chain approach to predict the next word based on the current word.
- Post-processing: Generated sentences are post-processed to improve readability, coherence, and grammar.

Performance:
- Starting Word: "he", "she"
- Length: 101 words
- Context Window: 4
- Max Context Window: 100
- Blacklist: None
- Whitelist: None
- Whitelist Weight: 0.1
- Uploaded corpus file: 5MB of MLP fanfiction (Yeah I'm that type of guy and I also had some left over so...)

Results:
1. Starting word: "he"
   Result: "He a mix somehow works differently in prolonged contact and spittle and resting at what those lights.” blade home . Gone beyond with evocator chevalier from two dragon effect becomes that environment having only funny while cami realized his mind onto kayack to darken. Since i counter even. Come into sunny spires were better.
   ’ re…you ’ sanctuary shortly between kaiba beating the knocked-out goons are spring concluded it off.” 1 red-eyes black eye can flip this item on cotton candy in guards watching paint as defensively between then hit calmer now rested in anabundance, slapping it"
2. Starting word: "she"
   Result: "She had their professions. It-it didn ’ wrong they absolutely sure her azure scales to depict them thanks again.” if onyx pushed until yami noted with muscle covering his ankles. 'pegasi can access prior has gone now needed someone as reassurance from zexal, knowing sunset cloaking axe on clouds of answers since i needed help prepare himself while, alphan ( disguised human-turned-plus with artifacts with thunder monster or get justice brought that anyway when granny ’ or emotionally.” sweetheart comes back of appearance throughout everyone while severely crippling two ; hearth 's sudden fear i regret"

Limitations and Considerations:
- Quality: Output quality depends on the uploaded corpus, so the generated sentences are not always good.
- Repetitive Phrases: The model may attempt to replace repetitive phrases with alternatives, but how well this works varies.
- Checking: It's always a good idea to double-check the generated sentences to make sure they make sense and are grammatically correct.

Future Improvements:
- Better Handling of Repetitive Phrases: Enhancing the algorithm for identifying and replacing repetitive phrases could lead to more natural outputs.
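
Markov Chain Sketch:

The Markov chain step listed under Techniques Used (predicting the next word from the current word) might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not Repetivec's actual code: the function names `tokenize`, `train_markov`, and `generate` are hypothetical, a plain regex stands in for NLTK's word_tokenize so the example runs with the standard library only, and the Word2Vec, context window, blacklist/whitelist, and post-processing parts are left out.

```python
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (regex stand-in for NLTK's word_tokenize)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train_markov(tokens):
    """Count, for every word, how often each other word directly follows it."""
    transitions = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        transitions[current][following] += 1
    return transitions


def generate(transitions, start_word, length, seed=None):
    """Walk the chain from start_word, sampling each next word in proportion to its count."""
    rng = random.Random(seed)
    words = [start_word]
    while len(words) < length:
        followers = transitions.get(words[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        candidates = list(followers)
        weights = list(followers.values())
        words.append(rng.choices(candidates, weights=weights, k=1)[0])
    return words


# Tiny stand-in corpus; a real run would read the uploaded .txt file instead.
corpus = "the cat sat on the mat"
model = train_markov(tokenize(corpus))
print(" ".join(generate(model, "cat", 3)))  # prints "cat sat on"
```

Because each word is chosen by looking only at the word before it, nothing ties a sentence together over longer distances, which is consistent with the loosely connected sample results above.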