---
library_name: transformers
inference: false
license: cc-by-sa-4.0
base_model:
- nqzfaizal77ai/swiftstrike-aero-init-580m
---

**Swiftstrike Aero Model (Falcon Pruned Model)**

This model is a fine-tuned version of the Swiftstrike Aero model, tailored for context-aware keyword searches related to culture. It is designed to process 1-block contexts, equivalent to approximately 384 tokens, or a single standard-length Wikipedia paragraph.

**Training Data (Part 1: Culture Context, Wikipedia)**

The model was trained on a multi-stage dataset derived from Wikipedia's culture-related content:

1. **Base Dataset:**
   - 13,000 rows of capitalized and lowercase words extracted from Wikipedia's culture sentences.
2. **Sentence-Level Dataset:**
   - 2,300 rows of full sentences from Wikipedia's culture data.
3. **1-Block Context Dataset:**
   - 500 rows of 1-block contexts (approximately one paragraph each) from Wikipedia's culture data.

**Dataset Organization**

The dataset is structured hierarchically, with each level representing an increasing degree of complexity:

1. **Part:** Individual components or elements.
2. **Merge Part:** Combination of two or more parts.
3. **Fragment:** Combination of two or more merge parts.
4. **Sub-Unit:** Combination of two or more fragments.
5. **Unit:** Combination of two or more sub-units.
6. **Super-Unit:** Combination of two or more units.
7. **Mega-Unit:** Combination of two or more super-units.
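The hierarchy above can be sketched in plain Python. The `combine` helper and the pairwise grouping here are illustrative assumptions, not the actual dataset-construction code:

```python
# Illustrative sketch (an assumption, not the real pipeline) of how the
# seven dataset levels compose: each level combines two or more items
# from the level below it.
def combine(items):
    """Combine two or more lower-level text items into one higher-level item."""
    return " ".join(items)

parts = ["culture", "tradition", "art", "language"]  # level 1: parts
merge_part_a = combine(parts[:2])                    # level 2: merge parts
merge_part_b = combine(parts[2:])
fragment = combine([merge_part_a, merge_part_b])     # level 3: fragment
print(fragment)  # levels 4-7 (sub-unit .. mega-unit) repeat the same step
```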
**How to Use**

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from IPython.display import display, HTML

# Load the model and tokenizer
model_name = "nqzfaizal77ai/sa-145m-en-wikipedia-culture-part1-1bc"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

torch.manual_seed(3077)
input_text = "The cultural impact of the internet is"
inputs = tokenizer(input_text, return_tensors="pt")

def print_with_border(text):
    """Display the given text with a border around it (for notebooks)."""
    # The border style below is illustrative
    display(HTML(f"<div style='border: 1px solid; padding: 8px;'>{text}</div>"))

# Example usage: stochastic (sampling) decoding
output = model.generate(
    **inputs,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.2,
    max_length=100,
)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True).replace("\n", "<br>")
print_with_border(generated_text)

# Example usage: greedy decoding
output = model.generate(**inputs, do_sample=False, max_length=100)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True).replace("\n", "<br>")
print_with_border(generated_text)
```