News Scraper Model: Section-Aware News Summarizer
A fine-tuned facebook/bart-large-cnn
that turns full news articles into ~60-word, Inshorts-style summaries with
section-specific structure: a Crime story leads with who was charged, a
Business story leads with the key number, a Sport story leads with the result.
How to use
This model was trained on prompt-prefixed inputs, so you must wrap the article in the same instruction format used during training (a Tech example):
```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("aniket23/news_scraper_model")
model = BartForConditionalGeneration.from_pretrained("aniket23/news_scraper_model")

title = "OpenAI launches GPT-5"
text = "OpenAI today unveiled GPT-5, claiming major reasoning improvements ..."

# Section instruction first, then the article as "Article: <title>. <text>"
prompt = (
    "Summarise as Tech news: Start with the company or product name. Include "
    "what was launched, announced, or discovered, key specs or numbers that "
    "matter, who is affected, and why it changes the industry or everyday users."
    f"\n\nArticle: {title}. {text}"
)

inputs = tokenizer(prompt, max_length=512, truncation=True, return_tensors="pt")
ids = model.generate(
    **inputs,
    max_new_tokens=110, min_new_tokens=50,
    num_beams=4, length_penalty=2.0, early_stopping=True,
)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```
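The prompt layout above can be wrapped in a small helper so the same format is reused for every article. This is a minimal sketch: `build_prompt` and `TECH_INSTRUCTION` are hypothetical names, not part of the repository, and only the Tech instruction is taken from this card. Instructions for the other sections have to come from the GitHub repository.

```python
# Hypothetical helper, not part of the repository: it reproduces the
# "<instruction>\n\nArticle: <title>. <text>" layout from the example above.
TECH_INSTRUCTION = (
    "Summarise as Tech news: Start with the company or product name. Include "
    "what was launched, announced, or discovered, key specs or numbers that "
    "matter, who is affected, and why it changes the industry or everyday users."
)


def build_prompt(instruction: str, title: str, text: str) -> str:
    """Join a section instruction and an article into one model input."""
    # Stripping surrounding whitespace is an added convenience, not documented behaviour.
    return f"{instruction}\n\nArticle: {title.strip()}. {text.strip()}"


prompt = build_prompt(
    TECH_INSTRUCTION,
    "OpenAI launches GPT-5",
    "OpenAI today unveiled GPT-5, claiming major reasoning improvements ...",
)
```

The resulting `prompt` string can be passed to the tokenizer exactly as in the example above.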
The full pipeline (scraping, section classification, and the exact prompt for each of the 20 sections) is on GitHub: AniketMishra23/news_scraper_model
Sections
Crime, Tech, Politics, Business, Science, Sport, Entertainment, Lifestyle, World, Health, Education, Property, Environment, Defence, Travel, Immigration, Law, Economy, Arts, Personal Finance. Each section has its own summary structure.
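When the section name comes from user input or an upstream classifier, it can help to validate it against the list above before choosing a prompt. A minimal sketch, assuming case-insensitive matching is acceptable; `SECTIONS` and `normalise_section` are hypothetical names introduced here for illustration.

```python
# The 20 section names listed on this card.
SECTIONS = [
    "Crime", "Tech", "Politics", "Business", "Science", "Sport",
    "Entertainment", "Lifestyle", "World", "Health", "Education",
    "Property", "Environment", "Defence", "Travel", "Immigration",
    "Law", "Economy", "Arts", "Personal Finance",
]

_LOOKUP = {name.lower(): name for name in SECTIONS}


def normalise_section(name: str) -> str:
    """Return the canonical section name, or raise ValueError if it is unknown."""
    key = " ".join(name.split()).lower()  # collapse whitespace, ignore case
    if key not in _LOOKUP:
        raise ValueError(f"Unknown section: {name!r}")
    return _LOOKUP[key]
```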
Training
- Base model: facebook/bart-large-cnn
- Data: ~700 news articles scraped from 25 RSS feeds, labelled via knowledge-distillation bootstrapping (base BART generates the target summary for each section-prefixed input)
- Hardware: RTX 4060 Laptop GPU (8 GB), fp16, ~10 min
- Selection: best checkpoint by validation ROUGE-L, with early stopping
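The labelling and selection steps in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not the repository's training code: `bootstrap_labels`, `select_best_checkpoint`, `toy_teacher`, the `patience` value, and the scores passed in are all hypothetical.

```python
from typing import Callable, Iterable


def bootstrap_labels(prompts: Iterable[str], teacher: Callable[[str], str]) -> list[dict]:
    """Pair each section-prefixed prompt with the summary the teacher produces for it."""
    return [{"input": p, "target": teacher(p)} for p in prompts]


def select_best_checkpoint(val_rouge_l: list[float], patience: int = 2) -> tuple[int, float]:
    """Return (index, score) of the best evaluation, stopping early once
    `patience` consecutive evaluations fail to improve on the best score."""
    best_idx, best_score, stale = -1, float("-inf"), 0
    for i, score in enumerate(val_rouge_l):
        if score > best_score:
            best_idx, best_score, stale = i, score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stopping: later evaluations are never run
    return best_idx, best_score


# Stand-in teacher for illustration only; in the setup described above the
# teacher is base BART generating a summary for each prompt.
def toy_teacher(prompt: str) -> str:
    return prompt.split("Article: ", 1)[-1][:40]


pairs = bootstrap_labels(
    ["Summarise as Tech news: ...\n\nArticle: Title. Body text."], toy_teacher
)
# Made-up validation scores, not the actual training log.
best_epoch, best_score = select_best_checkpoint([0.50, 0.54, 0.56, 0.55, 0.55, 0.60])
```

With these made-up scores the loop stops after two non-improving evaluations, so the final 0.60 is never seen. That is the usual trade-off of early stopping with a small patience.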
| Metric (validation) | Value |
|---|---|
| ROUGE-1 / ROUGE-2 / ROUGE-L | 0.61 / 0.50 / 0.56 |
| Average summary length | ~57 words |
ROUGE is measured against the bootstrap labels, so it reflects consistency with the teacher model rather than human-judged quality. Practical strengths: consistent length, section-appropriate structure, complete sentences.
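For intuition about what ROUGE-L measures, here is a minimal sketch that scores a candidate against a reference by their longest common subsequence of whitespace-separated tokens. It is a simplification: this card does not say which ROUGE implementation produced the numbers above, and standard packages apply their own tokenization and optional stemming, so scores from this sketch will not match exactly.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 on lowercased, whitespace-split tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def average_words(summaries: list[str]) -> float:
    """Mean summary length in whitespace-separated words."""
    return sum(len(s.split()) for s in summaries) / len(summaries)
```

Because the reference here would be a teacher-generated label, a high score only says the fine-tuned model's output overlaps with what the teacher wrote.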
Limitations
- Trained on bootstrap (not human-written) labels, so the quality ceiling is roughly "base BART, but length-controlled and section-aware".
- Section classification is keyword-based and can misfire on ambiguous articles.
- English news only.
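To see why keyword-based classification can misfire, consider a toy classifier that counts keyword hits per section. This is only a sketch: the keyword sets, the `classify` function, and the `"World"` fallback are assumptions made for illustration and are not taken from the repository.

```python
import re

# Hypothetical keyword sets; the real pipeline's keywords are not listed on this card.
KEYWORDS = {
    "Crime": {"arrested", "charged", "police"},
    "Sport": {"match", "goal", "tournament"},
    "Business": {"shares", "profit", "revenue"},
}


def classify(text: str, default: str = "World") -> str:
    """Pick the section with the most keyword hits; fall back to `default` on zero hits."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {section: len(words & kws) for section, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)  # ties resolve to the first section in KEYWORDS
    return best if scores[best] > 0 else default


# A sports story about a player in legal trouble scores higher for Crime than Sport.
label = classify("Police charged the striker after the match")
```

Here `label` comes out as Crime, which may or may not be the structure a reader wants for that story. The summarization prompt then follows whichever section the classifier chose.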
License
MIT (inherited from facebook/bart-large-cnn).