Jiahuita committed
Commit 60dc372 · 1 Parent(s): aef9aa2

initial commit

Files changed (6)
  1. Dockerfile +22 -0
  2. README copy.md +97 -0
  3. app.py +87 -0
  4. news_classifier.h5 +3 -0
  5. requirements.txt +7 -0
  6. tokenizer.json +0 -0
Dockerfile ADDED
@@ -0,0 +1,22 @@
+ FROM python:3.9-slim
+
+ WORKDIR /code
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     curl \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements first to leverage Docker cache
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy the rest of the application
+ COPY . .
+
+ # Expose the port the app runs on
+ EXPOSE 7860
+
+ # Command to run the application
+ CMD ["python", "-m", "uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
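To check that a container built from this Dockerfile is serving, one option is a small readiness probe against the root route that app.py defines. The sketch below assumes the container is published on localhost with `-p 7860:7860`; `service_url`, `is_ready`, and `check_service` are hypothetical helper names, not part of the commit.

```python
import json
from urllib.request import urlopen


def service_url(host: str = "localhost", port: int = 7860, path: str = "/") -> str:
    """Build a URL for the container; 7860 matches EXPOSE and the uvicorn --port flag."""
    return f"http://{host}:{port}{path}"


def is_ready(payload: dict) -> bool:
    """The root route in app.py reports "ready" or "not_loaded" in its "status" field."""
    return payload.get("status") == "ready"


def check_service(host: str = "localhost", port: int = 7860, timeout: float = 5.0) -> bool:
    """Fetch the root route and report whether the app says it is ready."""
    with urlopen(service_url(host, port), timeout=timeout) as resp:
        return is_ready(json.load(resp))
```

With the container running, `check_service()` should return `True` once the model and tokenizer have loaded at startup.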
README copy.md ADDED
@@ -0,0 +1,97 @@
+ ---
+ title: News Source Classifier
+ emoji: 📰
+ colorFrom: blue
+ colorTo: red
+ sdk: fastapi
+ sdk_version: 0.95.2
+ app_file: app.py
+ pinned: false
+ language: en
+ license: mit
+ tags:
+ - text-classification
+ - news-classification
+ - LSTM
+ - tensorflow
+ pipeline_tag: text-classification
+ widget:
+ - example_title: "Crime News Headline"
+   text: "Wife of murdered Minnesota pastor hired 3 men to kill husband after affair: police"
+ - example_title: "Science News Headline"
+   text: "Scientists discover breakthrough in renewable energy research"
+ - example_title: "Political News Headline"
+   text: "Presidential candidates face off in heated debate over climate policies"
+ model-index:
+ - name: News Source Classifier
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       name: Custom Dataset
+       type: Custom
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.82
+ ---
+
+ # News Source Classifier
+
+ This model uses an LSTM neural network to predict whether a news headline comes from Fox News or NBC News.
+
+ ## Model Description
+
+ - **Model Architecture**: LSTM Neural Network
+ - **Input**: News headlines (text)
+ - **Output**: Binary classification (Fox News vs NBC)
+ - **Training Data**: Large collection of headlines from both news sources
+ - **Performance**: Achieves approximately 82% accuracy on the test set
+
+ ## Usage
+
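+ You can query the model through the FastAPI app's `/predict` endpoint:
+
+ ```python
+ import requests
+
+ # NOTE: this is the repository URL; point it at the deployed app's /predict route (see app.py).
+ response = requests.post(
+     "https://huggingface.co/Jiahuita/NewsSourceClassification",
+     json={"text": "Your news headline here"}
+ )
+ print(response.json())
+ ```
+
+ Or load the model locally with Keras, from a clone of this repository:
+
+ ```python
+ import json
+ from tensorflow.keras.models import load_model
+ from tensorflow.keras.preprocessing.sequence import pad_sequences
+ from tensorflow.keras.preprocessing.text import tokenizer_from_json
+
+ model = load_model("news_classifier.h5")
+ with open("tokenizer.json") as f:
+     tokenizer = tokenizer_from_json(json.load(f))
+ padded = pad_sequences(tokenizer.texts_to_sequences(["Your news headline here"]), maxlen=41)
+ print(model.predict(padded, verbose=0))  # raw model output
+ ```
+
+ Example response from the `/predict` endpoint:
+ ```json
+ {
+     "label": "foxnews",
+     "score": 0.875
+ }
+ ```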
+
+ ## Limitations and Bias
+
+ This model was trained on news headlines from specific sources and time periods, so its predictions may reflect biases in that data, such as the topics and phrasing common at the time. Keep these limitations in mind when using the model.
+
+ ## Training
+
+ The model was trained using:
+ - TensorFlow 2.13.0
+ - LSTM architecture
+ - Binary cross-entropy loss
+ - Adam optimizer
+
+ ## License
+ This project is licensed under the MIT License.
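The README's first usage example posts JSON to the service. A minimal client sketch for the `/predict` route in app.py follows, using only the standard library. It assumes the caller knows the deployment's base URL (not stated in this commit); `build_request`, `normalize`, and `classify` are hypothetical names.

```python
import json
from typing import List, Union
from urllib.request import Request, urlopen


def build_request(base_url: str, text: Union[str, List[str]]) -> Request:
    """Build the POST request; the app accepts one headline or a list under "text"."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def normalize(result) -> list:
    """Return a list of {"label", "score"} dicts whether the API sent one result or many."""
    return result if isinstance(result, list) else [result]


def classify(base_url: str, text: Union[str, List[str]], timeout: float = 10.0) -> list:
    """Send the headline(s) and return the parsed predictions as a list."""
    with urlopen(build_request(base_url, text), timeout=timeout) as resp:
        return normalize(json.load(resp))
```

Normalizing to a list means callers do not need to branch on whether they sent a single string or a batch.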
app.py ADDED
@@ -0,0 +1,87 @@
+ from fastapi import FastAPI, HTTPException
+ from pydantic import BaseModel
+ from tensorflow.keras.models import load_model
+ from tensorflow.keras.preprocessing.text import tokenizer_from_json
+ from tensorflow.keras.preprocessing.sequence import pad_sequences
+ import numpy as np
+ import json
+ from typing import Union, List
+
+ app = FastAPI()
+
+ # Global variables for model and tokenizer
+ model = None
+ tokenizer = None
+
+ def load_model_and_tokenizer():
+     global model, tokenizer
+     try:
+         model = load_model('news_classifier.h5')
+         with open('tokenizer.json', 'r') as f:
+             tokenizer_data = json.load(f)
+         tokenizer = tokenizer_from_json(tokenizer_data)
+     except Exception as e:
+         print(f"Error loading model or tokenizer: {e}")
+         raise
+
+ # Load on startup
+ load_model_and_tokenizer()
+
+ class PredictionInput(BaseModel):
+     text: Union[str, List[str]]
+
+ class PredictionOutput(BaseModel):
+     label: str
+     score: float
+
+ @app.get("/")
+ def read_root():
+     return {
+         "message": "News Source Classifier API",
+         "model_type": "LSTM",
+         "version": "1.0",
+         "status": "ready" if model is not None and tokenizer is not None else "not_loaded"
+     }
+
+ @app.post("/predict", response_model=Union[PredictionOutput, List[PredictionOutput]])
+ async def predict(input_data: PredictionInput):
+     if model is None or tokenizer is None:
+         try:
+             load_model_and_tokenizer()
+         except Exception:
+             raise HTTPException(status_code=500, detail="Model not loaded")
+
+     try:
+         # Handle both single string and list inputs
+         texts = input_data.text if isinstance(input_data.text, list) else [input_data.text]
+
+         # Preprocess
+         sequences = tokenizer.texts_to_sequences(texts)
+         padded = pad_sequences(sequences, maxlen=41)  # Match your model's input length
+
+         # Get predictions
+         predictions = model.predict(padded, verbose=0)
+
+         # Process results
+         # NOTE: pred[1] assumes the model emits two values per headline, with index 1 for Fox News
+         results = []
+         for pred in predictions:
+             label = "foxnews" if pred[1] > 0.5 else "nbc"
+             score = float(pred[1] if label == "foxnews" else 1 - pred[1])
+             results.append({
+                 "label": label,
+                 "score": score
+             })
+
+         # Return single result if input was single string
+         return results[0] if isinstance(input_data.text, str) else results
+
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e))
+
+ @app.post("/reload")
+ async def reload_model():
+     try:
+         load_model_and_tokenizer()
+         return {"message": "Model reloaded successfully"}
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e))
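The two data-shaping steps in `predict` (padding to 41 tokens, then turning a probability into a label and score) can be illustrated without TensorFlow. This is a sketch, not the app's code: `pad_pre` imitates the default behaviour of `pad_sequences` (pad and truncate at the front), and `to_label` assumes its argument is the Fox News probability that app.py reads from `pred[1]`. Both helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

MAXLEN = 41  # input length used in app.py


def pad_pre(seq: Sequence[int], maxlen: int = MAXLEN, value: int = 0) -> List[int]:
    """Keep the last `maxlen` ids and left-pad with `value`, like pad_sequences' defaults."""
    trimmed = list(seq)[-maxlen:]
    return [value] * (maxlen - len(trimmed)) + trimmed


def to_label(p_fox: float, threshold: float = 0.5) -> Tuple[str, float]:
    """Map the Fox News probability to (label, confidence in that label)."""
    if p_fox > threshold:
        return "foxnews", p_fox
    return "nbc", 1.0 - p_fox


print(pad_pre([5, 6, 7], maxlen=5))  # [0, 0, 5, 6, 7]
print(to_label(0.875))               # ('foxnews', 0.875)
```

Because the score is always the probability of the chosen label, it never falls below 0.5 under this rule.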
news_classifier.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e9258ee4d92199555974374b569634e73ad0d2b059d3b7125f3b75c2144528f4
+ size 117315152
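The three lines above are a Git LFS pointer, not the model weights themselves: the real file is fetched by its SHA-256 object ID. As a rough illustration, the pointer can be parsed as space-separated key/value lines; `parse_lfs_pointer` is a hypothetical helper written for this note.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each "key value" line of a Git LFS pointer and unpack the oid field."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e9258ee4d92199555974374b569634e73ad0d2b059d3b7125f3b75c2144528f4
size 117315152
"""

info = parse_lfs_pointer(POINTER)
print(f"{info['size_bytes'] / 1_000_000:.1f} MB")  # 117.3 MB
```

A clone made without Git LFS installed will contain only this small pointer, in which case `load_model('news_classifier.h5')` cannot read it as a model file.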
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ tensorflow>=2.10.0
+ fastapi>=0.68.0
+ uvicorn>=0.15.0
+ pydantic>=1.8.2
+ numpy>=1.19.2
+ python-multipart
+ scikit-learn>=0.24.2
tokenizer.json ADDED
The diff for this file is too large to render.
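Since the tokenizer file is not shown, here is a rough pure-Python picture of what `tokenizer.texts_to_sequences` does in app.py. It assumes the tokenizer was saved with Keras' default settings (lowercasing, punctuation filtering, no `oov_token`, no `num_words` limit), which this commit does not confirm. The `word_index` values are invented for illustration and `texts_to_ids` is a hypothetical name.

```python
# Default punctuation filter of the Keras Tokenizer; each character is replaced by a space.
KERAS_DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def texts_to_ids(texts, word_index, filters=KERAS_DEFAULT_FILTERS):
    """Lowercase, strip filtered characters, split on whitespace, and drop unknown words."""
    table = str.maketrans({ch: " " for ch in filters})
    sequences = []
    for text in texts:
        words = text.lower().translate(table).split()
        sequences.append([word_index[w] for w in words if w in word_index])
    return sequences


# Hypothetical vocabulary; the real one lives in tokenizer.json.
word_index = {"scientists": 1, "discover": 2, "energy": 3}

headline = "Scientists discover breakthrough in renewable energy research"
print(texts_to_ids([headline], word_index))  # [[1, 2, 3]]
```

Under these assumptions, words missing from the vocabulary are silently skipped, so a headline made up entirely of unseen words becomes an empty sequence that is then padded to all zeros.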