Spaces: Tymec/sentiment-analysis

Tymec committed on May 15

Commit b43b167 (2 parents: 85ac990, 391bd16)

Merge branch 'master' of https://github.com/Tymec/projekt-psi

Files changed (4)

README.md CHANGED

@@ -12,6 +12,10 @@ Sentiment Analysis
 - [IMDb](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews)
 - [Amazon Reviews](https://www.kaggle.com/datasets/bittlingmayer/amazonreviews)
+### Required tools
+- `just`
+- `poetry`
 ### TODO
 - [ ] CLI using `click` (commands: predict, train, evaluate) with settings set via flags or environment variables
 - [ ] GUI using `gradio` (tabs: predict, train, evaluate, compare, settings)
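
The first TODO item could be sketched with `click` roughly as below. This is a minimal sketch under stated assumptions, not the project's CLI: the `SA_MODEL_PATH` environment variable, the `model.pkl` default, and the `describe_model` helper are all hypothetical, and the commands only echo what they would do. It shows how one setting can come from a flag or an environment variable via `click.option(..., envvar=...)` and be shared with the `predict`, `train`, and `evaluate` subcommands through the context object.

```python
import click


def describe_model(path: str) -> str:
    # Hypothetical placeholder: a real command would load the model here
    # (for example with Model.from_file); this only reports the path.
    return f"model@{path}"


@click.group()
@click.option(
    "--model-path",
    envvar="SA_MODEL_PATH",  # hypothetical environment variable name
    default="model.pkl",  # hypothetical default
    show_default=True,
    help="Path of the serialized model.",
)
@click.pass_context
def cli(ctx: click.Context, model_path: str) -> None:
    """Sentiment analysis CLI (sketch)."""
    # Share the resolved setting with every subcommand via the context object
    ctx.ensure_object(dict)
    ctx.obj["model_path"] = model_path


@cli.command()
@click.argument("text")
@click.pass_context
def predict(ctx: click.Context, text: str) -> None:
    """Predict the sentiment of TEXT."""
    click.echo(f"{describe_model(ctx.obj['model_path'])}: {text}")


@cli.command()
@click.argument("dataset")
@click.pass_context
def train(ctx: click.Context, dataset: str) -> None:
    """Train on DATASET and save the result to the model path."""
    click.echo(f"train {dataset} -> {ctx.obj['model_path']}")


@cli.command()
@click.argument("dataset")
@click.pass_context
def evaluate(ctx: click.Context, dataset: str) -> None:
    """Evaluate the saved model on DATASET."""
    click.echo(f"evaluate {describe_model(ctx.obj['model_path'])} on {dataset}")
```

With this layout click resolves `--model-path` from the command line first, then from `SA_MODEL_PATH`, then from the default, so flags and environment variables can be mixed freely. An entry point would simply call `cli()`.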

app/model/__init__.py ADDED

File without changes

app/model/base.py ADDED

+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from typing import TYPE_CHECKING
+
+import joblib
+
+if TYPE_CHECKING:
+    from pathlib import Path
+
+    from sklearn.pipeline import Pipeline
+
+
+class Model(ABC):
+    """Base class for all models"""
+
+    @property
+    @abstractmethod
+    def pipeline(self) -> Pipeline:
+        """Pipeline used for the model"""
+        ...
+
+    @property
+    @abstractmethod
+    def description(self) -> str:
+        """Description of the architecture"""
+        ...
+
+    @abstractmethod
+    def _predict(self, text: str) -> int:
+        """Predict the sentiment of the given text"""
+        ...
+
+    @staticmethod
+    def from_file(path: Path) -> Model:
+        """Load the model from the given file"""
+        return joblib.load(path)
+
+    def to_file(self, path: Path) -> None:
+        """Save the model to the given file"""
+        joblib.dump(self, path)
+
+    def predict(self, text: str) -> int:
+        """Perform sentiment analysis on the given text"""
+        return self._predict(text)
+
+    def train(self, x: list[str], y: list[int]) -> None:
+        """Train the model on the given data"""
+        self.pipeline.fit(x, y)
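
To illustrate the contract `Model` sets up, the sketch below pairs a condensed copy of the base class (the `joblib` file helpers are omitted so it runs with the standard library alone) with a hypothetical `KeywordModel` subclass. Its `KeywordPipeline` is a toy word-scoring estimator invented for this example; it merely has the same `fit`/`predict` shape as an sklearn `Pipeline`. A subclass only supplies `pipeline`, `description`, and `_predict`, while `train` and `predict` are inherited.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Model(ABC):
    """Condensed copy of the base class (file I/O helpers omitted)."""

    @property
    @abstractmethod
    def pipeline(self): ...

    @property
    @abstractmethod
    def description(self) -> str: ...

    @abstractmethod
    def _predict(self, text: str) -> int: ...

    def predict(self, text: str) -> int:
        # Public entry point delegates to the subclass hook
        return self._predict(text)

    def train(self, x: list[str], y: list[int]) -> None:
        # Shared training: fit whatever pipeline the subclass exposes
        self.pipeline.fit(x, y)


class KeywordPipeline:
    """Hypothetical toy estimator with the fit/predict shape of an sklearn Pipeline."""

    def __init__(self) -> None:
        self.weights: dict[str, int] = {}

    def fit(self, x: list[str], y: list[int]) -> KeywordPipeline:
        # Each word occurrence adds +1 in a positive text (label 1), -1 otherwise
        for text, label in zip(x, y):
            for word in text.lower().split():
                self.weights[word] = self.weights.get(word, 0) + (1 if label == 1 else -1)
        return self

    def predict(self, x: list[str]) -> list[int]:
        # Label 1 when the summed word scores are positive, otherwise 0
        return [
            int(sum(self.weights.get(w, 0) for w in text.lower().split()) > 0)
            for text in x
        ]


class KeywordModel(Model):
    """Hypothetical subclass: implements only the three abstract members."""

    def __init__(self) -> None:
        self._pipeline = KeywordPipeline()

    @property
    def pipeline(self) -> KeywordPipeline:
        return self._pipeline

    @property
    def description(self) -> str:
        return "Toy keyword-weight classifier"

    def _predict(self, text: str) -> int:
        return self.pipeline.predict([text])[0]
```

Because `pipeline` and `description` are abstract properties, `Model()` itself raises `TypeError`, and so does any subclass that forgets one of the three members.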

app/model/tfid_lr.py ADDED

+from __future__ import annotations
+
+from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
+from sklearn.linear_model import LogisticRegression
+from sklearn.pipeline import Pipeline
+
+from .base import Model
+
+
+class TfidfLR(Model):
+    """Sentiment analysis model using TF-IDF and Logistic Regression"""
+
+    def __init__(self, rng: int | None = None, cache: str | None = None):
+        # self.rng and self.cache are not defined in Model; accepting them as
+        # constructor arguments is an assumption that avoids an AttributeError
+        self.rng = rng  # random_state for LogisticRegression
+        self.cache = cache  # Pipeline memory: cache directory or None
+        self._pipeline = Pipeline(
+            [
+                (
+                    "vectorize",
+                    CountVectorizer(stop_words="english", ngram_range=(1, 2), max_features=10000),
+                ),
+                ("tfidf", TfidfTransformer()),
+                ("clf", LogisticRegression(max_iter=1000, random_state=self.rng)),
+            ],
+            memory=self.cache,
+        )
+
+    @property
+    def pipeline(self) -> Pipeline:
+        return self._pipeline
+
+    @property
+    def description(self) -> str:
+        return "TF-IDF with Logistic Regression"
+
+    def _predict(self, text: str) -> int:
+        # Cast the NumPy integer returned by sklearn to a plain int
+        return int(self.pipeline.predict([text])[0])
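
As a rough illustration of what the `vectorize` and `tfidf` steps compute, the pure-Python sketch below reproduces `TfidfTransformer`'s default weighting: raw term counts, smoothed idf `ln((1 + n) / (1 + df)) + 1`, and L2-normalised rows, together with the unigram-plus-bigram expansion that `ngram_range=(1, 2)` asks for. It deliberately skips `stop_words="english"`, `max_features=10000`, and sklearn's tokenizer, and the helper names `unigrams_and_bigrams` and `tfidf` are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def unigrams_and_bigrams(tokens: list[str]) -> list[str]:
    # Like ngram_range=(1, 2): all tokens plus adjacent pairs joined by a space
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weights as TfidfTransformer's defaults would compute them from raw counts."""
    n = len(docs)
    # Document frequency: how many documents contain each term at least once
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (smooth_idf=True)
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    rows = []
    for doc in docs:
        # Raw counts (what CountVectorizer produces) scaled by idf
        row = {term: c * idf[term] for term, c in Counter(doc).items()}
        # L2-normalise each document vector (norm="l2")
        length = math.sqrt(sum(v * v for v in row.values()))
        rows.append({t: v / length for t, v in row.items()} if length else row)
    return rows
```

For two documents such as "good movie" and "bad movie", the shared term `movie` gets idf 1 while `good`, `bad`, and the bigrams get 1 + ln(1.5), so the rarer terms dominate each row. scikit-learn also offers `TfidfVectorizer`, documented as equivalent to `CountVectorizer` followed by `TfidfTransformer`, which would collapse the first two pipeline steps into one.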