Spaces:

Pujan-Dev
/

AI_API

Pujan-Dev committed on Jun 4, 2025

Commit

c470154

1 Parent(s): ad95b6b

push

Files changed (8)

docs/api_endpoints.md +75 -0
docs/deployment.md +105 -0
docs/functions.md +53 -0
docs/nestjs_integration.md +82 -0
docs/security.md +9 -0
docs/setup.md +23 -0
docs/structure.md +54 -0
readme.md +11 -320

docs/api_endpoints.md ADDED

	@@ -0,0 +1,75 @@

+# 🧩 API Endpoints
+### English (GPT-2) - `/text/`
+| Endpoint                         | Method | Description                               |
+| --------------------------------- | ------ | ----------------------------------------- |
+| `/text/analyse`                  | POST   | Classify raw English text                 |
+| `/text/analyse-sentences`        | POST   | Sentence-by-sentence breakdown            |
+| `/text/analyse-sentance-file`    | POST   | Upload file, per-sentence breakdown       |
+| `/text/upload`                   | POST   | Upload file for overall classification    |
+| `/text/health`                   | GET    | Health check                             |
+#### Example: Classify English text
+```bash
+curl -X POST http://localhost:8000/text/analyse \
+  -H "Authorization: Bearer <SECRET_TOKEN>" \
+  -H "Content-Type: application/json" \
+  -d '{"text": "This is a sample text for analysis."}'
+```
+**Response:**
+```json
+{
+  "result": "AI-generated",
+  "perplexity": 55.67,
+  "ai_likelihood": 66.6
+}
+```
+#### Example: File upload
+```bash
+curl -X POST http://localhost:8000/text/upload \
+  -H "Authorization: Bearer <SECRET_TOKEN>" \
+  -F 'file=@yourfile.txt;type=text/plain'
+```
+---
+### Nepali (SentencePiece) - `/NP/`
+| Endpoint                         | Method | Description                               |
+| --------------------------------- | ------ | ----------------------------------------- |
+| `/NP/analyse`                    | POST   | Classify Nepali text                      |
+| `/NP/analyse-sentences`          | POST   | Sentence-by-sentence breakdown            |
+| `/NP/upload`                     | POST   | Upload Nepali PDF for classification      |
+| `/NP/file-sentences-analyse`     | POST   | PDF upload, per-sentence breakdown        |
+| `/NP/health`                     | GET    | Health check                             |
+#### Example: Nepali text classification
+```bash
+curl -X POST http://localhost:8000/NP/analyse \
+  -H "Authorization: Bearer <SECRET_TOKEN>" \
+  -H "Content-Type: application/json" \
+  -d '{"text": "यो उदाहरण वाक्य हो।"}'
+```
+**Response:**
+```json
+{
+  "label": "Human",
+  "confidence": 98.6
+}
+```
+#### Example: Nepali PDF upload
+```bash
+curl -X POST http://localhost:8000/NP/upload \
+  -H "Authorization: Bearer <SECRET_TOKEN>" \
+  -F 'file=@NepaliText.pdf;type=application/pdf'
+```
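
For callers that prefer Python over `curl`, the English text example above can be sketched with the standard library alone. This is a minimal sketch, not the project's client: the helper names `build_analyse_request` and `analyse` are hypothetical, and the request shape (JSON body with a `text` field, Bearer header) is taken from the `curl` example.

```python
import json
import urllib.request


def build_analyse_request(base_url: str, token: str, text: str) -> urllib.request.Request:
    """Build the POST request for /text/analyse (hypothetical helper)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/text/analyse",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def analyse(base_url: str, token: str, text: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_analyse_request(base_url, token, text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running locally, `analyse("http://localhost:8000", "<SECRET_TOKEN>", "Some text")` would return the parsed response; swapping the path for `/NP/analyse` should cover the Nepali endpoint in the same way.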

docs/deployment.md ADDED

	@@ -0,0 +1,105 @@

+# Deployment
+This project is containerized and deployed on **Hugging Face Spaces** using a custom `Dockerfile`. This guide explains the structure of the Dockerfile and key considerations for deploying FastAPI apps on Spaces with the Docker SDK.
+---
+## 📦 Base Image
+```dockerfile
+FROM python:3.9
+```
+We use the official Python 3.9 image for compatibility and stability across most Python libraries and tools.
+---
+## 👤 Create a Non-Root User
+```dockerfile
+RUN useradd -m -u 1000 user
+USER user
+ENV PATH="/home/user/.local/bin:$PATH"
+```
+* Hugging Face Spaces **requires** that containers run as a non-root user with UID `1000`.
+* We also prepend the user's local binary path to `PATH` for Python package accessibility.
+---
+## 🗂️ Set Working Directory
+```dockerfile
+WORKDIR /app
+```
+All application files will reside under `/app` for consistency and clarity.
+---
+## 📋 Install Dependencies
+```dockerfile
+COPY --chown=user ./requirements.txt requirements.txt
+RUN pip install --no-cache-dir --upgrade -r requirements.txt
+```
+* Copies the dependency list with correct file ownership.
+* Uses `--no-cache-dir` to reduce image size.
+* Uses `--upgrade` so that packages already present in the base image are upgraded to the newest versions that `requirements.txt` allows.
+---
+## 🔡 Download Language Model (Optional)
+```dockerfile
+RUN python -m spacy download en_core_web_sm || echo "Failed to download model"
+```
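+* Downloads the small English NLP model required by spaCy.
+* Uses `|| echo ...` to prevent build failure if the download fails (optional safeguard). Note that the image then builds without the model, so any code that loads `en_core_web_sm` will fail at runtime instead.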
+---
+## 📁 Copy Project Files
+```dockerfile
+COPY --chown=user . /app
+```
+Copies the entire project source into the container, setting correct ownership for Hugging Face's user-based execution.
+---
+## 🌐 Start the FastAPI Server
+```dockerfile
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
+```
+* Launches the FastAPI app using `uvicorn`.
+* **Port 7860 is the default** that Docker-based Hugging Face Spaces expect; to use a different port, set `app_port` in the Space's README metadata.
+* `app:app` refers to the `FastAPI()` instance in `app.py`.
+---
+## ✅ Deployment Checklist
+* [x] Ensure your main file is named `app.py` or adjust `CMD` accordingly.
+* [x] All dependencies should be listed in `requirements.txt`.
+* [x] If using models like spaCy, verify they are downloaded or bundled.
+* [x] Test your Dockerfile locally with `docker build` before pushing to Hugging Face.
+---
+## 📚 References
+* Hugging Face Docs: [Spaces Docker SDK](https://huggingface.co/docs/hub/spaces-sdks-docker)
+* Uvicorn Docs: [https://www.uvicorn.org/](https://www.uvicorn.org/)
+* spaCy Models: [https://spacy.io/models](https://spacy.io/models)
+---
+Happy deploying!
+**P.S.** Try not to break stuff. 😅
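
Parts of the deployment checklist above can be checked mechanically before pushing. The following is a rough sketch under stated assumptions: `check_dockerfile` is a hypothetical helper, it only inspects the Dockerfile text with simple patterns (no real Dockerfile parsing), and the three rules it applies are the ones this guide describes (non-root user, `requirements.txt` copied, start command on port 7860).

```python
import re
from typing import List


def check_dockerfile(text: str, expected_port: int = 7860) -> List[str]:
    """Return a list of problems found in a Dockerfile's text (hypothetical helper)."""
    problems: List[str] = []
    # Drop blank lines and comments; keep instructions only.
    lines = [ln.strip() for ln in text.splitlines()
             if ln.strip() and not ln.strip().startswith("#")]

    # Rule 1: the last USER instruction must name a non-root user.
    users = [ln.split(None, 1)[1] for ln in lines if re.match(r"USER\s+\S", ln, re.I)]
    if not users or users[-1].split(":")[0] in ("root", "0"):
        problems.append("container would run as root; add a non-root USER")

    # Rule 2: requirements.txt must be copied into the image.
    if not any(re.match(r"COPY\b.*requirements\.txt", ln, re.I) for ln in lines):
        problems.append("requirements.txt is never copied into the image")

    # Rule 3: the start command must mention the expected port.
    cmds = [ln for ln in lines if re.match(r"(CMD|ENTRYPOINT)\b", ln, re.I)]
    if not cmds:
        problems.append("no CMD or ENTRYPOINT instruction found")
    elif str(expected_port) not in cmds[-1]:
        problems.append(f"start command does not reference port {expected_port}")
    return problems
```

Running `check_dockerfile(open("Dockerfile").read())` before `docker build` gives a quick sanity check; an empty list means none of the three rules was violated, not that the image is guaranteed to run on Spaces.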

docs/functions.md ADDED

	@@ -0,0 +1,53 @@

+# Major Functions Used
+## In Text Classifier (`features/text_classifier/`)
+- **`load_model()`**
+  Loads the GPT-2 model and tokenizer from the specified directory paths.
+- **`lifespan()`**
+  Manages the application lifecycle. Initializes the model at startup and handles cleanup on shutdown.
+- **`classify_text_sync()`**
+  Synchronously tokenizes input text and predicts using the GPT-2 model. Returns classification and perplexity.
+- **`classify_text()`**
+  Asynchronously runs `classify_text_sync()` in a thread pool for non-blocking text classification.
+- **`analyze_text()`**
+  **POST** endpoint: Accepts text input, classifies it using `classify_text()`, and returns the result with perplexity.
+- **`health()`**
+  **GET** endpoint: Simple health check for API liveness.
+- **`parse_docx()`, `parse_pdf()`, `parse_txt()`**
+  Utilities to extract and convert `.docx`, `.pdf`, and `.txt` file contents to plain text.
+- **`warmup()`**
+  Downloads the model repository and initializes the model/tokenizer using `load_model()`.
+- **`download_model_repo()`**
+  Downloads the model files from the designated `MODEL` folder.
+- **`get_model_tokenizer()`**
+  Checks whether the model already exists locally; downloads it if missing, otherwise loads the cached copy.
+- **`handle_file_upload()`**
+  Handles file uploads from the `/upload` route. Extracts text, classifies, and returns results.
+- **`extract_file_contents()`**
+  Extracts and returns plain text from uploaded files (PDF, DOCX, TXT).
+- **`handle_file_sentence()`**
+  Processes file uploads by analyzing each sentence (under 10,000 chars) before classification.
+- **`handle_sentence_level_analysis()`**
+  Checks/strips each sentence, then computes AI/human likelihood for each.
+- **`analyze_sentences()`**
+  Splits paragraphs into sentences, classifies each, and returns all results.
+- **`analyze_sentence_file()`**
+  Like `handle_file_sentence()`, analyzes the sentences in uploaded files.
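
The relationship between `classify_text_sync()` and `classify_text()` described above can be illustrated with a small sketch. This is not the project's code: here the synchronous function takes precomputed per-token log-probabilities instead of raw text (so no GPT-2 model is needed), the `threshold` value is hypothetical, and the rule "lower perplexity suggests AI-generated text" is a common heuristic assumed for illustration. Perplexity is computed as the exponential of the mean negative log-likelihood.

```python
import asyncio
import math
from typing import Dict, Sequence, Union


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), natural log."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def classify_text_sync(
    token_log_probs: Sequence[float],
    threshold: float = 60.0,  # hypothetical cut-off, not taken from the project
) -> Dict[str, Union[str, float]]:
    """Blocking classification step (stand-in for tokenization + model forward pass)."""
    ppl = perplexity(token_log_probs)
    label = "AI-generated" if ppl < threshold else "Human-written"
    return {"result": label, "perplexity": round(ppl, 2)}


async def classify_text(token_log_probs: Sequence[float]) -> Dict[str, Union[str, float]]:
    """Run the blocking step in a worker thread so the event loop stays responsive."""
    return await asyncio.to_thread(classify_text_sync, token_log_probs)
```

Calling `asyncio.run(classify_text([-1.0, -1.0, -1.0]))` exercises the wrapper outside a web server; inside a FastAPI route the coroutine would simply be awaited. `asyncio.to_thread` requires Python 3.9 or newer, which matches the base image used for deployment.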
+## for image_classifier

docs/nestjs_integration.md ADDED

	@@ -0,0 +1,82 @@

+# NestJS + FastAPI
+You can easily call this API from a NestJS microservice.
+**.env**
+```env
+FASTAPI_BASE_URL=http://localhost:8000
+SECRET_TOKEN=your_secret_token_here
+```
+**fastapi.service.ts**
+```typescript
+import { Injectable } from "@nestjs/common";
+import { HttpService } from "@nestjs/axios";
+import { ConfigService } from "@nestjs/config";
+import { firstValueFrom } from "rxjs";
+@Injectable()
+export class FastAPIService {
+  constructor(
+    private http: HttpService,
+    private config: ConfigService,
+  ) {}
+  async analyzeText(text: string) {
+    const url = `${this.config.get("FASTAPI_BASE_URL")}/text/analyse`;
+    const token = this.config.get("SECRET_TOKEN");
+    const response = await firstValueFrom(
+      this.http.post(
+        url,
+        { text },
+        {
+          headers: {
+            Authorization: `Bearer ${token}`,
+          },
+        },
+      ),
+    );
+    return response.data;
+  }
+}
+```
+**app.module.ts**
+```typescript
+import { Module } from "@nestjs/common";
+import { ConfigModule } from "@nestjs/config";
+import { HttpModule } from "@nestjs/axios";
+import { AppController } from "./app.controller";
+import { FastAPIService } from "./fastapi.service";
+@Module({
+  imports: [ConfigModule.forRoot(), HttpModule],
+  controllers: [AppController],
+  providers: [FastAPIService],
+})
+export class AppModule {}
+```
+**app.controller.ts**
+```typescript
+import { Body, Controller, Post, Get } from '@nestjs/common';
+import { FastAPIService } from './fastapi.service';
+@Controller()
+export class AppController {
+  constructor(private readonly fastapiService: FastAPIService) {}
+  @Post('analyze-text')
+  async callFastAPI(@Body('text') text: string) {
+    return this.fastapiService.analyzeText(text);
+  }
+  @Get()
+  getHello(): string {
+    return 'NestJS is connected to FastAPI';
+  }
+}
+```

docs/security.md ADDED

	@@ -0,0 +1,9 @@

+# Security: Bearer Token Auth
+All endpoints require authentication via Bearer token:
+- Set `SECRET_TOKEN` in `.env`
+- Add header: `Authorization: Bearer <SECRET_TOKEN>`
+Unauthorized requests receive `403 Forbidden`.
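
The check described above can be sketched framework-free. This is an assumption-laden illustration rather than the project's implementation: `authorize` is a hypothetical helper that maps an `Authorization` header value to an HTTP status code, and the constant-time comparison via `hmac.compare_digest` is a general hardening practice, not something the source confirms the project uses.

```python
import hmac
from typing import Optional


def authorize(authorization_header: Optional[str], secret_token: str) -> int:
    """Return 200 if the header carries the expected Bearer token, else 403."""
    if not authorization_header or not secret_token:
        return 403
    scheme, _, credentials = authorization_header.partition(" ")
    credentials = credentials.strip()
    if scheme.lower() != "bearer" or not credentials:
        return 403
    # Constant-time comparison avoids leaking how many characters matched.
    matches = hmac.compare_digest(credentials.encode("utf-8"), secret_token.encode("utf-8"))
    return 200 if matches else 403
```

A request with `Authorization: Bearer <SECRET_TOKEN>` passes; a missing header, a different scheme such as `Basic`, or a wrong token all map to `403`, matching the behaviour stated above.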

docs/setup.md ADDED

	@@ -0,0 +1,23 @@

+# Setup & Installation
+## 1. Clone the Repository
+```bash
+git clone https://github.com/cyberalertnepal/aiapi
+cd aiapi
+```
+## 2. Install Dependencies
+```bash
+pip install -r requirements.txt
+```
+## 3. Configure Environment
+Create a `.env` file:
+```env
+SECRET_TOKEN=your_secret_token_here
+```
+## 4. Run the API
+```bash
+uvicorn app:app --host 0.0.0.0 --port 8000
+```
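
Before starting the server it can help to confirm that step 3 was completed. The sketch below is a stand-alone check, assuming a simple `KEY=VALUE` `.env` format; `parse_env` and `require_secret_token` are hypothetical helpers, and the project itself may load the file differently (for example through a dotenv library in `config.py`).

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def require_secret_token(env: Dict[str, str]) -> str:
    """Return SECRET_TOKEN, refusing an empty value or the documentation placeholder."""
    token = env.get("SECRET_TOKEN", "")
    if not token or token == "your_secret_token_here":
        raise RuntimeError("SECRET_TOKEN is missing or still set to the placeholder")
    return token
```

For example, `require_secret_token(parse_env(open(".env").read()))` raises if the placeholder from step 3 was never replaced.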

docs/structure.md ADDED

	@@ -0,0 +1,54 @@

+## 🏗️ Project Structure
+```
+├── app.py                   # Main FastAPI app entrypoint
+├── config.py                # Configuration loader (.env, settings)
+├── features/
+│   ├── text_classifier/     # English (GPT-2) classifier
+│   │   ├── controller.py
+│   │   ├── inferencer.py
+│   │   ├── model_loader.py
+│   │   ├── preprocess.py
+│   │   └── routes.py
+│   └── nepali_text_classifier/ # Nepali (sentencepiece) classifier
+│       ├── controller.py
+│       ├── inferencer.py
+│       ├── model_loader.py
+│       ├── preprocess.py
+│       └── routes.py
+├── np_text_model/           # Nepali model artifacts (auto-downloaded)
+│   ├── classifier/
+│   │   └── sentencepiece.bpe.model
+│   └── model_95_acc.pth
+├── models/                  # English GPT-2 model/tokenizer (auto-downloaded)
+│   ├── merges.txt
+│   ├── tokenizer.json
+│   └── model_weights.pth
+├── Dockerfile               # Container build config
+├── Procfile                 # Deployment entrypoint (for PaaS)
+├── requirements.txt         # Python dependencies
+├── README.md
+├── docs/                    # Documentation
+└── .env                     # Secret token(s), environment config
+```
+### 🌟 Key Files and Their Roles
+- **`app.py`**: Entry point initializing FastAPI app and routes.
+- **`Procfile`**: Tells Railway (or similar platforms) how to run the program.
+- **`requirements.txt`**: Tracks all Python dependencies for the project.
+- **`__init__.py`**: Package initializer for the root module and submodules.
+- **`features/text_classifier/`**
+  - **`controller.py`**: Handles logic between routes and the model.
+  - **`inferencer.py`**: Runs inference and returns predictions; also contains file system utilities.
+- **`features/nepali_text_classifier/`**
+  - **`controller.py`**: Handles logic between routes and the model.
+  - **`inferencer.py`**: Runs inference and returns predictions; also contains file system utilities.
+  - **`model_loader.py`**: Loads the ML model and tokenizer.
+  - **`preprocess.py`**: Prepares input text for the model.
+  - **`routes.py`**: Defines API routes for text classification.
+- [Main](../README.md)

readme.md CHANGED

@@ -1,330 +1,21 @@
-# 🚀 FastAPI AI Text Detector
-A production-ready FastAPI application for **AI-generated vs. human-written text detection** in both **English** and **Nepali**. Models are auto-managed and endpoints are secured via Bearer token authentication.
----
-## 🏗️ Project Structure
-```
-├── app.py                   # Main FastAPI app entrypoint
-├── config.py                # Configuration loader (.env, settings)
-├── features/
-│   ├── text_classifier/     # English (GPT-2) classifier
-│   │   ├── controller.py
-│   │   ├── inferencer.py
-│   │   ├── model_loader.py
-│   │   ├── preprocess.py
-│   │   └── routes.py
-│   └── nepali_text_classifier/ # Nepali (sentencepiece) classifier
-│       ├── controller.py
-│       ├── inferencer.py
-│       ├── model_loader.py
-│       ├── preprocess.py
-│       └── routes.py
-├── np_text_model/           # Nepali model artifacts (auto-downloaded)
-│   ├── classifier/
-│   │   └── sentencepiece.bpe.model
-│   └── model_95_acc.pth
-├── models/                  # English GPT-2 model/tokenizer (auto-downloaded)
-│   ├── merges.txt
-│   ├── tokenizer.json
-│   └── model_weights.pth
-├── Dockerfile               # Container build config
-├── Procfile                 # Deployment entrypoint (for PaaS)
-├── requirements.txt         # Python dependencies
-├── README.md                # This file
-└── .env                     # Secret token(s), environment config
-```
----
-### 🌟 Key Files and Their Roles
-- **`app.py`**: Entry point initializing FastAPI app and routes.
-- **`Procfile`**: Tells Railway (or similar platforms) how to run the program.
-- **`requirements.txt`**: Tracks all Python dependencies for the project.
-- **`__init__.py`**: Package initializer for the root module and submodules.
-- **`features/text_classifier/`**
-  - **`controller.py`**: Handles logic between routes and the model.
-  - **`inferencer.py`**: Runs inference and returns predictions as well as file system
-  utilities.
-- **`features/NP/`**
-  - **`controller.py`**: Handles logic between routes and the model.
-  - **`inferencer.py`**: Runs inference and returns predictions as well as file system
-  utilities.
-  - **`model_loader.py`**: Loads the ML model and tokenizer.
-  - **`preprocess.py`**: Prepares input text for the model.
-  - **`routes.py`**: Defines API routes for text classification.
----
-## ⚙️ Setup & Installation
-1. **Clone the repository**
-   ```bash
-   git clone https://github.com/cyberalertnepal/aiapi
-   cd aiapi
-   ```
-2. **Install dependencies**
-   ```bash
-   pip install -r requirements.txt
-   ```
-3. **Configure secrets**
-   - Create a `.env` file at the project root:
-     ```env
-     SECRET_TOKEN=your_secret_token_here
-     ```
-   - **All endpoints require `Authorization: Bearer <SECRET_TOKEN>`**
----
-## 🚦 Running the API Server
 ```bash
 uvicorn app:app --host 0.0.0.0 --port 8000
 ```
----
-## 🔒 Security: Bearer Token Auth
-All endpoints require authentication via Bearer token:
-- Set `SECRET_TOKEN` in `.env`
-- Add header: `Authorization: Bearer <SECRET_TOKEN>`
-Unauthorized requests receive `403 Forbidden`.
----
-## 🧩 API Endpoints
-### English (GPT-2) - `/text/`
-| Endpoint                         | Method | Description                               |
-| --------------------------------- | ------ | ----------------------------------------- |
-| `/text/analyse`                  | POST   | Classify raw English text                 |
-| `/text/analyse-sentences`        | POST   | Sentence-by-sentence breakdown            |
-| `/text/analyse-sentance-file`    | POST   | Upload file, per-sentence breakdown       |
-| `/text/upload`                   | POST   | Upload file for overall classification    |
-| `/text/health`                   | GET    | Health check                             |
-#### Example: Classify English text
-```bash
-curl -X POST http://localhost:8000/text/analyse \
-  -H "Authorization: Bearer <SECRET_TOKEN>" \
-  -H "Content-Type: application/json" \
-  -d '{"text": "This is a sample text for analysis."}'
-```
-**Response:**
-```json
-{
-  "result": "AI-generated",
-  "perplexity": 55.67,
-  "ai_likelihood": 66.6
-}
-```
-#### Example: File upload
-```bash
-curl -X POST http://localhost:8000/text/upload \
-  -H "Authorization: Bearer <SECRET_TOKEN>" \
-  -F 'file=@yourfile.txt;type=text/plain'
-```
----
-### Nepali (SentencePiece) - `/NP/`
-| Endpoint                         | Method | Description                               |
-| --------------------------------- | ------ | ----------------------------------------- |
-| `/NP/analyse`                    | POST   | Classify Nepali text                      |
-| `/NP/analyse-sentences`          | POST   | Sentence-by-sentence breakdown            |
-| `/NP/upload`                     | POST   | Upload Nepali PDF for classification      |
-| `/NP/file-sentences-analyse`     | POST   | PDF upload, per-sentence breakdown        |
-| `/NP/health`                     | GET    | Health check                             |
-#### Example: Nepali text classification
-```bash
-curl -X POST http://localhost:8000/NP/analyse \
-  -H "Authorization: Bearer <SECRET_TOKEN>" \
-  -H "Content-Type: application/json" \
-  -d '{"text": "यो उदाहरण वाक्य हो।"}'
-```
-**Response:**
-```json
-{
-  "label": "Human",
-  "confidence": 98.6
-}
-```
-#### Example: Nepali PDF upload
-```bash
-curl -X POST http://localhost:8000/NP/upload \
-  -H "Authorization: Bearer <SECRET_TOKEN>" \
-  -F 'file=@NepaliText.pdf;type=application/pdf'
-```
----
-## 📝 API Docs
-- **Swagger UI:** [http://localhost:8000/docs](http://localhost:8000/docs)
-- **ReDoc:** [http://localhost:8000/redoc](http://localhost:8000/redoc)
----
-## 🧪 Example: Integration with NestJS
-You can easily call this API from a NestJS microservice.
-**.env**
-```env
-FASTAPI_BASE_URL=http://localhost:8000
-SECRET_TOKEN=your_secret_token_here
-```
-**fastapi.service.ts**
-```typescript
-import { Injectable } from "@nestjs/common";
-import { HttpService } from "@nestjs/axios";
-import { ConfigService } from "@nestjs/config";
-import { firstValueFrom } from "rxjs";
-@Injectable()
-export class FastAPIService {
-  constructor(
-    private http: HttpService,
-    private config: ConfigService,
-  ) {}
-  async analyzeText(text: string) {
-    const url = `${this.config.get("FASTAPI_BASE_URL")}/text/analyse`;
-    const token = this.config.get("SECRET_TOKEN");
-    const response = await firstValueFrom(
-      this.http.post(
-        url,
-        { text },
-        {
-          headers: {
-            Authorization: `Bearer ${token}`,
-          },
-        },
-      ),
-    );
-    return response.data;
-  }
-}
-```
-**app.module.ts**
-```typescript
-import { Module } from "@nestjs/common";
-import { ConfigModule } from "@nestjs/config";
-import { HttpModule } from "@nestjs/axios";
-import { AppController } from "./app.controller";
-import { FastAPIService } from "./fastapi.service";
-@Module({
-  imports: [ConfigModule.forRoot(), HttpModule],
-  controllers: [AppController],
-  providers: [FastAPIService],
-})
-export class AppModule {}
-```
-**app.controller.ts**
-```typescript
-import { Body, Controller, Post, Get } from '@nestjs/common';
-import { FastAPIService } from './fastapi.service';
-@Controller()
-export class AppController {
-  constructor(private readonly fastapiService: FastAPIService) {}
-  @Post('analyze-text')
-  async callFastAPI(@Body('text') text: string) {
-    return this.fastapiService.analyzeText(text);
-  }
-  @Get()
-  getHello(): string {
-    return 'NestJS is connected to FastAPI';
-  }
-}
-```
----
-## 🧠 Main Functions in Text Classifier (`features/text_classifier/` and `features/text_classifier/`)
-- **`load_model()`**
-  Loads the GPT-2 model and tokenizer from the specified directory paths.
-- **`lifespan()`**
-  Manages the application lifecycle. Initializes the model at startup and handles cleanup on shutdown.
-- **`classify_text_sync()`**
-  Synchronously tokenizes input text and predicts using the GPT-2 model. Returns classification and perplexity.
-- **`classify_text()`**
-  Asynchronously runs `classify_text_sync()` in a thread pool for non-blocking text classification.
-- **`analyze_text()`**
-  **POST** endpoint: Accepts text input, classifies it using `classify_text()`, and returns the result with perplexity.
-- **`health()`**
-  **GET** endpoint: Simple health check for API liveness.
-- **`parse_docx()`, `parse_pdf()`, `parse_txt()`**
-  Utilities to extract and convert `.docx`, `.pdf`, and `.txt` file contents to plain text.
-- **`warmup()`**
-  Downloads the model repository and initializes the model/tokenizer using `load_model()`.
-- **`download_model_repo()`**
-  Downloads the model files from the designated `MODEL` folder.
-- **`get_model_tokenizer()`**
-  Checks if the model already exists; if not, downloads it—otherwise, loads the cached model.
-- **`handle_file_upload()`**
-  Handles file uploads from the `/upload` route. Extracts text, classifies, and returns results.
-- **`extract_file_contents()`**
-  Extracts and returns plain text from uploaded files (PDF, DOCX, TXT).
-- **`handle_file_sentence()`**
-  Processes file uploads by analyzing each sentence (under 10,000 chars) before classification.
-- **`handle_sentence_level_analysis()`**
-  Checks/strips each sentence, then computes AI/human likelihood for each.
-- **`analyze_sentences()`**
-  Splits paragraphs into sentences, classifies each, and returns all results.
-- **`analyze_sentence_file()`**
-  Like `handle_file_sentence()`—analyzes sentences in uploaded files.
----
 ## 🚀 Deployment
 - **Local**: Use `uvicorn` as above.

+# 🚀 FastAPI AI Detector
+A production-ready FastAPI app for detecting AI-generated vs. human-written text in English and Nepali. It uses GPT-2 and SentencePiece-based models, and all endpoints are secured with Bearer token authentication.
+## 📂 Documentation
+- [Project Structure](docs/structure.md)
+- [API Endpoints](docs/api_endpoints.md)
+- [Setup & Installation](docs/setup.md)
+- [Deployment](docs/deployment.md)
+- [Security](docs/security.md)
+- [NestJS Integration](docs/nestjs_integration.md)
+- [Core Functions](docs/functions.md)
+## ⚡ Quick Start
 ```bash
 uvicorn app:app --host 0.0.0.0 --port 8000
 ```
 ## 🚀 Deployment
 - **Local**: Use `uvicorn` as above.