Badal committed
Commit fce7147 · 0 Parent(s)

Upload code

Files changed (4)
  1. Dockerfile +20 -0
  2. README.md +86 -0
  3. app.py +167 -0
  4. requirements.txt +7 -0
Dockerfile ADDED
@@ -0,0 +1,20 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # Install Tesseract and its language packs
+ RUN apt-get update && apt-get install -y \
+     tesseract-ocr \
+     tesseract-ocr-hin \
+     tesseract-ocr-tel \
+     libgl1 \
+     && rm -rf /var/lib/apt/lists/*
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md ADDED
@@ -0,0 +1,86 @@
+ ---
+ license: mit
+ title: 'Still frame '
+ sdk: docker
+ emoji: 🚀
+ colorFrom: green
+ colorTo: red
+ pinned: false
+ thumbnail: >-
+   https://cdn-uploads.huggingface.co/production/uploads/683d3312c1707119d087fc4d/DDo9ikZZQM1L9k5UC1ePR.jpeg
+ short_description: 'Screenshot for picture '
+ ---
+ title: TMDB OCR Pro API emoji: 🎬 colorFrom: blue colorTo: indigo sdk: docker app_file: app.py pinned: false
+ 🎬 TMDB + OCR Pro API (OptiPix Engine)
+ Created by: Badal 🚀
+ This is a parallel-processing API that fetches text-free movie screenshots, plus the movie poster, for a given IMDb ID. It connects IMDb IDs, TMDb image data, and the OptiPix Image Compression Engine.
+ ✨ Key Features
+ * Smart Text-Filter: Keeps only TMDb backdrops whose language tag (iso_639_1) is null, a quick way to drop fan-made posters and title cards before any OCR runs.
+ * Hardcore OCR Scanner: Runs Tesseract OCR (English, Hindi, Telugu) on each remaining backdrop and rejects any image with more than 4 recognized characters.
+ * Parallel Optimization: Uses asyncio.gather to send multiple images to the OptiPix compression server concurrently, which shortens response times.
+ * ISP Bypass (India Ready): Generates optimized URLs via a custom CDN, so images stay reachable where Indian ISPs block TMDb image servers.
+ * Dual URL Output: Returns both the original TMDb HD URL and the OptiPix compressed URL for every image.
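The two filtering stages listed above can be sketched in isolation. This is a minimal illustration rather than the service code itself: the sample data and the helper names `untagged_backdrops` and `looks_like_text` are hypothetical, while the `iso_639_1` check, the width sort, the character classes, and the 4-character threshold follow app.py in this commit.

```python
import re

# Hypothetical sample shaped like the "backdrops" list in TMDb's images response.
SAMPLE_BACKDROPS = [
    {"file_path": "/a.jpg", "iso_639_1": None, "width": 1920},
    {"file_path": "/b.jpg", "iso_639_1": "en", "width": 3840},
    {"file_path": "/c.jpg", "iso_639_1": None, "width": 3840},
]

# Same character classes as app.py: Latin letters, digits, Devanagari, Telugu.
NON_TEXT = re.compile(r"[^a-zA-Z0-9\u0900-\u097F\u0C00-\u0C7F]")


def untagged_backdrops(backdrops):
    """Keep backdrops that carry no language tag, widest first."""
    kept = [b for b in backdrops if b.get("iso_639_1") is None]
    return sorted(kept, key=lambda b: b["width"], reverse=True)


def looks_like_text(ocr_output, max_chars=4):
    """Return True when the OCR output holds more than max_chars letters or digits."""
    return len(NON_TEXT.sub("", ocr_output)) > max_chars


candidates = untagged_backdrops(SAMPLE_BACKDROPS)
print([b["file_path"] for b in candidates])
print(looks_like_text("~ . |"), looks_like_text("COMING SOON"))
```

Running the language-tag filter first is cheap because it needs no image download; OCR then only has to look at the backdrops that survive it.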
+ 📡 API Reference
+ Endpoint
+ POST /get-media
+ Request Format
+ Content-Type: multipart/form-data (application/x-www-form-urlencoded is also accepted)
+ Parameters
+ | Parameter | Type | Required | Default | Description |
+ |---|---|---|---|---|
+ | title_id | string | Yes | - | The IMDb title ID of the movie (e.g., tt3801314). |
+ | top_shots | integer | No | 3 | Maximum number of clean screenshots to return. |
+ | level | string | No | extreme | Compression level for OptiPix (none, medium, extreme). |
+ 💻 How to Make a Request
+ Example 1: cURL (Terminal)
+ curl -X POST "https://YOUR_SPACE_NAME.hf.space/get-media" \
+   -H "accept: application/json" \
+   -H "Content-Type: application/x-www-form-urlencoded" \
+   -d "title_id=tt3801314&top_shots=3&level=extreme"
+
+ Example 2: JavaScript (Frontend)
+ const formData = new FormData();
+ formData.append("title_id", "tt3801314");
+ formData.append("top_shots", 3);
+ formData.append("level", "extreme");
+
+ fetch("https://YOUR_SPACE_NAME.hf.space/get-media", {
+   method: "POST",
+   body: formData
+ })
+   .then(response => response.json())
+   .then(data => console.log(data));
+
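A Python equivalent of the two examples above might look like the following. It is a minimal sketch that uses only the standard library; the helper names `build_request` and `get_media` are hypothetical, and `YOUR_SPACE_NAME` is the same placeholder used above. Sending bytes as the request body makes `urllib` default to a form-encoded Content-Type, which matches the cURL example.

```python
import json
import urllib.parse
import urllib.request

# Placeholder host, exactly as in the examples above; replace with your Space.
API_URL = "https://YOUR_SPACE_NAME.hf.space/get-media"


def build_request(title_id, top_shots=3, level="extreme"):
    """Build (but do not send) a form-encoded POST for /get-media."""
    body = urllib.parse.urlencode(
        {"title_id": title_id, "top_shots": top_shots, "level": level}
    ).encode("ascii")
    return urllib.request.Request(API_URL, data=body, method="POST")


def get_media(title_id, top_shots=3, level="extreme"):
    """Send the request and decode the JSON response."""
    request = build_request(title_id, top_shots, level)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)


# Inspect the request without touching the network.
request = build_request("tt3801314")
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Calling `get_media("tt3801314")` performs the actual request once the placeholder host has been replaced.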
+ 📦 Expected JSON Response
+ The API returns a JSON object containing the poster and an array of screenshots.
+ {
+   "title_id": "tt3801314",
+   "tmdb_id": 293313,
+   "requested_shots": 3,
+   "total_screenshots_scanned": 15,
+   "poster": {
+     "original_url": "https://image.tmdb.org/t/p/original/mxyz123.jpg",
+     "processed_url": "https://bk939448-image-optimizer-api.hf.space/optimized_poster.jpg"
+   },
+   "screenshots": [
+     {
+       "original_url": "https://image.tmdb.org/t/p/original/abc1.jpg",
+       "processed_url": "https://bk939448-image-optimizer-api.hf.space/shot1.jpg"
+     },
+     {
+       "original_url": "https://image.tmdb.org/t/p/original/abc2.jpg",
+       "processed_url": "https://bk939448-image-optimizer-api.hf.space/shot2.jpg"
+     },
+     {
+       "original_url": "https://image.tmdb.org/t/p/original/abc3.jpg",
+       "processed_url": "https://bk939448-image-optimizer-api.hf.space/shot3.jpg"
+     }
+   ]
+ }
+
+ Note: If OptiPix fails to compress an image, its processed_url is returned as null. You can always fall back to the original_url.
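The fallback described in the note can be handled on the client side in a few lines. This is a minimal sketch; `best_url` and `collect_urls` are hypothetical helper names, and the sample dict is an invented response in which OptiPix failed for the only screenshot.

```python
def best_url(image):
    """Prefer the compressed URL; fall back to the original when it is null."""
    if image is None:
        return None
    return image.get("processed_url") or image["original_url"]


def collect_urls(response):
    """Return (poster_url, [screenshot_urls]) from a /get-media response dict."""
    poster = best_url(response.get("poster"))
    shots = [best_url(shot) for shot in response.get("screenshots", [])]
    return poster, shots


# Hypothetical response: the poster was compressed, the screenshot was not.
sample = {
    "poster": {
        "original_url": "https://image.tmdb.org/t/p/original/mxyz123.jpg",
        "processed_url": "https://bk939448-image-optimizer-api.hf.space/optimized_poster.jpg",
    },
    "screenshots": [
        {
            "original_url": "https://image.tmdb.org/t/p/original/abc1.jpg",
            "processed_url": None,
        },
    ],
}

poster_url, shot_urls = collect_urls(sample)
print(poster_url)
print(shot_urls)
```

The `poster` field is optional in the response model, so the sketch also tolerates a null poster.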
+ ⚙️ Deployment Requirements
+ If you are hosting this yourself, ensure the following setup:
+ * Dockerfile: Must install tesseract-ocr, tesseract-ocr-hin, and tesseract-ocr-tel via apt-get.
+ * Environment Variables: You MUST set your TMDb API key in the server secrets.
+   * TMDB_API_KEY = your_tmdb_api_key_here
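One way to catch a missing secret early is to validate it at startup instead of on the first request. The following is a minimal sketch under that assumption; `require_env` is a hypothetical helper, not part of app.py, which instead returns HTTP 500 from the endpoint when TMDB_API_KEY is unset.

```python
import os


def require_env(name, environ=os.environ):
    """Return the named variable, or raise if it is missing or blank."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to the server secrets")
    return value


# Demonstration with an explicit mapping so the sketch runs anywhere.
print(require_env("TMDB_API_KEY", {"TMDB_API_KEY": "your_tmdb_api_key_here"}))
```

In a real deployment the call would be `require_env("TMDB_API_KEY")`, which reads from the process environment.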
app.py ADDED
@@ -0,0 +1,167 @@
+ import os
+ import httpx
+ from fastapi import FastAPI, Form, HTTPException
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel
+ from typing import Optional, List
+ import asyncio
+ import uvicorn
+ import pytesseract
+ from PIL import Image
+ import io
+ import re
+
+ # Linux path to the Tesseract binary
+ pytesseract.pytesseract.tesseract_cmd = '/usr/bin/tesseract'
+
+ app = FastAPI(title="TMDB + OCR Pro API | Badal Special")
+
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ OPTIPIX_API = "https://jerecom-image-optimizer-api-2.hf.space/upload-poster"
+ TMDB_API_KEY = os.getenv("TMDB_API_KEY")
+
+ class ImageMedia(BaseModel):
+     original_url: str
+     processed_url: Optional[str]
+
+ class ProcessResponse(BaseModel):
+     title_id: str
+     tmdb_id: int
+     requested_shots: int
+     total_screenshots_scanned: int
+     poster: Optional[ImageMedia]
+     screenshots: List[ImageMedia]
+
+ # --- 1. OCR Scanner Function ---
+ def check_text_in_image(image_bytes: bytes) -> bool:
+     try:
+         img = Image.open(io.BytesIO(image_bytes))
+         img.thumbnail((500, 500))  # Shrink for faster scanning
+         img = img.convert('L')  # Convert to grayscale
+
+         # Scan for English, Hindi, and Telugu text
+         text = pytesseract.image_to_string(img, lang='eng+hin+tel')
+
+         # Keep only letters and digits (Latin, Devanagari, Telugu)
+         clean_text = re.sub(r'[^a-zA-Z0-9\u0900-\u097F\u0C00-\u0C7F]', '', text)
+
+         # More than 4 characters of text means this is a poster, not a screenshot (True)
+         return len(clean_text) > 4
+     except Exception as e:
+         print(f"OCR Parsing Error: {e}")
+         return True  # Take no risks: reject the image
+
+ # --- 2. Parallel OptiPix Function ---
+ async def optimize_image(client: httpx.AsyncClient, raw_url: str, level: str):
+     form_data = {"level": level, "url": raw_url}
+     result = {"original_url": raw_url, "processed_url": None}
+     try:
+         res = await client.post(OPTIPIX_API, data=form_data, timeout=30.0)
+         data = res.json()
+         if data.get("success"):
+             result["processed_url"] = data.get("url")
+     except Exception as e:
+         print(f"OptiPix failed for {raw_url} - Error: {e}")
+     return result
+
+ @app.post("/get-media", response_model=ProcessResponse)
+ async def get_media(
+     title_id: str = Form(..., description="IMDb Title ID (e.g., tt3801314)"),
+     top_shots: int = Form(3, description="Number of screenshots required"),
+     level: str = Form("extreme", description="Compression level")
+ ):
+     if not TMDB_API_KEY:
+         raise HTTPException(status_code=500, detail="TMDB_API_KEY is missing!")
+
+     async with httpx.AsyncClient(timeout=120.0) as client:
+         # --- STEP 1: Look up the TMDb ID ---
+         find_url = f"https://api.themoviedb.org/3/find/{title_id}?external_source=imdb_id&api_key={TMDB_API_KEY}"
+         find_res = await client.get(find_url)
+         find_data = find_res.json()
+
+         movie_results = find_data.get("movie_results", [])
+         if not movie_results:
+             # A plain {"error": ...} dict would fail response_model validation, so raise instead
+             raise HTTPException(status_code=404, detail="No movie found on TMDb for this IMDb ID!")
+
+         tmdb_id = movie_results[0]["id"]
+
+         # --- STEP 2: Fetch images from TMDb ---
+         images_url = f"https://api.themoviedb.org/3/movie/{tmdb_id}/images?api_key={TMDB_API_KEY}"
+         img_res = await client.get(images_url)
+         img_data = img_res.json()
+
+         raw_backdrops = img_data.get("backdrops", [])
+         raw_posters = img_data.get("posters", [])
+
+         # 🔥 SMART HACK: keep only the backdrops whose language is null
+         clean_backdrops = [shot for shot in raw_backdrops if shot.get("iso_639_1") is None]
+         clean_backdrops.sort(key=lambda x: x["width"], reverse=True)
+
+         # --- STEP 3: Pick the poster ---
+         best_poster_url = None
+         if raw_posters:
+             raw_posters.sort(key=lambda x: x["width"], reverse=True)
+             best_poster_url = f"https://image.tmdb.org/t/p/original{raw_posters[0]['file_path']}"
+
+         # --- STEP 4: HARDCORE OCR SCANNING ---
+         verified_screenshots_urls = []
+         for shot in clean_backdrops:
+             if len(verified_screenshots_urls) >= top_shots:
+                 break  # Enough screenshots collected, stop
+
+             shot_url = f"https://image.tmdb.org/t/p/original{shot['file_path']}"
+
+             try:
+                 # Download the image and hand it to OCR
+                 img_res_dl = await client.get(shot_url, timeout=10.0)
+                 if img_res_dl.status_code == 200:
+                     # Run OCR in a worker thread so the server does not hang
+                     has_text = await asyncio.to_thread(check_text_in_image, img_res_dl.content)
+
+                     if not has_text:  # No text found, so it passes
+                         verified_screenshots_urls.append(shot_url)
+                         print(f"Clean Screenshot Passed OCR: {shot_url}")
+                     else:
+                         print(f"Rejected by OCR (Text Found): {shot_url}")
+             except Exception as e:
+                 print(f"Image download error for OCR: {e}")
+                 continue
+
+         # --- STEP 5: Parallel optimization (OptiPix) ---
+         tasks = []
+         if best_poster_url:
+             tasks.append(optimize_image(client, best_poster_url, level))
+
+         for url in verified_screenshots_urls:
+             tasks.append(optimize_image(client, url, level))
+
+         results = await asyncio.gather(*tasks)
+
+         final_poster = None
+         final_screenshots = []
+
+         if best_poster_url and results:
+             final_poster = results[0]
+             final_screenshots = results[1:]
+         else:
+             final_screenshots = results
+
+         return ProcessResponse(
+             title_id=title_id,
+             tmdb_id=tmdb_id,
+             requested_shots=top_shots,
+             total_screenshots_scanned=len(clean_backdrops),
+             poster=final_poster,
+             screenshots=final_screenshots
+         )
+
+ if __name__ == "__main__":
+     uvicorn.run("app:app", host="0.0.0.0", port=7860)
+
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ fastapi
+ uvicorn
+ httpx
+ pydantic
+ python-multipart
+ Pillow
+ pytesseract