Proff12 committed on
Commit 790fb60 · 0 Parent(s):

Squashed commit: keep current project state

Files changed (2)
  1. Dockerfile +217 -0
  2. README.md +104 -0
Dockerfile ADDED
@@ -0,0 +1,217 @@
# syntax=docker/dockerfile:1.4
FROM python:3.10-slim AS source

ARG HF_API_TOKEN
ARG SRC_URL

# Ensure git and certificates are available for cloning
RUN apt-get update && apt-get install -y --no-install-recommends \
    git ca-certificates && rm -rf /var/lib/apt/lists/*

# Clone the repository once in its own stage; the files end up in /repo.
# Use a shallow clone to reduce time and bandwidth and make caching more stable.
# This RUN reads a BuildKit secret at /run/secrets/HF_API_TOKEN (or HF_TOKEN) and
# falls back to the HF_API_TOKEN/HF_TOKEN build-arg environment variables if present.
# It fails early with a clear message when no token is provided.
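# Example build invocation (illustrative; the secret file paths are assumptions,
# not project conventions):
#   DOCKER_BUILDKIT=1 docker build \
#     --secret id=HF_API_TOKEN,src=$HOME/.hf_token \
#     --secret id=SRC_URL,src=$HOME/.src_url \
#     -t fathom-deploy .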
RUN --mount=type=secret,id=HF_API_TOKEN,required=false --mount=type=secret,id=SRC_URL,required=false sh -c '\
    if [ -f /run/secrets/HF_API_TOKEN ]; then TOKEN=$(cat /run/secrets/HF_API_TOKEN); \
    elif [ -f /run/secrets/HF_TOKEN ]; then TOKEN=$(cat /run/secrets/HF_TOKEN); \
    elif [ -n "$HF_API_TOKEN" ]; then TOKEN=$HF_API_TOKEN; \
    elif [ -n "$HF_TOKEN" ]; then TOKEN=$HF_TOKEN; \
    else echo "ERROR: HF token not provided (set BuildKit secret HF_API_TOKEN/HF_TOKEN or HF_API_TOKEN/HF_TOKEN env)"; exit 1; fi && \
    # Attempt to clone directly into /repo. If the remote creates a single top-level
    # directory, detect that and move its contents into /repo so /repo/frontend exists.
    mkdir -p /repo && \
    # Determine source URL: secret at /run/secrets/SRC_URL > ARG SRC_URL
    if [ -f /run/secrets/SRC_URL ]; then SRC=$(cat /run/secrets/SRC_URL); \
    elif [ -n "$SRC_URL" ]; then SRC=$SRC_URL; \
    else echo "ERROR: SRC_URL not provided (set BuildKit secret SRC_URL or build-arg SRC_URL)"; exit 1; fi && \
    echo "Cloning from $SRC" && \
    # Normalize SRC: remove leading http(s):// if present, then insert token credentials
    if echo "$SRC" | grep -qE '^https?://'; then \
        NO_SCHEME=$(echo "$SRC" | sed -E 's#^https?://##'); \
    else \
        NO_SCHEME="$SRC"; \
    fi && \
    CLONE_URL="https://__token__:$TOKEN@$NO_SCHEME" && \
    git clone --depth 1 "$CLONE_URL" /repo_tmp && \
    echo "--- Debug: listing /repo_tmp (show hidden and nested) ---" && \
    ls -la /repo_tmp || true && \
    # If /repo_tmp contains exactly one directory and no other files, move its contents up
    set -- /repo_tmp/*; count=$#; if [ $count -eq 1 ] && [ -d "$1" ]; then \
        echo "--- Single top-level dir detected: moving its contents into /repo ---" && \
        mv "$1"/* "$1"/.??* /repo/ 2>/dev/null || true; \
    else \
        echo "--- Multiple entries detected: moving all into /repo ---" && \
        mv /repo_tmp/* /repo/ 2>/dev/null || true; \
        mv /repo_tmp/.[!.]* /repo/ 2>/dev/null || true; \
    fi && \
    rm -rf /repo_tmp/.git && rm -rf /repo_tmp'

# Verify the clone succeeded and /repo contains files; fail early with a helpful message
RUN [ -d /repo ] && [ "$(ls -A /repo | wc -c)" -gt 0 ] || (echo "ERROR: clone failed or /repo is empty" && exit 1)

# --- Stage 1: Build React frontend ---
FROM node:20-alpine AS frontend

WORKDIR /app/frontend

# Install dependencies (manifest and lockfile copied from the cloned source stage)
COPY --from=source /repo/frontend/package*.json ./
# npm has no --frozen-lockfile flag; npm ci gives the equivalent reproducible install
RUN npm ci

# Build frontend (source files copied from the cloned source stage)
COPY --from=source /repo/frontend/ ./
RUN npm run build

# --- Stage 2: Python backend (CPU only) ---
FROM python:3.10-slim AS backend

# Environment setup
ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    HF_HOME=/app/.cache/huggingface

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    git curl && \
    rm -rf /var/lib/apt/lists/*

# Create non-root user
RUN useradd -m appuser

# Create necessary directories and set permissions
RUN mkdir -p /app/.cache/huggingface \
    && mkdir -p /app/static \
    && chown -R appuser:appuser /app

# Switch to non-root user
USER appuser

WORKDIR /app

# Upgrade pip and install Python dependencies
COPY --from=source /repo/backend/requirements.txt /app/backend/requirements.txt
RUN python3 -m pip install --upgrade pip && \
    python3 -m pip install -r /app/backend/requirements.txt

# Copy backend code
COPY --from=source /repo/backend/ /app/backend/

# Fathom-Search-4B files are now part of the backend app directory

# Copy frontend build to static directory
COPY --from=frontend /app/frontend/out/ /app/static/

# App-specific environment variables
ENV STATIC_DIR=/app/static \
    MODEL_ID=FractalAIResearch/Fathom-R1-14B \
    PIPELINE_TASK=text-generation \
    QUANTIZE=auto \
    PORT_SERPER_HOST=2221 \
    HOST_SERPER_URL=http://0.0.0.0:2221 \
    SERPER_URL=http://0.0.0.0:2221 \
    PYTHONPATH=/app/backend/app:/app/backend \
    MAX_OUTBOUND=256 \
    JINA_CACHE_DIR=/app/.cache/jina_cache \
    SERPER_CACHE_DIR=/app/.cache/serper_cache \
    BOXED_WRAP_WIDTH=130 \
    CRAWL4AI_EP=http://localhost:8080 \
    CURL_CA_BUNDLE="" \
    REQUESTS_CA_BUNDLE="" \
    SSL_VERIFY=false

# Create cache directories
RUN mkdir -p /app/.cache/jina_cache /app/.cache/serper_cache && \
    chown -R appuser:appuser /app/.cache

# Optional: healthcheck - verify both the backend and the Serper service respond
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
    CMD curl -f http://localhost:7860/docs && curl -f http://localhost:2221/health || exit 1

EXPOSE 7860 2221

# Create startup script with proper service management
RUN echo '#!/bin/bash\n\
set -e\n\
\n\
# Cleanup function\n\
cleanup() {\n\
    echo "🛑 Shutting down services..."\n\
    if [ ! -z "$SERPER_PID" ] && kill -0 $SERPER_PID 2>/dev/null; then\n\
        kill $SERPER_PID\n\
        echo "✅ Serper service stopped"\n\
    fi\n\
    if [ ! -z "$BACKEND_PID" ] && kill -0 $BACKEND_PID 2>/dev/null; then\n\
        kill $BACKEND_PID\n\
        echo "✅ Backend service stopped"\n\
    fi\n\
    exit 0\n\
}\n\
\n\
# Set up signal handlers\n\
trap cleanup SIGTERM SIGINT\n\
\n\
echo "🚀 Starting FathomPlayground on Hugging Face Spaces"\n\
echo "✅ Environment variables configured:"\n\
echo " HF_MODEL_URL: configured"\n\
echo " HOST_SERPER_URL: configured"\n\
echo " PORT_SERPER_HOST: configured"\n\
echo " HF_API_TOKEN: SET"\n\
echo " SERPER_API_KEY: SET"\n\
echo " OPENAI_API_KEY: SET"\n\
echo " HF_TOKEN: SET"\n\
echo " SUMMARY_HF_MODEL_URL: configured"\n\
echo " CRAWL4AI_EP: configured"\n\
echo " JINA_API_KEY: SET"\n\
echo " JINA_CACHE_DIR: configured"\n\
echo " SERPER_CACHE_DIR: configured"\n\
\n\
echo "🔍 Starting Serper Host Server..."\n\
cd /app/backend/app\n\
python3 -m web_agents_5.sandbox_serper --port 2221 --workers 1 &\n\
SERPER_PID=$!\n\
echo "✅ Serper service started"\n\
\n\
# Wait for Serper service to be ready\n\
echo "⏳ Waiting for Serper service to be ready..."\n\
for i in {1..30}; do\n\
    if curl -s http://localhost:2221/health > /dev/null 2>&1; then\n\
        echo "✅ Serper service is ready"\n\
        break\n\
    fi\n\
    if [ $i -eq 30 ]; then\n\
        echo "❌ Serper service failed to start within 30 seconds"\n\
        cleanup\n\
        exit 1\n\
    fi\n\
    sleep 1\n\
done\n\
\n\
echo "🚀 Starting Backend Service..."\n\
python3 -m uvicorn main:app --host 0.0.0.0 --port 7860 &\n\
BACKEND_PID=$!\n\
echo "✅ Backend service started on port 7860 (PID: $BACKEND_PID)"\n\
\n\
# Monitor both services\n\
while true; do\n\
    if ! kill -0 $SERPER_PID 2>/dev/null; then\n\
        echo "❌ Serper service died, restarting..."\n\
        python3 -m web_agents_5.sandbox_serper --port 2221 --workers 1 &\n\
        SERPER_PID=$!\n\
        echo "✅ Serper service restarted (PID: $SERPER_PID)"\n\
    fi\n\
    if ! kill -0 $BACKEND_PID 2>/dev/null; then\n\
        echo "❌ Backend service died, exiting..."\n\
        cleanup\n\
        exit 1\n\
    fi\n\
    sleep 5\n\
done' > /app/start.sh && \
    chmod +x /app/start.sh && \
    chown appuser:appuser /app/start.sh

ENTRYPOINT ["/app/start.sh"]
README.md ADDED
@@ -0,0 +1,104 @@
---
title: FathomDeepResearch
emoji: 🧮
colorFrom: blue
colorTo: red
sdk: docker
app_port: 7860
pinned: false
license: apache-2.0
short_description: Advanced research AI with web search capabilities
---

# 🔬 FathomDeepResearch

Advanced AI research agent powered by the Fathom-Search-4B and Fathom-Synthesizer-4B models. This app provides deep research capabilities with real-time web search and intelligent synthesis.

## 🚀 Features

- **🧠 Advanced Reasoning**: Powered by Fathom-R1-14B for sophisticated thinking
- **🔍 Real-time Web Search**: Integrated search across multiple sources
- **📊 Intelligent Synthesis**: Combines search results into coherent answers
- **🎨 Rich UI Components**: Streamlined chat interface with progress tracking
- **⚡ Fast Performance**: Optimized for Hugging Face Spaces

## 🛠️ How to Use

1. **Enter your research question** in the text box
2. **Click "Research"** to start the deep research process
3. **Watch progress** as the AI searches and synthesizes information
4. **Get comprehensive answers** with source citations

## 💡 Example Questions

- "What are the latest AI developments in 2024?"
- "DeepResearch on climate change solutions"
- "UPSC 2025 preparation strategy"
- "Comparative analysis of electric vehicle adoption"

## 🔧 Technical Details

### Models Used
- **Fathom-Search-4B**: For web search and retrieval
- **Fathom-Synthesizer-4B**: For answer synthesis
- **Fathom-R1-14B**: For reasoning and planning (see the loading sketch below)
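
The Docker image also sets `MODEL_ID=FractalAIResearch/Fathom-R1-14B` and `PIPELINE_TASK=text-generation`. The sketch below shows one plausible way a backend could consume those variables with the Transformers `pipeline` API; it is illustrative only, since the deployed app may instead call hosted endpoints (for example via `HF_MODEL_URL`).

```python
# Illustrative only: one way MODEL_ID / PIPELINE_TASK could be consumed.
import os

from transformers import pipeline

model_id = os.getenv("MODEL_ID", "FractalAIResearch/Fathom-R1-14B")
task = os.getenv("PIPELINE_TASK", "text-generation")

# Loading a 14B model on CPU is slow and memory-hungry; this is a sketch, not the app's loader.
generator = pipeline(task, model=model_id)
print(generator("Outline a research plan for EV adoption:", max_new_tokens=64)[0]["generated_text"])
```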

### Architecture
- **Backend**: FastAPI with Gradio integration (see the static-serving sketch below)
- **Frontend**: React-based chat interface
- **Search**: Multi-source web search with Serper API
- **Deployment**: Docker containers optimized for HF Spaces
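
The Dockerfile copies the React build to `/app/static`, sets `STATIC_DIR=/app/static`, and launches `uvicorn main:app` on port 7860. The sketch below shows one way such a backend could serve the static build next to its API routes; the `/api/health` route and the module layout are assumptions for illustration, not the app's actual code.

```python
# main.py - minimal sketch, assuming STATIC_DIR points at the exported React build
import os

from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

app = FastAPI()


@app.get("/api/health")  # hypothetical route; the real API surface may differ
def health() -> dict:
    return {"status": "ok"}


# Register API routes before mounting "/" so the static catch-all does not shadow them.
static_dir = os.getenv("STATIC_DIR", "/app/static")
app.mount("/", StaticFiles(directory=static_dir, html=True), name="static")
```

Run it the same way the container does, e.g. `uvicorn main:app --host 0.0.0.0 --port 7860`.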

## 📋 Requirements

- Python 3.10+
- Hugging Face Transformers 4.35+
- Gradio 4.0+
- FastAPI

## 🌐 Deployment

This app is deployed on Hugging Face Spaces using Docker (a local run example follows the list below). The setup includes:

- Automatic model downloading
- Environment configuration
- Error handling and fallbacks
- Multi-modal capabilities
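
For local testing outside Spaces, the built image can be run directly. Port 7860 comes from the Dockerfile's `EXPOSE`, and the startup script lists `HF_API_TOKEN`/`HF_TOKEN`, `SERPER_API_KEY`, `OPENAI_API_KEY`, and `JINA_API_KEY` among its expected configuration. Treat the command below as a sketch; the exact variables you need depend on which features you use.

```powershell
docker run -p 7860:7860 `
  -e HF_TOKEN=hf_xxx `
  -e SERPER_API_KEY=your_serper_key `
  -e OPENAI_API_KEY=sk-xxx `
  -e JINA_API_KEY=your_jina_key `
  fathom-deploy
```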

## 📖 License

Apache 2.0 License - see the LICENSE file for details.

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📞 Support

For issues or questions:
- Check the docs folder for detailed documentation
- Open an issue on the repository
- Contact the development team

## 🧩 Building the Docker image locally (private Hugging Face repo)

If the source is in a private Hugging Face Space, provide a token when building the image. The Dockerfile clones the repository during the build and reads the token from a BuildKit secret (`HF_API_TOKEN` or `HF_TOKEN`) or from the `HF_API_TOKEN` build-arg; the clone URL itself comes from the `SRC_URL` secret or build-arg.

Examples (PowerShell):

Provide the token and source URL as build-args (less secure; the values can end up in the image history):

```powershell
docker build -t fathom-deploy --build-arg HF_API_TOKEN=hf_xxx --build-arg "SRC_URL=<source repo URL>" .
```

Using BuildKit and secrets (recommended):

```powershell
$env:DOCKER_BUILDKIT=1; docker build --secret id=HF_API_TOKEN,src=$env:USERPROFILE\.hf_token --secret id=SRC_URL,src=$env:USERPROFILE\.src_url -t fathom-deploy .
```

Place your token in a file (e.g. %USERPROFILE%\.hf_token) containing only the token string, and the clone URL in a second file (e.g. %USERPROFILE%\.src_url), then reference them with `--secret`. The Dockerfile already reads `/run/secrets/HF_API_TOKEN` (or `/run/secrets/HF_TOKEN`) and `/run/secrets/SRC_URL`, so the secret ids above match what the build expects.
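
On Windows you can create those files so they contain only the value, with no trailing newline (the file names here are just the examples used above):

```powershell
Set-Content -Path "$env:USERPROFILE\.hf_token" -Value 'hf_xxx' -NoNewline
Set-Content -Path "$env:USERPROFILE\.src_url" -Value 'https://huggingface.co/spaces/...' -NoNewline
```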

Note: as written, the build fails early if no token or `SRC_URL` is provided, even for a public repository, because the clone URL is always constructed with credentials; to clone anonymously you would need to adjust the Dockerfile to make the token optional.