Prince9191 committed
Commit 012f59e · verified · 1 Parent(s): 7821e12

Update README.md

Files changed (1): README.md +9 -95
README.md CHANGED
@@ -1,98 +1,12 @@
- Absolutely! Here’s the full rephrased content you can easily copy and paste:
-
-
-
-
-
- # Image Description and Audio Transcript App
-
- An AI-powered web app that identifies objects in images and converts the generated descriptions into speech using Hugging Face Transformers.
-
- ---
-
- ## Overview
- This project showcases how to build a pipeline using:
- - **BLIP** for image captioning
- - **gTTS** (Google Text-to-Speech) for audio generation
- - **Gradio** for the user interface and deployment on Hugging Face Spaces
-
- ---
-
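For context on the stack the removed README describes, here is a minimal sketch of the captioning-plus-speech core, assuming BLIP is loaded through the `transformers` `image-to-text` pipeline and gTTS renders the audio; the function name and output path are illustrative, not taken from the repo:

```python
from transformers import pipeline
from gtts import gTTS

# Load the BLIP captioning model once at startup (weights download on first run).
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def describe_and_speak(image, audio_path="caption.mp3"):
    # The pipeline returns a list of dicts like [{"generated_text": "..."}].
    caption = captioner(image)[0]["generated_text"]
    # gTTS synthesizes the caption and writes an MP3 to disk.
    gTTS(text=caption, lang="en").save(audio_path)
    return caption, audio_path
```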
- ## What It Does
- - Upload an image → Get an AI-generated description
- - Automatically convert the description into audio
- - Built with accessibility in mind for users with visual impairments
- - Runs on a clean, responsive web UI using **Gradio**
-
- ---
-
- ## Tech Stack
- - **Language**: Python 3.7+
- - **AI Models**:
-   - `Salesforce/blip-image-captioning-base` – for generating image captions
-   - `gtts` – for converting text into speech
- - **Frameworks/Libraries**:
-   - `torch` – powering the models
-   - `transformers` – loading and running pre-trained models
-   - `gradio` – creating the interactive frontend
-   - `Pillow`, `matplotlib`, `inflect` – for image handling and fine-tuning the output
-
  ---
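Continuing the sketch above, the Gradio wiring for the stack listed here might look like the following; the component labels and the `demo` name are assumptions:

```python
import gradio as gr

# Image in; caption text and spoken audio out. Reuses describe_and_speak
# from the earlier sketch (a file-path return value is a valid Audio output).
demo = gr.Interface(
    fn=describe_and_speak,
    inputs=gr.Image(type="pil"),
    outputs=[gr.Textbox(label="Description"), gr.Audio(label="Spoken description")],
    title="Image Description and Audio Transcript App",
)

if __name__ == "__main__":
    demo.launch()  # serves on http://127.0.0.1:7860 by default
```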
-
- ## Installation
-
- 1. **Clone the repo** (or upload files to your Hugging Face Space):
- ```bash
- git clone https://github.com/your-username/image-caption-audio-app.git
- cd image-caption-audio-app
- ```
-
- 2. (Optional) Create a virtual environment:
-
- python -m venv venv
- source venv/bin/activate # For Windows: venv\Scripts\activate
-
- 3. Install dependencies:
-
- pip install torch transformers gtts gradio Pillow matplotlib inflect
-
- If you’re using Hugging Face Spaces, simply include a requirements.txt file with those packages.
-
-
-
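For the Spaces route mentioned above, a matching requirements.txt would simply list the same packages (left unpinned here; pin versions if you need reproducible builds):

```text
torch
transformers
gtts
gradio
Pillow
matplotlib
inflect
```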
- How to Run
-
- Locally:
-
- python object_detection.py
-
- Then visit: http://127.0.0.1:7860 in your browser.
-
- On Hugging Face:
- Just upload all files (including requirements.txt) to your Space. It’ll launch automatically.
-
-
-
- Customizations
-
- You can tweak parameters (like host, port, or debug settings) directly in the script if needed. For example:
-
- gr.Interface(...).launch(server_name="0.0.0.0", server_port=7860, debug=True)
-
-
-
-
-
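A common variant of that launch call reads the port from an environment variable instead of hard-coding it; the `PORT` variable below is an assumption for illustration, not something the script defines:

```python
import os

# Same launch call as in the example above, with the port overridable via $PORT.
# `demo` is the Gradio Interface from the earlier sketch.
port = int(os.environ.get("PORT", "7860"))
demo.launch(server_name="0.0.0.0", server_port=port, debug=True)
```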
- Credits
- • Hugging Face for the BLIP model
- • Google for gTTS
- • Gradio for simplifying deployment and UI creation
-
-
-
- License
-
- MIT License – Feel free to use, share, and modify.
-
  ---

- Let me know if you'd like a version with your name, GitHub link, or any branding!
  ---
+ title: Object Detection App
+ emoji: 🧠
+ colorFrom: indigo
+ colorTo: blue
+ sdk: gradio
+ sdk_version: "4.20.0"
+ app_file: app.py
+ pinned: false
  ---

+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference