Commit e0b4927 (parent: 60344c1): Readme update

README.md CHANGED (@@ -1,224 +1,16 @@)
### Prerequisites

1. **Python 3.8+**
2. **Pinecone Account**: Create a free account at [Pinecone.io](https://www.pinecone.io/)
3. **System Requirements** for LLM:
   - **RAM**: 8GB minimum (16GB+ recommended)
   - **Storage**: 5-8GB free space for the model
   - **CPU**: Multi-core processor (supports GPU acceleration if available)
### Installation

1. **Clone or download this project**
   ```bash
   git clone https://github.com/shahzeb171/code-compass.git
   cd code-compass
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Download the LLM model**
   ```bash
   wget https://huggingface.co/bartowski/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf
   ```
   **Recommended**: the Q4_K_M quantization offers the best balance of quality and performance.

4. **Set up Pinecone API Key**

   Create a `config.py` file:
   ```python
   # Placeholder values - replace with your own settings.
   PINECONE_API_KEY = "your-pinecone-api-key-here"
   PINECONE_INDEX_NAME = "code-compass-index"        # your index name
   PINECONE_EMBEDDING_MODEL = "llama-text-embed-v2"  # see the Pinecone docs for more models
   MODEL_PATH = "path/to/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf"
   ```
### Getting Your Pinecone API Key

1. Go to [Pinecone.io](https://www.pinecone.io/) and sign up for a free account
2. Navigate to the "API Keys" section in your dashboard
3. Create a new API key or copy an existing one
4. The free tier includes:
   - 1 index
   - 5M vector dimensions
   - Enough for most code analysis projects!
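Since the key can come either from `config.py` or from the environment, it helps to resolve it in one place and fail fast with the "Pinecone API key is required" message shown in the troubleshooting section. The helper below is an illustrative sketch, not the project's actual code:

```python
import os

def get_pinecone_api_key(explicit_key=None):
    """Return the Pinecone API key, preferring an explicitly passed value,
    then the PINECONE_API_KEY environment variable; raise otherwise."""
    key = explicit_key or os.environ.get("PINECONE_API_KEY")
    if not key:
        raise RuntimeError("Pinecone API key is required")
    return key
```

Resolving the key up front means a missing credential surfaces immediately at startup rather than deep inside the first Pinecone call.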
## 🚀 Usage

1. **Start the application**
   ```bash
   python main.py
   ```

2. **Open your browser** to `http://localhost:7860`

3. **Load a repository**
   - Enter a GitHub URL (e.g., `https://github.com/pallets/flask`)
   - Or upload a ZIP file of your code
   - Click "📁 Load Repository"

4. **Process the repository**
   - Click "🚀 Process Repository" to analyze and chunk your code
   - This creates hierarchical chunks and stores them in Pinecone with automatic embedding generation
   - Wait for processing to complete (1-5 minutes depending on repo size)

5. **Initialize the AI model** (optional but recommended)
   - Click "🚀 Initialize LLM" to start loading the local AI model
   - This loads Qwen2.5-Coder-7B-Instruct for intelligent code analysis
   - Initial loading takes 1-3 minutes

6. **Query your code**
   - Ask natural language questions like:
     - "What does this repository do?"
     - "Show me authentication functions"
     - "How is error handling implemented?"
     - "What are the main classes?"
   - Toggle "Use AI Analysis" for intelligent responses instead of basic search results
   - The AI maintains conversation context for follow-up questions
## 📊 How It Works

### Hierarchical Chunking Strategy

The system creates multiple levels of code chunks:

**Level 1: File Context**
- Complete file overview with imports and purpose
- Metadata: file path, language, total lines

**Level 2: Class Chunks**
- Full class definitions with inheritance and methods
- Metadata: class name, methods list, relationships

**Level 3: Function Chunks**
- Individual function implementations with signatures
- Metadata: function name, arguments, complexity score

**Level 4: Code Block Chunks**
- Sub-chunks for complex functions (loops, conditionals, error handling)
- Metadata: block type, purpose, parent function
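For Python sources, the file/class/function levels above can be sketched with the standard library's `ast` module. The chunk fields shown are illustrative, not the project's actual schema:

```python
import ast

def chunk_python_source(path, source):
    """Produce file-, class-, and function-level chunks: a simplified
    sketch of the hierarchical chunking described above."""
    tree = ast.parse(source)
    # Level 1: whole-file chunk with basic metadata.
    chunks = [{
        "level": "file",
        "file": path,
        "language": "python",
        "total_lines": len(source.splitlines()),
        "text": source,
    }]
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # Level 2: full class definition plus its method names.
            chunks.append({
                "level": "class",
                "file": path,
                "name": node.name,
                "methods": [n.name for n in node.body
                            if isinstance(n, ast.FunctionDef)],
                "text": ast.get_source_segment(source, node),
            })
        elif isinstance(node, ast.FunctionDef):
            # Level 3: individual function with its argument names.
            chunks.append({
                "level": "function",
                "file": path,
                "name": node.name,
                "args": [a.arg for a in node.args.args],
                "text": ast.get_source_segment(source, node),
            })
    return chunks
```

A real implementation would add the Level 4 sub-chunks for long function bodies; `ast` exposes the loop, conditional, and `try` nodes needed to split those out as well.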
### Vector Search Process

1. **Embedding Generation**: Code chunks are converted to vector embeddings using SentenceTransformers
2. **Vector Storage**: Embeddings are stored in Pinecone with rich metadata
3. **Semantic Search**: User queries are embedded and matched against stored vectors
4. **Hybrid Filtering**: Results are filtered by chunk type, file path, repository, etc.
5. **Ranked Results**: The most relevant code sections are returned with similarity scores
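Steps 3-5 boil down to embedding the query, filtering candidates by metadata, and ranking by similarity. A minimal self-contained sketch, with toy 2-d vectors standing in for real embeddings and an in-memory list standing in for Pinecone:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, store, chunk_type=None, top_k=3):
    """Rank stored chunks by cosine similarity to the query vector,
    optionally filtering by chunk type first (the hybrid-filtering step)."""
    candidates = [c for c in store
                  if chunk_type is None or c["type"] == chunk_type]
    scored = [(cosine(query_vec, c["vector"]), c) for c in candidates]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]
```

In the real pipeline the query vector comes from the same embedding model used at indexing time, and the metadata filter is pushed down into the Pinecone query rather than applied client-side.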
## 🔧 Configuration Options

### Supported Languages

Currently optimized for Python, with basic support for:

- JavaScript/TypeScript
- Java
- C/C++
- Go
- Rust
- PHP
- Ruby
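Language detection in a pipeline like this is typically done by file extension. The mapping below is a hypothetical sketch covering the languages listed above, not necessarily the project's actual detection logic:

```python
import os

# Hypothetical extension-to-language map; the project's real
# detection rules may differ.
EXTENSION_LANGUAGES = {
    ".py": "python",
    ".js": "javascript", ".ts": "typescript",
    ".java": "java",
    ".c": "c", ".h": "c", ".cpp": "cpp",
    ".go": "go",
    ".rs": "rust",
    ".php": "php",
    ".rb": "ruby",
}

def detect_language(filename):
    """Map a filename to a language tag, defaulting to 'unknown'."""
    _, ext = os.path.splitext(filename)
    return EXTENSION_LANGUAGES.get(ext.lower(), "unknown")
```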
## 📝 Example Repositories

Try these public repositories:

- **Flask**: `https://github.com/pallets/flask` - Web framework
- **Requests**: `https://github.com/requests/requests` - HTTP library
- **FastAPI**: `https://github.com/tiangolo/fastapi` - Modern web framework
- **Black**: `https://github.com/psf/black` - Code formatter
## 🔍 Example Queries

### General Repository Understanding
- "What is the main purpose of this repository?"
- "What are the core components and how do they interact?"
- "Show me the project architecture overview"

### Function & Class Discovery
- "What are the main classes and their responsibilities?"
- "Show me all authentication-related functions"
- "Find functions that handle file operations"
- "What utility functions are available?"

### Implementation Analysis
- "How is error handling implemented?"
- "Show me configuration management code"
- "Find database-related functions"
- "How does logging work in this project?"

### Code Patterns
- "Show me decorator implementations"
- "Find async/await usage patterns"
- "What design patterns are used?"
- "How are tests structured?"
## 🛟 Troubleshooting

### Common Issues

**"Pinecone API key is required"**
- Make sure you've set the `PINECONE_API_KEY` environment variable
- Or enter it in the Advanced Options section

**"Error downloading repository"**
- Check that the GitHub URL is correct and the repository is public
- Ensure you have an internet connection
- Large repositories may time out - try smaller repos first

**"No chunks generated"**
- Make sure the repository contains supported code files
- Check that ZIP files aren't corrupted
- Python files currently work best

**"Vector store initialization failed"**
- Verify your Pinecone API key is valid
- Check that your Pinecone account hasn't exceeded free tier limits
- Try a different environment region if needed
### Performance Tips

- Start with smaller repositories (< 100 files) to test
- Python repositories work best currently
- Processing time scales with repository size
- Queries are fast once processing is complete
## 🔮 Future Enhancements

- **More Language Support**: Better parsing for JavaScript, Java, etc.
- **Code Generation**: AI-powered code completion and generation
- **Diff Analysis**: Compare changes between repository versions
- **Team Collaboration**: Share analyzed repositories
- **Custom Embeddings**: Fine-tuned models for specific domains
- **API Integration**: REST API for programmatic access
## 🤝 Contributing

Contributions welcome! Please open issues or submit pull requests.
## 📞 Support

For issues or questions:

1. Check the troubleshooting section above
2. Open a GitHub issue with detailed error messages
3. Include your Python version and OS information
---
title: Code Compass
emoji: 💬
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.42.0
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_scopes:
- inference-api
short_description: An AI-powered tool for analyzing code repositories
---

An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).