shahzeb171 committed on
Commit
e0b4927
·
1 Parent(s): 60344c1

Readme update

Files changed (1)
  1. README.md +16 -224
README.md CHANGED
@@ -1,224 +1,16 @@
1
- # 🔍 code-compass
2
-
3
- An AI-powered tool for analyzing code repositories using hierarchical chunking and semantic search with the Pinecone vector database.
4
-
5
- ## 🚀 Features
6
-
7
- - **📥 Multiple Input Methods**: GitHub URLs or ZIP file uploads
8
- - **🧠 Hierarchical Chunking**: Smart code parsing at multiple levels (file → class → function → block)
9
- - **🔍 Semantic Search**: AI-powered natural language queries using Pinecone vector database
10
- - **🤖 Intelligent Analysis**: Local LLM integration with Qwen2.5-Coder-7B-Instruct
11
- - **💬 Conversation History**: Maintains context across multiple queries
12
- - **📊 Repository Analytics**: Comprehensive statistics and structure analysis
13
- - **🎯 Pinecone Integration**: Scalable vector database with automatic embedding generation
14
- - **⚡ Optimized Performance**: Quantized models for efficient local inference
15
-
16
- ## 🛠️ Setup
17
-
18
- ### Prerequisites
19
-
20
- 1. **Python 3.8+**
21
- 2. **Pinecone Account**: Create a free account at [Pinecone.io](https://www.pinecone.io/)
22
- 3. **System Requirements** for the LLM:
23
- - **RAM**: 8GB minimum (16GB+ recommended)
24
- - **Storage**: 5-8GB free space for model
25
- - **CPU**: Multi-core processor (supports GPU acceleration if available)
26
-
27
- ### Installation
28
-
29
- 1. **Clone or download this project**
30
- ```bash
31
- git clone https://github.com/shahzeb171/code-compass.git
32
- cd code-compass
33
- ```
34
-
35
- 2. **Install dependencies**
36
- ```bash
37
- pip install -r requirements.txt
38
- ```
39
-
40
- 3. **Download the LLM model**
41
- ```bash
42
- wget https://huggingface.co/bartowski/Qwen2.5-Coder-7B-Instruct-GGUF/resolve/main/Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf
43
- ```
44
- **Recommended**: Select Q4_K_M for the best balance of quality and performance.
45
-
46
- 4. **Set up Pinecone API Key**
47
-
48
- Create a `config.py` file:
49
- ```python
50
- PINECONE_API_KEY = "your-pinecone-api-key-here"
51
- PINECONE_INDEX_NAME = "code-compass-index"  # your index name (lowercase letters, digits, hyphens)
52
- PINECONE_EMBEDDING_MODEL = "llama-text-embed-v2"  # see the Pinecone docs for other models
53
- MODEL_PATH = "Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf"  # path to the downloaded model
54
- ```
55
-
56
- ### Getting Your Pinecone API Key
57
-
58
- 1. Go to [Pinecone.io](https://www.pinecone.io/) and sign up for a free account
59
- 2. Navigate to the "API Keys" section in your dashboard
60
- 3. Create a new API key or copy an existing one
61
- 4. The free tier includes:
62
- - 1 index
63
- - 5M vector dimensions
64
- - Enough for most code analysis projects!
65
-
66
- ## 🚀 Usage
67
-
68
- 1. **Start the application**
69
- ```bash
70
- python main.py
71
- ```
72
-
73
- 2. **Open your browser** to `http://localhost:7860`
74
-
75
- 3. **Load a repository**
76
- - Enter a GitHub URL (e.g., `https://github.com/pallets/flask`)
77
- - Or upload a ZIP file of your code
78
- - Click "📁 Load Repository"
79
-
80
- 4. **Process the repository**
81
- - Click "🚀 Process Repository" to analyze and chunk your code
82
- - This creates hierarchical chunks and stores them in Pinecone with automatic embedding generation
83
- - Wait for processing to complete (may take 1-5 minutes depending on repo size)
84
-
85
- 5. **Initialize the AI model** (Optional but recommended)
86
- - Click "🚀 Initialize LLM" to start loading the local AI model
87
- - This will load Qwen2.5-Coder-7B-Instruct for intelligent code analysis
88
- - Initial loading takes 1-3 minutes
89
-
90
- 6. **Query your code**
91
- - Ask natural language questions like:
92
- - "What does this repository do?"
93
- - "Show me authentication functions"
94
- - "How is error handling implemented?"
95
- - "What are the main classes?"
96
- - Toggle "Use AI Analysis" for intelligent responses vs basic search results
97
- - The AI maintains conversation context for follow-up questions
98
-
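The conversation-context behavior in step 6 can be sketched as a running message list that is passed back to the model on every turn. The structure below is assumed for illustration; it is not the app's actual code.

```python
# Sketch of conversation history: each turn is appended so follow-up
# questions carry context (structure assumed, not the app's actual code).
history = []

def ask(question, answer_fn):
    history.append({"role": "user", "content": question})
    answer = answer_fn(history)          # the model sees the full history
    history.append({"role": "assistant", "content": answer})
    return answer

ask("What does this repository do?", lambda h: "It analyzes code repositories.")
follow_up = ask("How is the code chunked?", lambda h: f"answered with {len(h) - 1} prior messages")
```

Because the assistant's replies are appended too, a follow-up question arrives with the whole exchange attached.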
99
- ## 📊 How It Works
100
-
101
- ### Hierarchical Chunking Strategy
102
-
103
- The system creates multiple levels of code chunks:
104
-
105
- **Level 1: File Context**
106
- - Complete file overview with imports and purpose
107
- - Metadata: file path, language, total lines
108
-
109
- **Level 2: Class Chunks**
110
- - Full class definitions with inheritance and methods
111
- - Metadata: class name, methods list, relationships
112
-
113
- **Level 3: Function Chunks**
114
- - Individual function implementations with signatures
115
- - Metadata: function name, arguments, complexity score
116
-
117
- **Level 4: Code Block Chunks**
118
- - Sub-chunks for complex functions (loops, conditionals, error handling)
119
- - Metadata: block type, purpose, parent function
120
-
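A chunk at any of the four levels can be pictured as a record pairing the code text with its metadata. The field names below are assumptions for illustration, not the project's actual schema.

```python
# Hypothetical chunk record matching the four levels described above
from dataclasses import dataclass, field

@dataclass
class CodeChunk:
    level: int        # 1 = file, 2 = class, 3 = function, 4 = block
    chunk_type: str   # "file" | "class" | "function" | "block"
    content: str      # the code text that gets embedded
    file_path: str
    metadata: dict = field(default_factory=dict)

# A Level 3 (function) chunk carrying the metadata the README lists
fn_chunk = CodeChunk(
    level=3,
    chunk_type="function",
    content="def login(user, password): ...",
    file_path="auth/views.py",
    metadata={"name": "login", "args": ["user", "password"], "complexity": 2},
)
```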
121
- ### Vector Search Process
122
-
123
- 1. **Embedding Generation**: Code chunks are converted to vector embeddings using SentenceTransformers
124
- 2. **Vector Storage**: Embeddings stored in Pinecone with rich metadata
125
- 3. **Semantic Search**: User queries are embedded and matched against stored vectors
126
- 4. **Hybrid Filtering**: Results filtered by chunk type, file path, repository, etc.
127
- 5. **Ranked Results**: Most relevant code sections returned with similarity scores
128
-
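The five steps above can be illustrated end to end with a toy example. Real embeddings come from SentenceTransformers and live in Pinecone; a bag-of-words stand-in keeps this sketch self-contained.

```python
# Toy illustration of embed -> store -> search -> filter -> rank
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())        # stand-in "embedding"

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

store = [
    {"text": "def check_password(user): verify password hash", "chunk_type": "function"},
    {"text": "class Logger: write error messages to a file", "chunk_type": "class"},
]
for item in store:
    item["vector"] = embed(item["text"])        # steps 1-2: embed and store

query = embed("check password")                 # step 3: embed the user query
hits = sorted(                                  # step 4: filter by chunk type
    (i for i in store if i["chunk_type"] == "function"),
    key=lambda i: cosine(query, i["vector"]),   # step 5: rank by similarity
    reverse=True,
)
```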
129
- ## 🔧 Configuration Options
130
-
131
- ### Supported Languages
132
-
133
- Currently optimized for Python with basic support for:
134
- - JavaScript/TypeScript
135
- - Java
136
- - C/C++
137
- - Go
138
- - Rust
139
- - PHP
140
- - Ruby
141
-
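Language support of this kind is typically routed by file extension; a minimal sketch follows (the mapping is illustrative, not the project's actual table).

```python
# Hypothetical extension-to-language routing for the list above
from pathlib import Path

EXT_TO_LANG = {
    ".py": "python", ".js": "javascript", ".ts": "typescript",
    ".java": "java", ".c": "c", ".cpp": "c++", ".go": "go",
    ".rs": "rust", ".php": "php", ".rb": "ruby",
}

def detect_language(path):
    # Returns None for unsupported files, which are skipped during chunking
    return EXT_TO_LANG.get(Path(path).suffix.lower())
```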
142
- ## 📝 Example Repositories
143
-
144
- Try these public repositories:
145
-
146
- - **Flask**: `https://github.com/pallets/flask` - Web framework
147
- - **Requests**: `https://github.com/requests/requests` - HTTP library
148
- - **FastAPI**: `https://github.com/tiangolo/fastapi` - Modern web framework
149
- - **Black**: `https://github.com/psf/black` - Code formatter
150
-
151
- ## 🔍 Example Queries
152
-
153
- ### General Repository Understanding
154
- - "What is the main purpose of this repository?"
155
- - "What are the core components and how do they interact?"
156
- - "Show me the project architecture overview"
157
-
158
- ### Function & Class Discovery
159
- - "What are the main classes and their responsibilities?"
160
- - "Show me all authentication-related functions"
161
- - "Find functions that handle file operations"
162
- - "What utility functions are available?"
163
-
164
- ### Implementation Analysis
165
- - "How is error handling implemented?"
166
- - "Show me configuration management code"
167
- - "Find database-related functions"
168
- - "How does logging work in this project?"
169
-
170
- ### Code Patterns
171
- - "Show me decorator implementations"
172
- - "Find async/await usage patterns"
173
- - "What design patterns are used?"
174
- - "How are tests structured?"
175
-
176
- ## 🛟 Troubleshooting
177
-
178
- ### Common Issues
179
-
180
- **"Pinecone API key is required"**
181
- - Make sure you've set the `PINECONE_API_KEY` environment variable
182
- - Or enter it in the Advanced Options section
183
-
184
- **"Error downloading repository"**
185
- - Check that the GitHub URL is correct and public
186
- Ensure you have an internet connection
187
- Large repositories may time out - try smaller repos first
188
-
189
- **"No chunks generated"**
190
- - Make sure the repository contains supported code files
191
- - Check that ZIP files aren't corrupted
192
- - Python files work best currently
193
-
194
- **"Vector store initialization failed"**
195
- - Verify your Pinecone API key is valid
196
- - Check your Pinecone account hasn't exceeded free tier limits
197
- - Try a different environment region if needed
198
-
199
- ### Performance Tips
200
-
201
- - Start with smaller repositories (< 100 files) to test
202
- - Python repositories work best currently
203
- - Processing time scales with repository size
204
- - Queries are fast once processing is complete
205
-
206
- ## 🔮 Future Enhancements
207
-
208
- - **More Language Support**: Better parsing for JavaScript, Java, etc.
209
- - **Code Generation**: AI-powered code completion and generation
210
- - **Diff Analysis**: Compare changes between repository versions
211
- - **Team Collaboration**: Share analyzed repositories
212
- - **Custom Embeddings**: Fine-tuned models for specific domains
213
- - **API Integration**: REST API for programmatic access
214
-
215
- ## 🤝 Contributing
216
-
217
- Contributions are welcome! Please open issues or submit pull requests.
218
-
219
- ## 📞 Support
220
-
221
- For issues or questions:
222
- 1. Check the troubleshooting section above
223
- 2. Open a GitHub issue with detailed error messages
224
- 3. Include your Python version and OS information
 
1
+ ---
2
+ title: Code Compass
3
+ emoji: 💬
4
+ colorFrom: yellow
5
+ colorTo: purple
6
+ sdk: gradio
7
+ sdk_version: 5.42.0
8
+ app_file: app.py
9
+ pinned: false
10
+ hf_oauth: true
11
+ hf_oauth_scopes:
12
+ - inference-api
13
+ short_description: An AI-powered tool for analyzing code repositories
14
+ ---
15
+
16
+ An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).