ARCHON
What is this?
A system that analyzes a GitHub repository and generates in-depth technical interview questions and answers covering:
- Architecture
- Scalability
- Tradeoffs
- Real-world engineering decisions
Works even without LLM access, thanks to a fallback heuristic engine.
Features
Repository Analysis
- Fetches and parses GitHub repositories via the GitHub REST API
- Prioritizes important files (core logic over boilerplate)
- Supports multiple languages
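A minimal sketch of the prioritization step, assuming the file paths have already been listed (for example, from the GitHub REST API's Git Trees endpoint with `recursive=1`). The directory names, file names, and scoring weights below are hypothetical illustrations, not ARCHON's actual rules.

```python
from pathlib import PurePosixPath

# Hypothetical heuristics: these names, extensions, and weights are
# illustrative assumptions, not ARCHON's actual prioritization rules.
SOURCE_EXTS = {".py", ".js", ".ts", ".go", ".rs", ".java"}
BOILERPLATE_NAMES = {"setup.py", "package-lock.json", "LICENSE"}
LOW_VALUE_DIRS = {"tests", "test", "docs", "examples", "node_modules", "vendor"}
CORE_HINTS = ("main", "app", "server", "core", "api", "service")


def priority(path: str) -> int:
    """Score a repo-relative path; higher means more likely to be core logic."""
    p = PurePosixPath(path)
    if p.name in BOILERPLATE_NAMES or p.suffix not in SOURCE_EXTS:
        return 0  # not source code, or known boilerplate
    score = 1
    if any(part in LOW_VALUE_DIRS for part in p.parts[:-1]):
        score -= 1  # tests, docs, vendored code
    if any(hint in p.stem.lower() for hint in CORE_HINTS):
        score += 2  # file name suggests an entry point or core module
    score += max(0, 3 - len(p.parts))  # small boost for files near the root
    return max(score, 0)


def prioritize(paths: list[str], limit: int = 20) -> list[str]:
    """Return up to `limit` paths, highest score first, dropping zero scores."""
    scored = [(priority(p), p) for p in paths]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [p for _, p in ranked[:limit]]
```

Capping the ranked list keeps analysis cost bounded on large repositories, at the price of possibly skipping relevant files deep in the tree.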
Intelligent Chunking
- Breaks code into meaningful chunks
- Filters out noise (non-informative code)
- Preserves structural context
Embedding + Retrieval (RAG)
- Embeds code chunks with SentenceTransformers
- Retrieves the most relevant code sections
- Builds a contextual understanding of the system design
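Retrieval reduces to ranking chunk embeddings by cosine similarity to a query embedding. The sketch below uses plain Python so it runs without extra dependencies; the comment shows how SentenceTransformers could supply the vectors (the model name is an assumption), and `sklearn.metrics.pairwise.cosine_similarity` would compute the same scores in bulk.

```python
import math

# Real embeddings could come from SentenceTransformers, e.g.:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is an assumption
#   chunk_vecs = model.encode(chunk_texts).tolist()
#   query_vec = model.encode("How is authentication handled?").tolist()


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Indices of the k chunk vectors most similar to the query, best first."""
    scores = [cosine(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```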
AI Question Generation
Generates interview-level questions on:
- Architecture decisions
- Scalability concerns
- Tradeoffs
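A hedged sketch of how retrieved chunks might be turned into an LLM prompt that targets the three focus areas above. The prompt wording, the `build_prompt` helper, and the `path`/`text` chunk keys are hypothetical; the requested JSON fields mirror the `id`/`question`/`answer` objects in the API response.

```python
FOCUS_AREAS = ("Architecture decisions", "Scalability concerns", "Tradeoffs")


def build_prompt(chunks: list[dict], num_questions: int = 5) -> str:
    """Assemble an LLM prompt from retrieved code chunks (hypothetical format)."""
    # Label each chunk with its file path so the model can cite locations.
    context = "\n\n".join(f"# {c['path']}\n{c['text']}" for c in chunks)
    focus = "\n".join(f"- {area}" for area in FOCUS_AREAS)
    return (
        "You are a senior engineer interviewing a candidate about this codebase.\n"
        f"Write {num_questions} in-depth interview questions, each with a model answer.\n"
        f"Cover:\n{focus}\n"
        'Reply as a JSON list of {"id", "question", "answer"} objects.\n\n'
        f"Code context:\n{context}"
    )
```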
Fallback Mode (No LLM Required)
Automatically switches to rule-based generation when no LLM is available, using detected signals:
- API usage
- State management
- Auth systems
- Async logic
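The fallback path can be sketched as regex-based signal detection feeding question templates. The patterns, template wording, and function names below are illustrative assumptions; ARCHON's actual rules may differ.

```python
import re

# Hypothetical detection patterns; ARCHON's real rules may differ.
SIGNALS = {
    "api": re.compile(r"requests\.|fetch\(|axios|@app\.(?:get|post)|APIRouter"),
    "state": re.compile(r"\b(?:useState|useReducer|createStore|session_state)\b"),
    "auth": re.compile(r"\b(?:jwt|oauth\w*|login|password|authenticate)\b", re.IGNORECASE),
    "async": re.compile(r"\b(?:async|await|asyncio|Promise)\b"),
}

# One (question, answer hint) template per signal; wording is illustrative.
TEMPLATES = {
    "api": ("How does the system handle timeouts and retries for external API calls?",
            "Look for retry/backoff logic, timeouts, and how failures surface to callers."),
    "state": ("Where does application state live, and how is it kept consistent?",
              "Trace who owns each piece of state and how updates propagate."),
    "auth": ("How are users authenticated, and where are tokens validated?",
             "Check token issuance, expiry, storage, and which layer enforces access."),
    "async": ("What runs concurrently here, and how are shared resources protected?",
              "Identify awaited I/O, background tasks, and any ordering assumptions."),
}
GENERIC = ("What are the main components of this codebase, and how do they interact?",
           "Map entry points, core modules, and the data flow between them.")


def detect_signals(code: str) -> list[str]:
    """Return the names of all signals whose pattern appears in the code."""
    return sorted(name for name, pattern in SIGNALS.items() if pattern.search(code))


def fallback_questions(code: str, num_questions: int = 5) -> list[dict]:
    """Build question/answer dicts from detected signals, without any LLM."""
    templates = [TEMPLATES[s] for s in detect_signals(code)] or [GENERIC]
    return [
        {"id": i, "question": q, "answer": a}
        for i, (q, a) in enumerate(templates[:num_questions], start=1)
    ]
```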
System Architecture
GitHub Repo
↓
File Fetching + Prioritization
↓
Chunking + Filtering
↓
Embeddings (SentenceTransformers)
↓
Vector Similarity Retrieval
↓
Context Builder
↓
AI Question Generator
↓
Fallback Engine (if LLM unavailable)
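The Context Builder stage is where the token limit is enforced. A minimal sketch, assuming chunks arrive already ranked by the retrieval step and using a character budget as a rough proxy for tokens (a real implementation would count tokens with the target model's tokenizer); `build_context` and `max_chars` are hypothetical names.

```python
def build_context(chunks: list[str], ranked: list[int], max_chars: int = 6000) -> str:
    """Concatenate chunks in relevance order without exceeding a character budget.

    Characters stand in for tokens here, which is an approximation.
    """
    sep = "\n\n"
    parts: list[str] = []
    used = 0
    for i in ranked:
        text = chunks[i]
        cost = len(text) + (len(sep) if parts else 0)
        if used + cost > max_chars:
            continue  # skip chunks that do not fit; a smaller later one still might
        parts.append(text)
        used += cost
    return sep.join(parts)
```

Skipping an oversized chunk instead of stopping lets smaller, lower-ranked chunks fill the remaining budget.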
Tech Stack
Backend:
- FastAPI
- Python
AI / ML:
- SentenceTransformers
- Cosine similarity (scikit-learn)
Data:
- GitHub REST API
Frontend:
- Minimal Web UI (React / HTML)
Optional:
- OpenAI API (LLM generation)
How It Works
1. Submit a GitHub repo URL.
2. The system fetches and filters key files.
3. The code is chunked and embedded.
4. The most relevant chunks are retrieved.
5. Questions are generated using:
   - an LLM (if available), or
   - the fallback heuristic engine.
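Step 5's either/or choice can be sketched as a small dispatcher: try the LLM callable, and fall back to the rule-based generator when no LLM is configured, the call raises, or it returns nothing. The callables and the `"llm"`/`"fallback"` labels are hypothetical and may not match the `mode` values ARCHON actually returns.

```python
from typing import Callable, Optional

Questions = list[dict]
Generator = Callable[[str, int], Questions]


def generate_questions(
    context: str,
    num_questions: int,
    fallback: Generator,
    llm: Optional[Generator] = None,
) -> tuple[str, Questions]:
    """Try the LLM first; use the rule-based fallback if it is missing or fails.

    Returns (mode, questions). The mode labels are illustrative.
    """
    if llm is not None:
        try:
            questions = llm(context, num_questions)
            if questions:  # treat an empty reply as a failure too
                return "llm", questions[:num_questions]
        except Exception:  # e.g. missing API key, rate limit, network error
            pass
    return "fallback", fallback(context, num_questions)
```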
API Usage
Request: POST /analyze
{
"repo_url": "https://github.com/user/repo",
"num_questions": 5
}
Response
{
"repo": "...",
"mode": "mock",
"questions": [
{
"id": 1,
"question": "...",
"answer": "..."
}
]
}
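A minimal client sketch using only the standard library. It assumes the API is served locally on uvicorn's default port (`http://localhost:8000`); `make_analyze_request` and `format_questions` are hypothetical helpers.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local uvicorn default


def make_analyze_request(repo_url: str, num_questions: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST /analyze request with a JSON body."""
    body = json.dumps({"repo_url": repo_url, "num_questions": num_questions}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_questions(raw: bytes) -> list[str]:
    """Turn a response body into numbered question lines."""
    data = json.loads(raw)
    return [f"{q['id']}. {q['question']}" for q in data["questions"]]


# Usage against a running server:
#   with urllib.request.urlopen(make_analyze_request("https://github.com/user/repo")) as resp:
#       print("\n".join(format_questions(resp.read())))
```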
Challenges Solved
- Large repo handling (chunking + prioritization)
- Token limitations (retrieval instead of full context)
- LLM dependency (solved with the fallback engine)
- Noise reduction in code analysis
Future Improvements
- Dynamic repo-type detection (ML, backend, real-time, etc.)
- Question difficulty levels (junior → senior)
- Follow-up interview questions
- Hybrid LLM + rule-based reasoning
- Caching + performance optimization
Author
Built by Dave, an aspiring systems engineer.
Demo
Support
If this project helped or inspired you:
- Star the repo
- Fork it
- Build something even crazier
“Don’t just read code. Interrogate it.”