🚀 ARCHON

🧠 What is this?

ARCHON analyzes a GitHub repository and generates in-depth technical interview questions and answers about its:

- Architecture
- Scalability
- Tradeoffs
- Real-world engineering decisions

⚡ Works even without LLM access using a fallback heuristic engine.

✨ Features

🔍 Repository Analysis

- Fetches and parses GitHub repositories via the GitHub REST API
- Prioritizes important files (core logic > boilerplate)
- Supports multiple languages
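A path-based score is one simple way to rank core logic above boilerplate. The sketch below is an assumption about how such ranking could work, not ARCHON's actual implementation: the `score_path` and `prioritize` names, the directory hints, and the weights are all hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical heuristics: none of these sets come from ARCHON itself.
SOURCE_EXTENSIONS = {".py", ".js", ".ts", ".go", ".rs", ".java"}
CORE_DIR_HINTS = {"src", "app", "core", "lib", "server"}
BOILERPLATE_HINTS = {"tests", "test", "docs", "examples", "node_modules", "vendor", "dist"}
ENTRY_POINTS = {"main.py", "app.py", "server.py", "index.js"}


def score_path(path: str) -> int:
    """Return a rough importance score for a repository file path."""
    p = PurePosixPath(path)
    dirs = {part.lower() for part in p.parts[:-1]}
    score = 0
    if p.suffix.lower() in SOURCE_EXTENSIONS:
        score += 2  # source code beats config and docs
    if dirs & CORE_DIR_HINTS:
        score += 2  # lives under a directory that usually holds core logic
    if dirs & BOILERPLATE_HINTS:
        score -= 3  # tests, docs, and vendored code are deprioritized
    if p.name.lower() in ENTRY_POINTS:
        score += 1  # likely entry point
    return score


def prioritize(paths: list[str], limit: int = 20) -> list[str]:
    """Sort paths by descending score (ties keep input order) and keep the top `limit`."""
    return sorted(paths, key=score_path, reverse=True)[:limit]
```

Capping the result with `limit` keeps the later embedding step bounded for large repositories.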

🧩 Intelligent Chunking

- Breaks code into meaningful chunks
- Filters noise (non-informative code)
- Preserves structural context
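To illustrate the idea, here is a minimal chunking sketch for Python-like source. It assumes chunks are split at top-level `def`/`class` boundaries and that a chunk with too few code lines counts as noise. Both rules, and the `chunk_source` name, are hypothetical and may differ from what ARCHON does.

```python
from __future__ import annotations

import re

# Assumed boundary rule: a new chunk starts at each top-level definition.
BOUNDARY = re.compile(r"^(def |class |async def )")


def chunk_source(source: str, min_code_lines: int = 2) -> list[str]:
    """Split source at top-level definitions and drop low-information chunks."""
    chunks: list[list[str]] = []
    current: list[str] = []
    for line in source.splitlines():
        # Indented lines never match, so methods stay inside their class chunk.
        if BOUNDARY.match(line) and current:
            chunks.append(current)
            current = []
        current.append(line)
    if current:
        chunks.append(current)

    def code_lines(lines: list[str]) -> int:
        # Count lines that are neither blank nor pure comments.
        return sum(1 for ln in lines if ln.strip() and not ln.strip().startswith("#"))

    return ["\n".join(c).strip() for c in chunks if code_lines(c) >= min_code_lines]
```

Keeping a whole class in one chunk is one way to preserve structural context, at the cost of occasionally producing large chunks.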

🧠 Embedding + Retrieval (RAG)

- Uses SentenceTransformers to embed code chunks
- Retrieves the most relevant code sections
- Builds a contextual understanding of the system design
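The retrieval step can be sketched as ranking chunks by cosine similarity to a query. To stay dependency-free, the example below swaps in a plain term-frequency vector for the SentenceTransformers embedding and computes cosine similarity by hand rather than with scikit-learn, so it only demonstrates the ranking logic. The `embed`, `cosine`, and `retrieve` names are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: a term-frequency vector (NOT a SentenceTransformers model)."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the `top_k` chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]
```

With a real embedding model, only `embed` would change; the ranking code stays the same.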

🤖 AI Question Generation

Generates interview-level questions on:
- Architecture decisions
- Scalability concerns
- Tradeoffs

⚡ Fallback Mode (No LLM Required)

- Automatically switches to rule-based generation when no LLM is available
- Builds questions from signals detected in the code:
  - API usage
  - State management
  - Auth systems
  - Async logic
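A rule-based fallback of this kind can be sketched with regular expressions that map each detected signal to a question template. The patterns, template wording, and function names below are hypothetical illustrations, not ARCHON's actual rules.

```python
from __future__ import annotations

import re

# Hypothetical signal patterns and question templates, for illustration only.
SIGNALS = {
    "api_usage": (
        re.compile(r"requests\.(get|post)|fetch\(|@app\.(get|post)"),
        "How does this code handle failures, retries, and rate limits around its HTTP calls?",
    ),
    "state_management": (
        re.compile(r"useState|createStore|redux", re.IGNORECASE),
        "Where does application state live, and what are the tradeoffs of that choice?",
    ),
    "auth": (
        re.compile(r"jwt|oauth|password|bearer", re.IGNORECASE),
        "How are credentials and tokens validated, stored, and expired?",
    ),
    "async_logic": (
        re.compile(r"\basync\b|\bawait\b|asyncio"),
        "Which operations run concurrently, and how are races or timeouts handled?",
    ),
}


def detect_signals(code: str) -> list[str]:
    """Return the names of all signals whose pattern appears in the code."""
    return [name for name, (pattern, _) in SIGNALS.items() if pattern.search(code)]


def generate_fallback_questions(code: str, num_questions: int = 5) -> list[dict]:
    """Produce one templated question per detected signal, up to `num_questions`."""
    questions = []
    for name in detect_signals(code)[:num_questions]:
        questions.append({"id": len(questions) + 1, "signal": name, "question": SIGNALS[name][1]})
    return questions
```

Because the templates are static, the output is deterministic for a given input and does not depend on any network call.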

🧱 System Architecture

GitHub Repo
   ↓
File Fetching + Prioritization
   ↓
Chunking + Filtering
   ↓
Embeddings (SentenceTransformers)
   ↓
Vector Similarity Retrieval
   ↓
Context Builder
   ↓
AI Question Generator
   ↓
Fallback Engine (if LLM unavailable)

🛠️ Tech Stack

Backend:
- FastAPI
- Python

AI / ML:
- SentenceTransformers
- Cosine similarity (scikit-learn)

Data:
- GitHub REST API

Frontend:
- Minimal Web UI (React / HTML)

Optional:
- OpenAI API (LLM generation)

⚙️ How it Works

- Input a GitHub repo URL
- The system fetches and filters key files
- Code is chunked and embedded
- Relevant chunks are retrieved
- Questions are generated using:
  - LLM (if available)
  - OR fallback heuristic engine
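The final step, choosing between the LLM and the fallback engine, might look like the sketch below. It assumes both generators are plain callables with the same signature and that any LLM error should trigger the fallback. The `generate_questions` name and the `"llm"`/`"fallback"` mode labels are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional

Generator = Callable[[str, int], list]


def generate_questions(
    context: str,
    num_questions: int,
    llm: Optional[Generator],
    fallback: Generator,
) -> dict:
    """Try the LLM generator first; use the heuristic fallback if it is missing or fails."""
    if llm is not None:
        try:
            return {"mode": "llm", "questions": llm(context, num_questions)}
        except Exception:
            # Any LLM failure (missing key, timeout, quota) drops through to the fallback.
            pass
    return {"mode": "fallback", "questions": fallback(context, num_questions)}
```

Returning the mode alongside the questions lets a caller see which path produced the result.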

📡 API Usage

POST `/analyze`

{
  "repo_url": "https://github.com/user/repo",
  "num_questions": 5
}

Response

{
  "repo": "...",
  "mode": "mock",
  "questions": [
    {
      "id": 1,
      "question": "...",
      "answer": "..."
    }
  ]
}
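For reference, a client call could be built with the Python standard library as sketched below. The request and response fields mirror the JSON shown above. The base URL `http://localhost:8000` and the helper names are assumptions, since the deployment address is not stated here.

```python
from __future__ import annotations

import json
import urllib.request


def build_analyze_request(base_url: str, repo_url: str, num_questions: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST /analyze request with a JSON body."""
    payload = json.dumps({"repo_url": repo_url, "num_questions": num_questions}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_analyze_response(body: bytes) -> list[tuple[int, str, str]]:
    """Extract (id, question, answer) tuples from a response body."""
    data = json.loads(body)
    return [(q["id"], q["question"], q["answer"]) for q in data["questions"]]
```

To send the request against a running server, pass the result to `urllib.request.urlopen` and feed `response.read()` to `parse_analyze_response`.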

⚠️ Challenges Solved

- Large repo handling (chunking + prioritization)
- Token limitations (retrieval instead of full context)
- LLM dependency (fallback heuristic engine)
- Noise reduction in code analysis
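As an illustration of working within token limits, the sketch below greedily packs the highest-ranked chunks into a fixed budget instead of sending the whole repository. It assumes chunks arrive already ranked by relevance and approximates token counts with whitespace-separated words. The `build_context` name and the budget value are hypothetical.

```python
from __future__ import annotations


def build_context(ranked_chunks: list[str], max_tokens: int = 1000) -> str:
    """Greedily pack ranked chunks until the approximate token budget is spent."""
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        # Crude proxy: whitespace-separated words, not a real tokenizer count.
        cost = len(chunk.split())
        if used + cost > max_tokens:
            continue  # skip chunks that do not fit; a smaller later chunk may still fit
        selected.append(chunk)
        used += cost
    return "\n\n".join(selected)
```

A real tokenizer would give tighter budgets, but the packing logic would not change.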

💡 Future Improvements

🔥 Dynamic repo-type detection (ML, backend, real-time, etc.)
📊 Question difficulty levels (junior → senior)
🔗 Follow-up interview questions
🧠 Hybrid LLM + rule-based reasoning
⚡ Caching + performance optimization

🧑‍💻 Author

Built by Dave — aspiring systems engineer ⚡

🎬 Demo

⭐ Support

If this project helped or inspired you:

⭐ Star the repo
🍴 Fork it
🧠 Build something even crazier

“Don’t just read code. Interrogate it.”
