🚀 ARCHON


🧠 What is this?

A system that analyzes any GitHub repository and generates deep technical interview questions + answers based on:

  • Architecture
  • Scalability
  • Tradeoffs
  • Real-world engineering decisions

⚡ Works even without LLM access by using a fallback heuristic engine.


✨ Features

🔍 Repository Analysis

  • Fetches and parses GitHub repositories via API
  • Prioritizes important files (core logic > boilerplate)
  • Supports multiple languages
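The prioritization step could be approximated with a simple path-based score. This is a minimal sketch, not the project's actual logic: the `score_path` and `prioritize` functions, the directory hints, and the weights are all assumptions chosen for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical hints; the real project may rely on different signals.
SOURCE_EXTENSIONS = {".py", ".js", ".ts", ".go", ".rs", ".java"}
CORE_DIRS = {"src", "app", "core", "lib", "server"}
LOW_VALUE_DIRS = {"node_modules", "dist", "build", "vendor", "tests", "docs"}


def score_path(path: str) -> int:
    """Return a rough importance score for a repository file path."""
    p = PurePosixPath(path)
    dirs = set(p.parts[:-1])
    score = 0
    if p.suffix in SOURCE_EXTENSIONS:
        score += 2  # source code beats config and assets
    if dirs & CORE_DIRS:
        score += 2  # likely core logic
    if dirs & LOW_VALUE_DIRS:
        score -= 5  # generated, vendored, or auxiliary files
    return score


def prioritize(paths: list[str], limit: int = 20) -> list[str]:
    """Keep only positively scored paths, best first, up to `limit`."""
    ranked = sorted(paths, key=score_path, reverse=True)
    return [p for p in ranked if score_path(p) > 0][:limit]
```

With this scoring, `src/core/engine.py` outranks a top-level `main.py`, while `README.md` and anything under `node_modules/` are dropped.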

🧩 Intelligent Chunking

  • Breaks code into meaningful chunks
  • Filters noise (non-informative code)
  • Preserves structural context
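As an illustration of the chunking ideas above, the sketch below splits a file at top-level declarations, drops chunks that contain almost no real code, and keeps the file path and start line with each chunk. The boundary pattern, the `Chunk` type, and the noise threshold are assumptions, not the project's implementation; decorators and comments preceding a declaration end up in the previous chunk, which a real chunker would likely handle better.

```python
import re
from dataclasses import dataclass

# Assumed boundary: a top-level declaration starts a new chunk.
BOUNDARY = re.compile(r"^(def |class |async def |function |export )")


@dataclass
class Chunk:
    path: str        # file the chunk came from (structural context)
    start_line: int  # 1-based line where the chunk begins
    text: str


def is_noise(text: str, min_code_lines: int = 2) -> bool:
    """Treat chunks made of blanks, comments, and imports as noise."""
    code = [
        ln for ln in text.splitlines()
        if ln.strip() and not ln.strip().startswith(("#", "//", "import ", "from "))
    ]
    return len(code) < min_code_lines


def chunk_source(path: str, source: str) -> list[Chunk]:
    """Split source at top-level declarations and drop noisy chunks."""
    chunks, current, start = [], [], 1
    for number, line in enumerate(source.splitlines(), start=1):
        if BOUNDARY.match(line) and current:
            chunks.append(Chunk(path, start, "\n".join(current)))
            current, start = [], number
        current.append(line)
    if current:
        chunks.append(Chunk(path, start, "\n".join(current)))
    return [c for c in chunks if not is_noise(c.text)]
```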

🧠 Embedding + Retrieval (RAG)

  • Uses SentenceTransformers
  • Retrieves most relevant code sections
  • Builds contextual understanding of system design
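The retrieval step boils down to ranking chunk embeddings by cosine similarity to a query embedding. The sketch below shows that ranking with hand-written toy vectors and a pure-Python cosine function so it runs without dependencies; in the project itself the vectors would presumably come from a SentenceTransformers model and the similarity from scikit-learn. The function names and the example vectors are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k chunks most similar to the query, best first."""
    scores = [cosine(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real ones would come from an embedding model.
chunk_vecs = [
    [0.9, 0.1, 0.0],  # chunk 0: e.g. request routing
    [0.0, 1.0, 0.0],  # chunk 1: e.g. auth middleware
    [0.7, 0.3, 0.1],  # chunk 2: e.g. API handlers
]
query_vec = [1.0, 0.0, 0.0]  # e.g. "How are requests routed?"
best = top_k(query_vec, chunk_vecs)  # indices of the two closest chunks
```

Here chunks 0 and 2 point in nearly the same direction as the query, so they are selected, while the orthogonal chunk 1 is left out.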

🤖 AI Question Generation

  • Generates interview-level questions on:

    • Architecture decisions
    • Scalability concerns
    • Tradeoffs

⚡ Fallback Mode (No LLM Required)

  • Automatically switches to rule-based generation

  • Uses detected signals:

    • API usage
    • State management
    • Auth systems
    • Async logic
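A rule-based fallback of this kind could map detected signals to question templates. The following is a minimal sketch under stated assumptions: the regular expressions, the signal names, and the question wording are hypothetical and only illustrate the approach, and answers are omitted for brevity.

```python
import re

# Hypothetical signal patterns, for illustration only.
SIGNALS = {
    "api_usage": re.compile(r"\b(fetch\(|requests\.(get|post)|axios\.)"),
    "state_management": re.compile(r"\b(useState|useReducer|createStore)\b"),
    "auth": re.compile(r"\b(jwt|oauth|login|password)\b", re.IGNORECASE),
    "async_logic": re.compile(r"\b(async|await|asyncio|Promise)\b"),
}

# Hypothetical question templates, one per signal.
TEMPLATES = {
    "api_usage": "How does this code handle failures and retries when calling external APIs?",
    "state_management": "Why was this state management approach chosen, and how would it scale?",
    "auth": "What are the security tradeoffs of the authentication flow used here?",
    "async_logic": "Where could the asynchronous code introduce race conditions?",
}


def detect_signals(code: str) -> list[str]:
    """Return the names of all signals whose pattern appears in the code."""
    return [name for name, pattern in SIGNALS.items() if pattern.search(code)]


def fallback_questions(code: str, num_questions: int = 5) -> list[dict]:
    """Build up to num_questions templated questions from detected signals."""
    signals = detect_signals(code)[:num_questions]
    return [
        {"id": i, "signal": s, "question": TEMPLATES[s]}
        for i, s in enumerate(signals, start=1)
    ]
```

For example, a snippet containing `async def login(...)` and `requests.get(...)` would trigger the API, auth, and async signals, but not the state management one.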

🧱 System Architecture

GitHub Repo
   ↓
File Fetching + Prioritization
   ↓
Chunking + Filtering
   ↓
Embeddings (SentenceTransformers)
   ↓
Vector Similarity Retrieval
   ↓
Context Builder
   ↓
AI Question Generator
   ↓
Fallback Engine (if LLM unavailable)

🛠️ Tech Stack

Backend:
- FastAPI
- Python

AI / ML:
- SentenceTransformers
- Cosine similarity (scikit-learn)

Data:
- GitHub REST API

Frontend:
- Minimal Web UI (React / HTML)

Optional:
- OpenAI API (LLM generation)

⚙️ How it Works

  1. Input a GitHub repo URL

  2. System fetches and filters key files

  3. Code is chunked and embedded

  4. Relevant chunks are retrieved

  5. Questions are generated using:

    • LLM (if available)
    • OR fallback heuristic engine
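The final step, choosing between the LLM and the heuristic engine, can be sketched as a small dispatcher. This is an assumed design, not the project's code: `generate_questions`, its callable parameters, and the mode labels `"llm"` and `"fallback"` are hypothetical names.

```python
from typing import Callable, Optional

Questions = list[dict]


def generate_questions(
    context: str,
    num_questions: int,
    llm: Optional[Callable[[str, int], Questions]],
    fallback: Callable[[str, int], Questions],
) -> tuple[str, Questions]:
    """Try the LLM generator first; use the heuristic one if it is missing or fails."""
    if llm is not None:
        try:
            return "llm", llm(context, num_questions)
        except Exception:
            # Network errors, missing API keys, rate limits, and so on.
            pass
    return "fallback", fallback(context, num_questions)
```

Returning the mode alongside the questions lets the API report which path produced the result, similar to the `mode` field in the response.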

📡 API Usage

POST /analyze

{
  "repo_url": "https://github.com/user/repo",
  "num_questions": 5
}

Response

{
  "repo": "...",
  "mode": "mock",
  "questions": [
    {
      "id": 1,
      "question": "...",
      "answer": "..."
    }
  ]
}
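A client for this endpoint could look like the sketch below, which uses only the Python standard library. The base URL `http://localhost:8000` is an assumption (a locally running FastAPI server), and the helper names are hypothetical; the request and response fields follow the examples above.

```python
import json
import urllib.request

# Assumption: the FastAPI server runs locally on port 8000.
BASE_URL = "http://localhost:8000"


def build_analyze_request(repo_url: str, num_questions: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST /analyze request with a JSON body."""
    body = json.dumps({"repo_url": repo_url, "num_questions": num_questions}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_questions(response_text: str) -> list[tuple[str, str]]:
    """Extract (question, answer) pairs from a response shaped like the example."""
    payload = json.loads(response_text)
    return [(q["question"], q["answer"]) for q in payload.get("questions", [])]
```

To send the request, pass it to `urllib.request.urlopen` and feed the decoded body to `parse_questions`.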

⚠️ Challenges Solved

  • Large repo handling (chunking + prioritization)
  • Token limitations (retrieval instead of full context)
  • LLM dependency → solved with fallback system
  • Noise reduction in code analysis
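To illustrate how retrieval sidesteps token limits, the sketch below greedily packs the best-ranked chunks into a fixed budget instead of sending the whole repository. It is a simplification under stated assumptions: `build_context` is a hypothetical helper, and character count stands in for a real token count.

```python
def build_context(ranked_chunks: list[str], max_chars: int = 6000) -> str:
    """Greedily pack best-ranked chunks into a context of at most max_chars."""
    separator = "\n\n---\n\n"
    parts, used = [], 0
    for chunk in ranked_chunks:
        # Every chunk after the first also pays for a separator.
        cost = len(chunk) + (len(separator) if parts else 0)
        if used + cost > max_chars:
            continue  # skip chunks that do not fit; smaller ones may still fit
        parts.append(chunk)
        used += cost
    return separator.join(parts)
```

Because chunks arrive in relevance order, the most useful code is always included first, and oversized chunks are skipped rather than truncated.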

💡 Future Improvements

  • 🔥 Dynamic repo-type detection (ML, backend, real-time, etc.)
  • 📊 Question difficulty levels (junior → senior)
  • 🔗 Follow-up interview questions
  • 🧠 Hybrid LLM + rule-based reasoning
  • ⚡ Caching + performance optimization

🧑‍💻 Author

Built by Dave, aspiring systems engineer ⚡


🎬 Demo


⭐ Support

If this project helped or inspired you:

  • ⭐ Star the repo
  • 🍴 Fork it
  • 🧠 Build something even crazier

“Don’t just read code. Interrogate it.”
