Update README.md
README.md
CHANGED
|
@@ -1,5 +1,5 @@
|
|
| 1 |
---
|
| 2 |
-
title:
|
| 3 |
emoji: ✏️
|
| 4 |
colorFrom: purple
|
| 5 |
colorTo: blue
|
|
@@ -11,177 +11,3 @@ hf_oauth_scopes:
|
|
| 11 |
- manage-repos
|
| 12 |
- inference-api
|
| 13 |
---
|
| 14 |
-
|
| 15 |
-
# Research Article Template Editor
|
| 16 |
-
|
| 17 |
-
A collaborative, real-time editor for web-native scientific articles. It lets multiple authors co-write a paper with rich text, math, citations, figures and interactive D3 embeds, then publishes the result as a static HTML page (or a PDF) aligned with the [research-article-template](https://github.com/huggingface/research-article-template).
|
| 18 |
-
|
| 19 |
-
## What it gives you
|
| 20 |
-
|
| 21 |
-
- **Real-time collaboration** over WebSocket (Y.js + Hocuspocus), with visible cursors and per-user selection colors
|
| 22 |
-
- **Rich article authoring**: headings, lists, tables, code blocks with syntax highlighting, LaTeX math (KaTeX), footnotes, sidenotes, block quotes, callouts
|
| 23 |
-
- **Research-specific blocks**: citations + bibliography (BibTeX), figures with captions, stacks / wide / full-width layouts, glossary terms, Mermaid/Wardley/architecture diagrams
|
| 24 |
-
- **Interactive D3 embeds** authored inline: each embed is a self-contained HTML file the editor can generate and iterate on via an **AI-assisted "embed studio"**
|
| 25 |
-
- **Comments & discussion** anchored on any selection
|
| 26 |
-
- **Slash menu** (`/`) and drag/drop block handles, in the spirit of Notion
|
| 27 |
-
- **Click-to-edit frontmatter**: title, subtitle, authors, affiliations, links, banner color
|
| 28 |
-
- **Publishing pipeline**: one-click export to a standalone static HTML bundle, plus PDF generation (Puppeteer) and an `llms.txt` Markdown twin for LLM agents/crawlers (served at `/llms.txt`, advertised in `/robots.txt`)
|
| 29 |
-
- **Persistence**:
|
| 30 |
-
- Local mode: documents stored on disk under `DATA_DIR`
|
| 31 |
-
- HF mode: documents pushed/pulled from a Hugging Face dataset via OAuth
|
| 32 |
-
- **Dark mode**, responsive layout (TOC drawer on mobile), live table of contents with scroll-spy
|
| 33 |
-
- **AI chat side-panel** that can edit the article via structured tool calls (agent loop over the current TipTap doc)
|
| 34 |
-
|
| 35 |
-
## Stack
|
| 36 |
-
|
| 37 |
-
| Layer | Tech |
|
| 38 |
-
|---|---|
|
| 39 |
-
| Editor | React 18, TypeScript, TipTap v3, ProseMirror |
|
| 40 |
-
| Collaboration | Y.js, Hocuspocus (WebSocket), y-tiptap |
|
| 41 |
-
| Backend | Node.js, Express, Vite (dev proxy), Hocuspocus server |
|
| 42 |
-
| Publishing | Custom TipTap-JSON → HTML renderer, Puppeteer for PDF |
|
| 43 |
-
| AI | Vercel AI SDK v6 (`ai`, `@ai-sdk/react`) → Hugging Face Inference Providers (OpenAI-compatible router) |
|
| 44 |
-
| Styling | Plain CSS with custom properties, no framework |
|
| 45 |
-
| Storage | Local FS or Hugging Face datasets (via `@huggingface/hub`) |
|
| 46 |
-
| Container | Single-image Docker build, runs on port 8080 |
|
| 47 |
-
|
| 48 |
-
Around **3.6k LOC backend** and **9.5k LOC frontend** (TypeScript/TSX, excluding generated code).
|
| 49 |
-
|
| 50 |
-
## Repo layout
|
| 51 |
-
|
| 52 |
-
```
|
| 53 |
-
collab-editor/
|
| 54 |
-
├── backend/ # Express + Hocuspocus server, publisher, AI agent routes
|
| 55 |
-
│ └── src/
|
| 56 |
-
│ ├── server.ts # Entry point
|
| 57 |
-
│ ├── create-app.ts # App factory (routes, middleware, Hocuspocus)
|
| 58 |
-
│ ├── publisher/ # TipTap-JSON → HTML + PDF
|
| 59 |
-
│ ├── agent/ # LLM agent (tool calls over the doc)
|
| 60 |
-
│ ├── shared/ # Component defs shared with the frontend
|
| 61 |
-
│ └── hf-storage.ts # HF dataset sync
|
| 62 |
-
├── frontend/ # Vite + React + TipTap editor
|
| 63 |
-
│ └── src/
|
| 64 |
-
│ ├── App.tsx # Top-level shell
|
| 65 |
-
│ ├── editor/ # TipTap editor + extensions + components
|
| 66 |
-
│ ├── components/ # Shared UI pieces (TOC, Chat, Dialog, ...)
|
| 67 |
-
│ ├── hooks/ # React hooks (agent chat, selection, ...)
|
| 68 |
-
│ ├── styles/ # CSS layers (see docs/ARCHITECTURE.md)
|
| 69 |
-
│ └── utils/
|
| 70 |
-
├── docs/
|
| 71 |
-
│ ├── ARCHITECTURE.md # Deep dive on layers, data flow, CSS
|
| 72 |
-
│ ├── SPECIFICATION.md # Feature spec and contracts
|
| 73 |
-
│ ├── TESTS.md # Testing strategy
|
| 74 |
-
│ └── embed-studio.md # How the AI-authored embeds pipeline works
|
| 75 |
-
└── Dockerfile # Production multi-stage build
|
| 76 |
-
```
|
| 77 |
-
|
| 78 |
-
See [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) for a diagram and the full tour.
|
| 79 |
-
|
| 80 |
-
## Getting started
|
| 81 |
-
|
| 82 |
-
### Prerequisites
|
| 83 |
-
|
| 84 |
-
- Node.js 20+
|
| 85 |
-
- A Hugging Face token with the `Make calls to Inference Providers` permission for the AI features (embed studio, chat agent). Generate one at https://huggingface.co/settings/tokens. On an HF Space, the logged-in user's OAuth token is used instead, so no manual setup is needed.
|
| 86 |
-
- A Hugging Face OAuth app (client id/secret) if you want login + HF dataset persistence
|
| 87 |
-
|
| 88 |
-
### Local development
|
| 89 |
-
|
| 90 |
-
Backend and frontend run as two separate processes in dev (Vite proxies `/api`, `/collab`, `/uploads`, `/published`, `/oauth`, `/auth` to the backend).
|
| 91 |
-
|
| 92 |
-
```bash
|
| 93 |
-
# terminal 1 — backend (Express + Hocuspocus on :8080)
|
| 94 |
-
cd backend
|
| 95 |
-
cp .env.example .env # set HF_TOKEN, optional OAUTH_* and HF_DATASET_ID
|
| 96 |
-
npm install
|
| 97 |
-
npm run dev
|
| 98 |
-
|
| 99 |
-
# terminal 2 — frontend (Vite on :5678)
|
| 100 |
-
cd frontend
|
| 101 |
-
npm install
|
| 102 |
-
npm run dev
|
| 103 |
-
```
|
| 104 |
-
|
| 105 |
-
Then open http://localhost:5678. Open a second tab or browser to see collaboration in action.
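
The dev proxy described above can be sketched as a small helper that builds a Vite `server.proxy` table. This is an illustration only: the `buildDevProxy` helper, the backend address, and the choice to enable WebSocket proxying only on `/collab` are assumptions, not the repository's actual `vite.config.ts`.

```typescript
// Sketch of a Vite `server.proxy` table for the two-process dev setup.
// `buildDevProxy` is a hypothetical helper; the real config may differ.
type ProxyRule = { target: string; changeOrigin: boolean; ws?: boolean };

// Path prefixes the README says are forwarded to the backend.
const PROXIED_PREFIXES = ["/api", "/collab", "/uploads", "/published", "/oauth", "/auth"];

function buildDevProxy(backend: string = "http://localhost:8080"): Record<string, ProxyRule> {
  const table: Record<string, ProxyRule> = {};
  for (const prefix of PROXIED_PREFIXES) {
    table[prefix] = {
      target: backend,
      changeOrigin: true,
      // Assumption: only the collaboration endpoint carries WebSocket traffic.
      ...(prefix === "/collab" ? { ws: true } : {}),
    };
  }
  return table;
}
```

Under these assumptions the table would be passed as `server: { port: 5678, proxy: buildDevProxy() }` in the Vite config.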
|
| 106 |
-
|
| 107 |
-
### Production (Docker / HF Spaces)
|
| 108 |
-
|
| 109 |
-
The `Dockerfile` builds both frontend and backend into a single image listening on port 8080. This is the image used by the Hugging Face Space.
|
| 110 |
-
|
| 111 |
-
```bash
|
| 112 |
-
docker build -t collab-editor .
|
| 113 |
-
docker run -p 8080:8080 --env-file backend/.env collab-editor
|
| 114 |
-
```
|
| 115 |
-
|
| 116 |
-
Then open http://localhost:8080.
|
| 117 |
-
|
| 118 |
-
### Run your own copy on a Hugging Face Space
|
| 119 |
-
|
| 120 |
-
Want your own editor? One step:
|
| 121 |
-
|
| 122 |
-
1. **Duplicate the Space.** On https://huggingface.co/spaces/tfrere/research-article-template-editor, click `⋯ → Duplicate this Space`. Pick your namespace and visibility. HF copies the Dockerfile, the OAuth wiring and rebuilds the image automatically.
|
| 123 |
-
|
| 124 |
-
That's it. No API key to wire up. The AI features (chat agent + embed studio) call **Hugging Face Inference Providers** at `https://router.huggingface.co/v1` using the OAuth token of whoever is currently logged in. As long as your duplicated Space requests the `inference-api` scope (already declared in the README frontmatter as `hf_oauth_scopes`), every editor gets AI for free under their own Inference Providers quota.
|
| 125 |
-
|
| 126 |
-
Optional public variable: `HF_INFERENCE_MODEL` (e.g. `meta-llama/Llama-3.3-70B-Instruct`) to override the default model id. The full list of supported chat-completion models lives at https://huggingface.co/models?inference_provider=all&other=conversational.
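
As a rough illustration of the token and model selection described above, the sketch below builds an OpenAI-compatible chat-completion request for the router. The helpers `pickToken` and `buildChatRequest` are hypothetical names, and the `/chat/completions` path is assumed from the OpenAI-compatible convention; the editor itself goes through the Vercel AI SDK rather than raw requests.

```typescript
// Sketch: build a chat-completion request for the HF Inference Providers router.
// `pickToken` and `buildChatRequest` are hypothetical helpers, not the repo's code.
const ROUTER_BASE_URL = "https://router.huggingface.co/v1";
const DEFAULT_MODEL = "openai/gpt-oss-120b"; // default named in the README

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Prefer the logged-in user's OAuth token; fall back to the server-side HF_TOKEN.
function pickToken(userToken?: string, serverToken?: string): string {
  const token = userToken || serverToken;
  if (!token) throw new Error("No Hugging Face token available");
  return token;
}

function buildChatRequest(messages: ChatMessage[], token: string, model: string = DEFAULT_MODEL) {
  return {
    url: `${ROUTER_BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

The returned `url` and `init` can be handed to `fetch(url, init)`; passing the value of `HF_INFERENCE_MODEL` as the third argument overrides the default model.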
|
| 127 |
-
|
| 128 |
-
## Scripts
|
| 129 |
-
|
| 130 |
-
### Backend (`cd backend`)
|
| 131 |
-
|
| 132 |
-
| Command | What it does |
|
| 133 |
-
|---|---|
|
| 134 |
-
| `npm run dev` | Start Express + Hocuspocus in watch mode |
|
| 135 |
-
| `npm run build` | Compile TypeScript to `dist/` |
|
| 136 |
-
| `npm start` | Run the compiled server |
|
| 137 |
-
| `npm run test` | Unit + integration tests (Vitest) |
|
| 138 |
-
| `npm run test:e2e` | End-to-end tests (Playwright) |
|
| 139 |
-
|
| 140 |
-
### Frontend (`cd frontend`)
|
| 141 |
-
|
| 142 |
-
| Command | What it does |
|
| 143 |
-
|---|---|
|
| 144 |
-
| `npm run dev` | Start Vite dev server on :5678 |
|
| 145 |
-
| `npm run build` | Production bundle to `dist/` |
|
| 146 |
-
| `npm run preview` | Preview the built bundle |
|
| 147 |
-
| `npm run test` | Unit tests (Vitest) |
|
| 148 |
-
| `npm run typecheck` | `tsc --noEmit` on the whole frontend |
|
| 149 |
-
|
| 150 |
-
## Environment variables
|
| 151 |
-
|
| 152 |
-
Copy `backend/.env.example` to `backend/.env` and fill in the relevant values. Key ones:
|
| 153 |
-
|
| 154 |
-
| Variable | Purpose |
|
| 155 |
-
|---|---|
|
| 156 |
-
| `OAUTH_CLIENT_ID` / `OAUTH_CLIENT_SECRET` | HF OAuth app for user login (required to edit when running on a Space) |
|
| 157 |
-
| `OAUTH_SCOPES` | OAuth scopes (default `openid profile`). Add `manage-repos` for dataset persistence and `inference-api` to power the AI features with the user's token |
|
| 158 |
-
| `HF_TOKEN` | Server-side Hugging Face token. Used as a fallback when no user OAuth token is present (e.g. local dev). Needs the `Make calls to Inference Providers` permission to enable the chat agent + embed studio |
|
| 159 |
-
| `HF_INFERENCE_MODEL` | Override the default chat-completion model id (defaults to `openai/gpt-oss-120b`). Any tool-calling-capable model exposed by HF Inference Providers works |
|
| 160 |
-
| `HF_DATASET_ID` | Target HF dataset repo for document persistence (when not running on a Space) |
|
| 161 |
-
| `SPACE_ID` / `SPACE_HOST` | Auto-set by HF Spaces; drive dataset id + secure cookies in production |
|
| 162 |
-
| `DATA_DIR` | Where documents, uploads and published bundles are stored on disk (default: `./data`) |
|
| 163 |
-
| `PUBLISH_BASE_URL` | Absolute base URL used when publishing (defaults to `http://127.0.0.1:${PORT}`) |
|
| 164 |
-
| `ENABLE_PDF` | Set to `false` to disable Puppeteer-based PDF export |
|
| 165 |
-
| `PORT` | Server port (default 8080) |
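
The defaults in the table above can be summarized in a small config loader. This is a minimal sketch, not the backend's actual code: `loadConfig` and `AppConfig` are hypothetical names, and splitting `OAUTH_SCOPES` on whitespace is an assumption based on the `openid profile` default.

```typescript
// Sketch: resolve the documented environment variables into a typed config.
// `loadConfig` and `AppConfig` are hypothetical; defaults mirror the table above.
type Env = Record<string, string | undefined>;

interface AppConfig {
  port: number;
  dataDir: string;
  publishBaseUrl: string;
  enablePdf: boolean;
  oauthScopes: string[];
  inferenceModel: string;
  hfToken: string | undefined;
  hfDatasetId: string | undefined;
}

function loadConfig(env: Env): AppConfig {
  const port = Number(env.PORT ?? 8080);
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    dataDir: env.DATA_DIR ?? "./data",
    // The publish URL defaults to the loopback address on the resolved port.
    publishBaseUrl: env.PUBLISH_BASE_URL ?? `http://127.0.0.1:${port}`,
    // PDF export stays on unless explicitly set to "false".
    enablePdf: env.ENABLE_PDF !== "false",
    // Assumption: scopes are whitespace-separated, as in the default value.
    oauthScopes: (env.OAUTH_SCOPES ?? "openid profile").split(/\s+/).filter(Boolean),
    inferenceModel: env.HF_INFERENCE_MODEL ?? "openai/gpt-oss-120b",
    hfToken: env.HF_TOKEN,
    hfDatasetId: env.HF_DATASET_ID,
  };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`.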
|
| 166 |
-
|
| 167 |
-
## Testing
|
| 168 |
-
|
| 169 |
-
- **Backend unit tests**: Vitest covers the publisher (HTML renderer, frontmatter, bibliography), storage, and auth utilities.
|
| 170 |
-
- **Backend E2E**: Playwright drives the full editor against a real backend.
|
| 171 |
-
- **Frontend unit tests**: Vitest covers chat persistence and a handful of utilities.
|
| 172 |
-
- **Type checking**: `npm run typecheck` in both workspaces.
|
| 173 |
-
|
| 174 |
-
See [`docs/TESTS.md`](docs/TESTS.md) for the current strategy and gaps.
|
| 175 |
-
|
| 176 |
-
## Known technical debt
|
| 177 |
-
|
| 178 |
-
These are tracked explicitly so new contributors don't trip on them:
|
| 179 |
-
|
| 180 |
-
- **`useEmbedChat` still lacks dedicated unit tests**; the rest of the stores (frontmatter, comments, embeds) and the agent undo batching primitive are now covered.
|
| 181 |
-
- **Bundle size warning**: the frontend bundle is over the 500 kB Vite warning threshold. Code-splitting the Mermaid / KaTeX / D3 stacks via dynamic imports would help.
|
| 182 |
-
- **`addToolOutput` typing**: the ai-sdk v6 `ChatAddToolOutputFunction` is a generic over the tool name union. We currently cast to a plain signature at the two call sites because we don't export a typed tool registry yet.
|
| 183 |
-
- **`backend/src/publisher/html-renderer.ts` is ~1000 LOC**: a per-node-type registry would make it more maintainable.
|
| 184 |
-
|
| 185 |
-
## License
|
| 186 |
-
|
| 187 |
-
This project follows the license of the upstream [research-article-template](https://github.com/huggingface/research-article-template).
|
|
|
|
| 1 |
---
|
| 2 |
+
title: Carbon tokenization
|
| 3 |
emoji: ✏️
|
| 4 |
colorFrom: purple
|
| 5 |
colorTo: blue
|
|
|
|
| 11 |
- manage-repos
|
| 12 |
- inference-api
|
| 13 |
---