carbon-tokenization


tfrere (HF Staff) committed on May 20

Commit 3557bec (verified)

1 Parent(s): 50f7a20

Update README.md


Files changed (1)

README.md +1 -175


@@ -1,5 +1,5 @@
 ---
-title: Research Article Template Editor
 emoji: ✏️
 colorFrom: purple
 colorTo: blue
@@ -11,177 +11,3 @@ hf_oauth_scopes:
   - manage-repos
   - inference-api
 ---
-# Research Article Template Editor
-A collaborative, real-time editor for web-native scientific articles. It lets multiple authors co-write a paper with rich text, math, citations, figures and interactive D3 embeds, then publishes the result as a static HTML page (or a PDF) aligned with the [research-article-template](https://github.com/huggingface/research-article-template).
-## What it gives you
-- **Real-time collaboration** over WebSocket (Y.js + Hocuspocus), with visible cursors and per-user selection colors
-- **Rich article authoring**: headings, lists, tables, code blocks with syntax highlighting, LaTeX math (KaTeX), footnotes, sidenotes, block quotes, callouts
-- **Research-specific blocks**: citations + bibliography (BibTeX), figures with captions, stacks / wide / full-width layouts, glossary terms, Mermaid/Wardley/architecture diagrams
-- **Interactive D3 embeds** authored inline: each embed is a self-contained HTML file the editor can generate and iterate on via an **AI-assisted "embed studio"**
-- **Comments & discussion** anchored on any selection
-- **Slash menu** (`/`) and drag/drop block handles, in the spirit of Notion
-- **Click-to-edit frontmatter**: title, subtitle, authors, affiliations, links, banner color
-- **Publishing pipeline**: one-click export to a standalone static HTML bundle, plus PDF generation (Puppeteer) and an `llms.txt` Markdown twin for LLM agents/crawlers (served at `/llms.txt`, advertised in `/robots.txt`)
-- **Persistence**:
-  - Local mode: documents stored on disk under `DATA_DIR`
-  - HF mode: documents pushed/pulled from a Hugging Face dataset via OAuth
-- **Dark mode**, responsive layout (TOC drawer on mobile), live table of contents with scroll-spy
-- **AI chat side-panel** that can edit the article via structured tool calls (agent loop over the current TipTap doc)
-## Stack
-| Layer | Tech |
-|---|---|
-| Editor | React 18, TypeScript, TipTap v3, ProseMirror |
-| Collaboration | Y.js, Hocuspocus (WebSocket), y-tiptap |
-| Backend | Node.js, Express, Vite (dev proxy), Hocuspocus server |
-| Publishing | Custom TipTap-JSON → HTML renderer, Puppeteer for PDF |
-| AI | Vercel AI SDK v6 (`ai`, `@ai-sdk/react`) → Hugging Face Inference Providers (OpenAI-compatible router) |
-| Styling | Plain CSS with custom properties, no framework |
-| Storage | Local FS or Hugging Face datasets (via `@huggingface/hub`) |
-| Container | Single-image Docker build, runs on port 8080 |
-Around **3.6k LOC backend** and **9.5k LOC frontend** (TypeScript/TSX, excluding generated code).
-## Repo layout
-```
-collab-editor/
-├── backend/              # Express + Hocuspocus server, publisher, AI agent routes
-│   └── src/
-│       ├── server.ts             # Entry point
-│       ├── create-app.ts         # App factory (routes, middleware, Hocuspocus)
-│       ├── publisher/            # TipTap-JSON → HTML + PDF
-│       ├── agent/                # LLM agent (tool calls over the doc)
-│       ├── shared/               # Component defs shared with the frontend
-│       └── hf-storage.ts         # HF dataset sync
-├── frontend/             # Vite + React + TipTap editor
-│   └── src/
-│       ├── App.tsx               # Top-level shell
-│       ├── editor/               # TipTap editor + extensions + components
-│       ├── components/           # Shared UI pieces (TOC, Chat, Dialog, ...)
-│       ├── hooks/                # React hooks (agent chat, selection, ...)
-│       ├── styles/               # CSS layers (see docs/ARCHITECTURE.md)
-│       └── utils/
-├── docs/
-│   ├── ARCHITECTURE.md           # Deep dive on layers, data flow, CSS
-│   ├── SPECIFICATION.md          # Feature spec and contracts
-│   ├── TESTS.md                  # Testing strategy
-│   └── embed-studio.md           # How the AI-authored embeds pipeline works
-└── Dockerfile            # Production multi-stage build
-```
-See [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) for a diagram and the full tour.
-## Getting started
-### Prerequisites
-- Node.js 20+
-- A Hugging Face token with the `Make calls to Inference Providers` permission for the AI features (embed studio, chat agent). Generate one at https://huggingface.co/settings/tokens. On a HF Space the logged-in user's OAuth token is used instead - no manual setup needed.
-- A Hugging Face OAuth app (client id/secret) if you want login + HF dataset persistence
-### Local development
-Backend and frontend run as two separate processes in dev (Vite proxies `/api`, `/collab`, `/uploads`, `/published`, `/oauth`, `/auth` to the backend).
-```bash
-# terminal 1 — backend (Express + Hocuspocus on :8080)
-cd backend
-cp .env.example .env          # set HF_TOKEN, optional OAUTH_* and HF_DATASET_ID
-npm install
-npm run dev
-# terminal 2 — frontend (Vite on :5678)
-cd frontend
-npm install
-npm run dev
-```
-Then open http://localhost:5678. Open a second tab or browser to see collaboration in action.
-### Production (Docker / HF Spaces)
-The `Dockerfile` builds both frontend and backend into a single image listening on port 8080. This is the image used by the Hugging Face Space.
-```bash
-docker build -t collab-editor .
-docker run -p 8080:8080 --env-file backend/.env collab-editor
-```
-Then open http://localhost:8080.
-### Run your own copy on a Hugging Face Space
-Want your own editor? One step:
-1. **Duplicate the Space.** On https://huggingface.co/spaces/tfrere/research-article-template-editor, click `⋯ → Duplicate this Space`. Pick your namespace and visibility. HF copies the Dockerfile, the OAuth wiring and rebuilds the image automatically.
-That's it. No API key to wire up. The AI features (chat agent + embed studio) call **Hugging Face Inference Providers** at `https://router.huggingface.co/v1` using the OAuth token of whoever is currently logged in. As long as your duplicated Space requests the `inference-api` scope (already declared in the README frontmatter as `hf_oauth_scopes`), every editor gets AI for free under their own Inference Providers quota.
-Optional public variable: `HF_INFERENCE_MODEL` (e.g. `meta-llama/Llama-3.3-70B-Instruct`) to override the default model id. The full list of supported chat-completion models lives at https://huggingface.co/models?inference_provider=all&other=conversational.
-## Scripts
-### Backend (`cd backend`)
-| Command | What it does |
-|---|---|
-| `npm run dev` | Start Express + Hocuspocus in watch mode |
-| `npm run build` | Compile TypeScript to `dist/` |
-| `npm start` | Run the compiled server |
-| `npm run test` | Unit + integration tests (Vitest) |
-| `npm run test:e2e` | End-to-end tests (Playwright) |
-### Frontend (`cd frontend`)
-| Command | What it does |
-|---|---|
-| `npm run dev` | Start Vite dev server on :5678 |
-| `npm run build` | Production bundle to `dist/` |
-| `npm run preview` | Preview the built bundle |
-| `npm run test` | Unit tests (Vitest) |
-| `npm run typecheck` | `tsc --noEmit` on the whole frontend |
-## Environment variables
-Copy `backend/.env.example` to `backend/.env` and fill the relevant values. Key ones:
-| Variable | Purpose |
-|---|---|
-| `OAUTH_CLIENT_ID` / `OAUTH_CLIENT_SECRET` | HF OAuth app for user login (required to edit when running on a Space) |
-| `OAUTH_SCOPES` | OAuth scopes (default `openid profile`). Add `manage-repos` for dataset persistence and `inference-api` to power the AI features with the user's token |
-| `HF_TOKEN` | Server-side Hugging Face token. Used as a fallback when no user OAuth token is present (e.g. local dev). Needs the `Make calls to Inference Providers` permission to enable the chat agent + embed studio |
-| `HF_INFERENCE_MODEL` | Override the default chat-completion model id (defaults to `openai/gpt-oss-120b`). Any tool-calling-capable model exposed by HF Inference Providers works |
-| `HF_DATASET_ID` | Target HF dataset repo for document persistence (when not running on a Space) |
-| `SPACE_ID` / `SPACE_HOST` | Auto-set by HF Spaces; drive dataset id + secure cookies in production |
-| `DATA_DIR` | Where documents, uploads and published bundles are stored on disk (default: `./data`) |
-| `PUBLISH_BASE_URL` | Absolute base URL used when publishing (defaults to `http://127.0.0.1:${PORT}`) |
-| `ENABLE_PDF` | Set to `false` to disable Playwright-based PDF export |
-| `PORT` | Server port (default 8080) |
-## Testing
-- **Backend unit tests**: Vitest covers the publisher (HTML renderer, frontmatter, bibliography), storage, auth utilities.
-- **Backend E2E**: Playwright drives the full editor against a real backend.
-- **Frontend unit tests**: Vitest covers chat persistence and a handful of utilities.
-- **Type checking**: `npm run typecheck` in both workspaces.
-See [`docs/TESTS.md`](docs/TESTS.md) for the current strategy and gaps.
-## Known technical debt
-These are tracked explicitly so new contributors don't trip on them:
-- **`useEmbedChat` still lacks dedicated unit tests**; the rest of the stores (frontmatter, comments, embeds) and the agent undo batching primitive are now covered.
-- **Bundle size warning**: the frontend bundle is over the 500 kB Vite warning threshold. Code-splitting the Mermaid / KaTeX / D3 stacks via dynamic imports would help.
-- **`addToolOutput` typing**: the ai-sdk v6 `ChatAddToolOutputFunction` is a generic over the tool name union. We currently cast to a plain signature at the two call sites because we don't export a typed tool registry yet.
-- **`backend/src/publisher/html-renderer.ts` is ~1000 LOC**: a per-node-type registry would make it more maintainable.
-## License
-Follow the upstream [research-article-template](https://github.com/huggingface/research-article-template) license.

 ---
+title: Carbon tokenization
 emoji: ✏️
 colorFrom: purple
 colorTo: blue
   - manage-repos
   - inference-api
 ---
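
For readers who still need the defaults documented in the removed "Environment variables" table, the sketch below shows one way they could be resolved. It is a minimal illustration, not the project's actual code: `EditorConfig` and `resolveConfig` are hypothetical names, and only the variable names and default values come from the removed README.

```typescript
// Hypothetical shape for the settings listed in the removed README table.
interface EditorConfig {
  port: number;
  dataDir: string;
  publishBaseUrl: string;
  inferenceModel: string;
  enablePdf: boolean;
}

// Hypothetical helper: applies the defaults stated in the removed README.
// Takes a plain record so it can be called with process.env or a test fixture.
function resolveConfig(env: Record<string, string | undefined>): EditorConfig {
  // PORT defaults to 8080.
  const port = Number(env.PORT ?? "8080");
  return {
    port,
    // DATA_DIR defaults to ./data.
    dataDir: env.DATA_DIR ?? "./data",
    // PUBLISH_BASE_URL defaults to http://127.0.0.1:${PORT}.
    publishBaseUrl: env.PUBLISH_BASE_URL ?? `http://127.0.0.1:${port}`,
    // HF_INFERENCE_MODEL defaults to openai/gpt-oss-120b.
    inferenceModel: env.HF_INFERENCE_MODEL ?? "openai/gpt-oss-120b",
    // PDF export stays enabled unless ENABLE_PDF is exactly "false".
    enablePdf: env.ENABLE_PDF !== "false",
  };
}
```

Calling `resolveConfig({ PORT: "9000" })` would derive the publish base URL from the overridden port, which mirrors the `${PORT}` interpolation described in the table.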