Find Architecture
This page covers the implementation details behind PinchTab's semantic find pipeline.
Overview
The find system converts accessibility snapshot nodes into lightweight descriptors, scores them against a natural-language query, and returns the best matching ref.
The implementation is designed to stay:
- local
- fast
- dependency-light
- recoverable after page re-renders
Pipeline
```
accessibility snapshot
  -> element descriptors
  -> lexical matcher
  -> embedding matcher
  -> combined score
  -> best ref
  -> intent cache / recovery hooks
```
Element Descriptors
Each accessibility node is converted into a descriptor with:
- ref
- role
- name
- value
Those fields are also combined into a composite string used for matching.
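As a rough sketch of that shape (the field set comes from the page; the type name, struct layout, and join logic are illustrative, not PinchTab's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// Descriptor is a hypothetical sketch of the per-node fields described
// above; the real PinchTab type may differ.
type Descriptor struct {
	Ref   string // stable handle back into the snapshot's ref cache
	Role  string // accessibility role, e.g. "button"
	Name  string // accessible name
	Value string // current value, if any
}

// Composite joins the matchable fields into the single string used for
// lexical and embedding scoring. Empty fields are skipped.
func (d Descriptor) Composite() string {
	parts := []string{d.Role, d.Name, d.Value}
	nonEmpty := parts[:0]
	for _, p := range parts {
		if p != "" {
			nonEmpty = append(nonEmpty, p)
		}
	}
	return strings.Join(nonEmpty, " ")
}

func main() {
	d := Descriptor{Ref: "e12", Role: "button", Name: "Submit"}
	fmt.Println(d.Composite()) // button Submit
}
```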
Matchers
PinchTab currently uses a combined matcher built from:
- a lexical matcher
- an embedding matcher based on a hashing embedder
Default weighting is:
0.6 lexical + 0.4 embedding
Per-request overrides exist through lexicalWeight and embeddingWeight.
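The weighting itself is a plain linear blend. A minimal sketch, assuming the weights map directly onto the two override parameters:

```go
package main

import "fmt"

// combinedScore blends the two matcher scores. The defaults from the
// page are lexicalWeight=0.6, embeddingWeight=0.4; per-request overrides
// replace them.
func combinedScore(lexical, embedding, lexicalWeight, embeddingWeight float64) float64 {
	return lexicalWeight*lexical + embeddingWeight*embedding
}

func main() {
	// An element strong lexically but weak on the embedding side.
	fmt.Printf("%.2f\n", combinedScore(0.9, 0.2, 0.6, 0.4)) // 0.62
}
```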
Lexical Side
The lexical matcher focuses on exact and near-exact token overlap, including role-aware matching behavior.
Useful properties:
- strong for exact words
- easy to reason about
- good precision on explicit queries like "submit button"
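An illustrative token-overlap score (not PinchTab's actual algorithm, which is also role-aware): the fraction of query tokens that appear in the descriptor's composite text.

```go
package main

import (
	"fmt"
	"strings"
)

// lexicalScore is a toy token-overlap measure: the fraction of query
// tokens present in the target text, case-insensitive. A role-aware
// matcher would additionally boost elements whose role matches a query
// token like "button".
func lexicalScore(query, text string) float64 {
	qTokens := strings.Fields(strings.ToLower(query))
	if len(qTokens) == 0 {
		return 0
	}
	tTokens := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(text)) {
		tTokens[t] = true
	}
	hits := 0
	for _, q := range qTokens {
		if tTokens[q] {
			hits++
		}
	}
	return float64(hits) / float64(len(qTokens))
}

func main() {
	fmt.Println(lexicalScore("submit button", "button Submit order")) // 1
}
```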
Embedding Side
The embedding matcher uses a feature-hashing approach rather than an external ML model.
Useful properties:
- catches fuzzy similarity
- handles partial and sub-word overlap better
- has no model download or network dependency
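The core idea can be sketched in a few lines: hash tokens into buckets of a fixed-size vector and compare with cosine similarity. The dimension count, hash choice, and token granularity here are invented for illustration; the real embedder likely differs (e.g. hashing character n-grams to get the sub-word behavior mentioned above).

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

const dims = 64 // small for illustration; a real embedder would use more

// embed is a toy feature-hashing embedder: each token increments the
// bucket its hash lands in. No model download, no network dependency.
func embed(text string) [dims]float64 {
	var v [dims]float64
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New32a()
		h.Write([]byte(tok))
		v[h.Sum32()%dims]++
	}
	return v
}

// cosine computes cosine similarity between two hashed vectors.
func cosine(a, b [dims]float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	q := embed("submit order")
	t := embed("order submit button")
	fmt.Printf("%.2f\n", cosine(q, t))
}
```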
Combined Matching
The combined matcher runs lexical and embedding scoring concurrently, merges results by element ref, and applies the weighted final score.
It also applies a lower internal threshold before the final merge, so that candidates that are strong on only one side are not discarded too early.
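A sketch of that merge step, using the default 0.6/0.4 weighting from above; the threshold values and types here are invented for the example:

```go
package main

import "fmt"

// scored pairs an element ref with one matcher's score.
type scored struct {
	Ref   string
	Score float64
}

// mergeByRef joins per-matcher results by ref and applies the weighted
// final score. A missing side simply contributes 0, which is why a low
// pre-merge cutoff matters: one-sided candidates survive to this point.
func mergeByRef(lexical, embedding []scored, finalThreshold float64) (string, float64, bool) {
	lex := map[string]float64{}
	for _, s := range lexical {
		lex[s.Ref] = s.Score
	}
	emb := map[string]float64{}
	for _, s := range embedding {
		emb[s.Ref] = s.Score
	}
	seen := map[string]bool{}
	for ref := range lex {
		seen[ref] = true
	}
	for ref := range emb {
		seen[ref] = true
	}
	bestRef, bestScore := "", -1.0
	for ref := range seen {
		score := 0.6*lex[ref] + 0.4*emb[ref]
		if score > bestScore {
			bestRef, bestScore = ref, score
		}
	}
	if bestScore < finalThreshold {
		return "", 0, false
	}
	return bestRef, bestScore, true
}

func main() {
	// "e7" is strong only lexically; a high pre-merge cutoff could have
	// dropped it before the weighted score was computed.
	lex := []scored{{"e7", 0.9}, {"e3", 0.4}}
	emb := []scored{{"e3", 0.5}, {"e7", 0.3}}
	ref, score, ok := mergeByRef(lex, emb, 0.3)
	fmt.Printf("%s %.2f %v\n", ref, score, ok) // e7 0.66 true
}
```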
Snapshot Dependency
The find handler depends on the same accessibility snapshot/ref-cache infrastructure used by snapshot-driven interaction.
If a cached snapshot is missing, the handler tries to refresh it automatically before giving up.
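The refresh-on-miss fallback looks roughly like this; the store type and error value are stand-ins for the real ref-cache infrastructure:

```go
package main

import (
	"errors"
	"fmt"
)

var errNoSnapshot = errors.New("no cached snapshot")

// snapshotStore is a stand-in for PinchTab's ref-cache; get returns the
// cached node list or a miss, and refresh re-captures the snapshot.
type snapshotStore struct{ cached []string }

func (s *snapshotStore) get() ([]string, error) {
	if s.cached == nil {
		return nil, errNoSnapshot
	}
	return s.cached, nil
}

func (s *snapshotStore) refresh() { s.cached = []string{"e1", "e2"} }

// snapshotForFind mirrors the described behavior: use the cached
// snapshot, and on a miss refresh once before giving up.
func snapshotForFind(s *snapshotStore) ([]string, error) {
	nodes, err := s.get()
	if errors.Is(err, errNoSnapshot) {
		s.refresh()
		return s.get()
	}
	return nodes, err
}

func main() {
	s := &snapshotStore{}
	nodes, err := snapshotForFind(s)
	fmt.Println(len(nodes), err) // 2 <nil>
}
```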
Intent Cache And Recovery
After a successful match, PinchTab records:
- the original query
- the matched descriptor
- score/confidence metadata
This allows recovery logic to attempt a semantic re-match if a later action fails because the old ref became stale after a page update.
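A sketch of how that cached record can drive recovery; the record shape follows the fields listed above, while the acceptance tolerance and function names are invented for the example:

```go
package main

import "fmt"

// intentRecord is a hypothetical shape for the cached match metadata:
// the original query, the matched descriptor's composite text, and the
// confidence of the original match.
type intentRecord struct {
	Query      string
	Descriptor string
	Confidence float64
}

// recoverRef re-runs the semantic match when an action fails on a stale
// ref. matchFn stands in for the combined matcher; the 0.8 tolerance on
// the original confidence is an invented heuristic for this sketch.
func recoverRef(rec intentRecord, matchFn func(query string) (string, float64)) (string, bool) {
	ref, score := matchFn(rec.Query)
	if score >= rec.Confidence*0.8 {
		return ref, true
	}
	return "", false
}

func main() {
	rec := intentRecord{Query: "submit button", Descriptor: "button Submit", Confidence: 0.9}
	ref, ok := recoverRef(rec, func(q string) (string, float64) { return "e42", 0.85 })
	fmt.Println(ref, ok) // e42 true
}
```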
Orchestrator Routing
The orchestrator exposes POST /tabs/{id}/find and proxies it to the correct running instance. The actual matching implementation remains in the shared handler layer.
Design Constraints
The current design intentionally avoids:
- external embedding services
- heavyweight model dependencies
- selector-first coupling
That keeps the system portable and fast, but it also means the quality ceiling is bounded by the in-process matcher design and the quality of the accessibility snapshot.
Performance
Benchmarks on Intel i5-4300U @ 1.90GHz:
| Operation | Elements | Latency | Allocations |
|---|---|---|---|
| Lexical Find | 16 | ~71 us | 134 allocs |
| HashingEmbedder (single) | 1 | ~11 us | 3 allocs |
| HashingEmbedder (batch) | 16 | ~171 us | 49 allocs |
| Embedding Find | 16 | ~180 us | 98 allocs |
| Combined Find | 16 | ~233 us | 263 allocs |
| Combined Find | 100 | ~1.5 ms | 1685 allocs |