Local OCR · Chrome extension · MIT

Grab text from anything on screen.

Select any region of a page — a code block, a paragraph, a formula, a table — and OCR Buddy reads it locally. It shows you the source, flags what it isn't sure about, and never sends a pixel to a server.

Runs on-device · No account · Never invents text
[Demo: a paragraph is selected on a documentation page (docs.internal.dev/streaming). The OCR Buddy panel (WebGPU) shows the captured region beside the extracted text, with 2 words flagged to verify against the source.]
100% on-device · WebGPU accelerated · Never invents text · Free & open source (MIT)
Why it's different

Classic OCR. Not generative OCR.

Big vision-language models top the benchmarks, then invent fluent, plausible, wrong text the moment the pixels get unclear. For code, numbers, IDs or prices, a confidently wrong transcription is worse than none. OCR Buddy makes the opposite bet.

Generative OCR

Predicts the next likely token
  • Falls back on a language prior when the image is ambiguous
  • Writes something that reads well but isn't there
  • Far too heavy to run inside a browser tab

Hallucination here is architectural — not a bug you can prompt away.

OCR Buddy

Detection + CTC recognition
  • Has no language prior — it transcribes the glyphs that are actually present
  • When it can't read something, it degrades to blanks or low-confidence characters, never to a made-up sentence
  • Small and fast enough to run comfortably in the browser
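The "no language prior" point is easiest to see in the decoding step itself. Below is a minimal sketch of greedy CTC decoding. It is not PP-OCRv5's actual post-processing. It assumes the recognizer yields one probability vector per time step, that class 0 is the blank, and that `charset` (a hypothetical name) maps the remaining classes to characters. Every emitted character comes from the image's own time steps, and there is no step where a likelier word could be substituted.

```javascript
// Minimal greedy CTC decoder (illustrative sketch, not PP-OCRv5's own code).
// Assumptions: `probs` holds one array of class probabilities per time step,
// class 0 is the CTC blank, and charset[i - 1] is the character for class i.
function ctcGreedyDecode(probs, charset) {
  const chars = [];
  const scores = [];
  let prev = -1;
  for (const step of probs) {
    // Most probable class at this time step (argmax).
    let best = 0;
    for (let c = 1; c < step.length; c++) {
      if (step[c] > step[best]) best = c;
    }
    // Emit when the class is not blank and differs from the previous step.
    // A blank between two identical classes lets a real double letter through.
    if (best !== 0 && best !== prev) {
      chars.push(charset[best - 1]);
      scores.push(step[best]);
    }
    prev = best;
  }
  return { text: chars.join(""), scores };
}

// Usage: classes are [blank, "a", "b"].
const charset = ["a", "b"];
const probs = [
  [0.1, 0.8, 0.1], // "a"
  [0.1, 0.7, 0.2], // "a" again: collapsed
  [0.9, 0.05, 0.05], // blank
  [0.2, 0.7, 0.1], // "a" after a blank: emitted
  [0.1, 0.1, 0.8], // "b"
];
console.log(ctcGreedyDecode(probs, charset).text); // "aab"
```

An input where every step is most likely blank decodes to an empty string. That is the "degrades to blanks" behavior, and the per-character `scores` are the raw material for low-confidence flags.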

In-browser and no-hallucination aren't a tradeoff — both constraints select the same stack: PP-OCRv5 on ONNX Runtime Web.

How it works

The whole pipeline, on your device.

A drag-select hands a clean crop to a warm (already loaded) OCR engine running in an offscreen worker. The extension coordinates every step locally; nothing is uploaded.

01

Select a region

Drag over text on the page. The overlay is passive — it never reads page content.

02

Captured cleanly

A composited screenshot is cropped on an offscreen canvas — even from a paused cross-origin video.

03

Recognized locally

PP-OCRv5 runs on ONNX Runtime Web — WebGPU when available, multi-threaded WASM as fallback.

04

Verify & copy

The crop sits beside the text; low-confidence words are flagged. Edit, then copy.
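Step 03's backend choice can be sketched as two small pure functions. The names `pickExecutionProviders` and `pickWasmThreads` and the thread cap of 4 are hypothetical. The only platform facts assumed are that `navigator.gpu` is the WebGPU entry point, that `navigator.hardwareConcurrency` reports logical cores, and that multi-threaded WASM needs `SharedArrayBuffer`.

```javascript
// Sketch of step 03's backend choice. Function names and the thread cap are
// hypothetical; `nav` is a Navigator-like object (navigator in a worker).
function pickExecutionProviders(nav) {
  // navigator.gpu exists only where the browser exposes WebGPU.
  const hasWebGPU = Boolean(nav && nav.gpu);
  // WebGPU first when present; WASM is always kept as the fallback.
  return hasWebGPU ? ["webgpu", "wasm"] : ["wasm"];
}

function pickWasmThreads(nav, cap = 4) {
  // Multi-threaded WASM needs SharedArrayBuffer; without it, use one thread.
  if (typeof SharedArrayBuffer === "undefined") return 1;
  const cores = (nav && nav.hardwareConcurrency) || 1;
  return Math.max(1, Math.min(cores, cap));
}

console.log(pickExecutionProviders({ gpu: {} })); // [ 'webgpu', 'wasm' ]
console.log(pickExecutionProviders({})); // [ 'wasm' ]
```

In a worker, the returned list would be passed to ONNX Runtime Web as `ort.InferenceSession.create(modelUrl, { executionProviders })`, with `ort.env.wasm.numThreads` set from the second function beforehand. The presence of `navigator.gpu` does not guarantee a usable adapter. A stricter check would also await `navigator.gpu.requestAdapter()` and treat `null` as "no WebGPU".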

Three modes

Pick how a region should be read.

And change your mind after capturing — the “Read as” switcher re-runs a different mode on the same crop, no re-selecting.

Text / Code

Code, prose, or any text

Inter-word spacing and blank lines are reconstructed from box geometry — the recognizer emits no space token, so layout is rebuilt, not guessed. A Code view restores indentation and syntax-highlights it.
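Because the recognizer emits no space token, spacing has to come from the boxes. Here is a minimal sketch. It assumes a hypothetical box shape `{ text, x0, x1 }` and roughly monospaced glyphs, which holds for most code. The approach is to estimate one glyph width, then convert horizontal gaps into spaces, the left offset into indentation, and vertical gaps into blank lines.

```javascript
// Sketch: rebuild spaces, indentation and blank lines from box geometry.
// Assumptions: boxes look like { text, x0, x1 } in pixels (a hypothetical
// shape), all boxes in one call share a text line, and glyphs are roughly
// monospaced, as in code.
function glyphWidth(boxes) {
  // Median per-character width across boxes resists odd outliers.
  const widths = boxes
    .map((b) => (b.x1 - b.x0) / b.text.length)
    .sort((a, b) => a - b);
  return widths[Math.floor(widths.length / 2)];
}

function rebuildLine(boxes, leftEdge, gw) {
  const sorted = [...boxes].sort((a, b) => a.x0 - b.x0);
  // Indentation: offset from the region's left edge, in glyph widths.
  let out = " ".repeat(Math.max(0, Math.round((sorted[0].x0 - leftEdge) / gw)));
  sorted.forEach((b, i) => {
    if (i > 0) {
      // Each gap becomes as many spaces as glyph widths it spans.
      const gap = b.x0 - sorted[i - 1].x1;
      out += " ".repeat(Math.max(0, Math.round(gap / gw)));
    }
    out += b.text;
  });
  return out;
}

function blankLinesBetween(prevBottom, nextTop, lineHeight) {
  // Normal leading rounds to 0; each extra line height becomes a blank line.
  return Math.max(0, Math.round((nextTop - prevBottom) / lineHeight));
}

const line = [
  { text: "chunk", x0: 100, x1: 150 },
  { text: "const", x0: 40, x1: 90 },
  { text: "=", x0: 160, x1: 170 },
];
console.log(JSON.stringify(rebuildLine(line, 20, glyphWidth(line)))); // "  const chunk ="
```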

Code view
while (queue.length > maxInflight) {
  const chunk = queue.shift();
  device.submit(chunk);
}
Formula → LaTeX

One equation, into LaTeX

The one place a generative model is unavoidable — so the guardrail is visual. The LaTeX is rendered with KaTeX right beside the source crop; if it can't render, OCR Buddy abstains and shows the image.

Rendered · verify against crop
\mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
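The abstain rule can be sketched as a thin wrapper around a renderer. KaTeX's `renderToString(latex, options)` throws on LaTeX it cannot parse when `throwOnError: true`. The sketch injects the renderer as a parameter, a hypothetical design choice, so that it runs without KaTeX installed. `fakeRender` is a stand-in for this illustration only.

```javascript
// Sketch of Formula mode's "render or abstain" guardrail.
// Assumption: `render` behaves like KaTeX's renderToString(latex, options),
// throwing on LaTeX it cannot parse when throwOnError is true. Injecting it
// (a hypothetical design choice) keeps the sketch runnable without KaTeX.
function renderOrAbstain(latex, render) {
  if (typeof latex !== "string" || latex.trim() === "") {
    // Nothing decoded: abstain and show only the source crop.
    return { ok: false, reason: "empty" };
  }
  try {
    const html = render(latex, { throwOnError: true, displayMode: true });
    return { ok: true, html };
  } catch (err) {
    // Unrenderable output is never shown as if it were valid LaTeX.
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  }
}

// Stand-in renderer for this demo only: rejects unbalanced braces.
function fakeRender(latex) {
  let depth = 0;
  for (const ch of latex) {
    if (ch === "{") depth++;
    if (ch === "}" && --depth < 0) throw new Error("unbalanced }");
  }
  if (depth !== 0) throw new Error("unclosed {");
  return `<span class="math">${latex}</span>`;
}

console.log(renderOrAbstain("\\frac{QK^{\\top}}{\\sqrt{d_k}}", fakeRender).ok); // true
console.log(renderOrAbstain("\\frac{QK^{\\top}{\\sqrt{d_k}}", fakeRender).ok); // false
```

In the extension this would be called as `renderOrAbstain(latex, katex.renderToString)`. When `ok` is false, only the source crop is shown.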
Table → Markdown

One table, into a grid

Rebuilt by pure geometry from the word boxes — rows by vertical position, columns from an x-coverage profile. Because it keys off alignment, not ruled lines, it handles borderless tables too.

Markdown table
| Model | Size | License |
| --- | --- | --- |
| PP-OCRv5 det | 4.7 MB | Apache-2.0 |
| Latin rec | 8 MB | Apache-2.0 |
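The geometric rebuild can be sketched in three steps. First, group boxes into rows by vertical center. Second, mark which x positions any box covers. Third, cut columns at wide uncovered runs. The box shape `{ text, x0, x1, y0, y1 }`, the `minGap` default, and the first-row-is-header rule are assumptions for illustration. Nothing in the sketch looks for ruled lines, which is why borderless tables work.

```javascript
// Sketch: rebuild a Markdown table from word boxes by geometry alone.
// Assumptions: boxes look like { text, x0, x1, y0, y1 } in integer pixels
// (a hypothetical shape), the first row is the header, and minGap is a guess.
function tableToMarkdown(boxes, minGap = 8) {
  // 1) Rows: boxes whose vertical centers lie within half a line height join a row.
  const cy = (b) => (b.y0 + b.y1) / 2;
  const sorted = [...boxes].sort((a, b) => cy(a) - cy(b));
  const heights = sorted.map((b) => b.y1 - b.y0).sort((a, b) => a - b);
  const rowTol = heights[Math.floor(heights.length / 2)] / 2;
  const rows = [];
  for (const b of sorted) {
    const row = rows[rows.length - 1];
    if (row && cy(b) - row.center <= rowTol) row.items.push(b);
    else rows.push({ center: cy(b), items: [b] });
  }

  // 2) Columns: an x-coverage profile; wide uncovered runs become column cuts.
  const minX = Math.min(...boxes.map((b) => b.x0));
  const maxX = Math.max(...boxes.map((b) => b.x1));
  const covered = new Array(maxX - minX).fill(false);
  for (const b of boxes) {
    for (let x = b.x0; x < b.x1; x++) covered[x - minX] = true;
  }
  const cuts = [];
  let runStart = -1;
  for (let i = 0; i <= covered.length; i++) {
    const empty = i < covered.length && !covered[i];
    if (empty && runStart < 0) runStart = i;
    if (!empty && runStart >= 0) {
      if (i - runStart >= minGap) cuts.push(minX + (runStart + i) / 2);
      runStart = -1;
    }
  }

  // 3) Cells: each box goes to the column containing its x center.
  const colOf = (b) => cuts.filter((c) => (b.x0 + b.x1) / 2 > c).length;
  const grid = rows.map((r) => {
    const cells = Array.from({ length: cuts.length + 1 }, () => []);
    for (const b of [...r.items].sort((p, q) => p.x0 - q.x0)) cells[colOf(b)].push(b.text);
    return cells.map((words) => words.join(" "));
  });
  const toLine = (cells) => "| " + cells.join(" | ") + " |";
  return [toLine(grid[0]), toLine(grid[0].map(() => "---")), ...grid.slice(1).map(toLine)].join("\n");
}

const boxes = [
  { text: "Model", x0: 0, x1: 50, y0: 0, y1: 10 },
  { text: "Size", x0: 100, x1: 130, y0: 0, y1: 10 },
  { text: "License", x0: 180, x1: 240, y0: 0, y1: 10 },
  { text: "Latin", x0: 0, x1: 30, y0: 20, y1: 30 },
  { text: "rec", x0: 35, x1: 50, y0: 20, y1: 30 },
  { text: "8", x0: 100, x1: 105, y0: 20, y1: 30 },
  { text: "MB", x0: 108, x1: 125, y0: 20, y1: 30 },
  { text: "Apache-2.0", x0: 180, x1: 240, y0: 20, y1: 30 },
];
console.log(tableToMarkdown(boxes));
```

The small gap between "Latin" and "rec" stays below `minGap`, so those words share a cell. The wide empty runs between columns become the cuts.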
Faithful by design

You always see the source.

Anti-hallucination isn't a tagline — it's the feature set. The captured crop sits right above the extracted text, the cheapest possible check. If the model isn't confident about a word, it says so instead of guessing.

  • Source crop shown beside the result, every time
  • Per-word confidence — low scores underlined, never silently trusted
  • A blank or ambiguous region yields empty output — never invented filler
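Per-word flagging can be sketched as a reduction over per-character confidences. The input shape, the use of the minimum rather than the mean, and the 0.8 threshold are all assumptions, not OCR Buddy's documented settings.

```javascript
// Sketch: per-word confidence from per-character scores, with flagging.
// Assumptions: words arrive as { text, charScores } (e.g. from a CTC decode);
// scoring by the minimum and the 0.8 threshold are illustrative choices.
function flagWords(words, threshold = 0.8) {
  return words.map(({ text, charScores }) => {
    // The weakest glyph decides: one unreadable character makes the word suspect.
    const score = charScores.length > 0 ? Math.min(...charScores) : 0;
    return { text, score, flagged: score < threshold };
  });
}

const result = flagWords([
  { text: "the", charScores: [0.99, 0.98, 0.97] },
  { text: "GPU", charScores: [0.9, 0.55, 0.95] },
]);
console.log(result.filter((w) => w.flagged).map((w) => w.text)); // [ 'GPU' ]
```

With the minimum, a single unreadable glyph is enough to flag a word. A mean could let one bad character hide behind several confident ones. An empty word list stays empty, so a blank region produces no output rather than filler.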

[Demo: a captured region in which the words maxInflight and WebGPU are rendered ambiguously. Extracted text: “Backpressure is applied when the queue exceeds maxInflight, so the consumer never overruns the WebGPU device.” 2 words flagged low-confidence.]
Built right

A small, honest tool — by choice.

A full page-layout “Document mode” and a heavyweight formula library both shipped, then were removed on purpose: layout models need a whole page of context and misread single crops, and the library corrupted the formula decode. Keeping them out is part of the design.

  • Column-aware reading order, so two-column papers don't interleave
  • Homoglyph fold maps stray look-alike characters back to their Latin equivalents
  • Capture works on paused cross-origin video — no tainted-canvas failures
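A homoglyph fold is a character-level lookup. The sketch below uses a small, hypothetical table of Cyrillic and Greek letters that look identical to Latin ones. OCR Buddy's actual map is not listed on this page.

```javascript
// Sketch: fold look-alike non-Latin letters back to Latin.
// Assumption: this table is a small illustrative sample, not OCR Buddy's map.
const HOMOGLYPHS = new Map([
  ["\u0430", "a"], // Cyrillic small a
  ["\u0435", "e"], // Cyrillic small ie
  ["\u043E", "o"], // Cyrillic small o
  ["\u0440", "p"], // Cyrillic small er
  ["\u0441", "c"], // Cyrillic small es
  ["\u0445", "x"], // Cyrillic small ha
  ["\u0410", "A"], // Cyrillic capital A
  ["\u0415", "E"], // Cyrillic capital IE
  ["\u041E", "O"], // Cyrillic capital O
  ["\u0420", "P"], // Cyrillic capital ER
  ["\u0421", "C"], // Cyrillic capital ES
  ["\u0391", "A"], // Greek capital alpha
  ["\u039F", "O"], // Greek capital omicron
  ["\u03BF", "o"], // Greek small omicron
]);

function foldHomoglyphs(text) {
  // Iterate by code point; characters without an entry pass through unchanged.
  let out = "";
  for (const ch of text) out += HOMOGLYPHS.get(ch) ?? ch;
  return out;
}

console.log(foldHomoglyphs("m\u0430xInflight") === "maxInflight"); // true
```

The fold only makes sense because the recognizer targets Latin text. Applied to genuine Cyrillic or Greek prose, it would corrupt it.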

Extracted text

function flush(queue, maxInflight) {
  // backpressure: never overrun the device
  while (queue.length > maxInflight) {
    const chunk = queue.shift();
    device.submit(chunk);
  }
  return queue.length;
}
Accuracy, honestly

Essentially perfect on what each mode is for.

Measured with the exact PP-OCRv5 config the extension ships, against ground truth on real academic pages.

99.9% character accuracy on a coherent text block, the normal “select a region” workflow
scripts/ocr-image-test.mjs · Node / CPU
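One common way to define character accuracy is 1 minus the Levenshtein edit distance divided by the reference length. It is an assumption that `scripts/ocr-image-test.mjs` uses exactly this formula. The sketch only shows what such a score measures.

```javascript
// Sketch: character accuracy as 1 - (edit distance / reference length).
// Assumption: a common CER-style definition; the project's script may differ.
function levenshtein(a, b) {
  const A = [...a];
  const B = [...b];
  // prev[j] = edit distance between the first i-1 chars of A and first j of B.
  let prev = Array.from({ length: B.length + 1 }, (_, j) => j);
  for (let i = 1; i <= A.length; i++) {
    const cur = [i];
    for (let j = 1; j <= B.length; j++) {
      const sub = prev[j - 1] + (A[i - 1] === B[j - 1] ? 0 : 1);
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, sub); // delete, insert, substitute
    }
    prev = cur;
  }
  return prev[B.length];
}

function charAccuracy(reference, hypothesis) {
  const refLen = [...reference].length;
  if (refLen === 0) return hypothesis.length === 0 ? 1 : 0;
  // Clamp at 0: a long garbage hypothesis can exceed the reference length.
  return Math.max(0, 1 - levenshtein(reference, hypothesis) / refLen);
}

console.log(levenshtein("kitten", "sitting")); // 3
console.log(charAccuracy("abcd", "abxd")); // 0.75
```

Under this definition, 99.9% corresponds to roughly one character edit per 1,000 reference characters. It also shows why order matters: `charAccuracy("ab", "ba")` is 0 even though both characters were recognized.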

Clean prose is effectively verbatim

Sentences, citations like [22] and tokens like RoPE-2D, all correct.

Grab a paragraph + a table together and the score drops

That's reading-order interleaving, not misrecognition — the characters are right, the order isn't. Select one region to restore it.
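Column-aware ordering, listed under "Built right", can be sketched in two steps: find a vertical gutter that no line crosses, then read the left column top to bottom before the right one. The line shape `{ text, x0, x1, y0 }`, the `minGutter` default, and the two-column limit are assumptions. Full-width titles are not handled.

```javascript
// Sketch: column-aware reading order for detected text lines.
// Assumptions: lines look like { text, x0, x1, y0 } in pixels (a hypothetical
// shape), there are at most two columns, and full-width lines are not handled.
function readingOrder(lines, minGutter = 20) {
  if (lines.length === 0) return [];
  const byY = (a, b) => a.y0 - b.y0 || a.x0 - b.x0;
  // Sweep lines left to right, tracking how far right coverage reaches;
  // the first empty band at least minGutter wide is the gutter.
  const byX = [...lines].sort((a, b) => a.x0 - b.x0);
  let reach = byX[0].x1;
  let gutter = null;
  for (const l of byX.slice(1)) {
    if (l.x0 - reach >= minGutter) {
      gutter = (reach + l.x0) / 2;
      break;
    }
    reach = Math.max(reach, l.x1);
  }
  // Single column: plain top-to-bottom order.
  if (gutter === null) return [...lines].sort(byY).map((l) => l.text);
  // Two columns: finish the left column before starting the right one.
  const left = lines.filter((l) => l.x1 <= gutter).sort(byY);
  const right = lines.filter((l) => l.x0 >= gutter).sort(byY);
  return [...left, ...right].map((l) => l.text);
}

const page = [
  { text: "A1", x0: 0, x1: 100, y0: 0 },
  { text: "B1", x0: 150, x1: 250, y0: 0 },
  { text: "A2", x0: 0, x1: 100, y0: 20 },
  { text: "B2", x0: 150, x1: 250, y0: 20 },
];
console.log(readingOrder(page)); // [ 'A1', 'A2', 'B1', 'B2' ]
```

Sorting these lines purely by `y0` would give A1, B1, A2, B2, the same kind of interleaving described above. The gutter split keeps each column intact.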

Equations and tables aren't text

Use Formula and Table modes for those — Text/Code mode flattens them. No “100% OCR of anything” claims here.

Private by architecture

Nothing leaves your device.

There is no server. The OCR models are bundled in the extension and run in an offscreen worker, so even first-run inference is fully offline. The only network use is downloading the extension itself.

  • No servers, no API calls, no telemetry
  • Models bundled — works fully offline
  • Screenshot permission requested explicitly, per-site, only when needed
On-device OCR: ON (detection & recognition in a local worker)
Network upload: OFF (no images or text ever sent out)
Tracking & telemetry: OFF (no account, no analytics)
Open & bundled

Built on excellent open-source work.

All models ship inside the extension and run on-device. Permissive licenses throughout — no copyleft anywhere in the stack.

| Model | Size | Role | License |
| --- | --- | --- | --- |
| PP-OCRv5 mobile det | ~4.7 MB | Text detection | Apache-2.0 |
| latin PP-OCRv5 rec | ~8 MB | Latin text recognition (CTC) | Apache-2.0 |
| mfr_encoder / decoder | ~53 MB | Formula → LaTeX (pix2text-mfr) | MIT |
Vite + CRXJS · Manifest V3 · ONNX Runtime Web · WebGPU / WASM · KaTeX · highlight.js · Chrome 124+
FAQ

Questions, answered plainly.

Does my data ever leave my device?

No. There's no server and no API calls. The OCR models are bundled in the extension and run entirely on your device — even the first run is fully offline. The only network use is downloading the extension itself.

Is it really free?

Yes — free and open source under the MIT license. No account, no subscription, no telemetry.

Why not use a big AI OCR model?

Generative models predict the next likely token, so when the image is unclear they invent fluent but wrong text. OCR Buddy uses classic detection + CTC recognition, which transcribes the glyphs that are present and fails to blanks instead of fabricating a sentence.

What can it read?

Plain text and code (including code in a paused video or a PDF), single equations converted to LaTeX, and single tables converted to a Markdown grid — borderless tables included.

Does it work offline?

Yes. The models ship inside the extension, so OCR Buddy works with no network connection at all.

Which browsers are supported?

Chrome 124 or newer, which the extension needs for WebGPU in workers. On devices without WebGPU, it falls back to multi-threaded WebAssembly with identical results.

Ready to read anything?

Add OCR Buddy to Chrome and pull clean text off any screen in seconds.

Add to Chrome — free
Manifest V3 · Chrome 124+ · Runs entirely on your device