Signvert
PDF Tools · 10 March 2026 · 6 min read

How to Extract Text from a Scanned PDF (OCR Guide)

Scanned PDFs are essentially photos — you can't copy text from them. OCR (Optical Character Recognition) converts the image back into real, editable text. Here's how to do it for free.

Try the free tool mentioned in this article

PDF to Word (with OCR)

What is a scanned PDF?

When you scan a physical document, the scanner takes a photograph of each page and wraps those images in a PDF container. The result looks like a normal PDF but contains no actual text — just pixels. You can't select, copy, or search the text.

OCR (Optical Character Recognition) analyses those pixel patterns and converts them back into machine-readable text characters.

How to extract text from a scanned PDF for free

  1. Go to signvert.com/tools/pdf-to-word.
  2. Drop your scanned PDF onto the page.
  3. Select your document language from the dropdown (English is the default; 20+ languages are supported).
  4. If you know the PDF is scanned, toggle "Force OCR" on. Otherwise, the tool detects scanned pages automatically.
  5. Click "Convert" and download the .docx file containing the extracted text.
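
The article doesn't say how the automatic detection in step 4 works. One common heuristic, sketched below, is to check whether a page has a usable text layer and fall back to OCR when it doesn't. The `PageTextLayer` type, the function names, and the 20-character threshold are all assumptions made for illustration, not Signvert's actual implementation.

```typescript
// Hypothetical shape for the text layer of one page (not Signvert's real types).
interface PageTextLayer {
  pageNumber: number;
  // Text found in the page's embedded text layer; empty for image-only pages.
  text: string;
}

// Assumed threshold: pages with fewer visible characters are treated as scanned.
const MIN_CHARS_PER_PAGE = 20;

// Decide whether a single page should go through OCR.
function needsOcr(
  page: PageTextLayer,
  forceOcr: boolean,
  minChars: number = MIN_CHARS_PER_PAGE
): boolean {
  if (forceOcr) return true; // mirrors the "Force OCR" toggle
  const visible = page.text.replace(/\s+/g, ""); // ignore whitespace-only layers
  return visible.length < minChars;
}

// Return the page numbers that should be sent to the OCR engine.
function pagesNeedingOcr(pages: PageTextLayer[], forceOcr: boolean = false): number[] {
  return pages.filter((p) => needsOcr(p, forceOcr)).map((p) => p.pageNumber);
}
```

A per-page decision like this would also cover mixed documents, where only some pages are scans.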

How does the OCR work?

The tool uses Tesseract.js, a WebAssembly port of Google's open-source Tesseract OCR engine. Each page is rendered to a canvas at 2× resolution and passed through the OCR engine. The recognised text is then assembled into a Word document with per-page headings.

Everything runs in your browser — no page images are ever sent to a server.
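
The flow described in this section can be sketched roughly as follows. To keep the example free of library calls, the OCR step is an injected `recognize` callback standing in for the Tesseract.js call. The "Page N" heading format and the plain-text output (rather than a .docx file) are assumptions for illustration. The only fixed fact used is that PDF page sizes are measured in points, with 72 points to the inch.

```typescript
// Page size in PDF points (1 pt = 1/72 inch).
interface PageSize {
  widthPt: number;
  heightPt: number;
}

// Canvas pixel dimensions for a page rendered at the given scale (2 = "2x resolution").
function canvasSize(page: PageSize, scale: number = 2): { width: number; height: number } {
  return {
    width: Math.ceil(page.widthPt * scale),
    height: Math.ceil(page.heightPt * scale),
  };
}

// Hypothetical callback standing in for the OCR engine: it receives the page
// number and canvas dimensions and resolves to the text recognised on that page.
type Recognize = (pageNumber: number, width: number, height: number) => Promise<string>;

// Run OCR page by page and join the results under per-page headings.
async function ocrDocument(pages: PageSize[], recognize: Recognize): Promise<string> {
  const sections: string[] = [];
  let pageNumber = 0;
  for (const page of pages) {
    pageNumber += 1;
    const { width, height } = canvasSize(page);
    const text = await recognize(pageNumber, width, height);
    sections.push(`Page ${pageNumber}\n\n${text.trim()}`);
  }
  return sections.join("\n\n");
}
```

For a US Letter page (612 × 792 pt), `canvasSize` gives a 1224 × 1584 pixel canvas at the default scale of 2.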

Supported languages

The tool supports over 20 languages including English, French, German, Spanish, Portuguese, Italian, Dutch, Polish, Russian, Arabic, Chinese (Simplified), Japanese, Korean, Hindi, and more.
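
Tesseract identifies each language model by a short code. The sketch below maps the languages named above to the standard Tesseract language-data identifiers. The map and the `langCode` helper are illustrative only; the article doesn't show how the tool's dropdown is wired up, and falling back to English simply mirrors the default mentioned in step 3.

```typescript
// Display name -> standard Tesseract language-data code.
const TESSERACT_LANG_CODES: Record<string, string> = {
  English: "eng",
  French: "fra",
  German: "deu",
  Spanish: "spa",
  Portuguese: "por",
  Italian: "ita",
  Dutch: "nld",
  Polish: "pol",
  Russian: "rus",
  Arabic: "ara",
  "Chinese (Simplified)": "chi_sim",
  Japanese: "jpn",
  Korean: "kor",
  Hindi: "hin",
};

// Resolve a dropdown selection to a language code.
// No selection falls back to English; unknown names fail loudly
// instead of silently running OCR with the wrong model.
function langCode(displayName?: string): string {
  if (displayName === undefined) return "eng";
  const code = TESSERACT_LANG_CODES[displayName];
  if (code === undefined) {
    throw new Error(`Unsupported language: ${displayName}`);
  }
  return code;
}
```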

Tips for better OCR accuracy

  • Scan quality matters most. A clean, high-contrast scan at 300 DPI or above gives the best results. Blurry or skewed scans reduce accuracy significantly.
  • Select the correct language. OCR accuracy drops sharply if the wrong language model is used.
  • Expect imperfections. OCR is not 100% accurate. Always proofread the output, especially for numbers, punctuation, and unusual fonts.
  • Tables and columns are hard. The extracted text follows reading order but does not preserve table structure, so complex layouts need manual cleanup.
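
To check the 300 DPI guideline on an existing scan, you can estimate its resolution from the embedded image's pixel width and the page width. PDF page sizes are measured in points, at 72 points per inch. This is a rough sketch that assumes the scanned image fills the full page width; the function names are hypothetical.

```typescript
const POINTS_PER_INCH = 72;

// Estimate scan resolution: image pixels divided by the page width in inches.
function effectiveDpi(imageWidthPx: number, pageWidthPt: number): number {
  if (pageWidthPt <= 0) {
    throw new RangeError("pageWidthPt must be positive");
  }
  const pageWidthInches = pageWidthPt / POINTS_PER_INCH;
  return imageWidthPx / pageWidthInches;
}

// True when the scan meets the recommended minimum resolution.
function meetsRecommendedDpi(
  imageWidthPx: number,
  pageWidthPt: number,
  minDpi: number = 300
): boolean {
  return effectiveDpi(imageWidthPx, pageWidthPt) >= minDpi;
}
```

For example, a US Letter page is 612 pt (8.5 inches) wide, so a 2550-pixel-wide scan works out to 300 DPI, while a 1275-pixel-wide scan of the same page is only 150 DPI.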

When OCR isn't the right tool

If you just need to sign or annotate a scanned PDF without extracting text, use our document signing tool directly — it works on scanned PDFs without OCR.

Ready to try it yourself?

Free, browser-based, no account required.
