What is a scanned PDF?
When you scan a physical document, the scanner captures an image of each page and wraps those images in a PDF container. The result looks like a normal PDF, but it has no text layer, only pixels, so you can't select, copy, or search the text.
OCR (Optical Character Recognition) analyses those pixel patterns and converts them back into machine-readable text characters.
How to extract text from a scanned PDF for free
- Go to signvert.com/tools/pdf-to-word.
- Drop your scanned PDF onto the page.
- Select your document language from the dropdown (English is the default; 20+ languages are supported).
- If you know the PDF is scanned, toggle "Force OCR" on. Otherwise, the tool detects whether the PDF is scanned automatically.
- Click "Convert" and download the .docx file containing the extracted text.
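The automatic detection mentioned in the steps above can be approximated with a simple heuristic: a page whose existing text layer yields almost no characters is probably an image-only scan. The TypeScript below is a sketch under that assumption, not the tool's documented logic. `ConvertOptions`, `pagesNeedingOcr`, and the 20-character `MIN_TEXT_CHARS` threshold are hypothetical, and the per-page text would come from a PDF library's text extraction.

```typescript
// Hypothetical settings object; the tool's real option names are not documented.
interface ConvertOptions {
  language: string; // Tesseract language code, e.g. "eng"
  forceOcr: boolean; // mirrors the "Force OCR" toggle
}

// Assumed threshold: a page whose text layer has fewer visible characters
// than this is treated as image-only (scanned).
const MIN_TEXT_CHARS = 20;

// Count characters that are not whitespace.
function visibleCharCount(text: string): number {
  return text.replace(/\s+/g, "").length;
}

// Return the 1-based numbers of pages that should go through OCR.
// pageTexts[i] is whatever text page i+1's existing text layer yielded
// (an empty string for a purely scanned page).
function pagesNeedingOcr(pageTexts: string[], options: ConvertOptions): number[] {
  const pages: number[] = [];
  pageTexts.forEach((text, index) => {
    if (options.forceOcr || visibleCharCount(text) < MIN_TEXT_CHARS) {
      pages.push(index + 1);
    }
  });
  return pages;
}

const sample = ["", "This page already has a selectable text layer.", "  \n  "];
console.log(pagesNeedingOcr(sample, { language: "eng", forceOcr: false }).join(", ")); // prints 1, 3
console.log(pagesNeedingOcr(sample, { language: "eng", forceOcr: true }).join(", ")); // prints 1, 2, 3
```

A character-count threshold is crude: a page with a short printed header above a scanned body could slip past it, which is one reason a manual "Force OCR" override is useful.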
How does the OCR work?
The tool uses Tesseract.js, a WebAssembly port of the open-source Tesseract OCR engine (originally developed at HP and later sponsored by Google). Each page is rendered to a canvas at 2× resolution and passed through the OCR engine. The recognised text is then assembled into a Word document, with a heading for each page.
Everything runs in your browser: no page images are ever sent to a server.
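The page-by-page flow described above can be sketched in TypeScript as follows. To keep the sketch runnable without a browser, the per-page OCR call is passed in as a function. In a real page it might render each page with pdf.js (`page.getViewport({ scale: 2 })`, then `page.render({ canvasContext, viewport }).promise`) and hand the canvas to a Tesseract.js v5 worker (`createWorker("eng")`, then `worker.recognize(canvas)`, reading `data.text`). The names `PageRecognizer`, `ocrDocument`, `sectionsToText`, and the `Page N` heading format are hypothetical, not the tool's code. `renderedPixelsPerInch` assumes "2×" means a pdf.js-style scale, where scale 1 maps one PDF point (1/72 inch) to one pixel.

```typescript
// Hypothetical names throughout; this is not the tool's source code.

// Turns one page (1-based) into recognised text. In a browser this would
// render the page to a canvas and run Tesseract.js on it.
type PageRecognizer = (pageNumber: number) => Promise<string>;

interface PageSection {
  heading: string;
  text: string;
}

// Assumes a pdf.js-style render scale, where scale 1 maps one PDF point
// (1/72 inch) to one canvas pixel.
function renderedPixelsPerInch(scale: number): number {
  return 72 * scale;
}

// OCR the pages one after another and give each page its own heading.
async function ocrDocument(
  pageCount: number,
  recognize: PageRecognizer,
): Promise<PageSection[]> {
  const sections: PageSection[] = [];
  for (let page = 1; page <= pageCount; page++) {
    const text = (await recognize(page)).trim();
    sections.push({ heading: `Page ${page}`, text });
  }
  return sections;
}

// Plain-text stand-in for the Word export: heading, blank line, page text.
function sectionsToText(sections: PageSection[]): string {
  return sections.map((s) => `${s.heading}\n\n${s.text}`).join("\n\n");
}

// Usage with a stand-in recognizer (no real OCR runs here).
const fakeRecognizer: PageRecognizer = async (page) => `  text from page ${page}\n`;
ocrDocument(2, fakeRecognizer).then((sections) => console.log(sectionsToText(sections)));
console.log(renderedPixelsPerInch(2)); // prints 144
```

Running pages sequentially keeps only one rendered canvas in memory at a time. Tesseract.js also offers `createScheduler` for spreading pages across several workers, at the cost of more memory.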
Supported languages
The tool supports more than 20 languages, including English, French, German, Spanish, Portuguese, Italian, Dutch, Polish, Russian, Arabic, Chinese (Simplified), Japanese, Korean, and Hindi.
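Under the hood, a language choice has to become a Tesseract language code. The sketch below maps the languages listed above to their standard Tesseract traineddata codes; the map and the `tesseractLangString` helper are hypothetical illustrations, not the tool's source. Tesseract itself accepts several codes joined with `+` (for example `eng+fra`) for mixed-language pages; whether this tool exposes that option is not stated.

```typescript
// Standard Tesseract traineddata codes for the languages listed above.
// The label-to-code map is a hypothetical illustration of a dropdown's wiring.
const TESSERACT_CODES: Record<string, string> = {
  English: "eng",
  French: "fra",
  German: "deu",
  Spanish: "spa",
  Portuguese: "por",
  Italian: "ita",
  Dutch: "nld",
  Polish: "pol",
  Russian: "rus",
  Arabic: "ara",
  "Chinese (Simplified)": "chi_sim",
  Japanese: "jpn",
  Korean: "kor",
  Hindi: "hin",
};

// Convert UI labels to a Tesseract language string. Multiple languages are
// joined with "+", which Tesseract reads as "try all of these models".
function tesseractLangString(labels: string[]): string {
  return labels
    .map((label) => {
      const code = TESSERACT_CODES[label];
      if (code === undefined) {
        throw new Error(`Unsupported language: ${label}`);
      }
      return code;
    })
    .join("+");
}

console.log(tesseractLangString(["English", "French"])); // prints eng+fra
```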
Tips for better OCR accuracy
- Scan quality matters most. A clean, high-contrast scan at 300 DPI or above gives the best results. Blurry or skewed scans reduce accuracy significantly.
- Select the correct language. OCR accuracy drops sharply if the wrong language model is used.
- Expect imperfections. OCR is not 100% accurate. Always proofread the output, especially for numbers, punctuation, and unusual fonts.
- Tables and columns are hard. The extracted text follows the detected reading order but does not preserve table structure, so complex layouts will need manual cleanup.
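To check a scan against the 300 DPI guideline, divide the page image's pixel width by the page width in inches; PDF page sizes are given in points, at 72 points per inch. The TypeScript helpers below are a hypothetical sketch that assumes the image spans the full page width. The pixel and point values would come from your scanner settings or a PDF inspection tool, not from this code.

```typescript
// PDF page dimensions are measured in points: 72 points = 1 inch.
const POINTS_PER_INCH = 72;

// Threshold taken from the scan-quality tip above.
const RECOMMENDED_DPI = 300;

// Effective resolution of a page image, assuming it fills the page width.
function effectiveDpi(imageWidthPx: number, pageWidthPt: number): number {
  const pageWidthInches = pageWidthPt / POINTS_PER_INCH;
  return imageWidthPx / pageWidthInches;
}

function meetsRecommendedDpi(imageWidthPx: number, pageWidthPt: number): boolean {
  return effectiveDpi(imageWidthPx, pageWidthPt) >= RECOMMENDED_DPI;
}

// US Letter is 612 pt (8.5 in) wide.
console.log(effectiveDpi(2550, 612)); // prints 300
console.log(meetsRecommendedDpi(1275, 612)); // prints false (150 DPI)
```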
When OCR isn't the right tool
If you just need to sign or annotate a scanned PDF without extracting text, use our document signing tool directly: it works on scanned PDFs without OCR.