How to extract text from a scanned PDF (OCR)
Got a scanned doc and can't copy a single word out of it? OCR fixes that. Convert it to a searchable PDF or plain text in a few seconds per page.
You receive a scanned contract and try to highlight a clause to paste into an email — nothing happens. The cursor passes right over the text without reacting. That's because the "text" in the PDF is actually a photograph of a piece of paper. To any program, you're just selecting a JPG.
OCR (Optical Character Recognition) is the technology that lets a machine "read" those images and convert them into real text. The result: a document you can search with Cmd/Ctrl-F, copy passages from, edit in Word, index for search, or feed to an LLM.
How to tell if a PDF is scanned
3-second test: open the PDF and try to select a word with your mouse. If it highlights cleanly word by word, the PDF has real text and doesn't need OCR. If you can only draw a rectangle on top of it, it's an image — you need OCR.
Another tell: hit Cmd/Ctrl-F and search for a word you KNOW is in the document. If it doesn't find it, the content is an image.
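If you have a whole folder of files to check, the same test can be automated. The sketch below is a heuristic, not how this tool does it: it assumes you already have the extractable text of each page as a list of strings (with a library such as pypdf, that could come from calling `extract_text()` on each page of a `PdfReader`). The function name and the 20-character threshold are made up for illustration.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Return True when most pages have almost no extractable text.

    page_texts: one string (or None) per page, as returned by a PDF
    text extractor. The threshold is an assumed value, tune it to taste.
    """
    if not page_texts:
        return True  # nothing extractable at all
    nearly_empty = sum(
        1 for text in page_texts
        if len((text or "").strip()) < min_chars_per_page
    )
    # Call it scanned when more than half the pages are nearly empty.
    return nearly_empty / len(page_texts) > 0.5


print(looks_scanned(["", "  ", ""]))  # True
print(looks_scanned(["This agreement is made between the parties named below."]))  # False
```

A mixed file (a few typed pages plus scanned attachments) lands in between, which is why the check counts pages instead of looking at the first one only.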
When OCR is the right call
- Old digitized contracts — to extract clauses, dates, amounts
- Scanned receipts — to populate expense spreadsheets
- Books and academic papers — to cite passages, translate
- HR documents — IDs, payslips, certificates for record-keeping
- Medical history — to digitize old patient records
- Field research — partially handwritten survey forms
Step-by-step: extract text with OCR
1. Upload the scanned file
Works with PDF, JPG, PNG and TIFF. You can upload multi-page documents — OCR processes everything in one shot and preserves the order.
2. Pick the content language
We support English, Portuguese, and Spanish. OCR uses different models per language — picking the wrong one tanks accuracy. If your document is multilingual (a bilingual report, for instance), process each part separately.
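For a multilingual file, it helps to know which page ranges to process with which language. A minimal sketch, assuming you have noted the language of each page yourself (the `language_runs` name and the "en"/"pt" labels are illustrative, not something the tool exposes):

```python
from itertools import groupby


def language_runs(page_languages):
    """Group consecutive pages that share a language.

    Returns a list of (language, first_page, last_page) tuples,
    with 1-based page numbers.
    """
    runs = []
    page_number = 1
    for language, group in groupby(page_languages):
        count = len(list(group))
        runs.append((language, page_number, page_number + count - 1))
        page_number += count
    return runs


print(language_runs(["en", "en", "pt", "pt", "pt", "en"]))
# [('en', 1, 2), ('pt', 3, 5), ('en', 6, 6)]
```

Each tuple is one upload: split the file at those page boundaries and pick the matching language for each part.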
3. Choose output format
- Searchable PDF: keeps the original page image and adds an invisible text layer on top. You can search with Cmd/Ctrl-F and copy and paste normally.
- Plain text (.txt): just the extracted content, no formatting. Great for spreadsheets, importing into other systems, or feeding to AI.
- Word (.docx): converts with basic formatting preserved (paragraphs, alignment). Good for editing.
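If you would rather script the language and format choices locally, they map fairly directly onto a command-line OCR engine. The sketch below assumes the open-source Tesseract CLI as a stand-in; this article does not say which engine the tool runs, and the function and dictionary names are hypothetical. It only builds the command, it does not run it. Tesseract reads images (PNG, JPG, TIFF), and it has no .docx output, so that format is left out.

```python
# Assumption: Tesseract language codes for the three languages listed above.
TESSERACT_LANGS = {"english": "eng", "portuguese": "por", "spanish": "spa"}

# Assumption: only the two formats Tesseract writes directly.
TESSERACT_OUTPUTS = {"searchable_pdf": "pdf", "plain_text": "txt"}


def build_ocr_command(image_path, output_base, language, output_format):
    """Build the argument list for one Tesseract run (not executed here)."""
    if language not in TESSERACT_LANGS:
        raise ValueError(f"unsupported language: {language}")
    if output_format not in TESSERACT_OUTPUTS:
        raise ValueError(f"unsupported output format: {output_format}")
    return [
        "tesseract",
        image_path,
        output_base,  # Tesseract appends the extension itself
        "-l", TESSERACT_LANGS[language],
        TESSERACT_OUTPUTS[output_format],
    ]


print(" ".join(build_ocr_command("scan.png", "contract", "portuguese", "searchable_pdf")))
# tesseract scan.png contract -l por pdf
```

Passing the list to `subprocess.run` would execute it, provided Tesseract and the matching language data are installed.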
4. Process and download
OCR is slower than other conversions (each page takes 2-10 seconds depending on resolution). When it finishes, you download the file in your chosen format.
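To get a feel for the wait on a long document, multiply the page count by the 2-10 second range above. A rough sketch (the function name is made up, and real timings depend on resolution and load):

```python
def estimate_ocr_seconds(page_count, min_per_page=2, max_per_page=10):
    """Return a (low, high) estimate in seconds for a document."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return page_count * min_per_page, page_count * max_per_page


low, high = estimate_ocr_seconds(40)
print(f"{low}-{high} seconds")  # 80-400 seconds
```

So a 40-page contract should take somewhere between a bit over a minute and roughly seven minutes.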
How to improve OCR quality
OCR accuracy depends heavily on the source image quality. Some tips:
- Scan at 300 DPI minimum — below that, small letters blur
- Make sure the page is straight — tilted scans and angled shots confuse the recognition
- Clean stains and folds before scanning — marks become random characters
- Prefer white background and black ink — high contrast = better reading
- Avoid photos of a screen and low-resolution screenshots, since moiré and pixelation hurt accuracy
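Not sure whether an existing image meets the 300 DPI mark? You can work it out from the pixel dimensions and the physical page size. A small sketch, assuming a US Letter page (8.5 x 11 inches) by default; the function names are illustrative:

```python
def effective_dpi(width_px, height_px, page_width_in, page_height_in):
    """Dots per inch of a scan, taking the weaker of the two axes."""
    return min(width_px / page_width_in, height_px / page_height_in)


def meets_minimum(width_px, height_px, page_width_in=8.5,
                  page_height_in=11.0, minimum_dpi=300):
    """True when the scan reaches the recommended resolution."""
    dpi = effective_dpi(width_px, height_px, page_width_in, page_height_in)
    return dpi >= minimum_dpi


print(effective_dpi(1275, 1650, 8.5, 11.0))  # 150.0
print(meets_minimum(1275, 1650))             # False
print(meets_minimum(2550, 3300))             # True
```

A Letter page needs about 2550 x 3300 pixels to reach 300 DPI, so a 1275 x 1650 image is only half the recommended resolution and is worth rescanning.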
Bonus: OCR + other tools
OCR unlocks several follow-ups:
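- OCR + Compress: a searchable PDF still contains the page images, so run it through compression afterwards. If you only need the words, the .txt or .docx export is far lighter, because text weighs much less than an image
- OCR + Word: export to .docx for editing and review
- OCR + Excel: if the document is tabular, running the Excel converter after OCR splits the content into columns
- OCR + ChatGPT: paste the extracted text into an AI assistant to summarize, translate, or analyze it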
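For tabular documents, the plain-text export can also be split into columns with a few lines of code. This is a generic heuristic, not a description of how the Excel converter works: it assumes columns in the OCR text are separated by two or more spaces, and the function name is made up.

```python
import re


def split_columns(ocr_text):
    """Split each non-empty line into cells on runs of 2+ whitespace characters."""
    rows = []
    for line in ocr_text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}", line))
    return rows


sample = (
    "Date        Vendor         Amount\n"
    "2024-01-05  Office Depot   42.10\n"
)
print(split_columns(sample))
# [['Date', 'Vendor', 'Amount'], ['2024-01-05', 'Office Depot', '42.10']]
```

Single spaces inside a cell ("Office Depot") survive, which is the reason for requiring at least two. The rows can then be written out with Python's `csv` module and opened in a spreadsheet.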