Google Bot now capable of performing OCR on scanned PDF files!

October 31st, 2008

Google has announced that it is now indexing content stored as images in PDF files!

The Google Bot basically runs OCR (optical character recognition) on these documents.

The aim is to convert the image content in the PDF files to text which can be indexed and searched on.

The end result is that Google can index the text inside a scanned file, rather than just the file name.

This is a pretty impressive expansion of what Google can index. It might extend this to standalone images as well in the future.

Check out: A picture of a thousand words?

You could perhaps use this feature to convert your own scanned PDF documents to text by making them available online and waiting for Google to index them. But it would be a time-consuming process that might not produce results for days, if not weeks.

It would, however, be interesting to see whether Google makes its OCR technology available for use online. It could easily provide this as a free service on the web, enabling users to convert their scanned documents to text!




