Google PDF search: Tesseract redux
Written on 1.11.08
So it seems that Google is now able to index PDFs by extracting text from images. If you ask me, this is pretty exciting stuff, although I have a hunch they're just eating their own dog food. If so, the technology isn't new; it's just Google being Google (i.e. smart). At a time when everyone appears to be thinking about semantic search and other (mostly useless at this point) nonsense, Google goes one step further and continues to improve its service. This, in itself, raises the question: has Google's competition called it quits on the search engine marathon?