Most PDFs are completely unsearchable because they were created by a scanner, as opposed to being converted to PDF from, say, Microsoft Word.
People don’t realize this, but if you scan a document without turning on OCR (Optical Character Recognition), you’re creating an image PDF, not a text-based PDF. And that’s a serious limitation. Fortunately, it’s a limitation that’s easily remedied if you know how.
Document management software like NetDocuments or Worldox theoretically gives users the ability to text search their entire repository. But in practice, users find out that their scanned PDFs are image-based, and therefore not text searchable.
A few years ago, if you wanted to make the documents in your repository searchable, you’d have to hire someone to manually OCR all these documents. This could take months to years, and obviously cost a lot of money.
Today, you can use Symphony OCR (created by a company called Trumpet), which is a “content crawler” that will automatically find any PDFs in your repository and OCR them for you. I’ve installed Symphony OCR on about 80% of my Worldox installs, which gives my clients full-text searching across their documents.
Figure 1: The above PDF is text searchable.
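To make the “content crawler” idea concrete, here is a minimal Python sketch of the general approach: walk a folder tree, find the PDFs, and list the ones that appear to lack a text layer. This is not how Symphony OCR is implemented. The function names (`find_pdfs`, `naive_has_text_layer`, `pdfs_needing_ocr`) are hypothetical, and the text-layer check is a deliberately crude assumption (it only looks for a `/Font` entry in the raw file bytes), so a real tool would use a proper PDF library instead.

```python
import os
from typing import Callable, Iterator, List


def find_pdfs(root: str) -> Iterator[str]:
    """Yield the path of every .pdf file under root (case-insensitive)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                yield os.path.join(dirpath, name)


def naive_has_text_layer(path: str) -> bool:
    """Crude placeholder check (an assumption, not a reliable test).

    Text-based PDFs normally reference fonts, while image-only scans
    usually do not. This can misclassify files whose objects are stored
    in compressed streams, so treat it as illustration only.
    """
    with open(path, "rb") as f:
        return b"/Font" in f.read()


def pdfs_needing_ocr(
    root: str,
    has_text_layer: Callable[[str], bool] = naive_has_text_layer,
) -> List[str]:
    """Return a sorted list of PDFs under root that seem to need OCR."""
    return sorted(p for p in find_pdfs(root) if not has_text_layer(p))
```

Calling `pdfs_needing_ocr("some/folder")` would return the paths a crawler might queue for OCR. The check is passed in as a parameter so that the naive version can be swapped for a more accurate one without changing the crawl logic.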
And now Symphony OCR is even available for the NetDocuments system. I’ve recently been beta testing it for several clients, and it has worked flawlessly.
You have to install Symphony on a workstation at your office. The requirements are fairly minimal: 1 GB of RAM, 1 GB of disk space, and Windows XP or higher. Once it is installed, launch the program and log in to your NetDocuments account. The account you use is important because when a document is OCR-ed, that user will be listed as the last person who modified it. So if you use John Doe’s NetDocuments account, the OCR-ed PDFs will show as last modified by John Doe.
Figure 2: My NetDocuments Repository connected to Symphony OCR
If you are interested in adding Symphony OCR to your NetDocuments account, contact us today.