TECHNOLOGY INSPIRATION
Technology-People-Innovation

Making Scanned Documents Searchable and Editable


When you scan a document directly into a PDF file, Acrobat captures the text and graphics on each page as one big graphic image. This is fine as far as it goes, but it doesn't go very far: you can neither edit nor search the PDF document because, as far as Acrobat is concerned, the document contains no text at all, just one humongous graphic. That's where the Paper Capture plug-in in Acrobat 5 for Windows comes into play: you can use it to make a PDF that is either searchable only or both searchable and editable.

For some unknown reason, some of the first copies of Acrobat 5 for Windows shipped without the Paper Capture plug-in. If the Tools menu in your copy of Acrobat 5 is missing the Paper Capture item, you need to download and install the Paper Capture plug-in from the Adobe Web site. Note that the Paper Capture plug-in has a 50-page document limit. If you need to process PDF documents longer than 50 pages, look into purchasing Adobe Acrobat Capture, a full-blown version of the Paper Capture plug-in that can handle longer documents.

To use Paper Capture, choose Tools --> Paper Capture to open the Paper Capture Plug-In dialog box, select the page or pages to be processed (All Pages, Current Page, or From Page x to y), and then click the OK button; the Paper Capture utility does the rest. As it processes the pages that you designated, a Paper Capture Plug-In alert dialog box keeps you informed of its progress in preparing and performing the page recognition. When the page recognition is finished, this alert dialog box disappears and you can save the changes to your PDF document with the File --> Save command.
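The three page-selection choices in the dialog box can be pictured with a small Python sketch. This is only an illustration of the logic, not anything Acrobat exposes: the function name pages_to_process, its parameters, and the selection keywords are all made up for this example, and the 50-page check simply mirrors the plug-in limit mentioned above.

```python
# Illustrative only: Acrobat 5 has no such function. This just models the
# All Pages / Current Page / From Page x to y choices in the dialog box.
PLUGIN_PAGE_LIMIT = 50  # the plug-in's 50-page document limit


def pages_to_process(selection, page_count, current_page=1, first=None, last=None):
    """Return the 1-based page numbers a given selection would cover."""
    if page_count > PLUGIN_PAGE_LIMIT:
        raise ValueError("document exceeds the 50-page limit of the plug-in")
    if selection == "all":
        return list(range(1, page_count + 1))
    if selection == "current":
        if not 1 <= current_page <= page_count:
            raise ValueError("current page is outside the document")
        return [current_page]
    if selection == "range":
        if first is None or last is None or not 1 <= first <= last <= page_count:
            raise ValueError("page range is outside the document")
        return list(range(first, last + 1))
    raise ValueError("unknown selection: " + repr(selection))


if __name__ == "__main__":
    print(pages_to_process("range", 10, first=3, last=5))
```

Run as a script, this prints the pages covered by a From Page 3 to 5 selection in a 10-page document.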

When doing the page recognition in a PDF document, the Paper Capture plug-in offers you a choice between the following three Output Style options:

  • Formatted Text & Graphics to make the text in the PDF document both editable and searchable. Select this setting if you not only want to be able to find text in the document but also possibly make editing changes to it.
  • Searchable Image (Exact) to make the text in the PDF document searchable but not editable (this is the default setting). Use this setting if you're processing a document that needs to be searchable but should never be edited in any way, such as an executed contract.
  • Searchable Image (Compact) to make the text in the PDF document searchable but not editable and to compress its graphics. Select this setting if you're processing a document whose text requires searching without editing and that also contains a fair number of graphic images that need compressing. When you select this setting, Paper Capture applies JPEG compression to color images and ZIP compression to black-and-white images.
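If it helps to see the choice among the three output styles as a decision rule, here is a rough Python sketch. The enum, the function name choose_output_style, and its two yes/no parameters are hypothetical and exist only for this illustration; in Acrobat you make this choice in the Preferences dialog box, not in code.

```python
from enum import Enum


class OutputStyle(Enum):
    # The values are the option names as they appear in Paper Capture.
    FORMATTED_TEXT_AND_GRAPHICS = "Formatted Text & Graphics"
    SEARCHABLE_IMAGE_EXACT = "Searchable Image (Exact)"
    SEARCHABLE_IMAGE_COMPACT = "Searchable Image (Compact)"


def choose_output_style(needs_editing, compress_graphics):
    """Pick an output style from two yes/no questions (illustrative helper)."""
    if needs_editing:
        # Only this style makes the text editable as well as searchable.
        return OutputStyle.FORMATTED_TEXT_AND_GRAPHICS
    if compress_graphics:
        # Searchable, not editable, with the graphics compressed.
        return OutputStyle.SEARCHABLE_IMAGE_COMPACT
    # The default: searchable, not editable, graphics left alone.
    return OutputStyle.SEARCHABLE_IMAGE_EXACT


if __name__ == "__main__":
    # An executed contract: must be searchable but never edited.
    print(choose_output_style(needs_editing=False, compress_graphics=False).value)
```

Editing takes priority in this sketch because Formatted Text & Graphics is the only one of the three styles that allows it.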

To select a different output style setting, click the Preferences button in the Paper Capture Plug-In dialog box to open the Preferences dialog box. This dialog box enables you not only to select a new output style in the PDF Output Style pop-up menu but also to designate the primary language used in the text in the Primary OCR Language pop-up menu. (OCR stands for Optical Character Recognition, which is the kind of software that Paper Capture uses to recognize text captured as a graphic and convert it into text that can be searched and edited.)

If your PDF document contains graphic images, you can tell Paper Capture how much to reduce the images by selecting the maximum resolution in the Downsample Images pop-up menu. This menu offers you three options in addition to None (for no downsampling): Low (300 dpi), Medium (150 dpi), and High (72 dpi). The Low, Medium, and High options refer to the amount of reduction applied to the images, and the values 300, 150, and 72 dpi (dots per inch) refer to their resulting resolution and thus their quality. As always, the greater the reduction, the smaller the file size and the lower the image quality.
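To get a feel for what the Downsample Images options mean in practice, here is a quick back-of-the-envelope calculation in Python. It rests on a few assumptions that are not from Acrobat itself: a letter-size page (8.5 by 11 inches) scanned at 600 dpi, and each option treated as a cap on resolution, so an image already below the cap is left alone. The function name is made up for the example, and the numbers are raw pixel counts, not file sizes, because the final size also depends on the compression applied.

```python
# The option names and dpi values come from the Downsample Images menu;
# everything else here is an illustrative assumption.
DOWNSAMPLE_DPI = {"None": None, "Low": 300, "Medium": 150, "High": 72}


def downsampled_pixels(width_in, height_in, scan_dpi, option):
    """Pixel count of a page image after applying a downsample option."""
    cap = DOWNSAMPLE_DPI[option]
    # Treat the option as a maximum resolution: never upsample.
    dpi = scan_dpi if cap is None else min(scan_dpi, cap)
    return round(width_in * dpi) * round(height_in * dpi)


if __name__ == "__main__":
    for option in DOWNSAMPLE_DPI:
        pixels = downsampled_pixels(8.5, 11, 600, option)
        print(f"{option:<6} {pixels:>12,} pixels")
```

Because the pixel count falls with the square of the resolution, halving the dpi (from Low to Medium, for instance) cuts the number of pixels to a quarter.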

After processing the pages of your PDF document with the Paper Capture plug-in, use the Find feature (Ctrl+F on Windows and Command key+F on the Mac) to search for words or phrases in the text and verify that it can be searched. If you used the Formatted Text & Graphics output style for the page recognition, you can also verify that the text can be edited: select the TouchUp Text Tool by clicking its button on the Editing toolbar or by typing T, and then click the I-beam pointer in a line of text to select the line with a bounding box. Always remember to use File --> Save to save the changes that Paper Capture made to your document.
