BookTools

Introduction

The BookTools Project is a cooperative, free-software project to create a set of public domain utilities for the image processing of scanned book page images.

These tools, in combination with other public domain and low-cost commercial image manipulation utilities, will form a set of facilities for cleaning up, understanding, beautifying, quality-checking and compressing book page images, furthering the goal of a universal, on-line library based upon these images.

The BookTools are made publicly available under the conditions of the GNU General Public License (GPL), which guarantees the availability of source code. This supports the goal of an extensible set of tools which are cooperatively enhanced and maintained for the common good.

Since much attention is already focussed on creating access images for Internet browsing, the BookTools Project will concentrate on manipulating the high-quality source images captured during scanning. When these are appropriately cleaned up and organized using BookTools, derivative access images can be created using more conventional utilities.


Existing BookTools

The following BookTools are now available:

gather
"Gather up" individual TIFF/Group 4 and JPEG page images into a multi-page, image-only PDF (Portable Document Format) file.
 
Sponsor: Library of Congress Law Library. contact: Nick Kozura, nkoz@loc.gov
Author: Steve Williams, Picture Elements, Inc., steve@picturel.com
* This is currently being prepared for release.

Current BookTools Development

The following BookTools are under development:

find_ht
Locate the largest rectangular halftone region on a page known to contain a halftone, ignoring any specified rectangular regions. Use iteratively to locate all halftone regions.
 
Sponsor: Library of Congress Office of Preservation. contact: Basil Manns, bman@loc.gov
Collaborator: Cornell University Department of Preservation and Conservation. contact: Anne Kenney, ark3@cornell.edu
Author: Picture Elements, Inc., info@picturel.com
 
un_ht
Create a grayscale image from a specified rectangular halftone region of a grayscale page image.
 
Sponsor: Library of Congress Office of Preservation. contact: Basil Manns, bman@loc.gov
Collaborator: Cornell University Department of Preservation and Conservation. contact: Anne Kenney, ark3@cornell.edu
Author: Picture Elements, Inc., info@picturel.com
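
The iterative use described for find_ht (find the largest region, add it to the ignored regions, then repeat) can be sketched as below. This is only an illustration under stated assumptions: the names Rect, find_largest and find_all are hypothetical and are not part of BookTools, and a plain list of candidate rectangles stands in for the halftone detection that the real utility would perform on a page image.

```python
from typing import List, NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle in pixels; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    def area(self) -> int:
        return max(0, self.right - self.left) * max(0, self.bottom - self.top)

    def overlaps(self, other: "Rect") -> bool:
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


def find_largest(candidates: List[Rect], ignore: List[Rect]) -> Optional[Rect]:
    """Hypothetical stand-in for one find_ht call: return the largest
    candidate that overlaps no ignored region, or None if none remain."""
    allowed = [c for c in candidates
               if c.area() > 0 and not any(c.overlaps(r) for r in ignore)]
    return max(allowed, key=Rect.area, default=None)


def find_all(candidates: List[Rect]) -> List[Rect]:
    """Call find_largest repeatedly, ignoring every region already found."""
    found: List[Rect] = []
    while True:
        hit = find_largest(candidates, found)
        if hit is None:
            return found
        found.append(hit)


if __name__ == "__main__":
    # Made-up regions for illustration only.
    page_regions = [Rect(100, 100, 500, 400), Rect(120, 900, 900, 1600)]
    for region in find_all(page_regions):
        print(region)
```

Passing the regions already found back in as the ignore list is what makes each call return the next-largest region. The same pattern would apply to any locator that accepts "keep out" regions.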

Wish List / Future BookTools Development

The BookTools Project is actively seeking sponsors to fund the development of the utilities listed below and other new efforts.

Other organizations or graduate students wishing to develop any of the utilities proposed below under the BookTools Project are welcome and will be assisted in keeping those developments compatible with the rest of the BookTools.

Suggested additions to the Wish List are solicited.

Contributions of existing utilities to the public domain under the BookTools Project are welcomed. Guidance and assistance will be provided to allow them to be modified for compatibility with the existing BookTools.

The following utilities are planned or desired:

gather
Enhance the existing gather utility to allow PDF pages to be interspersed with TIFF/Group 4 images and JPEG images.
compound_PDF
Create a compound, single-page, image-only PDF file from independent image files. One image is specified as background, other images are placed in specified rectangular regions, with scaling and/or cropping used.
find_txtblk
Locate the main (largest) text block on a page, ignoring internal paragraph boundaries.
place_txtblk
Place the main text block in a desired position on a page, maintaining the relative positions of outlying objects (page number, header, footer).
set_pgsize
Regularize the page sizes of a set of pages.
find_pgnum
Locate a rectangular region containing the page number.
find_title
Locate the page number of the main title page.
find_title_verso
Locate the back of the title page (where copyright notice, publishing date and other cataloging information is found).
find_toc
Locate the table of contents pages.
find_idx
Locate the page range of the index pages.
find_header
Locate a rectangular region of a page containing a header.
find_footer
Locate a rectangular region of a page containing a footer.
find_hdline
Locate rectangular regions containing the largest point size text on the page. Return a region set. Ignore any rectangular "keep out" regions specified by an input region set. This may be used iteratively to find successively smaller text regions for building navigation aids.
Dump Utilities
Utilities to dump image file format parameters in comprehensive, compatible and automatically useable ways.
Jam Utilities
Utilities to insert information, parameters, comments and copyright notices into image files of various file formats.
Quality Checking Utilities
Utilities for page-to-page consistency checking of image sets, automatic image quality assurance, test chart analysis, etc.
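
For the proposed compound_PDF utility listed above, placing an image into a specified rectangular region "with scaling and/or cropping" comes down to a small amount of arithmetic. The sketch below shows two common choices: scaling so the whole image fits inside the region, and scaling so the image covers the region with the overflow cropped symmetrically. The names Placement, fit_inside and fill_and_crop are hypothetical and are not taken from any existing BookTools code. The centered crop is an assumption as well, since the description does not say how cropping would be chosen.

```python
from typing import NamedTuple, Tuple


class Placement(NamedTuple):
    scale: float                     # factor applied to the source image
    crop: Tuple[int, int, int, int]  # left, top, right, bottom in source pixels


def _check(*sizes: int) -> None:
    if any(s <= 0 for s in sizes):
        raise ValueError("image and region dimensions must be positive")


def fit_inside(src_w: int, src_h: int, box_w: int, box_h: int) -> float:
    """Scale factor that makes the whole image fit inside the region."""
    _check(src_w, src_h, box_w, box_h)
    return min(box_w / src_w, box_h / src_h)


def fill_and_crop(src_w: int, src_h: int, box_w: int, box_h: int) -> Placement:
    """Scale so the image covers the region, cropping the overflow evenly
    from both sides (assumed policy: keep the center of the image)."""
    _check(src_w, src_h, box_w, box_h)
    scale = max(box_w / src_w, box_h / src_h)
    # Size of the source window that maps onto the region after scaling.
    win_w = min(src_w, round(box_w / scale))
    win_h = min(src_h, round(box_h / scale))
    left = (src_w - win_w) // 2
    top = (src_h - win_h) // 2
    return Placement(scale, (left, top, left + win_w, top + win_h))


if __name__ == "__main__":
    # A 2000 x 1000 pixel image placed into a 500 x 500 pixel region.
    print(fit_inside(2000, 1000, 500, 500))
    print(fill_and_crop(2000, 1000, 500, 500))
```

Fitting inside preserves the whole image but may leave part of the region showing the background image. Filling and cropping covers the region completely at the cost of discarding the edges of the placed image.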

The beginnings of an architecture document exist. Eventually, this will serve as a guide to developing new BookTools.


Sponsors

Sponsors are actively being sought to fund the development of public domain BookTools. This sponsorship may go forward in a variety of ways.

Sponsoring organizations will receive prominent credit for their philanthropy.

This is a simple way for organizations needing a utility for a conversion project of their own to get it developed, while assuring its wide accessibility to the community and its continuing support by a collaborative group of programmers on the Internet. Just add a line item to your conversion budget for the development of the BookTool you need.

Help make the inexpensive mass conversion of books to images a reality!


Further Information

For more information, please contact:

Lou Sharpe, BookTools Project coordinator
lsharpe@picturel.com
303-444-6767




info@picturel.com
Copyright © 1997 Picture Elements, Inc.