1. Removal of extraneous text from electronic documents

    Method and apparatus for removing lines of extraneous text from a document. Similarities are identified between lines of text on each page and the corresponding lines on a selected subset of pages. Different weight values are associated with different line numbers on a page, each weight value indicating how likely a line at that position is to contain extraneous text. One or more lines of text are selectively removed from a page as a function of their similarities and the weight values associated with their line numbers.
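The abstract does not specify a similarity measure, how the subset of pages is chosen, or how similarities and weights are combined. The sketch below fills those gaps with stated assumptions only to illustrate the general idea. Similarity is taken from Python's `difflib.SequenceMatcher.ratio()`. The subset is assumed to be the `sample_size` nearest other pages. A line is assumed to "correspond" to the line at the same offset from the top and from the bottom of another page. The hypothetical `EDGE_WEIGHTS` table gives higher weight to lines near the page edges, where headers and footers usually sit. Finally, a line is removed when its best similarity multiplied by its weight reaches `threshold`. All of these names and values are illustrative and are not taken from the patent.

```python
from difflib import SequenceMatcher
from typing import List, Sequence

Page = List[str]

# Hypothetical weights by distance from the nearest page edge: the first/last
# line gets 1.0, the second/second-to-last gets 0.7, every other line 0.0.
EDGE_WEIGHTS: Sequence[float] = (1.0, 0.7)


def line_weight(index: int, n_lines: int) -> float:
    """Weight for a line number: higher near the top or bottom of the page."""
    distance = min(index, n_lines - 1 - index)
    return EDGE_WEIGHTS[distance] if distance < len(EDGE_WEIGHTS) else 0.0


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two lines, ignoring surrounding whitespace."""
    return SequenceMatcher(None, a.strip(), b.strip()).ratio()


def corresponding_lines(index: int, n_lines: int, other: Page) -> List[str]:
    """Lines in `other` at the same offset from the top and from the bottom."""
    m = len(other)
    found = []
    if index < m:
        found.append(other[index])  # same line number counted from the top
    j = m - (n_lines - index)  # same line number counted from the bottom
    if 0 <= j < m and j != index:
        found.append(other[j])
    return found


def remove_extraneous(pages: List[Page], sample_size: int = 3,
                      threshold: float = 0.6) -> List[Page]:
    """Return a copy of `pages` with likely headers/footers removed (sketch)."""
    cleaned: List[Page] = []
    for p, page in enumerate(pages):
        n = len(page)
        # Assumed subset selection: the nearest other pages in document order.
        others = sorted((q for q in range(len(pages)) if q != p),
                        key=lambda q: abs(q - p))[:sample_size]
        kept = []
        for i, line in enumerate(page):
            # Best similarity against any corresponding line on the subset.
            best = max((similarity(line, c)
                        for q in others
                        for c in corresponding_lines(i, n, pages[q])),
                       default=0.0)
            # Keep the line unless weighted similarity reaches the threshold.
            if best * line_weight(i, n) < threshold:
                kept.append(line)
        cleaned.append(kept)
    return cleaned
```

On three four-line pages that share the header `"ACME Corp Annual Report"` and end with `"Page 1"`, `"Page 2"`, `"Page 3"`, this sketch drops the header and the page-number footer and keeps the body lines. The page numbers differ by one character, so their similarity (about 0.83) still clears the threshold. Because the weight multiplies the similarity, a line that sits deep in the body (weight 0.0) is never removed, even if it repeats across pages. A real system might combine the two factors differently.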
