1. Removal of extraneous text from electronic documents

    Method and apparatus for removing lines of extraneous text from a document. Similarities are identified between the lines of text on each page and the corresponding lines on a selected subset of pages. Each line number on a page is associated with a weight value indicating how likely a line at that position is to contain extraneous text. One or more lines of text are then selectively removed from a page as a function of these similarities and the weight values associated with their line numbers.
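
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the patented method. Everything concrete in it is an assumption made for illustration: the function names, the use of `difflib.SequenceMatcher` as the similarity measure, the replacement of digits with `#` so that page numbers compare as equal, the choice of neighbouring pages as the "selected subset", the linear weight that favours lines near the top and bottom of a page, and the `window`, `edge`, and `threshold` values.

```python
import re
from difflib import SequenceMatcher


def normalize(line):
    """Assumption: mask digits so 'Page 1' and 'Page 2' compare as equal."""
    return re.sub(r"\d+", "#", line.strip())


def similarity(a, b):
    """Similarity in [0, 1] between two lines (hypothetical choice of measure)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def line_weight(index, page_len, edge=3):
    """Hypothetical weight per line number: 1.0 at the page edges,
    falling linearly to 0.0 once a line is `edge` lines from either edge."""
    distance = min(index, page_len - 1 - index)
    return max(0.0, 1.0 - distance / edge)


def corresponding_line(other_page, index, page_len):
    """Align top-half lines from the top and bottom-half lines from the bottom."""
    key = index if index < page_len / 2 else index - page_len
    try:
        return other_page[key]
    except IndexError:
        return None


def remove_extraneous(pages, window=2, threshold=0.6):
    """Return a copy of `pages` (a list of lists of lines) with likely
    extraneous lines dropped. The subset of pages compared against is
    assumed to be the `window` pages on either side of the current one."""
    cleaned = []
    for p, page in enumerate(pages):
        lo, hi = max(0, p - window), min(len(pages), p + window + 1)
        neighbours = [pages[q] for q in range(lo, hi) if q != p]
        kept = []
        for i, line in enumerate(page):
            scores = []
            for other in neighbours:
                candidate = corresponding_line(other, i, len(page))
                if candidate is not None:
                    scores.append(similarity(line, candidate))
            mean_similarity = sum(scores) / len(scores) if scores else 0.0
            score = mean_similarity * line_weight(i, len(page))
            # Blank lines are always kept; other lines are dropped when the
            # combined similarity-and-weight score reaches the threshold.
            if line.strip() and score >= threshold:
                continue
            kept.append(line)
        cleaned.append(kept)
    return cleaned
```

Calling `remove_extraneous(pages)` on a list of pages, each a list of line strings, returns the pages with high-scoring lines removed. Multiplying similarity by weight is just one simple way to combine the two signals; the abstract only says that removal is "a function of" both.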

  1. Categories

    1. Default:

      Discourse, Entailment, Machine Translation, NER, Parsing, Segmentation, Semantic, Sentiment, Summarization, WSD