1. Articles in category: Segmentation

    793-816 of 834 « 1 2 ... 31 32 33 34 35 »
    1. Method for segmentation of text

      A computerized method, and a corresponding apparatus, for segmentation of a stream of text elements comprising analyzed tokens into one or more initial clauses is disclosed. In the method, a predetermined number of consecutive text elements of said stream of text elements are scanned, starting from a given position. The predetermined number of consecutive text elements are compared with each pattern of a set of patterns for beginnings of initial clauses, and a beginning of an initial clause is identified in the predetermined number of consecutive text elements, if the predetermined number of consecutive text elements match one pattern of ...
    2. Computer method and apparatus for segmenting text streams

      Computer method and apparatus for segmenting text streams is disclosed. Given is an input text stream formed of a series of words. A probability member provides working probabilities that a group of words is of a topic selected from a plurality of predetermined topics. The probability member accounts for relationships between words. A processing module receives the input text stream and using the probability member determines probability of certain words in the input text stream being of a same topic. As such, the processing module segments the input text stream into single topic groupings of words, where each grouping is ...
    3. System and method for incorporating concept-based retrieval within boolean search engines

      Disclosed is a method for linguistic pattern recognition of information. Initially, textual information is retrieved from a data source utilizing a network. The textual information is then segmented into a plurality of phrases, which are then scanned for patterns of interest. For each pattern of interest found a corresponding event structure is built. Event structures that provide information about essentially the same incident are then merged.
    4. Method and apparatus for making predictions about entities represented in documents

      A method and apparatus is disclosed for making predictions about entities represented in documents and for information analysis of text documents or the like, from a large number of such documents. Predictive models are executed responsive to variables derived from canonical documents to determine documents containing desired attributes or characteristics. The canonical documents are derived from standardized documents, which, in turn, are derived from original documents.
    5. Method and system for reducing lexical ambiguity

      A method and system for reducing lexical ambiguity in an input stream are described. In one embodiment, the input stream is broken into tokens. The tokens are used to create a connection graph comprising a number of paths. Each of the paths is assigned a cost. At least one best path is defined based upon a corresponding cost to generate an output graph. The generated output graph is provided to reduce lexical ambiguity.
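The abstract above describes tokens, a connection graph of paths, per-path costs, and selection of a best path. A minimal sketch of that general idea follows, assuming a hypothetical lexicon that maps each candidate token to a cost and using a standard dynamic-programming search for the cheapest path. The names `best_path` and `EXAMPLE_LEXICON` and the cost values are illustrative assumptions, not taken from the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical token costs (lower is better); values are illustrative only.
EXAMPLE_LEXICON: Dict[str, float] = {
    "the": 1.0, "there": 1.5, "rein": 3.0,
    "in": 1.0, "re": 2.5, "here": 1.5, "t": 5.0,
}


def best_path(text: str, lexicon: Dict[str, float]) -> Tuple[float, List[str]]:
    """Return (total cost, tokens) for the cheapest tokenization of `text`.

    Positions 0..len(text) act as graph nodes; every lexicon token that
    spans text[start:end] is an edge whose weight is the token's cost.
    """
    n = len(text)
    inf = float("inf")
    best_cost = [inf] * (n + 1)          # cheapest cost to reach each position
    back: List[Tuple[int, str]] = [(-1, "")] * (n + 1)  # (previous position, token)
    best_cost[0] = 0.0

    for end in range(1, n + 1):
        for start in range(end):
            token = text[start:end]
            if token not in lexicon:
                continue
            cost = best_cost[start] + lexicon[token]
            if cost < best_cost[end]:
                best_cost[end] = cost
                back[end] = (start, token)

    if best_cost[n] == inf:
        raise ValueError("no path covers the whole input")

    # Walk the back-pointers from the end to recover the chosen tokens.
    tokens: List[str] = []
    pos = n
    while pos > 0:
        start, token = back[pos]
        tokens.append(token)
        pos = start
    tokens.reverse()
    return best_cost[n], tokens
```

With the example lexicon, `best_path("therein", EXAMPLE_LEXICON)` prefers `there` + `in` over `the` + `rein` because the summed cost is lower. A real system would presumably derive costs from linguistic evidence rather than a hand-written table.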
    6. Method and system for video browsing and editing by employing audio

      A system for browsing and editing video, in accordance with the present invention, includes a video source for providing a video document which includes audio information, and an audio classifier coupled to the video source, the audio classifier being adapted to classify audio segments of the audio information into a plurality of classes. An audio spectrogram generator is coupled to the video source for generating spectrograms for the audio information to check that the audio segments have been identified correctly by the audio classifier. A browser is coupled to the audio classifier for searching the classified audio segments for editing ...
    7. Proper name identification in Chinese

      A word segmentation method to identify proper names in input text includes locating a sequence of single-characters in the input text not forming part of a multiple-character word. The method further includes comparing the sequence of single-characters to a lexical knowledge base to identify if a first portion of the sequence corresponds to stored identifiable portions of a proper name, and comparing the sequence of single-characters to the lexical knowledge base to identify if a second portion of the sequence proximate the first portion includes characters known to comprise a second portion of a proper name. Instructions can be provided ...
    8. Parameterized word segmentation of unsegmented text

      The present invention segments a non-segmented input text. The input text is received and segmented based on parameter values associated with parameterized word formation rules. In one illustrative embodiment, the input text is processed into a form which includes parameter indications, but which preserves the word-internal structure of the input text. Thus, the parameter values can be changed without entirely re-processing the input text.
    9. Word segmentation in Chinese text

      The present invention provides a facility for selecting from a sequence of natural language characters combinations of characters that may be words. The facility uses indications, for each of a plurality of characters, of (a) the characters that occur in the second position of words that begin with the character and (b) the positions in which the character occurs in words. For each of a plurality of contiguous combinations of characters occurring in the sequence, the facility determines whether the character occurring in the second position of the combination is indicated to occur in words that begin with the character ...
    10. System for organizing videos based on closed-caption information

      A system for organizing digital videos to archive and access them at different levels of abstraction uses data available from a closed-caption text along with off-the-shelf natural language processing tools to segment the video into self-contained story sections and speaker blocks. If the subject changes are marked, the system uses these points to divide the video into distinct stories which are represented as nodes attached to the root node in a tree structure and groups speaker segments belonging to a story under the story node as its children.
    11. Method For Dynamically Delivering Contents Encapsulated With Capsule Overviews Corresponding To The Plurality Of Documents, Resolving Co-Referentiality Related To Frequency Within Document, Determining Topic Stamps For Each Document Segments

      A method for the dynamic presentation of the contents of a plurality of documents on a display is disclosed. The method comprises receiving a plurality of documents and providing a plurality of topically rich capsule overviews corresponding to the plurality of documents. The method also includes dynamically delivering document content encapsulated in the plurality of capsule overviews. In so doing, the method in accordance with the present invention can present thematic capsule overviews of the documents to users. The capsule overviews, delivered in a variety of dynamic presentation modes, allow the user to quickly get a sense of what a ...
    12. Method to compress linguistic structures

      A method and system for compressing a data structure. A segment is identified within the data structure. Each segment identified is counted for the number of occurrences of the segment within the data structure. If the number of occurrences is greater than one, the segment is saved in a recurring data structure. Also, the recurring segment within the data structure is replaced with an index to the segment stored in the recurring data structure.
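The steps in the abstract above (count each segment, store segments that occur more than once in a separate structure, replace them with an index) can be sketched as follows. This is an illustrative sketch only: it assumes the data structure is a flat list of string segments, and the names `compress` and `decompress` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

# An encoded entry is either a literal segment (str) or an index (int)
# into the table of recurring segments.
Entry = Union[str, int]


def compress(segments: List[str]) -> Tuple[List[str], List[Entry]]:
    """Replace segments that occur more than once with indexes into a table."""
    counts = Counter(segments)
    table: List[str] = []            # the "recurring data structure"
    index_of: Dict[str, int] = {}
    encoded: List[Entry] = []

    for segment in segments:
        if counts[segment] > 1:
            if segment not in index_of:
                index_of[segment] = len(table)
                table.append(segment)
            encoded.append(index_of[segment])
        else:
            encoded.append(segment)  # unique segments stay inline
    return table, encoded


def decompress(table: List[str], encoded: List[Entry]) -> List[str]:
    """Invert `compress` by looking up each integer index in the table."""
    return [table[e] if isinstance(e, int) else e for e in encoded]
```

For the input `["NP", "VP", "NP", "PP", "VP", "ADJ"]`, the table holds `NP` and `VP`, and the encoded list keeps `PP` and `ADJ` inline because each occurs only once.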
    13. Method for database address specification

      A method for use with a processor which automatically creates hyperlinks between references to records in a record set which appear in a second record and the records in the record set, the method for eliminating ambiguity when record references overlap and including steps whereby resolution rules are applied which recognize references to a subset of records which are referenced by overlapping references and also a method for recognizing specific record information as a particular type and inserting tags which can be used by certain applications to identify the specific information within the record.
    14. Method and system for topical segmentation, segment significance and segment function

      A "domain-general" method for topical segmentation of a document input includes the steps of: extracting one or more selected terms from a document; linking occurrences of the extracted terms based upon the proximity of similar terms; and assigning weighted scores to paragraphs of the document input corresponding to the linked occurrences. In accordance with the present invention, the values of the assigned scores depend upon the type of the selected terms, e.g., common noun, proper noun, pronominal, and the position of the linked occurrences with respect to the paragraphs, e.g., front, during, rear, etc. Upon zero-sum normalization, the ...
    15. Automatic segmentation of a text

      A system 100 is capable of segmenting a connected text, such as a Japanese or Chinese sentence, into words. The system includes means 110 for reading an input string representing the connected text. Segmentation means 120 identifies at least one word sequence in the connected text by building a tree structure representing word sequence(s) in the input string in an iterative manner. Initially the input string is taken as a working string. Each word of a dictionary 122 is compared with the beginning of the working string. A match is represented by a node in the tree, and the process ...
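The tree-building step described above can be sketched as follows. Because the abstract is truncated, the sketch assumes that the process continues on the remainder of the working string after each match; that continuation, the function names, and the unspaced English stand-in for connected text are all assumptions made for illustration.

```python
from typing import Dict, List

Tree = Dict[str, "Tree"]


def build_tree(working: str, dictionary: List[str]) -> Tree:
    """Compare each dictionary word with the beginning of `working`.

    Every match becomes a node whose subtree is built from the rest of
    the string (assumed continuation; the abstract is cut off here).
    """
    tree: Tree = {}
    for word in dictionary:
        if word and working.startswith(word):
            tree[word] = build_tree(working[len(word):], dictionary)
    return tree


def word_sequences(working: str, tree: Tree) -> List[List[str]]:
    """List the root-to-leaf paths that consume the whole string."""
    if not working:
        return [[]]
    sequences: List[List[str]] = []
    for word, subtree in tree.items():
        for rest in word_sequences(working[len(word):], subtree):
            sequences.append([word] + rest)
    return sequences


# Unspaced English stands in for connected text in this example.
EXAMPLE_DICTIONARY = ["no", "now", "where", "here", "he", "re"]
```

Calling `build_tree("nowhere", EXAMPLE_DICTIONARY)` yields branches for `no` and `now`, and `word_sequences` then recovers the complete segmentations. Branches that cannot consume the whole string are dropped because they produce no sequences.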
    16. E-mail signature block analysis

      A technique for analyzing loosely constrained text blocks, such as e-mail signature blocks, by performing a two-dimensional geometrical analysis and a one-dimensional language analysis in order to classify sub-blocks of the loosely constrained text block into particular functional classes. The present technique may also be utilized to identify a personal name from a user name in a loosely constrained text block, such as an e-mail signature block.
    17. E-mail signature block segmentation

      A technique for segmenting a loosely constrained text block, such as an e-mail signature block, into sub-blocks by performing line segment extraction and connected component analysis on the foreground characters and background characters and recursively repeating connected component analysis on both the foreground and background characters and line segment extraction on the background characters until a text output includes no mixed reading blocks. Also described is a technique for correcting over-segmentation errors in a line of text from a loosely constrained text block that has undergone geometrical analysis.
    18. Methods for analysis and evaluation of the semantic content of a writing based on vector length

      The present invention is a methodology for analyzing and evaluating a sample text, such as essay(s), or document(s). This methodology compares sample text to a reference essay(s), document(s), or text segment(s) within a reference essay or document. The methodology analyzes the amount of subject-matter information in the sample text, analyzes the relevance of subject matter information in the sample and evaluates the semantic coherence of the sample. This methodology presumes there is an underlying, latent semantic structure in the usage of words. The method parses and stores text objects and text segments from the sample ...
    19. Method for dynamic presentation of the contents topically rich capsule overviews corresponding to the plurality of documents, resolving co-referentiality in document segments

      A method for the dynamic presentation of the contents of a plurality of documents on a display is disclosed. The method comprises receiving a plurality of documents and providing a plurality of topically rich capsule overviews corresponding to the plurality of documents. The method also includes dynamically delivering document content encapsulated in the plurality of capsule overviews. In so doing, the method in accordance with the present invention can present thematic capsule overviews of the documents to users. The capsule overviews, delivered in a variety of dynamic presentation modes, allow the user to quickly get a sense of what a ...
    20. System for Chinese tokenization and named entity recognition

      A system (100, 200) for tokenization and named entity recognition of ideographic language is disclosed. In the system, a word lattice is generated for a string of ideographic characters using finite state grammars (150) and a system lexicon (240). Segmented text is generated by determining word boundaries in the string of ideographic characters using the word lattice dependent upon a contextual language model (152A) and one or more entity language models (152B). One or more named entities are recognized in the string of ideographic characters using the word lattice dependent upon the contextual language model (152A) and the one or ...
    21. System and method for estimating accuracy of an automatic natural language translation

      A computer system and method for natural language translation uses a translation process to translate a source natural language segment (e.g. English) of one or more source words/elements into a target natural language (e.g. German) segment of one or more target words/elements. An evaluation module determines a confidence measure of the natural language translation. Typically, the confidence measure indicates less confidence as the complexity of the translation increases. Various novel features for determining complexity and confidence measure at different steps in the translation are used. The translation process can be terminated if the confidence measure fails ...
    22. System and method for automated testing of writing skill

      A system and method for administering a composition problem in a language, such as English, to an examinee. The examinee is provided with textual items such as brief essays needing correction. The examinee selects a predetermined segment of the text and moves it to an editing window where the examinee may change the segment using standard word processing techniques. The segment may be deleted or modified and then replaced in the text. The examinee continues this process by selecting other predetermined text segments. When the examinee has finished selecting and editing all the segments that the examinee wishes to change ...
    23. Bootstrapping sense characterizations of occurrences of polysemous words in dictionary representations of a lexical knowledge base in computer memory

      The present invention is directed to characterizing the sense of an occurrence of a polysemous word in a representation of a dictionary. In a preferred embodiment, the representation of the dictionary is made up of a plurality of text segments containing word occurrences having a word sense characterization and word occurrences not having a word sense characterization. The embodiment first selects a plurality of the dictionary text segments that each contain a first word. The embodiment then identifies from among the selected text segments a first and a second occurrence of a second word. The identified second occurrence of the ...
    24. Identifying language and character set of data representing text

      The present invention provides a facility for identifying the unknown language of text represented by a series of data values in accordance with a character set that associates character glyphs with particular data values. The facility first generates a characterization that characterizes the series of data values in terms of the occurrence of particular data values in the series of data values. For each of a plurality of languages, the facility then retrieves a model that models the language in terms of the statistical occurrence of particular data values in representative samples of text in that language. The facility then ...