1. Articles in category: Segmentation

    889-912 of 919
    1. Method to compress linguistic structures

      A method and system for compressing a data structure. A segment is identified within the data structure. Each segment identified is counted for the number of occurrences of the segment within the data structure. If the number of occurrences is greater than one, the segment is saved in a recurring data structure. Also, the recurring segment within the data structure is replaced with an index to the segment stored in the recurring data structure.
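      The counting-and-replacing idea in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: it assumes the "segments" are whitespace-separated words, and the names `compress`, `decompress`, `table`, and the `("ref", i)` / `("lit", s)` tuples are hypothetical choices made for the example.

```python
from collections import Counter

def compress(segments):
    """Replace every segment that occurs more than once with an index
    into a table of recurring segments (assumed representation)."""
    counts = Counter(segments)
    table = []       # stands in for the "recurring data structure"
    index_of = {}    # segment -> its position in `table`
    encoded = []
    for seg in segments:
        if counts[seg] > 1:
            if seg not in index_of:
                index_of[seg] = len(table)
                table.append(seg)
            encoded.append(("ref", index_of[seg]))
        else:
            # Segments seen only once are stored literally.
            encoded.append(("lit", seg))
    return table, encoded

def decompress(table, encoded):
    """Invert compress(): expand each index back into its segment."""
    return [table[value] if kind == "ref" else value for kind, value in encoded]

words = "the cat saw the dog and the cat ran".split()
table, encoded = compress(words)
```

      Here "the" and "cat" recur, so they are stored once in `table` and each occurrence in `encoded` becomes an index; `decompress(table, encoded)` restores the original word list.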
    2. Method for database address specification

      A method for use with a processor which automatically creates hyperlinks between references to records in a record set which appear in a second record and the records in the record set, the method for eliminating ambiguity when record references overlap and including steps whereby resolution rules are applied which recognize references to a subset of records which are referenced by overlapping references and also a method for recognizing specific record information as a particular type and inserting tags which can be used by certain applications to identify the specific information within the record.
    3. Method and system for topical segmentation, segment significance and segment function

      A "domain-general" method for topical segmentation of a document input includes the steps of: extracting one or more selected terms from a document; linking occurrences of the extracted terms based upon the proximity of similar terms; and assigning weighted scores to paragraphs of the document input corresponding to the linked occurrences. In accordance with the present invention, the values of the assigned scores depend upon the type of the selected terms, e.g., common noun, proper noun, pronominal, and the position of the linked occurrences with respect to the paragraphs, e.g., front, during, rear, etc. Upon zero-sum normalization, the ...
    4. Automatic segmentation of a text

      A system 100 is capable of segmenting a connected text, such as a Japanese or Chinese sentence, into words. The system includes means 110 for reading an input string representing the connected text. Segmentation means 120 identifies at least one word sequence in the connected text by building a tree structure representing word sequence(s) in the input string in an iterative manner. Initially the input string is taken as a working string. Each word of a dictionary 122 is compared with the beginning of the working string. A match is represented by a node in the tree, and the process ...
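      The dictionary-matching step described above can be sketched as follows. The abstract is truncated, so this example assumes that after each match the process continues with the remainder of the working string; it also uses recursion rather than the iterative tree construction the abstract mentions. The function name `segmentations` and the toy dictionary are hypothetical.

```python
def segmentations(text, dictionary):
    """Return every way to split `text` into dictionary words.

    Each successful prefix match corresponds to a node in the tree; the
    remainder of the string becomes the new working string (an assumption,
    since the source abstract is cut off at this point).
    """
    if not text:
        return [[]]  # one complete path: nothing left to segment
    results = []
    for word in dictionary:
        # Skip empty entries so the recursion always consumes input.
        if word and text.startswith(word):
            for rest in segmentations(text[len(word):], dictionary):
                results.append([word] + rest)
    return results

dictionary = ["ab", "abc", "c", "d", "cd"]
paths = segmentations("abcd", dictionary)
# paths == [["ab", "c", "d"], ["ab", "cd"], ["abc", "d"]]
```

      Each inner list is one root-to-leaf path of the tree. A string with no complete segmentation yields an empty list. The sketch enumerates all paths and does not rank them.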
    5. E-mail signature block analysis

      A technique for analyzing loosely constrained text blocks, such as e-mail signature blocks, by performing a two-dimensional geometrical analysis and a one-dimensional language analysis in order to classify sub-blocks of the loosely constrained text block into particular functional classes. The present technique may also be utilized to identify a personal name from a user name in a loosely constrained text block, such as an e-mail signature block.
    6. E-mail signature block segmentation

      A technique for segmenting a loosely constrained text block, such as an e-mail signature block, into sub-blocks by performing line segment extraction and connected component analysis on the foreground characters and background characters, and recursively repeating connected component analysis on both the foreground and background characters and line segment extraction on the background characters until a text output includes no mixed reading blocks. Also, a technique for correcting over-segmentation errors in a line of text from a loosely constrained text block which has undergone geometrical analysis.
    7. Methods for analysis and evaluation of the semantic content of a writing based on vector length

      The present invention is a methodology for analyzing and evaluating a sample text, such as essay(s) or document(s). This methodology compares the sample text to reference essay(s), document(s), or text segment(s) within a reference essay or document. The methodology analyzes the amount of subject-matter information in the sample text, analyzes the relevance of subject-matter information in the sample and evaluates the semantic coherence of the sample. This methodology presumes there is an underlying, latent semantic structure in the usage of words. The method parses and stores text objects and text segments from the sample ...
    8. Method for dynamic presentation of the contents topically rich capsule overviews corresponding to the plurality of documents, resolving co-referentiality in document segments

      A method for the dynamic presentation of the contents of a plurality of documents on a display is disclosed. The method comprises receiving a plurality of documents and providing a plurality of topically rich capsule overviews corresponding to the plurality of documents. The method also includes dynamically delivering document content encapsulated in the plurality of capsule overviews. In so doing, the method in accordance with the present invention can present thematic capsule overviews of the documents to users. The capsule overviews, delivered in a variety of dynamic presentation modes, allow the user to quickly get a sense of what a ...
    9. System for Chinese tokenization and named entity recognition

      A system (100, 200) for tokenization and named entity recognition of ideographic language is disclosed. In the system, a word lattice is generated for a string of ideographic characters using finite state grammars (150) and a system lexicon (240). Segmented text is generated by determining word boundaries in the string of ideographic characters using the word lattice dependent upon a contextual language model (152A) and one or more entity language models (152B). One or more named entities are recognized in the string of ideographic characters using the word lattice dependent upon the contextual language model (152A) and the one or ...
    10. System and method for estimating accuracy of an automatic natural language translation

      A computer system and method for natural language translation uses a translation process to translate a source natural language segment (e.g. English) of one or more source words/elements into a target natural language (e.g. German) segment of one or more target words/elements. An evaluation module determines a confidence measure of the natural language translation. Typically, the confidence measure indicates less confidence as the complexity of the translation increases. Various novel features for determining complexity and confidence measure at different steps in the translation are used. The translation process can be terminated if the confidence measure fails ...
    11. System and method for automated testing of writing skill

      A system and method for administering a composition problem in a language, such as English, to an examinee. The examinee is provided with textual items such as brief essays needing correction. The examinee selects a predetermined segment of the text and moves it to an editing window where the examinee may change the segment using standard word processing techniques. The segment may be deleted or modified and then replaced in the text. The examinee continues this process by selecting other predetermined text segments. When the examinee has finished selecting and editing all the segments that the examinee wishes to change ...
    12. Bootstrapping sense characterizations of occurrences of polysemous words in dictionary representations of a lexical knowledge base in computer memory

      The present invention is directed to characterizing the sense of an occurrence of a polysemous word in a representation of a dictionary. In a preferred embodiment, the representation of the dictionary is made up of a plurality of text segments containing word occurrences having a word sense characterization and word occurrences not having a word sense characterization. The embodiment first selects a plurality of the dictionary text segments that each contain a first word. The embodiment then identifies from among the selected text segments a first and a second occurrence of a second word. The identified second occurrence of the ...
    13. Identifying language and character set of data representing text

      The present invention provides a facility for identifying the unknown language of text represented by a series of data values in accordance with a character set that associates character glyphs with particular data values. The facility first generates a characterization that characterizes the series of data values in terms of the occurrence of particular data values in the series of data values. For each of a plurality of languages, the facility then retrieves a model that models the language in terms of the statistical occurrence of particular data values in representative samples of text in that language. The facility then ...
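      A rough sketch of the characterize-then-compare idea follows. The abstract is cut off before it says how the characterization is matched against the language models, so the cosine-similarity comparison here is purely an assumption, as are the single-byte frequency profiles, the function names, and the tiny sample texts.

```python
from collections import Counter
from math import sqrt

def characterize(data: bytes) -> Counter:
    """Count how often each byte value occurs in the data."""
    return Counter(data)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two byte-frequency profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def identify(data: bytes, models: dict) -> str:
    """Return the name of the model whose profile is closest to the data
    (assumed comparison step; the source abstract does not specify it)."""
    profile = characterize(data)
    return max(models, key=lambda lang: cosine(profile, models[lang]))

# Hypothetical models built from very small representative samples.
models = {
    "english": characterize("the quick brown fox jumps over the lazy dog".encode("utf-8")),
    "german": characterize("der schnelle braune fuchs springt über den faulen hund".encode("utf-8")),
}
guess = identify("where is the dog".encode("utf-8"), models)
```

      Samples this small are unlikely to discriminate reliably between languages; realistic models would need far more text and, most likely, longer units than single byte values.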
    14. Machine assisted translation tools utilizing an inverted index and list of letter n-grams

      A translation memory for computer assisted translation based upon an aligned file having a number of source language text strings paired with target language text strings. A posting vector file includes a posting vector associated with each source language text string in the aligned file. Each posting vector includes a document identification number corresponding to a selected one of the source language text strings in the aligned file and a number of entropy weight values, each of the number of weight values corresponding to a unique letter n-gram that appears in the selected source language text string. Preferably, the translation ...
      Mentions: Unicode Trados
    15. Bootstrapping sense characterizations of occurrences of polysemous words in dictionaries

      The present invention is directed to characterizing the sense of an occurrence of a polysemous word in a representation of a dictionary. In a preferred embodiment, the representation of the dictionary is made up of a plurality of text segments containing word occurrences having a word sense characterization and word occurrences not having a word sense characterization. The embodiment first selects a plurality of the dictionary text segments that each contain a first word. The embodiment then identifies from among the selected text segments a first and a second occurrence of a second word. The identified second occurrence of the ...
    16. Bootstrapping sense characterizations of occurrences of polysemous words

      The present invention is directed to characterizing the sense of an occurrence of a polysemous word in a representation of a dictionary. In a preferred embodiment, the representation of the dictionary is made up of a plurality of text segments containing word occurrences having a word sense characterization and word occurrences not having a word sense characterization. The embodiment first selects a plurality of the dictionary text segments that each contain a first word. The embodiment then identifies from among the selected text segments a first and a second occurrence of a second word. The identified second occurrence of the ...
    17. Character strings reading device

      A character strings reading device for reading character strings from input image data comprises cut-out recognition means for cutting out a segment corresponding to one character from the image data to perform individual character recognition every segment, a recognition result buffer for storing a recognition result of the cut-out recognition means, word searching means for searching a word string candidate corresponding to a combination of character candidates in the recognition result buffer, a word string candidate buffer for storing a search result of the word searching means, check portion determining means for determining a check target portion and a presumed ...
    18. Prosodic databases holding fundamental frequency templates for use in speech synthesis

      Prosodic databases hold fundamental frequency templates for use in a speech synthesis system. Prosodic database templates may hold fundamental frequency values for syllables in a given sentence. These fundamental frequency values may be applied in synthesizing a sentence of speech. The templates are indexed by tonal pattern markings. A predicted tonal marking pattern is generated for each sentence of text that is to be synthesized, and this predicted pattern of tonal markings is used to locate a best-matching template. The templates are derived by calculating fundamental frequencies on a per-syllable basis for sentences that are spoken by a human trainer ...
    19. Method and apparatus for creating a searchable digital video library and a system and method of using such a library

      An apparatus and method of creating a digital library from audio data and video images. The method includes the steps of transcribing the audio data and marking the transcribed audio data with a first set of time-stamps and indexing the transcribed audio data. The method also includes the steps of digitizing the video data and marking the digitized video data with a second set of time-stamps related to the first set of time-stamps and segmenting the digitized video data into paragraphs according to a set of rules. The steps of storing the indexed audio data and the digitized video data ...
    20. Methods for controlling the generation of speech from text representing one or more names

      Improved automated synthesis of human audible speech from text is disclosed. Performance enhancement of the underlying text comprehensibility is obtained through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. Prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings, is implemented in a preferred embodiment.
    21. System with collaborative interface agent

      The present invention relates to a discourse manager which permits effective collaboration between a user and a computer agent. The system operates according to a theory of collaborative discourse between humans, with the computer agent playing the same role as a human collaborator. The present invention does not concern the internal operation of a particular agent, but relates rather to the structures for managing a collaborative discourse between any type of agent and the user. The discourse manager includes a memory in which application-specific recipes are stored and a memory in which the discourse state is stored. Each recipe specifies ...
    22. Compilation of weighted finite-state transducers from decision trees

      A method for automatically converting a decision tree into one or more weighted finite-state transducers. Specifically, the method in accordance with an illustrative embodiment of the present invention processes one or more terminal (i.e., leaf) nodes of a given decision tree to generate one or more corresponding weighted rewrite rules. Then, these weighted rewrite rules are processed to generate weighted finite-state transducers corresponding to the one or more terminal nodes of the decision tree. In this manner, decision trees may be advantageously compiled into weighted finite-state transducers, and these transducers may then be used directly in various speech and ...
    23. Text processor

      A text enhancement method and apparatus for the presentation of text for improved human reading. The method includes extracting text specific attributes from machine readable text and varying the text presentation in accordance with the attributes. The preferred embodiment of the method: extracts parts of speech and punctuation from a sentence, applies folding rules which use the parts of speech to determine folding points, uses the folding points to divide the sentence into text segments, applies horizontal displacement rules to determine horizontal displacement for the text segments, and presents the text segments each on a new line and having the ...
    24. Knowledge-based speech recognition system and methods having frame length computed based upon estimated pitch period of vocalic intervals

      Knowledge based speech recognition apparatus and methods are provided for translating an input speech signal to text. The speech recognition apparatus captures an input speech signal, segments it based on the detection of pitch period, and generates a series of hypothesized acoustic feature vectors for the input speech signal that characterizes the signal in terms of primary acoustic events, detectable vowel sounds and other acoustic features. The apparatus and methods employ a largely speaker-independent dictionary based upon the application of phonological and phonetic/acoustic rules to generate acoustic event transcriptions against which the series of hypothesized acoustic feature vectors are ...