1. Articles in category: Segmentation

    1. Google Puts Human Touch Into Machine Translation

      Google has announced the launch of the Google Translator Toolkit, an editor designed to give translators an easy means of bringing the "human touch" to machine translation, which everybody knows is often flawed. Michael Galvez and Sanjay Bhansali of the Google Translator Toolkit team explain how the toolkit works: For example, if an Arabic-speaking reader wants to translate a Wikipedia™ article ...
    2. The importance of input representations

      As some of you know, I run a (machine learning) reading group every semester. This summer we're doing "assorted" topics, which basically means students pick a few papers from the past 24 months that are related and present on them. The week before I went out of town, we read two papers about inferring features from raw data; one was a deep learning approach; the other was more Bayesian. (As a total aside, I found it funny that in the latter paper they talk a lot about trying to find independent features, but in all cog sci papers I ...
    3. Using source-channel models for word segmentation

      BACKGROUND OF THE INVENTION: The present invention relates to segmenting text. In particular, the present invention relates to segmenting text that is not delimited by spaces. In many languages, such as Chinese and Japanese, it is difficult to segment characters into words because the words are not delimited by spaces. Methods used in the past to perform such segmentation can roughly be classified into dictionary-based methods or statistical-based methods. In dictionary-based methods, substrings of c
    4. Conceptual world representation natural language understanding system and method

      A portion of the disclosure recited in the specification contains material which is subject to copyright protection. This application includes a compact disk appendix containing source code listings that list instructions for a system and method by which the present invention may be practiced in a computer system. Two identical copies of the source code listing, volume name L&C, comprising 959 files, 6,598,598 bytes, are provided on compact disks created on Jul. 4, 2002. The copyright owner has n
    5. Image-based document indexing and retrieval

      A system that facilitates document retrieval and/or indexing is provided. A component receives an image of a document, and a search component searches data store(s) for a match to the document image. The match is performed over word-level topological properties of images of documents stored in the data store(s).
    6. Word processing with artificial language validation

      BACKGROUND: The present invention relates to data processing by digital computer, and more particularly to word processing. Word processing systems (also referred to as word processors) allow users to create documents, primarily textual documents that might otherwise be prepared on a typewriter. Users can also edit, print or save the documents using the word processor. Such documents will be referred to as word processing documents. Modern word processors offer a greater range of functions than the f
    7. Chinese character-based parser

      BACKGROUND OF THE INVENTION. 1. Technical Field: The present invention relates to data processing and, in particular, to parsing Chinese character streams. Still more particularly, the present invention provides word segmentation, part-of-speech tagging and parsing for Chinese characters. 2. Description of Related Art: There are many natural language processing (NLP) applications, such as machine translation (MT) and question answering systems, that use structural information of a sentence. As word segm
    8. Effects of Repair Support Agent for Accurate Multilingual Communication

      Translation repair plays an important role in intercultural communication using machine translation. It can be used to create messages that have very few translation mistakes. However, translation repair is a laborious task. It is important to carry out translation repair efficiently. Therefore, we propose a repair support agent that provides the segments that have not been translated accurately. We perform experiments on the translation repair efficiency to evaluate the effectiveness of the repair support agent. The results of these experiments are as follows. (1) Providing inaccurately translated segments improves the ability to detect inaccurate segments. (2) The inaccurate-judgment rate can ...
    9. Method and apparatus for window matching in delta compressors

      The present invention relates generally to data compression and, more particularly, to a method for efficient window partition matching in delta compressors to enhance compression performance based on the idea of modeling a dataset with the frequencies of its n-grams. BACKGROUND OF THE INVENTION: Compression programs routinely limit the data to be compressed together in segments called windows. The process of doing this is called windowing. Delta compression techniques were developed to compress a t
      Mentions: lamda Bell Labs
    10. Development and evaluation of a clinical note section header terminology.

      AMIA Annu Symp Proc. 2008;:156-60. Authors: Denny JC, Miller RA, Johnson KB, Spickard A. Clinical documentation is often expressed in natural language text, yet providers often use common organizations that segment these notes in sections, such as history of present illness or physical examination. We developed a hierarchical section header terminology, supporting mappings to LOINC and other vocabularies; it contained 1109 concepts and 4332 synonyms. Physicians evaluated it compared to LOINC and the Evaluation and Management billing schema using a randomly selected corpus of history and physical ...
    11. Systems and methods for interactive topic-based text summarization

      INCORPORATION BY REFERENCE: This Application incorporates by reference: entitled "SYSTEMS AND METHODS FOR DETERMINING THE TOPIC STRUCTURE OF A PORTION OF TEXT" by I. Tsochantaridis et al., filed Mar. 22, 2002 as U.S. patent application Ser. No. 10/103,053; entitled "SYSTEMS AND METHODS FOR DISPLAYING INTERACTIVE TOPIC BASED TEXT SUMMARIES" by F. Chen et al., filed Dec. 16, 2002, as U.S. patent application Ser. No. 10/319,545; entitled "SYSTEMS AND METHODS FOR SENTENCE BASED INTERACTIVE TOPIC BASED T
    12. CoZo+ - A Content Zoning Engine for textual documents. (arXiv:0811.0453v1 [cs.CL])

      Content zoning can be understood as a segmentation of textual documents into zones. This is inspired by [6] who initially proposed an approach for the argumentative zoning of textual documents. With the prototypical CoZo+ engine, we focus on content zoning towards an automatic processing of textual streams while considering only the actors as the zones. We gain information that can be used to realize an automatic recognition of content for pre-defined actors. We understand CoZo+ as a necessary pre-step towards an automatic generation of summaries and to make intellectual ownership of documents detectable.
    13. Compound word breaker and spell checker

      CROSS-REFERENCE TO RELATED APPLICATIONS: Reference is hereby made to the following co-pending and commonly assigned patent applications: U.S. application Ser. No. 10/804,883, filed Mar. 19, 2004, entitled "SYSTEM AND METHOD FOR PERFORMING ANALYSIS ON WORD VARIANTS" and U.S. application Ser. No. 10/804,998, filed Mar. 19, 2004, entitled "FULL-FORM LEXICON WITH TAGGED DATA AND METHODS OF CONSTRUCTING AND USING THE SAME", both of which are incorporated by reference in their entirety. BACKGROUND OF THE
    14. Analyse spectrale des textes: détection automatique des frontières de langue et de discours. (arXiv:0810.1212v1 [cs.CL])

      We propose a theoretical framework within which information on the vocabulary of a given corpus can be inferred on the basis of statistical information gathered on that corpus. Inferences can be made on the categories of the words in the vocabulary, and on their syntactical properties within particular languages. Based on the same statistical data, it is possible to build matrices of syntagmatic similarity (bigram transition matrices) or paradigmatic similarity (probability for any pair of words to share common contexts). When clustered with respect to their syntagmatic similarity, words tend to group into sublanguage vocabularies, and when clustered with respect ...
      Mentions: Markov
    15. Aligning lay and specialized passages in comparable medical corpora.

      Stud Health Technol Inform. 2008;136:89-94. Authors: Deleger L, Zweigenbaum P. While the public increasingly has access to medical information, specialized medical language is often difficult for non-experts to understand, and there is a need to bridge the gap between specialized language and lay language. As a first step towards this end, we describe here a method to build a comparable corpus of expert and non-expert medical French documents and to identify similar text segments of lay and specialized language. Among the top 400 pairs of text segments ...
    16. Multimodal Processing

      With a multimedia document, its semantics are embedded in multiple forms that are usually complementary to each other. For example, a live report on TV about a tsunami conveys information that is far beyond what we read from the newspaper. Therefore, it is necessary to analyze all types of data: image frames, sound tracks, text that can be extracted from image frames, and spoken words that can be deciphered from the audio track [Wang00]. For some applications, automated techniques that process single media, for example, audio or images, may be error-prone, and multimodal processing is used to improve the overall system ...
    17. Text Processing

      Text provides crucial cues for understanding content. For example, the closed captions in broadcast television programs and subtitles in DVD movies facilitate video consumption for viewers. When a transcript is not available for certain content, automatic speech recognition can be used to extract linguistic information. Text information is much more concise than corresponding audio or video. The reason is that we need language knowledge to understand text, and the knowledge itself does not need to be embedded in the text data. For example, we only need five characters to express a “plane,” but to show a video clip of plane ...
    18. Automatic extraction of translations from web-based bilingual materials

      This paper describes the framework of the StatCan Daily Translation Extraction System (SDTES), a computer system that maps and compares web-based translation texts of Statistics Canada (StatCan) news releases in the StatCan publication The Daily. The goal is to extract translations for translation memory systems, for translation terminology building, for cross-language information retrieval and for corpus-based machine translation systems. Three years of officially published statistical news release texts at http://www.statcan.ca were collected to compose the StatCan Daily data bank. The English and French texts in this collection were roughly aligned using the Gale-Church statistical algorithm. After ...