1. Articles in category: Segmentation

    1. Method and apparatus for constructing a link structure between documents

      TECHNICAL FIELD: The present invention relates to document information management technology and, more particularly, to a method and apparatus for constructing a link structure between documents. BACKGROUND: In most cases, information is related to other information. Information is linked together via links, forming a link topology structure. The link topology is itself important information about the information. A typical example of an important linked system is the WWW, which is a hyperlinked collection. ...
      Mentions: SIM
    2. A Transliteration Based Word Segmentation System for Shahmukhi Script

      Word segmentation is an important prerequisite for almost all Natural Language Processing (NLP) applications. Since the word is a fundamental unit of any language, almost every NLP system must first segment input text into a sequence of words before further processing. In this paper, Shahmukhi word segmentation is discussed in detail. The presented word segmentation module is part of a Shahmukhi-Gurmukhi transliteration system. Shahmukhi script is usually written without short vowels, which leads to ambiguity. We have therefore designed a novel approach to Shahmukhi word segmentation that uses lexical resources of the target Gurmukhi script instead of Shahmukhi resources. We employ ...
    3. Self-adjusting Bootstrapping

      Bootstrapping has been used as a very efficient method to extract a group of items similar to a given set of seeds. However, the bootstrapping method intrinsically has several parameters whose optimal values differ from task to task and from target to target. In this paper, we first demonstrate that this is indeed the case and that it is a serious problem. We then propose self-adjusting bootstrapping, in which the original seed is segmented into the real seed and validation data. We bootstrap starting from the real seed, trying alternative parameter settings, and use the validation data to identify the optimal settings. This ...
    4. Word Segmentation for Dialect Translation

      This paper proposes an unsupervised word segmentation algorithm that identifies word boundaries in continuous source-language text. Its goal is to improve the translation quality of statistical machine translation (SMT) for local dialects by exploiting linguistic information from the standard language. The method iteratively learns multiple segmentation schemes that are consistent with (1) the standard dialect segmentations and (2) the phrasal segmentations of an SMT system trained on the resegmented bitext of the local dialect. In a second step, the multiple segmentation schemes are integrated into a single SMT system by characterizing the source language side and merging ...
      Mentions: Japan Kyoto Russian
    5. System and method for call center dialog management

      A system and method for call center dialog management is disclosed. The method comprises: presenting a contact with a first call center dialog segment having a current call center dialog property; receiving a contact dialog segment from the contact; identifying a dialog property keyword within the contact dialog segment; replacing the current call center dialog property with a new call center dialog property in response to the dialog property keyword; and presenting a second call center dialog segment having the new call center dialog property to the contact. The system of the present invention comprises means for implementing the method.
    6. Improving Text Segmentation with Non-systematic Semantic Relation

      Text segmentation is a fundamental problem in natural language processing, with applications in information retrieval, question answering, and text summarization. Almost all previous work on unsupervised text segmentation relies on the assumption of lexical cohesion, which is indicated by relations between words in the two units of text. However, these methods consider only reiteration, the category of lexical cohesion that covers word repetition, synonymy, and superordinates. In this research, we investigate the non-systematic semantic relation, which is classified as collocation in lexical cohesion. This relation holds between two words or phrases in a discourse when ...
    7. Text retrieval from early printed books

      Retrieving text from early printed books is particularly difficult because, in these documents, words are very close to one another and, as in medieval manuscripts, ligatures and abbreviations are used extensively. To address these problems, we propose a word indexing and retrieval technique that does not require word segmentation and is tolerant to errors in character segmentation. Two main principles characterize the approach. First, characters are identified in the pages and clustered with a self-organizing map (SOM). During retrieval, the similarity of characters is estimated by considering the proximity of cluster centroids in the SOM ...
    8. Disfluency detection for a speech-to-speech translation system using phrase-level machine translation with weighted finite state transducers

      A computer-implemented method for creating a disfluency translation lattice includes providing, as input, a plurality of weighted finite state transducers including a translation model, a language model, and a phrase segmentation model; performing a cascaded composition of the weighted finite state transducers to create the disfluency translation lattice; and storing the disfluency translation lattice on a computer-readable medium.
    9. Skeleton Simplification by Key Points Identification

      Current skeletonisation algorithms based on thinning extract the morphological features of an object in an image, but the resulting skeletons are only coarse representations. This paper proposes an algorithm that goes beyond that approach by converting the coarse line segments into perfect “straight” line segments and obtaining their points, angles, sizes, and proportions. Our technique is applied in the post-processing phase of the skeleton and improves it regardless of which skeletonisation technique is used, as long as the skeleton consists of continuous line segments one pixel wide. This proposal is a first step towards human activity recognition through the analysis ...
      Mentions: Spain Informatica
    10. A Local Generative Model for Chinese Word Segmentation

      This paper presents a local generative model for Chinese word segmentation that learns faster than discriminative models and supports unsupervised learning. It can also make use of larger resources. In this model, four successive characters are used to determine whether a character interval should be a word boundary. The Gibbs sampling algorithm, together with three additional rules, is applied for unsupervised learning. In addition to words, the word candidates generated by our model can improve the performance of Chinese information retrieval. The experiments show that in supervised learning our method outperforms ...
      Mentions: Beijing China Boeing
    11. Sanskrit Compound Processor

      Sanskrit is very rich in compound formation. Typically, a compound does not encode the relation between its components explicitly. To understand the meaning of a compound, it is necessary to identify its components, discover the relations between them, and finally generate a paraphrase of the compound. In this paper, we discuss the automatic segmentation and type identification of compounds using simple statistics derived from manually annotated data. Content Type: Book Chapter. DOI: 10.1007/978-3-642-17528-2_5. Authors: Anil Kumar, Department of Sanskrit Studies, University of Hyderabad, India; Vipul Mittal, Language Technologies Research Centre, IIIT, Hyderabad, India; Amba Kulkarni, Department of Sanskrit Studies ...
    12. A word spotting framework for historical machine-printed documents

      In this paper, we propose a word spotting framework for accessing the content of historical machine-printed documents without using an optical character recognition engine. A preprocessing step improves the quality of the document images, and word segmentation is accomplished with two complementary segmentation methodologies. In the proposed methodology, synthetic word images are created from keywords and compared to all the words in the digitized documents. A user feedback process is used to refine the search procedure. The methodology has been evaluated in early Modern ...
    13. Transmembrane helix prediction using amino acid property features and latent semantic analysis

      Background: Prediction of transmembrane (TM) helices by statistical methods suffers from a lack of sufficient training data. Current best methods use hundreds or even thousands of free parameters in their models, which are tuned to fit the little data available for training. Further, they are often restricted to the generally accepted topology "cytoplasmic-transmembrane-extracellular" and cannot adapt to membrane proteins that do not conform to this topology. Recent crystal structures of channel proteins have revealed novel architectures showing that this topology may not be as universal as previously believed. Thus, there is a need for methods that can better predict ...
    14. Text Segmentation by Clustering Cohesion

      Automatic linear text segmentation, which aims to detect the best topic boundaries, is a difficult but very useful task in many text processing systems. Some methods have addressed this problem with reasonable results, but they also have drawbacks. In this work, we propose a new method, called ClustSeg, based on a predefined window and a clustering algorithm to decide topic cohesion. We compare our proposal against the best-known methods and obtain better performance than these algorithms. Content Type: Book Chapter. DOI: 10.1007/978-3-642-16687-7_37. Authors: Raúl Abella Pérez, Advanced Technologies Application Centre (CENATAV), 7a #21812 ...
      Mentions: Computer Vision
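The self-adjusting bootstrapping abstract (item 3 in the list) describes splitting the seed into a real seed and validation data, bootstrapping from the real seed under alternative parameter settings, and keeping the setting that works best on the validation data. The Python sketch below illustrates only that selection loop, on a toy co-occurrence graph. Everything concrete in it is an assumption for illustration, not the paper's method: the hypothetical `bootstrap` expander with its `iterations` and `min_support` parameters, the alternating seed split, and the selection criterion (highest validation recall, with ties broken by the smaller extracted set).

```python
from collections import Counter


def bootstrap(graph, seed, iterations, min_support):
    """Hypothetical toy bootstrapper: on each iteration, add every item that
    co-occurs with at least `min_support` already-extracted items."""
    found = set(seed)
    for _ in range(iterations):
        support = Counter()
        for item in found:
            for neighbor in graph.get(item, ()):
                if neighbor not in found:
                    support[neighbor] += 1
        new_items = {c for c, n in support.items() if n >= min_support}
        if not new_items:
            break  # nothing more can be added under this setting
        found |= new_items
    return found


def self_adjusting_bootstrap(graph, seed, settings):
    """Choose the (iterations, min_support) setting that best recovers the
    held-out seed items, then rerun bootstrapping from the full seed with it."""
    ordered = sorted(seed)
    # Assumption: alternate items form the real seed and the validation set,
    # which keeps this example deterministic.
    real_seed, validation = set(ordered[::2]), set(ordered[1::2])
    best_key, best_setting = None, None
    for setting in settings:
        extracted = bootstrap(graph, real_seed, *setting)
        recall = len(validation & extracted) / len(validation) if validation else 0.0
        # Assumed criterion: maximize validation recall; on ties, prefer the
        # smaller extracted set (fewer likely false positives).
        key = (recall, -len(extracted))
        if best_key is None or key > best_key:
            best_key, best_setting = key, setting
    return best_setting, bootstrap(graph, set(seed), *best_setting)


# Toy co-occurrence graph: fruits form one cluster, vehicles another,
# and "car" is weakly linked to "apple".
GRAPH = {
    "apple": ["pear", "plum", "car"],
    "pear": ["apple", "plum", "fig"],
    "plum": ["apple", "pear", "fig", "grape"],
    "fig": ["pear", "plum", "grape"],
    "grape": ["plum", "fig"],
    "car": ["apple", "bus", "truck"],
    "bus": ["car", "truck"],
    "truck": ["car", "bus"],
}
SEED = {"apple", "fig", "pear", "plum"}
SETTINGS = [(1, 1), (1, 2), (2, 1), (2, 2)]

if __name__ == "__main__":
    setting, items = self_adjusting_bootstrap(GRAPH, SEED, SETTINGS)
    print(setting, sorted(items))
```

On this toy graph, the permissive settings with `min_support=1` also recover both validation items but drag in the vehicle cluster, so the tie-break selects `(2, 2)`, and the final run from the full seed adds only `grape`.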
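Two of the listed abstracts (the non-systematic semantic relation work and ClustSeg) build on lexical cohesion between neighboring stretches of text. The Python sketch below is a generic illustration only and is neither of those methods. Loosely in the spirit of TextTiling, it scores each sentence gap by the cosine similarity of word counts in fixed-size windows on either side and places a topic boundary wherever that similarity falls below a threshold. The function names and the `window` and `threshold` values are assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(sentences):
    """Count lowercase alphabetic tokens across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"[a-z]+", sentence.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity of two Counter vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def segment(sentences, window=2, threshold=0.1):
    """Return gap indices i such that a topic boundary is placed before
    sentences[i]. Low lexical overlap between the `window` sentences on
    each side of a gap is taken as evidence of a topic shift."""
    boundaries = []
    for gap in range(1, len(sentences)):
        left = bag_of_words(sentences[max(0, gap - window):gap])
        right = bag_of_words(sentences[gap:gap + window])
        if cosine(left, right) < threshold:
            boundaries.append(gap)
    return boundaries


SENTENCES = [
    "The cat sat on the mat.",
    "The cat chased a mouse across the mat.",
    "Stock markets fell sharply today.",
    "Investors sold stock as markets slid.",
]

if __name__ == "__main__":
    print(segment(SENTENCES))
```

With these toy sentences, the only boundary falls before index 2, where the topic switches from the cat to stock markets. A fixed absolute threshold is the simplest choice. TextTiling proper instead looks at the depth of similarity valleys relative to their neighbors, which is less sensitive to how lexically dense a text is.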
  1. Categories

    1. Default:

      Discourse, Entailment, Machine Translation, NER, Parsing, Segmentation, Semantic, Sentiment, Summarization, WSD
  2. Popular Articles

  3. Organizations in the News

    1. (40 articles) Microsoft
    2. (34 articles) Google
    3. (21 articles) Apac
    4. (20 articles) Nuance Communications
    5. (19 articles) Intel
    6. (19 articles) SMEs
    7. (18 articles) Healthcare
    8. (18 articles) Service
    9. (18 articles) IBM
    10. (17 articles) IBM Corporation
    11. (17 articles) Bfsi
    12. (15 articles) NLP
  4. Locations in the News

    1. (29 articles) India
    2. (23 articles) Japan
    3. (22 articles) China
    4. (19 articles) Pune
    5. (18 articles) New York
    6. (14 articles) Canada
    7. (13 articles) Germany
    8. (12 articles) Africa
    9. (12 articles) France
    10. (9 articles) Washington
    11. (9 articles) Massachusetts
    12. (9 articles) California
  5. People in the News

    1. (4 articles) Laura Wood