Articles in category: Segmentation

    1. Method and system of selecting word sequence for text written in language without word boundary markers

      The present disclosure describes a method and apparatus for selecting a word sequence for a text written in a language without word boundary markers, in order to solve the problem of excessive computational load when selecting an optimal word sequence in existing technologies. The disclosed method includes: segmenting a segment of the text to obtain different word sequences; determining a common word boundary for the word sequences; and performing optimal word sequence selection for portions of the word sequences prior to the common word boundary. Because optimal word sequence selection is performed for portions of word sequences prior to ...
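      The idea of a common word boundary can be illustrated with a minimal sketch. It is not the patented method: the function names and the toy candidate segmentations of the string "abcdef" are hypothetical, and the scoring step that picks the optimal prefix is deliberately left out.

```python
def boundary_offsets(words):
    """Return the set of character offsets at which a word ends."""
    offsets, pos = set(), 0
    for w in words:
        pos += len(w)
        offsets.add(pos)
    return offsets


def common_boundaries(sequences):
    """Offsets that are word boundaries in every candidate sequence."""
    common = boundary_offsets(sequences[0])
    for seq in sequences[1:]:
        common &= boundary_offsets(seq)
    return sorted(common)


def split_at(words, offset):
    """Split a word sequence into the parts before and after a boundary offset."""
    pos = 0
    for i, w in enumerate(words):
        pos += len(w)
        if pos == offset:
            return words[:i + 1], words[i + 1:]
    raise ValueError("offset is not a word boundary of this sequence")


# Hypothetical candidate segmentations of the same text "abcdef".
candidates = [
    ["ab", "c", "de", "f"],
    ["a", "bc", "de", "f"],
    ["ab", "c", "d", "ef"],
]

first_common = common_boundaries(candidates)[0]
# Only the distinct prefixes before the common boundary need to be compared.
prefixes = {tuple(split_at(seq, first_common)[0]) for seq in candidates}
print(first_common, sorted(prefixes))
```

      In this toy case three full sequences reduce to two distinct prefixes before the first common boundary, which suggests why restricting selection to that portion can lower the computation load.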

    2. Semi-supervised training for statistical word alignment

      A system and method for aligning words in parallel segments is provided. A first probability distribution of word alignments within a first corpus comprising unaligned word-level parallel segments according to a model estimate is calculated. The model estimate is modified according to the first probability distribution. One or more sub-models associated with the modified model estimate are discriminatively re-ranked according to word-level annotated parallel segments. A second probability distribution of the word alignments within the first corpus is calculated according to the re-ranked sub-models associated with the modified model estimate.
    3. Automatic segmentation of video

      Content items may be segmented and labeled by topic to provide for the capture, analysis, indexing, retrieval and/or distribution of information within information rich media, such as audio or video, with greater functionality, accuracy and speed. The segments and other related information may be stored in a database and made accessible to users through, for example, a search service and/or an on-demand service. Automatic segmentation may include receiving a text representation, calculating relevance intervals based on the text representation, determining a nodal representation based on the relevance intervals, and determining segments of the content item based on the ...
    4. Method and apparatus for detection of sentiment in automated transcriptions

      A method for automatically detecting sentiments in an audio signal of an interaction held in a call center, including: receiving the audio signal from a logging and capturing unit; performing audio analysis on the audio signal to obtain text spoken within the interaction; segmenting the text into context units according to acoustic information acquired from the audio signal to identify units of speech bound by non-speech segments, wherein each context unit includes one or more words; extracting a sentiment candidate context unit from the context units using a phonetic based search; extracting linguistic features from the text of the sentiment ...
    5. Character-based automated media summarization

      Methods, devices, systems and tools are presented that allow the summarization of text, audio, and audiovisual presentations, such as movies, into less lengthy forms. High-content media files are shortened in a manner that preserves important details, by splitting the files into segments, rating the segments, and reassembling preferred segments into a final abridged piece. Summarization of media can be customized by user selection of criteria, and opens new possibilities for delivering entertainment, news, and information in the form of dense, information-rich content that can be viewed by means of broadcast or cable distribution, "on-demand" distribution, internet and cell phone digital ...
    6. Systems and methods for defining and processing text segmentation rules

      Computer-implemented methods and systems are provided for text segmentation of textual data. Rules are accessed that define how the input stream is to be segmented into textual data elements through pattern matching. The one or more rules are applied to the input stream to determine the textual data elements in the input stream which are then provided as output.
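      A minimal sketch of rule-driven segmentation through pattern matching, assuming the rules are expressed as named regular expressions. The rule names and patterns below are illustrative only and are not taken from the patent.

```python
import re

# Hypothetical rule set: (element type, regular expression), tried in order.
RULES = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("WORD", r"[A-Za-z]+(?:'[A-Za-z]+)?"),
    ("PUNCT", r"[^\w\s]"),
]


def segment(text, rules=RULES):
    """Apply the rules to the input and return (type, text) data elements."""
    # Combine the rules into one alternation of named groups.
    pattern = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in rules))
    # lastgroup is the name of the rule that produced each match.
    return [(m.lastgroup, m.group()) for m in pattern.finditer(text)]


for element in segment("Pi is 3.14, isn't it?"):
    print(element)
```

      Keeping the rules in a plain list means the segmentation behavior can be changed by editing data rather than code, which is the general appeal of rule-defined segmenters.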
    7. An Empirical Study for Determining Relevant Features for Sentiment Summarization of Online Conversational Documents

      The phenomenon of big data makes managing, processing, and extracting valuable information from the Web an increasingly challenging task. As such, the abundance of user-generated content with opinions about products or brands requires appropriate tools in order to be able to capture consumer sentiment. Such tools can be used to aggregate content by means of sentiment summarization techniques, extracting text segments that reflect the overall sentiment of a text in a compressed form. We explore what features distinguish relevant from irrelevant text segments in terms of the extent to which they reflect the overall sentiment of conversational documents. In our ...
    8. Exploration of N-gram Features for the Domain Adaptation of Chinese Word Segmentation

      A key problem in Chinese Word Segmentation is that the performance of a system decreases when it is applied to a different domain. We propose an approach in which n-gram features from a large raw corpus are explored to realize domain adaptation for Chinese Word Segmentation. The n-gram features include an n-gram frequency feature and an AV feature. We used the CRF model and a raw corpus consisting of 1 million patent description sentences to verify the proposed method. For test data, 300 patent description sentences are randomly selected and manually annotated. The results show that the improvement of Chinese Word Segmentation on the ...
    9. Predicting V(D)J Recombination Using Conditional Random Fields

      V(D)J gene segments undergo combinatorial recombination in the T-cells and B-cells to provide humans and other vertebrates with a large number of antibodies required for immunity. Each such recombination further undergoes mutations in their DNA sequences so that they can recognize diverse antigens. Predicting the combination of gene segments which formed a particular antibody is an essential task for studying disease propagation and analysis. We propose a model based on conditional random fields (CRFs) for predicting the boundary positions between V-D-J gene segments. We train the CRFs by generating synthetic gene recombinations using all of the alleles of ...
    10. Information retrieving apparatus, information retrieving method, information retrieving program, and recording medium on which information retrieving program is recorded

      The present invention provides an information retrieving apparatus and the like that returns an accurate search result in reply to a question from the user. In the present invention, sentence information of a sentence in collected documents is stored, information of a questioning sentence from the user is received from a terminal 2, the questioning sentence from the user is decomposed into segments (S10), documents having common arc segments are extracted from segments in the questioning sentence from the user, the documents are compared with the questioning sentence, and a leaf segment missing in the questioning sentence is retrieved (S12 to S16 ...
    11. A Novel System for Unlabeled Discourse Parsing in the RST Framework

      This paper presents UDRST, an unlabeled discourse parsing system in the RST framework. UDRST consists of a segmentation model and a parsing model. The segmentation model exploits subtree features to rerank N-best outputs of a base segmenter, which uses syntactic and lexical features in a CRF framework. In the parsing model, we present two algorithms for building a discourse tree from a segmented text: an incremental algorithm and a dual decomposition algorithm. Our system achieves 77.3% in the unlabeled score on the standard test set of the RST Discourse Treebank corpus, which improves 5.0% compared to HILDA [6 ...
    12. Sign Segmentation Using Dynamics and Hand Configuration for Semi-automatic Annotation of Sign Language Corpora

      This paper addresses the problem of sign language video annotation. Currently, sign language segmentation is performed manually, which is time consuming, error prone and not reproducible. In this paper we intend to provide an automatic approach to segment signs. We use a particle filter based approach to track hands and head. Motion features are used to classify segments performed with one or two hands and to detect events. Events that have been detected in the middle of a sign are removed considering hand shape features. Hand shape is characterized using similarity measurements. Evaluation has been performed and has shown the ...
    13. Segmenting printed media pages into articles

      Methods and systems for segmenting printed media pages into individual articles quickly and efficiently. A printed media based image that may include a variety of columns, headlines, images, and text is input into the system, which comprises a block segmenter and an article segmenter system. The block segmenter identifies and produces blocks of textual content from a printed media image, while the article segmenter system determines which blocks of textual content belong to one or more articles in the printed media image based on a classifier algorithm. A method for segmenting printed media pages into individual articles is also presented.
    14. System and method for supporting document navigation on mobile devices using segmentation and keyphrase summarization

      Described is a system that characterizes segments of a document with one or more keyphrases and then uses the keyphrases to help users find interesting parts of the document. Keyphrases are displayed with information about the location of the phrase in the document and are used as pointers to quickly move from an overview to a section of potential interest. In another implementation, when there are many documents in a collection, an inventive multi-document view can be used to reduce the number of documents presented, helping the user find documents of interest more efficiently. In this view, a user (possibly repeatedly) filters documents displayed based on metadata ...
    15. Subjective Testing System Based on Chinese Word Segmentation

      A computer-based subjective testing system can greatly improve the efficiency of marking papers. However, the subjective testing systems in wide use at present can only handle the scoring of objective tests, which have given answers. Because answers to subjective tests are flexible, scoring subjective tests is the problem that needs to be solved in the examination system. In this paper, we put forward a subjective testing system based on Chinese word segmentation. We describe the introduction, requirement analysis and core modules of the system in detail. At last, we do the ...
    16. Labeling TV Stream Segments with Conditional Random Fields

      In this paper, we consider the issue of structuring large TV streams. More precisely, we focus on the labeling problem: once segments have been extracted from the stream, the problem is to automatically label them according to their type (e.g., programs vs. commercial breaks). In the literature, several machine learning techniques have been proposed to solve this problem: Inductive Logic Programming and numeric classifiers such as SVMs or decision trees. In this paper, we assimilate the problem of labeling segments to the problem of labeling a sequence of data. We propose to use a very effective approach based on another classifier: the ...
      Mentions: France Rennes SVM
    17. Generating Xpath Expressions for Structured Web Data Record Segmentation

      Record segmentation is a core problem in structured web data extraction. In this paper we present a novel technique that segments structured web data into individual data records that come from an underlying database. The proposed technique exploits visual as well as structural features of web page elements to group them into semantically similar clusters. The resulting clusters reflect the page structure and are used to segment data records. During the segmentation process the technique also generates XPath expressions. These expressions can later be used to directly extract data records from web pages generated from the same template without the need to redo all the clustering ...
    18. Dynamic translation memory using statistical machine translation

      A translation method comprises: retrieving a fuzzy match text segment translation pair from a translation memory (TM) for an input source language text segment, the fuzzy match text segment translation pair comprising a fuzzy source language text segment having a fuzzy match to the input source language text segment and a corresponding translated target language text segment; extracting from the fuzzy match text segment translation pair an exact match phrase pair comprising a source language phrase that exactly matches a phrase of the input source language text segment and a corresponding translated target language phrase; and invoking a statistical machine ...
    19. Optimizing Non-Decomposable Loss Functions in Structured Prediction.

      IEEE Trans Pattern Anal Mach Intell. 2012 Aug 2. Authors: Ranjbar M, Lan T, Wang Y, Robinovitch SN, Li ZN, Mori G.
      Abstract: We develop an algorithm for structured prediction with non-decomposable performance measures. The algorithm learns parameters of Markov random fields and can be applied to multivariate performance measures. Examples include performance measures such as F_β score (natural language processing), intersection over union (object category segmentation), Precision/Recall at k (search engines) and ROC area (binary classifiers). We attack this optimization problem by approximating the loss function with a piecewise linear ...
      Mentions: Markov Wang Y
    20. Graph based re-composition of document fragments for name entity recognition under exploitation of enterprise databases

      Methods and systems are described that involve recognizing complex entities from text documents with the help of structured data and Natural Language Processing (NLP) techniques. In one embodiment, the method includes receiving a document as input from a set of documents, wherein the document contains text or unstructured data. The method also includes identifying a plurality of text segments from the document via a set of tagging techniques. Further, the method includes matching the identified plurality of text segments against attributes of a set of predefined entities. Lastly, a best matching predefined entity is selected for each text segment from ...
    21. Automatic Segmentation of Manipuri (Meiteilon) Word into Syllabic Units. (arXiv:1207.3932v1 [cs.CL])

      This paper demonstrates the automatic segmentation of a Manipuri (or Meiteilon) word into syllabic units. Manipuri is a scheduled Indian language of Tibeto-Burman origin and is highly agglutinative. The language uses two scripts: a Bengali script and Meitei Mayek (Script). The present work is based on the second script. An algorithm is designed to identify mainly the syllables of words of Manipuri origin. The result of the algorithm shows a Recall of 74.77, Precision of 91.21 and F-Score of 82.18 which is a reasonable score with ...
      Mentions: Meitei Mayek
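      The reported figures are consistent with each other if the F-Score is taken to be the balanced F1 measure, the harmonic mean of precision and recall. That reading is an assumption, since the truncated abstract does not define the score. A short check, with a hypothetical helper name:

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (in percent).
print(round(f_score(91.21, 74.77), 2))  # prints 82.18
```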
    22. Color Image Segmentation

      Splitting an input image into connected sets of pixels is the purpose of image segmentation. The resulting sets, called regions, are defined based on visual properties extracted by local features. To reduce the gap between the computed segmentation and the one expected by the user, these properties tend to embed the perceived complexity of the regions and sometimes their spatial relationship as well. Therefore, we developed different segmentation approaches, sweeping from classical color texture to recent color fractal features, in order to express this visual complexity and show how it can be used to express homogeneity, distances, and similarity measures ...
      Mentions: Koblenz CSC
    23. Inference-driven multi-source semantic search

      A method, system and computer program product are disclosed for searching for information using a knowledge base. In one embodiment, the method comprises receiving a query; formulizing the query, including dividing the query into a plurality of parts; for each of the parts, identifying a source, using the knowledge, that addresses that part; and combining the sources to answer the query. In one embodiment, the query includes text; the text is separated into a plurality of segments; and, for each of the segments, at least one source is identified addressing the segment. In an embodiment, a logical proof is formulated ...
