1. Articles in category: Segmentation

    Articles 793-816 of 923
    1. Word segmentation based on database semantics in NChiql

      Abstract: In this paper a novel word-segmentation algorithm is presented to delimit words in Chinese natural language queries in the NChiql system, a Chinese natural language query interface to databases. Although there is a sizable literature on Chinese segmentation, existing methods cannot satisfy the particular requirements of this system. The novel word-segmentation algorithm is based on database semantics, namely a Semantic Conceptual Model (SCM) for specific domain knowledge. Based on the SCM, the segmenter attaches database semantics directly to words, which eases disambiguation and translation (from natural language to database query) in NChiql. Content Type: Journal Article. DOI: 10.1007/BF02948870. Authors: Xiaofeng Meng, Renmin ...
    2. Statistical Properties of Overlapping Ambiguities in Chinese Word Segmentation and a Strategy for Their Disambiguation

      Overlapping ambiguity is a major ambiguity type in Chinese word segmentation. In this paper, the statistical properties of overlapping ambiguities are studied intensively, based on observations from a very large balanced general-purpose Chinese corpus. The relevant statistics are given from different perspectives. The stability of high-frequency maximal overlapping ambiguities is tested against statistical observations from both the general-purpose corpus and domain-specific corpora. A disambiguation strategy for overlapping ambiguities, with a predefined solution for each of the 5,507 pseudo overlapping ambiguities, is then proposed, suggesting that over 42% of overlapping ambiguities in Chinese running text could be solved ...
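      As a rough illustration of where overlapping ambiguities come from (a toy dictionary and sentence, not the corpus study above), comparing forward and backward maximum matching exposes strings on which the two segmentations disagree:

      ```python
      # Toy sketch: detect overlapping-ambiguity strings by comparing forward and
      # backward maximum matching over a small hypothetical dictionary.
      DICT = {"研究", "研究生", "生命", "命", "起源", "的"}   # hypothetical entries
      MAX_LEN = max(len(w) for w in DICT)

      def fmm(text):
          """Forward maximum matching: greedily take the longest dictionary word."""
          i, out = 0, []
          while i < len(text):
              for j in range(min(len(text), i + MAX_LEN), i, -1):
                  if text[i:j] in DICT or j == i + 1:
                      out.append(text[i:j]); i = j; break
          return out

      def bmm(text):
          """Backward maximum matching: same idea, scanning from the right."""
          j, out = len(text), []
          while j > 0:
              for i in range(max(0, j - MAX_LEN), j):
                  if text[i:j] in DICT or i == j - 1:
                      out.append(text[i:j]); j = i; break
          return out[::-1]

      sentence = "研究生命的起源"        # "research the origin of life"
      f, b = fmm(sentence), bmm(sentence)
      if f != b:   # disagreement signals an overlapping ambiguity
          print("FMM:", f)   # ['研究生', '命', '的', '起源']
          print("BMM:", b)   # ['研究', '生命', '的', '起源']
      ```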
    3. A Comparison of Language Models for Dialog Act Segmentation of Meeting Transcripts

      This paper compares language modeling techniques for dialog act segmentation of multiparty meetings. The evaluation is twofold; we search for a convenient representation of textual information and an efficient modeling approach. The textual features capture word identities, parts-of-speech, and automatically induced classes. The models under examination include hidden event language models, maximum entropy, and BoosTexter. All presented methods are tested using both human-generated reference transcripts and automatic transcripts obtained from a state-of-the-art speech recognizer. Content Type: Book Chapter. DOI: 10.1007/978-3-540-87391-4_17. Authors: Jáchym Kolář, University of West Bohemia, Department of Cybernetics, Faculty of Applied Sciences, Univerzitní 8, CZ-306 14 Plzeň ...
    4. Integration of Named Entity Information for Chinese Word Segmentation Based on Maximum Entropy

      Word segmentation is an essential process in Chinese information processing. Although related research has been reported and progress has been made, the Unknown Named Entity (UNE) problem in segmentation is not fully solved, and it usually degrades segmentation accuracy in general. In this paper, a model that identifies UNEs to improve the overall performance of segmentation is presented. In order to capture the NE information, the functions of characters or words are defined with tags. In addition, useful surrounding contexts are collected from a corpus and used as features. The model is constructed based on Maximum Entropy to handle the UNE identification ...
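      A minimal sketch of character-level maximum-entropy tagging in this general spirit (the B/I tag scheme, feature template and two-sentence training set are invented for illustration and rely on scikit-learn; this is not the paper's setup):

      ```python
      # Sketch: maximum-entropy (multinomial logistic regression) tagger that labels
      # each character with a segmentation tag (B = begins a word, I = inside a word).
      from sklearn.feature_extraction import DictVectorizer
      from sklearn.linear_model import LogisticRegression

      def char_features(sent, i):
          """Surrounding-context features for the character at position i."""
          return {
              "cur": sent[i],
              "prev": sent[i - 1] if i > 0 else "<s>",
              "next": sent[i + 1] if i < len(sent) - 1 else "</s>",
              "prev_cur": (sent[i - 1] if i > 0 else "<s>") + sent[i],
          }

      # Tiny hand-labelled corpus: each sentence paired with per-character B/I tags.
      train = [("我爱北京", ["B", "B", "B", "I"]),
               ("北京很大", ["B", "I", "B", "B"])]

      X, y = [], []
      for sent, tags in train:
          for i, tag in enumerate(tags):
              X.append(char_features(sent, i))
              y.append(tag)

      vec = DictVectorizer()
      clf = LogisticRegression(max_iter=1000)   # multinomial logistic = maxent
      clf.fit(vec.fit_transform(X), y)

      test = "我爱上海"
      pred = clf.predict(vec.transform([char_features(test, i) for i in range(len(test))]))
      print(list(zip(test, pred)))
      ```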
    5. Full-form lexicon with tagged data and methods of constructing and using the same

      CROSS-REFERENCE TO RELATED APPLICATIONS: Reference is hereby made to the following co-pending and commonly assigned patent applications: U.S. application Ser. No. 10/804,930, filed Mar. 19, 2004, entitled "Compound Word Breaker and Spell Checker", and U.S. application Ser. No. 10/804,883, filed Mar. 19, 2004, entitled "System and Method for Performing Analysis on Word Variants", both of which are incorporated by reference in their entirety. BACKGROUND OF THE INVENTION: The present invention relates to
    6. Automatic Annotation of Direct Reported Speech in Arabic and French, According to a Semantic Map of Enunciative Modalities

      We present an analysis of the linguistic markers of the enunciative modalities in direct reported speech, in a multilingual framework concerning Arabic and French. Furthermore, we present a platform for automatic annotation of semantic relations, based on the Contextual Exploration method. This platform allows the automatic annotation and categorisation of quotational segments in both languages, exploiting a semantic map based on the notion of speaker commitment in enunciation. Content Type: Book Chapter. DOI: 10.1007/978-3-540-85287-2_5. Authors: Motasem Alrahabi, Maison de la Recherche – Université de Paris-Sorbonne, 28 Rue Serpente, 75006 Paris, France; Jean-Pierre Desclés, Maison de la Recherche – Université de Paris-Sorbonne, 28 Rue ...
    7. Finding Text Boundaries and Finding Topic Boundaries: Two Different Tasks?

      The goal of this paper is to demonstrate that the usual evaluation methods for text segmentation are not suited to every task linked to text segmentation. To do so, we differentiated the task of finding text boundaries in a corpus of concatenated texts from the task of finding transitions between topics inside the same text. We worked on a corpus of twenty-two French political speeches, trying to find the boundaries between them when they are concatenated and the topic boundaries inside them when they are not. We compared the results of our distance-based method to the well known c99 ...
    8. Discovering salient prosodic cues and their interactions for automatic story segmentation in Mandarin broadcast news

      Abstract: This paper investigates speech prosody for automatic story segmentation in Mandarin broadcast news. Prosodic cues used effectively in English story segmentation deserve re-investigation, since the lexical tones of Mandarin may complicate the expression of pitch declination and reset. Our data-oriented study shows that story boundaries cannot be clearly discriminated from utterance boundaries by speaker-normalized pitch features, due to their large variation across different Mandarin syllable tones. We thus propose speaker- and tone-normalized pitch features that provide a clear separation between utterance and story boundaries. Our study also shows that speaker-normalized pause duration is quite effective ...
    9. Question Answering from Lecture Videos Based on Automatically-Generated Learning Objects

      In the past decade, we have witnessed a dramatic increase in the availability of online academic lecture videos. Two technical problems limit the use of recorded lectures for learning: providing easy access to the multimedia lecture video content, and finding the semantically appropriate information very quickly. The retrieval of audiovisual lecture recordings is a complex task involving many objects. In our solution, speech recognition is applied to create a tentative and deficient transcription of the lecture video recordings. The transcription and the words from the PowerPoint slides are sufficient to generate semantic metadata ...
    10. A Joint Segmenting and Labeling Approach for Chinese Lexical Analysis

      This paper introduces an approach that jointly performs a cascade of segmentation and labeling subtasks for Chinese lexical analysis, including word segmentation, named entity recognition and part-of-speech tagging. Unlike the traditional pipeline approach, the cascaded subtasks are conducted simultaneously in a single step, so error propagation is avoided and information can be shared among the multi-level subtasks. In this approach, Weighted Finite State Transducers (WFSTs) are adopted: within the unified WFST framework, the models for each subtask are represented and then combined into a single transducer. Thereby, through one-pass decoding, the jointly optimal outputs for the multi-level processes will ...
    11. An unsupervised machine learning approach to segmentation of clinician-entered free text.

      AMIA Annu Symp Proc. 2007:811-5. Authors: Wrenn J, Stetson PD, Johnson SB. Natural language processing, an important tool in biomedicine, fails without successful segmentation of words and sentences. Tokenization is a form of segmentation that identifies the boundaries separating semantic units, for example words, dates and numbers, within a text. We sought to construct a highly generalizable tokenization algorithm with no prior knowledge of characters or their function, based on the inherent statistical properties of token and sentence boundaries. Tokenizing clinician-entered free text, we achieved precision and recall of ...
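      A loose, much-simplified illustration of boundary statistics (not the authors' unsupervised algorithm): estimate how often whitespace separates each pair of character classes in raw clinician text, then re-insert boundaries where that estimate is high. The mini-corpus and threshold below are placeholders.

      ```python
      # Loose illustration: learn how often whitespace separates each pair of
      # character classes in a raw corpus, then use those ratios to place token
      # boundaries in text that lacks the whitespace.
      import re
      from collections import Counter

      def cclass(c):
          if c.isalpha():
              return "ALPHA"
          if c.isdigit():
              return "DIGIT"
          return "OTHER"          # punctuation and everything else

      def boundary_probs(corpus_lines):
          boundary, total = Counter(), Counter()
          for line in corpus_lines:
              s = re.sub(r"\s+", " ", line.strip())
              for i in range(len(s) - 1):
                  a, b = s[i], s[i + 1]
                  if b == " " and i + 2 < len(s):
                      pair = (cclass(a), cclass(s[i + 2]))
                      boundary[pair] += 1
                      total[pair] += 1
                  elif a != " " and b != " ":
                      total[(cclass(a), cclass(b))] += 1
          return {p: boundary[p] / total[p] for p in total}

      def tokenize(text, probs, threshold=0.5):
          tokens, cur = [], text[0]
          for a, b in zip(text, text[1:]):
              if probs.get((cclass(a), cclass(b)), 0.0) > threshold:
                  tokens.append(cur)
                  cur = b
              else:
                  cur += b
          return tokens + [cur]

      corpus = ["BP 120/80 noted on 3/14", "pt denies cp, sob x 2 days"]
      probs = boundary_probs(corpus)
      print(tokenize("BP130/85noted3/14", probs))   # -> ['BP', '130/85', 'noted', '3/14']
      ```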
    12. System for identifying paraphrases using machine translation

      BACKGROUND OF THE INVENTION: The present invention deals with identifying paraphrases in text. More specifically, the present invention deals with using machine translation techniques to identify and generate paraphrases. The recognition and generation of paraphrases is a key facet of many applications of natural language processing systems. Being able to identify that two different pieces of text are equivalent in meaning enables a system to behave much more intelligently. A fundamental goal of wor
    13. Evaluating machine translation with LFG dependencies

      Abstract: In this paper we show how labelled dependencies produced by a Lexical-Functional Grammar parser can be used in Machine Translation evaluation. In contrast to the most popular evaluation metrics, which are based on surface string comparison, our dependency-based method does not unfairly penalize perfectly valid syntactic variation in the translation and shows less bias towards statistical models, and the addition of WordNet provides a way to accommodate lexical differences. In comparison with other metrics on a Chinese–English newswire text, our method obtains high correlation with human scores at both the segment and system level. Content Type: Journal Article. DOI: 10.1007/s10590-008-9038-1. Authors: Karolina ...
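      The core of such a metric can be pictured as set overlap between labelled dependency triples; a minimal sketch with hand-written triples (no LFG parsing or WordNet handling):

      ```python
      # Sketch: compare a translation hypothesis with its reference via precision,
      # recall and F1 over labelled dependency triples (relation, head, dependent).
      # The triples below are hand-written placeholders; a real setup would take
      # them from an LFG (or other dependency) parser.
      def dependency_prf(ref_triples, hyp_triples):
          ref, hyp = set(ref_triples), set(hyp_triples)
          matched = len(ref & hyp)
          p = matched / len(hyp) if hyp else 0.0
          r = matched / len(ref) if ref else 0.0
          f = 2 * p * r / (p + r) if p + r else 0.0
          return p, r, f

      # Reference: "they signed the agreement"; hypothesis: "they signed the accord".
      ref = {("subj", "sign", "they"), ("obj", "sign", "agreement"), ("det", "agreement", "the")}
      hyp = {("subj", "sign", "they"), ("obj", "sign", "accord"), ("det", "accord", "the")}
      print(dependency_prf(ref, hyp))   # only the subj triple matches; the lexical
                                        # gap (agreement/accord) is where WordNet would help
      ```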
    14. An Evidence-Based Approach to Handle Semantic Heterogeneity in Interoperable Distributed User Models

      Nowadays, the idea of personalization is regarded as crucial in many areas. This requires quick and robust approaches to developing reliable user models. The next generation of user models will be distributed (segments of the user model will be stored by different applications) and interoperable (systems will be able to exchange and use user model fractions to enrich user experiences). We propose a new approach to one of the key challenges of interoperable distributed user models: semantic heterogeneity. The paper presents algorithms to automate user model exchange across applications based on evidential reasoning and advances in the Semantic ...
    15. The canonical processes of a dramatized approach to information presentation

      Abstract: This paper describes the application “Carletto the spider” in terms of its mapping onto the canonical processes of media production. “Carletto the spider” is a character-based guide to a historical site and implements the Dramatour approach to the design of drama-based interactive presentations. Dramatization makes presentations more engaging, thus improving the user's reception of the content. The major technical issue of the approach is the segmentation of the presentation into audiovisual units that are edited on the fly in a way that guarantees dramatic continuity while adapting to the user's response. We describe the workflow of the application and ...
    16. System and method for performing analysis on word variants

      BACKGROUND OF THE INVENTION: The present invention is related to natural language processing. More particularly, the present invention is related to natural language systems and methods for processing words and associated word-variant forms, such as verb-clitic forms, in one of a range of languages, for example Spanish, when they are encountered in a textual input. Numerous natural language processing applications rely upon a lexicon for operation. Such applications include word breaking (for search
    17. Dynamic Browsing of Audiovisual Lecture Recordings Based on Automated Speech Recognition

      The number of digital lecture video recordings has increased dramatically since recording technology became available. Accessibility and search within this large archive are limited and difficult, and manual annotation and segmentation are too time-consuming to be practical. A promising approach is based on using the audio layer of a lecture recording to obtain information about the lecture contents. In this paper, we present a retrieval method and a user interface based on existing recorded lectures. Even a deficient transcription from a speech recognition engine (SRE) is sufficient for browsing the video archive. A user interface for dynamic browsing of the e-learning contents ...
    18. Chinese Word Segmentation for Terrorism-Related Contents

      In order to analyze security- and terrorism-related content in Chinese, it is important to perform word segmentation on Chinese documents. There are many previous studies on Chinese word segmentation; the two major approaches are statistics-based and dictionary-based. Pure statistical methods have lower precision, while pure dictionary-based methods cannot deal with new words and are restricted to the coverage of the dictionary. In this paper, we propose a hybrid method that avoids the limitations of both approaches. Through the use of a suffix tree and mutual information (MI) together with the dictionary, our segmenter, called IASeg, achieves a high ...
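      A toy sketch of the mutual-information component mentioned above (pointwise MI over adjacent character pairs as a cohesion signal; the mini-corpus is invented and the suffix-tree and dictionary parts of IASeg are omitted, so this is only a rough illustration):

      ```python
      # Sketch: pointwise mutual information of adjacent Chinese characters as a
      # new-word signal. High-PMI pairs cohere strongly and are candidate words.
      import math
      from collections import Counter

      def char_pmi(corpus_lines):
          unigrams, bigrams = Counter(), Counter()
          for line in corpus_lines:
              unigrams.update(line)
              bigrams.update(line[i:i + 2] for i in range(len(line) - 1))
          n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
          pmi = {}
          for bg, c in bigrams.items():
              p_xy = c / n_bi
              p_x = unigrams[bg[0]] / n_uni
              p_y = unigrams[bg[1]] / n_uni
              pmi[bg] = math.log2(p_xy / (p_x * p_y))
          return pmi

      corpus = ["恐怖分子发动袭击", "警方逮捕恐怖分子", "袭击造成破坏"]
      scores = char_pmi(corpus)
      for bg, s in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
          print(bg, round(s, 2))   # pairs whose characters always co-occur (e.g. 恐怖, 袭击) score highest
      ```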
    19. Speech and sliding text aided sign retrieval from hearing impaired sign news videos

      Abstract: The objective of this study is to automatically extract annotated sign data from broadcast news recordings for the hearing impaired. These recordings are an excellent source for automatically generating annotated data: in news for the hearing impaired, the speaker also signs with the hands as she talks, and on top of this, corresponding sliding text is superimposed on the video. The video of the signer can be segmented with the help of either the speech alone or both the speech and the text, producing segmented and annotated sign videos. We call this application Signiary, and aim to ...
    20. System and method for semantic video segmentation based on joint audiovisual and text analysis

      System and method for partitioning a video into a series of semantic units where each semantic unit relates to a generally complete thematic topic. A computer-implemented method for partitioning a video into a series of semantic units, wherein each semantic unit relates to a theme or a topic, comprises dividing a video into a plurality of homogeneous segments, analyzing audio and visual content of the video, extracting a plurality of keywords from the speech content of each of the plurality of homogeneous segments of the video, and detecting and merging a plurality of groups of semantically related and temporally ...
    21. A Survey of Chinese Text Similarity Computation

      There is no natural delimiter between words in Chinese texts. Moreover, Chinese is a semotactic language with complicated structures that focus on semantics. Its differences from Western languages make Chinese word segmentation more difficult and Chinese natural language understanding more challenging. Computing Chinese text similarity with high precision, high recall and low cost is a very important but challenging task, and many researchers have studied it for a long time. In this paper, we examine existing Chinese text similarity measures, including measures based on statistics and on semantics. Our work provides insights into the advantages and disadvantages of each ...
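      As a simple point of reference for the statistics-based family of measures surveyed here (a generic baseline, not a method from the survey), cosine similarity over character bigrams avoids committing to a word segmentation at all:

      ```python
      # Sketch: statistics-based similarity for Chinese text using character bigrams,
      # which sidesteps word segmentation. The example strings are invented.
      import math
      from collections import Counter

      def bigram_vector(text):
          return Counter(text[i:i + 2] for i in range(len(text) - 1))

      def cosine(a, b):
          va, vb = bigram_vector(a), bigram_vector(b)
          dot = sum(va[k] * vb[k] for k in va)
          norm = (math.sqrt(sum(v * v for v in va.values()))
                  * math.sqrt(sum(v * v for v in vb.values())))
          return dot / norm if norm else 0.0

      print(cosine("中文文本相似度计算", "中文文本的相似度"))   # high: many shared bigrams
      print(cosine("中文文本相似度计算", "今天天气很好"))       # zero: no shared bigrams
      ```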
    22. Semi-joint Labeling for Chinese Named Entity Recognition

      Named entity recognition (NER) is an essential component of text mining applications. In Chinese sentences, words do not have delimiters; thus, incorporating word segmentation information into an NER model can improve its performance. Based on the framework of dynamic conditional random fields, we propose a novel labeling format, called semi-joint labeling, which partially integrates word segmentation information and named entity tags for NER. The model enhances the interaction between segmentation tags and NER beyond what traditional approaches achieve. Moreover, it allows us to consider interactions between multiple chains in a linear-chain model. We use data from the SIGHAN 2006 NER bakeoff ...
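      The general idea of folding word-boundary information into entity labels can be pictured with a small helper (a simplified stand-in for the semi-joint format, not its exact definition; the sentence and tags are invented):

      ```python
      # Sketch: build character-level labels that combine a segmentation tag
      # (B = word-initial, I = word-internal) with an entity tag (PER, LOC, O).
      def semi_joint_labels(words, entity_tags):
          """words: segmented words; entity_tags: one entity label per word."""
          labels = []
          for word, ent in zip(words, entity_tags):
              for i, _ in enumerate(word):
                  seg = "B" if i == 0 else "I"
                  labels.append(seg if ent == "O" else f"{seg}-{ent}")
          return labels

      words = ["张三", "在", "台北", "工作"]
      ents = ["PER", "O", "LOC", "O"]
      chars = "".join(words)
      print(list(zip(chars, semi_joint_labels(words, ents))))
      # [('张','B-PER'), ('三','I-PER'), ('在','B'), ('台','B-LOC'), ('北','I-LOC'), ('工','B'), ('作','I')]
      ```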
    23. Recognizing Biomedical Named Entities in Chinese Research Abstracts

      Most research on biomedical named entity recognition has focused on English texts, e.g., MEDLINE abstracts. However, recent years have also seen significant growth of biomedical publications in other languages. For example, the Chinese Biomedical Bibliographic Database has collected over 3 million articles published after 1978 from 1600 Chinese biomedical journals. We present here a Conditional Random Field (CRF) based system for recognizing biomedical named entities in Chinese texts. Viewing Chinese sentences as sequences of characters, we trained and tested the CRF model using a manually annotated corpus containing 106 research abstracts (481 sentences in total). The features we used ...
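      A minimal character-sequence CRF along these lines can be sketched with the sklearn-crfsuite package (the one-sentence training set, feature template and PROT tag are placeholders, not the annotated corpus used in the paper):

      ```python
      # Sketch: character-based CRF tagging for Chinese biomedical NER.
      # Requires: pip install sklearn-crfsuite. Data and features are toy placeholders.
      import sklearn_crfsuite

      def features(sent, i):
          return {
              "char": sent[i],
              "prev": sent[i - 1] if i > 0 else "<s>",
              "next": sent[i + 1] if i < len(sent) - 1 else "</s>",
              "is_digit": sent[i].isdigit(),
          }

      # One toy training sentence with BIO tags for a protein-like entity (interleukin-6).
      train_sents = [("白细胞介素6水平升高",
                      ["B-PROT", "I-PROT", "I-PROT", "I-PROT", "I-PROT", "I-PROT", "O", "O", "O", "O"])]

      X = [[features(s, i) for i in range(len(s))] for s, _ in train_sents]
      y = [tags for _, tags in train_sents]

      crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
      crf.fit(X, y)

      test = "白细胞介素6升高"
      print(crf.predict([[features(test, i) for i in range(len(test))]])[0])
      ```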
    24. A Statistical Model for Topic Segmentation and Clustering

      This paper presents a statistical model for discovering topical clusters of words in unstructured text. The model uses a hierarchical Bayesian structure and is also able to identify segments of text that are topically coherent. The model can assign each segment to a particular topic and thus categorize the corresponding document into potentially multiple topics. We present some initial results indicating that the word topics discovered by the proposed model are more consistent than those of other models. Our early experiments show that our model's clustering performance compares well with that of other clustering models on a real text corpus ...