    1. Deep Syntactic Analysis and Rule Based Accentuation in Text-to-Speech Synthesis

      With the emergence of the HMM-synthesis paradigm, producing natural, expressive prosody has become viable in speech synthesis. This paper describes the development of a rule-based prominence prediction model for a Finnish text-to-speech system, based on deep syntactic analysis and discourse structure. Content Type: Book Chapter. DOI: 10.1007/978-3-540-87391-4_68. Authors: Antti Suni (University of Helsinki, Department of Speech Sciences, Finland); Martti Vainio (University of Helsinki, Department of Speech Sciences, Finland). Book Series: Lecture Notes in Computer Science (Online ISSN 1611-3349, Print ISSN 0302-9743). Book Series Volume: 5246/2008. Book: Text, Speech and Dialogue (DOI 10.1007/978-3-540-87391-4, Print ISBN 978-3-540-87390-7).
    2. Using Semantic Prototypes for Discourse Status Classification

      Discourse status concerns different aspects of entity mentions in the discourse, such as whether an entity is mentioned for the first time or subsequently, and on what grounds. This paper presents an evaluation of semantic prototypes as an input feature for discourse status classification, with Decision Trees as the machine learning algorithm. We show that semantic prototypes improve the classification of two especially difficult and scarce classes. Content Type: Book Chapter. DOI: 10.1007/978-3-540-85980-2_28. Authors: Sandra Collovini (Pontifícia Universidade do Rio Grande do Sul, Porto Alegre, Brasil); Luiz Carlos Ribeiro (Pontifícia Universidade do Rio Grande do Sul, Porto Alegre, Brasil); Patricia Nunes Gonçalves (Pontifícia Universidade do Rio ...
    3. Improving Chinese Pronominal Anaphora Resolution by Extensive Feature Representation and Confidence Estimation

      Pronominal anaphora resolution denotes antecedent identification for anaphoric pronouns expressed in discourse. Effective resolution depends on which features are considered and how they are weighted during antecedent identification. In this paper, a rich feature set, including innovative discourse features, is employed to resolve commonly used Chinese pronouns in modern written Chinese texts. Moreover, a maximum-entropy-based model is presented to estimate the confidence for each antecedent candidate. Experimental results show that our method achieves an 83.5% success rate, which is better than the rates obtained by rule-based and SVM-based methods. Content Type: Book Chapter. DOI ...
    4. Finding Text Boundaries and Finding Topic Boundaries: Two Different Tasks?

      The goal of this paper is to demonstrate that the usual evaluation methods for text segmentation are not suited to every task linked to text segmentation. To do so, we differentiated the task of finding text boundaries in a corpus of concatenated texts from the task of finding transitions between topics inside the same text. We worked on a corpus of twenty-two French political discourses, trying to find the boundaries between them when they are concatenated, and to find topic boundaries inside them when they are not. We compared the results of our distance-based method to the well-known c99 ...
    5. Systems and methods for determining and using interaction models

      BACKGROUND OF THE INVENTION. 1. Field of Invention: This invention relates to natural language processing. 2. Description of Related Art: Natural language speech offers a number of advantages over conventional keyboard, tactile and other interfaces. Natural language interfaces are among the earliest interfaces learned. Natural language interfaces are among the most intuitive interfaces for users, which may reduce cognitive effort in accomplishing certain tasks. Many command and control systems, knowledge
    6. A Joint Topic and Perspective Model for Ideological Discourse

      Polarizing discussions on political and social issues are common in mass and user-generated media. However, computer-based understanding of ideological discourse has been considered too difficult to undertake. In this paper we propose a statistical model for ideological discourse. By ideology we mean “a set of general beliefs socially shared by a group of people.” For example, Democratic and Republican are two major political ideologies in the United States. The proposed model captures lexical variations due to an ideological text’s topic and due to an author or speaker’s ideological perspective. To cope with the non-conjugacy of the logistic-normal prior ...
    7. Abbreviation Disambiguation: Experiments with Various Variants of the One Sense per Discourse Hypothesis

      Abbreviations are very common and are widely used in both written and spoken language. However, they are not always explicitly defined and in many cases they are ambiguous. In this research, we present a process that attempts to solve the problem of abbreviation ambiguity. Various features have been explored, including context-related methods and statistical methods. The application domain is Jewish Law documents written in Hebrew, which are known to be rich in ambiguous abbreviations. Various variants of the one sense per discourse hypothesis (by varying the scope of discourse) have been implemented. Several common machine learning methods have been tested ...
    8. A Term Association Inference Model for Single Documents: A Stepping Stone for Investigation through Information Extraction

      In this paper, we propose a term association model which extracts significant terms as well as the important regions from a single document. This model is a basis for a systematic form of subjective data analysis which captures the notion of relatedness of different discourse structures considered in the document, without requiring a predefined knowledge base. This is a stepping stone for investigation or security purposes, where possible patterns need to be identified from one or a few witness statements. This is unlikely to be possible in predictive data mining, where the system cannot work efficiently in ...
    9. Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction.

      BMC Bioinformatics. 2008;9 Suppl 3:S9. Authors: Gobeill J, Tbahriti I, Ehrler F, Mottaz A, Veuthey AL, Ruch P. BACKGROUND: This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference into Functions) as defined in ENTREZ-Gene based on a MEDLINE record. Inputs for this task include both a gene and a pointer to a MEDLINE reference. In the suggested approach we merge two independent sentence extraction strategies. The first proposed strategy (LASt) uses argumentative features, inspired by discourse-analysis models. The second ...
    10. Computational Representation of Linguistic Structures using Domain-Specific Languages. (arXiv:0805.3366v1 [cs.CL])

      We describe a modular system for generating sentences from formal definitions of underlying linguistic structures using domain-specific languages. The system uses Java in general, Prolog for lexical entries and custom domain-specific languages based on Functional Grammar and Functional Discourse Grammar notation, implemented using the ANTLR parser generator. We show how linguistic and technological parts can be brought together in a natural language processing system and how domain-specific languages can be used as a tool for consistent formal notation in linguistic description.
    11. Exploring a type-theoretic approach to accessibility constraint modelling. (arXiv:0805.3410v1 [cs.CL])

      The type-theoretic modelling of DRT proposed by [degroote06] features continuations for managing the context in which a clause has to be interpreted. This approach, while keeping the standard definitions of quantifier scope, translates the rules of the accessibility constraints of discourse referents into the semantic recipes. In this paper, we deal with additional rules for these accessibility constraints, in particular in the case of discourse referents introduced by proper nouns, which negation does not block, and in the case of rhetorical relations that structure discourses. We show how this continuation-based approach applies to those accessibility constraints and how ...
    12. Finding the Best Picture: Cross-Media Retrieval of Content

      We query the pictures of Yahoo! News for persons and objects by using the accompanying news captions as an indexing annotation. Our aim is to place at the top of the answer list those pictures in which the sought persons or objects are most prominently present. We demonstrate that an appearance or content model based on syntactic, semantic and discourse analysis of the short news text is only useful for finding the best picture of a person or object if the database contains photos that each picture many entities. In other circumstances a simpler bag-of-nouns representation performs well. The appearance ...
    13. A case study of gesture expressivity breaks

      In this paper we propose a study of co-verbal gesture expressivity during a conversational interaction. The work is based on an analysis of gesture expressivity over time, which we conducted on two clips of 2D animations. The first results point out two types of modulations in gesture expressivity that we relate to the rhetorical functions of the discourse. These results extend the knowledge about gesture expressivity from emotion and personality issues to pragmatic ones. An evaluation study is proposed to measure the effects of the modulations. Content Type: Journal Article. DOI: 10.1007/s10579-007-9051-7. Authors: Nicolas Ech Chafai, Université of ...
      Mentions: France Paris
    14. Chinese Abbreviation-Definition Identification: A SVM Approach Using Context Information

      As a special form of unknown words, Chinese abbreviations pose significant problems for Chinese text processing. The goal of this study is to automatically find the definition of a Chinese abbreviation in a context where both the abbreviation and its definition occur, enforcing the constraint of one sense per discourse for an abbreviation. First, the candidate abbreviation-definition pairs are collected, and then an SVM approach using context information is employed to classify the candidate pairs so that the correct pairs can be identified. The performance of the approach is evaluated on a manually annotated test corpus, and is also compared with ...
      Mentions: Beijing China Xu Sun
    15. Meeting Structure Annotation

      We describe a generic set of tools for representing, annotating, and analysing multi-party discourse, including: an ontology of multimodal discourse, a programming interface for that ontology, and NOMOS – a flexible and extensible toolkit for browsing and annotating discourse. We describe applications built using the NOMOS framework to facilitate a real annotation task, as well as for visualising and adjusting features for machine learning tasks. We then present a set of hierarchical topic segmentations and action item subdialogues collected over 56 meetings from the ICSI and ISL meeting corpora using our tools. These annotations are designed to support research towards automatic ...
    16. Sense Annotation in the Penn Discourse Treebank

      An important aspect of discourse understanding and generation involves the recognition and processing of discourse relations. These are conveyed by discourse connectives, i.e., lexical items like "because" and "as a result", or by implicit connectives expressing an inferred discourse relation. The Penn Discourse TreeBank (PDTB) provides annotations of the argument structure, attribution and semantics of discourse connectives. In this paper, we provide the rationale of the tagset, detailed descriptions of the senses with corpus examples, simple semantic definitions of each type of sense tag, as well as informal descriptions of the inferences allowed at each level. Content Type: Book Chapter. DOI ...
    17. Keyphrase Extraction in Scientific Publications

      We present a keyphrase extraction algorithm for scientific publications. Unlike previous work, we introduce features that capture the positions of phrases in a document with respect to the logical sections found in scientific discourse. We also introduce features that capture salient morphological phenomena found in scientific keyphrases, such as whether a candidate keyphrase is an acronym or uses specific terminologically productive suffixes. We have implemented these features on top of a baseline feature set used by Kea [1]. In our evaluation using a corpus of 120 scientific publications multiply annotated for keyphrases, our system significantly outperformed Kea at the p  Content ...
    18. Processing of inconsistent emotional information: an fMRI study.

      Exp Brain Res. 2007 Dec 20. Authors: Rota G, Veit R, Nardo D, Weiskopf N, Birbaumer N, Dogil G. Previous studies investigating the anterior cingulate cortex (ACC) have relied on a number of tasks which involved cognitive control and attentional demands. In this fMRI study, we tested the model that ACC functions as an attentional network in the processing of language. We employed a paradigm that requires the processing of concurrent linguistic information, predicting that the cognitive costs imposed by competing trials would engender the activation of ACC. Subjects were confronted with ...
    19. Using discourse analysis to improve text categorization in Medline.

      Medinfo. 2007;12(Pt 1):710-5. Authors: Ruch P, Geissbühler A, Gobeill J, Lisacek F, Tbahriti I, Veuthey AL, Aronson AR. PROBLEM: Automatic keyword assignment has been widely studied in medical informatics in the context of the MEDLINE database, both to help searching in MEDLINE and to provide an indicative "gist" of the content of an article. Automatic assignment of Medical Subject Headings (MeSH), which is formally an automatic text categorization task, has been proposed using different methods or combinations of methods, including machine learning (na ...
    20. Automatic Evaluation of Text Coherence: Models and Representations

      This paper investigates the automatic evaluation of text coherence for machine-generated texts. We introduce a fully-automatic, linguistically rich model of local coherence that correlates with human judgments. The modeling approach taken relies on shallow text properties and is relatively inexpensive. We present experimental results that comparatively assess the predictive power of various discourse representations prop...
    21. 2006 ACL Lifetime Achievement Award

      One of the highlights of COLING/ACL 2006 was the presentation of the 2006 ACL Lifetime Achievement Award to Eva Hajicova. The following is an excerpt from the introduction made by Jun'ichi Tsujii (president of the ACL) at the COLING/ACL 2006 award ceremony. I first met Eva 26 years ago, at Coling in Tokyo, 1980. I had just started my career as a researcher then, while she was already an established researcher, a member of ICCL (International Committee of Computational Linguistics) and the representative, the banner carrier of the legendary Prague school of linguistics. Prague is the birthplace ...
    22. Systems and method for resolving ambiguity

      Techniques are provided for resolving ambiguity in natural language speech. Speech is recognized using automatic speech recognition. A theory of discourse analysis is determined, and at least one set of candidate discourse functions is determined based on the theory of discourse analysis. Prosodic features in the speech and a correlation between the prosodic features and the discourse functions are determined. The sets of candidate discourse functions are ranked based on the prosodic features in the speech information and a correlation to the prosodic features expected for the determined discourse functions. Ambiguity is resolved between sets of candidate discourse functions based ...
