    1. Systems and methods for hybrid text summarization

      Techniques are provided for segmenting text into categorized discourse constituents and attaching discourse constituents into a structural representation of discourse. Techniques for determining hybrid structural and non-structural summaries of a text are also provided. A text is segmented based on a theory of discourse analysis into at least a main discourse constituent containing spatio-temporal information about a single event in a possible world view. The discourse constituents are then inserted into a structural representation of discourse. Non-structural techniques are used to determine relevance scores and important discourse constituents are determined. Relevance scores are percolated through the structural representation of discourse ...
    2. Marking-up multiple views of a Text: Discourse and Reference. (arXiv:0909.2715v1 [cs.CL])

      We describe an encoding scheme for discourse structure and reference, based on the TEI Guidelines and the recommendations of the Corpus Encoding Specification (CES). A central feature of the scheme is a CES-based data architecture enabling the encoding of and access to multiple views of a marked-up document. We describe a tool architecture that supports the encoding scheme, and then show how we have used the encoding scheme and the tools to perform a discourse analytic task in support of a model of global discourse cohesion called Veins Theory (Cristea & Ide, 1998).
    3. Reference Resolution within the Framework of Cognitive Grammar. (arXiv:0909.2626v1 [cs.CL])

      Following the principles of Cognitive Grammar, we concentrate on a model for reference resolution that attempts to overcome the difficulties of previous approaches, based on the fundamental assumption that all reference (independent of the type of the referring expression) is accomplished via access to and restructuring of domains of reference rather than by direct linkage to the entities themselves. The model accounts for entities not explicitly mentioned but understood in a discourse, and enables exploitation of discursive and perceptual context to limit the set of potential referents for a given referring expression. As the most important feature, we note that a ...
    4. Visual Analysis of Public Discourse on Environmental Issues

      The public discourse on environmental issues employs the news media and the emerging consumer generated media as its primary communication channels. Analyzing the use of these channels by the various discourse participants yields valuable insight into the status of opinion formation on environmental problems. This chapter outlines common methods for the monitoring and visualization of public discourse in the news media and it proposes requirements for the application of such methods to environmental discourse. The integration of geospatial visualizations with semantic dimensions and numeric data is identified as the key challenge in visualizing public discourse on environmental issues. A showcase ...
    5. A Noisy-Channel Model for Document Compression. (arXiv:0907.0806v1 [cs.CL])

      We present a document compression system that uses a hierarchical noisy-channel model of text production. Our compression system first automatically derives the syntactic structure of each sentence and the overall discourse structure of the text given as input. The system then uses a statistical hierarchical model of text production in order to drop non-important syntactic and discourse constituents so as to generate coherent, grammatical document compressions of arbitrary length. The system outperforms both a baseline and a sentence-based compression system that operates by sequentially simplifying all sentences in a text. Our results support the claim that discourse knowledge plays an ...
    6. Systems and methods for determining predictive models of discourse functions

      Techniques are provided for determining predictive models of discourse functions based on prosodic features of natural language speech. Inter- and intra-sentential discourse functions in a training corpus of natural language speech utterances are determined. The discourse functions are clustered. The exemplary prosodic features associated with each type of discourse function are determined. Machine learning, observation and the like are used to determine a subset of prosodic features associated with each type of discourse function useful in predicting the likelihood of each type of discourse function.
    7. Applying the Discourse Theory to the Moderator’s Interferences in Web Debates

      This paper presents a methodology for supporting the moderation phase in DCC (Democratic Citizenship Community), a virtual community for supporting e-democratic processes in e-life systems and applications. Based on the Government-Citizen Interactive Model, the DCC encompasses an innovative debate structure, as well as the moderator’s participation based on Discourse Theory, especially concerning argumentative mistakes. Concerning the moderator’s role, efforts have been made to improve the formalization of arguments and opinions while maintaining the usability of the platform. This research focuses on the moderator’s participation via a case study and the experiment is analyzed in a ...
    8. Capturing Text Semantics for Concept Detection in News Video

      The overwhelming amount of multimedia content has triggered the need for automatic semantic concept detection. However, as there are large variations in the visual feature space, text from automatic speech recognition (ASR) has been extensively used and found to be effective in complementing visual features in the concept detection task. Generally, there are two common text analysis methods. One is text classification and the other is text retrieval. Both methods have their own strengths and weaknesses. In addition, fusion of text and visual analysis is still an open problem. In this paper, we present a novel multiresolution, multisource and multimodal ...
    9. Opportunities for Natural Language Processing Research in Education

      This paper discusses emerging opportunities for natural language processing (NLP) researchers in the development of educational applications for writing, reading and content knowledge acquisition. A brief historical perspective is provided, and existing and emerging technologies are described in the context of research related to content, syntax, and discourse analyses. Two systems, e-rater® and Text Adaptor, are discussed as illustrations of NLP-driven technology. The development of each system is described, as well as how continued development provides significant opportunities for NLP research. Content Type: Book Chapter. DOI: 10.1007/978-3-642-00382-0_2. Authors: Jill Burstein, Educational Testing Service, Rosedale Road, MS 12R, Princeton, New Jersey ...
    10. The SWAN biomedical discourse ontology.

      J Biomed Inform. 2008 Oct;41(5):739-51. Authors: Ciccarese P, Wu E, Wong G, Ocana M, Kinoshita J, Ruttenberg A, Clark T. Developing cures for highly complex diseases, such as neurodegenerative disorders, requires extensive interdisciplinary collaboration and exchange of biomedical information in context. Our ability to exchange such information across sub-specialties today is limited by the current scientific knowledge ecosystem's inability to properly contextualize and integrate data and discourse in machine-interpretable form. This inherently limits the productivity of research and the progress toward cures for devastating diseases such as Alzheimer's ...
    11. Accumulation

      Accumulation, it could be said, is a reconstruction of the past in a way that suits the current situation. As the epigraph by Judith Butler relays, bringing back to the present time a discourse that recalls or reconstructs a language that was part of the narration brings the authority of the prior discourse as the condition through which it is possible to construct a new narrative. It is this idea of citation that Butler uses to argue how bodies become gendered through the repeated performance of gender. One common mistake that many theorists make of Butler’s performative theory of ...
    12. Deep Syntactic Analysis and Rule Based Accentuation in Text-to-Speech Synthesis

      With the emergence of the HMM-synthesis paradigm, producing natural, expressive prosody has become viable in speech synthesis. This paper describes the development of a rule-based prominence prediction model for a Finnish Text-to-Speech system, based on deep syntactic analysis and discourse structure. Content Type: Book Chapter. DOI: 10.1007/978-3-540-87391-4_68. Authors: Antti Suni, University of Helsinki, Department of Speech Sciences, Finland; Martti Vainio, University of Helsinki, Department of Speech Sciences, Finland. Book Series: Lecture Notes in Computer Science. Online ISSN: 1611-3349. Print ISSN: 0302-9743. Book Series Volume: 5246/2008. Book: Text, Speech and Dialogue. DOI: 10.1007/978-3-540-87391-4. Print ISBN: 978-3-540-87390-7.
    13. Using Semantic Prototypes for Discourse Status Classification

      Discourse status is related to different aspects of entity mention in the discourse, such as whether entities are first or subsequently mentioned and on what grounds. This paper presents the evaluation of semantic prototypes as an input feature for discourse status classification, using Decision Trees as the machine learning algorithm. We show that semantic prototypes improve classification of two especially difficult and scarce classes. Content Type: Book Chapter. DOI: 10.1007/978-3-540-85980-2_28. Authors: Sandra Collovini, Pontifícia Universidade do Rio Grande do Sul, Porto Alegre, Brasil; Luiz Carlos Ribeiro, Pontifícia Universidade do Rio Grande do Sul, Porto Alegre, Brasil; Patricia Nunes Gonçalves, Pontifícia Universidade do Rio ...
    14. Improving Chinese Pronominal Anaphora Resolution by Extensive Feature Representation and Confidence Estimation

      Pronominal anaphora resolution denotes antecedent identification for anaphoric pronouns expressed in discourses. Effective resolution depends on which features are considered and how they are weighted during antecedent identification. In this paper, a rich feature set, including innovative discourse features, is employed to resolve commonly used Chinese pronouns in modern written Chinese texts. Moreover, a maximum-entropy based model is presented to estimate the confidence for each antecedent candidate. Experimental results show that our method achieves an 83.5% success rate, which is better than those obtained by rule-based and SVM-based methods. Content Type: Book Chapter. DOI ...
    15. Finding Text Boundaries and Finding Topic Boundaries: Two Different Tasks?

      The goal of this paper is to demonstrate that the usual evaluation methods for text segmentation are not suited to every task linked to text segmentation. To do so, we differentiated the task of finding text boundaries in a corpus of concatenated texts from the task of finding transitions between topics inside the same text. We worked on a corpus of twenty-two French political discourses, trying to find boundaries between them when they are concatenated, and to find topic boundaries inside them when they are not. We compared the results of our distance-based method to the well-known c99 ...
    16. Systems and methods for determining and using interaction models

      BACKGROUND OF THE INVENTION. 1. Field of Invention: This invention relates to natural language processing. 2. Description of Related Art: Natural language speech offers a number of advantages over conventional keyboard, tactile and other interfaces. Natural language interfaces are among the earliest interfaces learned. Natural language interfaces are among the most intuitive interfaces for users, which may reduce cognitive effort in accomplishing certain tasks. Many command and control systems, knowledge ...
    17. A Joint Topic and Perspective Model for Ideological Discourse

      Polarizing discussions on political and social issues are common in mass and user-generated media. However, computer-based understanding of ideological discourse has been considered too difficult to undertake. In this paper we propose a statistical model for ideological discourse. By ideology we mean “a set of general beliefs socially shared by a group of people.” For example, Democratic and Republican are two major political ideologies in the United States. The proposed model captures lexical variations due to an ideological text’s topic and due to an author or speaker’s ideological perspective. To cope with the non-conjugacy of the logistic-normal prior ...
    18. Abbreviation Disambiguation: Experiments with Various Variants of the One Sense per Discourse Hypothesis

      Abbreviations are very common and are widely used in both written and spoken language. However, they are not always explicitly defined and in many cases they are ambiguous. In this research, we present a process that attempts to solve the problem of abbreviation ambiguity. Various features have been explored, including context-related methods and statistical methods. The application domain is Jewish Law documents written in Hebrew, which are known to be rich in ambiguous abbreviations. Various variants of the one sense per discourse hypothesis (by varying the scope of discourse) have been implemented. Several common machine learning methods have been tested ...
    19. A Term Association Inference Model for Single Documents: A Stepping Stone for Investigation through Information Extraction

      In this paper, we propose a term association model which extracts significant terms as well as the important regions from a single document. This model is a basis for a systematic form of subjective data analysis which captures the notion of relatedness of different discourse structures considered in the document, without having a predefined knowledge-base. This is a paving stone for investigation or security purposes, where possible patterns need to be figured out from one or a few witness statements. This is unlikely to be possible in predictive data mining where the system cannot work efficiently in ...
    20. Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction.

      BMC Bioinformatics. 2008;9 Suppl 3:S9. Authors: Gobeill J, Tbahriti I, Ehrler F, Mottaz A, Veuthey AL, Ruch P. BACKGROUND: This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference into Functions) as defined in ENTREZ-Gene based on a MEDLINE record. Inputs for this task include both a gene and a pointer to a MEDLINE reference. In the suggested approach we merge two independent sentence extraction strategies. The first proposed strategy (LASt) uses argumentative features, inspired by discourse-analysis models. The second ...
    21. Computational Representation of Linguistic Structures using Domain-Specific Languages. (arXiv:0805.3366v1 [cs.CL])

      We describe a modular system for generating sentences from formal definitions of underlying linguistic structures using domain-specific languages. The system uses Java in general, Prolog for lexical entries and custom domain-specific languages based on Functional Grammar and Functional Discourse Grammar notation, implemented using the ANTLR parser generator. We show how linguistic and technological parts can be brought together in a natural language processing system and how domain-specific languages can be used as a tool for consistent formal notation in linguistic description.
    22. Exploring a type-theoretic approach to accessibility constraint modelling. (arXiv:0805.3410v1 [cs.CL])

      The type-theoretic modelling of DRT that [degroote06] proposed features continuations for the management of the context in which a clause has to be interpreted. This approach, while keeping the standard definitions of quantifier scope, translates the rules of the accessibility constraints of discourse referents inside the semantic recipes. In this paper, we deal with additional rules for these accessibility constraints, in particular for discourse referents introduced by proper nouns, which negation does not block, and for rhetorical relations that structure discourses. We show how this continuation-based approach applies to those accessibility constraints and how ...
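
Entry 1 in this list mentions that relevance scores are percolated through a structural representation of discourse, but its abstract is cut off before saying how. The following is only a minimal sketch of one way such percolation could work, not the method of that entry. It assumes a plain tree of constituents and an assumed rule in which each node takes the highest score found in its subtree. The names `Constituent`, `percolate`, and `select`, the sample sentences, and the threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Constituent:
    """A node in a hypothetical discourse tree (all names are illustrative)."""
    text: str
    score: float = 0.0          # relevance score from some non-structural method
    children: List["Constituent"] = field(default_factory=list)
    percolated: float = 0.0     # filled in by percolate()


def percolate(node: Constituent) -> float:
    """Assumed rule: a node takes the highest score found in its subtree."""
    node.percolated = max([node.score] + [percolate(c) for c in node.children])
    return node.percolated


def select(node: Constituent, threshold: float) -> List[str]:
    """Collect, in pre-order, the text of nodes whose percolated score meets the threshold."""
    picked = [node.text] if node.percolated >= threshold else []
    for child in node.children:
        picked.extend(select(child, threshold))
    return picked


# Toy tree: one main constituent with two subordinate constituents.
root = Constituent("The council met on Monday.", 0.2, [
    Constituent("It approved the budget.", 0.9),
    Constituent("Coffee was served.", 0.1),
])
percolate(root)
summary = select(root, threshold=0.5)
```

Under this assumed rule, a low-scoring parent is kept whenever one of its descendants scores highly, so a selected constituent never appears without its structural context. Other rules, such as decaying the score with tree depth, would fit the same skeleton.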
