    1. Method and system for determining text coherence

      A method and system for determining text coherence in an essay is disclosed. A method of evaluating the coherence of an essay includes receiving an essay having one or more discourse elements and text segments. The one or more discourse elements are annotated either manually or automatically. A text segment vector is generated for each text segment in a discourse element using sparse random indexing vectors. The method or system then identifies one or more essay dimensions and measures the semantic similarity of each text segment based on the essay dimensions. Finally, a coherence level is assigned to the essay ...
    2. Learning Recursive Segments for Discourse Parsing. (arXiv:1003.5372v1 [cs.CL])

      Automatically detecting discourse segments is an important preliminary step towards full discourse parsing. Previous research on discourse segmentation has relied on the assumption that elementary discourse units (EDUs) in a document always form a linear sequence (i.e., they can never be nested). Unfortunately, this assumption turns out to be too strong, since some theories of discourse, such as SDRT, allow for nested discourse units. In this paper, we present a simple approach to discourse segmentation that is able to produce nested EDUs. Our approach builds on standard multi-class classification techniques combined with a simple repairing heuristic that enforces global coherence ...
    3. A Sequential Model for Discourse Segmentation

      Identifying discourse relations in a text is essential for various tasks in Natural Language Processing, such as automatic text summarization, question-answering, and dialogue generation. The first step of this process is segmenting a text into elementary units. In this paper, we present a novel model of discourse segmentation based on sequential data labeling. Namely, we use Conditional Random Fields to train a discourse segmenter on the RST Discourse Treebank, using a set of lexical and syntactic features. Our system is compared to other statistical and rule-based segmenters, including one based on Support Vector Machines. Experimental results indicate that our sequential ...
    4. Discourse Relations and Document Structure

      This chapter addresses the requirements and linguistic foundations of automatic relational discourse analysis of complex text types such as scientific journal articles. It is argued that besides lexical and grammatical discourse markers, which have traditionally been employed in discourse parsing, cues derived from the logical and generical document structure and the thematic structure of a text must be taken into account. An approach to modelling such types of linguistic information in terms of XML-based multi-layer annotations and to a text-technological representation of additional knowledge sources is presented. By means of quantitative and qualitative corpus analyses, cues and constraints for automatic ...
    5. Motivations and implications of veins theory: a discussion of discourse cohesion

      The paper deals with the cohesion part of a model of global discourse interpretation, usually known as Veins Theory (VT). By taking the notion of nuclearity (though ignoring relations) from Rhetorical Structure Theory, VT computes strings of discourse units, called veins, from which domains of accessibility can be determined for each discourse unit. VT’s constructs fit best with an incremental view of discourse processing. Linguistic observations that led to the elaboration of the theory are presented. Cognitive aspects like short-term memory and on-line summarization are explained in terms of VT’s constructs. Complementary remarks are made on ...
    6. A Study of the Expressive Possibilities of SK-Languages

      In this chapter we continue the analysis of the expressive possibilities of SK-languages. The examples considered above do not demonstrate the real power of the constructed mathematical model. We therefore consider a number of additional examples that illustrate some important possibilities of SK-languages concerning the construction of semantic representations of sentences and discourses and the description of pieces of knowledge about the world. The advantages of the theory of SK-languages in comparison, in particular, with Discourse Representation Theory, Episodic Logic, Theory of Conceptual Graphs, and Database Semantics of Natural Language are set forth ...
    7. A Mathematical Model for Describing Structured Meanings of Natural Language Sentences and Discourses

      The purpose of this chapter is to construct a mathematical model describing a system of ten partial operations on finite sequences whose elements are structured meanings of Natural Language (NL) expressions. Informally, the goal is to develop a mathematical tool convenient for building semantic representations both of separate sentences in NL and of complex discourses of arbitrarily large length pertaining to technology, medicine, economy, and other fields of professional activity. The starting point for developing this model is the definition of the class of conceptual bases introduced in the previous chapter. The constructed mathematical model includes ...
    8. AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan

      This article describes the enrichment of the AnCora corpora of Spanish and Catalan (400 k each) with coreference links between pronouns (including elliptical subjects and clitics), full noun phrases (including proper nouns), and discourse segments. The coding scheme distinguishes between identity links, predicative relations, and discourse deixis. Inter-annotator agreement on the link types is 85–89% above chance, and we provide an analysis of the sources of disagreement. The resulting corpora make it possible to train and test learning-based algorithms for automatic coreference resolution, as well as to carry out bottom-up linguistic descriptions of coreference relations as they occur ...
    9. Challenges in natural language processing: the case of metaphor (commentary)

      This article comments on some ways in which metaphor is relevant to practical language technology, for either text or speech. While the article mentions some deep problems, it nevertheless points out that certain issues are less troublesome than they might appear to be, and that metaphor in real discourse has some characteristics that could help, rather than hinder, practical discourse-processing. The article also mentions the author’s ongoing work on developing a new view of how metaphor and metonymy relate to each other. This view is based on a deconstruction into underlying dimensions. Content type: Journal Article. DOI: 10.1007 ...
    10. Automatic Recognition of the Function of Singular Neuter Pronouns in Texts and Spoken Data

      We describe the results of unsupervised (clustering) and supervised (classification) learning experiments with the purpose of recognising the function of singular neuter pronouns in Danish corpora of written and spoken language. Danish singular neuter pronouns comprise personal and demonstrative pronouns. They are very frequent and have many functions such as non-referential, cataphoric, deictic and anaphoric. The antecedents of discourse anaphoric singular neuter pronouns can be nominal phrases of different gender and number, verbal phrases, adjectival phrases, clauses or discourse segments of different size and they can refer to individual and abstract entities. Danish neuter pronouns occur in more constructions and ...
    11. Systems and methods for hybrid text summarization

      Techniques are provided for segmenting text into categorized discourse constituents and attaching discourse constituents into a structural representation of discourse. Techniques for determining hybrid structural and non-structural summaries of a text are also provided. A text is segmented based on a theory of discourse analysis into at least a main discourse constituent containing spatio-temporal information about a single event in a possible world view. The discourse constituents are then inserted into a structural representation of discourse. Non-structural techniques are used to determine relevance scores and important discourse constituents are determined. Relevance scores are percolated through the structural representation of discourse ...
    12. Marking-up multiple views of a Text: Discourse and Reference. (arXiv:0909.2715v1 [cs.CL])

      We describe an encoding scheme for discourse structure and reference, based on the TEI Guidelines and the recommendations of the Corpus Encoding Specification (CES). A central feature of the scheme is a CES-based data architecture enabling the encoding of and access to multiple views of a marked-up document. We describe a tool architecture that supports the encoding scheme, and then show how we have used the encoding scheme and the tools to perform a discourse analytic task in support of a model of global discourse cohesion called Veins Theory (Cristea & Ide, 1998).
    13. Reference Resolution within the Framework of Cognitive Grammar. (arXiv:0909.2626v1 [cs.CL])

      Following the principles of Cognitive Grammar, we concentrate on a model for reference resolution that attempts to overcome the difficulties of previous approaches, based on the fundamental assumption that all reference (independent of the type of the referring expression) is accomplished via access to and restructuring of domains of reference rather than by direct linkage to the entities themselves. The model accounts for entities not explicitly mentioned but understood in a discourse, and enables exploitation of discursive and perceptual context to limit the set of potential referents for a given referring expression. As the most important feature, we note that a ...
    14. Visual Analysis of Public Discourse on Environmental Issues

      The public discourse on environmental issues employs the news media and the emerging consumer generated media as its primary communication channels. Analyzing the use of these channels by the various discourse participants yields valuable insight into the status of opinion formation on environmental problems. This chapter outlines common methods for the monitoring and visualization of public discourse in the news media and it proposes requirements for the application of such methods to environmental discourse. The integration of geospatial visualizations with semantic dimensions and numeric data is identified as the key challenge in visualizing public discourse on environmental issues. A showcase ...
    15. A Noisy-Channel Model for Document Compression. (arXiv:0907.0806v1 [cs.CL])

      We present a document compression system that uses a hierarchical noisy-channel model of text production. Our compression system first automatically derives the syntactic structure of each sentence and the overall discourse structure of the text given as input. The system then uses a statistical hierarchical model of text production in order to drop non-important syntactic and discourse constituents so as to generate coherent, grammatical document compressions of arbitrary length. The system outperforms both a baseline and a sentence-based compression system that operates by simplifying sequentially all sentences in a text. Our results support the claim that discourse knowledge plays an ...
    16. Systems and methods for determining predictive models of discourse functions

      Techniques are provided for determining predictive models of discourse functions based on prosodic features of natural language speech. Inter- and intra-sentential discourse functions in a training corpus of natural language speech utterances are determined. The discourse functions are clustered. The exemplary prosodic features associated with each type of discourse function are determined. Machine learning, observation and the like are used to determine a subset of prosodic features associated with each type of discourse function useful in predicting the likelihood of each type of discourse function.
    17. Applying the Discourse Theory to the Moderator’s Interferences in Web Debates

      This paper presents a methodology for supporting the moderation phase in DCC (Democratic Citizenship Community), a virtual community for supporting e-democratic processes in e-life systems and applications. Based on the Government-Citizen Interactive Model, the DCC encompasses an innovative debate structure, as well as the moderator’s participation based on Discourse Theory, especially concerning argumentative mistakes. Concerning the moderator’s role, efforts have been made to improve the formalization of arguments and opinions while maintaining the usability of the platform. This research focuses on the moderator’s participation via a case study and the experiment is analyzed in a ...
    18. Capturing Text Semantics for Concept Detection in News Video

      The overwhelming amount of multimedia content has triggered the need for automatic semantic concept detection. However, as there are large variations in the visual feature space, text from automatic speech recognition (ASR) has been extensively used and found to be effective in complementing visual features in the concept detection task. Generally, there are two common text analysis methods: text classification and text retrieval. Both methods have their own strengths and weaknesses. In addition, fusion of text and visual analysis is still an open problem. In this paper, we present a novel multiresolution, multisource and multimodal ...
    19. Opportunities for Natural Language Processing Research in Education

      This paper discusses emerging opportunities for natural language processing (NLP) researchers in the development of educational applications for writing, reading and content knowledge acquisition. A brief historical perspective is provided, and existing and emerging technologies are described in the context of research related to content, syntax, and discourse analyses. Two systems, e-rater® and Text Adaptor, are discussed as illustrations of NLP-driven technology. The development of each system is described, as well as how continued development provides significant opportunities for NLP research. Content type: Book Chapter. DOI: 10.1007/978-3-642-00382-0_2. Authors: Jill Burstein, Educational Testing Service, Rosedale Road, MS 12R, Princeton, New Jersey ...
    20. The SWAN biomedical discourse ontology.

      J Biomed Inform. 2008 Oct;41(5):739-51. Authors: Ciccarese P, Wu E, Wong G, Ocana M, Kinoshita J, Ruttenberg A, Clark T. Developing cures for highly complex diseases, such as neurodegenerative disorders, requires extensive interdisciplinary collaboration and exchange of biomedical information in context. Our ability to exchange such information across sub-specialties today is limited by the current scientific knowledge ecosystem's inability to properly contextualize and integrate data and discourse in machine-interpretable form. This inherently limits the productivity of research and the progress toward cures for devastating diseases such as Alzheimer's ...
    21. Accumulation

      Accumulation, it could be said, is a reconstruction of the past in a way that suits the current situation. As the epigraph by Judith Butler relays, bringing back to the present time a discourse that recalls or reconstructs a language that was part of the narration brings the authority of the prior discourse as the condition through which it is possible to construct a new narrative. It is this idea of citation that Butler uses to argue how bodies become gendered through the repeated performance of gender. One common mistake that many theorists make of Butler’s performative theory of ...