    1. A Summarization Strategy of Chinese News Discourse

      Due to the problem of information overload, automatic text summarization is becoming more and more necessary. This paper proposes a strategy for Chinese news discourse summarization based on veins theory. This method can produce a summary of an original text without requiring its full semantic interpretation, relying instead on the discourse structure. Book chapter, pages 389-394. DOI: 10.1007/978-3-642-28308-6_53. Author: Deliang Wang, School of Foreign Languages and Literatures, Beijing Normal University, Beijing, China. Book series: Advances in Intelligent and Soft Computing (Print ISSN 1867-5662), Volume 145/2012. Book: Proceedings of the 2011 2nd International Congress on Computer ...
      Mentions: Beijing China
    2. Contested Collective Intelligence: Rationale, Technologies, and a Human-Machine Annotation Study

      We propose the concept of Contested Collective Intelligence (CCI) as a distinctive subset of the broader Collective Intelligence design space. CCI is relevant to the many organizational contexts in which it is important to work with contested knowledge, for instance, due to different intellectual traditions, competing organizational objectives, information overload or ambiguous environmental signals. The CCI challenge is to design sociotechnical infrastructures to augment such organizational capability. Since documents are often the starting points for contested discourse, and discourse markers provide a powerful cue to the presence of claims, contrasting ideas and argumentation, discourse and rhetoric provide an annotation ...
    3. Processing Coordinated Structures in PENG Light

      PENG Light is a controlled natural language designed to write unambiguous specifications that can be translated automatically via discourse representation structures into a formal target language. Instead of writing axioms in a formal language, an author writes a specification and the associated background axioms directly in controlled natural language. In this paper, we first review the controlled natural language PENG Light and show how a discourse representation structure is generated for sentences written in PENG Light. We then discuss two different solutions for implementing discourse representation structures for coordinated structures. Finally, we show how an efficient implementation ...
    4. Discourse Segmentation for Sentence Compression

      Earlier studies have raised the possibility of summarizing at the level of the sentence. This simplification should help in adapting textual content to a limited space. Therefore, sentence compression is an important resource for automatic summarization systems. However, there are few studies that consider sentence-level discourse segmentation for the compression task; to our knowledge, none in Spanish. In this paper, we study the relationship between discourse segmentation and compression for sentences in Spanish. We use a discourse segmenter and observe to what extent the passages deleted by annotators fit in discourse structures detected by the system. The main idea is to ...
      Mentions: Pompeu Fabra
    5. Processing Text-Technological Resources in Discourse Parsing

      Discourse parsing of complex text types such as scientific research articles requires the analysis of an input document on linguistic and structural levels that go beyond traditionally employed lexical discourse markers. This chapter describes a text-technological approach to discourse parsing. Discourse parsing with the aim of providing a discourse structure is seen as the addition of a new annotation layer for input documents marked up on several linguistic annotation levels. The discourse parser generates discourse structures according to the Rhetorical Structure Theory. An overview of the knowledge sources and components for parsing scientific journal articles is given. The parser’s ...
    6. The biomedical discourse relation bank.

      BMC Bioinformatics. 2011;12:188. Authors: Prasad R, McRoy S, Frid N, Joshi A, Yu H. BACKGROUND: Identification of discourse relations, such as causal and contrastive relations, between situations mentioned in text is an important task for biomedical text-mining. A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical discourse processing. However, little effort has been made to develop such an annotated resource. RESULTS: We have developed the Biomedical Discourse Relation Bank (BioDRB), in which we have annotated explicit and implicit discourse relations in 24 ...
    7. Learning Content Selection Rules for Generating Object Descriptions in Dialogue. (arXiv:1109.2136v1 [cs.CL])

      A fundamental requirement of any task-oriented dialogue system is the ability to generate object descriptions that refer to objects in the task domain. The subproblem of content selection for object descriptions in task-oriented dialogue has been the focus of much previous work and a large number of models have been proposed. In this paper, we use the annotated COCONUT corpus of task-oriented design dialogues to develop feature sets based on Dale and Reiter's (1995) incremental model, Brennan and Clark's (1996) conceptual pact model, and Jordan's (2000b) intentional influences model, and use these feature sets in a machine learning experiment to ...
    8. On the Explicit and Implicit Spatiotemporal Architecture of Narratives of Personal Experience

      Expanding on recent research into the predictability of explicit linguistic spatial information relative to features of discourse structure, we present the results of several machine learning studies which leverage rhetorical relations, events, temporal information, text sequence, and both explicit and implicit linguistic spatial information in three different corpora of narrative discourses. On average, classifiers predict figure, ground, spatial verb and preposition and frame of reference to 75% accuracy, rhetorical relations to 72% accuracy, and events to 76% accuracy (all values have statistical significance above majority class baselines). These results hold independent of the number of authors, subject matter, length and ...
    9. Event in Compositional Dynamic Semantics. (arXiv:1108.5017v1 [cs.CL])

      We present a framework which constructs an event-style discourse semantics. The discourse dynamics are encoded in continuation semantics and various rhetorical relations are embedded in the resulting interpretation of the framework. We assume discourse and sentence are distinct semantic objects that play different roles in meaning evaluation. Moreover, two sets of composition functions, for handling different discourse relations, are introduced. The paper first gives the necessary background and motivation for event and dynamic semantics, then introduces the framework with detailed examples.
      Mentions: Loria
    11. From Connectives to Argumentative Markers: A Quest for Markers of Argumentative Moves and of Related Aspects of Argumentative Discourse

      In this paper, I explore the potential of systematically studying the linguistic surface of discourse for the purposes of identifying markers of argumentative moves and other related categories, such as types of arguments and argumentative strategies. Such a list of argumentative markers can prove useful for the (semi)automatic treatment of a large corpus of texts. After reviewing literature on the linguistic realization of argumentative moves as well as literature on the subject of discourse markers, it becomes clear that the search for representative items of argumentative markers cannot be restricted to those elements marking relations but that it ...
      Mentions: France Paris
    12. Towards a Discourse-driven Taxonomic Inference Model

      This chapter describes ongoing work, the goal of which is to create a discourse-driven inference model, as well as to construct resources using such a model. The data process consists of texts from two encyclopedias of the medical domain; stylistic properties characteristic of encyclopedia entries constitute the mechanisms underlying the inference model, such as layout-based features alongside semantic (conceptual) document structuring. Three parts of the model are explained in detail, providing experimental results that are based on language processing techniques: (i) identifying taxonomic document structure by machine learning; (ii) discourse-driven construction of text–hypothesis pairs for examining types of ...
    13. FootbOWL: Using a Generic Ontology of Football Competition for Planning Match Summaries

      We present a two-layer OWL ontology-based Knowledge Base (KB) that allows for flexible content selection and discourse structuring in Natural Language text Generation (NLG) and discuss its use for these two tasks. The first layer of the ontology contains an application-independent base ontology. It models the domain and was not designed with NLG in mind. The second layer, which is added on top of the base ontology, models entities and events that can be inferred from the base ontology, including inferable logico-semantic relations between individuals. The nodes in the KB are weighted according to learnt models of content selection, such ...
    14. Identifying discourse connectives in biomedical text.

      AMIA Annu Symp Proc. 2010;2010:657-61. Authors: Ramesh BP, Yu H. Discourse connectives are words or phrases that connect or relate two coherent sentences or phrases and indicate the presence of discourse relations. Automatic recognition of discourse connectives may benefit many natural language processing applications. In this pilot study, we report the development of supervised machine-learning classifiers with conditional random fields (CRFs) for automatically identifying discourse connectives in full-text biomedical articles. Our first classifier was trained on the open-domain, 1-million-token Penn Discourse Treebank (PDTB). We performed cross validation on ...
    15. Comparing Approaches to Tag Discourse Relations

      It is widely accepted that in a text, sentences and clauses cannot be understood in isolation but in relation with each other through discourse relations that may or may not be explicitly marked. Discourse relations have been found useful in many applications such as machine translation, text summarization, and question answering; however, they are often not considered in computational language applications because robust domain- and genre-independent discourse parsers are very few. In this paper, we analyze existing approaches to identify five discourse relations automatically (namely, comparison, contingency, illustration, attribution, and topic-opinion), and propose a new approach to identify attributive ...
    16. Semi-supervised Discourse Relation Classification with Structural Learning

      The corpora available for training discourse relation classifiers are annotated using a general set of discourse relations. However, for certain applications, custom discourse relations are required. Creating a new annotated corpus with a new relation taxonomy is a time-consuming and costly process. We address this problem by proposing a semi-supervised approach to discourse relation classification based on Structural Learning. First, we solve a set of auxiliary classification problems using unlabeled data. Second, the learned classifiers are used to extend feature vectors to train a discourse relation classifier. By defining a relevant set of auxiliary classification problems, we show that the ...
    17. The influence of global discourse on lexical ambiguity resolution

      The influence of global discourse on the resolution of lexical ambiguity was examined in a series of naming experiments. Two-sentence passages were constructed to bias either the dominant or the subordinate meaning of a homonym that was embedded in a locally ambiguous sentence. The results provided evidence for the immediate (0-msec interstimulus interval) resolution of lexical ambiguity and were subsequently replicated in Experiment 2, in which an 80-msec stimulus onset asynchrony exposure duration was employed for the homonyms. Strong dominant and subordinate biased discourse contexts activated only the contextually appropriate sense of a homonym. In Experiment 3, each sentence ...
    18. Categorial Minimalist Grammar. (arXiv:1012.2661v1 [cs.CL])

      We first recall some basic notions on minimalist grammars and on categorial grammars. Next we give a short introduction to partially commutative linear logic, and our representation of minimalist grammars within this categorial system, the so-called categorial minimalist grammars. Thereafter we briefly present \lambda\mu-DRT (Discourse Representation Theory), an extension of \lambda-DRT (compositional DRT) in the framework of \lambda\mu calculus: it avoids type raising and derives different readings from a single semantic representation, in a setting which follows discourse structure. We run a complete example which illustrates the various structures and rules that are needed to derive a semantic representation from the ...
    19. A Discourse and Dialogue Infrastructure for Industrial Dissemination

      We think that modern speech dialogue systems need a prior usability analysis to identify the requirements for industrial applications. In addition, work from the area of the Semantic Web should be integrated. These requirements can then be met by multimodal semantic processing, semantic navigation, interactive semantic mediation, user adaptation/personalisation, interactive service composition, and semantic output representation, which we will explain in this paper. We will also describe the discourse and dialogue infrastructure these components develop and provide two examples of disseminated industrial prototypes. Book chapter. DOI: 10.1007/978-3-642-16202-2_12. Author: Daniel Sonntag, German Research Center for AI (DFKI), Stuhlsatzenhausweg ...
    20. Automated annotation

      To automatically annotate an essay, a sentence of the essay is identified and a feature associated with the sentence is determined. In addition, a probability of the sentence being a discourse element is determined by mapping the feature to a model. The model is generated by a machine learning application based on at least one annotated essay. Furthermore, the essay is annotated based on the probability.
    21. Why Discourse Structure?

      I come from a strong lineage of discourse folks. Writing a parser for Rhetorical Structure Theory was one of the first class projects I had when I was a grad student. Recently, with the release of the Penn Discourse Treebank, there has been a bit of a flurry of interest in this problem (I had some snarky comments right after ACL about this). I've also talked about why this is a hard problem, but never really about why it is an interesting problem. My thinking about discourse has changed a lot over the years. My current thinking about it ...
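
The annotation procedure summarized in item 20 (identify a sentence, determine a feature, map the feature to a model to obtain a probability, annotate based on that probability) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the listed work: the cue-word list, the two features, the hand-set logistic weights (standing in for a model that would be learned from annotated essays), and the 0.5 threshold.

```python
import math
import re

# Hypothetical cue words; the source does not say which features are used.
CUE_WORDS = {"because", "therefore", "however", "thus", "conclusion"}

# Hypothetical hand-set weights standing in for a model that a machine
# learning application would learn from at least one annotated essay.
WEIGHTS = {"bias": -1.5, "cue_count": 2.0, "is_first": 1.0}


def split_sentences(essay):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", essay) if s.strip()]


def features(sentence, index):
    """Determine the features associated with one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return {
        "bias": 1.0,
        "cue_count": float(sum(w in CUE_WORDS for w in words)),
        "is_first": 1.0 if index == 0 else 0.0,
    }


def probability(feats):
    """Map the features to the (assumed) logistic model."""
    score = sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-score))


def annotate(essay, threshold=0.5):
    """Return (sentence, probability, is_discourse_element) for each sentence."""
    result = []
    for i, sentence in enumerate(split_sentences(essay)):
        p = probability(features(sentence, i))
        result.append((sentence, p, p >= threshold))
    return result
```

Calling `annotate(essay)` returns one tuple per sentence; a real system would replace the fixed weights with parameters estimated from annotated essays and would likely use richer features.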
