1. Articles in category: WSD

    73-96 of 353
    1. FASTSUBS: An Efficient Admissible Algorithm for Finding the Most Likely Lexical Substitutes Using a Statistical Language Model. (arXiv:1205.5407v1 [cs.CL])

      Lexical substitutes have found use in the context of word sense disambiguation, unsupervised part-of-speech induction, paraphrasing, machine translation, and text simplification. Using a statistical language model to find the most likely substitutes in a given context is a successful approach, but the cost of a naive algorithm is proportional to the vocabulary size. This paper presents the Fastsubs algorithm which can efficiently and correctly identify the most likely lexical substitutes for a given context based on a statistical language model without going through most of the vocabulary. The efficiency of Fastsubs makes large scale experiments based on lexical substitutes feasible ...
      Mentions: Penn Treebank WSJ
    2. An RDF-Based Model for Linguistic Annotation

      This paper proposes the application of the RDF framework to the representation of linguistic annotations. We argue that RDF is a suitable data model to capture multiple annotations on the same text segment, and to integrate multiple layers of annotations. As well as using RDF for this purpose, the main contribution of the paper is an OWL ontology, called TELIX (Text Encoding and Linguistic Information eXchange), which models annotation content. This ontology builds on the SKOS XL vocabulary, a W3C standard for representation of lexical entities as RDF graphs. We extend SKOS XL in order to capture lexical relations between ...
      Mentions: France RDF Skos
    3. LODifier: Generating Linked Data from Unstructured Text

      The automated extraction of information from text and its transformation into a formal description is an important goal in both Semantic Web research and computational linguistics. The extracted information can be used for a variety of tasks such as ontology generation, question answering and information retrieval. LODifier is an approach that combines deep semantic analysis with named entity recognition, word sense disambiguation and controlled Semantic Web vocabularies in order to extract named entities and relations between them from text and to convert them into an RDF representation which is linked to DBpedia and WordNet. We present the architecture of our ...
    4. Schema - An Algorithm for Automated Product Taxonomy Mapping in E-commerce

      This paper proposes SCHEMA, an algorithm for automated mapping between heterogeneous product taxonomies in the e-commerce domain. SCHEMA utilises word sense disambiguation techniques, based on the ideas from the algorithm proposed by Lesk, in combination with the semantic lexicon WordNet. For finding candidate map categories and determining the path-similarity we propose a node matching function that is based on the Levenshtein distance. The final mapping quality score is calculated using the Damerau-Levenshtein distance and a node-dissimilarity penalty. The performance of SCHEMA was tested on three real-life datasets and compared with PROMPT and the algorithm proposed by Park & Kim. It is ...
    5. Exploiting domain information for Word Sense Disambiguation of medical documents.

      J Am Med Inform Assoc. 2012 Mar-Apr;19(2):235-40. Authors: Stevenson M, Agirre E, Soroa A. Abstract. OBJECTIVE: Current techniques for knowledge-based Word Sense Disambiguation (WSD) of ambiguous biomedical terms rely on relations in the Unified Medical Language System Metathesaurus but do not take into account the domain of the target documents. The authors' goal is to improve these methods by using information about the topic of the document in which the ambiguous term appears. DESIGN: The authors proposed and implemented several methods to extract lists of key terms ...
      Mentions: WSD
    6. Improving the Efficiency of Document Clustering and Labeling Using Modified FPF Algorithm

      Document clustering is an effective tool to manage information overload. By grouping similar documents together, we enable a human observer to quickly browse large document collections and to easily grasp the distinct topics and subtopics. In this paper we survey the most important problems and techniques related to text information retrieval: document pre-processing and filtering, word sense disambiguation. Further, we present text clustering using the Modified FPF algorithm and a comparison of our clustering algorithms against FPF, which is the most used algorithm in the text clustering context. Further, we introduce the problem of cluster labeling: Cluster labeling is achieved ...
    7. Creating a system for lexical substitutions from scratch using crowdsourcing

      Abstract: This article describes the creation and application of the Turk Bootstrap Word Sense Inventory for 397 frequent nouns, which is a publicly available resource for lexical substitution. This resource was acquired using Amazon Mechanical Turk. In a bootstrapping process with massive collaborative input, substitutions for target words in context are elicited and clustered by sense; then, more contexts are collected. Contexts that cannot be assigned to a current target word’s sense inventory re-enter the bootstrapping loop and get a supply of substitutions. This process yields a sense inventory with its granularity determined by substitutions as opposed to psychologically ...
    8. Identifying Concepts on Specific Domain by a Unsupervised Graph-Based Approach

      This paper presents an unsupervised approach to Word Sense Disambiguation on a specific domain to automatically assign the right sense to a given ambiguous word. The proposed approach relies on the integration of two sources of information: context and semantic similarity information. The experiments were carried out on the English test data of SemEval 2010 and evaluated with a variety of measures that analyze the connectivity of the graph structure. The obtained results were evaluated using precision and recall measures and compared with the results of SemEval 2010. The approach is currently under test with other semantic similarity measures; preliminary results look promising. Content ...
    9. Trends in word sense disambiguation

      Abstract: The problem and process of identifying the meaning of a word as per its usage context is called word sense disambiguation (WSD). Although research in this field has been ongoing for the past forty years, a distinct change of techniques adopted can be observed over time. Two important parameters govern the direction in which WSD research progresses during any period. These are the underlying requirement of the kind of sense disambiguation, or the domain, and the robustness of available knowledge in the form of corpora or dictionaries. This paper surveys the progress of WSD over time and the important ...
    10. A Query Language for WordNet-Like Lexical Databases

      WordNet-like lexical databases are used in many natural language processing tasks, such as word sense disambiguation, information extraction and sentiment analysis. The paper discusses the problem of querying such databases. The types of queries specific to WordNet-like databases are analyzed and previous approaches that were undertaken to query wordnets are discussed. A query language which incorporates data types and syntactic constructs based on concepts that form the core of a WordNet-like database (synsets, word senses, semantic relations, etc.) is proposed as a new solution to the problem of querying wordnets. Content Type: Book Chapter. Pages: 436-445. DOI: 10.1007/978-3-642-28493-9_46. Authors: Marek Kubis ...
    11. A Graph-Based Method to Improve WordNet Domains

      WordNet Domains (WND) is a lexical resource where synsets have been semi-automatically annotated with one or more domain labels from a set of 170 hierarchically organized domains. The uses of WND include the power to reduce the polysemy degree of the words, grouping those senses that belong to the same domain. This paper presents a novel automatic method to propagate domain information through WordNet. We compare both labellings (the original and the new one) allowing us to detect anomalies in the original WND labels. We also compare the quality of both resources (the original labelling and the new one) in ...
      Mentions: Chile
    12. A Cognitive Approach to Word Sense Disambiguation

      An unsupervised, knowledge-based, parametric approach to Word Sense Disambiguation is proposed based on the well-known cognitive architecture ACT-R. In this work, the target word is disambiguated based on surrounding context words using an accumulator model of memory search and it is realized by incorporating RACE/A with ACT-R 6.0. In this process, a spreading activation network is built following the strategies of Tsatsaronis et al. proposed in [5] using the chunks and their relations in the declarative memory system of ACT-R and the lexical representation has been achieved by integrating WordNet with the cognitive architecture. The resulting Word Sense ...
    13. A graph-Based Approach to WSD Using Relevant Semantic Trees and N-Cliques Model

      In this paper we propose a new graph-based approach to solve semantic ambiguity using a semantic net based on WordNet. Our proposal uses an adaptation of the Clique Partitioning Technique to extract sets of strongly related senses. For that, an initial graph is obtained from senses of WordNet combined with the information of several semantic categories from different resources: WordNet Domains, SUMO and WordNet Affect. In order to obtain the most relevant concepts in a sentence we use the Relevant Semantic Trees method. The evaluation of the results has been conducted using the test data set of Senseval-2 obtaining promising ...
    14. A Quick Tour of Word Sense Disambiguation, Induction and Related Approaches

      Word Sense Disambiguation (WSD) and Word Sense Induction (WSI) are two fundamental tasks in Natural Language Processing (NLP), i.e., those of, respectively, automatically assigning meaning to words in context from a predefined sense inventory and discovering senses from text for a given input word. The two tasks have generally been hard to perform with high accuracy. However, today innovations in approach to WSD and WSI are promising to open up many interesting new horizons in NLP and Information Retrieval applications. This paper is a quick tour on how to start doing research in this exciting field and suggests the ...
    15. Unsupervised word sense disambiguation with N-gram features

      Abstract: The present paper concentrates on the issue of feature selection for unsupervised word sense disambiguation (WSD) performed with an underlying Naïve Bayes model. It introduces web N-gram features which, to our knowledge, are used for the first time in unsupervised WSD. While creating features from unlabeled data, we are “helping” a simple, basic knowledge-lean disambiguation algorithm to significantly increase its accuracy as a result of receiving easily obtainable knowledge. The performance of this method is compared to that of others that rely on completely different feature sets. Test results concerning nouns, adjectives and verbs show that web N-gram feature ...
    16. Studying the correlation between different word sense disambiguation methods and summarization effectiveness in biomedical texts.

      BMC Bioinformatics. 2011;12:355. Authors: Plaza L, Jimeno-Yepes AJ, Díaz A, Aronson AR. Abstract. BACKGROUND: Word sense disambiguation (WSD) attempts to solve lexical ambiguities by identifying the correct meaning of a word based on its context. WSD has been demonstrated to be an important step in knowledge-based approaches to automatic summarization. However, the correlation between the accuracy of the WSD methods and the summarization performance has never been studied. RESULTS: We present three existing knowledge-based WSD approaches and a graph-based summarizer. Both the WSD ...
    17. Efficient print operations

      A method, apparatus, and product for reducing resource footprints for printer operation outputs, comprising: specifying a print job rendering criteria; receiving a request to print a print job having a page number amount; and automatically selecting and printing a portion of the print job as a function of the print job rendering criteria, wherein the portion has a page number amount smaller than the print job page number amount. The print job rendering criteria may comprise a set maximum number of pages to print during one printing session, where the portion to print is less than or equal to the ...
      Mentions: I/O
    18. Automated non-alphanumeric symbol resolution in clinical texts.

      AMIA Annu Symp Proc. 2011;2011:979-86. Authors: Moon S, Pakhomov S, Ryan J, Melton GB. Abstract: Although clinical texts contain many symbols, relatively little attention has been given to symbol resolution by medical natural language processing (NLP) researchers. Interpreting the meaning of symbols may be viewed as a special case of Word Sense Disambiguation (WSD). One thousand instances of four common non-alphanumeric symbols ('+', '-', '/', and '#') were randomly extracted from a clinical document repository and annotated by experts. The symbols and their surrounding context, in addition to bag-of-words (BoW), and heuristic rules were evaluated ...
    19. A Semi-supervised Approach for Key-Synset Extraction to Be Used in Word Sense Disambiguation

      Nowadays, although much research is being done on word sense disambiguation in some languages such as English, much remains to be done for other languages such as Persian. Some difficulties stand in the way, which might have made the field less attractive to researchers. For example, the Persian WordNet, FarsNet, is newly developed and there is no sense-tagged corpus based on it yet. We therefore propose a semi-supervised approach for extending FarsNet with some new relations and then use it for WSD. Also a method to extract semantic keywords or key-concepts from textual documents is used ...
    20. Word sense disambiguation as a traveling salesman problem

      Abstract: Word sense disambiguation (WSD) is a difficult problem in Computational Linguistics, mostly because of the use of a fixed sense inventory and the deep level of granularity. This paper formulates WSD as a variant of the traveling salesman problem (TSP) to maximize the overall semantic relatedness of the context to be disambiguated. Ant colony optimization, a robust nature-inspired algorithm, was used in a reinforcement learning manner to solve the formulated TSP. We propose a novel measure based on the Lesk algorithm and Vector Space Model to calculate semantic relatedness. Our approach to WSD is comparable to state-of-the-art knowledge-based and ...
    21. Unsupervised Part-of-Speech Tagging

      In this chapter, homogeneity with respect to syntactic word classes (parts-of-speech, POS) is aimed at. The method presented in this section is called unsupervised POS-tagging, as its application results in corpus annotation in a comparable way to what POS-taggers provide. Nevertheless, its application results in slightly different categories as opposed to what is assumed by a linguistically motivated POS-tagger, which hampers evaluation methods that compare unsupervised POS tags to linguistic annotations. To measure the extent to which unsupervised POS tagging can contribute in application-based settings, the system is evaluated in supervised POS tagging, word sense disambiguation, named entity recognition ...
    22. Word Sense Induction and Disambiguation

      Major difficulties in language processing are caused by the fact that many words are ambiguous, i.e. they have different meanings in different contexts, but are written (or pronounced) in the same way. While syntactic ambiguities have already been addressed in the previous chapter, now the focus is set on the semantic dimension of this problem. In this chapter, the problem of word sense ambiguity is discussed in detail. A Structure Discovery process is set up, which is used as a feature to successfully improve a supervised word sense disambiguation (WSD) system. On this basis, a high-precision system for automatically ...
    23. Aligning hierarchial and sequential document trees to identify parallel data

      BACKGROUND: Parallel bilingual corpora, as used herein, refers to textual data in a first language that is identified as a translation of textual data in a second language. For the sake of example, the textual data discussed herein is documents, but other textual data can be used as well. When one document is a translation of another document, the two documents are referred to as parallel, bilingual documents. Therefore, a parallel, bilingual corpora refers to a corpus of data in a first language
    24. English-to-Korean Cross-Lingual Link Detection for Wikipedia

      In this paper, we introduce a method for automatically discovering possible links between documents in different languages. We utilized the large collection of articles in Wikipedia as our resource for keyword extraction, word sense disambiguation and the creation of a bilingual dictionary. Given an English text or input document, our system uses this set of methods to automatically determine important words or phrases within the context and link them to corresponding Wikipedia articles in other languages. In this system we use the Korean Wikipedia corpus as the linking document. Content Type: Book Chapter. Pages: 274-280. DOI: 10.1007/978-3-642-27210-3_36. Authors: ...
      Mentions: Busan South Korea
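
Entry 4 above (SCHEMA) mentions a node matching function based on the Levenshtein distance. The abstract does not give that function, so the following is only a minimal sketch of the standard Levenshtein edit distance in Python. The helper `normalized_similarity` and the example category names are hypothetical illustrations and are not taken from SCHEMA.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard edit distance: insertions, deletions and substitutions cost 1."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from the first i characters of a to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Hypothetical helper: map the distance into [0, 1], where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # prints 3
    # Hypothetical product category names, compared case-insensitively.
    print(normalized_similarity("notebooks", "notebook"))
```

A normalized score is convenient when category names differ widely in length, but whether SCHEMA normalizes in this way is an assumption; the truncated abstract does not say.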