1. Articles in category: WSD

    289-312 of 354 « 1 2 ... 10 11 12 13 14 15 »
    1. One-row keyboard and approximate typing

      BACKGROUND: It is fascinating to reflect on the great impact of the PC on society. The PC is clearly capable of doing a whole lot more than what we use it for. Essentially, for many the PC became a replacement for the typewriter, and it is still largely a word processor. The other initial application that drove PC adoption was the spreadsheet. Then it also became a presentation creation tool. Over time, email was introduced and, even more recently, Web browsing. Web browsing, in turn, is giving rise ...
    2. Measures of correlation

      As NLPers, we're often tasked with measuring the similarity of things, where the things are usually words or phrases or something like that. The most standard example is measuring collocation: namely, for every pair of words, do they co-locate (appear next to each other) more than they "should"? There are lots of statistics used to measure collocation, but I'm not aware of any in-depth comparison of them. If anyone actually still reads this blog over the summer and would care to comment, it would be appreciated ...
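      One standard collocation statistic of the kind the post alludes to is pointwise mutual information (PMI), which compares a pair's joint frequency to what independence would predict. A minimal sketch (the counts below are hypothetical, not from any real corpus):

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information in bits: log2(p(x,y) / (p(x) * p(y)))."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts for a 1M-token corpus: the pair occurs 500 times,
# the individual words 2000 and 600 times.
score = pmi(500, 2000, 600, 1_000_000)
```

      A positive score means the pair co-locates more often than chance would predict; a score near zero indicates independence.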
    3. Detecting and measuring risk with predictive models using content mining

      Computer-implemented methods and systems of processing transactions to determine transaction risk convert high-categorical information, such as text data, to low-categorical information, such as category or cluster IDs. The text data may be merchant names or other textual content of the transactions, or data related to a consumer, or any other type of entity which engages in the transaction. Content mining techniques are used to provide the conversion from high- to low-categorical information. In operation, the resulting low-categorical information is input, along with other data, into a statistical model. The statistical model provides an ...
    4. Comparing the University of South Florida Homograph Norms with Empirical Corpus Data

      The basis for most classification algorithms dealing with word sense induction and word sense disambiguation is the assumption that certain context words are typical of a particular sense of an ambiguous word. However, as such algorithms have been only moderately successful in the past, the question we raise here is whether this assumption really holds. Starting with an inventory of predefined senses and sense descriptors taken from the University of South Florida Homograph Norms, we present a quantitative study of the distribution of these descriptors in a large corpus. Our focus is on the comparison of co-occurrence frequencies ...
    5. Distant Collocations between Suppositional Adverbs and Clause-Final Modality Forms in Japanese Language Corpora

      The co-occurrence of modal adverbs and clause-final modality forms in the Japanese language exhibits a strong agreement-like behaviour. We refer to such co-occurrences as distant collocations, a notion that warrants further consideration within the fields of corpus linguistics and computational linguistics. In this paper we concentrate on a set of suppositional adverbs and investigate the kinds of clause-final modality forms that they frequently co-occur with. One group of adverbs is found to collocate to a high degree with one group of modality forms (one modality type), but it also co-occurs with other modality types. Analyzing a variety of corpora revealed that ...
    6. Design and Prototype of a Large-Scale and Fully Sense-Tagged Corpus

      A sense-tagged corpus plays a crucial role in Natural Language Processing, especially in research on word sense disambiguation and natural language understanding. A large-scale Chinese sense-tagged corpus is essential, but at the current stage the lack of such a corpus is a critical deficiency. This paper aims to design a large-scale Chinese full-text sense-tagged corpus containing over 110,000 words. The Academia Sinica Balanced Corpus of Modern Chinese (also named the Sinica Corpus) is treated as the tagging object, and 56 full texts are extracted from this corpus. By using ...
    7. Bootstrapping Word Sense Disambiguation Using Dynamic Web Knowledge

      Word Sense Disambiguation (WSD) is traditionally one of the most difficult problems in natural language processing and has broad theoretical and practical implications. One of the main difficulties for WSD systems is the lack of relevant knowledge, commonly known as the knowledge acquisition bottleneck. We present in this paper a novel method that uses dynamic Web data obtained through Web search engines to enrich the semantic knowledge available to WSD systems. We demonstrate through a word sense disambiguation system the large quantity and good quality of the extracted knowledge. (Book chapter. DOI: 10.1007/978-3-540-36668-3_151. Authors: Yuanyong Wang, School ...)
    8. Building Clusters of Related Words: An Unsupervised Approach

      The task of finding semantically related words in a text corpus has applications in, to name a few, lexicon induction, word sense disambiguation, and information retrieval. Real-world text data, say from the World Wide Web, need not be grammatical, so methods relying on parsing or part-of-speech tagging will not perform well in these applications. Further, even if the text is grammatically correct, these methods may not scale well to large corpora. The task of building semantically related sets of words from a corpus of documents, and allied problems, have been studied extensively in the literature. Most of ...
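      The abstract argues for methods that avoid parsing and tagging. One common parsing-free approach, sketched here without any claim that it is the paper's own method, represents each word by its co-occurrence counts within a fixed window and compares words by cosine similarity (window size and toy sentences are illustrative):

```python
import math
from collections import Counter

def cooc_vectors(sentences, window=2):
    """Build a sparse co-occurrence count vector (a Counter) per word."""
    vecs = {}
    for tokens in sentences:
        for i, w in enumerate(tokens):
            ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vecs.setdefault(w, Counter()).update(ctx)
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = cooc_vectors([["cat", "drinks", "milk"], ["dog", "drinks", "milk"]])
sim = cosine(vecs["cat"], vecs["dog"])  # 1.0: identical contexts
```

      Words with high pairwise similarity can then be grouped by any standard clustering algorithm; no grammatical analysis of the text is required.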
    9. Method and system for adapting synonym resources to specific domains

      A method and system for processing synonyms that adapts a general-purpose synonym resource to a specific domain. The method selects a domain-specific subset of synonyms from the set of general-purpose synonyms. The synonym processing method in turn comprises two methods that can be used either together or on their own. A method of synonym pruning eliminates those synonyms that are inappropriate in a specific domain. A method of synonym optimization eliminates those synonyms that are unlikely to be used in a specific domain. The method has many applications including, but not limited to, information retrieval and domain-specific thesauri as ...
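      The excerpt does not disclose the patent's actual pruning criterion. As one simple illustrative instantiation, synonyms can be kept only when attested in a domain corpus; all names, words, and counts below are hypothetical:

```python
def prune_synonyms(synonyms, domain_counts, min_count=1):
    """Drop synonyms unattested in a domain corpus (one possible pruning rule)."""
    return {word: [s for s in cands if domain_counts.get(s, 0) >= min_count]
            for word, cands in synonyms.items()}

# Toy general-purpose resource and domain (finance) frequency table.
general = {"bank": ["riverbank", "financial institution"]}
finance_counts = {"financial institution": 120}
pruned = prune_synonyms(general, finance_counts)
# {"bank": ["financial institution"]}
```
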
    10. n-Best Reranking for the Efficient Integration of Word Sense Disambiguation and Statistical Machine Translation

      Although it has always been thought that Word Sense Disambiguation (WSD) can be useful for Machine Translation, only recently have efforts been made to integrate the two tasks and prove that this assumption is valid, particularly for Statistical Machine Translation (SMT). While different approaches have been proposed and results have started to converge in a positive way, it is not yet clear how these applications should be integrated to allow the strengths of both to be exploited. This paper aims to contribute to the recent investigation of the usefulness of WSD for SMT by using n-best reranking to efficiently integrate WSD with ...
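      As a rough sketch of n-best reranking in this setting (not the paper's exact model), each translation hypothesis can be rescored as its base SMT score plus a weighted WSD agreement term; the weight, scores, and hypotheses below are hypothetical:

```python
def rerank(nbest, wsd_agreement, weight=0.5):
    """Pick the hypothesis maximizing base score + weight * WSD agreement."""
    return max(nbest, key=lambda hyp: hyp[1] + weight * wsd_agreement(hyp[0]))

# Toy n-best list of (translation, base model score); the WSD term rewards
# a hypothesized in-context sense choice ("riverside" here).
nbest = [("riverside bank", -2.0), ("savings bank", -1.8)]
best = rerank(nbest, lambda t: 1.0 if "riverside" in t else 0.0)
```

      With the weight at 0.5, the WSD bonus here outweighs the 0.2 base-score gap, so the sense-consistent hypothesis wins; tuning that weight is the usual trade-off in such integrations.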
    11. Various Criteria of Collocation Cohesion in Internet: Comparison of Resolving Power

      For extracting collocations from the Internet, it is necessary to numerically estimate the cohesion between potential collocates. The Mutual Information cohesion measure (MI), based on the numbers of times collocates occur closely together (N12) and apart (N1, N2), is well known, but Web page statistics deprive MI of its statistical validity. We propose a family of different measures that depend on N1, N2, and N12 in a similarly monotonic way and possess the scalability feature of MI. We apply the new criteria to a collection of N1, N2, and N12 counts obtained from AltaVista ...
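      The paper's exact family of measures is not reproduced in this excerpt. As a sketch, classical MI can be estimated from page counts, and one illustrative alternative that is monotone in the same counts but independent of the total page count is the cosine-style ratio N12 / sqrt(N1 * N2):

```python
import math

def mi_cohesion(n1, n2, n12, n_pages):
    """Classical MI estimate from page counts; fragile at Web scale,
    since the total page count n_pages is ill-defined."""
    return math.log2((n12 * n_pages) / (n1 * n2))

def cosine_cohesion(n1, n2, n12):
    """An illustrative alternative: N12 / sqrt(N1 * N2). Monotone in the
    same counts, but requires no estimate of the total page count."""
    return n12 / math.sqrt(n1 * n2)
```
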
    12. A Semantics-Enhanced Language Model for Unsupervised Word Sense Disambiguation

      An N-gram language model aims at capturing statistical word-order dependency information from corpora. Although the concept of language models has been applied extensively, and with reasonable success, to a variety of NLP problems, the standard model does not incorporate semantic information, which limits its applicability to semantic problems such as word sense disambiguation. We propose a framework that integrates semantic information into the language model schema, allowing a system to exploit both syntactic and semantic information to address NLP problems. Furthermore, acknowledging the limited availability of semantically annotated data, we discuss how the proposed model can be learned ...
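      The semantics-enhanced model itself is not detailed in this excerpt, but the standard N-gram baseline it extends can be sketched as an unsmoothed bigram model (toy data; real systems would add smoothing):

```python
from collections import Counter

def train_bigram_lm(tokens):
    """Train an MLE bigram model: p(w | prev) = count(prev, w) / count(prev)."""
    unigrams = Counter(tokens[:-1])          # counts of each word as a history
    bigrams = Counter(zip(tokens, tokens[1:]))
    def prob(prev, word):
        return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return prob

prob = train_bigram_lm(["the", "bank", "of", "the", "river"])
```

      The proposed framework, as the abstract describes it, would augment such word-order statistics with sense-level information rather than replace them.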
    13. The strength of co-authorship in gene name disambiguation.

      BMC Bioinformatics. 2008 Jan 29;9(1):69. Author: Farkas R. ABSTRACT: BACKGROUND: A biomedical entity mention in articles and other free texts is often ambiguous. For example, 13% of gene names (aliases) may refer to more than one gene. The task of Gene Symbol Disambiguation (GSD), a special case of Word Sense Disambiguation (WSD), is to assign a unique gene identifier to each identified gene name alias in biology-related articles. Supervised and unsupervised machine learning WSD techniques have been applied in the biomedical field with promising results. We ...
    14. Structured Machine Learning: Ten Problems for the Next Ten Years

      Pedro Domingos, Department of Computer Science and Engineering, University of Washington. 1. Statistical Predicate Invention: predicate invention in ILP and hidden variable discovery in statistical learning are really two faces of the same problem. Researchers in both communities generally agree that this is a key (if not the key) problem for machine learning. Without predicate invention, learning will always be shallow. In essence, ...
    15. Semantic taxonomy induction from heterogenous evidence

      Rion Snow (Computer Science Department, Stanford University, rion@cs.stanford.edu), Daniel Jurafsky (Linguistics Department, Stanford University, jurafsky@stanford.edu), and Andrew Y. Ng (Computer Science Department, Stanford University, ang@cs.stanford.edu). Abstract: We propose a novel algorithm for inducing semantic taxonomies. Previous algorithms for taxonomy induction have typically focused on ind...
    16. Learning to merge word senses

      Rion Snow and Sushant Prakash (Computer Science Department, Stanford University, {rion,sprakash}@cs.stanford.edu), Daniel Jurafsky (Linguistics Department, Stanford University, jurafsky@stanford.edu), and Andrew Y. Ng (Computer Science Department, Stanford University, ang@cs.stanford.edu). Abstract: It has been widely observed that different NLP applications require different sense granularities in order to best exploi...
    17. Unsupervised Models for Named Entity Classification

      Michael Collins and Yoram Singer (AT&T Labs--Research, 180 Park Avenue, Florham Park, NJ 07932, {mcollins,singer}@research.att.com). Abstract: This paper discusses the use of unlabeled examples for the problem of named entity classification. A large number of rules is needed for coverage of the domain, suggesting that a fairly large number of labeled examples should be required to train a classifier. However, we show that the use of unlabeled data c...
    18. Machine Learning Methods in Natural Language Processing

      Michael Collins, MIT CSAIL. Some NLP problems: information extraction (named entities, relationships between entities); finding linguistic structure (part-of-speech tagging, parsing); machine translation. Common themes: the need to learn mappings from one discrete structure to another, such as strings to hidden state sequences (named-entity extraction, part-of-speech tagging), strings to strings (machine translation), and strings to underlying trees. Pa...
    19. A fault model for ontology mapping, alignment, and linking systems.

      Pac Symp Biocomput. 2007:233-44. Authors: Johnson HL, Cohen KB, Hunter L. There has been much work devoted to the mapping, alignment, and linking of ontologies (MALO), but little has been published about how to evaluate systems that do this. A fault model for conducting fine-grained evaluations of MALO systems is proposed, and its application to the system described in Johnson et al. [15] is illustrated. Two judges categorized errors according to the model, and inter-judge agreement was calculated by error category. Overall inter-judge agreement was 98% after ...
    20. Extracting Paraphrases from a Parallel Corpus

      While paraphrasing is critical both for interpretation and generation of natural language, current systems use manual or semi-automatic methods to collect paraphrases. We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the...
    21. Automatic Evaluation of Text Coherence: Models and Representations

      This paper investigates the automatic evaluation of text coherence for machine-generated texts. We introduce a fully-automatic, linguistically rich model of local coherence that correlates with human judgments. The modeling approach taken relies on shallow text properties and is relatively inexpensive. We present experimental results that comparatively assess the predictive power of various discourse representations prop...