1. Articles in category: WSD

    313-336 of 369
    1. n-Best Reranking for the Efficient Integration of Word Sense Disambiguation and Statistical Machine Translation

      Although it has always been thought that Word Sense Disambiguation (WSD) can be useful for Machine Translation, only recently have efforts been made towards integrating both tasks to prove that this assumption is valid, particularly for Statistical Machine Translation (SMT). While different approaches have been proposed and results have started to converge in a positive way, it is not yet clear how these applications should be integrated to allow the strengths of both to be exploited. This paper aims to contribute to the recent investigation on the usefulness of WSD for SMT by using n-best reranking to efficiently integrate WSD with ...
    2. Various Criteria of Collocation Cohesion in Internet: Comparison of Resolving Power

      For extracting collocations from the Internet, it is necessary to numerically estimate the cohesion between potential collocates. The Mutual Information cohesion measure (MI), based on the numbers of collocates occurring closely together (N_12) and apart (N_1, N_2), is well known, but Web page statistics deprive MI of its statistical validity. We propose a family of different measures that depend on N_1, N_2 and N_12 in a similar monotonic way and possess the scalability feature of MI. We apply the new criteria to a collection of N_1, N_2, and N_12 obtained from AltaVista ...
    3. A Semantics-Enhanced Language Model for Unsupervised Word Sense Disambiguation

      An N-gram language model aims at capturing statistical word order dependency information from corpora. Although the concept of language models has been applied extensively to handle a variety of NLP problems with reasonable success, the standard model does not incorporate semantic information, and consequently limits its applicability to semantic problems such as word sense disambiguation. We propose a framework that integrates semantic information into the language model schema, allowing a system to exploit both syntactic and semantic information to address NLP problems. Furthermore, acknowledging the limited availability of semantically annotated data, we discuss how the proposed model can be learned ...
    4. The strength of co-authorship in gene name disambiguation.

      BMC Bioinformatics. 2008 Jan 29;9(1):69. Authors: Farkas R. Abstract: Background: A biomedical entity mention in articles and other free texts is often ambiguous. For example, 13% of the gene names (aliases) might refer to more than one gene. The task of Gene Symbol Disambiguation (GSD) - a special case of Word Sense Disambiguation (WSD) - is to assign a unique gene identifier for all identified gene name aliases in biology-related articles. Supervised and unsupervised machine learning WSD techniques have been applied in the biomedical field with promising results. We ...
    5. Structured Machine Learning: Ten Problems for the Next Ten Years

      Pedro Domingos, Department of Computer Science and Engineering, University of Washington. 1. Statistical Predicate Invention: Predicate invention in ILP and hidden variable discovery in statistical learning are really two faces of the same problem. Researchers in both communities generally agree that this is a key (if not the key) problem for machine learning. Without predicate invention, learning will always be shallow. In essence,
    6. Semantic taxonomy induction from heterogenous evidence

      Rion Snow (Computer Science Department, Stanford University, Stanford, CA 94305; rion@cs.stanford.edu), Daniel Jurafsky (Linguistics Department, Stanford University, Stanford, CA 94305; jurafsky@stanford.edu), Andrew Y. Ng (Computer Science Department, Stanford University, Stanford, CA 94305; ang@cs.stanford.edu). Abstract: We propose a novel algorithm for inducing semantic taxonomies. Previous algorithms for taxonomy induction have typically focused on ind
    7. Learning to merge word senses

      Rion Snow and Sushant Prakash (Computer Science Department, Stanford University, Stanford, CA 94305 USA; {rion,sprakash}@cs.stanford.edu), Daniel Jurafsky (Linguistics Department, Stanford University, Stanford, CA 94305 USA; jurafsky@stanford.edu), Andrew Y. Ng (Computer Science Department, Stanford University, Stanford, CA 94305 USA; ang@cs.stanford.edu). Abstract: It has been widely observed that different NLP applications require different sense granularities in order to best exploi
    8. Unsupervised Models for Named Entity Classification

      Michael Collins and Yoram Singer, AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932, {mcollins,singer}@research.att.com. Abstract: This paper discusses the use of unlabeled examples for the problem of named entity classification. A large number of rules is needed for coverage of the domain, suggesting that a fairly large number of labeled examples should be required to train a classifier. However, we show that the use of unlabeled data c
    9. Machine Learning Methods in Natural Language Processing

      Michael Collins, MIT CSAIL. Some NLP problems: information extraction (named entities, relationships between entities); finding linguistic structure (part-of-speech tagging, parsing); machine translation. Common themes: the need to learn a mapping from one discrete structure to another: strings to hidden state sequences (named-entity extraction, part-of-speech tagging); strings to strings (machine translation); strings to underlying trees: Pa
    10. A fault model for ontology mapping, alignment, and linking systems.

      Pac Symp Biocomput. 2007;:233-44. Authors: Johnson HL, Cohen KB, Hunter L. There has been much work devoted to the mapping, alignment, and linking of ontologies (MALO), but little has been published about how to evaluate systems that do this. A fault model for conducting fine-grained evaluations of MALO systems is proposed, and its application to the system described in Johnson et al. [15] is illustrated. Two judges categorized errors according to the model, and inter-judge agreement was calculated by error category. Overall inter-judge agreement was 98% after ...
    11. Extracting Paraphrases from a Parallel Corpus

      While paraphrasing is critical both for interpretation and generation of natural language, current systems use manual or semi-automatic methods to collect paraphrases. We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the...
    12. Automatic Evaluation of Text Coherence: Models and Representations

      This paper investigates the automatic evaluation of text coherence for machine-generated texts. We introduce a fully-automatic, linguistically rich model of local coherence that correlates with human judgments. The modeling approach taken relies on shallow text properties and is relatively inexpensive. We present experimental results that comparatively assess the predictive power of various discourse representations prop...
    13. Apparatus for classifying or disambiguating data

      A computing system has a data storage device (4, 5, 6) for storing a database consisting of a classified vocabulary of terms. A processor (1) of the apparatus is arranged to associate each term with one of a number of different categories of data and to associate all terms falling within the same category with a common code identifying a collocation of terms that exemplify that category so that terms in different categories are associated with different codes and can be disambiguated. The processor (1) is arranged to write, directly or indirectly, a classified vocabulary consisting of the terms together ...
    14. Automatic Annotation in Data Integration Systems

      CWSD (Combined Word Sense Disambiguation) is an algorithm for the automatic annotation of structured and semi-structured data sources. Instead of being targeted to textual data sources like most of the traditional WSD algorithms, CWSD can exploit knowledge from the structure of data sources together with the lexical knowledge associated with schema elements (terms in the following). We integrated CWSD in the MOMIS system (Mediator EnvirOnment for Multiple Information Sources) [1], which is a framework designed for the integration of data sources, where the lexical annotation of terms was performed manually by the user. CWSD combines a structural disambiguation algorithm, that starts ...
    15. Word Sense Disambiguation

      Content Type: Book. Publisher: Springer Netherlands. DOI: 10.1007/978-1-4020-4809-8. Copyright: 2006. ISBN: 978-1-4020-4808-1 (Print), 978-1-4020-4809-8 (Online). Editors: Eneko Agirre, University of the Basque Country, Department of Computer Science, Manuel de Lardizabal 1, E-20018 Donostia, Basque Country, Spain; Philip Edmonds, Oxford Science Park, Sharp Laboratories of Europe Limited, OX4 4GB Oxford, UK. Book Series: Text, Speech and Language Technology. Print ISSN: 1386-291X. Book Series Volume: 33.
      Mentions: Eneko Agirre
    16. Evaluation of WSD Systems

      Content Type: Book Chapter. DOI: 10.1007/978-1-4020-4809-8_4. Authors: Martha Palmer, University of Colorado, Departments of Linguistics and Computer Science, Hellems 295, 80309 Boulder, CO, USA; Hwee Ng, National University of Singapore, Department of Computer Science, 3 Science Drive 2, 117543 Singapore; Hoa Dang, National Institute of Standards and Technology, 100 Bureau Drive 8940, 20899-8940 Gaithersburg, MD, USA. Book Series: Text, Speech and Language Technology. Print ISSN: 1386-291X. Book Series Volume: 33. Book: Word Sense Disambiguation. DOI: 10.1007/978-1-4020-4809-8. Online ISBN: 978-1-4020-4809-8. Print ISBN: 978-1-4020-4808-1.
    17. System and method of finding documents related to other documents and of finding related words in response to a query to refine a search

      A computer-implemented system and method is disclosed for retrieving documents using context-dependent probabilistic modeling of words and documents. The present invention uses multiple overlapping vectors to represent each document. Each vector is centered on each of the words in the document and includes the local environment. The vectors are used to build probability models that are used for predictions of related documents and related keywords. The results of the statistical analysis are used for retrieving an indexed document, for extracting features from a document, or for finding a word within a document. The statistical evaluation is also used to evaluate ...
    18. Training machine learning by sequential conditional generalized iterative scaling

      A system and method facilitating training machine learning systems utilizing sequential conditional generalized iterative scaling is provided. The invention includes an expected value update component that modifies an expected value based, at least in part, upon a feature function of an input vector and an output value, a sum of lambda variable and a normalization variable. The invention further includes an error calculator that calculates an error based, at least in part, upon the expected value and an observed value. The invention also includes a parameter update component that modifies a trainable parameter based, at least in part, upon the ...
      Mentions: Scgis lamda
    19. Disambiguation of term occurrences

      A method for extracting information from a corpus of data includes specifying a topic and a query term associated with the topic, and defining adjunct terms which may occur in the corpus in a context of the query term, the adjunct terms comprising one or more off-topic terms. Occurrences of the query term are found in the corpus, the occurrences including at least one occurrence of the query term together with at least one of the off-topic terms in the context of the query term. The at least one occurrence of the query term is classified as non-relevant to the ...
    20. Linguistic disambiguation system and method using string-based pattern training to learn to resolve ambiguity sites

      A linguistic disambiguation system and method creates a knowledge base by training on patterns in strings that contain ambiguity sites. The string patterns are described by a set of reduced regular expressions (RREs) or very reduced regular expressions (VRREs). The knowledge base utilizes the RREs or VRREs to resolve ambiguity based upon the strings in which the ambiguity occurs. The system is trained on a training set, such as a properly labeled corpus. Once trained, the system may then apply the knowledge base to raw input strings that contain ambiguity sites. The system uses the RRE- and VRRE-based knowledge base ...
    21. Systems and methods for improved spell checking

      The present invention leverages iterative transformations of search query strings along with statistics extracted from search query logs and/or web data to provide possible alternative spellings for the search query strings. This provides for spell checking that can be influenced to provide individualized suggestions for each user. By utilizing search query logs, the present invention can account for substrings not found in a lexicon but still acceptable as a search query of interest. This allows for a higher quality proposal for alternative spellings, beyond the content of the lexicon. One instance of the present invention operates at a substring ...