1. Articles in category: WSD

    289-312 of 369 « 1 2 ... 10 11 12 13 14 15 16 »
    1. Co-training, 10 years later

      At this year's ICML, they gave out a "10 year" award to a paper published in an ICML-related venue from 1998. This year it went to a COLT 1998 paper by Avrim Blum and Tom Mitchell: Combining Labeled and Unlabeled Data with Co-Training. While I'm not super familiar with what might have been a contender, I have to say that I definitely think this is a good choice. For those unfamiliar with the idea of co-training, you should really read the paper. There's also a wikipedia entry that describes it a
    2. Using Sense Recognition to Resolve the Problem of Polysemy in Building a Taxonomic Hierarchy

      Hyponymy is used to build a taxonomic hierarchy, but the terms in a hyponymy relation may have multiple senses. This polysemy affects the building of the taxonomic hierarchy. To solve the problem, we present a method of sense recognition for hyponymy based on the vector space model. First, we acquire the contexts of hyponymy from a Chinese free corpus. Second, we use Cilin to construct a relation-word vector space. Then we use latent semantic analysis to reduce the dimension of the vector space. In the final phase, we recognize the senses of hyponymy using average-group clustering. Experimental ...
    3. Uncovering the Deep Web: Transferring Relational Database Content and Metadata to OWL Ontologies

      Organizing the publicly available Web content into highly systematized domain ontologies is a necessary step in the evolution of the Semantic Web. A large portion of that content, called the deep Web, is stored in relational databases and is not accessible to Web search engines. Incorporating the deep Web data results in domain ontologies that are richer both in content and in semantic relations. In this paper we introduce a framework for the automatic mapping of relational database metadata and content to domain ontologies written in OWL. Relational constructs (relations, attributes and primary-foreign key associations) are translated to OWL classes ...
    4. Word Sense Disambiguation with Semantic Networks

      Word sense disambiguation (WSD) methods evolve towards exploring all of the available semantic information that word thesauri provide. In this scope, the use of semantic graphs and new measures of semantic relatedness may offer better WSD solutions. In this paper we propose a new measure of semantic relatedness between any pair of terms for the English language, using WordNet as our knowledge base. Furthermore, we introduce a new WSD method based on the proposed measure. Experimental evaluation of the proposed method in benchmark data shows that our method matches or surpasses state of the art results. Moreover, we evaluate the ...
    5. Improving Unsupervised WSD with a Dynamic Thesaurus

      The method proposed by Diana McCarthy et al. [1] obtains the predominant sense for an ambiguous word based on a weighted list of terms related to the ambiguous word. This list of terms is obtained using the distributional similarity method proposed by Lin [2] to obtain a thesaurus. In that method, every occurrence of the ambiguous word uses the same thesaurus, regardless of the context where it occurs. Every different word to be disambiguated uses the same thesaurus. In this paper we explore a different method that accounts for the context of a word when determining the most frequent sense ...
    6. Statistical Word Sense Disambiguation in Contexts for Russian Nouns Denoting Physical Objects

      The paper presents experimental results on automatic word sense disambiguation (WSD). Contexts for polysemous and/or homonymic Russian nouns denoting physical objects serve as the empirical basis of the study. Sets of contexts were extracted from the Russian National Corpus (RNC). Machine learning software for WSD was developed within the framework of the project. The WSD tool used in the experiments is aimed at statistical processing and classification of noun contexts. The WSD procedure was performed taking into account lexical markers of word meanings in contexts and semantic annotation of contexts. Sets of experiments allowed us to define optimal conditions for WSD in Russian ...
    7. A New Decision Rule for Statistical Word Sense Disambiguation

      Word Sense Disambiguation (WSD) is usually treated as a pattern classification problem, and it has always been a key problem and one of the difficult points in natural language processing. Statistical learning theory is a mainstream research approach for WSD. The distribution of the senses of an ambiguous word is rarely symmetrical, and the difference between the senses’ frequencies of occurrence is sometimes large, so classification results are biased toward the most probable sense. This phenomenon is especially evident in the Bayesian model. When using the Bayesian model to carry on ...
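      The bias toward the most frequent sense described in this abstract can be illustrated with a minimal sketch of the standard Bayesian decision rule. This is not the paper's proposed rule (the abstract is truncated before describing it); the sense labels, priors, and likelihoods below are hypothetical values chosen only for illustration.

```python
import math

def bayes_decision(priors, likelihoods, context):
    """Standard Bayes rule: pick argmax_s log P(s) + sum_w log P(w | s)."""
    best_sense, best_score = None, -math.inf
    for sense, prior in priors.items():
        score = math.log(prior)
        for word in context:
            # Hypothetical small floor probability for unseen context words.
            score += math.log(likelihoods[sense].get(word, 1e-6))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical likelihoods P(word | sense) for an ambiguous word "bank".
likelihoods = {
    "bank/finance": {"water": 0.02, "money": 0.10},
    "bank/river": {"water": 0.10, "money": 0.01},
}

# With a skewed prior, the frequent sense wins even though the
# context word "water" is five times more likely under the rare sense.
skewed = bayes_decision(
    {"bank/finance": 0.9, "bank/river": 0.1}, likelihoods, ["water"]
)

# With a uniform prior, the same evidence selects the rare sense.
uniform = bayes_decision(
    {"bank/finance": 0.5, "bank/river": 0.5}, likelihoods, ["water"]
)
```

      In this toy setting the only change between the two calls is the prior, which is enough to flip the decision. That is the effect a modified decision rule would need to counteract.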
    8. A Vicarious Words Method for Word Sense Discrimination

      This paper presents a new approach based on Vicarious Words (VWs) to resolve Word Sense Discrimination (WSD) in the Chinese language. VWs are particular artificial ambiguous words, which can be used to realize unsupervised WSD. A Bayesian classifier is implemented to test the efficacy of the VW solution on the Senseval-3 Chinese test suite. The performance is better than state-of-the-art results, with an average F-measure of 0.80. The experiment verifies the value of VWs for unsupervised methods in WSD. Content Type: Book Chapter. DOI: 10.1007/978-3-540-87442-3_50. Authors: Zhimao Lu, Harbin Engineering University, Harbin, China; Dongmei Fan, Harbin Engineering University, Harbin, China; Rubo Zhang, Harbin ...
    9. Learning MultiLinguistic Knowledge for Opinion Analysis

      Most existing opinion analysis techniques use word-level sentiment knowledge but lack the capacity to learn the behavior of context-dependent opinion words. Meanwhile, the use of collocation-level sentiment knowledge is not well studied. This paper presents an opinion analysis system, namely OA, which incorporates word-level and collocation-level sentiment knowledge. Based on observation of the NTCIR-6 opinion training corpus, some word-level and collocation-level linguistic clues for opinion analysis are discovered. Learning techniques are developed to learn the features corresponding to these discovered clues. These features are in turn incorporated into a classifier based on a support vector machine to identify opinionated ...
    10. Word Sense Disambiguation of Farsi Homographs Using Thesaurus and Corpus

      This paper describes the disambiguation of Farsi homographs in unrestricted text using a thesaurus and a corpus. The proposed method is based on [1] with some differences. The differences are, first, the use of collocational information to avoid collecting spurious contexts caused by polysemous words in thesaurus categories, and second, the contribution of all words in the test data context, even those that did not appear in the collected contexts, to the calculation of the conceptual classes’ scores. Using a Farsi corpus and a Farsi thesaurus, this method correctly disambiguated 91.46% of the instances of 15 Farsi homographs. This method was compared to three ...
    11. Word Sense Disambiguation as the Primary Step of Ontology Integration

      The recommendable primary step of ontology integration is annotation of ontology components with entries from WordNet or other dictionary sources in order to disambiguate their meaning. This paper presents an approach to automatically disambiguating the meaning of OWL ontology classes by providing sense annotation from WordNet. A class name is disambiguated using the names of the related classes, by comparing the taxonomy of the ontology with the portions of the WordNet taxonomy corresponding to all possible meanings of the class. The equivalence of the taxonomies is expressed by a probability function called affinity function. We apply two different basic techniques ...
    12. Web-Based Measure of Semantic Relatedness

      Semantic relatedness measures quantify the degree to which some words or concepts are related, considering not only similarity but any possible semantic relationship among them. Relatedness computation is of great interest in different areas, such as Natural Language Processing, Information Retrieval, or the Semantic Web. Different methods have been proposed in the past; however, current relatedness measures lack some desirable properties for a new generation of Semantic Web applications: maximum coverage, domain independence, and universality. In this paper, we explore the use of a semantic relatedness measure between words that uses the Web as a knowledge source. This measure exploits the ...
    13. Word Sense Disambiguation for Vocabulary Learning

      Words with multiple meanings are a phenomenon inherent to any natural language. In this work, we study the effects of such lexical ambiguities on second language vocabulary learning. We demonstrate that machine learning algorithms for word sense disambiguation can induce classifiers that exhibit high accuracy at the task of disambiguating homonyms (words with multiple distinct meanings). Results from a user study that compared two versions of a vocabulary tutoring system, one that applied word sense disambiguation to support learning and another that did not, support rejection of the null hypothesis that learning outcomes with and without word sense disambiguation are ...
    14. A Method for Automatic Text Categorization Using Word Sense Disambiguation

      At present, information plays an important role in society. In this context, the Internet is one of the most widespread mechanisms for communicating and distributing information around the world. Today, due to the extremely large number of information sources, automatic mechanisms are needed to filter the information that could be useful for each user. However, one of the problems that the usual techniques of automatic text categorization have not been able to handle is polysemy (words with two or more senses). In this paper, we address this problem by proposing a semantic analyzer for the automatic categorization of ...
    15. Improving Data Integration through Disambiguation Techniques

      In this paper, the Word Sense Disambiguation (WSD) issue in the context of data integration is outlined, and an Approximate Word Sense Disambiguation approach (AWSD) is proposed for the automatic lexical annotation of structured and semi-structured data sources. Content Type: Book Chapter. DOI: 10.1007/978-3-540-69858-6_47. Authors: Laura Po, Università di Modena e Reggio Emilia, Dipartimento di Ingegneria dell’Informazione. Book Series: Lecture Notes in Computer Science. Online ISSN: 1611-3349. Print ISSN: 0302-9743. Book Series Volume: 5039/2008. Book: Natural Language and Information Systems. DOI: 10.1007/978-3-540-69858-6. Print ISBN: 978-3-540-69857-9.
    16. One-row keyboard and approximate typing

      BACKGROUND: It is fascinating to reflect on the great impact of the PC on society. The PC is clearly capable of doing a whole lot more than what we use it for. Essentially, for many the PC became a replacement for the typewriter, and it is still largely a word processor. The other initial application that drove PC adoption was the spreadsheet. Then it also became a presentation creation tool. Over time, email was introduced and, even more recently, Web browsing. Web browsing, in turn, is giving ris
    17. Measures of correlation

      As NLPers, we're often tasked with measuring similarity of things, where things are usually words or phrases or something like that. The most standard example is measuring collocations. Namely, for every pair of words, do they co-locate (appear next to each other) more than they "should." There are lots of statistics that are used to measure collocation, but I'm not aware of any in-depth comparison of these. If anyone actually still reads this blog over the summer and would care to comment, it would be apprec
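      One widely used statistic for asking whether two words appear next to each other more than they "should" is pointwise mutual information (PMI), which compares the observed bigram probability with what independence would predict. The sketch below is only an illustration of that one statistic, not a method taken from the post; the function name `pmi_scores` and the toy sentence are hypothetical.

```python
import math
from collections import Counter

def pmi_scores(tokens):
    """Return PMI (in bits) for each adjacent word pair in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi          # observed probability of the pair
        p_w1 = unigrams[w1] / n_uni    # marginal probability of each word
        p_w2 = unigrams[w2] / n_uni
        # PMI > 0 means the pair co-occurs more often than independence predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

# Toy corpus, assumed only for illustration.
tokens = "new york is big and new york is old and the city is new".split()
scores = pmi_scores(tokens)
```

      On this toy input, the pair ("new", "york") receives a higher score than ("is", "new"). PMI is known to overrate rare pairs on small counts, which is one reason other collocation statistics exist alongside it.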
    18. Detecting and measuring risk with predictive models using content mining

      Computer implemented methods and systems of processing transactions to determine the risk of transaction convert high categorical information, such as text data, to low categorical information, such as category or cluster IDs. The text data may be merchant names or other textual content of the transactions, or data related to a consumer, or any other type of entity which engages in the transaction. Content mining techniques are used to provide the conversion from high to low categorical information. In operation, the resulting low categorical information is input, along with other data, into a statistical model. The statistical model provides an ...
    19. Comparing the University of South Florida Homograph Norms with Empirical Corpus Data

      The basis for most classification algorithms dealing with word sense induction and word sense disambiguation is the assumption that certain context words are typical of a particular sense of an ambiguous word. However, as such algorithms have been only moderately successful in the past, the question that we raise here is if this assumption really holds. Starting with an inventory of predefined senses and sense descriptors taken from the University of South Florida Homograph Norms, we present a quantitative study of the distribution of these descriptors in a large corpus. Hereby, our focus is on the comparison of co-occurrence frequencies ...
    20. Distant Collocations between Suppositional Adverbs and Clause-Final Modality Forms in Japanese Language Corpora

      The co-occurrence of modal adverbs and clause-final modality forms in the Japanese language exhibits a strong agreement-like behaviour. We refer to such co-occurrences as distant collocations, a notion that warrants further consideration within the fields of corpus linguistics and computational linguistics. In this paper we concentrate on a set of suppositional adverbs and investigate the kinds of clause-final modality forms that they frequently co-occur with. One group of adverbs is found to typically collocate with one group of modality forms (one modality type) to a high degree, but also co-occurs with other modality types. Analyzing a variety of corpora revealed that ...
    21. Design and Prototype of a Large-Scale and Fully Sense-Tagged Corpus

      A sense-tagged corpus plays a crucial role in natural language processing, especially in research on word sense disambiguation and natural language understanding. A large-scale Chinese sense-tagged corpus is essential, but the lack of such a corpus is a critical deficiency at the current stage. This paper aims to design a large-scale Chinese full-text sense-tagged corpus, which contains over 110,000 words. The Academia Sinica Balanced Corpus of Modern Chinese (also named Sinica Corpus) is treated as the tagging object, and 56 full texts are extracted from this corpus. By using ...
    22. Bootstrapping Word Sense Disambiguation Using Dynamic Web Knowledge

      Word Sense Disambiguation (WSD) is traditionally one of the most difficult problems in natural language processing and has broad theoretical and practical implications. One of the main difficulties for WSD systems is the lack of relevant knowledge, commonly known as the knowledge acquisition bottleneck problem. In this paper we present a novel method that uses dynamic Web data obtained through Web search engines to effectively enrich the semantic knowledge for WSD systems. We demonstrate, through a word sense disambiguation system, the large quantity and good quality of the extracted knowledge. Content Type: Book Chapter. DOI: 10.1007/978-3-540-36668-3_151. Authors: Yuanyong Wang, School ...
    23. Building Clusters of Related Words: An Unsupervised Approach

      The task of finding semantically related words from a text corpus has applications in lexicon induction, word sense disambiguation and information retrieval, to name a few. Text data in the real world, say from the World Wide Web, need not be grammatical. Hence, methods relying on parsing or part-of-speech tagging will not perform well in these applications. Further, even if the text is grammatically correct, these methods may not scale well for large corpora. The task of building semantically related sets of words from a corpus of documents and allied problems have been studied extensively in the literature. Most of ...
    24. Method and system for adapting synonym resources to specific domains

      A method and system for processing synonyms that adapts a general-purpose synonym resource to a specific domain. The method selects out a domain-specific subset of synonyms from the set of general-purpose synonyms. The synonym processing method in turn comprises two methods that can be used either together or on their own. A method of synonym pruning eliminates those synonyms that are inappropriate in a specific domain. A method of synonym optimization eliminates those synonyms that are unlikely to be used in a specific domain. The method has many applications including, but not limited to, information retrieval and domain-specific thesauri as ...