1. Articles in category: WSD

    313-336 of 354 « 1 2 ... 11 12 13 14 15 »
    1. Apparatus for classifying or disambiguating data

      A computing system has a data storage device (4, 5, 6) for storing a database consisting of a classified vocabulary of terms. A processor (1) of the apparatus is arranged to associate each term with one of a number of different categories of data and to associate all terms falling within the same category with a common code identifying a collocation of terms that exemplify that category so that terms in different categories are associated with different codes and can be disambiguated. The processor (1) is arranged to write, directly or indirectly, a classified vocabulary consisting of the terms together ...
    2. Automatic Annotation in Data Integration Systems

      CWSD (Combined Word Sense Disambiguation) is an algorithm for the automatic annotation of structured and semi-structured data sources. Unlike most traditional WSD algorithms, which target textual data sources, CWSD can exploit knowledge from the structure of data sources together with the lexical knowledge associated with schema elements (terms in the following). We integrated CWSD into the MOMIS system (Mediator envirOnment for Multiple Information Sources) [1], a framework designed for the integration of data sources, where the lexical annotation of terms was previously performed manually by the user. CWSD combines a structural disambiguation algorithm, that starts ...
    3. Word Sense Disambiguation

      Word Sense Disambiguation. Content Type: Book. Publisher: Springer Netherlands. DOI: 10.1007/978-1-4020-4809-8. Copyright: 2006. ISBN: 978-1-4020-4808-1 (Print), 978-1-4020-4809-8 (Online). Editors: Eneko Agirre (University of the Basque Country, Department of Computer Science, Manuel de Lardizabal 1, E-20018 Donostia, Basque Country, Spain); Philip Edmonds (Oxford Science Park, Sharp Laboratories of Europe Limited, OX4 4GB Oxford, UK). Book Series: Text, Speech and Language Technology. Print ISSN: 1386-291X. Book Series Volume: 33.
      Mentions: Eneko Agirre
    4. Evaluation of WSD Systems

      Evaluation of WSD Systems. Content Type: Book Chapter. DOI: 10.1007/978-1-4020-4809-8_4. Authors: Martha Palmer (University of Colorado, Departments of Linguistics and Computer Science, Hellems 295, 80309 Boulder, CO, USA); Hwee Ng (National University of Singapore, Department of Computer Science, 3 Science Drive 2, 117543 Singapore); Hoa Dang (National Institute of Standards and Technology, 100 Bureau Drive 8940, 20899-8940 Gaithersburg, MD, USA). Book Series: Text, Speech and Language Technology. Print ISSN: 1386-291X. Book Series Volume: 33. Book: Word Sense Disambiguation. DOI: 10.1007/978-1-4020-4809-8. Online ISBN: 978-1-4020-4809-8. Print ISBN: 978-1-4020-4808-1.
    5. System and method of finding documents related to other documents and of finding related words in response to a query to refine a search

      A computer-implemented system and method is disclosed for retrieving documents using context-dependent probabilistic modeling of words and documents. The present invention uses multiple overlapping vectors to represent each document. Each vector is centered on each of the words in the document and includes the local environment. The vectors are used to build probability models that are used for predictions of related documents and related keywords. The results of the statistical analysis are used for retrieving an indexed document, for extracting features from a document, or for finding a word within a document. The statistical evaluation is also used to evaluate ...
    6. Training machine learning by sequential conditional generalized iterative scaling

      A system and method facilitating training machine learning systems utilizing sequential conditional generalized iterative scaling is provided. The invention includes an expected value update component that modifies an expected value based, at least in part, upon a feature function of an input vector and an output value, a sum of lambda variable and a normalization variable. The invention further includes an error calculator that calculates an error based, at least in part, upon the expected value and an observed value. The invention also includes a parameter update component that modifies a trainable parameter based, at least in part, upon the ...
      Mentions: Scgis lamda
    7. Disambiguation of term occurrences

      A method for extracting information from a corpus of data includes specifying a topic and a query term associated with the topic, and defining adjunct terms which may occur in the corpus in a context of the query term, the adjunct terms comprising one or more off-topic terms. Occurrences of the query term are found in the corpus, the occurrences including at least one occurrence of the query term together with at least one of the off-topic terms in the context of the query term. The at least one occurrence of the query term is classified as non-relevant to the ...
    8. Linguistic disambiguation system and method using string-based pattern training to learn to resolve ambiguity sites

      A linguistic disambiguation system and method creates a knowledge base by training on patterns in strings that contain ambiguity sites. The string patterns are described by a set of reduced regular expressions (RREs) or very reduced regular expressions (VRREs). The knowledge base utilizes the RREs or VRREs to resolve ambiguity based upon the strings in which the ambiguity occurs. The system is trained on a training set, such as a properly labeled corpus. Once trained, the system may then apply the knowledge base to raw input strings that contain ambiguity sites. The system uses the RRE- and VRRE-based knowledge base ...
    9. Systems and methods for improved spell checking

      The present invention leverages iterative transformations of search query strings along with statistics extracted from search query logs and/or web data to provide possible alternative spellings for the search query strings. This provides for spell checking that can be influenced to provide individualized suggestions for each user. By utilizing search query logs, the present invention can account for substrings not found in a lexicon but still acceptable as a search query of interest. This allows for a higher quality proposal for alternative spellings, beyond the content of the lexicon. One instance of the present invention operates at a substring ...
    10. ACL and EMNLP 2007 Report

      ACL/EMNLP just concluded. Overall, I thought both conferences were a success, though by now I am quite ready to return home. Prague was very nice. I especially enjoyed Tom Mitchell's invited talk on linking fMRI experiments to language. They actually use lexical semantic information to be able to identify what words people are thinking about when they scan their brains. Scary mind-reading stuff going on here. I think this is a very interesting avenue of research---probably not one I'll follow my
    11. Exponential priors for maximum entropy models

      The subject invention provides for systems and methods that facilitate optimizing one or more sets of training data by utilizing an Exponential distribution as the prior on one or more parameters in connection with a maximum entropy (maxent) model to mitigate overfitting. Maxent is also known as logistic regression. More specifically, the systems and methods can facilitate optimizing probabilities that are assigned to the training data for later use in machine learning processes, for example. In practice, training data can be assigned their respective weights and then a probability distribution can be assigned to those weights.
    12. Method and system for theme-based word sense ambiguity reduction

      Word sense ambiguity reduction, for "thematic" words in a sentence, is achieved based on thematic prediction. The senses of "thematic" words are disambiguated in a sentence by determining and weighting possible themes for that sentence. Possible themes are determined for that sentence based on thematic information associated with the different senses of each word in the sentence. A highly deterministic thematic-based word sense disambiguation method is used to preprocess the sentence prior to further syntactic and semantic analysis, thereby enhancing accuracy and decreasing the demand for computational resources (memory and CPU) by reducing input ambiguities.
      Mentions: United Nations
    13. Text analysis technique

      One embodiment of the present invention includes means for determining a concept representation for a set of text documents based on partial order analysis and for modifying this representation if it is determined to be unidentifiable. Furthermore, the embodiment includes means for labeling the representation, mapping documents to it to provide a corresponding document representation, generating a number of document signatures each of a different type, and performing several data processing applications each with a different one of the document signatures of differing types.
    14. Inferencing using disambiguated natural language rules

      A method and structure for automatically producing bridging inferences that join two related input sentences, by applying a lexicon and ontology data structure to a first input sentence to produce first input tagged sentences, applying the lexicon and ontology data structure to a second input sentence to produce second input tagged sentences, matching each first input tagged sentence to first rules, generating first inferred tagged sentences from the first rules, matching the first inferred tagged sentences to second rules, generating second inferred tagged sentences from the second rules, matching the second inferred tagged sentences to third rules, generating third inferred ...
    15. Search Doesn't Work: Story 2

      Search Doesn't Work: Story 2: NLP means many things. To me it means Natural Language Processing. To others it means neurolinguistic programming. When I search for the bare term 'nlp' in Google, I just get results with the second sense - same for other search engines. If I search for 'William Cohen', the first result on Google is for my friend Prof. William Cohen and the second for the other chap. [...] So why don't I get this for NLP? Why no mixture of results? [... ] Word sense disambiguation is a core requirement for a search engine. The problem - the same ...
    18. System and method for automatically discovering a hierarchy of concepts from a corpus of documents

      The invention is a method, system and computer program for automatically discovering concepts from a corpus of documents and automatically generating a labeled concept hierarchy. The method involves extraction of signatures from the corpus of documents. The similarity between signatures is computed using a statistical measure. The frequency distribution of signatures is refined to alleviate any inaccuracy in the similarity measure. The signatures are also disambiguated to address the polysemy problem. The similarity measure is recomputed based on the refined frequency distribution and disambiguated signatures. The recomputed similarity measure reflects actual similarity between signatures. The recomputed similarity measure is then ...
      Mentions: Japan Yahoo Finland
    19. Method and system for naming a cluster of words and phrases

      The present invention provides a method, system and computer program for naming a cluster, or a hierarchy of clusters, of words and phrases that have been extracted from a set of documents. The invention takes these clusters as the input and generates appropriate labels for the clusters using a lexical database. Naming involves first finding out all possible word senses for all the words in the cluster, using the lexical database; and then augmenting each word sense with words that are semantically similar to that word sense to form respective definition vectors. Thereafter, word sense disambiguation is done to find ...
      Mentions: Finland Espoo Madison
    20. Method and system for encoding and accessing linguistic frequency data

      Linguistic frequency data is encoded by identifying a plurality of sets of character strings in a source text, where each set comprises at least a first and a second character string. Frequency data is obtained for each set and stored at a memory position in a first memory array that is assigned to each first character string. A pointer pointing to a position in the first memory array that has been assigned to the corresponding first character string of the respective set and which has stored the frequency data of the respective set, is stored in a second memory array ...
    21. Method for generating training data for medical text abbreviation and acronym normalization

      A method for electronically generating high-quality feature vectors that can be used in connection with electronic data processing systems implementing Maximum Entropy or other statistical models to accurately normalize abbreviations in text such as medical records. An abbreviation database and a training text database are provided. The abbreviation database includes abbreviation data representative of abbreviations and associated expansions to be normalized. The training text database includes a corpus of text having expansions of the abbreviations to be normalized. The corpus of text is processed as a function of the abbreviation data to identify the expansions in the corpus of text ...
    22. Differential LSI space-based probabilistic document classifier

      A computerized method for automatic document classification based on a combined use of the projection and the distance of the differential document vectors to the differential latent semantics index (DLSI) spaces. The method includes the setting up of a DLSI space-based classifier to be stored in computer storage and the use of such classifier by a computer to evaluate the possibility of a document belonging to a given cluster using a posteriori probability function and to classify the document in the cluster. The classifier is effective in operating on very large numbers of documents such as with document retrieval systems ...
    23. Word sense disambiguation

      Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense ...
    24. System and method for matching a textual input to a lexical knowledge base and for utilizing results of that match

      The present invention can be used in a natural language processing system to determine a relationship (such as similarity in meaning) between two textual segments. The relationship can be identified or determined based on logical graphs generated from the textual segments. A relationship between first and second logical graphs is determined. This is accomplished regardless of whether there is an exact match between the first and second logical graphs. In one embodiment, the first graph represents an input textual discourse unit. The second graph, in one embodiment, represents information in a lexical knowledge base (LKB). The input graph can be ...
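The abstract titled "Method and system for theme-based word sense ambiguity reduction" describes weighting possible themes for a sentence from the senses of its words, then using those themes to pick a sense for each word. A minimal sketch of that general idea follows. It is not the patented method: the toy `LEXICON`, the sense labels, and the counting scheme used as a weight are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption): word -> {sense label: set of themes}.
LEXICON = {
    "bank": {"bank/finance": {"finance"}, "bank/river": {"nature"}},
    "interest": {"interest/finance": {"finance"}, "interest/curiosity": {"mind"}},
    "loan": {"loan/finance": {"finance"}},
}


def theme_weights(words):
    """Weight each theme by the number of words that have a sense carrying it."""
    weights = Counter()
    for word in words:
        themes = set()
        for sense_themes in LEXICON.get(word, {}).values():
            themes |= sense_themes
        # Each word votes at most once per theme.
        weights.update(themes)
    return weights


def disambiguate(words):
    """Pick, for each known word, the sense whose themes carry the most weight."""
    weights = theme_weights(words)
    chosen = {}
    for word in words:
        senses = LEXICON.get(word)
        if not senses:
            continue  # unknown words are left untagged
        # Sort sense labels first so ties resolve deterministically.
        chosen[word] = max(
            sorted(senses),
            key=lambda sense: sum(weights[theme] for theme in senses[sense]),
        )
    return chosen


if __name__ == "__main__":
    print(disambiguate(["bank", "interest", "loan"]))
```

In this toy sentence the "finance" theme is supported by all three words, so the finance senses of "bank" and "interest" win. A real system would need a far larger lexicon and a principled weighting scheme.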