1. Articles in category: WSD

    337-360 of 370 « 1 2 ... 12 13 14 15 16 »
    1. Systems and methods for improved spell checking

      The present invention leverages iterative transformations of search query strings along with statistics extracted from search query logs and/or web data to provide possible alternative spellings for the search query strings. This provides for spell checking that can be influenced to provide individualized suggestions for each user. By utilizing search query logs, the present invention can account for substrings not found in a lexicon but still acceptable as a search query of interest. This allows for a higher quality proposal for alternative spellings, beyond the content of the lexicon. One instance of the present invention operates at a substring ...
    2. ACL and EMNLP 2007 Report

      ACL/EMNLP just concluded. Overall, I thought both conferences were a success, though by now I am quite ready to return home. Prague was very nice. I especially enjoyed Tom Mitchell's invited talk on linking fMRI experiments to language. They actually use lexical semantic information to be able to identify what words people are thinking about when they scan their brains. Scary mind-reading stuff going on here. I think this is a very interesting avenue of research---probably not one I'll follow my
    3. Exponential priors for maximum entropy models

      The subject invention provides for systems and methods that facilitate optimizing one or more sets of training data by utilizing an Exponential distribution as the prior on one or more parameters in connection with a maximum entropy (maxent) model to mitigate overfitting. Maxent is also known as logistic regression. More specifically, the systems and methods can facilitate optimizing probabilities that are assigned to the training data for later use in machine learning processes, for example. In practice, training data can be assigned their respective weights and then a probability distribution can be assigned to those weights.
    4. Method and system for theme-based word sense ambiguity reduction

      Word sense ambiguity reduction, for "thematic" words in a sentence, is achieved based on thematic prediction. The senses of "thematic" words are disambiguated in a sentence by determining and weighting possible themes for that sentence. Possible themes are determined for that sentence based on thematic information associated with the different senses of each word in the sentence. A highly deterministic thematic-based word sense disambiguation method is used to preprocess the sentence prior to further syntactic and semantic analysis, thereby enhancing accuracy and decreasing the demand for computational resources (memory and CPU) by reducing input ambiguities.
      Mentions: United Nations
    5. Text analysis technique

      One embodiment of the present invention includes means for determining a concept representation for a set of text documents based on partial order analysis and modifying this representation if it is determined to be unidentifiable. Furthermore, the embodiment includes means for labeling the representation, mapping documents to it to provide a corresponding document representation, generating a number of document signatures each of a different type, and performing several data processing applications each with a different one of the document signatures of differing types.
    6. Inferencing using disambiguated natural language rules

      A method and structure for automatically producing bridging inferences that join two related input sentences, by applying a lexicon and ontology data structure to a first input sentence to produce first input tagged sentences, applying the lexicon and ontology data structure to a second input sentence to produce second input tagged sentences, matching each first input tagged sentence to first rules, generating first inferred tagged sentences from the first rules, matching the first inferred tagged sentences to second rules, generating second inferred tagged sentences from the second rules, matching the second inferred tagged sentences to third rules, generating third inferred ...
    7. Search Doesn't Work: Story 2

      Search Doesn't Work: Story 2: NLP means many things. To me it means Natural Language Processing. To others it means neurolinguistic programming. When I search for the bare term 'nlp' in Google, I just get results with the second sense - same for other search engines. If I search for 'William Cohen', the first result on Google is for my friend Prof. William Cohen and the second for the other chap. [...] So why don't I get this for NLP? Why no mixture of results? [... ] Word sense disambiguation is a core requirement for a search engine. The problem - the same ...
    9. Training machine learning by sequential conditional generalized iterative scaling

      A system and method facilitating training machine learning systems utilizing sequential conditional generalized iterative scaling is provided. The invention includes an expected value update component that modifies an expected value based, at least in part, upon a feature function of an input vector and an output value, a sum of lambda variable and a normalization variable. The invention further includes an error calculator that calculates an error based, at least in part, upon the expected value and an observed value. The invention also includes a parameter update component that modifies a trainable parameter based, at least in part, upon the ...
      Mentions: Scgis lamda
    10. System and method for automatically discovering a hierarchy of concepts from a corpus of documents

      The invention is a method, system and computer program for automatically discovering concepts from a corpus of documents and automatically generating a labeled concept hierarchy. The method involves extraction of signatures from the corpus of documents. The similarity between signatures is computed using a statistical measure. The frequency distribution of signatures is refined to alleviate any inaccuracy in the similarity measure. The signatures are also disambiguated to address the polysemy problem. The similarity measure is recomputed based on the refined frequency distribution and disambiguated signatures. The recomputed similarity measure reflects actual similarity between signatures. The recomputed similarity measure is then ...
      Mentions: Japan Yahoo Finland
    11. Method and system for naming a cluster of words and phrases

      The present invention provides a method, system and computer program for naming a cluster, or a hierarchy of clusters, of words and phrases that have been extracted from a set of documents. The invention takes these clusters as the input and generates appropriate labels for the clusters using a lexical database. Naming involves first finding out all possible word senses for all the words in the cluster, using the lexical database; and then augmenting each word sense with words that are semantically similar to that word sense to form respective definition vectors. Thereafter, word sense disambiguation is done to find ...
      Mentions: Finland Espoo Madison
    12. Method and system for encoding and accessing linguistic frequency data

      Linguistic frequency data is encoded by identifying a plurality of sets of character strings in a source text, where each set comprises at least a first and a second character string. Frequency data is obtained for each set and stored at a memory position in a first memory array that is assigned to each first character string. A pointer pointing to a position in the first memory array that has been assigned to the corresponding first character string of the respective set and which has stored the frequency data of the respective set, is stored in a second memory array ...
    13. Method for generating training data for medical text abbreviation and acronym normalization

      A method for electronically generating high-quality feature vectors that can be used in connection with electronic data processing systems implementing Maximum Entropy or other statistical models to accurately normalize abbreviations in text such as medical records. An abbreviation database and a training text database are provided. The abbreviation database includes abbreviation data representative of abbreviations and associated expansions to be normalized. The training text database includes a corpus of text having expansions of the abbreviations to be normalized. The corpus of text is processed as a function of the abbreviation data to identify the expansions in the corpus of text ...
    14. Differential LSI space-based probabilistic document classifier

      A computerized method for automatic document classification based on a combined use of the projection and the distance of the differential document vectors to the differential latent semantics index (DLSI) spaces. The method includes the setting up of a DLSI space-based classifier to be stored in computer storage and the use of such classifier by a computer to evaluate the possibility of a document belonging to a given cluster using a posteriori probability function and to classify the document in the cluster. The classifier is effective in operating on very large numbers of documents such as with document retrieval systems ...
    15. Word sense disambiguation

      Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense ...
    16. System and method for matching a textual input to a lexical knowledge based and for utilizing results of that match

      The present invention can be used in a natural language processing system to determine a relationship (such as similarity in meaning) between two textual segments. The relationship can be identified or determined based on logical graphs generated from the textual segments. A relationship between first and second logical graphs is determined. This is accomplished regardless of whether there is an exact match between the first and second logical graphs. In one embodiment, the first graph represents an input textual discourse unit. The second graph, in one embodiment, represents information in a lexical knowledge base (LKB). The input graph can be ...
    17. Meta search engine

      A computer implemented meta search engine and search method. In accordance with this method, a query is forwarded to one or more third party search engines, and the responses from the third party search engine or engines are parsed in order to extract information regarding the documents matching the query. The full text of the documents matching the query is downloaded, and the query terms in the documents are located. The text surrounding the query terms is extracted, and that text is displayed.
      Mentions: Reuters Yahoo Lycos
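
The last step of the meta search abstract above, extracting the text around each query term, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `context_snippets`, the `window` parameter, and the case-insensitive substring matching are hypothetical choices, not details taken from the patent.

```python
import re


def context_snippets(text, query_terms, window=40):
    """Return the text surrounding each occurrence of any query term.

    `window` (an assumed parameter) is the number of characters kept on
    each side of a match.
    """
    snippets = []
    for term in query_terms:
        # re.escape makes the term match literally, not as a pattern.
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            snippets.append(text[start:end])
    return snippets


if __name__ == "__main__":
    page = "Word sense disambiguation is a core requirement for a search engine."
    for snippet in context_snippets(page, ["sense", "search"], window=15):
        print(snippet)
```

A real system would more likely cut snippets at word or sentence boundaries; fixed character windows just keep the sketch short.
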
    18. Linguistic disambiguation system and method using string-based pattern training to learn to resolve ambiguity sites

      A linguistic disambiguation system and method creates a knowledge base by training on patterns in strings that contain ambiguity sites. The string patterns are described by a set of reduced regular expressions (RREs) or very reduced regular expressions (VRREs). The knowledge base utilizes the RREs or VRREs to resolve ambiguity based upon the strings in which the ambiguity occurs. The system is trained on a training set, such as a properly labeled corpus. Once trained, the system may then apply the knowledge base to raw input strings that contain ambiguity sites. The system uses the RRE- and VRRE-based knowledge base ...
    19. Information generation and retrieval method based on standardized format of sentence structure and semantic structure and system using the same

      The present invention relates to an information generation and retrieval apparatus based on a standardized format of sentence structure and semantic structure and a method thereof and a computer readable recording medium for recording a program for implementing the method. The method for generating and retrieving information for use in an apparatus for generating and retrieving information based on standardized formats of sentence structure and semantic structure, comprises a first step of transforming a natural language sentence (information and knowledge) described by an information provider to a conceptual graph depending on standardized formats of sentence structure and semantic structure and ...
      Mentions: Korea Yahoo Paris
    20. Method and system for finding a query-subset of events within a master-set of events

      A method and system for determining similarity between a first event set, the first event set including a first plurality of event types, and a second event set, the second event set including a second plurality of event types, is provided. Observed events are randomly mapped to a multidimensional vector-Q and query events are mapped to a multidimensional query vector-q. Comparison of the vectors for a predetermined similarity according to ‖Q − q‖ ≤ SV, where SV is a predetermined similarity value, determines similarity.
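
The comparison in the event-set abstract above is a thresholded distance between two vectors. The sketch below illustrates that test. The abstract does not say how events are "randomly mapped", so the mapping here (one fixed pseudo-random Gaussian vector per event type, summed over the set) is purely an assumption, as are the names `event_vector` and `is_similar` and the use of the Euclidean norm.

```python
import math
import random


def event_vector(events, dim=16, seed=0):
    """Map a collection of event types to a dim-dimensional vector.

    Assumed mapping: each event type gets a fixed pseudo-random Gaussian
    vector (seeded by the event name), and the vectors are summed.
    """
    vec = [0.0] * dim
    for event in events:
        rng = random.Random(f"{seed}:{event}")  # string seeds are deterministic
        for i in range(dim):
            vec[i] += rng.gauss(0.0, 1.0)
    return vec


def is_similar(Q, q, sv):
    """Return True when the Euclidean distance between Q and q is at most sv."""
    return math.dist(Q, q) <= sv


if __name__ == "__main__":
    Q = event_vector(["login", "click", "logout"])
    q = event_vector(["login", "click"])
    print(is_similar(Q, q, sv=10.0))
```

`math.dist` requires Python 3.8 or later. The threshold `sv` plays the role of SV in the abstract and would have to be tuned for any real data.
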
    21. Terminology translation for unaligned comparable corpora using category based translation probabilities

      The invention relates to a method and apparatus for generating translations of natural language terms from a first language to a second language. A plurality of terms are extracted from unaligned comparable corpora of the first and second languages. Comparable corpora are sets of documents in different languages that come from the same domain and have similar genre and content. Unaligned documents are not translations of one another and are not linked in any other way. By accessing monolingual thesauri of the first and second languages, a category is assigned to each extracted term. Then, category-to-category translation probabilities are estimated ...
    22. System and method for matching a textual input to a lexical knowledge base and for utilizing results of that match

      The present invention can be used in a natural language processing system to determine a relationship (such as similarity in meaning) between two textual segments. The relationship can be identified or determined based on logical graphs generated from the textual segments. A relationship between first and second logical graphs is determined. This is accomplished regardless of whether there is an exact match between the first and second logical graphs. In one embodiment, the first graph represents an input textual discourse unit. The second graph, in one embodiment, represents information in a lexical knowledge base (LKB). The input graph can be ...
    23. Techniques for controlling distribution of information from a secure domain

      Techniques for controlling distribution of information from a secure domain by automatically detecting outgoing messages which violate security policies corresponding to the secure domain. Semantic models are constructed for one or more message categories and for the outgoing messages. The semantic model of an outgoing message is compared with the semantic models of the message categories to determine a degree of similarity between the semantic models. The outgoing message is classified based on the degree of similarity obtained from the comparison. A determination is made, based on the classification of the outgoing message, if distribution of the outgoing message would ...
    24. System and method for personalized search, information filtering, and for generating recommendations utilizing statistical latent class models

      The disclosed system implements a novel method for personalized filtering of information and automated generation of user-specific recommendations. The system uses a statistical latent class model, also known as Probabilistic Latent Semantic Analysis, to integrate data including textual and other content descriptions of items to be searched, user profiles, demographic information, query logs of previous searches, and explicit user ratings of items. The disclosed system learns one or more statistical models based on available data. The learning may be reiterated once additional data is available. The statistical model, once learned, is utilized in various ways: to make predictions about item ...