1. Articles in category: NER

    1. Highly Multilingual News Analysis Applications

      The publicly accessible Europe Media Monitor (EMM) family of applications (http://press.jrc.it/overview.html) gathers and analyses an average of 80,000 to 100,000 online news articles per day in up to 43 languages. By extracting meta-information from these articles, the applications provide an aggregated view of the news; they allow users to monitor trends and to navigate the news over time and even across languages. EMM-NewsExplorer additionally collects historical information about persons and organisations from the multilingual news, generates co-occurrence and quotation-based social networks, and more. All EMM applications were entirely developed at, and are being ...
    2. An Iterative Model for Discovering Person Coreferences Using Name Frequency Estimates

      In this paper we present an approach to person coreference in a large collection of news, based on two main hypotheses. First, coreference is an iterative process, where the easy cases are addressed first and are then made available as an incrementally enriched resource for resolving more difficult cases. Second, at each iteration, coreference between two person names is established according to a probabilistic model that takes a number of features into account (e.g. the frequency of first and last names). The approach does not assume any prior knowledge about persons mentioned in the collection and requires basic linguistic ...
    3. Building a Morphosyntactic Lexicon and a Pre-syntactic Processing Chain for Polish

      This paper introduces a new set of tools and resources for Polish which cover all the steps required to transform raw, unrestricted text into reasonable input for a parser. This includes (1) a large-coverage morphological lexicon, developed with the help of the IPI PAN corpus as well as a lexical acquisition technique, and (2) multiple tools for spelling correction, segmentation, tokenization and named entity recognition. This processing chain can also handle the XCES format both as input and output, which makes it possible to improve XCES corpora such as the IPI PAN corpus itself. This allows us to give ...
    4. Method and system for displaying time-series data and correlated events derived from text mining

      FIELD OF THE INVENTION: The present invention generally relates to a method and system for displaying time-series data and correlated events. More specifically, the present invention relates to a method and system for displaying time-series data and correlated events derived from text mining. BACKGROUND OF THE INVENTION: Numerical serial data, such as the prices of stocks on any given date, is commonly presented graphically on a chart. For example, financial serial data is commonly presented in the fo
    5. Cascaded classifiers for confidence-based chemical named entity recognition.

      BMC Bioinformatics. 2008;9 Suppl 11:S4. Authors: Corbett P, Copestake A. BACKGROUND: Chemical named entities represent an important facet of biomedical text. RESULTS: We have developed a system to use character-based n-grams, Maximum Entropy Markov Models and rescoring to recognise chemical names and other such entities, and to make confidence estimates for the extracted entities. An adjustable threshold allows the system to be tuned to high precision or high recall. At a threshold set for balanced precision and recall, we were able to extract named entities at an F ...
    6. Accelerating the annotation of sparse named entities by dynamic sentence selection.

      BMC Bioinformatics. 2008;9 Suppl 11:S8. Authors: Tsuruoka Y, Tsujii J, Ananiadou S. BACKGROUND: Previous studies of named entity recognition have shown that a reasonable level of recognition accuracy can be achieved by using machine learning models such as conditional random fields or support vector machines. However, the lack of training data (i.e. annotated corpora) makes it difficult for machine learning-based named entity recognizers to be used in building practical information extraction systems. RESULTS: This paper presents an active learning-like framework for reducing the human ...
    7. Knowledge Discovery via Machine Learning for Neurodegenerative Disease Researchers

      The ever-increasing size of the biomedical literature makes more precise information retrieval, and tapping into the implicit knowledge in scientific literature, a necessity. In this chapter, first, three new variants of the expectation–maximization (EM) method for semisupervised document classification (Machine Learning 39:103–134, 2000) are introduced to refine biomedical literature meta-searches. The retrieval performance of a multi-mixture per class EM variant with Agglomerative Information Bottleneck clustering (Slonim and Tishby (1999) Agglomerative information bottleneck. In Proceedings of NIPS-12) using the Davies–Bouldin cluster validity index (IEEE Transactions on Pattern Analysis and Machine Intelligence 1:224–227, 1979) rivaled the state-of-the-art transductive support ...
    8. Knowledge Discovery via Machine Learning for Neurodegenerative Disease Researchers.

      Methods Mol Biol. 2009;569:173-96. Authors: Ozyurt IB, Brown GG. The ever-increasing size of the biomedical literature makes more precise information retrieval, and tapping into the implicit knowledge in scientific literature, a necessity. In this chapter, first, three new variants of the expectation-maximization (EM) method for semisupervised document classification (Machine Learning 39:103-134, 2000) are introduced to refine biomedical literature meta-searches. The retrieval performance of a multi-mixture per class EM variant with Agglomerative Information Bottleneck clustering (Slonim and Tishby (1999) Agglomerative information bottleneck. In Proceedings of NIPS-12) using Davies-Bouldin cluster ...
    9. System, and method for interactive browsing

      FIELD OF THE INVENTION: The present invention generally relates to information technology, and more particularly, to a system and method for interactively browsing information. DESCRIPTION OF RELATED ART: As more and more electronic documents are stored on computers, it becomes important to manage the documents and retrieve information effectively. At present, there are primarily three ways to acquire information. The first one is taxonomy. Taxonomy typically organizes a large scale of documents into a
    10. Identification of Chemical Entities in Patent Documents

      Biomedical literature is an important source of information for chemical compounds. However, different representations and nomenclatures for chemical entities exist, which makes references to chemical entities ambiguous. Many systems already exist for gene and protein entity recognition; however, very few exist for chemical entities. The main reason for this is the lack of a corpus with which to train and evaluate named entity recognition systems. In this paper we present a chemical entity recognizer that uses a machine learning approach based on conditional random fields (CRF) and compare its performance with dictionary-based approaches using several terminological resources. For the training and ...
    11. A Comparison of Performance of Sequential Learning Algorithms on the Task of Named Entity Recognition for Indian Languages

      We address named entity recognition for Indian languages by presenting a comparative study of two sequential learning algorithms, viz. Conditional Random Fields (CRF) and Support Vector Machine (SVM). Though we only have results for Hindi, the features used are language independent, and hence the same procedure could be applied to tag the named entities in other Indian languages like Telugu, Bengali, Marathi etc. that have the same number of vowels and consonants. We have used CRF++ to implement the CRF algorithm and YamCha to implement the SVM algorithm. The results show a superiority of CRF over SVM and ...
      Mentions: India Indian Marathi
    12. AI in Web Advertising: Picking the Right Ad Ten Thousand Times a Second

      Online advertising is the primary economic force behind many Internet services ranging from major Web search engines to obscure blogs. A successful advertising campaign should be integral to the user experience and relevant to their information needs as well as economically worthwhile to the advertiser and the publisher. This talk will cover some of the methods and challenges of computational advertising, a new scientific discipline that studies advertising on the Internet. At first approximation, and ignoring the economic factors above, finding user-relevant ads can be reduced to conventional information retrieval. However, since both queries and ads are quite short, it ...
      Mentions: Sunnyvale Yahoo
    13. Entropy Guided Transformation Learning

      This work presents Entropy Guided Transformation Learning (ETL), a new machine learning algorithm for classification tasks. ETL generalizes Transformation Based Learning (TBL) by automatically solving the TBL bottleneck: the construction of good template sets. ETL uses information gain to select the feature combinations that provide good template sets. We describe the application of ETL to two language-independent Text Mining preprocessing tasks: part-of-speech tagging and phrase chunking. We also report our findings on one language-independent Information Extraction task: named entity recognition. Overall, we successfully apply it to six different languages: Dutch, English, German, Hindi, Portuguese and ...
    14. A Novel Method of Automobiles’ Chinese Nickname Recognition

      Free-form writing styles are becoming increasingly popular, and people tend to use nicknames in place of original names. However, traditional named entity recognition does not perform well on the nickname recognition problem. We therefore chose the automobile domain and implemented a complete process for recognizing Chinese automobile nicknames. This paper discusses a new method for recognizing automobile nicknames in Chinese text. First, we define what constitutes a nickname. Then we use machine learning methods to acquire the transition and emission probabilities based on ...
    15. Improving the Performance of a NER System by Post-processing, Context Patterns and Voting

      This paper reports on the development of a Named Entity Recognition (NER) system for Bengali that combines the outputs of two classifiers, namely Conditional Random Field (CRF) and Support Vector Machine (SVM). Lexical context patterns, which are generated in an unsupervised way from an unlabeled corpus of 10 million wordforms, have been used as features of the classifiers in order to improve their performance. We have post-processed the models by considering the second-best tag of the CRF and the class splitting technique of the SVM in order to improve the performance. Finally, the classifiers are combined into a final ...
    16. A Simple and Efficient Model Pruning Method for Conditional Random Fields

      Conditional random fields (CRFs) have been quite successful in various machine learning tasks. However, as current machines become able to handle ever larger data sets, trained CRF models for real applications grow quickly. Researchers now often have to use models with tens of millions of features. This paper considers pruning an existing CRF model for storage reduction and decoding speedup. We propose a simple but efficient rank metric for feature groups, rather than for the individual features that previous work usually focuses on. A series of experiments on two typical labeling tasks, word segmentation and named entity recognition for Chinese, are carried ...
    17. A Supervised Machine Learning Approach to Toponym Disambiguation

      This chapter presents a toponym disambiguation approach based on supervised machine learning. The proposed approach uses a simple hierarchical geographic relationship model to describe geographic entities and the geographic relationships among them. The disambiguation procedure begins by identifying toponyms in documents, applying and extending state-of-the-art named entity recognition technologies, and then performs disambiguation as a supervised classification process over a feature space of geographic relationships. A geographic knowledge base is modeled and constructed to support the whole disambiguation procedure. System performance is evaluated on a document collection consisting of 15,194 local Australian news articles. The experiment ...
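Entry 5 above mentions that an adjustable confidence threshold lets a recognizer be tuned toward high precision or high recall. The following is a minimal sketch of that general idea only, not the authors' system: the function name `precision_recall_at`, the candidate scores and the gold-entity count are all hypothetical values chosen for illustration.

```python
from typing import List, Tuple


def precision_recall_at(
    scored: List[Tuple[float, bool]], threshold: float, total_gold: int
) -> Tuple[float, float]:
    """Precision and recall when only candidates scoring >= threshold are kept.

    scored: one (confidence, is_correct) pair per candidate entity.
    total_gold: number of entities in the gold annotation, including any
    that never appeared among the candidates.
    """
    kept = [is_correct for confidence, is_correct in scored if confidence >= threshold]
    true_positives = sum(kept)
    # Convention assumed here: precision is 1.0 when nothing is kept.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_gold if total_gold else 0.0
    return precision, recall


# Hypothetical candidates: (confidence estimate, matches a gold entity?)
candidates = [
    (0.95, True), (0.90, True), (0.70, False),
    (0.60, True), (0.30, False), (0.20, True),
]
TOTAL_GOLD = 5  # hypothetical: four gold entities appear above, one was missed

for threshold in (0.8, 0.5, 0.1):
    p, r = precision_recall_at(candidates, threshold, TOTAL_GOLD)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

In this toy data, raising the threshold keeps fewer but more reliable candidates (higher precision, lower recall), and lowering it does the opposite.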