    1. AI in Web Advertising: Picking the Right Ad Ten Thousand Times a Second

      Online advertising is the primary economic force behind many Internet services ranging from major Web search engines to obscure blogs. A successful advertising campaign should be integral to the user experience and relevant to their information needs as well as economically worthwhile to the advertiser and the publisher. This talk will cover some of the methods and challenges of computational advertising, a new scientific discipline that studies advertising on the Internet. At first approximation, and ignoring the economic factors above, finding user-relevant ads can be reduced to conventional information retrieval. However, since both queries and ads are quite short, it ...
      Mentions: Sunnyvale Yahoo
    2. Entropy Guided Transformation Learning

      This work presents Entropy Guided Transformation Learning (ETL), a new machine learning algorithm for classification tasks. ETL generalizes Transformation Based Learning (TBL) by automatically solving the TBL bottleneck: the construction of good template sets. ETL uses the information gain in order to select the feature combinations that provide good template sets. We describe the application of ETL to two language independent Text Mining preprocessing tasks: part-of-speech tagging and phrase chunking. We also report our findings on one language independent Information Extraction task: named entity recognition. Overall, we successfully apply it to six different languages: Dutch, English, German, Hindi, Portuguese and ...
    3. A Novel Method of Automobiles’ Chinese Nickname Recognition

      Free writing style has become more and more popular, and people tend to use nicknames in place of original names. However, traditional named entity recognition does not perform well on the nickname recognition problem. Thus, we chose the automobile domain and carried out a complete process of Chinese automobile nickname recognition. This paper discusses a new method to tackle the problem of automobile nickname recognition in Chinese text. First, we give nicknames a typical definition. Then we use machine learning methods to acquire the probabilities of transition and emission based on ...
    4. Improving the Performance of a NER System by Post-processing, Context Patterns and Voting

      This paper reports on the development of a Named Entity Recognition (NER) system for Bengali that combines the outputs of two classifiers, namely Conditional Random Field (CRF) and Support Vector Machine (SVM). Lexical context patterns, which are generated from an unlabeled corpus of 10 million wordforms in an unsupervised way, have been used as the features of the classifiers in order to improve their performance. We have post-processed the models by considering the second best tag of CRF and the class splitting technique of SVM in order to improve the performance. Finally, the classifiers are combined together into a final ...
    5. A Simple and Efficient Model Pruning Method for Conditional Random Fields

      Conditional random fields (CRFs) have been quite successful in various machine learning tasks. However, as larger and larger data sets become manageable for current machines, trained CRF models for real applications quickly inflate. Researchers often have to use models with tens of millions of features. This paper considers pruning an existing CRF model for storage reduction and decoding speedup. We propose a simple but efficient rank metric for feature groups, rather than the individual features that previous work usually focuses on. A series of experiments in two typical labeling tasks, word segmentation and named entity recognition for Chinese, are carried ...
    6. A Supervised Machine Learning Approach to Toponym Disambiguation

      This chapter presents a toponym disambiguation approach based on supervised machine learning. The proposed approach uses a simple hierarchical geographic relationship model to describe geographic entities and geographic relationships among them. The disambiguation procedure begins with the identification of toponyms in documents by applying and extending state-of-the-art named entity recognition technologies and then performs disambiguation as a supervised classification process over a feature space of geographic relationships. A geographic knowledge base is modeled and constructed to support the whole disambiguation procedure. System performance is evaluated on a document collection consisting of 15,194 local Australian news articles. The experiment ...
    7. Formal Grammar for Hispanic Named Entities Analysis

      A task that has been widely studied in the field of natural language processing is Named Entity Recognition (NER). A great number of approaches have been developed to deal with the identification and classification of named entity strings in specific and open domains. Nevertheless, external modules have to be incorporated into many NER systems in order to solve the interpretation problems derived from proper nouns. In this article our focus will be on the study of ambiguity in Hispanic Nominal Sequences, whose constitution involves three main problems: (1) the association of given names and/or surnames; (2) the ...
      Mentions: Mexico
    8. Extraction of CYP Chemical Interactions from Biomedical Literature Using Natural Language Processing Methods.

      J Chem Inf Model. 2009 Jan 21. Authors: Jiao D, Wild DJ. This paper proposes a system that automatically extracts CYP protein and chemical interactions from journal article abstracts, using natural language processing (NLP) and text mining methods. In our system, we employ a maximum entropy based learning method, using results from syntactic, semantic, and lexical analysis of texts. We first present our system architecture and then discuss the data set for training our machine learning based models and the methods in building components in ...
      Mentions: Indiana Bloomington
    9. Adaptive and scalable method for resolving natural language ambiguities

      A method for resolving ambiguities in natural language by organizing the task into multiple iterations of analysis done in successive levels of depth. The processing is adaptive to the users' need for accuracy and efficiency. At each level of processing the most accurate disambiguation is made based on the available information. As more analysis is done, additional knowledge is incorporated in a systematic manner to improve disambiguation accuracy. Associated with each level of processing is a measure of confidence, used to gauge the confidence of a process in its disambiguation accuracy. An overall confidence measure is also used to reflect ...
    10. Challenges of Semantic Knowledge Management

      Forrester (Moore 2007) estimate that more than 80% of all corporate information is unstructured. Knowledge workers are increasingly overwhelmed by information from a bewildering array of information sources: emails, intranets, the web, etc. and yet still find it hard to access the specific information required for the task at hand. This implies that knowledge worker productivity is reduced and that organisations may be making decisions on the basis of incomplete knowledge. Furthermore, an inability to access key information can lead to compliance failure. As we have described in this volume, semantic technology is helping address these issues by associating unstructured ...
      Mentions: John Davies
    11. Yowie: Information Extraction in a Service Enabled World

      Service Oriented Computing is a potential enabler for popular applications of Named Entity Recognition and Information Extraction. In this demo we show an example of such an application and discuss how Service Oriented Architecture (SOA) makes the application fully flexible and easily extensible. The application brings SOA close to the end-user and offers possibilities that are hard to achieve with other approaches. Content type: Book chapter. DOI: 10.1007/978-3-540-89652-4_68. Authors: Marek Kowalkiewicz (SAP Research, 133 Mary Street, Brisbane, Australia); Konrad Jünemann (SAP Research, 133 Mary Street, Brisbane, Australia). Book series: Lecture Notes in Computer Science (Online ISSN 1611-3349, Print ISSN 0302-9743), Volume 5364/2008 ...
      Mentions: Brisbane
    12. Named Entity Recognition for Improving Retrieval and Translation of Chinese Documents

      This paper focuses on named entity recognition corresponding to people, organizations, locations, etc. in Chinese scientific documents. Two key benefits are shown by performing NER: (i) improved quality of semantic retrieval, and (ii) improvement in subsequent machine translation. Experiments using the Semantex platform for information extraction illustrate and quantify the two benefits outlined. Content type: Book chapter. DOI: 10.1007/978-3-540-89533-6_56. Authors: Rohini K. Srihari (State University of New York at Buffalo, Buffalo, NY, USA); Erik Peterson (Janya Inc., Buffalo, NY, USA). Book series: Lecture Notes in Computer Science (Online ISSN 1611-3349, Print ISSN 0302-9743), Volume 5362/2008. Book: Digital Libraries: Universal ...
    13. Systems, methods and computer products for name disambiguation by using private/global directories, and communication contexts

      TRADEMARKS. IBM® is a registered trademark of International Business Machines Corporation, Armonk, N.Y., U.S.A. Other names used herein may be registered trademarks, trademarks or product names of International Business Machines Corporation or other companies. BACKGROUND OF THE INVENTION. 1. Field of the Invention. This invention relates to name disambiguation, and particularly to systems, methods and computer products for name disambiguation by using private/global directories and communication con
    14. Named Entity Recognition in Biomedical Literature: A Comparison of Support Vector Machines and Conditional Random Fields

      In this paper, we propose two named entity recognition systems for biomedical literature, System1 using support vector machines and System2 using conditional random fields. Through employing several sets of experiments, we make a comprehensive comparison between these two systems. The final results reflect that System2 can achieve higher accuracy than System1, because System2 can catch more essential properties by handling the richer set of features, i.e., adding not only the individual and dynamic features as System1 does but also the combinational features, which can improve the performance further. Furthermore, with carefully designed features, System2 can recognize named entities in ...
    15. Spanish Nested Named Entity Recognition Using a Syntax-Dependent Tree Traversal-Based Strategy

      In this paper, we address the problem of nested Named Entity Recognition (NER) for Spanish. Phrase syntactic structure is exploited to generate a tree representation for the set of phrases that are candidates to be named entities. The classification of all candidate phrases is treated as a single problem, for which a globally optimal solution is approximated using a strategy based on the postorder traversal of that representation. Experimental results, obtained in the framework of the SemEval 2007 Task 9 NER subtask, demonstrate the validity of our approach. Content type: Book chapter. DOI: 10.1007/978-3-540-88636-5_13. Authors: Yunior Ramírez-Cruz, Universidad de Oriente Center ...
    16. Mono-and Crosslingual Retrieval Experiments with Spatial Restrictions at GeoCLEF 2007

      The participation of the University of Hildesheim focused on the monolingual German and English tasks of GeoCLEF 2007. Based on the results of GeoCLEF 2005 and GeoCLEF 2006, the weighting and expansion of geographic Named Entities (NE) and Blind Relevance Feedback (BRF) were combined and an improved model for German Named Entity Recognition (NER) was evaluated. Post submission experiments are also presented. A topic analysis revealed a wide spread of MAP values with high standard deviation values. Therefore further development will lie in the field of topic-adaptive systems. Content type: Book chapter. DOI: 10.1007/978-3-540-85760-0_109. Authors: Ralph Kölle, University of Hildesheim ...
    17. QA@L2F, First Steps at QA@CLEF

      This paper presents QA@L2F, the question-answering system developed at L2F, INESC-ID. QA@L2F follows different strategies according to the question type, and relies strongly on named entity recognition and on the pre-detection of linguistic patterns. Each question type is mapped into a single strategy; however, if no answer is found, the system proceeds and tries to find an answer using one of the other strategies. Content type: Book chapter. DOI: 10.1007/978-3-540-85760-0_45. Authors: Ana Mendes (L2F/INESC-ID Lisboa, Email: qa-clef@l2f.inesc-id.pt, Rua Alves Redol, 9, 1000-029 Lisboa, Portugal); Luísa Coheur (L2F/INESC-ID Lisboa, Email: qa-clef@l2f.inesc-id.pt, Rua ...
    18. What Happened to Esfinge in 2007?

      Esfinge is a general domain Portuguese question answering system which uses the information available on the Web as an additional resource when searching for answers. Other external resources and tools used are a broad coverage parser, a morphological analyser, a named entity recognizer and a Web-based database of word co-occurrences. In this fourth participation in CLEF, in addition to the new challenges posed by the organization (topics and anaphors in questions and the use of Wikipedia to search and support answers), we experimented with a multiple question and multiple answer approach in QA. Content type: Book chapter. DOI: 10.1007/978-3-540-85760-0_31. Authors: ...
      Mentions: Esfinge
    19. Bengali and Hindi to English CLIR Evaluation

      This paper presents a cross-language retrieval system for the retrieval of English documents in response to queries in Bengali and Hindi, as part of our participation in the CLEF 2007 Ad-hoc bilingual track. We followed the dictionary-based Machine Translation approach to generate the equivalent English query out of Indian language topics. Our main challenge was to work with a limited-coverage dictionary (about 20% coverage) that was available for Hindi-English, and a virtually non-existent dictionary for Bengali-English. So we depended mostly on a phonetic transliteration system to overcome this. The CLEF results point to the need for a rich bilingual lexicon, a ...
      Mentions: India IIT Kharagpur
    20. Extracting and Querying Relations in Scientific Papers

      High-precision linguistic and semantic analysis of scientific texts is an emerging research area. We describe methods and an application for extracting interesting factual relations from scientific texts in computational linguistics and language technology. We use a hybrid NLP architecture with shallow preprocessing for increased robustness and domain-specific, ontology-based named entity recognition, followed by a deep HPSG parser running the English Resource Grammar (ERG). The extracted relations in the MRS (minimal recursion semantics) format are simplified and generalized using WordNet. The resulting ‘quriples’ are stored in a database from where they can be retrieved by relation-based search. The query interface is ...
