    1. World's #1 Maker of Translation Memory Software for Windows, Mac, and Linux ...

      World's #1 Maker of Translation Memory Software for Windows, Mac, and Linux ...PR-CANADA.net (press release)... Wordfast Aligner(TM) BETA -- Built-In TM administration module -- Machine Translation integration -- User-Defined segmentation -- MS Office Spellchecker ...
    2. A Sequential Model for Discourse Segmentation

      Identifying discourse relations in a text is essential for various tasks in Natural Language Processing, such as automatic text summarization, question-answering, and dialogue generation. The first step of this process is segmenting a text into elementary units. In this paper, we present a novel model of discourse segmentation based on sequential data labeling. Namely, we use Conditional Random Fields to train a discourse segmenter on the RST Discourse Treebank, using a set of lexical and syntactic features. Our system is compared to other statistical and rule-based segmenters, including one based on Support Vector Machines. Experimental results indicate that our sequential ...
    3. Lexical Chains Using Distributional Measures of Concept Distance

      In practice, lexical chains are typically built using term reiteration or resource-based measures of semantic distance. The former approach misses out on a significant portion of the inherent semantic information in a text, while the latter suffers from the limitations of the linguistic resource it depends upon. In this paper, chains are constructed using the framework of distributional measures of concept distance, which combines the advantages of resource-based and distributional measures of semantic distance. These chains were evaluated by applying them to the task of text segmentation, where they performed as well as or better than state-of-the-art methods.
    4. The Influence of Collocation Segmentation and Top 10 Items to Keyword Assignment Performance

      Automatic document annotation from a controlled conceptual thesaurus is useful for establishing precise links between similar documents. This study presents a language-independent document annotation system based on features derived from a novel collocation segmentation method. Using the multilingual conceptual thesaurus EuroVoc, we evaluate filtered and unfiltered versions of the method, comparing them against other language-independent methods based on single words and bigrams. Testing our new method against the manually tagged multilingual corpus Acquis Communautaire 3.0 (AC) using all descriptors found there, we attain improvements in keyword assignment precision from 18 to 29 percent and in F-measure from ...
    5. World's #1 Maker of Translation Memory Software for Windows, Mac, and Linux ...

      World's #1 Maker of Translation Memory Software for Windows, Mac, and Linux ...IT News Online... Wordfast Aligner(TM) BETA -- Built-In TM administration module -- Machine Translation integration -- User-Defined segmentation -- MS Office Spellchecker ...
    6. Melodic Grouping in Music Information Retrieval: New Methods and Applications

      We introduce the MIR task of segmenting melodies into phrases, summarise the musicological and psychological background to the task and review existing computational methods before presenting a new model, IDyOM, for melodic segmentation based on statistical learning and information-dynamic analysis. The performance of the model is compared to several existing algorithms in predicting the annotated phrase boundaries in a large corpus of folk music. The results indicate that four algorithms produce acceptable results: one of these is the IDyOM model which performs much better than naive statistical models and approaches the performance of the best-performing rule-based models. Further slight performance ...
      Mentions: Goldsmiths
    7. The hare and the tortoise: speed and accuracy in translation retrieval

      Abstract: This research looks at the effects of segment order and segmentation on translation retrieval performance for an experimental Japanese–English translation memory system. We implement a number of both bag-of-words and segment-order-sensitive string comparison methods, and test each over character-based and word-based indexing using n-grams of various orders. To evaluate accuracy, we propose an automatic method which identifies the target-language string(s) which would lead to the optimal translation for a given input, based on analysis of the held-out translation and the current contents of the translation memory. Our results indicate that character-based indexing is superior to word-based indexing ...
    8. Bayesian Transductive Markov Random Fields for Interactive Segmentation in Retinal Disorders

      In the realm of computer aided diagnosis (CAD) interactive segmentation schemes have been well received by physicians, where the combination of human and machine intelligence can provide improved segmentation efficacy with minimal expert intervention [1-3]. Transductive learning (TL) or semi-supervised learning (SSL) is a suitable framework for learning-based interactive segmentation given the scarce label problem. In this paper we present extended work on Bayesian transduction and regularized conditional mixtures for interactive segmentation [3]. We present a Markov random field model integrating a semi-parametric conditional mixture model within a Bayesian transductive learning and inference setting. The model allows efficient learning and ...
    9. Using automated content analysis for audio/video content consumption

      Audio/video (A/V) content is analyzed using speech and language analysis components. Metadata is automatically generated based upon the analysis. The metadata is used in generating user interface interaction components which allow a user to view subject matter in various segments of the A/V content and to interact with the A/V content based on the automatically generated metadata.
    10. Model-Guided Segmentation and Layout Labelling of Document Images Using a Hierarchical Conditional Random Field

      We present a model-guided segmentation and document layout extraction scheme based on hierarchical Conditional Random Fields (CRFs, hereafter). Common methods to classify a pixel of a document image into classes (text, background and image) are often noisy and error-prone, requiring post-processing through heuristic methods. The input to the system is a pixel-wise classification produced by a Fisher classifier, which in turn operates on the output of a set of Globally Matched Wavelet (GMW) Filters. The system extracts features which encode contextual information and spatial configurations of a given document image, and learns relations between these layout entities using hierarchical ...
    11. A Novel Role-Based Movie Scene Segmentation Method

      Semantic scene segmentation is a crucial step in movie video analysis, and extensive research efforts have been devoted to this area. However, previous methods rely heavily on the video content itself and, because of the semantic gap, lack an objective evaluation criterion and the necessary semantic links. In this paper, we propose a novel role-based approach for movie scene segmentation using the script. The script is a text description of movie content that contains the scene structure information and related character names, and it can be regarded as an objective evaluation criterion and a useful external reference. The main novelty of our approach is ...
    12. Automatic Evaluation Of Machine Translation Via Word Choice And Word Order

      Abstract: We propose a novel metric ATEC for automatic MT evaluation based on explicit assessment of word choice and word order in an MT output in comparison to its reference translation(s), the two most fundamental factors in the construction of meaning for a sentence. The former is assessed by matching word forms at various linguistic levels, including surface form, stem, sound and sense, and further by weighing the informativeness of each word. The latter is quantified in terms of the discordance of word position and word sequence between a translation candidate and its reference. In the evaluations using the ...
    13. AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan

      Abstract: This article describes the enrichment of the AnCora corpora of Spanish and Catalan (400 k each) with coreference links between pronouns (including elliptical subjects and clitics), full noun phrases (including proper nouns), and discourse segments. The coding scheme distinguishes between identity links, predicative relations, and discourse deixis. Inter-annotator agreement on the link types is 85–89% above chance, and we provide an analysis of the sources of disagreement. The resulting corpora make it possible to train and test learning-based algorithms for automatic coreference resolution, as well as to carry out bottom-up linguistic descriptions of coreference relations as they occur ...
    14. Segmentation of strings into structured records

      A system for segmenting strings into component parts for use with a database management system. A reference table of string records is segmented into multiple substrings corresponding to database attributes. The substrings within an attribute are analyzed to provide a state model that assumes a beginning, a middle and an ending token topology for that attribute. A null token takes into account an empty attribute component, and copying of states allows for erroneous token insertions and misordering. Once the model is created from the clean data, the process breaks or parses an input record into a sequence of tokens. The ...
    15. Unsupervised Text Normalization Approach for Morphological Analysis of Blog Documents

      In this paper, we propose an algorithm for reducing the number of unknown words in blog documents by replacing peculiar expressions with formal expressions. Japanese blog documents contain many peculiar expressions regarded as unknown sequences by morphological analyzers. Reducing these unknown sequences improves the accuracy of morphological analysis for blog documents. Manual registration of peculiar expressions in the morphological dictionaries is the conventional solution, but it is costly and requires specialized knowledge. In our algorithm, substitution candidates for peculiar expressions are automatically retrieved from formally written documents such as newspapers and stored as substitution rules. For the correct replacement, a substitution ...
      Mentions: Japan Saitama
    16. Text segmentation of spoken meeting transcripts

      Abstract: Text segmentation has played an important role in information retrieval as well as natural language processing. Current segmentation methods are well suited for written and structured texts, making use of their distinctive macro-level structures; however, text segmentation of transcribed multi-party conversation presents a different challenge given its ill-formed sentences and the lack of macro-level text units. This paper describes an algorithm suitable for segmenting spoken meeting transcripts, combining semantically complex lexical relations with speech cue phrases to build lexical chains in determining topic boundaries. Content Type: Journal Article. DOI: 10.1007/s10772-009-9048-2. Authors: Bernadette Sharp, Staffordshire University FCET Beaconside Stafford ST18 ...
    17. System and method for audio hot spotting

      Audio hot spotting is accomplished by specifying a query criterion that includes a non-lexical audio cue. The non-lexical audio cue can be, e.g., speech rate, laughter, applause, vocal effort, speaker change or any combination thereof. The query criterion is retrieved from an audio portion of a file. A segment of the file containing the query criterion can be provided to a user. The duration of the provided segment can be specified by the user along with the files to be searched. A list of detections of the query criterion within the file can also be provided to the user. Searches ...
      Mentions: Greece Oracle Palm
    18. Intended boundaries detection in topic change tracking for text segmentation

      Abstract: This paper presents a topical text segmentation method based on intended boundaries detection and compares it to a well-known default boundaries detection method, c99. We compared the two methods by running them on two different corpora of French texts, and the results were evaluated by two different methods: one using a modified classic measure, the FScore, the other based on a manual evaluation on the Internet. Our results showed that algorithms that are close when automatically evaluated can be quite far apart when manually evaluated. Content Type: Journal Article. DOI: 10.1007/s10772-009-9051-7. Authors: Alexandre Labadié, LIRMM 161 rue Ada 34392 Montpellier ...
      Mentions: France Lirmm
    19. Systems and methods for hybrid text summarization

      Techniques are provided for segmenting text into categorized discourse constituents and attaching discourse constituents into a structural representation of discourse. Techniques for determining hybrid structural and non-structural summaries of a text are also provided. A text is segmented based on a theory of discourse analysis into at least a main discourse constituent containing spatio-temporal information about a single event in a possible world view. The discourse constituents are then inserted into a structural representation of discourse. Non-structural techniques are used to determine relevance scores and important discourse constituents are determined. Relevance scores are percolated through the structural representation of discourse ...
    20. Method and apparatus for efficient segmentation of compound words using probabilistic breakpoint traversal

      A method for segmenting a compound word in an unrestricted natural-language input is disclosed. The method comprises receiving a natural-language input consisting of a plurality of characters. Next, a set of probabilistic breakpoints based on a probabilistic breakpoint analysis is constructed in the natural-language input. A plurality of linkable components is identified by traversal of substrings of the natural-language input delimited by the set of probabilistic breakpoints. Finally, a segmented string consisting of a plurality of linkable components spanning the natural-language input is returned. The segmented string can be interpreted as a compound word.
      Mentions: German FIG NLP
    21. Multimodal News Story Segmentation

      In this paper, we describe a multi-modal approach to segmenting news video based on the perceived shift in content. We divide up a video document into logically coherent semantic units known as stories. We investigate the effectiveness of a number of multimedia features which serve as potential indicators of a story boundary. The results show an improvement of performance over current state-of-the-art story segmenters. Content Type: Book Chapter. DOI: 10.1007/978-81-8489-203-1_7. Authors: Gert-Jan Poulisse, Katholieke Universiteit Leuven Department of Computer Science Celestijnenlaan 200A Box 2402 B-3001 Heverlee Belgium; Marie-Francine Moens, Katholieke Universiteit Leuven Department of Computer Science Celestijnenlaan 200A ...
      Mentions: Heverlee Moens
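
    Illustration for entry 20 (compound-word segmentation by probabilistic breakpoint traversal). The steps listed in that abstract (construct breakpoints, traverse the substrings they delimit, return the linkable components that span the input) can be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the patented method: the character-pair table `boundary_prob`, the `threshold`, the toy `lexicon`, and the fewest-components tie-break are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Set


def breakpoints(word: str, boundary_prob: Dict[str, float],
                threshold: float = 0.5) -> List[int]:
    """Return candidate cut positions, always including 0 and len(word).

    ASSUMPTION: a cut between word[i-1] and word[i] is proposed when the
    (hypothetical) probability stored for that character pair reaches
    the threshold.
    """
    cuts = [0]
    for i in range(1, len(word)):
        pair = word[i - 1] + word[i]
        if boundary_prob.get(pair, 0.0) >= threshold:
            cuts.append(i)
    cuts.append(len(word))
    return cuts


def segment_compound(word: str, lexicon: Set[str],
                     boundary_prob: Dict[str, float],
                     threshold: float = 0.5) -> Optional[List[str]]:
    """Split `word` into lexicon entries, cutting only at breakpoints.

    Returns the spanning segmentation with the fewest components, or
    None when no sequence of lexicon entries covers the whole input.
    """
    cuts = breakpoints(word, boundary_prob, threshold)
    # best[j] is the shortest component list covering word[:cuts[j]].
    best: List[Optional[List[str]]] = [None] * len(cuts)
    best[0] = []
    for j in range(1, len(cuts)):
        for i in range(j):
            if best[i] is None:
                continue
            piece = word[cuts[i]:cuts[j]]
            if piece in lexicon:  # "linkable component" in this sketch
                candidate = best[i] + [piece]
                if best[j] is None or len(candidate) < len(best[j]):
                    best[j] = candidate
    return best[-1]


# Toy data, invented for this example only.
toy_lexicon = {"wind", "schutz", "scheibe"}
toy_boundary_prob = {"ds": 0.9, "zs": 0.8, "ch": 0.1}

result = segment_compound("windschutzscheibe", toy_lexicon, toy_boundary_prob)
print(result)
```

    Restricting the search to breakpoint positions keeps the number of substrings examined small compared with trying every character offset, which appears to be the efficiency argument the abstract makes.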
