Methods are described for classifying (categorizing) text documents written in various languages. Before documents are classified, language-independent semantic structures are constructed for them. These structures reflect the lexical, morphological, syntactic, and semantic properties of the documents. The proposed methods can therefore perform cross-language text classification based on document properties that reflect meaning rather than surface wording. The methods are applicable to genre classification, topic detection, news analysis, authorship analysis, etc.
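A toy sketch of why classifying over language-independent representations enables cross-language classification. It is not the method described here. Full semantic analysis is replaced by a hypothetical hand-written `LEXICON` that maps English and Spanish words to invented concept identifiers such as `SPORT_EVENT`. The classifier is an assumed nearest-centroid scheme using cosine similarity. Because training and classification both operate on concepts rather than words, a model trained only on English text can label Spanish text.

```python
from collections import Counter
import math

# Hypothetical toy lexicon: surface words in two languages mapped to invented
# language-independent concept IDs (a crude stand-in for real semantic structures).
LEXICON = {
    # English
    "match": "SPORT_EVENT", "goal": "SCORE", "team": "TEAM",
    "market": "MARKET", "shares": "STOCK", "bank": "BANK",
    # Spanish
    "partido": "SPORT_EVENT", "gol": "SCORE", "equipo": "TEAM",
    "mercado": "MARKET", "acciones": "STOCK", "banco": "BANK",
}


def concepts(text: str) -> Counter:
    """Map a document to a bag of concept IDs; words not in the lexicon are ignored."""
    return Counter(LEXICON[w] for w in text.lower().split() if w in LEXICON)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two concept-count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(labeled_docs):
    """Accumulate concept counts per class label (an unnormalized class centroid)."""
    centroids = {}
    for text, label in labeled_docs:
        centroids.setdefault(label, Counter()).update(concepts(text))
    return centroids


def classify(text, centroids):
    """Return the label whose centroid is most similar to the document's concepts."""
    doc = concepts(text)
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


if __name__ == "__main__":
    # Train on English documents only.
    training = [
        ("The team scored a late goal in the match", "sports"),
        ("Bank shares fell as the market opened", "finance"),
    ]
    model = train(training)
    # Classify Spanish documents with the English-trained model.
    print(classify("El equipo marcó un gol en el partido", model))            # sports
    print(classify("Las acciones del banco subieron en el mercado", model))   # finance
```

The word-to-concept lookup is deliberately simplistic. Representations built from lexical, morphological, syntactic, and semantic analysis would also resolve ambiguity and relations between words, which a flat lexicon cannot do.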