Thursday, February 14, 2013

Language Database

    “… LSA is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text…. LSA can be construed in two ways: (1) simply as a practical expedient for obtaining approximate estimates of the contextual usage substitutability of words in larger text segments, and of the kinds of - as yet incompletely specified - meaning similarities among words and text segments that such relations may reflect, or (2) as a model of the computational processes and representations underlying substantial portions of the acquisition and utilization of knowledge. We next sketch both views…”
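
To make the "statistical computations applied to a large corpus" concrete, here is a minimal sketch of the usual LSA recipe: build a term-document count matrix, keep only its strongest latent dimensions, and compare words in that reduced space. The corpus, the names `DOCS`, `TERMS`, `lsa_vectors`, and the choice of two dimensions are all assumptions made for illustration. The decomposition is done with plain power iteration on the term-term Gram matrix to stay dependency-free; a real system would more likely call a library SVD.

```python
import math

# Hypothetical toy corpus: "cat" and "dog" never share a document,
# but both co-occur with "pet".
DOCS = ["cat pet", "dog pet", "stock market", "stock market", "stock"]
TERMS = sorted({w for d in DOCS for w in d.split()})

# Term-document count matrix: one row per term, one column per document.
A = [[d.split().count(t) for d in DOCS] for t in TERMS]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def top_eigenpairs(G, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    G = [row[:] for row in G]
    pairs = []
    for _ in range(k):
        v = normalize([1.0] * len(G))
        for _ in range(iters):
            v = normalize([dot(row, v) for row in G])
        lam = dot(v, [dot(row, v) for row in G])
        pairs.append((lam, v))
        # Deflate: remove the component just found so the next pass finds a new one.
        for i in range(len(G)):
            for j in range(len(G)):
                G[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_vectors(A, k):
    """Map each term to k coordinates: left singular vector entries scaled by singular values."""
    G = [[dot(ri, rj) for rj in A] for ri in A]  # A * A^T
    pairs = top_eigenpairs(G, k)
    return {
        term: [math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
        for i, term in enumerate(TERMS)
    }


RAW = dict(zip(TERMS, A))
VEC = lsa_vectors(A, k=2)

print("raw cosine(cat, dog):", cosine(RAW["cat"], RAW["dog"]))
print("LSA cosine(cat, dog):", round(cosine(VEC["cat"], VEC["dog"]), 3))
print("LSA cosine(cat, stock):", round(cosine(VEC["cat"], VEC["stock"]), 3))
```

In the raw counts "cat" and "dog" look unrelated because they never appear in the same document. In the reduced space they end up pointing the same way, since both are used alongside "pet". That is the "contextual usage substitutability" the quotation refers to, shown at toy scale.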

    “… WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet's structure makes it a useful tool for computational linguistics and natural language processing….”
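
The synset idea can be sketched as a small data structure: each synset groups the words that express one concept, and links point to more general synsets. Everything below is an assumption made for illustration: the `Synset` class, the identifiers such as `bank.n.01`, and the glosses are invented toy entries, not WordNet data and not the API of any WordNet library.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    name: str    # hypothetical identifier
    lemmas: list  # words that share this one concept
    gloss: str   # short definition
    hypernyms: list = field(default_factory=list)  # names of more general synsets


# Hypothetical toy entries written for this sketch; not copied from WordNet.
SYNSETS = {s.name: s for s in [
    Synset("institution.n.01", ["institution", "establishment"],
           "an organization founded for a purpose"),
    Synset("bank.n.01", ["bank", "depository"],
           "a financial institution", ["institution.n.01"]),
    Synset("slope.n.01", ["slope", "incline"],
           "an inclined surface"),
    Synset("bank.n.02", ["bank", "riverbank"],
           "sloping land beside water", ["slope.n.01"]),
]}


def synsets_for(word):
    """All synsets that list the word as a lemma (one per sense)."""
    return [s for s in SYNSETS.values() if word in s.lemmas]


def hypernym_chain(name):
    """Follow the first hypernym link upward until there is none left."""
    chain = []
    current = SYNSETS[name]
    while current.hypernyms:
        current = SYNSETS[current.hypernyms[0]]
        chain.append(current.name)
    return chain


for s in synsets_for("bank"):
    print(s.name, s.lemmas, "->", hypernym_chain(s.name))
```

An ambiguous word such as "bank" belongs to more than one synset, and each sense sits in a different part of the network. That separation is what makes the structure useful for sense-aware processing.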

WordNet Similarity
    “… This is a Perl module that implements a variety of semantic similarity and relatedness measures based on information found in the lexical database WordNet. In particular, it supports the measures of Resnik, Lin, Jiang-Conrath, Leacock-Chodorow, Hirst-St.Onge, Wu-Palmer, Banerjee-Pedersen, and …”
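
As an illustration of what one of these measures computes, here is a minimal sketch of the Wu-Palmer score, written in Python rather than Perl. It does not use the module's API. The taxonomy in `PARENT` is a hypothetical toy hierarchy, not WordNet, and depth is counted in nodes so that the root has depth 1. The score is 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest ancestor the two concepts share.

```python
# Toy is-a taxonomy (hypothetical; NOT taken from WordNet): child -> parent.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
    "tree": "plant",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b, so the first hit is the deepest
        if node in ancestors_of_a:
            return node
    raise ValueError("no common ancestor")


def wu_palmer(a, b):
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wu_palmer("dog", "cat"))      # share "mammal"
print(wu_palmer("dog", "sparrow"))  # share only "animal"
print(wu_palmer("dog", "tree"))     # share only the root
```

Concepts that meet deep in the hierarchy score close to 1, while concepts whose only shared ancestor is the root score low.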

MRC Psycholinguistic Database
    “… The MRC Psycholinguistic Database is a machine usable dictionary containing 150837 words with up to 26 linguistic and psycholinguistic attributes for each - psychological measures are recorded for only about 2500 words. The dictionary may be of use to researchers in psychology or linguistics to develop sets of experimental stimuli, or those in artificial intelligence and computer science who require psychological and linguistic descriptions of words….”

    “… Coh-Metrix calculates the coherence of texts on a wide range of measures. It replaces common readability formulas by applying the latest in computational linguistics and linking this to the latest research in psycholinguistics….”

    “… The World Atlas of Language Structures (WALS) is a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of 55 authors (many of them the leading authorities on the subject)….”

    “… CHILDES is the child language component of the TalkBank system. TalkBank is a system for sharing and studying conversational interactions….”

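꼬꼬마 (Kkokkoma) Morphological Analyzer
    “… The 꼬꼬마 project is an initiative of the IDS (Intelligent Data Systems) laboratory at Seoul National University to build a variety of modules and resources for natural language processing. It is broadly divided into two parts: 'development of a morphological analyzer and natural language processing modules' and a 'Sejong corpus utilization system' …”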


