テキストから「かな表記の語彙」を抽出する試み ―コーパスを利用して古典語彙を収集するために―

北村, 啓子; キタムラ, ケイコ; KITAMURA, Keiko

https://doi.org/10.24619/00000603

Item type

Departmental Bulletin Paper

Date made public

2014-11-10

Title

テキストから「かな表記の語彙」を抽出する試み ―コーパスを利用して古典語彙を収集するために―

Title (English)

A Study for Abstracting a lexicon in KANA from Classical Japanese Literature Corpus

Language

jpn

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

departmental bulletin paper

ID registration

10.24619/00000603

ID registration type

JaLC

Author

北村, 啓子 (ja-Kana: キタムラ, ケイコ; en: KITAMURA, Keiko)

WEKO 16472

Abstract

Description type

Abstract

Description

In text processing of classical Japanese, orthographic variation is a pressing problem, and electronic lexical resources that can absorb it (thesauri, variant-spelling dictionaries, reading dictionaries, proper-noun dictionaries) are badly needed. Individual researchers have been actively digitizing classical texts for about ten years (the National Institute of Japanese Literature had been experimenting with this since nearly twenty years ago), and several projects to build large-scale text databases have been launched. The classical texts produced by these efforts now amount to a kind of large-scale corpus of classical Japanese.

Extracting classical vocabulary directly from this raw corpus can be expected to yield word lists that are of practical use in processing classical texts, something that dictionaries compiled top-down do not offer.

Many important words in classical Japanese are written in kana; "mono no aware" (もののあはれ) is the obvious example. This paper targets such "kana-written vocabulary": by analyzing the texts currently available, it examines methods for extracting vocabulary from a corpus and tests how well such extraction works.
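The abstract does not describe the extraction procedure itself. Purely as an illustration of what collecting kana-written vocabulary candidates from raw text can look like, the sketch below finds maximal runs of hiragana and counts every substring of a few characters inside them, so that strings recurring across the corpus stand out. The function name `kana_ngram_counts`, the length bounds, and the sample sentence are assumptions made for this example, not the method used in the paper.

```python
import re
from collections import Counter

# Hiragana letters (U+3041-U+3096) plus the iteration marks ゝ and ゞ.
KANA_RUN = re.compile(r"[\u3041-\u3096\u309D\u309E]+")


def kana_ngram_counts(text, min_n=2, max_n=6):
    """Count hiragana substrings of length min_n..max_n.

    Hypothetical helper: each maximal hiragana run is scanned for all
    substrings in the given length range. Frequent substrings are only
    *candidates* for words; no real word segmentation is attempted.
    """
    counts = Counter()
    for run in KANA_RUN.findall(text):
        for n in range(min_n, max_n + 1):
            for i in range(len(run) - n + 1):
                counts[run[i:i + n]] += 1
    return counts


# Made-up sample text, not taken from any corpus.
sample = "月を見てもののあはれを知る。花を見てもののあはれを知る。"
counts = kana_ngram_counts(sample)
print(counts["もののあはれ"])  # 2
```

Substring counting like this overgenerates heavily (particles and inflectional endings attach to every candidate), so a real pipeline would need further filtering, for example against existing dictionaries or by comparing how often a candidate appears with different neighbors.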

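Bibliographic information

国文学研究資料館紀要
en : The Bulletin of The National Institute of Japanese Literature

No. 25, pp. 1-22, issued 1999-03-29

Publisher

National Institute of Japanese Literature (国文学研究資料館)

ISSN

Source identifier type

ISSN

Source identifier

1880-2230

Format

Description type

Other

Description

pdf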
