BANGLA ROOT WORD CORPUS
Kazi Wohiduzzaman* and Sabir Ismail
ABSTRACT
Bangla is a very rich language, spoken by two hundred and thirty million people worldwide. Computerization of this language is therefore an urgent need today. Unfortunately, because of resource scarcity, very little research has been done in this field, and Bangla computerization has not reached a satisfactory level compared to other languages. In this paper we describe an efficient algorithm for finding the root of a Bangla word. The resulting root word corpus stores each valid root word together with its inflectional forms. We collected Bangla words from sources such as Bangla newspapers and blogs. Our proposed procedure first collected 200,000 words, filtered out incorrect word forms, and removed duplicate words from the list, finally storing 60,000 unique Bangla words in a word list. This paper presents an efficient technique for finding the true root of a word. In natural language processing, finding the true root of a word is important for tasks such as information retrieval and document categorization.
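The abstract does not state the filtering criteria or the root-finding rules, so the following Python sketch only illustrates the general shape of such a pipeline under explicit assumptions. It de-duplicates a raw token list, keeps only tokens made entirely of non-digit characters from the Bengali Unicode block (U+0980 to U+09FF), and inverts a root-to-inflections corpus into a form-to-root lookup table. The Unicode-block filter, the NFC normalization, and the function names `is_bangla_word`, `build_word_list`, `build_root_index`, and `find_root` are hypothetical illustrations, not the authors' algorithm.

```python
import unicodedata

# Bengali Unicode block; using it as a validity filter is an assumption of this sketch.
BENGALI_START, BENGALI_END = 0x0980, 0x09FF


def is_bangla_word(token: str) -> bool:
    """Hypothetical filter: every character must be a non-digit in the Bengali block."""
    return bool(token) and all(
        BENGALI_START <= ord(ch) <= BENGALI_END and not ch.isdigit() for ch in token
    )


def build_word_list(raw_tokens):
    """Normalize, filter, and de-duplicate raw tokens, keeping first-seen order."""
    seen = set()
    words = []
    for token in raw_tokens:
        word = unicodedata.normalize("NFC", token.strip())
        if is_bangla_word(word) and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def build_root_index(corpus):
    """Invert a root -> inflectional-forms corpus into a form -> root table."""
    index = {}
    for root, forms in corpus.items():
        root_n = unicodedata.normalize("NFC", root)
        index[root_n] = root_n  # a root maps to itself
        for form in forms:
            index[unicodedata.normalize("NFC", form)] = root_n
    return index


def find_root(word, index):
    """Return the stored root of a word, or None if the word is not in the corpus."""
    return index.get(unicodedata.normalize("NFC", word))


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper's corpus.
    print(build_word_list(["বই", "বই", "book", " বইয়ের ", "১২৩"]))
    index = build_root_index({"বই": ["বইয়ের", "বইগুলো"]})
    print(find_root("বইগুলো", index))
```

Because the lookup is a plain dictionary, retrieving a root is a single hash lookup once the corpus is built, and a word absent from the corpus returns `None` rather than a guessed root. A real system would still need rules (for example, suffix stripping) to populate the corpus; this sketch covers only storage and lookup.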