

KANNADA PHONETIC TRANSCRIPTION: NLP

1NEESHALI R. NANDARGE, 2MALLAMMA V. REDDY, 3SUMAN GOUDA, 4GAYATRI PATIL

1,2,3,4Department of Computer Science, Rani Channamma University, Vidyasangam, Belagavi-591156, India
E-mail: [email protected], [email protected], [email protected], [email protected]

Abstract - The world has opened up to learning natural languages, particularly in India, and work on natural languages has become crucial. In India, 1,652 languages were identified in 1961, and 880 are believed to still be in use. India has 22 official languages; one of them is Kannada, which is widely used in the state of Karnataka [1]. Nearly 40 million Kannadigas and 50.8 million speakers have been reported [2]. Natural language processing is a way to communicate with intelligent systems using natural language techniques. This paper presents Kannada phonetic transcription, which is a part of natural language processing. Phonetic transcription is the process of representing words, sentences or language by means of phonetic symbols. The paper presents a way to transcribe the Kannada language using UTF-8 encoding to pronounce phonemes.

Index terms - Phoneme, Phonetic, Transcription, UTF-8

I. INTRODUCTION

Natural language processing (NLP) is a way to communicate with intelligent systems using natural language techniques. Natural languages need processing so that intelligent systems can understand human language and act on human instructions. The input and output of NLP can be written text or speech. NLP can be implemented by dividing it into two parts: Natural Language Understanding (NLU) and Natural Language Generation (NLG) [3]. Natural Language Understanding attempts to understand the meaning behind written text and then produces data with a specific meaning. Natural Language Generation produces data that has been interpreted and analyzed. Generation needs tokenization of text. Tokenization is the task of chopping an input signal into parts so that a computer can process it. The tokens are converted into unique codes using standards such as ASCII for English, so that the computer can interpret them [4].

This paper presents Kannada phonetic transcription [13], in which the system deals with the Kannada language. Kannada is widely used in the state of Karnataka in India. India has 22 official languages; one of them is Kannada, which belongs to the Southern Indic group. The Kannada script shows the major features of the Brahmi-derived Indic scripts. Like other Southern Indic scripts, it has typically rounded letterforms. Kannada is written horizontally from left to right. The basic set of symbols in Kannada consists of 35 consonants and 14 vowels [5].

To translate the Kannada language into a machine-understandable form, the ISCII mechanism is used. ISCII stands for Indian Script Code for Information Interchange. It is an 8-bit code for representing Indian scripts. These codes are used for 10 Indian scripts: Assamese, Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu [6]. To map the characters to these codes, a standard called Unicode is used, which is necessary for creating the text. The system then encodes the text using UTF-8. A Unicode transformation format (UTF) is an algorithm that maps every Unicode code point to a unique byte sequence [7]. Each UTF is reversible, so every UTF supports lossless round tripping: mapping from any Unicode coded character sequence X to a sequence of bytes and back will produce X again. The system architecture is shown in Figure 1:

Figure 1: The System Architecture
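The lossless round trip that UTF-8 provides for Kannada text can be illustrated with a short sketch. This is not the authors' system; it is a minimal Python example that assumes the standard library only, and the sample word (the name "Kannada" in Kannada script) is chosen here purely for illustration.

```python
import unicodedata

# The word "Kannada" in Kannada script, written with escapes so that
# the source stays ASCII-only (sample word chosen for illustration).
text = "\u0c95\u0ca8\u0ccd\u0ca8\u0ca1"

# Step 1: every character of the text is a Unicode code point.
for ch in text:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Step 2: UTF-8 maps each code point to a unique byte sequence.
# Code points in the Kannada block take three bytes each.
encoded = text.encode("utf-8")
print(len(text), "code points ->", len(encoded), "bytes")

# Step 3: decoding the bytes reproduces the original sequence,
# which is the round-tripping property described in the text.
decoded = encoded.decode("utf-8")
print("round trip ok:", decoded == text)
```

Each Kannada code point occupies three bytes in UTF-8, whereas ASCII characters occupy one byte each.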

II. TERMINOLOGIES FOR PHONETIC TRANSCRIPTION

The terminology related to Kannada phonetic transcription is given below:
• UTF-8 is a character encoding that can be as compact as ASCII.
• Phoneme: the smallest contrastive part of a word, a change in which may cause a change of meaning. For example, if the sound [kaa] in the word [kaadu] is replaced by [aa], the meaning changes: the word becomes [aadu]. Both sounds are phonemes.
• Phonological awareness: the ability of a listener to recognize the sound structure of words.

Proceedings of 35th IRF International Conference, 07th May, 2017, Bengaluru, India 19


For example, in [alu] and [aalu], [a] and [aa] are the phonemes that may lead to ambiguity if the listener misinterprets them.
• Phonemic awareness: the ability of a listener to hear, identify and manipulate phonemes based on syntactic structure. For example, [hennu] can be interpreted as [hannu] by manipulating one vowel into another ([e] to [a]).
• Phonemic transcription: a type of phonetic transcription that uses fewer phonetic symbols. For example, the word [rushi] can be pronounced differently by native speakers.
• Phonetic spelling: confirming the spelling of a word by pronouncing each letter as a word. For example, for [maganaagi] you would say a word for each letter in turn, such as [mara], [gana], [vanar] and [gida].
• Phonetic transcription: the visual representation of speech sounds. For example, a Kannada word is written using phonetic symbols [8]. Other types of phonetic transcription may use different symbols.
• Phonetics: deals with the sounds of human speech.
• Phonology: the study of the systems of phonemes in particular languages.
• Stress: the relative emphasis that may be given to certain sounds or syllables in a word.

III. TYPES OF NOTATIONS

Phonetic transcription needs a representation of phonemes in order to process them. The types of notation [10] that can be used as the basic unit of natural language processing for phonetic transcription are given below:

Alphabetic: Most phonetic transcription is based on the assumption that linguistic sounds are tokenized into discrete units that can be represented by symbols. Alphabets can be used as such symbols.

• Kannada Alphabets: The Kannada script is a phonemic abugida of forty-nine letters.

Kannada Vowels: There are 14 vowels in the Kannada language. Except for two of them, the vowels come in pairs, each with a short and a long version. The vowel (RU) is no longer in use [9]. There are two types of vowels (Swaras), depending on the time taken to pronounce them:
• Hrasva Swara: an independent vowel that can be pronounced in a single matra.
• Deerga Swara: an independent vowel that can be pronounced in two matras.
The script also has the Anuswara and the Visarga.

Kannada Consonants (Vyanjanas): Consonants depend on vowels to take an independent form. Consonants can be divided into Vargeeya Vyanjanas and Avargeeya Vyanjanas.

Kannada Consonantal Vowels are consonants written using extra strokes that relate to the vowels added to them. Kannada Consonantal Vowels are shown in Figure 2.

Fig 2: Kannada Consonantal Vowels

When a dependent consonant combines with an independent vowel, an Akshara is formed.

Consonant + Vowel ---> Letter

This rule combines all the consonants (Vyanjanas) with the existing vowels (Swaras) to form the letters (Aksharas) of the Kannada alphabet.

Iconic: In iconic phonetic notation, the shapes of the phonetic characters are designed so that they visually represent the position of the articulators in the vocal tract.
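The Akshara rule from the Kannada alphabet discussion (Consonant + Vowel ---> Letter) can be sketched in terms of Unicode code points. The paper does not describe an implementation, so the following is only an illustrative Python sketch: the function name `form_akshara` is hypothetical, and the approach of pairing each independent vowel with its dependent vowel sign through the Unicode character names is an assumption made here, not the authors' method.

```python
import unicodedata

LETTER_PREFIX = "KANNADA LETTER "
SIGN_PREFIX = "KANNADA VOWEL SIGN "


def form_akshara(consonant: str, vowel: str) -> str:
    """Combine a Kannada consonant with an independent vowel (hypothetical helper).

    In Unicode, a consonant letter already carries the inherent vowel 'a'.
    Any other vowel is attached as a dependent vowel sign, which is found
    here by swapping the prefix of the vowel's Unicode character name.
    """
    name = unicodedata.name(vowel)
    if not name.startswith(LETTER_PREFIX):
        raise ValueError(f"not a Kannada letter: {vowel!r}")
    suffix = name[len(LETTER_PREFIX):]
    if suffix == "A":
        # Inherent vowel: the bare consonant is already the akshara.
        return consonant
    # unicodedata.lookup raises KeyError if no matching vowel sign exists,
    # for example when a consonant is passed in place of a vowel.
    return consonant + unicodedata.lookup(SIGN_PREFIX + suffix)


ka = "\u0c95"  # KANNADA LETTER KA
for vowel in "\u0c85\u0c86\u0c87\u0c88":  # independent vowels A, AA, I, II
    akshara = form_akshara(ka, vowel)
    print(akshara, [f"U+{ord(c):04X}" for c in akshara])
```

Looking the vowel sign up by name keeps the sketch free of a hand-written mapping table. A production system would more likely use an explicit table, which could also cover the Anuswara and Visarga.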


Iconic phonetic notation reproduces speech spectrographically. Here, an articulator [13] is a movable organ, such as the tongue, lips, or uvula, whose action is involved in the production of speech sounds. The articulators are shown in Figure 3.

Fig 3: The articulators

This is unlike alphabetic notation, where the correspondence between character shape and articulator position is arbitrary. The notation is potentially more flexible than alphabetic notation in showing more shades of pronunciation. An example of iconic phonetic notation is the Visible Speech system, invented by Alexander Melville Bell [11], which uses a set of phonetic symbols based on symbols for articulatory position.

Analphabetic: Analphabetic phonetic notation represents sounds by composite signs rather than by single letters or symbols. It uses long sequences of symbols to precisely describe the component features of an articulatory gesture (MacMahon (1996:842–844)). Analphabetic notation is more descriptive but less practical for many purposes, so this type of notation is uncommon. Examples are shown in Table 1 [12].

Table 1: Example of Analphabetic notation

CONCLUSION

This paper presents a detailed idea for generating Kannada phonetics based on Kannada phonetic transcription, using a tokenization technique to produce sound. ISCII is used to translate the Kannada language into a machine-understandable form, and UTF-8 is used to encode it. Future work focuses on the generation of sound (audacity) using a phonetically balanced dictionary, applying tokenization techniques such as unigram, bigram, trigram and four-gram mapping to Kannada tokens such as letters and numbers.

REFERENCES

[1] https://www.readmeindia.com/how-many-languages-in-india
[2] https://en.wikipedia.org/wiki/Kannada
[3] https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_natural_language_procssing.htm
[4] http://www.mind.ilstu.edu/curriculum/protothinker/natural_language_processing.php
[5] http://www.ciil-lisindia.net/Kannada/Kan_script.html
[6] https://en.wikipedia.org/wiki/Indian_Script_Code_for_Information_Interchange
[7] http://unicode.org/faq/utf_bom.html
[8] http://unicode.org/faq/utf_bom.html
[9] http://dictionary.tamilcube.com/alphabets/learn-kannadaalphabets.aspx
[10] http://wikivisually.com/wiki/Phonetic_notation
[11] https://en.wikipedia.org/wiki/Visible_Speech
[12] http://www.shabdkosh.com/kn/
[13] http://www.ijera.com/papers/Vol7_issue1/Part4/M0701047780.pdf


