Machine Learning-based Sentiment Analysis of Automatic Indonesian Translations of English Movie Reviews

Franky
Faculty of Computer Science, University of Indonesia
[email protected]

Ruli Manurung
Faculty of Computer Science, University of Indonesia
[email protected]

Abstract—Sentiment analysis is the automatic classification of the overall opinion conveyed by a text towards its subject matter. This paper discusses an experiment in the sentiment analysis of a collection of movie reviews that have been automatically translated to Indonesian. Following [1], we employ three well-known classification techniques: naive Bayes, maximum entropy, and support vector machines, using unigram presence and frequency values as the features. The translation is achieved through machine translation and simple word substitutions based on a bilingual dictionary constructed from various online resources. Analysis of the Indonesian translations yielded an accuracy of up to 78.82%, still short of the accuracy for the English documents (80.09%), but satisfactorily high given the simple translation approach.

I. INTRODUCTION

The task of sentiment analysis, i.e. the automatic analysis and classification of the overall opinion conveyed by a text towards its subject matter, has received much interest in the last few years [2]. Applications range from aggregating consumer feedback on commercial products and measuring public opinion on political issues to filtering inflammatory discussions on mailing lists and forums.

This paper describes an experiment on the sentiment analysis of a collection of movie reviews that have been automatically translated to Indonesian. Since the conveying of sentiment and opinion through human language is culturally dependent [3], we seek to explore how sentiment analysis methods perform across languages, specifically Indonesian.

In Sections II and III we briefly discuss previous work on sentiment analysis, particularly supervised machine learning approaches, before outlining the design of our experiment in Section IV. Subsequently, we present the experiment results and analysis in Section V.

II. PREVIOUS WORK

There have been several different approaches to sentiment analysis. Turney [4] performs sentiment analysis at the phrase level by first constructing a lexicon that measures the similarity of adjectives and adverbs to the words excellent and poor using pointwise mutual information (PMI).

The analysis was done on documents in various domains, i.e. automotive, banking, films, and travel, with accuracy ranging from 66% to 84%. Pang et al. [1] classify movie reviews at the document level as having positive or negative sentiment using supervised learning techniques. A set of movie reviews that have previously been determined to be either positive or negative is used as training data for several well-known machine learning algorithms. The features used are unigrams, bigrams, and other information such as part of speech. The accuracy obtained ranged from 72% to 83%. More recent work uses more sophisticated methods to analyse sentiment information such as syntactic relationships and document structure, e.g. the work of Kaji and Kitsuregawa [5], which builds a collection of polar sentences from a massive collection of Japanese webpages, with the goal of developing a sentiment lexicon.

III. SUPERVISED LEARNING OF SENTIMENT

In our work, we follow the approach taken by Pang et al. in using supervised machine learning techniques, namely naive Bayes, maximum entropy classification, and support vector machines, to analyse a corpus that has previously been annotated with sentiment information [1]. The idea of sentiment classification is to assign, to document $d$, the correct sentiment $s^* = \arg\max_s P(s|d)$, where $s \in \{pos, neg\}$. The probability distribution $P(s|d)$ is obtained from a training corpus.

To implement these supervised machine learning methods, the movie reviews must be represented as a collection of features. Let $f_1, \ldots, f_m$ be a predefined set of $m$ features that can belong to an article, and $n_i(d)$ be the number of times $f_i$ appears in document $d$. Subsequently, $d$ can be represented as the vector $d = (n_1(d), n_2(d), \ldots, n_m(d))$. Theoretically, any aspect of a document can be used as a possible feature, but one simplifying approach that is common in corpus linguistics is to treat these reviews as "bags of words", i.e. the features used are the list of unique words, or unigrams, appearing in a corpus. This is the approach we use.
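To make this representation concrete, here is a minimal sketch of the bag-of-words vectorisation described above; the function names and the tokenised toy corpus are ours, not from the paper.

```python
from collections import Counter

def build_feature_set(documents, min_freq=1):
    """Collect the unigram feature set f_1, ..., f_m from a corpus.

    `documents` is a list of token lists; unigrams occurring fewer
    than `min_freq` times in the whole collection are dropped.
    """
    totals = Counter(token for doc in documents for token in doc)
    return sorted(t for t, c in totals.items() if c >= min_freq)

def vectorise(doc, features):
    """Represent document d as the vector (n_1(d), ..., n_m(d))."""
    counts = Counter(doc)
    return [counts[f] for f in features]

docs = [["this", "movie", "is", "great"], ["a", "boring", "movie"]]
features = build_feature_set(docs)
print(vectorise(docs[0], features))  # one count per feature, mostly 0s and 1s
```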

A. Naive Bayes

This is a relatively simple yet effective supervised learning method that computes the conditional probability of a particular classification given a conjunction of features by making the simplifying assumption that the values of $f_i$ for $i = 1 \ldots m$ are conditionally independent given the correct classification of $d$ [10]. Bayes' rule states that

$$P(s|d) = \frac{P(s)\,P(d|s)}{P(d)}.$$

Furthermore, $P(d)$ can be omitted if we are simply computing $s^*$. Using the above conditional independence assumption, naive Bayes classification makes the following approximation:

$$P(d|s) \approx \prod_{i=1}^{m} P(f_i|s).$$

A slight variation is multinomial naive Bayes, where different occurrences of the same word in a document are treated as separate events [1]:

$$P(d|s) \approx \prod_{i=1}^{m} P(f_i|s)^{n_i(d)}.$$

The $P(f_i|s)$ probabilities are obtained by counting the frequency of occurrences of the unigrams within a document. Although a very simple approach, with assumptions that disregard word order, syntax, and discourse coherence, naive Bayes has been shown to be effective in similar applications such as text classification [6].
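The following is a minimal sketch of multinomial naive Bayes training and classification under these formulas. Add-one (Laplace) smoothing is our assumption; the paper does not state how zero counts are handled.

```python
import math
from collections import Counter

def train_nbm(docs, labels):
    """Estimate log P(s) and log P(f_i|s) from raw token counts,
    with add-one smoothing (an assumption, see above)."""
    vocab = {t for d in docs for t in d}
    prior, cond = {}, {}
    for s in set(labels):
        class_docs = [d for d, l in zip(docs, labels) if l == s]
        counts = Counter(t for d in class_docs for t in d)
        denom = sum(counts.values()) + len(vocab)
        prior[s] = math.log(len(class_docs) / len(docs))
        cond[s] = {t: math.log((counts[t] + 1) / denom) for t in vocab}
    return vocab, prior, cond

def classify_nbm(doc, vocab, prior, cond):
    """Return argmax_s [ log P(s) + sum_i n_i(d) log P(f_i|s) ];
    iterating over tokens with repetition realises the n_i(d) exponent."""
    scores = {s: prior[s] + sum(cond[s][t] for t in doc if t in vocab)
              for s in prior}
    return max(scores, key=scores.get)

docs = [["great", "fun", "great"], ["boring", "dull"]]
vocab, prior, cond = train_nbm(docs, ["pos", "neg"])
print(classify_nbm(["great", "movie"], vocab, prior, cond))  # -> 'pos'
```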

B. Maximum Entropy

One of the main deficiencies of the naive Bayes approach is the simplifying conditional independence assumption. An alternative method, maximum entropy classification, makes no such assumption. Maximum entropy models sentiment classification as follows [7]:

$$p^*(s, d) = \pi \prod_{j=1}^{n} \alpha_j^{F_j(s,d)}$$

where $p^*(s, d)$ states the probability of document $d$ having sentiment $s$ given a maximum entropy probability distribution. $F_j(s, d)$ is a feature function for $f_j$ given document $d$ and sentiment $s$, defined as follows:

$$F_j(s, d) = \begin{cases} 1 & \text{if } f_j \text{ appears in } d \text{ with sentiment } s \\ 0 & \text{otherwise} \end{cases}$$

The $\alpha_j$ parameters must be set so as to maximize the entropy of the probability distribution subject to the constraints imposed by the values of the $F_j$ feature functions observed from the training data. These parameters are trained using the Generalized Iterative Scaling (GIS) algorithm [7]. The underlying philosophy is to choose the model that makes the fewest assumptions about the data whilst still remaining consistent with it.
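Maximum entropy classification with binary feature functions of this form is equivalent in model family to multinomial logistic regression, so a rough stand-in for the OpenNLP MaxEnt package used in this paper (which trains with GIS) can be sketched with scikit-learn; note the optimiser differs, and the toy data and parameter choices here are ours.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["this movie is great fun", "a dull and boring movie"]
train_labels = ["pos", "neg"]

# binary=True yields 0/1 unigram indicators, mirroring F_j(s, d)
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(train_texts)

# L-BFGS-trained logistic regression as a stand-in for GIS training
clf = LogisticRegression(max_iter=1000).fit(X, train_labels)
print(clf.predict(vectorizer.transform(["great fun movie"])))
```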

C. Support Vector Machines

The underlying idea of support vector machines is to compute a hyperplane that linearly separates sample points into two classes based on their classification, in this case the sentiment values of pos and neg. Given the actual training data, SVMs project them into a high-dimensional space in which they are linearly separable [8]. The computed hyperplane is the one that maximizes the margin between the data points. This separation can be relaxed with the use of 'soft' margins to allow for some amount of outliers in the data. Sentiment analysis is modelled as a binary classification where positive sentiment is represented as +1 and negative sentiment as -1.

IV. DESIGN: INDONESIAN SENTIMENT

To our knowledge, there has been no prior work in experimenting with sentiment analysis of Indonesian documents. Given that the conveying of sentiment and opinion is culturally dependent [3], it is certainly an open research question whether methods proven to work in a particular language perform equally well in another. Unfortunately, the most prevalent approach adopted in sentiment analysis, i.e. supervised learning, requires a significant knowledge source in the form of a corpus of known positive and negative sentiment articles. Due to the lack of readily available content online, our initial exploration suggested that constructing such a resource for Indonesian requires considerable time and effort which we were not able to afford. Thus, our approach is to perform an initial experiment by automatically translating an existing reference corpus into Indonesian and comparing the results of sentiment analysis in the original language with those obtained from the Indonesian translations.

Essentially, we replicate the experiment reported in [1], but augment it by performing an analysis on automatic Indonesian translations, and by experimenting with several other variables. Having first created several versions of Indonesian translations (Section IV-A), we prepare for supervised learning with feature extraction (Section IV-B), and use the four different classification methods presented in Sections III-A to III-C: naive Bayes, multinomial naive Bayes, maximum entropy classification, and support vector machines.

A. Translating sentiment corpora

Our reference corpus consists of user-submitted movie reviews from the Internet Movie Database (www.imdb.com), which have been cleaned up (stripped of HTML tags, tokenised) and determined to be of positive or negative sentiment based on the existence of explicit markings, e.g. '***1/2', '7/10', etc. For more details, see polarity dataset v0.9 from www.cs.cornell.edu/people/pabo/movie-review-data. In total, there are 700 positive and 700 negative reviews in the corpus. This reference corpus will be referred to as the ENGLISH corpus.

Following [1], we attempt to account for the sentiment value encoded by negation (cf. the obvious difference in stating something is very good and not very good) by prefixing all words between a negation word (e.g. not, isn't, don't, etc.) and the first punctuation mark subsequently encountered with the tag NOT_. Thus, for example, given the sentence "this movie isn't good at all .", the extracted features will be this, movie, isn't, NOT_good, NOT_at, and NOT_all. Applying this method, we obtain a new corpus variation which we refer to as the ENGLISHNOT corpus.
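A minimal sketch of this negation-tagging step is given below; the set of negation words is illustrative (the paper only lists "not, isn't, don't, etc."), and the punctuation set is our assumption.

```python
NEGATION_WORDS = {"not", "isn't", "don't", "doesn't", "no", "never", "cannot"}
PUNCTUATION = {".", ",", "!", "?", ";", ":"}

def tag_negation(tokens):
    """Prefix every token between a negation word and the next
    punctuation mark with NOT_, following Pang et al. [1]."""
    tagged, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False
            tagged.append(tok)
        elif negating:
            tagged.append("NOT_" + tok)
        else:
            tagged.append(tok)
            if tok in NEGATION_WORDS:
                negating = True
    return tagged

print(tag_negation("this movie isn't good at all .".split()))
# ['this', 'movie', "isn't", 'NOT_good', 'NOT_at', 'NOT_all', '.']
```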

To carry out our experiment on sentiment analysis of Indonesian documents, we automatically translated the reference corpus using two methods. The first uses an off-the-shelf commercial translation system, TransTool v4.1. This is a closed-source proprietary program, thus there is little information on how it works. The resulting automatically created corpus is referred to as the TRANSTOOL corpus.

The second method is a naive word substitution using an English-Indonesian bilingual dictionary constructed from various online resources. Our bilingual dictionary contains 36,437 English words, and for each word there exists one or more Indonesian translations. Such 'translations' are sometimes corresponding words, but on occasion are informal explanations. Figure 1 shows some sample entries. Each line shows an English word directly followed by a '#' delimiter character, which in turn is followed by a list of possible Indonesian translations separated by colons (':'). Notice that the bilingual dictionary also contains explanations, e.g. the 'translation' for accountability, informal abbreviations, e.g. bertgjwb, and spelling errors, e.g. akutansi.

.......
account for#membukukan:menerangkan:menganggap:menjelaskan
account payable#piutang
accountability#keadaan yg dpt dipertanggung jawabkan
accountable#bertgjwb:bertanggung jawab
accountancy#pekerjaan pembukuan:akuntansi:akutansi
accountant#akuntan
accounting#menghitung:menerangkan:laporan:akutansi
.........

Fig. 1. Sample entries from the bilingual dictionary

For this word substitution-based translation process, we experimented by varying two factors. The first concerns which Indonesian translation to use when multiple options exist. The three alternatives tried are as follows:

• ALLTRANS: given an English word, all possible Indonesian translations are added to the translation result.
• FIRSTTRANS: given an English word, only the first possible Indonesian translation is added to the translation result.
• LASTTRANS: given an English word, only the last possible Indonesian translation is added to the translation result.

For example, if an English article contained the word accounting, ALLTRANS would add menghitung, menerangkan, laporan, and akutansi to the translation result, whereas FIRSTTRANS would simply add menghitung and LASTTRANS would only add akutansi (see Figure 1).

The second factor concerns the inclusion and exclusion of words in the original English article for which our bilingual dictionary contained no translations. In an INCLUDE corpus, any English word without an Indonesian translation is simply added to the automatic translation corpus, whereas in an IGNORE corpus it is not. For each of the three word substitution methods above, we create both the INCLUDE and IGNORE versions, resulting in six different corpus versions. Table I shows an example of a fragment of a review along with its corresponding text in the various other corpus versions.

TABLE I: EXAMPLE TRANSLATIONS

ENGLISH: in a world where children are genetically screened and filtered before birth , vincent is born with no artificial genetic help from his parents .

ENGLISHNOT: in a world where children are genetically screened and filtered before birth , vincent is born with no NOT_artificial NOT_genetic NOT_help NOT_from NOT_his NOT_parents .

TRANSTOOL: di ( dalam ) suatu dunia [ di mana/jika ] anak-anak genetically disaring dan disaring [ sebelum/di depan ] kelahiran , vincent adalah pembawaan sejak lahir tidak ( ada ) bantuan [ yang ] hal azas keturunan tiruan dari orang tua nya .

ALLTRANSINCLUDE: atas berkecimpung di di dalam pada yang menang pengaruh kedudukan suatu alam buana bumi butala darulfana dunia jagat johan manjapada di mana dimana tunas bangsa ialah secara genetis screened dan filtered baik di atas ke hadapan lebih lebih dulu mendahului sblm sdh telah sebelum sebelumnya terlebih dahulu kelahiran keturunan , vincent ialah bakat alam lahir pembawaan sejak kecil terlahir menyediakan menjunjung menghasilkan mengangkat menanggung memikul melahirkan dengan serta bukan tidak nomor tiruan artifisial buatan bikinan genetis yg berhubungan dgn azas keturunan menahan mencegah mengelakkan menghindari bantu bantuan inayat membantu membela mengembari menolong menunjang menyantuni menyumbang pembantuan pertolongan sambatan sambung tangan santunan berasal dari nya ibu bapak .

ALLTRANSIGNORE: atas berkecimpung di di dalam pada yang menang pengaruh kedudukan suatu alam buana bumi butala darulfana dunia jagat johan manjapada di mana dimana tunas bangsa ialah secara genetis dan baik di atas ke hadapan lebih lebih dulu mendahului sblm sdh telah sebelum sebelumnya terlebih dahulu kelahiran keturunan , ialah bakat alam lahir pembawaan sejak kecil terlahir menyediakan menjunjung menghasilkan mengangkat menanggung memikul melahirkan dengan serta bukan tidak nomor tiruan artifisial buatan bikinan genetis yg berhubungan dgn azas keturunan menahan mencegah mengelakkan menghindari bantu bantuan inayat membantu membela mengembari menolong menunjang menyantuni menyumbang pembantuan pertolongan sambatan sambung tangan santunan berasal dari nya ibu bapak .

FIRSTTRANSINCLUDE: atas suatu alam di mana tunas bangsa ialah secara genetis screened dan filtered baik kelahiran , vincent ialah bakat alam dengan bukan tiruan genetis menahan berasal nya ibu bapak .

FIRSTTRANSIGNORE: atas suatu alam di mana tunas bangsa ialah secara genetis dan baik kelahiran , ialah bakat alam dengan bukan tiruan genetis menahan berasal nya ibu bapak .

LASTTRANSINCLUDE: dalam suatu dunia dimana tunas bangsa ialah secara genetis screened dan filtered lebih dahulu kelahiran , vincent ialah melahirkan serta bukan bikinan yg berhubungan dgn azas keturunan bantuan dari nya ibu bapak .

LASTTRANSIGNORE: dalam suatu dunia dimana tunas bangsa ialah secara genetis dan lebih dahulu kelahiran , ialah melahirkan serta bukan bikinan yg berhubungan dgn azas keturunan bantuan dari nya ibu bapak .
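A minimal sketch of this substitution-based translation is shown below, assuming the dictionary file format of Figure 1 ('word#translation:translation:...'); the function names and file-handling details are ours. For the entry accounting#menghitung:menerangkan:laporan:akutansi, mode="FIRST" yields menghitung and mode="LAST" yields akutansi, matching the example above.

```python
def load_bilingual_dictionary(path):
    """Parse lines of the form 'english#trans1:trans2:...' into a dict."""
    bilingual = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if "#" in line:
                english, translations = line.rstrip("\n").split("#", 1)
                bilingual[english] = translations.split(":")
    return bilingual

def translate(tokens, bilingual, mode="ALL", include_untranslated=True):
    """Word-substitution translation of a tokenised article.

    mode selects among ALLTRANS ('ALL'), FIRSTTRANS ('FIRST') and
    LASTTRANS ('LAST'); include_untranslated toggles the INCLUDE
    versus IGNORE corpus variants.
    """
    result = []
    for tok in tokens:
        options = bilingual.get(tok)
        if options is None:
            if include_untranslated:  # the INCLUDE variant keeps the English word
                result.append(tok)
        elif mode == "ALL":
            result.extend(options)
        elif mode == "FIRST":
            result.append(options[0])
        else:  # "LAST"
            result.append(options[-1])
    return result
```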

TABLE II: FEATURE SELECTION VARIATIONS

Label    | Detail
ALLFEAT  | Select all features with a minimum frequency of 4
TOP2000  | Select the 2000 most frequent features
TOP5000  | Select the 5000 most frequent features
END25    | Select features with a minimum frequency of 4 and appearing in the last quarter of the article

TABLE III: NUMBER OF FEATURES OBTAINED FOR ALLFEAT AND END25

Corpus type        | ALLFEAT | END25
ENGLISH            | 15103   | 5365
ENGLISHNOT         | 16305   | 5671
TRANSTOOL          | 11887   | 5014
ALLTRANSINCLUDE    | 24316   | 13714
ALLTRANSIGNORE     | 17822   | 12083
FIRSTTRANSINCLUDE  | 12928   | 5080
FIRSTTRANSIGNORE   | 6373    | 3401
LASTTRANSINCLUDE   | 12043   | 4671
LASTTRANSIGNORE    | 5479    | 2997

B. Feature selection and values

As discussed in Section III, a document is represented as a vector of features representing the occurrence of unigrams. The composition of the particular unigrams selected to represent the features of a document can greatly impact the effectiveness of the learning algorithm. Firstly, we compiled a set of all unique words appearing at least once in the entire collection. From this, we then created four subsets to be used as the features in the training process. The ALLFEAT subset consists of all features with a minimum occurrence frequency of 4 within the entire collection. This threshold was imposed so as to keep the total number of features down to a computationally tractable size, and to reduce unwanted noise introduced by very infrequently occurring words. Subsequently, we sorted this set in order of frequency count and took the top 2000 (TOP2000) and top 5000 (TOP5000) unigrams. These smaller subsets were created to observe how much sentiment information could be obtained from a limited number of features. We also experimented with selecting a further subset of unigrams that exclusively appear in the last quarter of an article. The intuition behind this is that review articles typically deliver their verdict towards the end of the article, thus it is at this position of the text that most sentiment information is conveyed. This subset is referred to as END25. Table II shows a summary of the various feature selection schemes.

Lastly, we also experiment with the actual values of the features themselves. Typically, the features used are simply binary features indicating presence, i.e. 1 if a unigram appears in a particular article, and 0 if it doesn't. However, we also experiment with other values commonly used in corpus linguistics and information retrieval, e.g. the raw and normalized frequency count, and TF-IDF weighting [9].
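A minimal sketch of these selection schemes follows. The paper does not spell out whether the END25 frequency threshold is applied corpus-wide or within the last quarters only, nor exactly how 'exclusively' is enforced; the reading below counts last-quarter occurrences and drops unigrams also seen earlier in an article.

```python
from collections import Counter

def select_features(documents, scheme="ALLFEAT", min_freq=4):
    """Return the unigram feature set under one of the four schemes
    of Table II. `documents` is a list of token lists."""
    if scheme == "END25":
        tails = Counter(t for d in documents for t in d[3 * len(d) // 4:])
        heads = {t for d in documents for t in d[: 3 * len(d) // 4]}
        # 'exclusively appear in the last quarter': our reading (see above)
        return {t for t, c in tails.items() if c >= min_freq and t not in heads}
    counts = Counter(t for d in documents for t in d)
    if scheme == "ALLFEAT":
        return {t for t, c in counts.items() if c >= min_freq}
    top_n = 2000 if scheme == "TOP2000" else 5000  # TOP2000 / TOP5000
    return {t for t, _ in counts.most_common(top_n)}
```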

V. EXPERIMENT RESULTS & ANALYSIS

A. Experiment setup

The experiments were performed for each of the following nine corpus variations (see Section IV-A for details): ENGLISH, ENGLISHNOT, TRANSTOOL, ALLTRANSINCLUDE, ALLTRANSIGNORE, FIRSTTRANSINCLUDE, FIRSTTRANSIGNORE, LASTTRANSINCLUDE, and LASTTRANSIGNORE. For each of these variations, we tested four methods of feature selection (see Section IV-B for details), i.e. TOP2000, TOP5000, ALLFEAT, and END25. Finally, we also carried out some experiments with different feature values, e.g. TF-IDF normalization, which we discuss later. Table III shows the number of features obtained for ALLFEAT and END25 (the number of features for the TOP2000 and TOP5000 feature selection schemes should be self-explanatory).

The experiment was carried out using various open-source implementations, namely WEKA v3.57 (www.cs.waikato.ac.nz/ml/weka) for naive Bayes classification, OpenNLP MaxEnt v2.40 (maxent.sourceforge.net) for maximum entropy classification, and Thorsten Joachims' SVMLight (svmlight.joachims.org) for support vector machine classification.

For each combination of the experimental variations described above, we performed three-way cross-validation by partitioning the sentiment corpus into three folds, where each fold contains an equal number of positive and negative sentiment articles [10]. The classification was performed on each fold using a classifier trained on the remaining two folds.

Before carrying out the main experiment, we performed a small baseline measurement using a naive majority classification algorithm. For each unigram appearing in the training data, we counted its appearances in positive and negative articles. If it appeared more frequently in positive articles, it was tagged as a positive feature, and similarly for negative features. For classification, we simply count the number of positive and negative features appearing in a test article and classify it based on which type of feature appears more. We experimented with slight variations, i.e. a minimum threshold on the difference of feature counts. The best result obtained was an accuracy of 52.43%, marginally better than the theoretical 50% chance given a binary classification task.
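A sketch of this majority baseline, as we read its description, is given below; whether unigrams are counted per article or per occurrence, and how ties are broken, are not stated in the paper, so both are assumptions here.

```python
from collections import Counter

def train_majority(docs, labels):
    """Tag each unigram as 'pos' or 'neg' according to the class of
    article it appears in more often (counted once per article)."""
    pos = Counter(t for d, l in zip(docs, labels) if l == "pos" for t in set(d))
    neg = Counter(t for d, l in zip(docs, labels) if l == "neg" for t in set(d))
    return {t: ("pos" if pos[t] > neg[t] else "neg")
            for t in set(pos) | set(neg)}

def classify_majority(doc, polarity, threshold=0):
    """Classify by which polarity of features appears more in the
    article; `threshold` is the minimum-difference variation."""
    votes = Counter(polarity[t] for t in doc if t in polarity)
    diff = votes["pos"] - votes["neg"]
    if abs(diff) <= threshold:
        return "pos"  # arbitrary tie-break; not specified in the paper
    return "pos" if diff > 0 else "neg"
```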

B. Language and translation variations

Table IV presents the results of our main experiment. For each corpus variation, it shows the average accuracy obtained by each classification technique in classifying the sentiment of unseen test articles, having been trained on the remaining data. The results are averaged over the three cross-validation folds. Note that NB stands for naive Bayes, NBM for multinomial naive Bayes, ME for maximum entropy classification, and SVM for support vector machines.


TABLE IV: ACCURACY OF CLASSIFICATION METHODS AGAINST DIFFERENT LANGUAGE DATA

Corpus        | NB     | NBM    | ME     | SVM    | Average
ENGLISH       | 76.05% | 81.28% | 81.18% | 81.19% | 79.93%
ENGLISHNOT    | 76.77% | 80.71% | 81.43% | 81.45% | 80.09%
TRANSTOOL     | 73.72% | 78.86% | 79.68% | 80.12% | 78.09%
ALLINCLUDE    | 69.86% | 75.32% | 76.55% | 76.93% | 74.66%
ALLIGNORE     | 69.50% | 74.53% | 75.91% | 76.70% | 74.16%
FIRSTINCLUDE  | 73.11% | 78.20% | 78.84% | 80.05% | 77.55%
FIRSTIGNORE   | 72.36% | 77.75% | 78.52% | 79.02% | 76.91%
LASTINCLUDE   | 74.68% | 79.75% | 80.46% | 80.38% | 78.82%
LASTIGNORE    | 74.43% | 79.03% | 79.91% | 80.07% | 78.36%
Average       | 73.39% | 78.38% | 79.16% | 79.54% | 77.62%

TABLE V: ACCURACY OF CLASSIFICATION METHODS AGAINST FEATURE SELECTION

Features  | NB     | NBM    | ME     | SVM    | Average
TOP2000   | 72.26% | 77.28% | 79.14% | 78.51% | 76.80%
TOP5000   | 73.48% | 77.99% | 79.55% | 79.54% | 77.64%
ALLFEAT   | 73.89% | 79.00% | 77.40% | 80.02% | 77.58%
END25     | 73.92% | 79.25% | 80.56% | 80.11% | 78.46%
Average   | 73.39% | 78.38% | 79.16% | 79.55% | 77.62%

In general, we can see that the best accuracy is achieved for the reference English collection, particularly the ENGLISHNOT variation, as opposed to the various automatically produced Indonesian translations. This is certainly to be expected, given that the translation methods are rather naive. However, we can see that the decrease in accuracy for the Indonesian translations is quite small.

Looking at the Indonesian translation accuracy results, there are several observations that can be made. Firstly, TransTool, a commercial translator which presumably does more than simple word-for-word substitution, is not the best performer. Such substitutions, whilst certainly unacceptable as a genuine translation method, perform quite well. This may be due to the fact that the features extracted are unigrams, and hence issues such as word order and context are not relevant to the task.

Furthermore, the best results for Indonesian are obtained with the LASTTRANS method. The FIRSTTRANS and LASTTRANS methods were formulated as simple ad-hoc methods to select exactly one Indonesian translation for an English word when multiple options exist, i.e. it could just as well have been a random choice. Since our bilingual dictionary was constructed by concatenating entries from various online resources, the results suggest a difference in quality of the lexical resources. More specifically, the FIRSTTRANS method gives priority to the rather informal and manually crafted bilingual dictionary from Hantarto Widjaja (www.geocities.com/hantarto), whereas the LASTTRANS method gives priority to the translations originating from www.kamus.net, a larger and more widely used commercial web service.

The IGNORE language data consistently yields marginally worse results than the corresponding INCLUDE variants, indicating that the untranslated words still encode some measure of sentiment information. This warrants further inspection as to their specific nature. Additionally, the ENGLISHNOT variation of the English data slightly improves accuracy. This is consistent with the result briefly alluded to in [1].

Analysing the different classification techniques, we see that support vector machines yield the best results, followed by maximum entropy classification, multinomial naive Bayes, and finally, standard naive Bayes classification.

C. Feature selection and value variation

Table V presents the results of our experiments according to the various methods used during the feature extraction stage. It shows the average accuracy obtained by each classification technique in classifying the sentiment of unseen test articles, having been trained on the remaining data using the different feature selections (Section IV-B). The results are averaged over the three cross-validation folds.

The results show that limiting the features used to the 2000 most frequent unigrams yields lower accuracy than using the 5000 most frequent unigrams. This confirms the intuition that some sentiment information is lost during the truncation. However, including all unigrams above the frequency threshold of four yields marginally lower results, suggesting that at this configuration some unwanted noise is being introduced into the training data.

More interesting is the fact that END25, i.e. where the selected features are only the unigrams appearing in the last quarter of an article, yields the best accuracy of all. This seems to confirm our intuition that review articles typically deliver their verdict towards the end of the article, thus it is at this position of the text that most sentiment information is conveyed. These results differ slightly from [1], where tagging each unigram according to whether it appears in the first quarter, last quarter, or middle half of the article yielded slightly lower results than simply using the unigram features.

As before, the support vector machine tests yielded the best results among the various classification techniques employed.

Lastly, we performed some other experiments by varying the values of the features themselves, i.e. instead of using simple binary features that indicate whether or not a unigram appears in an article, we tried using its absolute frequency/appearance count along with various weightings and normalization methods such as TF-IDF [9]. Table VI shows the results. The (N) identifier indicates normalization with respect to the entire word count within an article. In general, the results were worse than when using simple binary presence features, the only exception arising when using normalized TF-IDF with support vector machines. Note that due to a limitation of the MaxEnt package used, we were unable to test feature values other than presence, thus slightly skewing the averages in its favour.
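The feature-value variants compared in Table VI can be sketched as follows; the exact TF-IDF formula the paper used is not given, so the standard log-IDF form is assumed, with (N) normalisation by the article's word count as described above.

```python
import math
from collections import Counter

def feature_values(doc, features, documents, scheme="PRESENCE"):
    """Compute one feature vector for `doc` (a token list) under the
    value schemes of Table VI; `documents` is the whole collection,
    needed only for the IDF term."""
    tf = Counter(doc)
    if scheme == "PRESENCE":
        return [1 if tf[f] else 0 for f in features]
    if scheme == "FREQUENCY":
        return [tf[f] for f in features]
    if scheme == "FREQUENCY(N)":
        return [tf[f] / len(doc) for f in features]
    doc_sets = [set(d) for d in documents]
    def idf(t):
        df = sum(1 for s in doc_sets if t in s)
        return math.log(len(documents) / df) if df else 0.0
    if scheme == "TF-IDF":
        return [tf[f] * idf(f) for f in features]
    return [tf[f] * idf(f) / len(doc) for f in features]  # TF-IDF(N)
```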

TABLE VI: ACCURACY OF CLASSIFICATION METHODS AGAINST FEATURE VALUES

Feature value  | NB     | NBM    | ME     | SVM    | Average
PRESENCE       | 73.39% | 78.38% | 79.16% | 79.54% | 77.62%
FREQUENCY      | 64.53% | 78.17% | -      | 69.26% | 70.65%
FREQUENCY(N)   | 66.85% | 65.81% | -      | 70.59% | 67.75%
TF-IDF         | 64.31% | 74.64% | -      | 78.23% | 72.39%
TF-IDF(N)      | 66.82% | 76.11% | -      | 80.48% | 74.47%
Average        | 67.18% | 74.62% | 79.16% | 75.62% | 74.15%

VI. SUMMARY

The best result for the Indonesian translations was obtained with the LASTINCLUDE method, with an accuracy of 78.82%.

This compares quite favourably with the ENGLISHNOT accuracy of 80.09%, indicating that automatic sentiment classification of articles using well-established supervised learning techniques is feasible for the Indonesian language. In terms of feature selection, unigrams appearing in the last quarter of an article tend to encode more sentiment information, confirming our intuitive notion that review articles typically deliver their verdict towards the end of the article.

The next step is to verify this on a genuine corpus of articles in the Indonesian language, and to see whether the same results are obtained. One strategy would be to crawl and collect online product reviews written in Indonesian from the Web.

REFERENCES

[1] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002, pp. 79-86.
[2] B. Pang and L. Lee, Opinion Mining and Sentiment Analysis. Now Publishers, 2008.
[3] A. Wierzbicka, Emotions across Languages and Cultures: Diversity and Universals. Cambridge University Press, 1999.
[4] P. D. Turney, "Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews," in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), Philadelphia, USA, 2002, pp. 417-424.
[5] N. Kaji and M. Kitsuregawa, "Building lexicon for sentiment analysis from massive collection of HTML documents," in Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007, pp. 1075-1083.
[6] A. McCallum and K. Nigam, "A comparison of event models for naive Bayes text classification," in Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, 1998, pp. 41-48.
[7] A. Ratnaparkhi, "Maximum entropy models for natural language ambiguity resolution," Ph.D. dissertation, University of Pennsylvania, 1998.
[8] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2000.
[9] C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. Cambridge, MA, USA: MIT Press, 1999.
[10] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, 2002.
