Sentiment and Subjectivity Analysis: An Overview (EuroLAN)


  • Sentiment and Subjectivity Analysis: An Overview

    Alina Andreevskaia
    Department of Computer Science
    Concordia University
    Montreal, Quebec, Canada

    Nancy Ide
    Department of Computer Science
    Vassar College
    Poughkeepsie, New York, USA

  • Definition: Sentiment Analysis

    Also called opinion mining: classifying words/senses, texts, and documents according to the opinion, emotion, or sentiment they express

    Applications:

    Determining critics' opinions of products

    Tracking attitudes toward political candidates

    etc.

  • Sub-Tasks

    Determine subjective-objective polarity: is the text or language factual or an expression of an opinion?

    Determine positive-negative polarity: does the subjective text express a positive or negative opinion of the subject matter?

    Determine the strength of the opinion: is the opinion weakly positive/negative, strongly positive/negative, or neutral?

  • History

    Current work stems from content analysis: "analysis of the manifest and latent content of a body of communicated material (as a book or film) through classification, tabulation, and evaluation of its key symbols and themes in order to ascertain its meaning and probable effect" (Webster's Dictionary of the English Language, 1961)

    Long history: quantitative newspaper analysis (1890 on)

    Lasswell, 1941: study of political symbols in editorials and public speeches

    Gerbner, 1969: establish violence profiles for different TV networks; trace trends; see how various groups are portrayed

  • Content Analysis

    Psychology and sociology: analysis of verbal patterns to determine motivational, mental, and personal characteristics (1940s); group processes; cultural commonalities/differences (Osgood, Suci, and Tannenbaum, 1957: semantic differential scales)

    Major contribution: General Inquirer http://www.wjh.harvard.edu/~inquirer/

    Anthropology: study of myths, riddles, folktales; analysis of kinship terminology (Goodenough, 1972)

    Literary and rhetorical analysis: stylistic analysis (Sedelow and Sedelow, 1966); thematic analysis (Smith, 1972; Ide, 1982, 1989)

  • Other Roots

    Point of view tracking in narrative (Banfield, 1982; Uspensky, 1973; Wiebe, 1994): subjectivity analysis

    Affective Computing (Picard, 1997): develop means to enable computers to detect and appropriately respond to users' emotions

    Directionality (Hearst, 1992): determine whether the author is positive, neutral, or negative toward some part of a document

  • History

    Late 90s: first automatic systems implemented for NLP (Spertus, 1997; Wiebe and Bruce, 1995; Wiebe et al., 1999; Bruce and Wiebe, 2000)

    Now a major research stream

  • Current Work in NLP

    Sentiment tagging: assignment of positive, negative, or neutral values/tags to texts and their components

    Began with a focus on binary (positive-negative) classification

    More recently, neutrals are included

    Little work on other types of affect, which are still a focus of much content-analysis work in other fields

  • Other Work

    Assignment of fine-grained affect labels based on various psychological theories (Valitutti et al., 2004; Strapparava and Mihalcea, 2007)

    Detection of opinion holders (Kim and Hovy, 2004; Kim and Hovy, 2005; Kim and Hovy, 2006; Choi et al., 2005; Bethard et al., 2004; Kobayashi et al., 2007)

    Opinion targets (Hurst and Nigam, 2004; Gamon and Aue, 2005; Hu and Liu, 2004; Popescu and Etzioni, 2005; Kim and Hovy, 2006; Kobayashi et al., 2007)

    Perspective (Lin et al., 2006); pros and cons in reviews (Kim and Hovy, 2006a); bloggers' mood (Mishne and Glance, 2006; Mishne, 2005; Leshed and Kaye, 2006); happiness (Mihalcea and Liu, 2006); politeness (Roman et al., 2005)

    Assignment of ratings to movie reviews (Pang and Lee, 2005); identification of support/opposition in congressional debates (Thomas et al., 2006); prediction of election results (Kim and Hovy, 2007)

  • Subjectivity

    Focuses on determining subjective words and texts, which mark the presence of opinions and evaluations, vs. objective words and texts, used to present factual information (Wiebe, 2000; Wiebe et al., 2004; Wiebe and Riloff, 2005)

  • Terminology

    Many terms: sentiment classification, sentiment analysis; semantic orientation; opinion analysis, opinion mining; valence; polarity; attitude

    Here, we use the term sentiment

  • Theories of Emotion and Affect

    Osgood's semantic differential (Osgood, Suci, and Tannenbaum, 1957): three recurring attitudes that people use to evaluate words and phrases: Evaluation (good-bad), Potency (strong-weak), Activity (active-passive)

    Ortony's salience-imbalance theory (Ortony, 1979): defines metaphors in terms of particular relationships between topic and vehicle

  • Theories of Emotion and Affect

    Martin's Appraisal Framework http://www.grammatics.com/appraisal/

    Three sub-types of attitude:

    Affect (emotion): evaluation of emotional disposition

    Judgment (ethics): normative assessments of human behavior

    Appreciation (aesthetics): assessments of the form, appearance, composition, impact, significance, etc. of human artefacts and individuals

  • Theories of Emotion and Affect

    Elliott's Affective Reasoner http://condor.depaul.edu/~elliott/ar.html

    Group (specification of the appraisal): emotion types

    Well-Being (appraisal of a situation as an event): joy, distress
    Fortunes-of-Others (presumed value of a situation as an event affecting another): happy-for, gloating, resentment, jealousy, envy, sorry-for
    Prospect-based (appraisal of a situation as a prospective event): hope, fear
    Confirmation (appraisal of a situation as confirming or disconfirming an expectation): satisfaction, relief, fears-confirmed, disappointment
    Attribution (appraisal of a situation as an act of some accountable agent): pride, admiration, shame, reproach
    Attraction (appraisal of a situation as containing an attractive or unattractive object): liking, disliking
    Well-Being/Attribution (compound emotions): gratitude, anger, gratification, remorse
    Attraction/Attribution (compound emotion extensions): love, hate

  • Theories of Emotion and Affect

    Ekman's basic emotions: Ekman's work revealed that facial expressions of emotion are not culturally determined, but universal to human culture and thus biological in origin

    Found expressions of anger, disgust, fear, joy, sadness, and surprise to be universal; some evidence for contempt

  • Resource Development

    Semantic properties of individual words are good predictors of the semantic characteristics of a phrase or text that contains them

    Requires development of lists of words indicative of sentiment

  • Manually-created Lists

    General Inquirer (GI): best-known extensive list of words categorized for various categories

    Developed as part of a content-analysis project (Stone et al., 1966; Stone et al., 1997)

    Three main word lists:

    Harvard IV-4 dictionary of content-analysis categories (includes Osgood's three dimensions of value, power, and activity)

    Lasswell's dictionary: eight basic value categories (WEALTH, POWER, RESPECT, RECTITUDE, SKILL, ENLIGHTENMENT, AFFECTION, WELLBEING) plus other info

    Five categories based on the social cognition work of Semin and Fiedler (1988): verb and adjective types

    Some words tagged for sense

    Recognized as a gold standard for evaluation of automatically produced lists

  • WordNet-Affect

    Developed semi-automatically by Strapparava and colleagues: assigned affect labels to words in WordNet, then expanded the lists using WordNet relations such as synonymy, antonymy, entailment, and hyponymy

    Includes semantic labels based on psychological and social science theories (Ortony, Elliott, Ekman), valence (positive or negative), and arousal (strength of emotion)

    2004 version covers 1314 synsets, 3340 words

    Part of WordNet Domains http://wndomains.itc.it/

  • Others

    Whissell's Dictionary of Affect in Language (DAL) (Sweeney and Whissell, 1984; Whissell, 1989; Whissell and Charuk, 1985)

    Affective Norms for English Words (ANEW) (Bradley and Lang, 1999)

    Sentiment-bearing adjectives by Hatzivassiloglou and McKeown (1997)

    Sentiment and subjectivity clues from work by Wiebe

  • Caution

    Limitations of manually annotated lists: limited coverage, low inter-annotator agreement, diversity of the tags used

  • Automatically-created Lists

    Corpus-based methods

    Hatzivassiloglou and McKeown (1997) (HM): builds on the observation that some linguistic constructs, such as conjunctions, impose constraints on the semantic orientation of their constituents

    Clustered adjectives from the Wall Street Journal in a graph into positive and negative sets based on the type of conjunction between them

    The cluster with higher average frequency was deemed to contain positive adjectives; lower average frequency meant negative sentiment

    Limitations: algorithm limited to adjectives (also adverbs -- Turney and Littman, 2002); requires large amounts of hand-labeled data to produce accurate results

    (A minimal sketch of the conjunction heuristic follows.)
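
    To make the conjunction constraint concrete, here is a minimal Python sketch: it harvests word pairs joined by and/but and propagates polarity from a single seed. The sentences and seed are hypothetical, and HM's actual system used a POS-tagged corpus, a log-linear model over several conjunction types, and dissimilarity-based clustering, none of which is reproduced here.

```python
# Minimal sketch of the conjunction heuristic behind Hatzivassiloglou &
# McKeown (1997): "and" tends to join same-orientation adjectives,
# "but" opposite ones. Illustration only: no POS tagging, no clustering.
import re

ADJ_PAIR = re.compile(r"\b(\w+) (and|but) (\w+)\b")

def conjunction_edges(sentences):
    """Collect (word1, word2, same_orientation?) evidence from conjunctions."""
    edges = []
    for s in sentences:
        for a1, conj, a2 in ADJ_PAIR.findall(s.lower()):
            edges.append((a1, a2, conj == "and"))
    return edges

def propagate(edges, seeds):
    """Greedy propagation of +1/-1 polarities from seed words.
    (Conflicting evidence is not resolved in this sketch.)"""
    labels = dict(seeds)
    changed = True
    while changed:
        changed = False
        for a1, a2, same in edges:
            sign = 1 if same else -1
            if a1 in labels and a2 not in labels:
                labels[a2] = sign * labels[a1]; changed = True
            elif a2 in labels and a1 not in labels:
                labels[a1] = sign * labels[a2]; changed = True
    return labels

sentences = ["The plot is simple and predictable",
             "The acting is elegant but predictable"]
print(propagate(conjunction_edges(sentences), {"simple": -1}))
# {'simple': -1, 'predictable': -1, 'elegant': 1}
```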

  • Web as Corpus

    Peter Turney (Turney, 2002; Turney and Littman, 2002; Turney and Littman, 2003): a more general method that does not require previously annotated data for training

    Induces the sentiment of a word from the strength of its association with 14 seed words of known positive or negative semantic orientation (sketched below)

    Two methods for measuring association: pointwise mutual information (PMI) and Latent Semantic Analysis (LSA)

    Used the Web as data: ran queries on AltaVista using the NEAR operator to acquire co-occurrence statistics with the 14 seed words
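
    A sketch of the SO-PMI computation, assuming hit counts come from a local corpus window rather than AltaVista's NEAR operator (which no longer exists; see below). The seed lists are Turney and Littman's 14 paradigm words; the smoothing constants are ad hoc.

```python
# Sketch of Turney's SO-PMI: association of a word with seven positive seeds
# minus its association with seven negative seeds. Co-occurrence is
# approximated with a +/-10-token window standing in for the NEAR operator.
import math
from collections import Counter

POS_SEEDS = ["good", "nice", "excellent", "positive", "fortunate", "correct", "superior"]
NEG_SEEDS = ["bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"]

def counts(tokens, window=10):
    unigram, pair = Counter(tokens), Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            pair[frozenset((w, v))] += 1
    return unigram, pair

def so_pmi(word, unigram, pair):
    """SO-PMI(w) = sum_p PMI(w, p) - sum_n PMI(w, n); the hits(w)/N factors
    cancel between the two equal-sized sums, leaving smoothed count ratios."""
    def assoc(seeds):
        return sum(math.log2((pair[frozenset((word, s))] + 0.01)
                             / (unigram[s] + 1.0)) for s in seeds)
    return assoc(POS_SEEDS) - assoc(NEG_SEEDS)

tokens = "the service was excellent and the food was superb".split()
uni, pr = counts(tokens)
print(so_pmi("superb", uni, pr) > 0)   # True: "superb" co-occurs with a positive seed
```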

  • Turney, cont'd

    Results evaluated against GI on a variety of test settings gave up to 97.11% accuracy for the top 25% of words

    The size of the corpus had a considerable effect: a 10-million-word corpus instead of the full Web content reduced accuracy to 61.26-68.74%

    LSA performed relatively better than PMI on the 10-million-word corpus, but LSA is more complex and harder to implement

  • End of An Era

    Due to its simplicity, high accuracy and domain independence, the PMI method became popular

    In 2005, AltaVista discontinued support for NEAR operator, upon which the method relied

    Attempts to substitute NEAR with AND led to considerable deterioration in system performance

  • Other Approaches

    Bethard et al. (2004): used two different methods to acquire opinion words from corpora

    (1) calculated frequency of co-occurrence with seed words taken from Hatzivassiloglou and McKeown, computing the log-likelihood ratio (see the sketch below)

    (2) computed relative frequencies of words in subjective and objective documents

    The first method produced better results for adverbs and nouns, and gave higher precision but lower recall for adjectives

    The second method worked best for verbs
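
    The log-likelihood ratio in the first method is Dunning's (1993) G² statistic over a 2x2 contingency table; a sketch with hypothetical counts:

```python
# Dunning's (1993) log-likelihood ratio (G^2) for the association between a
# candidate opinion word and a seed word, from a 2x2 contingency table.
import math

def g2(k11, k12, k21, k22):
    """k11: candidate and seed co-occur; k12: candidate without seed;
    k21: seed without candidate; k22: neither. Higher G^2 = stronger association."""
    n = k11 + k12 + k21 + k22
    r1, r2 = k11 + k12, k21 + k22
    c1, c2 = k11 + k21, k12 + k22
    def term(k, row, col):
        e = row * col / n                    # expected count under independence
        return k * math.log(k / e) if k else 0.0
    return 2 * (term(k11, r1, c1) + term(k12, r1, c2)
                + term(k21, r2, c1) + term(k22, r2, c2))

print(g2(30, 70, 120, 9780))                 # hypothetical counts
```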

  • Other Approaches

    Kim and Hovy (2005): separated opinion words from non-opinion words by computing their relative frequency in subjective (editorial) and objective (non-editorial) texts from TREC data

    Riloff et al. (2003), Grefenstette et al. (2006): used syntactic patterns to learn lexico-syntactic expressions characteristic of subjective nouns

  • Automatically-created Lists

    Dictionary-based methods address some of the limitations of corpus-based methods

    Use semantic resources such as WordNet and thesauri

    Two approaches:

    Rely on thesaural relations between words (synonymy, antonymy, hyponymy, hypernymy) to find similarity between seed words and other words

    Exploit information contained in definitions and glosses

  • Use of WordNet

    Kim and Hovy (2004, 2005): extended word lists by using WordNet synsets

    Ranked lists based on the sentiment polarity assigned to each word in a synset, based on WordNet distance from positive and negative seed words

    A similar approach was used by Hu and Liu (2004)

  • SentiWordNet

    Developed by Esuli and Sebastiani: trained several classifiers to give positive, negative, and objective ratings for each synset in WordNet 2.0; scores range from 0 to 1

    Freely available: http://sentiwordnet.isti.cnr.it
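
    SentiWordNet can be queried directly; NLTK, for instance, bundles it (note that NLTK ships the later 3.0 release built over WordNet 3.0, not the 2.0 version described above):

```python
# Querying SentiWordNet through NLTK's corpus reader.
import nltk
nltk.download("wordnet", quiet=True)
nltk.download("sentiwordnet", quiet=True)
from nltk.corpus import sentiwordnet as swn

s = swn.senti_synset("good.a.01")
# The three scores are in [0, 1] and sum to 1 for each synset.
print(s.pos_score(), s.neg_score(), s.obj_score())
```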

  • Beyond Synsets

    Kamps et al. (2004): tagged words with Osgood's three semantic dimensions

    Computed the shortest path through WordNet relations connecting a word to words representative of the three categories (e.g., good and bad for Evaluation); a sketch follows this slide

    Esuli and Sebastiani (2005): classified words in WordNet into positive and negative based on synsets, glosses, and examples
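
    A sketch of the Kamps et al. distance for the Evaluation dimension, assuming their word graph can be approximated by linking words that share a WordNet synset. Their score is EVA(w) = (d(w, bad) - d(w, good)) / d(good, bad), where d is shortest-path length; words closer to the good pole score above 0.

```python
# Sketch of Kamps et al. (2004): shortest paths over an adjective synonymy
# graph (words linked when they share a synset). NetworkX raises
# NetworkXNoPath for words not connected to the good/bad component.
import networkx as nx
from nltk.corpus import wordnet as wn

G = nx.Graph()
for syn in wn.all_synsets("a"):                       # adjective synsets
    lemmas = [l.name() for l in syn.lemmas()]
    G.add_nodes_from(lemmas)
    G.add_edges_from((a, b) for i, a in enumerate(lemmas) for b in lemmas[i+1:])

def eva(word):
    d = lambda u, v: nx.shortest_path_length(G, u, v)
    return (d(word, "bad") - d(word, "good")) / d("good", "bad")

print(eva("honest"))   # > 0 suggests the positive (good) pole
```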

  • Further Beyond Synsets

    Andreevskaia and Bergler (2006): take advantage of the semantic similarity between glosses and head words

    Start with a list of manually annotated words, expand it with synonyms and antonyms, then search WordNet glosses for occurrences of the seed words

    If a gloss contains a word with known sentiment, the head word is deemed to have the same sentiment

    Suggest that this overlap measure reflects the centrality of the word in the sentiment category
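
    A minimal sketch of the gloss-search step, with a tiny hypothetical seed list standing in for their manually annotated list; results depend on the WordNet version installed.

```python
# Gloss-based tagging sketch: a head word inherits the sentiment of seed
# words found in the glosses of its synsets (after Andreevskaia & Bergler, 2006).
from nltk.corpus import wordnet as wn

POS_SEEDS = {"good", "happy", "pleasant"}
NEG_SEEDS = {"bad", "sad", "unpleasant"}

def gloss_sentiment(word):
    votes = 0
    for syn in wn.synsets(word):
        gloss = set(syn.definition().lower().split())
        votes += len(gloss & POS_SEEDS) - len(gloss & NEG_SEEDS)
    return "positive" if votes > 0 else "negative" if votes < 0 else "unknown"

# "cheerful" has a gloss mentioning "good spirits", so it should come out positive.
print(gloss_sentiment("cheerful"))
```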

  • Role of Neutrals

    Most work cited so far classifies words as positive or negative

    Results vary between 60-80% agreement with GI as the gold standard

    Adding neutrals severely reduces accuracy, by 10-20% depending on part of speech

  • Problems

    Words without strong positive or negative connotations are difficult to categorize accurately; strength of positive/negative affinity can be used as a measure

    Highest accuracy for words at the extremes of the positive/negative poles

    Many words have both sentiment-bearing and neutral senses; e.g., great is typically tagged as positive, but according to statistics in WordNet it is used neutrally in 75% of occurrences

    Solve this by using sense-tagged word lists

  • Sense-tagged lists

    The need for sense-level sentiment annotation has recently attracted considerable attention

    Development of methods to devise sense-tagged word lists

  • Sense-tagged Word Lists

    Andreevskaia and Bergler (2006): applied gloss-based sentiment tagging at the sense level

    Extended their system by adding a word sense disambiguation module (Senti-Sense) that uses syntactic patterns to disambiguate between sentiment-bearing and neutral senses

    Learned generalized adjective-noun patterns for sentiment-bearing adjectives from unambiguous data

    Abstracted learned patterns to higher levels of the hypernym hierarchies using predetermined propagation rules

    Applied learned patterns to disambiguate adjectives with multiple senses in order to locate the senses that bear sentiment

  • Sense-tagged Word Lists

    Esuli and Sebastiani (2007): application of a random-walk PageRank algorithm to sentiment tagging of synsets

    Takes advantage of the graph-like structure of the WordNet hierarchy (a sketch follows)
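
    A simplified sketch of the random-walk idea using NetworkX's personalized PageRank over a gloss graph (an edge from a term's synset to any synset whose gloss mentions the term). Esuli and Sebastiani ran separate positive- and negative-biased walks and compared the two scores, which is omitted here; the seed words are hypothetical.

```python
# Random-walk (personalized PageRank) sketch over a WordNet gloss graph.
import networkx as nx
from nltk.corpus import wordnet as wn

adj = list(wn.all_synsets("a"))
index = {}
for s in adj:                                   # first synset for each lemma
    for l in s.lemmas():
        index.setdefault(l.name().lower(), s)

G = nx.DiGraph()
for s in adj:                                   # gloss term -> glossed synset
    for tok in s.definition().lower().split():
        if tok in index:
            G.add_edge(index[tok].name(), s.name())

# Bias the walk toward positive seed synsets; nodes missing from the
# personalization dict get zero restart mass.
seeds = {index[w].name(): 1.0 for w in ("good", "nice", "pleasant") if w in index}
scores = nx.pagerank(G, alpha=0.85, personalization=seeds)
print(sorted(scores, key=scores.get, reverse=True)[:10])   # most "positive" synsets
```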

  • Sense-tagged Word Lists

    Wiebe and Mihalcea (2006): sense-level tagging for subjectivity (i.e., neutral vs. sentiment-bearing words)

    Automatic method for sense-level sentiment tagging based on Lin's (1998) similarity measure

    Acquire a list of top-ranking distributionally similar words

    Compute a WordNet-based measure of semantic similarity for each word in the list
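
    NLTK exposes Lin's (1998) information-content similarity directly; a small sketch (the Brown-corpus IC file and the word pair are arbitrary choices here, not Wiebe and Mihalcea's setup, which combines this with distributional similarity over a corpus):

```python
# Lin (1998) similarity between two noun senses, using Brown-corpus
# information content shipped with NLTK.
import nltk
nltk.download("wordnet", quiet=True)
nltk.download("wordnet_ic", quiet=True)
from nltk.corpus import wordnet as wn, wordnet_ic

ic = wordnet_ic.ic("ic-brown.dat")
print(wn.synset("dog.n.01").lin_similarity(wn.synset("cat.n.01"), ic))  # ~0.88
```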

  • Beyond Positive and Negative

    Ide (2006): bootstrapped sense-tagged word lists for semantic categories based on FrameNet frames (e.g., commitment, reasoning, etc.) and GI categories (hostile/friendly, weak/strong, submit/dominate, power/cooperation)

    Treated lexical units associated with a given frame as a bag of words

    Used WordNet::Similarity to compute similarity among senses of the words; the relation set consisted of different pairwise combinations of synsets, glosses, examples, hypernyms, and hyponyms

    Based on the results, compute "sure" senses (strongest association) and retain them; iterate

    Augment with "sure" senses for synsets, hypernyms, hyponyms

  • Resulting lists very high on precision (98%), lower on recall

    Used hierarchical clustering to group senses in a given category into positive and negative and finer-grained distinctions

    Results for judgment-communication (sense clusters, word#sense):

    deride#1, ridicule#1, gibe#2, scoff#1, mock#1, scoff#2, remonstrate#2

    accuse#1, denigrate#2

    accuse#2, charge#2, recriminate#1

    condemn#1, decry#1, excoriate#1, deprecate#1

    belittle#2, disparage#1, reprehend#1, censure#1, denounce#1, remonstrate#3, blame#2, castigate#1

    denigrate#1, deprecate#2, execrate#2

    acclaim#1, extol#1, laud#1, commend#4, commend#1, praise#1, cite#2

    (cluster heads: ridicule, accuse, charge, condemn, belittle, denigrate, acclaim)

  • Manually Annotated Corpora

    Scarcity of manually annotated resources for system training and evaluation

    Some work uses user-created rankings in online product, book, or movie reviews: ranking scales (good-bad, liked-disliked, etc.) are easily available and fast to collect

    Drawbacks: such data contain a significant amount of noise (erroneous ratings, misspellings, phrases in different languages), exacerbated when researchers automatically break reviews into sentences or snippets

  • Available Corpora

    MPQA
      Level of annotation: phrases and sentences
      Annotation type: private states
      Size: 535 documents (10,657 sentences)
      Link: http://www.cs.pitt.edu/wiebe/mpqa

    Opinion corpus
      Level of annotation: expressions and sentences
      Annotation type: subjectivity and objectivity
      Size: 2 sets of documents, 500 sentences each, from the WSJ Treebank
      Link: http://www.cs.pitt.edu/wiebe/pub4.html

    Product reviews
      Level of annotation: product features
      Annotation type: sentiment features
      Size: 500 reviews
      Link: http://www.cs.uic.edu/liub/FBS/FBS.html

    SemEval-07 Task 14 dataset
      Level of annotation: headlines
      Annotation type: sentiment (-100 to +100 polarity scale), 6 basic emotions
      Size: 2225 news headlines
      Link: http://www.cse.unt.edu/~rada/affectivetext/

  • Inter-annotator Agreement

    The rate of agreement among human annotators may reveal important insights about the task and provides a critical baseline for system evaluation

    Few inter-annotator agreement studies have been conducted to date

    Inter-annotator agreement on popular sentiment genres such as movie reviews and blogs is so far unexplored

    Agreement depends on the unit annotated (sentence or text), annotation type (subjectivity or sentiment), and domain and genre

  • IA Studies

    Sentence subjectivity labels (Wiebe et al., 1999): annotators classified a sentence as subjective if it contained any significant expression of subjectivity

    Multiple rounds of training and adjustment of the annotation instructions

    Pairwise kappa over the WSJ test set ranged from 0.59 to 0.76 (a sketch of the kappa computation follows)
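
    For reference, pairwise Cohen's kappa of the kind reported in these studies can be computed with scikit-learn; the label sequences below are hypothetical.

```python
# Cohen's kappa: observed agreement corrected for chance agreement.
from sklearn.metrics import cohen_kappa_score

ann1 = ["subj", "subj", "obj", "subj", "obj", "obj", "subj", "obj"]
ann2 = ["subj", "obj",  "obj", "subj", "obj", "obj", "subj", "subj"]
print(cohen_kappa_score(ann1, ann2))   # 0.5 here; 1.0 = perfect, 0 = chance
```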

  • Variable Results

    Kim and Hovy (2004): relatively high agreement (κ = 0.91) between two annotators who assigned positive, negative, and n/a labels to 100 newspaper sentences

    Gamon and Aue (2005): a similar study using car reviews produced pairwise κ of 0.70-0.80

  • Granularity

    Strapparava and Mihalcea (2007): study suggests inter-annotator agreement is substantially lower for fine-grained types of annotation

    Six annotators assigned a sentiment score and scores for Ekman's six basic emotions to news headlines

    Agreement (Pearson correlation): 78.01 for sentiment

    Agreement for emotion labels ranged from 36.07 (surprise) to 68.19 (sadness)

  • IA for texts

    Wiebe et al., 2001: two genres

    Flames (hostile, inflammatory messages): κ = 0.78

    Opinion pieces from the WSJ: κ = 0.94-0.95

    Domain and genre may affect the level of inter-annotator agreement

  • Sentiment Analysis

    The ultimate goal of sentiment and subjectivity annotation is the analysis of clauses, sentences, and texts

    Resources: word lists and annotated (validated) corpora

    Subjectivity analysis uses the MPQA corpus, a reliable gold standard for training and evaluation on newspaper texts

    Sentiment analysis must rely on small manually annotated test sets, created ad hoc, seldom made publicly available, and too small for machine learning methods

  • Sentiment Analysis

    A single text often includes both positive and negative sentences

    Sentences and clauses are regarded as the most natural units for sentiment/subjectivity annotation: sentiment is usually more homogeneous within a sentence than in a whole text

    But it is harder to identify due to the limited number of clues

    Work on improving system performance by performing analysis simultaneously at different levels: sentence-level analysis has high precision, text-level analysis has high recall

  • Words vs. Other Features

    Words/unigrams provide good accuracy in sentence-level sentiment tagging (a minimal unigram classifier is sketched below)

    Yu and Hatzivassiloglou (2003): no significant improvement when bigrams or trigrams were added to the feature set

    Kim and Hovy (2004): presence of sentiment-bearing words works better than more sophisticated scoring methods

    Riloff et al. (2006): similar results for unigrams vs. unigrams plus bigrams and extraction patterns; a subsumption hierarchy and feature selection brought less than 1% improvement in accuracy
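
    A minimal unigram-presence classifier in the spirit of these findings, using scikit-learn with hypothetical training texts (binary features record word presence rather than frequency):

```python
# Unigram bag-of-words (presence) features + naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

texts = ["a wonderful, moving film", "dull and predictable plot",
         "great acting, great script", "a tedious waste of time"]
labels = ["pos", "neg", "pos", "neg"]

clf = make_pipeline(CountVectorizer(binary=True), BernoulliNB())
clf.fit(texts, labels)
print(clf.predict(["a wonderful script"]))   # -> ['pos']
```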

  • Other Features vs. Words

    Some studies show gains with the use of additional features

    Wilson et al. (2005), Andreevskaia et al. (2007): syntactic properties of the phrase (role in sentence structure, presence of modifiers and valence shifters) yield statistically significant increases in system performance for sentiment and subjectivity analysis

    Gains in subjectivity analysis: presence of complex adjectival phrases (Bethard et al., 2004); similarity scores (Yu and Hatzivassiloglou, 2003); position in the paragraph (Wiebe et al., 1999)

    Gains in sentiment analysis: syntactic patterns and negation (Hurst and Nigam, 2004; Mulder et al., 2004; Andreevskaia et al., 2007); knowledge about the opinion holder (Kim and Hovy, 2004); target of the sentiment (Hu and Liu, 2004)

  • Multiple Features

    Some experiments suggest improvement is gained when multiple feature sets are combined

    Hatzivassiloglou and Wiebe, 2000: a combination of lists of adjectives tagged with dynamic, polarity, and gradability labels was the best predictor of sentence subjectivity

    Riloff et al. (2003): accuracy of subjectivity tagging improved with the addition of each new feature (25 in all)

    Gamon and Aue (2005): sentiment tagging using a larger number of features increased average precision and recall

  • SemEval-2007 Affective Text Task

    Opportunity to compare sentiment analysis systems run on the same dataset of 2000 manually annotated news headlines

    Machine learning methods had the highest recall at the cost of low precision: the naive-Bayes-based CLaC-NB (Andreevskaia and Bergler, 2007) and the word-space-model-based SICS (Sahlgren et al., 2007)

    Knowledge-based unsupervised approaches had the highest precision but low recall, because there are few sentiment clues per headline: CLaC (Andreevskaia and Bergler, 2007), UPAR7 (Chaumartin, 2007)

  • General Conclusion (so far)

    Subjectivity classification: statistical approaches that use naive Bayes or SVM significantly outperform non-statistical techniques

    Binary (positive-negative) sentiment classification: non-statistical methods that rely on the presence of sentiment markers in a sentence, or on the strength of sentiment associated with these markers, yield accuracy as good as or better than statistical approaches

    BUT: both kinds of methods show some strengths in both tasks

    Need LOTS more research

  • Text-level Analysis

    Features: the choice of features used by a sentiment annotation system is a critical factor

    Wide range of features used: lists of words, lemmas or unigrams, bigrams, higher-order n-grams, part of speech, syntactic properties of the surrounding context, etc.

  • Use of Features

    Use of multiple models can improve the performance of both sentiment and subjectivity classifiers

    Use of different features within the same classifier, or in a community of several classifiers, improves system performance

  • Features

    Studies show improvement when using n-grams vs. unigrams (Dave et al., 2003; Cui et al., 2006; Aue and Gamon, 2005; Wiebe et al., 2001; Riloff et al., 2006)

    Improvement when n-gram models / word lists are augmented with context information (Wiebe et al., 2001b; Andreevskaia et al., 2007), with features associated with syntactic structure (Gamon, 2004), or with combinations of words annotated for semantic categories related to sentiment or subjectivity (Whitelaw et al., 2005; Fletcher and Patrick, 2005; Mullen and Collier, 2004)

  • Feature Selection

    Costly or computationally unfeasible to use all features

    Aue and Gamon (2005): n-grams selected based on the Expectation Maximization algorithm

    Riloff et al. (2006): use a subsumption hierarchy for n-grams (unigrams and bigrams) and extraction patterns

    If a feature's words and dependencies are a superset of a more general ancestor in the hierarchy, discard it

    Only features with higher information gain were allowed to subsume less informative ones

    A combined use of subsumption and traditional feature selection improves performance on both subjective/objective and positive/negative text-level classification (a feature-selection sketch follows)
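
    A sketch of the conventional feature-selection baseline that Riloff et al. combine with subsumption, using scikit-learn's mutual-information estimate of information gain over hypothetical texts:

```python
# Select the highest information-gain n-gram features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

texts = ["great fun", "boring mess", "great cast", "boring script"]
labels = [1, 0, 1, 0]                       # 1 = positive, 0 = negative

vec = CountVectorizer(ngram_range=(1, 2), binary=True)
X = vec.fit_transform(texts)                # unigram + bigram presence features
sel = SelectKBest(mutual_info_classif, k=4).fit(X, labels)
print(vec.get_feature_names_out()[sel.get_support()])   # e.g. 'great', 'boring', ...
```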

  • Part of Speech

    Subjectivity: adjectives are the best predictors of subjectivity (Hatzivassiloglou and Wiebe, 2000); modals, pronouns, adverbs, and cardinal numbers have also been used as subjectivity clues (Wiebe et al., 1999; Bruce and Wiebe, 2000)

    Sentiment: combined use of words from all parts of speech produced more accurate tags (Blair et al., 2004; Salvetti et al., 2004)

  • Feature Generation

    Most sentiment classifiers use standard machine learning techniques to learn and select features from labeled corpora

    Works well when large labeled corpora are available for training and validation (e.g., movie reviews)

    Falls short when training data is scarce: different domain or topic, different time period

    This has led to increased interest in unsupervised and semi-supervised approaches to feature generation

  • Some Promising Research

    Systems trained on a small number of labeled examples and large quantities of unlabeled in-domain data perform relatively well (Aue and Gamon, 2005)

    Structural correspondence learning applied to a small number of labeled examples is sufficient to adapt to a new domain (Blitzer et al., 2007)

    So far, the performance of these methods is inferior to supervised approaches and knowledge-based methods

    Availability of word lists and clues makes knowledge-based approaches an attractive alternative to supervised machine learning when labeled data is scarce

  • Domain Effects

    A variety of domains has been used in sentiment analysis: movie, music, book, and other entertainment reviews; product reviews; blogs; a dream corpus; etc.

    Choice of domain can have a major impact on results

  • Movie Reviews

    Popular domain in sentiment research

    Positive and negative words/expressions do not necessarily convey the opinion holder's attitude; e.g., evil used in movie reviews when referring to characters or plot does not convey sentiment toward the movie itself (Turney, 2002)

    Simple counting of positive and negative clues in movie review texts is insufficient

    Clues acquired from out-of-domain sources often fail

  • Product Reviews

    Sentiment toward the whole product is the sum of the sentiment toward its parts, components, and attributes (Turney, 2002)

    General word lists perform better on product reviews (Turney, 2002; Kennedy and Inkpen, 2006)

  • Attitude Influence

    Texts with positive sentiment are easier to classify than negative ones (Kennedy and Inkpen, 2006; Hurst and Nigam, 2004; Dave et al., 2003; Koppel and Schler, 2006; Chaovalit and Zhou, 2005)

    Possible explanations: positive documents are more uniform (Dave et al., 2003); positive clues have higher discriminant value (Koppel and Schler, 2006); negative texts are characterized by extensive use of negations and other valence shifters that reverse the sentiment conveyed by individual words (e.g., not bad) (Pang and Lee, 2004)

  • Attitude Influence

    Improvement in accuracy when valence shifters are taken into account (Kennedy and Inkpen, 2006; Andreevskaia et al., 2007), but a negative impact was reported when negation was included in the feature set (Dave et al., 2003); a common negation-marking scheme is sketched below

    Use of balanced evaluation sets with an equal number of positive and negative documents has become a standard in sentiment research
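
    One common way to operationalize negation handling (used, e.g., by Das and Chen, 2001, and Pang et al., 2002, rather than Kennedy and Inkpen's full valence-shifter treatment) is to mark every token between a negation word and the next punctuation:

```python
# Prefix tokens following a negation word with NOT_ until the next
# punctuation mark, so "not bad" yields the feature NOT_bad rather than bad.
import re

NEGATORS = {"not", "no", "never", "hardly"}

def mark_negation(text):
    tokens, negating, out = re.findall(r"[\w']+|[.,!?;]", text.lower()), False, []
    for tok in tokens:
        if tok in NEGATORS:
            negating = True
            out.append(tok)
        elif tok in ".,!?;":
            negating = False           # punctuation ends the negation scope
            out.append(tok)
        else:
            out.append("NOT_" + tok if negating else tok)
    return out

print(mark_negation("The plot was not bad, just slow."))
# ['the', 'plot', 'was', 'not', 'NOT_bad', ',', 'just', 'slow', '.']
```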

  • Classification Algorithms

    Wide variety of classification approaches used: simple keyword-counting methods, with or without scoring; rule-based methods; content-analytical methods; (statistical) SVM, naive Bayes, and other statistical classifiers, used alone, sequentially, or as a community

    Comparison of results does not provide a definite answer as to which of these methods is best for sentiment or subjectivity tagging

    Choice of features and training domain have more impact on accuracy than choice of classification algorithm

    Comparison of the performance of systems evaluated on different domains or different feature sets is not conclusive

  • Summary

    Sentiment and subjectivity analysis has evolved into a strong research stream in NLP

    State-of-the-art systems can reach up to 90% accuracy on certain domains, but a generally applicable method is still needed

  • Research Directions Development of semi-supervised machine-learning approaches that will maximize the usefulness of the available resources and ensure domain adaptation with limited in-domain data

    Creation of reliable and extensive resources such as lists of words and expressions, syntactic patterns, combinatorial rules, and annotated corpora

    Creation of uniform ways to denote and represent sentiment and subjectivity annotation

  • Bibliography

    Andreevskaia, A. and S. Bergler: 2007, CLaC and CLaC-NB: Knowledge-based and Corpus-based Approaches to Sentiment Tagging. In: 4th International Workshop on Semantic Evaluations (SemEval 2007). Prague, Czech Republic.

    Andreevskaia, A., S. Bergler, and M. Urseanu: 2007, All Blogs are not made Equal. In: International Conference on Weblogs and Social Media (ICWSM-2007). Boulder, Colorado.

    Aue, A. and M. Gamon: 2005, Customizing Sentiment Classifiers to New Domains: a Case Study. In: RANLP-05, the International Conference on Recent Advances in Natural Language Processing. Borovets, Bulgaria.

    Bethard, S., H. Yu, A. Thornton, V. Hatzivassiloglou, and D. Jurafsky: 2004, Automatic Extraction of Opinion Propositions and their Holders. In: Exploring Attitude and Affect in Text: Theories and Applications (AAAI-EAAT 2004). Stanford University.

    Blair, L., A. Jaharria, S. Lewis, T. Oda, C. Reichenbach, J. Rueppel, and F. Salvetti: 2004, Impact of Lexical Filtering on Semantic Orientation. In: Exploring Attitude and Affect in Text: Theories and Applications (AAAI-EAAT 2004). Stanford University.

  • Bibliography

    Breck, E., Y. Choi, and C. Cardie: 2007, Identifying expressions of opinion in context. In: Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-2007). Hyderabad, India.

    Bruce, R. and J. Wiebe: 2000, Recognizing subjectivity: A case study of manual processing. Natural Language Engineering 5(2), 187-205.

    Chaovalit, P. and L. Zhou: 2005, Movie Review Mining: a Comparison between Supervised and Unsupervised Classification Approaches. In: Proceedings of HICSS-05, the 38th Hawaii International Conference on System Sciences.

    Chaumartin, F.-R.: 2007, UPAR7: A Knowledge-based System for Headline Sentiment Tagging. In: 4th International Workshop on Semantic Evaluations (SemEval 2007). Prague, Czech Republic.

    Cui, H., V. Mittal, and M. Datar: 2006, Comparative Experiments on Sentiment Classification for Online Product Reviews. In: Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-2006). Boston, MA.

  • Bibliography

    Das, S. R. and M. Y. Chen: 2001, Yahoo! for Amazon: Sentiment extraction from small talk on the Web. In: Asia Pacific Finance Association Annual Conference (APFA-01).

    Dave, K., S. Lawrence, and D. M. Pennock: 2003, Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In: WWW-03. Budapest, Hungary, pp. 519-528.

    Dredze, M., J. Blitzer, and F. Pereira: 2007, Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In: 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007). Prague, Czech Republic.

    Dunning, T.: 1993, Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics 19, 61-74.

    Ekman, P.: 1993, Facial expression of emotion. American Psychologist 48, 384-392.

    Fletcher, J. and J. Patrick: 2005, Evaluating the Utility of Appraisal Hierarchies as a Method for Sentiment Classification. In: Proceedings of the Australian Language Technology Workshop 2005. Sydney, Australia, pp. 134-142.

  • Bibliography

    Gamon, M.: 2004, Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis. In: Proceedings of COLING-04, the 20th International Conference on Computational Linguistics. Geneva, CH, pp. 841-847.

    Gamon, M. and A. Aue: 2005, Automatic identification of sentiment vocabulary: exploiting low association with known sentiment terms. In: Proceedings of the ACL-05 Workshop on Feature Engineering for Machine Learning in Natural Language Processing. Ann Arbor, US.

    Goldberg, A. and J. Zhu: 2006, Seeing stars when there aren't many stars: Graph-based semi-supervised learning for sentiment categorization. In: Proceedings of the HLT-NAACL 2006 Workshop on TextGraphs: Graph-based Algorithms for Natural Language Processing. Boston, MA.

    Hatzivassiloglou, V. and J. Wiebe: 2000, Effects of Adjective Orientation and Gradability on Sentence Subjectivity. In: 18th International Conference on Computational Linguistics (COLING-2000).

    Hu, M. and B. Liu: 2004, Mining and summarizing customer reviews. In: KDD-04. pp. 168-177.

    Hurst, M. and K. Nigam: 2004, Retrieving topical sentiments from online document collections. In: Exploring Attitude and Affect in Text: Theories and Applications (AAAI-EAAT 2004). Stanford University.

  • Bibliography

    Kennedy, A. and D. Inkpen: 2006, Sentiment Classification of Movie Reviews Using Contextual Valence Shifters. Computational Intelligence 22(2), 110-125.

    Kim, S.-M. and E. Hovy: 2004, Determining the Sentiment of Opinions. In: Proceedings of COLING-04, the Conference on Computational Linguistics. Geneva, CH, pp. 1367-1373.

    Kim, S.-M. and E. Hovy: 2005a, Automatic Detection of Opinion Bearing Words and Sentences. In: Companion Volume to the Proceedings of IJCNLP-05, the Second International Joint Conference on Natural Language Processing. Jeju Island, KR, pp. 61-66.

    Kim, S.-M. and E. Hovy: 2005b, Identifying Opinion Holders for Question Answering in Opinion Texts. In: Proceedings of the AAAI-05 Workshop on Question Answering in Restricted Domains. Pittsburgh, US.

    Kim, S.-M. and E. Hovy: 2006, Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text. In: Proceedings of the ACL/COLING Workshop on Sentiment and Subjectivity in Text. Sydney, Australia.

    Koppel, M. and J. Schler: 2006, The importance of neutral examples for learning sentiment. Computational Intelligence 22(2), 100-116.

  • Bibliography

    Mao, Y. and G. Lebanon: 2006, Sequential Models for Sentiment Prediction. In: Proceedings of the ICML Workshop on Learning in Structured Output Spaces.

    McDonald, R., K. Hannan, T. Neylon, M. Wells, and J. Reynar: 2007, Structured Models for Fine-to-Coarse Sentiment Analysis. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-2007). Prague, Czech Republic.

    Mulder, M., A. Nijholt, M. den Uyl, and P. Terpstra: 2004, A Lexical Grammatical Implementation of Affect. In: Proceedings of TSD-04, the 7th International Conference on Text, Speech and Dialogue, Vol. 3206 of Lecture Notes in Computer Science. Brno, CZ, pp. 171-178.

    Mullen, T. and N. Collier: 2004, Sentiment analysis using support vector machines with diverse information sources. In: Proceedings of EMNLP-04, the 9th Conference on Empirical Methods in Natural Language Processing. Barcelona, Spain.

    Pang, B. and L. Lee: 2004, A Sentimental Education: Sentiment analysis using subjectivity summarization based on minimum cuts. In: ACL. pp. 271-278. arXiv:cs.CL/0409058.

  • Bibliography

    Pang, B., L. Lee, and S. Vaithyanathan: 2002, Thumbs up? Sentiment classification using machine learning techniques. In: Conference on Empirical Methods in Natural Language Processing (EMNLP-2002). pp. 79-86.

    Read, J.: 2005, Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In: Proceedings of the ACL-2005 Student Research Workshop. Ann Arbor, MI.

    Riloff, E., S. Patwardhan, and J. Wiebe: 2006, Feature Subsumption for Opinion Analysis. In: Proceedings of EMNLP-06, the Conference on Empirical Methods in Natural Language Processing. Sydney, AUS, pp. 440-448.

    Riloff, E., J. Wiebe, and T. Wilson: 2003, Learning subjective nouns using extraction pattern bootstrapping. In: W. Daelemans and M. Osborne (eds.): Proceedings of CoNLL-03, the 7th Conference on Natural Language Learning. Edmonton, CA, pp. 25-32.

    Sahlgren, M., J. Karlgren, and G. Eriksson: 2007, SICS: Valence Annotation Based on Seeds in Word Space. In: 4th International Workshop on Semantic Evaluations (SemEval 2007). Prague, Czech Republic.

  • Bibliography

    Salvetti, F., S. Lewis, and C. Reichenbach: 2004, Impact of lexical filtering on overall polarity identification. In: Exploring Attitude and Affect in Text: Theories and Applications (AAAI-EAAT 2004). Stanford University.

    Snyder, B. and R. Barzilay: 2007, Multiple Aspect Ranking using the Good Grief Algorithm. In: Proceedings of NAACL-2007. Washington, DC.

    Stoyanov, V., C. Cardie, D. Litman, and J. Wiebe: 2004, Evaluating an Opinion Annotation Scheme Using a New Multi-Perspective Question and Answer Corpus. In: J. G. Shanahan, J. Wiebe, and Y. Qu (eds.): Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications. Stanford, US.

    Strapparava, C. and R. Mihalcea: 2007, SemEval-2007 Task 14: Affective Text. In: 4th International Workshop on Semantic Evaluations (SemEval 2007). Prague, Czech Republic.

    Turney, P.: 2002, Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: 40th Annual Meeting of the Association for Computational Linguistics (ACL-02). pp. 417-424.

  • Bibliography

    Whitelaw, C., N. Garg, and S. Argamon: 2005, Using Appraisal Taxonomies for Sentiment Analysis. In: Proceedings of CIKM-05, the ACM SIGIR Conference on Information and Knowledge Management. Bremen, Germany.

    Wiebe, J.: 2002, Instructions for Annotating Opinions in Newspaper Articles. Technical Report TR-01-101, University of Pittsburgh, Department of Computer Science, Pittsburgh, PA.

    Wiebe, J., E. Breck, C. Buckley, C. Cardie, P. Davis, B. Fraser, D. Litman, D. Pierce, E. Riloff, T. Wilson, D. Day, and M. Maybury: 2003, Recognizing and Organizing Opinions Expressed in the World Press. In: Proceedings of the AAAI Spring Symposium on New Directions in Question Answering.

    Wiebe, J., R. Bruce, M. Bell, M. Martin, and T. Wilson: 2001a, A Corpus Study of Evaluative and Speculative Language. In: Proceedings of the 2nd ACL SIGdial Workshop on Discourse and Dialogue. Aalborg, Denmark.

    Wiebe, J. and E. Riloff: 2005, Creating Subjective and Objective Sentence Classifiers from Unannotated Texts. In: Proceedings of CICLing-05, the International Conference on Intelligent Text Processing and Computational Linguistics, Vol. 3406 of Lecture Notes in Computer Science. Mexico City, MX, pp. 475-486.

  • Bibliography

    Wiebe, J., T. Wilson, and M. Bell: 2001b, Identifying Collocations for Recognizing Opinions. In: Proceedings of the ACL Workshop on Collocation. Toulouse, France.

    Wiebe, J., T. Wilson, and C. Cardie: 2005, Annotating expressions of opinions and emotions in language. Language Resources and Evaluation 39(2-3), 165-210.

    Wiebe, J. M., R. F. Bruce, and T. P. O'Hara: 1999, Development and Use of a Gold-Standard Data Set for Subjectivity Classifications. In: 37th Annual Meeting of the Association for Computational Linguistics (ACL-99). pp. 246-253.

    Wilson, T. and J. Wiebe: 2003, Annotating Opinions in the World Press. In: 4th SIGdial Workshop on Discourse and Dialogue (SIGdial-03).

    Wilson, T., J. Wiebe, and R. Hwa: 2006, Recognizing Strong and Weak Opinion Clauses. Computational Intelligence 22(2), 73-99.

    Yu, H. and V. Hatzivassiloglou: 2003, Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences. In: M. Collins and M. Steedman (eds.): Proceedings of EMNLP-03, the 8th Conference on Empirical Methods in Natural Language Processing. Sapporo, Japan, pp. 129-136.
