Sentiment and Subjectivity Analysis: An Overview
Alina Andreevskaia, Department of Computer Science, Concordia University, Montreal, Quebec, Canada
Nancy Ide, Department of Computer Science, Vassar College, Poughkeepsie, New York
Definition: Sentiment Analysis
Also called Opinion Mining: classify words/senses, texts, and documents according to the opinion, emotion, or sentiment they express
Applications: determining critics' opinions of products
Tracking attitudes toward political candidates
Sub-Tasks Determine Subjective-Objective polarity
Is the text or language factual or an expression of an opinion?
Determine Positive-Negative polarity Does the subjective text express a positive or negative opinion of the subject matter?
Determine the strength of the opinion Is the opinion weakly positive/negative, strongly positive/negative, or neutral?
History Current work stems from
Content analysis: analysis of the manifest and latent content of a body of communicated material (as a book or film) through classification, tabulation, and evaluation of its key symbols and themes in order to ascertain its meaning and probable effect (Webster's Dictionary of the English Language, 1961)
Long history Quantitative newspaper analysis (1890-on)
Lasswell, 1941: study of political symbols in editorials and public speeches
Gerbner, 1969: establish violence profiles for different TV networks; trace trends; see how various groups portrayed
Content Analysis Psychology and sociology
Analysis of verbal patterns to determine motivational, mental, personal characteristics (1940s)
Group processes, cultural commonalities/differences (Osgood, Suci, and Tannenbaum, 1957: semantic differential scales)
Major contribution: General Inquirer http://www.wjh.harvard.edu/~inquirer/
Anthropology Study of myths, riddles, folktales Analysis of kinship terminology (Goodenough, 1972)
Literary and Rhetorical analysis Stylistic analysis (Sedelow and Sedelow, 1966) Thematic analysis (Smith, 1972; Ide, 1982, 1989)
Other Roots Point of view tracking in narrative (Banfield, 1982; Uspensky, 1973; Wiebe, 1994) Subjectivity analysis
Affective Computing (Picard, 1997): develop means to enable the computer to detect and appropriately respond to the user's emotions
Directionality (Hearst, 1992) Determine if author is positive, neutral, or negative toward some part of a document
Late 90s: first automatic systems implemented for NLP Spertus, 1997 Wiebe and Bruce, 1995 Wiebe et al., 1999 Bruce and Wiebe, 2000
Now a major research stream
Current work in NLP Sentiment tagging
Assignment of positive, negative, or neutral values/tags to texts and their components
Began with focus on binary (positive-negative) classification
Recently, include neutrals
Little work on other types of affect Still a focus in much content analysis work in other fields
Other Work: Assignment of fine-grained affect labels based on various psychological theories (Valitutti et al., 2004; Strapparava and Mihalcea, 2007)
Detection of opinion holders (Kim and Hovy, 2004; Kim and Hovy, 2005; Kim and Hovy, 2006; Choi et al., 2005; Bethard et al., 2004; Kobayashi et al., 2007)
opinion targets (Hurst and Nigam, 2004; Gamon and Aue, 2005; Hu and Liu, 2004; Popescu and Etzioni, 2005; Kim and Hovy, 2006; Kobayashi et al., 2007)
perspective (Lin et al., 2006)
pros and cons in reviews (Kim and Hovy, 2006a)
bloggers' mood (Mishne and Glance, 2006; Mishne, 2005; Leshed and Kaye, 2006)
happiness (Mihalcea and Liu, 2006)
politeness (Roman et al., 2005)
Assignment of ratings to movie reviews (Pang and Lee, 2005)
Identification of support/opposition in congressional debates (Thomas et al., 2006)
Prediction of election results (Kim and Hovy, 2007)
Subjectivity Focuses on determining subjective words and texts that mark the presence of opinions and evaluations vs. objective words and texts, used to present factual information (Wiebe, 2000; Wiebe et al., 2004; Wiebe and Riloff, 2005)
Many terms Sentiment classification, sentiment analysis
Semantic orientation Opinion analysis, opinion mining Valence Polarity Attitude
Here, we use the term sentiment
Theories of Emotion and Affect
Osgood's semantic differential (Osgood, Suci, and Tannenbaum, 1957) Three recurring attitudes that people use to evaluate words and phrases Evaluation (good-bad) Potency (strong-weak) Activity (active-passive)
Ortony's salience-imbalance theory (Ortony, 1979) defines metaphors in terms of particular relationships between topic and vehicle
Theories of Emotion and Affect
Martin's Appraisal Framework http://www.grammatics.com/appraisal/
Three sub-types of attitude Affect (emotion)
evaluation of emotional disposition
Judgment (ethics) normative assessments of human behavior
Appreciation (aesthetics) assessments of form, appearance, composition, impact, significance etc of human artefacts and individuals
Theories of Emotion and Affect
Elliott's Affective Reasoner http://condor.depaul.edu/~elliott/ar.html
Group | Specification | Category label and emotion type
Well-Being | appraisal of a situation as an event; presumed value of a situation as an event affecting another | happy, gloating, resentment, jealousy, envy, sorry-for
Prospect-based | appraisal of a situation as a prospective event | hope, fear
Confirmation | appraisal of a situation as confirming or disconfirming an expectation | satisfaction, relief, fears-confirmed, disappointment
Attribution | appraisal of a situation as an act of some accountable agent | pride, admiration, shame, reproach
Attraction | appraisal of a situation as containing an attractive or unattractive object | liking, disliking
Compound emotions | | gratitude, anger, gratification, remorse
Compound emotion extensions | | love, hate
Theories of Emotion and Affect
Ekman's basic emotions: Ekman's work revealed that facial expressions of emotion are not culturally determined, but universal to human culture and thus biological in origin
Found expressions of anger, disgust, fear, joy, sadness, and surprise to be universal Some evidence for contempt
Semantic properties of individual words are good predictors of semantic characteristics of a phrase or a text that contain them
Requires development of lists of words indicative of sentiment
Manually-Created Lists: General Inquirer (GI)
Best-known extensive list of words, categorized into a wide range of content-analysis categories
Developed as part of a content-analysis project (Stone et al., 1966; Stone et al., 1997)
Three main word lists:
Harvard IV-4 dictionary of content-analysis categories (includes Osgood's three dimensions of value, power, and activity)
Lasswell's dictionary: eight basic value categories (WEALTH, POWER, RESPECT, RECTITUDE, SKILL, ENLIGHTENMENT, AFFECTION, WELLBEING) plus other information
Five categories based on the social cognition work of Semin and Fiedler (1988): verb and adjective types
Some words tagged for sense
Recognized as a gold standard for evaluation of automatically produced lists
WordNet-Affect
Developed semi-automatically by Strapparava and colleagues
Assigned affect labels to words in WordNet, then expanded the lists using WordNet relations such as synonymy, antonymy, entailment, and hyponymy
Includes semantic labels based on psychological and social science theories (Ortony, Elliott, Ekman), valence (positive or negative), and arousal (strength of emotion)
2004 version covers 1314 synsets, 3340 words
Part of WordNet Domains
Others
Whissell's Dictionary of Affect in Language (DAL) (Sweeney and Whissell, 1984; Whissell, 1989; Whissell and Charuk, 1985)
Affective Norms for English Words (ANEW) (Bradley and Lang, 1999)
Sentiment-bearing adjectives by Hatzivassiloglou and McKeown (1997)
Sentiment and subjectivity clues from work by Wiebe
Limitations of manually annotated lists: limited coverage, low inter-annotator agreement, diversity of the tags used
Corpus-based methods Hatzivassiloglou and McKeown (1997) (HM)
builds on the observation that some linguistic constructs, such as conjunctions, impose constraints on the semantic orientation of their constituents
clustered adjectives from the Wall Street Journal in a graph into positive and negative sets based on the type of conjunction between them
the cluster with higher average frequency was deemed to contain positive adjectives; lower average frequency meant negative sentiment
Limitations: algorithm limited to adjectives (also adverbs -- Turney and Littman, 2002); requires large amounts of hand-labeled data to produce accurate results
Web As Corpus Peter Turney (Turney, 2002; Turney and Littman, 2002; Turney and Littman, 2003) More general method, does not require previously annotated data for training
Induce sentiment of a word from the strength of its association with 14 seed words with known positive or negative semantic orientation
Two methods for association: point-wise mutual information (PMI) and latent semantic analysis (LSA)
Used web as data ran 14 queries on AltaVista using NEAR operator to acquire co-occurrence statistics with the 14 seed words
Results evaluated against GI on a variety of test settings gave up to 97.11% accuracy for the top 25% of words
Size of the corpus had a considerable effect: a 10-million-word corpus instead of the full Web reduced accuracy to 61.26-68.74%
LSA performed relatively better than PMI on the 10-million-word corpus, but LSA is more complex and harder to implement
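The PMI side of Turney's method can be sketched as follows. The hit counts below are invented stand-ins for the AltaVista NEAR-query counts the original method used, and only one positive and one negative seed are shown rather than the full 14:

```python
from math import log2

# Invented hit counts standing in for AltaVista NEAR-query results;
# in Turney's setup these came from web queries against 14 seed words.
TOTAL_DOCS = 350_000_000
hits = {
    "superb": 20_000,
    "excellent": 1_000_000,
    "poor": 1_000_000,
    ("superb", "excellent"): 8_000,   # superb NEAR excellent
    ("superb", "poor"): 500,          # superb NEAR poor
}

def pmi(word, seed):
    # PMI(w, s) = log2( P(w NEAR s) / (P(w) * P(s)) )
    p_joint = hits[(word, seed)] / TOTAL_DOCS
    return log2(p_joint / ((hits[word] / TOTAL_DOCS) * (hits[seed] / TOTAL_DOCS)))

def semantic_orientation(word):
    # SO-PMI with one positive and one negative seed; the full method
    # sums PMI over the positive seeds and subtracts the negative ones.
    return pmi(word, "excellent") - pmi(word, "poor")

print(semantic_orientation("superb"))  # positive value => positive orientation
```

With a single seed pair the corpus size cancels out and the score reduces to log2 of the ratio of the two co-occurrence counts, which is why larger corpora mainly help by making those counts reliable.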
End of An Era
Due to its simplicity, high accuracy and domain independence, the PMI method became popular
In 2005, AltaVista discontinued support for NEAR operator, upon which the method relied
Attempts to substitute NEAR with AND led to considerable deterioration in system performance
Bethard et al. (2004): used two different methods to acquire opinion words from corpora
(1) calculated frequency of co-occurrence with seed words taken from Hatzivassiloglou and McKeown, computing the log-likelihood ratio
(2) computed relative frequencies of words in subjective and objective documents
First method produced better results for adverbs and nouns, gave higher precision but lower recall for adjectives
Second method worked best for verbs
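The log-likelihood ratio used in the co-occurrence method is Dunning's (1993) G² statistic over a 2x2 contingency table; a minimal sketch:

```python
from math import log

def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 over a 2x2 co-occurrence table:
    k11 = candidate word co-occurs with seed word,
    k12 = candidate without seed, k21 = seed without candidate,
    k22 = neither. Higher values mean stronger association."""
    total = k11 + k12 + k21 + k22
    g2 = 0.0
    for observed, row, col in [(k11, k11 + k12, k11 + k21),
                               (k12, k11 + k12, k12 + k22),
                               (k21, k21 + k22, k11 + k21),
                               (k22, k21 + k22, k12 + k22)]:
        expected = row * col / total
        if observed > 0:
            g2 += observed * log(observed / expected)
    return 2 * g2
```

When word and seed are independent the observed counts equal the expected ones and G² is zero; strong co-occurrence drives it up, which is what ranks candidate opinion words.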
Other Approaches Kim and Hovy (2005)
separated opinion words from non-opinion words by computing their relative frequency in subjective (editorial) and objective (non-editorial) texts from TREC data
Riloff et al. (2003), Grefenstetteet al. (2006) Used syntactic patterns Learn lexico-syntactic expressions characteristic for subjective nouns
Dictionary-Based Methods Addresses some of the limitations of corpus-based methods
Use semantic resources such as WordNet, thesauri
Two approaches: Rely on thesaural relations between words (synonymy, antonymy, hyponymy, hypernymy) to find similarity between seed words and other words
Exploit information contained in definitions and glosses
Use of WordNet
Kim and Hovy (2004,2005) Extended word lists by using WordNet synsets
Ranked lists based on sentiment polarity assigned to each word in a synset, based on WordNet distance from positive and negative seed words
Similar approach used by Hu and Liu (2004)
SentiWordNet: developed by Esuli and Sebastiani
Trained several classifiers to give positive, negative, and objective ratings for each synset in WordNet 2.0; scores range from 0 to 1
Beyond Synsets Kamps et al. (2004)
Tagged words with Osgood's three semantic dimensions
Computed shortest path through WordNet relations connecting a word to words representative of the three categories (e.g., good and bad for evaluation)
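The path-based idea can be sketched with a breadth-first search over a toy synonymy graph. The edges below are invented; the real method walks WordNet's synonym links, and the "good"/"bad" anchors represent the evaluation dimension:

```python
from collections import deque

# Toy symmetric synonymy graph standing in for WordNet synonym links
# (edges invented for illustration).
graph = {
    "good": {"decent", "fine"},
    "fine": {"good", "nice"},
    "nice": {"fine"},
    "decent": {"good", "poor"},
    "poor": {"decent", "bad"},
    "bad": {"poor", "awful"},
    "awful": {"bad", "terrible"},
    "terrible": {"awful"},
}

def distance(start, goal):
    # Shortest path length between two words via synonym edges (BFS).
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        word, d = queue.popleft()
        if word == goal:
            return d
        for nxt in graph.get(word, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return float("inf")

def eva(word):
    # Evaluation score in the spirit of Kamps et al.: positive when the
    # word sits closer to "good" than to "bad", normalized by d(good, bad).
    return (distance(word, "bad") - distance(word, "good")) / distance("good", "bad")
```

In this toy graph eva("nice") comes out positive and eva("terrible") negative; on real WordNet the same comparison is run against anchor pairs for each of Osgood's three dimensions.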
Esuli and Sebastiani (2005) Classified words in WordNet into positive and negative based on synsets, glosses and examples
Further Beyond Synsets Andreevskaia and Bergler (2006)
Take advantage of the semantic similarity between glosses and head words
Start with a list of manually annotated words, expand with synonyms and antonyms, search WordNet glosses for occurrences of seed words
If a gloss contains a word with known sentiment, the head word is deemed to have the same sentiment
Suggest that the overlap measure reflects the centrality of the word in the sentiment category
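A minimal sketch of the gloss-propagation idea, with invented mini-glosses standing in for WordNet's:

```python
# Invented WordNet-style glosses for illustration.
glosses = {
    "superb": "of surpassing excellence; excellent in quality",
    "dreadful": "causing great fear; very bad or unpleasant",
    "table": "a piece of furniture with a flat top and legs",
}
# Seed list of words with known sentiment.
seeds = {"excellent": "positive", "excellence": "positive",
         "bad": "negative", "fear": "negative"}

def tag_by_gloss(head_word):
    # If the gloss contains seed words of known sentiment, the head
    # word is deemed to carry the majority sentiment of those seeds.
    tokens = glosses[head_word].replace(";", " ").split()
    votes = [seeds[t] for t in tokens if t in seeds]
    if not votes:
        return "neutral"
    return max(set(votes), key=votes.count)
```

Counting multiple seed hits per gloss is what gives the overlap measure described above: the more seed words a gloss shares with a category, the more central the head word is to it.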
Role of Neutrals: most work cited so far classifies words as positive or negative
Results vary between 60-80% agreement with GI as gold standard
Adding neutrals severely reduces accuracy, by 10-20% depending on part of speech
Problems: words without strong positive or negative connotations are difficult to categorize accurately
Strength of positive/negative affinity can be used as a measure: accuracy is highest for words at the extremes of the positive/negative poles
Many words have both sentiment-bearing and neutral senses: e.g., great is typically tagged as positive, but according to statistics in WordNet it is used neutrally in 75% of occurrences
Solution: use sense-tagged word lists
The need for sense-level sentiment annotation has recently attracted considerable attention
Development of methods to devise sense-tagged word lists
Sense-tagged Word Lists Andreevskaia and Bergler (2006)
applied gloss-based sentiment tagging to sense level
extended their system by adding a word sense disambiguation module (Senti-Sense); used syntactic patterns to disambiguate between sentiment-bearing and neutral senses
learned generalized adjective-noun patterns for sentiment-bearing adjectives from unambiguous data
abstracted learned patterns to higher levels of hypernym hierarchies using predetermined propagation rules
applied learned patterns to disambiguate adjectives with multiple senses in order to locate senses that bear sentiment
Sense-tagged Word Lists
Esuli and Sebastiani (2007): applied a random-walk (PageRank-style) algorithm to sentiment tagging of synsets
Takes advantage of the graph-like structure of the WordNet hierarchy
Sense-tagged Word Lists Wiebe and Mihalcea (2006)
Sense-level tagging for subjectivity (i.e., neutral vs. sentiment-bearing senses)
Automatic method for sense-level sentiment tagging based on Lin's (1998) similarity measure
Acquire a list of top-ranking distributionally similar words
Compute WordNet-based measure of semantic similarity for each word in the list
Beyond Positive and Negative
Ide (2006): bootstrapped sense-tagged word lists for semantic categories based on FrameNet frames (e.g., commitment, reasoning) and GI categories (e.g., hostile/friendly, weak/strong)
Treated lexical units associated with a given frame as a bag of words
Used WordNet::Similarity to compute similarity among senses of the words; relation set consisted of different pairwise combinations of synsets, glosses, examples, hypernyms, and hyponyms
Based on results, compute "sure" senses (strongest association) and retain; iterate
Augment with sure senses for synsets, hypernyms, hyponyms
Resulting lists very high on precision (98%), lower on recall
Used hierarchical clustering to group senses in a given category into positive and negative and finer-grained distinctions
Results for judgment-communication
Scarcity of manually annotated resources for system training and evaluation
Some work uses user-created rankings in online product, book, or movie reviews: ranking scale (good-bad, liked-disliked, etc.) easily available, fast to collect
Drawbacks: contain a significant amount of noise (erroneous ratings, misspellings, phrases in different languages)
Exacerbated when researchers automatically break reviews into sentences or snippets
Corpus | Level of annotation | Annotation type | Corpus size | Link
MPQA | phrases and sentences | private states | 535 documents (10,657 sentences) |
| expressions and sentences | subjectivity and objectivity | 2 sets of documents, 500 sentences each, from the WSJ Treebank |
| | | 500 reviews | http://www.cs.uic.edu/liub/FBS/FBS.html
SemEval-07 Task 14 dataset | headlines | sentiment on a -100 to +100 polarity scale, 6 basic emotions | 2225 news headlines | http://www.cse.unt.edu/~rada/affectivetext/
Rate of agreement among human annotators may reveal important insights about the task, provide a critical baseline for system evaluation
Few inter-annotator agreement studies conducted to date
Inter-annotator agreement on popular sentiment genres such as movie reviews and blogs is so far unexplored
Agreement depends on unit annotated (sentence or text) annotation type (subjectivity or sentiment) domain and genre
IA Studies: sentence subjectivity labels (Wiebe et al., 1999): annotators classified a sentence as subjective if it contained any significant expression of subjectivity
multiple rounds of training and adjustment of the annotation instructions
pairwise Kappa over WSJ test set ranged from 0.59 to 0.76
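Pairwise Kappa corrects raw agreement for chance agreement between two annotators; a minimal implementation of Cohen's kappa:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    # kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label
    # if each labels independently at their own observed rates.
    chance = sum(freq_a[lbl] * freq_b[lbl] for lbl in freq_a) / n ** 2
    return (observed - chance) / (1 - chance)
```

For example, two annotators labeling four sentences subjective/objective and disagreeing on one get 75% raw agreement but kappa of only 0.5 once chance is factored out.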
Variable Results Kim and Hovy (2004)
Relatively high agreement (κ=0.91) between two annotators who assigned positive, negative, and n/a labels to 100 newspaper sentences
Gamon and Aue (2005) Similar study using car reviews produced pairwise Kappa of 0.70 - 0.80
Strapparava and Mihalcea (2007) Study suggests inter-annotator agreement substantially lower on fine-grained types of annotation
Six annotators assigned sentiment score and Ekmans six basic emotions scores to news headlines
Agreement (Pearson correlation) 78.01 for sentiment
Agreement for emotion labels ranged from 36.07 (surprise) to 68.19 (sadness)
IA for texts
Wiebe et al., 2001: two genres
flames (hostile, inflammatory messages): κ=0.78
opinion pieces from WSJ: κ=0.94-0.95
Domain and genre may affect level of inter-annotator agreement
Sentiment Analysis Ultimate goal of sentiment and subjectivity annotation is the analysis of clauses, sentences, and texts
Resources: word lists; annotated (validated) corpora
Subjectivity analysis uses the MPQA corpus, a reliable gold standard for training and evaluation
Sentiment analysis must rely on small manually annotated test sets: created ad hoc, seldom made publicly available, too small for machine learning methods
Sentiment Analysis Single text often includes both positive and negative sentences
Sentences and clauses regarded as the most natural units for sentiment/subjectivity annotation: sentiment is usually more homogeneous in a sentence than in a whole text
But harder to identify due to a limited number of clues
Work on improving system performance by performing analysis simultaneously at different levels Sentence-level analysis has high precision Text-level analysis has high recall
Words vs. Other Features
Words/unigrams provide good accuracy in sentence-level sentiment tagging
Yu and Hatzivassiloglou (2003): no significant improvement when bigrams or trigrams added to the feature set
Kim and Hovy (2004) presence of sentiment-bearing words works better than more sophisticated scoring methods
Riloff et al. (2006): similar results for unigrams vs. unigrams plus bigrams and extraction patterns
subsumption hierarchy and feature selection brought less than 1% improvement in accuracy
Other Features vs. Words: some studies show gains with the use of additional features: Wilson et al. (2005), Andreevskaia et al. (2007)
syntactic properties of the phrase (role in sentence structure, presence of modifiers and valence shifters) yield statistically significant increases in system performance in both sentiment and subjectivity analysis
Gains in subjectivity analysis: presence of complex adjectival phrases (Bethard et al., 2004)
similarity scores (Yu and Hatzivassiloglou, 2003), position in the paragraph (Wiebe et al., 1999)
Gains in sentiment analysis syntactic patterns and negation (Hurst and Nigam, 2004; Mulder et al., 2004; Andreevskaia et al., 2007)
knowledge about the opinion holder (Kim and Hovy, 2004)
target of the sentiment (Hu and Liu, 2004)
Multiple Features: some experiments suggest improvement gained when multiple feature sets are combined
Hatzivassiloglou and Wiebe, 2000: combination of lists of adjectives tagged with dynamic, polarity, and gradability labels is the best predictor of sentence subjectivity
Riloff et al. (2003) accuracy of subjectivity tagging results improved with addition of each new feature (25 in all)
Gamon and Aue (2005) sentiment tagging using larger number of features increased average precision and recall
SemEval-2007 Affective Text Task
Opportunity to compare systems for sentiment analysis run on the same dataset of 2000 manually annotated news headlines
Machine learning methods had highest recall at the cost of low precision: naïve Bayes-based CLaC-NB (Andreevskaia and Bergler, 2007), word-space model-based SICS (Sahlgren et al., 2007)
Knowledge-based unsupervised approaches had highest precision, but low recall, because few sentiment clues appear per headline: CLaC (Andreevskaia and Bergler, 2007), UPAR7 (Chaumartin, 2007)
General Conclusion (so far) Subjectivity classification
statistical approaches that use naïve Bayes or SVM significantly outperform non-statistical techniques
Binary (positive-negative) sentiment classification non-statistical methods that rely on presence of sentiment markers in a sentence, or on strength of sentiment associated with these markers, yield as good or better accuracy than statistical approaches
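A minimal non-statistical classifier of the kind just described, using an invented marker list (real systems draw on lexicons such as GI or WordNet-Affect):

```python
# Invented marker lists for illustration.
POSITIVE = {"good", "great", "excellent", "enjoyable", "superb"}
NEGATIVE = {"bad", "poor", "boring", "awful", "dreadful"}

def classify(sentence):
    # Count positive vs. negative sentiment markers; ties give neutral.
    tokens = sentence.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

Despite its simplicity, marker counting needs no training data, which is one reason it remains competitive with statistical classifiers on binary sentiment classification.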
BUT: both methods show some strengths in both tasks
Need LOTS more research
Features: choice of features used by a sentiment annotation system is a critical factor
Wide range of features used: lists of words, lemmas or unigrams, bigrams, higher-order n-grams, part-of-speech, syntactic properties of the surrounding context
Use of Features
Use of multiple models can improve the performance of both sentiment and subjectivity classifiers Use of different features within the same classifier or in a community of several classifiers improves system performance
Features: studies show improvement when using n-grams vs. unigrams (Dave et al., 2003; Cui et al., 2006; Aue and Gamon, 2005; Wiebe et al., 2001; Riloff et al., 2006)
Improvement when n-gram models / word lists augmented with context information (Wiebe et al., 2001b; Andreevskaia et al., 2007)
features associated with syntactic structure (Gamon, 2004)
combination of words annotated for semantic categories related to sentiment or subjectivity (Whitelaw et al., 2005; Fletcher and Patrick, 2005; Mullen and Collier, 2004)
Feature Selection: costly or computationally infeasible to use all features
Aue and Gamon (2005): n-grams selected based on an Expectation Maximization algorithm
Riloff et al. (2006) Use a subsumption hierarchy for n-grams (unigrams and bigrams) and extraction patterns
if a feature's words and dependencies are a superset of a more general ancestor in the hierarchy, discard it
only features with higher information gain were allowed to subsume less informative ones
for both subjective/objective and positive/negative text-level classification, combined use of subsumption and traditional feature selection improves performance
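The subsumption idea can be sketched as follows. The information-gain scores below are invented, and real feature sets also include extraction patterns, not just n-grams:

```python
def subsume(features, info_gain, delta=0.002):
    # Keep an n-gram only if its information gain beats every more
    # general ancestor (here: each component unigram) by at least delta;
    # otherwise the more general feature subsumes it.
    kept = []
    for feat in features:
        words = feat.split()
        ancestors = [w for w in words if len(words) > 1 and w in info_gain]
        if all(info_gain[feat] >= info_gain[a] + delta for a in ancestors):
            kept.append(feat)
    return kept

# Invented information-gain scores for illustration.
gains = {"good": 0.010, "movie": 0.001, "very": 0.000,
         "good movie": 0.013, "very good": 0.005}
print(subsume(["good", "movie", "good movie", "very good"], gains))
# -> ['good', 'movie', 'good movie']  ("very good" adds too little over "good")
```

The effect is to prune bigrams that merely repeat the signal already carried by their unigrams, shrinking the feature space without losing informative combinations.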
Part of Speech Subjectivity
Adjectives best predictors of subjectivity (Hatzivassiloglou and Wiebe, 2000)
Modals, pronouns, adverbs, cardinal numbers also used as subjectivity clues (Wiebe et al., 1999; Bruce and Wiebe, 2000)
Sentiment Combined use of words from all parts-of-speech produced more accurate tags (Blair et al., 2004; Salvetti et al., 2004)
Feature Generation Most sentiment classifiers use standard machine learning techniques to learn and select features from labeled corpora Works well when large labeled corpora available for training and validation (e.g., movie reviews)
Falls short when training data is scarce different domain or topic different time period
Led to increased interest in unsupervised and semi-supervised approaches to feature generation
Some Promising Research
Systems trained on a small number of labeled examples and large quantities of unlabelled in-domain data perform relatively well (Aue and Gamon, 2005)
Structural correspondence learning applied to small number of labeled examples sufficient to adapt to new domain (Blitzer et al., 2007)
So far performance of these methods inferior to supervised approaches and knowledge-based methods
Availability of word lists and clues makes knowledge-based approaches an attractive alternative to supervised machine learning when labeled data is scarce
Variety of domains used in sentiment analysis: movie, music, book, and other entertainment reviews
product reviews, blogs, dream corpus, etc.
Choice of domain can have a major impact on results
Movie Reviews: popular domain in sentiment research
Positive and negative words/expressions do not necessarily convey the opinion holder's attitude: e.g., evil, used in movie reviews when referring to characters or plot, does not convey sentiment toward the movie itself (Turney, 2002)
Simple counting of positive and negative clues in movie review texts insufficient
Clues acquired from out-of-domain sources often fail
Sentiment towards a whole product is the sum of the sentiment towards its parts, components, and attributes (Turney, 2002)
General word lists perform better on product reviews (Turney, 2002; Kennedy and Inkpen, 2006)
Attitude Influence: texts with positive sentiment are easier to classify than negative ones (Kennedy and Inkpen, 2006; Hurst and Nigam, 2004; Dave et al., 2003; Koppel and Schler, 2006; Chaovalit and Zhou, 2005)
Possible explanations: positive documents more uniform (Dave et al., 2003)
positive clues have higher discriminant value (Koppel and Schler, 2006)
negative texts characterized by extensive use of negations and other valence shifters that reverse the sentiment conveyed by individual words (e.g., not bad) (Pang and Lee, 2004)
Improvement in accuracy when valence shifters taken into account (Kennedy and Inkpen, 2006; Andreevskaia et al., 2007) but negative impact reported when negation included in feature set (Dave et al., 2003)
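A minimal sketch of negation-aware scoring of the kind these studies use (marker and shifter lists invented):

```python
# Invented lists for illustration; real systems use fuller lexicons.
SHIFTERS = {"not", "never", "hardly", "barely"}
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "poor"}

def shifted_score(sentence):
    # Reverse a word's polarity when a valence shifter immediately
    # precedes it, so "not bad" scores positive rather than negative.
    tokens = sentence.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity and i > 0 and tokens[i - 1] in SHIFTERS:
            polarity = -polarity
        score += polarity
    return score
```

A one-token window is the simplest choice; it misses shifters separated from their target ("not a good film"), which is one way naive negation handling can hurt rather than help a feature set.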
Use of balanced evaluation sets with equal number of positive and negative documents has become a standard in sentiment research
Wide variety of classification approaches used:
simple keyword counting methods, with or without scoring
rule-based methods
content-analytical methods
(statistical) SVM, naïve Bayes, and other statistical classifiers
used alone, sequentially, or as a community
Comparison of results does not provide a definite answer as to which of these methods is best for sentiment or subjectivity tagging
choice of features and training domain have a greater impact on accuracy than choice of classification algorithm
comparison of performance of systems evaluated on different domains or different feature sets is not conclusive
Sentiment and subjectivity analysis has evolved into a strong research stream in NLP
State-of-the-art systems can reach up to 90% accuracy on certain domains, but a generally applicable method is still needed
Research Directions Development of semi-supervised machine-learning approaches that will maximize the usefulness of the available resources and ensure domain adaptation with limited in-domain data
Creation of reliable and extensive resources such as lists of words and expressions, syntactic patterns, combinatorial rules, and annotated corpora
Creation of uniform ways to denote and represent sentiment and subjectivity annotation
Bibliography
Andreevskaia, A. and S. Bergler: 2007, CLaC and CLaC-NB: Knowledge-based and Corpus-based Approaches to Sentiment Tagging. In: 4th International Workshop on Semantic Evaluations (SemEval 2007). Prague, Czech Republic.
Andreevskaia, A., S. Bergler, and M. Urseanu: 2007, All Blogs are not made Equal. In: International Conference on Weblogs and Social Media (ICWSM-2007). Boulder, Colorado.
Aue, A. and M. Gamon: 2005, Customizing Sentiment Classifiers to New Domains: a Case Study. In: RANLP-05, the International Conference on Recent Advances in Natural Language Processing. Borovets, Bulgaria.
Bethard, S., H. Yu, A. Thornton, V. Hatzivassiloglou, and D. Jurafsky: 2004, Automatic Extraction of Opinion Propositions and their Holders. In: Exploring Attitude and Affect in Text: theories and application (AAAI-EAAT 2004) .Stanford University.
Blair, L., A. Jaharria, S. Lewis, T. Oda, C. Reichenbach, J. Rueppel, and F. Salvetti: 2004, Impact of Lexical Filtering on Semantic Orientation. In: Exploring Attitude and Affect in Text: theories and application (AAAI-EAAT 2004). Stanford University.
Breck, E., Y. Choi, and C. Cardie: 2007, Identifying expressions of opinion in context. In: Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-2007). Hyderabad, India.
Bruce, R. and J. Wiebe: 2000, Recognizing subjectivity: A case study of manual processing. Natural Language Engineering 5(2), 187-205.
Chaovalit, P. and L. Zhou: 2005, Movie Review Mining: a Comparison between Supervised and Unsupervised Classification Approaches. In: Proceedings of HICSS-05, the 38th Hawaii International Conference on System Sciences.
Chaumartin, F.-R.: 2007, UPAR7: A Knowledge-based System for Headline Sentiment Tagging. In: 4th International Workshop on Semantic Evaluations (SemEval 2007). Prague, Czech Republic.
Cui, H., V. Mittal, and M. Datar: 2006, Comparative Experiments on Sentiment Classification for Online Product Reviews. In: Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-2006). Boston, MA.
Das, S. R. and M. Y. Chen: 2001, Yahoo! For Amazon: Sentiment extraction from small talk on the Web. In: Asia Pacific Finance Association Annual Conference (APFA01).
Dave, K., S. Lawrence, and D. M. Pennock: 2003, Mining the Peanut gallery: opinion extraction and semantic classification of product reviews. In: WWW03. Budapest, Hungary, pp. 519-528.
Dredze, M., J. Blitzer, and F. Pereira: 2007, Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In: 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007). Prague, Czech Republic.
Dunning, T.: 1993, Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics 19, 61-74.
Ekman, P.: 1993, Facial expression of emotion. American Psychologist 48, 384-392.
Fletcher, J. and J. Patrick: 2005, Evaluating the Utility of Appraisal Hierarchies as a Method for Sentiment Classification. In: Proceedings of Australian Language Technology Workshop 2005. Sydney, Australia, pp. 134-142.
Gamon, M.: 2004, Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis. In: Proceedings of COLING-04, the 20th International Conference on Computational Linguistics. Geneva, CH, pp. 841-847.
Gamon, M. and A. Aue: 2005, Automatic identification of sentiment vocabulary: exploiting low association with known sentiment terms. In: Proceedings of the ACL-05 Workshop on Feature Engineering for Machine Learning in Natural Language Processing. Ann Arbor, US.
Goldberg, A. and J. Zhu: 2006, Seeing stars when there aren't many stars: Graph-based semi-supervised learning for sentiment categorization. In: Proceedings of the HLT-NAACL 2006 Workshop on Textgraphs: Graph-based Algorithms for Natural Language Processing. Boston, MA.
Hatzivassiloglou, V. and J. Wiebe: 2000, Effects of Adjective Orientation and Gradability on Sentence Subjectivity. In: 18th International Conference on Computational Linguistics (COLING-2000).
Hu, M. and B. Liu: 2004, Mining and summarizing customer reviews. In: KDD-04. pp. 168-177.
Hurst, M. and K. Nigam: 2004, Retrieving topical sentiments from Online document collection. In: Exploring Attitude and Affect in Text: theories and application (AAAI-EAAT 2004). Stanford University.
Kennedy, A. and D. Inkpen: 2006, Sentiment Classification of Movie Reviews Using Contextual Valence Shifters. Computational Intelligence 22(2), 110-125.
Kim, S.-M. and E. Hovy: 2004, Determining the Sentiment of Opinions. In: Proceedings of COLING-04, the Conference on Computational Linguistics. Geneva, CH, pp. 1367-1373.
Kim, S.-M. and E. Hovy: 2005a, Automatic Detection of Opinion Bearing Words and Sentences. In: Companion Volume to the Proceedings of IJCNLP-05, the Second International Joint Conference on Natural Language Processing. Jeju Island, KR, pp. 61-66.
Kim, S.-M. and E. Hovy: 2005b, Identifying Opinion Holders for Question Answering in Opinion Texts. In: Proceedings of AAAI-05 Workshop on Question Answering in Restricted Domains. Pittsburgh, US.
Kim, S.-M. and E. Hovy: 2006, Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text. In: Proceedings of the ACL/COLING Workshop on Sentiment and Subjectivity in Text. Sydney, Australia.
Koppel, M. and J. Schler: 2006, The importance of neutral examples for learning sentiment. Computational Intelligence 22(2), 100–116.
Mao, Y. and G. Lebanon: 2006, Sequential Models for Sentiment Prediction. In: Proceedings of the ICML Workshop on Learning in Structured Output Spaces.
McDonald, R., K. Hannan, T. Neylon, M. Wells, and J. Reynar: 2007, Structured Models for Fine-to-Coarse Sentiment Analysis. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-2007). Prague, Czech Republic.
Mulder, M., A. Nijholt, M. den Uyl, and P. Terpstra: 2004, A Lexical Grammatical Implementation of Affect. In: Proceedings of TSD-04, the 7th International Conference Text, Speech and Dialogue, Vol. 3206 of Lecture Notes in Computer Science. Brno, CZ, pp. 171–178.
Mullen, T. and N. Collier: 2004, Sentiment analysis using support vector machines with diverse information sources. In: Proceedings of EMNLP-04, 9th Conference on Empirical Methods in Natural Language Processing. Barcelona, Spain.
Pang, B. and L. Lee: 2004, A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In: ACL. pp. 271–278. arXiv:cs.CL/0409058.
Pang, B., L. Lee, and S. Vaithyanathan: 2002, Thumbs up? Sentiment classification using machine learning techniques. In: Conference on Empirical Methods in Natural Language Processing (EMNLP-2002). pp. 79–86.
Read, J.: 2005, Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In: Proceedings of the ACL-2005 Student Research Workshop. Ann Arbor, MI.
Riloff, E., S. Patwardhan, and J. Wiebe: 2006, Feature Subsumption for Opinion Analysis. In: Proceedings of EMNLP-06, the Conference on Empirical Methods in Natural Language Processing. Sydney, AUS, pp. 440–448.
Riloff, E., J. Wiebe, and T. Wilson: 2003, Learning subjective nouns using extraction pattern bootstrapping. In: W. Daelemans and M. Osborne (eds.): Proceedings of CoNLL-03, 7th Conference on Natural Language Learning. Edmonton, CA, pp. 25–32.
Sahlgren, M., J. Karlgren, and G. Eriksson: 2007, SICS: Valence Annotation Based on Seeds in Word Space. In: 4th International Workshop on Semantic Evaluations (SemEval 2007). Prague, Czech Republic.
Salvetti, F., S. Lewis, and C. Reichenbach: 2004, Impact of lexical filtering on overall polarity identification. In: Exploring Attitude and Affect in Text: theories and application (AAAI-EAAT 2004). Stanford University.
Snyder, B. and R. Barzilay: 2007, Multiple Aspect Ranking using the Good Grief Algorithm. In: Proceedings of NAACL-2007. Washington, DC.
Stoyanov, V., C. Cardie, D. Litman, and J. Wiebe: 2004, Evaluating an Opinion Annotation Scheme Using a New Multi-Perspective Question and Answer Corpus. In: J. G. Shanahan, J. Wiebe, and Y. Qu (eds.): Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications. Stanford, US.
Strapparava, C. and R. Mihalcea: 2007, SemEval-2007 Task 14: Affective Text. In: 4th International Workshop on Semantic Evaluations (SemEval 2007). Prague, Czech Republic.
Turney, P.: 2002, Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: 40th Annual Meeting of the Association of Computational Linguistics (ACL-02). pp. 417–424.
Whitelaw, C., N. Garg, and S. Argamon: 2005, Using Appraisal Taxonomies for Sentiment Analysis. In: Proceedings of CIKM-05, the ACM Conference on Information and Knowledge Management. Bremen, Germany.
Wiebe, J.: 2002, Instructions for Annotating Opinions in Newspaper Articles. Technical Report TR-01-101, University of Pittsburgh, Department of Computer Science, Pittsburgh, PA.
Wiebe, J., E. Breck, C. Buckley, C. Cardie, P. Davis, B. Fraser, D. Litman, D. Pierce, E. Riloff, T. Wilson, D. Day, and M. Maybury: 2003, Recognizing and Organizing Opinions Expressed in World Press. In: Proceedings of the AAAI Spring Symposium on New Directions in Question Answering.
Wiebe, J., R. Bruce, M. Bell, M. Martin, and T. Wilson: 2001a, A Corpus Study of Evaluative and Speculative Language. In: Proceedings of the 2nd ACL SIGdial Workshop on Discourse and Dialogue. Aalborg, Denmark.
Wiebe, J. and E. Riloff: 2005, Creating Subjective and Objective Sentence Classifiers from Unannotated Texts. In: Proceedings of CICLing-05, International Conference on Intelligent Text Processing and Computational Linguistics, Vol. 3406 of Lecture Notes in Computer Science. Mexico City, MX, pp. 475–486.
Wiebe, J., T. Wilson, and M. Bell: 2001b, Identifying Collocations for Recognizing Opinions. In: Proceedings of the ACL Workshop on Collocation. Toulouse, France.
Wiebe, J., T. Wilson, and C. Cardie: 2005, Annotating expressions of opinions and emotions in language. Language Resources and Evaluation 39(2–3), 165–210.
Wiebe, J. M., R. F. Bruce, and T. P. O'Hara: 1999, Development and Use of a Gold-Standard Data Set for Subjectivity Classifications. In: 37th Annual Meeting of the Association for Computational Linguistics (ACL-99). pp. 246–253.
Wilson, T. and J. Wiebe: 2003, Annotating Opinions in the World Press. In: 4th SIGdial Workshop on Discourse and Dialogue (SIGdial-03).
Wilson, T., J. Wiebe, and R. Hwa: 2006, Recognizing Strong and Weak Opinion Clauses. Computational Intelligence 22(2), 73–99.
Yu, H. and V. Hatzivassiloglou: 2003, Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences. In: M. Collins and M. Steedman (eds.): Proceedings of EMNLP-03, 8th Conference on Empirical Methods in Natural Language Processing. Sapporo, Japan, pp. 129–136.