Twitter Sentiment Analysis for Marketing Research
Rachel Bugeja
Supervisor: Mr. Charlie Abela
Department of Intelligent Computer Systems, University of Malta
ABSTRACT

With the popularity of social media networks, Natural Language Processing (NLP) faces new challenges because of the dynamic nature of the data published on these platforms. Social media networks are flooded with posts every second, of different lengths and formats, about news, users' opinions and observations, comments, and more. This information can be exploited by companies, especially marketing companies, to monitor and evaluate their social media hype, online marketing strategies and customer relationships. This paper focuses on sentiment analysis and attempts to classify tweets by sentiment for a marketing tool, Twitter MAT (Marketing Analysis Tool), which summarises the public's view of products. Twitter MAT uses Natural Language technologies, in particular the Named Entity Recognition (NER) facilities of the GATE text processor, to perform sentiment analysis. Twitter MAT also applies other techniques, such as discourse analysis, namely the effect of adverbs on adjectives, to detect sentiment polarity and its strength. The accuracy obtained for classifying tweets by sentiment was 75% when compared to existing, manually annotated datasets and to the results of a survey we conducted. When compared to a dataset which focuses on sentiment polarity strength, however, Twitter MAT reached 100% agreement.
1. INTRODUCTION

The growth of social media platforms and their frequent usage by the public has resulted in these networks being flooded with information every second. Billions of accounts across popular social media networks are used every single day to post snippets of different lengths and formats about users' news, opinions, observations, comments, and more. This data can be structured into information that organisations can exploit for analysing and monitoring sales and for future product development.
Nowadays, before buying a product, people tend to search for information and reviews about it. Through the Web, one can instantly find others' opinions, reviews and experiences related to particular products. People use social media platforms such as Facebook and Twitter to instantly update their circle of friends about new purchases and first impressions, about their frequently used or disappointing products, and about products that break down. These updates, posts and even micro-conversations are important to companies, since they are authentic consumer insights about a product or campaign. However, due to their huge volume, short length and noisy content, these streams pose new challenges for natural language processing and interpretation.
2. AIMS AND OBJECTIVES

The aim of this paper is to extract information from social media streams, in this case Twitter, and to interpret this data into structured information for product marketing research. This information can be used to analyse how the public reviews products and how this perception changes over time. Consequently, we analyse tweets that mention or discuss certain products that companies offer. Our aim is thus to provide the user with a market research tool that delivers instant consumer insights and a better understanding of what these insights indicate.
The objectives behind this research are the following:
The identification and filtering of noisy data for sentiment analysis: certain data, such as duplicate tweets, will be discarded entirely, while other noisy data, such as tweets with various meta-data (for example, hashtags or mentions), which may be useful to our analysis, will be exploited.
To analyse the strength of sentiment polarity by studying the effect of adverbs on adjectives, and to evaluate how adverbs intensify or weaken the sentiment polarity of a given text.
To provide the user with a tool through which he can query Twitter by topic, using keywords, for particular extracted information. The retrieved data will be displayed in a timeline which shows how opinion about the chosen topic or product has evolved over time according to the sentiment of the public's view. The tool will receive data continuously so that the information represented in the timeline remains realistic and up to date.
The tool will also point out which product features stood out, through a tag cloud that is likewise updated over time. This helps the user identify what people associate with the products.
3. RELATED WORK

Previous work on opinion mining and sentiment analysis of web content, such as Yu and Hatzivassiloglou (2003), Kim and Hovy (2004) and Hu and Liu (2004), takes different approaches, but the keyword-based approach is the most common. In short, these systems all maintain a collection of sentiment-bearing bag-of-words entries, each assigned a binary sentiment of either positive or negative. When one of those keywords appears in a phrase or paragraph, the sentiment polarity is worked out. Other studies, including Wilson et al., introduce further sentiment categories such as neutral or both (a phrase that contains both negative and positive lexicons).
One of the biggest challenges in NLP is noisy data. Since the idea of this thesis is to classify tweets, we were particularly interested in studies that work on such microblogging texts. One study took an interesting approach to sanitising tweets to improve the performance of the sentiment classifier, given the amount of noisy content in tweets. Capitalised words, excessive punctuation, emoticons and words that indicate laughter, and Twitter-specific characters such as the mention symbol (@) were replaced by a keyword, as they were considered sentiment intensifiers. However, un-opinionated words, known in NLP as stop words, and suffixes were removed. This sanitation process was proven to improve sentiment classification.
Another system, by contrast, selects sentences which mention the required topic and contain opinionated keywords, calculates the polarity of each word separately, and then calculates the polarity of the whole phrase. However, this required a large amount of time dedicated to training on text and words in order to calculate a score for each word. Studies by Pang et al. showed that using keywords to determine the polarity of a text results in 60% accuracy when compared to manually annotated texts.
3.1 Sarcasm

One study was conducted explicitly on tweets and Amazon reviews. The Semi-Supervised Sarcasm Identification algorithm (SASI) was used for identifying sarcastic patterns and classifying tweets according to their probability of being sarcastic. To train the classifier, tweets carrying the hashtag #sarcasm were used, since these were considered the most likely to be sarcastic, having been explicitly marked by the user.
However, tweets containing the hashtag were found to be biased and too noisy. This was due to the hashtag being attached to non-sarcastic tweets (mainly because the user did not fully understand what is meant by sarcasm), and to its use when talking about another tweet, document or external entity, such as: "I love it when #sarcasm is used in TV shows, there's always someone who doesn't get it." Other tweets were impossible to classify as sarcastic without the explicit sarcasm tag. It is also important to point out that only 4.09% of tweets are explicitly tagged with this hashtag (125 tweets out of 3.3 million), which shows that either users do not use the hashtag to mark their tweets or sarcasm is not used regularly.
The biggest problem with detecting sarcasm in microblogging was identified in a later study, which stated that tweets are not accurately labelled. Again, hashtags play an important role in the classification of sarcastic tweets. In fact, several hashtags somewhat related to sarcasm were used to collect tweets (#sarcasm, #sarcastic), and only tweets having these hashtags at the very end were used. This was based on the result of the previous study, which found tweets with #sarcasm too noisy and biased: a marker at the end is more likely to indicate that a tweet is sarcastic than the use of sarcasm as a noun within the tweet. A further manual inspection was conducted to eliminate tweets where sarcasm was the main subject rather than a marker. Manual inspection was also used to compare the system's classification of sarcastic tweets against human classification. With only 50% agreement between the human judges themselves, this study shows how difficult it is to detect sarcasm in text.
3.2 Strength in Sentiment Polarity

Although several studies show different methodologies, an unusual but interesting approach studied adverbs and adjectives together to show how they affect the sentiment of a sentence. Although other studies did analyse the use of adjectives to determine the sentiment of a text, this was the first study to take adverbs into consideration as well, in particular adverbs of degree. These kinds of adverbs affect the sentiment polarity and can even reverse it. The method used to assign a score to each adverb, and the associated axiomatic rules, were defined to show the relationship between adverbs and adjectives. The results show that this approach reaches a high level of precision in sentiment classification.
4. DESIGN

The main purpose of Twitter MAT is to gather tweets from Twitter, classify them according to their sentiment group, and display them in a timeline so as to show how the sentiment changes over time. This section discusses the design issues and decisions taken in modelling the system.
Twitter MAT includes two main components: the User Interface, a component in the form of a web application, and the Classification Module. The component diagram depicted in Figure 1 shows how the various components interact with each other. The Classification Module handles all processing of the tweets, from data gathering to data storage, including the algorithms that filter noise in tweets, annotate sentiment and compute the score. The web application, on the other hand, gives the user access to this data, handles how results are displayed, and provides extra information for interpreting the data.
The Twitter module is the intermediary between the web application and the Annotation module. Essentially, it is responsible for fetching tweets, checking for existing tweets grouped under the same keyword, and filtering. Tweets are retrieved through the Twitter API, which handles any interaction between a developer and the Twitter service. Given a query, the API returns tweets in JSON format, including information about the tweet itself such as the author, date of publication, number of retweets, location, any URLs included, and several other details.
The Annotation module processes the tweets in order to calculate the score, which then determines the tweet's sentiment and its strength. Once a tweet enters the Annotation module, it is first checked for abbreviations or acronyms, which are commonly found in tweets given their 140-character limit. Any abbreviations and acronyms found are converted to their full, original form. This procedure ensures that each word can be understood by the GATE system while annotating the content.
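As a minimal sketch, this expansion step can be modelled as a dictionary lookup over the tweet's tokens. The entries below are illustrative stand-ins; the actual lists come from the GATE Twitter plugin.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the abbreviation/acronym expansion step.
// The map entries are examples, not the system's real lists.
public class AbbreviationExpander {
    static final Map<String, String> EXPANSIONS = new HashMap<>();
    static {
        EXPANSIONS.put("gr8", "great");
        EXPANSIONS.put("imo", "in my opinion");
        EXPANSIONS.put("btw", "by the way");
    }

    // Replace each token that matches a known abbreviation with its full form.
    public static String expand(String tweet) {
        StringBuilder out = new StringBuilder();
        for (String token : tweet.split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(EXPANSIONS.getOrDefault(token.toLowerCase(), token));
        }
        return out.toString();
    }
}
```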
Each tweet is then processed through GATE to annotate its contents. Annotations are done with ANNIE and the POS tagger provided. Although the POS tagger can annotate several different kinds of words and entities, it was necessary to add new annotations for the purposes of this thesis. The following annotations were added: Sentiment Words, which are divided into positive and negative words, and Adverbs of Degree.
Once GATE annotates the content, each annotation is parsed and evaluated. Based on the type of annotation found, a score is given to the tweet. The scoring system increases the score when a positive word is found and reduces it when a negative word is found. The same applies to the aforementioned adverbs of degree. Depending on the score given, tweets are then classified as Highly Positive, Positive, Negative or Highly Negative. They are stored under the appropriate topic by adding them to the existing list, if any. If the tweet being stored already exists, this stage is skipped; unfortunately, filtering retweets does not stop the Twitter API from returning the same tweet twice. Moreover, if no annotations are found in the resulting output, the tweet is not stored. Although previous works on sentiment analysis usually put such tweets in a "neutral" category, they do not express any form of feedback or perception of the product and are thus considered noisy data for our purposes.
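The scoring and classification just described can be sketched as follows. The word lists and class thresholds here are illustrative assumptions; in the real system the sentiment words come from the GATE annotations rather than hard-coded sets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the scoring system: +1 per positive word,
// -1 per negative word, then a class label derived from the total.
public class TweetScorer {
    static final Set<String> POSITIVE = new HashSet<>(Arrays.asList("great", "love", "amazing"));
    static final Set<String> NEGATIVE = new HashSet<>(Arrays.asList("bad", "hate", "broken"));

    public static int score(String tweet) {
        int score = 0;
        for (String word : tweet.toLowerCase().split("\\W+")) {
            if (POSITIVE.contains(word)) score++;
            else if (NEGATIVE.contains(word)) score--;
        }
        return score;
    }

    // A score of zero means no sentiment annotations were found, so the
    // tweet is treated as noise and discarded rather than stored.
    public static String classify(int score) {
        if (score >= 2) return "Highly Positive";
        if (score == 1) return "Positive";
        if (score == 0) return "Discarded";
        if (score == -1) return "Negative";
        return "Highly Negative";
    }
}
```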
As soon as the new tweets are stored, each tweet is checked for keyword extraction; the keywords found are then displayed in a tag cloud. For every word a tweet holds, we check whether it is a keyword related to the topic or a word commonly found in tweets about the same topic. This procedure is done on the new tweets so that the keywords shown are up to date and reflect new trends. Keywords are extracted by a simple but effective algorithm: first, all stop words are eliminated, including any form of adverbs, conjunctions, modal verbs and common nouns, amongst others. The remaining words are then checked against other tweets; a word found in at least three tweets from the same set, i.e. from the same query or topic, is considered a keyword.
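The keyword-extraction step above can be sketched as follows; the stop-word list is a small stand-in for the full one used by the system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of keyword extraction: drop stop words, count how many tweets of
// the same topic contain each remaining word, keep words found in >= 3 tweets.
public class KeywordExtractor {
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "the", "a", "is", "very", "and", "my", "i", "it", "short"));

    public static List<String> extract(List<String> tweets) {
        // For each word, record the distinct tweets it appears in.
        Map<String, Set<Integer>> occurrences = new HashMap<>();
        for (int i = 0; i < tweets.size(); i++) {
            for (String word : tweets.get(i).toLowerCase().split("\\W+")) {
                if (word.isEmpty() || STOP_WORDS.contains(word)) continue;
                occurrences.computeIfAbsent(word, k -> new HashSet<>()).add(i);
            }
        }
        List<String> keywords = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> e : occurrences.entrySet()) {
            if (e.getValue().size() >= 3) keywords.add(e.getKey()); // present in >= 3 tweets
        }
        Collections.sort(keywords);
        return keywords;
    }
}
```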
Figure 1: Top-Level Design
The database is a crucial part of the system. Given that the Twitter API only returns tweets less than a week old, it is important to store data continuously so as to have a bigger time span to show on the timeline. Moreover, this gives the user the ability to search for the formerly perceived sentiment of certain products and to analyse how this perception is changing based on the results shown in the timeline. The database in this case is a relational database used to store the actual tweet, the score given by the classification algorithm, the number of retweets, and the date of publication.
5. IMPLEMENTATION

Following the Design section, this section explains the process of building the system and discusses in further detail how each module, the Classification module and the Web Application module, was implemented based on the design already discussed. We also elaborate on the techniques, frameworks and libraries used to help development, how they were integrated, and the API used to interact with Twitter. Figure 2 shows a top-level diagram of the system with both modules, highlighting which technologies were used to develop it.
5.1 Classification of Tweets

To get tweets from Twitter, we used the search functionality provided by twitter4j, which returns a number of tweets. This number depends on how many tweets containing the queried keyword are available in the 7 days preceding the day being queried. It also returns duplicate tweets if there is only a limited number of
Figure 2: Top-Level Design with Technologies Used
tweets available. Once the search is done, the system loops through all the tweets and filters which ones to process. Filtering is based on the following criteria:
The tweet must be in English.

The tweet must contain the keyword specified in the query, due to the noisy and unrelated data returned by the API.
The tweet must not be a retweet. Processing such tweets can lead to duplication of data (since it is highly probable that the original tweet is also retrieved). Instead of storing a repeated tweet, we use the retweet count of the original tweet so that it is reflected in the percentages of tweets found for each sentiment group.
The tweet must not be a reply to someone else's tweet, due to the possibility of propaganda.
The tweet is checked for sarcasm indicators. Such indicators include the explicit hashtags #sarcasm and #not; the remaining indicators were taken from the sarcasm-detection literature discussed earlier.
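The filtering rules above can be sketched as a single predicate, assuming plain values in place of the twitter4j Status fields (language code, retweet and reply flags):

```java
// Sketch of the filtering rules; field names and the sarcasm-indicator
// list are simplified placeholders for the real twitter4j data.
public class TweetFilter {
    // Returns true when the tweet should be passed on to the Annotation module.
    public static boolean accept(String text, String lang, boolean isRetweet,
                                 boolean isReply, String keyword) {
        if (!"en".equals(lang)) return false;                                  // must be English
        if (!text.toLowerCase().contains(keyword.toLowerCase())) return false; // must mention the keyword
        if (isRetweet) return false;                                           // avoid duplicated data
        if (isReply) return false;                                             // replies may carry propaganda
        return true;
    }

    // Sarcasm indicators do not reject the tweet; they only flag it so the
    // final score can be reversed later in the pipeline.
    public static boolean isSarcastic(String text) {
        String lower = text.toLowerCase();
        return lower.contains("#sarcasm") || lower.contains("#not");
    }
}
```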
The tweets that pass this filtering process are passed to the Annotation module and through GATE. By splitting the content into separate words, the system identifies abbreviations from a given list. If an abbreviation is found, the abbreviated word or phrase is replaced by its full phrase from a corresponding list; both lists were extracted from the Twitter plugin provided by GATE. Once every word is analysed, the whole tweet content is passed as a document into a corpus. The system then instructs the document to annotate its content, including the user-added annotations. The system loops through each annotation, checking its type, and the score is increased or decreased depending on the type found. The axioms below show how scores are assigned.
Let AFF, DOUBT, WEAK and STRONG be the sets of adverbs of affirmation, adverbs of doubt, adverbs of weak intensity, and adverbs of strong intensity respectively, and let POS and NEG be the sets of adjectives which are positive and negative. For a tweet T:

if adv ∈ AFF ∪ STRONG ∪ DOUBT and adj ∈ POS ⇒ Score(T) = Score(T) + 2

if adv ∈ DOUBT ∪ WEAK ∪ STRONG ∪ AFF and adj ∈ NEG ⇒ Score(T) = Score(T) − 2

if adv ∈ WEAK and adj ∈ POS ⇒ Score(T) = Score(T) − 2
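A sketch of how these axioms translate into score adjustments follows; the adverb sets are small illustrative samples of the gazetteer lists, not the full lists used by the system.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch of the adverb-of-degree axioms: the adverb's class and the
// polarity of the adjective it modifies decide the score adjustment.
public class AdverbAxioms {
    static final Set<String> AFF = new HashSet<>(Arrays.asList("absolutely", "certainly", "totally"));
    static final Set<String> DOUBT = new HashSet<>(Arrays.asList("possibly", "roughly", "seemingly"));
    static final Set<String> STRONG = new HashSet<>(Arrays.asList("extremely", "immensely"));
    static final Set<String> WEAK = new HashSet<>(Arrays.asList("barely", "scarcely", "slightly"));

    // adjPositive: whether the modified adjective is in POS (true) or NEG (false).
    public static int delta(String adverb, boolean adjPositive) {
        if (adjPositive) {
            if (AFF.contains(adverb) || STRONG.contains(adverb) || DOUBT.contains(adverb)) return 2;
            if (WEAK.contains(adverb)) return -2; // "barely good" undercuts the praise
        } else {
            if (AFF.contains(adverb) || STRONG.contains(adverb)
                    || DOUBT.contains(adverb) || WEAK.contains(adverb)) return -2;
        }
        return 0; // adverb not in any degree class: no adjustment from this rule
    }
}
```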
The user-added annotations were created using the JAPE language. Once we created the gazetteer list files, it was necessary to add an XML schema for each annotation. Then, using JAPE, we specified how these words should be annotated. The Adverbs of Degree annotation comprises four different types:
1. Adverbs of affirmation, such as absolutely, certainly, exactly, totally and so on.

2. Adverbs of doubt, such as possibly, roughly, apparently, seemingly and so on.

3. Strong intensifying adverbs, such as astronomically, exceedingly, extremely, immensely and so on.

4. Weak intensifying adverbs, such as barely, scarcely, weakly, slightly and so on.
Since adverbs are usually found in front of adjectives, this had to be specified in our JAPE rule as a pattern to annotate. To create an annotation, GATE pattern-matches the content of the corpus against any JAPE rule found. If a match is found, a new XML tag is created surrounding the matched content; the tag's contents depend on what has been specified in the JAPE rule.
Note that if a sentiment-bearing word is found without any adverb in front of it, it still affects the score. When all annotations are parsed and the scoring is finished, the tweet is ready to be stored; however, if no annotations are found, the tweet is discarded. The score is reversed if a sarcasm indicator was found. Meanwhile, the system also checks whether the tweet to be stored contains any keywords needed for the tag cloud. This is done by checking whether each word is a stop word (such as a conjunction, common verb, etc.). If it is not, it is added to a HashMap where the key is the keyword itself and the value is its weight. The weight represents the number of times the particular word has appeared in incoming tweets for that query. For a keyword to be shown, it must be present in at least three other tweets.
5.2 Web Application

The web application was developed to let the user search for new data and display the results found. We made use of Google Visualization (Google Charts) for the timeline and Tag Canvas for the tag cloud. Through the web application, the user is able to register for the Twitter
MAT services and query new keywords or topics. The results are then displayed on the aforementioned timeline, which shows how the sentiment of the retrieved tweets changes over the selected time span. The user can also view the percentages of tweets found for each sentiment class, view negative and positive tweets separately, and view the tag cloud showing the various keywords found in the retrieved tweets. Figure 3 shows the main focus of the web application, the Timeline.
Figure 3: Twitter MAT: Timeline
6. EVALUATION AND RESULTS

The accuracy of a sentiment analysis system is based on how well it agrees with human assessment. With no body language to rely on, unlike verbal communication, extracting sentiment from written text has always been a challenge. Since different people rate sentiment in text differently, evaluating this type of analysis is an even harder task. Most studies on sentiment analysis base their evaluation on the results of surveys completed by different individuals, in order to have a wide range of answers. To interpret the results of these surveys, the average mark is usually taken to reflect the general answer given by the respondents.
We did the same with our survey, in which 20 individuals were given different tweets and asked to assign a sentiment to each. As in any other sentiment analysis study, the respondents' opinions varied quite a lot, which goes to show how difficult it is to determine the exact sentiment of written text. However, most answers did match up. From the results obtained in the surveys, we reached a total of 75% agreement, calculated by comparing the survey's results to the scores obtained by Twitter MAT.
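The agreement figure is simply the proportion of tweets on which the system's label matches the survey's majority label; a minimal sketch:

```java
// Percentage of items on which two label sequences agree.
public class Agreement {
    public static double percent(String[] system, String[] survey) {
        int matches = 0;
        for (int i = 0; i < system.length; i++) {
            if (system[i].equals(survey[i])) matches++;
        }
        return 100.0 * matches / system.length;
    }
}
```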
We also compared the system's results with available public datasets. The datasets chosen were manually annotated, and each dataset was used for a different reason.
1. Stanford Twitter Sentiment Test Set
This corpus, known as Sentiment140, consists of over a million tweets labelled as positive or negative. The results are shown in Figure 4, where the red figures mark results which did not match. As one can see, in both such instances the tweet is more of a statement than an expression of sentiment, which may be why the results did not comply. While for the first tweet words such as "die" or "blasting" are considered negative by the application, the respondents knew that the tweet referred to movies.
Figure 4: Sentiment140 Dataset vs. Twitter MAT Results
2. Sentiment Strength Twitter Dataset
This dataset focuses on the strength of the sentiment (as does our approach), with manually annotated tweets given a number to represent the strength of the sentiment. Two scores are given: a positive score and a negative score. Positive sentiment strength ranges from 1 (not positive) to 5 (extremely positive) and negative sentiment strength from -1 (not negative) to -5 (extremely negative). Thus, for example, for the third tweet we have a score of -4 from the Sentiment Strength Dataset and a score of -1 from our application. These two scores indicate the same sentiment: while the score of -4 means that a highly negative word was found, the score of -1 means that our application found a negative word and decreased the score (since we do not keep two separate scores for positive and negative words).
Figure 5: Sentiment Strength Dataset vs. Twitter MAT Results
3. STS-Gold Dataset
This dataset was also constructed for evaluating Twitter sentiment analysis systems. However, what makes it different is that the positive or negative annotations are not only at tweet level but also at entity level. Each entity mentioned in a given tweet was annotated separately based on the adjectives and adverbs found, and a generic label was given to the whole tweet based on the majority of the labels given at entity level.
Figure 6: STS-Gold Dataset vs. Twitter MAT Results
In the comparison with the manually annotated datasets, there was also a high percentage of agreement. With the first dataset, Sentiment140, we had 75% agreement, as shown in Figure 4, while with the Sentiment Strength Dataset we had 100% agreement, shown in Figure 5. This goes to show that using adverbs to capture the strength of polarity did in fact improve our sentiment analysis. For the last dataset, we again reached 75% agreement, as shown in Figure 6.
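One way to compare the two scoring schemes is to collapse each onto a polarity sign before checking agreement. The mapping below is our illustrative assumption for this comparison, not part of either scoring scheme.

```java
// Collapse the Sentiment Strength dataset's dual score (positive in 1..5,
// negative in -1..-5) and Twitter MAT's single signed score onto a
// polarity sign (-1, 0 or +1), then compare the signs.
public class StrengthComparison {
    public static int datasetPolarity(int pos, int neg) {
        return Integer.compare(pos, -neg); // the stronger magnitude wins, e.g. (1, -4) -> negative
    }

    public static int matPolarity(int score) {
        return Integer.compare(score, 0);
    }

    public static boolean agree(int pos, int neg, int matScore) {
        return datasetPolarity(pos, neg) == matPolarity(matScore);
    }
}
```

Under this mapping, the example above (dataset score -4 against Twitter MAT score -1) counts as agreement, since both collapse to a negative polarity.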
As part of our evaluation, we also got feedback from a lecturer in the Marketing department at the University of Malta, Mr E. Said. After a brief demonstration of the web application and its main features, we discussed whether similar tools are useful for the marketing industry. The following are some of the points Mr. Said made:
Twitter MAT is very simple to use, which appeals to the many marketing managers who are not technically proficient.
The application is very helpful both for marketing companies and for companies using the tool directly to evaluate customer insights.
The timeline and the zoom feature are very useful for analysing a particular time frame in more depth.
Moreover, the tag cloud, which has recently become very popular in the marketing industry in Malta, can help the user pinpoint new trends, errors, defects and articles about the products which may have gone viral; all of which are very important for any marketing strategy.
Although similar tools are in fact used every day in marketing companies, these are not linked to any social media platform, which makes Twitter MAT very innovative and useful for analysing social media hype.
7. CONCLUSION AND FUTURE WORK

There are several issues which could be addressed to improve our sentiment analysis.
The use of emoticons is becoming more popular than ever with the introduction of emoji keyboard apps and the emoticons offered by the default keyboards of new smartphones. Although existing studies explore how text emoticons (emoticons made up of punctuation marks) affect the sentiment of a tweet, we are now facing the new challenge of image emoticons. We could categorise these images based on their sentiment and adjust our score whenever such emoticons are found. Currently, the Twitter API does not cater for these emoticons, as they are listed as image URLs.
Another feature we could add to our sentiment analysis component is examining the publishing patterns of users to identify any form of propaganda. This can be done by storing the user's (publisher's) information while receiving tweets. In doing so, we could eliminate spam published by propagandists by analysing how frequently a user retweets certain tweets about particular topics or posts repetitive content.
We could also improve how the system presents the overview of the results and the timeline. One feature worth considering is identifying events by analysing sudden increases in tweets about certain topics or trending keywords. Such events could include launching new products, new updates, articles about the products or company, new publicity, etc. These events could then be displayed to the user to suggest what might have affected the sentiment polarity of the tweets, or what event might have triggered a fluctuation in the hype or the number of tweets published in a time span.
Through our research, we identified several forms of noise in tweets, which we reduced with filters such as removing tweets which are replies to other tweets, tweets with no sentiment, URLs, etc. Instead of treating abbreviations and acronyms as noise or stop words, we expanded them to their full meaning, since they can also indicate sentiment. Moreover, we used adverbs of degree, which can affect the strength of sentiment for adjectives and verbs; for each adverb found, we adjusted our score to reflect this strength. This proved effective, based on the results obtained in our evaluation. Even though, as discussed, there are several features we could add to improve the application, we have obtained satisfying results. As discussed with local experts in the field, there is a need for such tools in an era where social media is becoming an important part of our lives.
8. REFERENCES

D. Maynard, K. Bontcheva, and D. Rout, "Challenges in developing opinion mining tools for social media," University of Sheffield, 2012.

D. Maynard, V. Tablan, C. Ursu, H. Cunningham, and Y. Wilks, "Named Entity Recognition from Diverse Text Types," in Proceedings of the Recent Advances in Natural Language Processing 2001 Conference, 2001, pp. 257-274.

H. Yu and V. Hatzivassiloglou, "Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences," EMNLP 2003, 2003.

S.-M. Kim and E. Hovy, "Determining the sentiment of opinions," COLING 2004, 2004.

M. Hu and B. Liu, "Mining and summarizing customer reviews," KDD 2004, 2004.

J. Wiebe, T. Wilson, and C. Cardie, "Annotating expressions of opinions and emotions in language," Language Resources and Evaluation (formerly Computers and the Humanities), 2005.

M. Cohen, P. Damiani, et al., "Sentiment analysis in microblogging: A practical implementation," University of Buenos Aires, 2011.

B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002, pp. 79-86.

D. Davidov, O. Tsur, and A. Rappoport, "Semi-supervised recognition of sarcastic sentences in Twitter and Amazon," in Proceedings of the Fourteenth Conference on Computational Natural Language Learning, 2010, pp. 107-116.

C. Liebrecht, F. Kunneman, and A. van den Bosch, "The perfect solution for detecting sarcasm in tweets #not," in Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 2013, pp. 29-37.

R. González-Ibáñez, S. Muresan, and N. Wacholder, "Identifying sarcasm in Twitter: A closer look," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 2011, pp. 581-586.

F. Benamara, C. Cesarano, D. Reforgiato, et al., "Sentiment analysis: Adjectives and adverbs are better than adjectives alone," 2007.

P. D. Turney and M. L. Littman, "Measuring praise and criticism: Inference of semantic orientation from association," 2003.

D. Borth, R. Ji, T. Chen, T. Breuel, and S.-F. Chang, "Large-scale visual sentiment ontology and detectors using adjective noun pairs," 2013.

M. Thelwall, K. Buckley, G. Paltoglou, and D. Cai, "Sentiment strength detection in short informal text," Journal of the American Society for Information Science and Technology, 2010.

H. Saif, M. Fernandez, Y. He, and H. Alani, "Evaluation datasets for Twitter sentiment analysis: A survey and a new dataset, the STS-Gold," 2013.

A. Go, R. Bhayani, and L. Huang, "Twitter sentiment classification using distant supervision," Processing, 2009. [Online]. Available: http://www.stanford.edu/~alecmgo/papers/TwitterDistantSupervision09.pdf

E. Kouloumpis, T. Wilson, and J. Moore, "Twitter sentiment analysis: The good the bad and the OMG!," The AAAI Press, 2011. [Online]. Available: http://dblp.uni-trier.de/db/conf/icwsm/icwsm2011.html#KouloumpisWM11

J. Zhao, L. Dong, J. Wu, and K. Xu, "MoodLens: An emoticon-based sentiment analysis system for Chinese tweets," in KDD, ACM, 2012, pp. 1528-1531.

S. L. Rojas, U. Kirschenmann, and M. Wolpers, "We have no feelings, we have emoticons ;-)," in ICALT, IEEE, 2012, pp. 642-646.

C. Lumezanu, N. Feamster, and H. Klein, "#bias: Measuring the tweeting behavior of propagandists," 2012.