Big Data: Big Deal or Big Distraction?
Agenda:
What is Big Data?
Why YOU Should Care About (Big) Data?
A Brief Introduction to Big Data Econometrics
Theory vs. measurement, 2-minute video: http://www.bigdataeconometrics.com/
Short vs. long term, good HBR blog post: http://blogs.hbr.org/2013/12/how-marketers-can-avoid-big-data-blind-spots/
Internet of People
Ed Moed's lecture: how much time people are online and what they are doing.
The Problem With People
"Today computers are almost wholly dependent on human beings for information -- by typing, pressing a record button, taking a digital picture or scanning a bar code... The problem is, people have limited time, attention and accuracy, all of which means they are not very good at capturing data about things in the real world."
Kevin Ashton, "That 'Internet of Things' Thing," RFID Journal, July 22, 2009
Really, how valuable is all the data about the number of people who watch cat videos?
What can we learn from trending on Twitter? http://www.hashtags.org/trending-on-twitter/
General problem with social science: people don't create data for analytics. Selection bias, error, unobservable heterogeneity.
Internet of Things
Wikipedia page with good quotes: http://en.wikipedia.org/wiki/Internet_of_Things
What do people want? But remember Jobs.
Where are the people we want? Customization, ad placement
What will sales be? Predicting the future
Are my ads working? The ATTRIBUTION problem
Hal Varian: Prediction, summarization, estimation, hypothesis testing
Pandora WSJ article: http://online.wsj.com/news/articles/SB10001424052702304315004579381393567130078
You will either work for one of these companies, use one of their products, or contract for one of their services in both your personal and professional lives.
"...the Publicis CEO noted that 'the communication and marketing landscape has undergone dramatic changes in recent years, including the exponential development of new media giants, the explosion of Big Data, blurring of the roles of all players and profound changes in consumer behavior.'" WSJ 7/28/13
"...a $35.1 billion cross-border linkup that shows how Big Data is making Madison Avenue look more like Wall Street." WSJ 7/28/13
Interesting interview about digital channels: http://www.beet.tv/2013/07/publicis-vivaki.html
Key WSJ article (read some quotes): http://online.wsj.com/news/articles/SB10001424127887324809004578634133795292520
"...how Big Data is making Madison Avenue look more like Wall Street"
Creative vs. Analytical
Primacy of creative is declining: http://www.careers-in-marketing.com/adfacts.htm
Bloomberg: too many people, not enough robots: http://www.businessweek.com/articles/2013-07-29/omnicom-publicis-still-too-many-people-not-enough-robots
People being hired in creative are freelancers: haves and have-nots, like in many other industries!
A Brief Introduction to Big Data Econometrics
A. What can we do with data?
B. Correlation vs. Causation
C. Types of data
   i. Cross section
   ii. Time series
   iii. Panel
D. Fit, overfit, validation
E. Tools of the trade
   i. Regression, logit, probit
   ii. Trees & Forests
   iii. Bayesian simulation
Prediction, summarization, estimation, hypothesis testing
Fast Company interview with Nate Silver (data in sports and politics) on hype of big data revolution: http://www.fastcompany.com/3009258/most-creative-people-2013/1-nate-silver
Prediction
Summarization
Estimation
Hypothesis Testing
Tension between prediction (don't care why it works as long as it works) and causal modeling.
Correlation vs. Causation
Is Marriage Good for Your Health? Tara Parker-Pope, 4/14/10
Contemporary studies, for instance, have shown that married people are less likely to get pneumonia, have surgery, develop cancer or have heart attacks. A group of Swedish researchers has found that being married or cohabiting at midlife is associated with a lower risk for dementia. A study of two dozen causes of death in the Netherlands found that in virtually every category, ranging from violent deaths like homicide and car accidents to certain forms of cancer, the unmarried were at far higher risk than the married.
What can get in the way of determining CAUSATION?
ENDOGENEITY
1. Reverse causality (also selection bias): healthier people are more likely to get married
2. Unobservable characteristics such as time preference, aptitude, genetics
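A minimal simulation sketch of the selection story above. Everything here is an illustrative assumption rather than anything from the cited studies: the function name `simulate_selection`, the logistic link between health and marriage, and all numbers. Marriage is given zero causal effect on the outcome, yet the naive married-vs-unmarried comparison still shows a gap.

```python
import math
import random


def simulate_selection(n=20000, seed=0):
    """Return the naive married-minus-unmarried gap in a health outcome.

    Assumption for illustration: marriage has NO causal effect on the
    outcome; an unobserved health trait drives both marriage and outcome.
    """
    rng = random.Random(seed)
    married, unmarried = [], []
    for _ in range(n):
        health = rng.gauss(0.0, 1.0)                # unobserved characteristic
        p_marry = 1.0 / (1.0 + math.exp(-health))   # healthier -> more likely to marry
        outcome = health + rng.gauss(0.0, 1.0)      # outcome depends on health only
        if rng.random() < p_marry:
            married.append(outcome)
        else:
            unmarried.append(outcome)
    return sum(married) / len(married) - sum(unmarried) / len(unmarried)


naive_gap = simulate_selection()
print(f"Naive 'effect' of marriage: {naive_gap:.2f} (simulated true effect is 0)")
```

The gap comes entirely from who selects into marriage, which is why a correlation like this cannot be read as a causal effect without more structure.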
Counterfactual
1. What would happen if we change the cause?
2. Is there a plausible alternative explanation?
What would sales have been if the ad did not run?
What would people do if they did not use Google?
What would people buy if the weather were warmer?
Big data opens up more possibilities for natural experiments because there are more people getting exposed to more stuff.
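The first question can be sketched with a randomized holdout: we never see the same customer both with and without the ad, so a randomly chosen unexposed group stands in for the counterfactual. This is a toy simulation; the function name `estimate_ad_lift`, the sales figures and the lift of 5.0 are all made-up assumptions.

```python
import random


def estimate_ad_lift(n=20000, true_lift=5.0, seed=0):
    """Estimate ad lift from a randomized holdout (illustrative numbers only)."""
    rng = random.Random(seed)
    exposed, holdout = [], []
    for _ in range(n):
        baseline = rng.gauss(100.0, 10.0)   # sales this customer makes with no ad
        if rng.random() < 0.5:              # coin flip decides who sees the ad
            exposed.append(baseline + true_lift)
        else:
            holdout.append(baseline)        # stands in for the counterfactual
    return sum(exposed) / len(exposed) - sum(holdout) / len(holdout)


print(f"Estimated lift: {estimate_ad_lift():.2f} (simulated true lift: 5.0)")
```

Because exposure is assigned by coin flip, the two groups are alike on average in everything except the ad, so the difference in means estimates the lift.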
Cross-section Data: Lots of observations (many units) at one point in time.
Time Series Data: One unit observed repeatedly over time.
Panel Data: Many units, each observed repeatedly over time.
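A small sketch of how the three data types relate, using hypothetical store-sales records (store names, years and values are made up). The cross-section and the time series are both slices of the same panel.

```python
# Hypothetical store-sales records: (store, year, sales). Values are made up.
records = [
    ("A", 2012, 10), ("A", 2013, 12), ("A", 2014, 15),
    ("B", 2012, 7),  ("B", 2013, 9),  ("B", 2014, 8),
    ("C", 2012, 20), ("C", 2013, 18), ("C", 2014, 21),
]

# Cross-section: many units, one point in time.
cross_section = {store: sales for store, year, sales in records if year == 2013}

# Time series: one unit, many points in time.
time_series = {year: sales for store, year, sales in records if store == "A"}

# Panel: many units, each followed over time, keyed by (unit, time).
panel = {(store, year): sales for store, year, sales in records}

# A panel lets us compute within-unit change for every unit.
stores = sorted({store for store, _, _ in records})
change_2012_to_2014 = {s: panel[(s, 2014)] - panel[(s, 2012)] for s in stores}

print(cross_section)
print(time_series)
print(change_2012_to_2014)
```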
Varian article posted on Moodle
Fit, Overfit, Validation and Out-of-Sample Prediction
Key with big data is that we have bigger samples, so we can do more with separate estimation and validation sub-samples.
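A minimal sketch of estimation vs. validation sub-samples, using a nearest-neighbor average as a stand-in model (chosen here only because it needs no libraries; it is not a method named on the slide). The data are simulated as y = sin(x) + noise, an assumption for illustration. With k = 1 the model reproduces the estimation sample exactly, which is the overfit case; the validation error is the honest measure.

```python
import math
import random


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error over `data` of the k-NN fit built on `train`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
# Simulated data (an assumption): y = sin(x) + noise.
points = []
for _ in range(400):
    x = rng.uniform(0.0, 6.0)
    points.append((x, math.sin(x) + rng.gauss(0.0, 0.5)))

estimation, validation = points[:200], points[200:]   # split the sample

print("k  in-sample MSE  validation MSE")
for k in (1, 5, 25):
    print(k, round(mse(estimation, estimation, k), 3),
          round(mse(estimation, validation, k), 3))
```

In-sample error is zero at k = 1 by construction, so it says nothing about predictive quality; comparing the validation column across k is what guides model choice.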
Trees and Forests
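A toy sketch of the idea, under stated assumptions: a "tree" here is a single-split regression stump, and the "forest" is a plain average of stumps fitted on bootstrap resamples (bagging). A full random forest also grows deeper trees and samples features; that is omitted. The function names and the step-shaped data are hypothetical.

```python
import random


def fit_stump(data):
    """Fit a one-split regression tree: choose the threshold with least squared error."""
    xs = sorted({x for x, _ in data})
    mean_y = sum(y for _, y in data) / len(data)
    best = (float("inf"), None, mean_y, mean_y)
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0                      # candidate split between neighbors
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    if t is None:                                # only one distinct x: predict the mean
        return lambda x: mean_y
    return lambda x: lm if x <= t else rm


def fit_forest(data, n_trees=25, seed=0):
    """Bagging: average stumps fitted on bootstrap resamples of the data."""
    rng = random.Random(seed)
    trees = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


rng = random.Random(1)
# Made-up data: outcome jumps by about 10 once x reaches 10.
data = [(x, (10.0 if x >= 10 else 0.0) + rng.gauss(0.0, 1.0)) for x in range(20)]

stump = fit_stump(data)
forest = fit_forest(data)
print(round(stump(3), 2), round(stump(15), 2))
print(round(forest(3), 2), round(forest(15), 2))
```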
Bayesian Statistics
1. Uninformative Prior Probability
2. Gather Data
3. Conditional probability of observing data
4. Updated Probability
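The updating cycle can be sketched on a small grid of candidate success rates. This is a minimal illustration: the grid, the binomial likelihood and the counts (7 successes in 10 trials) are assumptions chosen for the example.

```python
from math import comb


def update(prior, successes, trials):
    """One Bayesian update over a grid of candidate success rates."""
    # Conditional probability of observing the data under each candidate rate.
    likelihood = {
        p: comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
        for p in prior
    }
    unnormalized = {p: prior[p] * likelihood[p] for p in prior}
    total = sum(unnormalized.values())
    return {p: w / total for p, w in unnormalized.items()}   # updated probability


rates = [i / 10 for i in range(1, 10)]                 # candidate rates 0.1 ... 0.9
prior = {p: 1 / len(rates) for p in rates}             # uninformative (flat) prior
first = update(prior, successes=7, trials=10)          # gather data, then update
second = update(first, successes=7, trials=10)         # old posterior is the new prior

best = max(second, key=second.get)
print(best, round(first[best], 3), round(second[best], 3))
```

Updating twice on 7 out of 10 gives the same result as updating once on 14 out of 20, which is what makes the cycle repeatable as more data arrive.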
With BIG DATA we can repeat this process over and over again with multiple models to get better predictions!
Coursera: Machine Learning by Andrew Ng: https://www.coursera.org/course/ml
An Introduction to Statistical Learning
Book: http://www-bcf.usc.edu/~gareth/ISL/
Lecture videos & problem sets: http://www.alsharif.info/#!iom530/c21o7
Feeling a bit overwhelmed?
"...it's no wonder that the latest fad in the business world is Big Data... Big data can be an extraordinary tool, helping to gather new information about our behavior and preferences. What it can't explain is why we do what we do." WSJ 3/22/14