A Real-Time Sentiment Analysis of Twitter
Feeds with the NASDAQ Index
Eric Tham
National University of Singapore
MS Business Analytics 2013/24
A0119305M
tham@nus.edu.sg
Karthik Narayan Pasupathy
National University of Singapore
MS Business Analytics 2013/24
A0119092H
karthik@nus.edu.sg
Aswin Palliyali Venugopalan
National University of Singapore
MS Business Analytics 2013/24
A0119351L
aswin.pv@nus.edu.sg
ABSTRACT
We perform a real-time sentiment analysis of Twitter
feeds tagged with the hash tag #NASDAQ. The resulting
sentiment index is found to correlate well with the
hourly movements of the NASDAQ index over the
period 14-17 Apr 2014. In particular, a Granger
causality analysis shows that the hourly movements of
the NASDAQ drive tweet sentiment in real time, and not
vice versa, during this period.
Our study uses Python scripts to listen to tweets and to
collect the hourly prices of the index. The data is loaded
into HIVE tables, from which it is extracted by a
Map-Reduce program that computes the sentiment index
using the Stanford NLP library. In the rest of this report,
we first describe the technical architecture of our
implementation. We then describe the sentiment analysis
library from the Stanford NLP [1] program and recent
studies of sentiment analysis on the financial markets.
We conclude with the results obtained in real time during
the aforesaid period.
Keywords
Big data, map-reduce, NASDAQ, sentiment analysis, Stanford
NLP, momentum herding instinct, HIVE databases, Python,
MYSQL metastore
1. Technical Architecture
A high-level architecture of our implementation is shown
below. It is divided into the following tasks:
collection and storage of data, use of map-reduce to
compute sentiment, and visualisation.
[1] http://www-nlp.stanford.edu/
Figure 1: Technical Architecture
The following sections explain the different
architectural components of the project.
1.1 Data Collection Layer:
This layer is responsible for collecting Twitter feed
data as well as stock prices. It functions
independently of the other layers and runs continuously.
i. Tweet Listener
This component is written as a Python script and uses
the 'tweepy' library, a Python wrapper over the
Twitter Streaming APIs. It listens for tweets that
contain the keywords 'nasdaq' or '^IXIC'.
self.stream_listener.filter(track=['nasdaq', '^IXIC'])
Whenever a tweet satisfying this filter criterion
arrives, it is written to a text file (tweets.tsv).
When the number of tweets reaches
BATCH_LOAD_COUNT, the Tweet Listener invokes another
script (load_data_tweets.hql) to load the data from
tweets.tsv to HIVE.
# requires: from subprocess import call
if self.tweet_count == self.BATCH_LOAD_COUNT:
    self.out_file.close()
    call(["hive", "-f", "load_data_tweets.hql"])
This process continues until interrupted by the user.
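The batching behaviour described above can be sketched without the Twitter dependency. This is a minimal illustration, not the authors' listener: the class name BatchWriter, the two-column row layout and the choice to start an empty file after each load are assumptions, and the loader is passed in as a callable so that a real run could supply the `hive -f load_data_tweets.hql` call.

```python
import os
import tempfile


class BatchWriter:
    """Buffer rows in a TSV file and call `loader` once per full batch."""

    def __init__(self, path, batch_size, loader):
        self.path = path
        self.batch_size = batch_size
        self.loader = loader            # e.g. a function that runs `hive -f ...`
        self.count = 0
        self.out_file = open(path, "w")

    def add(self, time_stamp, text):
        # Collapse tabs and newlines so each tweet stays on one TSV row.
        clean = " ".join(text.split())
        self.out_file.write("%s\t%s\n" % (time_stamp, clean))
        self.count += 1
        if self.count == self.batch_size:
            self.out_file.close()       # flush to disk before loading
            self.loader(self.path)
            # Assumption: start an empty file so rows are not loaded twice.
            self.out_file = open(self.path, "w")
            self.count = 0

    def close(self):
        self.out_file.close()


def read_file(path):
    with open(path) as handle:
        return handle.read()


def demo():
    """Feed five fake tweets through a batch size of two."""
    loaded = []
    path = os.path.join(tempfile.mkdtemp(), "tweets.tsv")
    writer = BatchWriter(path, 2, lambda p: loaded.append(read_file(p)))
    for i in range(5):
        writer.add("2014-04-15 18:00:0%d" % i, "tweet\t%d on #NASDAQ" % i)
    writer.close()
    return loaded
```

In this sketch the fifth tweet stays in the file until the next batch fills, which mirrors the behaviour of a count-triggered load.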
ii. Stock Price Listener
This component is written as a Python script and reads
the latest index price directly from Yahoo Finance, at
the link
http://download.finance.yahoo.com/d/quotes.csv?s=^IXIC&f=l1
Similar to the Tweet Listener, this component first
writes the data into a text file (stock_prices.tsv). Once
the number of rows reaches a configurable
BATCH_LOAD_COUNT, another script
(load_data_stockprices.hql) is invoked to load the
data from stock_prices.tsv to HIVE.
# requires: from subprocess import call
if data_count == BATCH_LOAD_COUNT:
    out_file.close()
    call(["hive", "-f", "load_data_stockprices.hql"])
This process continues until interrupted by the user.
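As a rough illustration of the step between reading the quote and writing stock_prices.tsv, the sketch below turns a response body into one tab-separated row. It assumes that the f=l1 request returns a single numeric field (the last trade price) and that each row holds a timestamp followed by the price; the function names and the row layout are hypothetical, and no network call is made.

```python
from datetime import datetime


def parse_last_price(body):
    """Parse the body of a quotes.csv response requested with f=l1."""
    field = body.strip().split(",")[0].strip('"')
    return float(field)     # raises ValueError on "N/A" or an empty body


def price_row(body, now):
    """Format one tab-separated row: timestamp, then last trade price."""
    price = parse_last_price(body)
    return "%s\t%.2f" % (now.strftime("%Y-%m-%d %H:%M:%S"), price)
```

Letting the ValueError propagate keeps a malformed response from being written as a price row.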
1.2 Data Processing Layer:
This layer is responsible for computing the aggregate
sentiment score for tweets collected over a window, and
for aggregating the stock price over the same window.
All components of this layer are invoked from a single
Python script that runs continuously. The inner
working of this layer is explained in the flow chart
below:
Figure 2: Data Processing Flow
i. Data Extractor
This component is a shell script generated by the Python
script to extract the data for a particular window. It
contains queries that extract all rows with time stamp >=
start_of_window_timestamp from the Tweets and
Stock_Prices tables.
fp = open('temp/extract_data.sh', 'w')
# The timestamp is wrapped in single quotes inside the HiveQL cast.
cmd = ('hive -e "select * from tweets where '
       'time_stamp>=cast(\'' + time_stamp + '\' as timestamp)" '
       '>temp/tweets.txt')
fp.write(cmd + '\n')
cmd = ('hive -e "select * from stock_prices where '
       'time_stamp>=cast(\'' + time_stamp + '\' as timestamp)" '
       '>temp/stock_prices.txt')
fp.write(cmd + '\n')
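An alternative to concatenating the command by hand is to build the argument list first and quote it only when a shell line is needed. The sketch below is one possible variant under that assumption, not the authors' code; the helper names are hypothetical, and the timestamp check is added here purely as a guard against malformed input reaching the query.

```python
import shlex
from datetime import datetime


def build_extract_command(table, time_stamp):
    """Return the hive CLI argument list for one extraction query."""
    if table not in ("tweets", "stock_prices"):
        raise ValueError("unexpected table: %r" % table)
    # Reject anything that is not a plain timestamp before it reaches HiveQL.
    datetime.strptime(time_stamp, "%Y-%m-%d %H:%M:%S")
    query = ("select * from %s where time_stamp>=cast('%s' as timestamp)"
             % (table, time_stamp))
    return ["hive", "-e", query]


def as_shell_line(args, out_path):
    """Render the same command as a shell line redirected to out_path."""
    quoted = " ".join(shlex.quote(a) for a in args)
    return quoted + " > " + shlex.quote(out_path)
```

The argument list can also be passed straight to subprocess.run with stdout set to an open file, which avoids the intermediate shell script altogether.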
ii. Map Reduce Algorithm
This component is written in Java and makes use of
the Hadoop map-reduce framework to compute the
aggregate sentiment score and stock price for a window.
The inner working of this component is shown
below.
Figure 3: Map Reduce Flow
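Since the flow chart is not reproduced here, the following in-memory sketch shows one plausible shape for the job, written in Python rather than Java for brevity. It assumes records of the form 'timestamp|text', an hourly window as the key and a simple average in the reducer; toy_score is a stand-in for the Stanford NLP scorer used in the study, and none of these names come from the authors' program.

```python
from collections import defaultdict


def toy_score(text):
    """Stand-in scorer; the study uses the Stanford NLP library instead."""
    words = text.lower().split()
    return float(words.count("up") - words.count("down"))


def mapper(line):
    """Emit (hour window, score) for one 'timestamp|text' record."""
    time_stamp, text = line.split("|", 1)
    window = time_stamp[:13]            # 'YYYY-MM-DD HH'
    return window, toy_score(text)


def reducer(window, scores):
    """Average all scores that share a window key."""
    return window, sum(scores) / len(scores)


def run(lines):
    groups = defaultdict(list)          # the shuffle phase, done in memory
    for line in lines:
        key, value = mapper(line)
        groups[key].append(value)
    return dict(reducer(k, v) for k, v in sorted(groups.items()))
```

The same mapper and reducer shape would apply to the stock price records, with the price taking the place of the score.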
iii. Data Loader
This component is responsible for loading the output of
the map-reduce program to HIVE. It moves the
output file from HDFS to the local file system and
executes the script that loads the data into the HIVE
repository.
load data local inpath 'output/part-r-00000' into table
stock_sentiments;
(The output directory is local, copied from HDFS.)
1.3 Data Visualization Layer:
This layer also works independently of other layers and
is implemented using Python ‘matplotlib’ for
visualization.
i. Data Visualizer
The plot below is a sample visualization of how the
sentiment index moves with the stock price for a
small window (22 Apr, 2 AM - 6 AM, Singapore time).
The flat stock price after 4 AM is due to the close of
the market.
Figure 4: Stock Price Visualisation
1.4 MySQL Metastore
The HIVE metastore service stores the metadata for
HIVE tables and partitions in a relational database, and
provides clients (including HIVE) access to this
information via the metastore service API. By default,
Apache HIVE is configured to use Derby as the metastore,
but Derby can handle only one active user at a time. In
our case, we need multiple connections to HIVE to be
active at the same time: for loading tweets and stock
prices, for doing sentiment analysis and for
visualization. One solution to this issue is to use a
standalone database as the metastore, and one popular
choice is MySQL [2].
[2] Source: Hadoop: The Definitive Guide
Figure 5: MySQL Metastore
2. Sentiment Analysis Library
Our sentiment analysis uses the Deeply Moving library
from Stanford NLP. This is an improvement over
the usual bag-of-words approach, which counts positive
and negative words and derives the overall sentiment
from those counts. Bag of words neglects negation and
the linguistic phenomena of longer sentences, and its
accuracy has not exceeded 80% [3]. The Stanford NLP
library, on the other hand, looks at the sentence in
its entirety in a recursive deep model to derive the
sentiment. Its accuracy has been reported to reach 86%.
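The weakness of word counting under negation is easy to reproduce. The sketch below is a deliberately naive bag-of-words scorer with a made-up word list (the sets POSITIVE and NEGATIVE are assumptions for illustration, not a published lexicon); it gives the same score to a sentence and to its negation.

```python
POSITIVE = {"good", "gain", "rally", "strong"}
NEGATIVE = {"bad", "loss", "selloff", "weak"}


def bag_of_words_score(sentence):
    """Count positive minus negative words, ignoring word order."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg
```

Both "The rally was strong." and "The rally was not strong." score +2 here, which is the kind of error a model that reads the whole sentence is meant to avoid.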
The Stanford sentiment analysis uses a recursive neural
network over a tree structure containing the words
in the sentence marked for analysis. It is a supervised
training method based on the Stanford Sentiment
Treebank [4], which is built from more than 11k
sentences from movie reviews that have been annotated
by humans. Each n-gram of words in the sentence is
represented by a vector of features, e.g.
part-of-speech, semantics and co-occurrence, which are
used in training, classification and testing recursively
in a tree-like structure. The tree builds itself bottom-up
to include different words within the sentence. In this
manner, the entire sentence can be considered in the
overall analysis.
[3] Source of accuracy figures:
http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf
[4] http://nlp.stanford.edu/sentiment/treebank.html We note that
the supervised training is based on this Treebank of movie
reviews. There may be some inherent differences between
sentiment expressed in movie reviews and sentiment
expressed about stocks.
3. Impact of sentiment on Financial Markets
The financial markets are known to be volatile, with
sudden spurts of heteroscedasticity (fat tails). This
is partly due to the herding instinct amongst investors.
Various studies have suggested that the media and
financial news reporting accentuate momentum in the
financial markets [5]. Sentiment analysis in the financial
markets is now in the mainstream, as major news
agencies, e.g. Thomson Reuters, have added unique
Twitter and news sentiment analysis to their product
suite [6]. Many start-up companies like InfoTrie have
also offered their products as add-ons to the Bloomberg
news services.
Aside from traditional news media like Bloomberg,
CNN and CNBC, alternative forms of media
have surfaced that are a microcosm of the investor
community at large. These include online blogs,
Facebook and Twitter. An advantage of Twitter feeds
over Facebook or online blogs is their frequency. The
higher frequency of tweets means that they may better
reflect investor sentiment in real time. There are three
potential ways in which tweets may relate to stock
price movements:
i. Volatility
A trending topic (an increased number of tweets) about
economic news may correspond to a period of
increased volatility in the stock markets. This is logical
considering that, aside from the news agencies, traders
and investors alike may tweet more often during such a
period. However, there were no significant economic
releases [7] over the week of Apr 14-17 for us to test
this hypothesis.
[5] For example,
http://stocktwits.com/research/Predictability-of-stock-market-behavior-using-stocktwits-sentiment-and-posting-volume_NunoOliveira.pdf
studies the predictability of stock prices using sentiment
from StockTwits, a micro-blogging site, and the volume being
traded. Momentum in trading has probably caused a herding
instinct which causes stock prices to overshoot their
‘equilibrium’ price. Similarly, in down markets, fear generated
by the media or online buzz may cause prices to plummet more
than they should.
[6]
http://thomsonreuters.com/press-releases/022014/Thomson-Reuters-Adds-Unique-Twitter-and-News-Sentiment-Analysis-to-Thomson-Reuters-Eikon
[7] Significant economic releases that may affect the
NASDAQ include FED statements and Labour Department
reports.
ii. Event Study & Correlation analysis:
Aside from the increased number of tweets, studies
have also been done that categorise the polarity and
subjectivity of tweets around events and relate them to
stock price movements.
iii. Momentum Trading Strategy
Tweets can also be used as a predictor of
stock trends. Momentum strategy is much researched
and is based on the herding instinct of traders: a
feedback loop of investor sentiment back into rising
or falling markets. A question to ask: do traders or
news agencies tweet more often in trending markets? If
so, is this a Bayesian probabilistic event having some
predictability on the markets?
3.1 Lead-lag analysis
In our study, we aggregated tweets hourly over a week
and graphed the resulting sentiment index against the
index movement. The two are found to correlate
strongly, as seen in the figure below, where both the
sentiment index and the NASDAQ trend upwards.
The Pearson correlation of the sentiment index
with the NASDAQ index is 0.1 considering all hours [8].
Ignoring non-trading hours, this correlation is 0.25.
Considering that on average stock return correlations
are ~0.1, this is relatively high.
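For readers who wish to repeat the comparison between all hours and trading hours only, a minimal pure-Python sketch follows. The function names are hypothetical, the hourly series are assumed to be aligned lists, and the 0900 to 1600 window is taken from the trading hours stated in this report; the inputs are not the study's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)    # undefined if either series is constant


def trading_hours_only(hours, xs, ys, open_hour=9, close_hour=16):
    """Keep only observations whose hour falls inside the trading session."""
    keep = [i for i, h in enumerate(hours) if open_hour <= h < close_hour]
    return [xs[i] for i in keep], [ys[i] for i in keep]
```

Filtering before computing the correlation removes the flat stretches of the index outside trading hours, which otherwise add observations where one series cannot move.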
Figure 6: Sentiment Index and NASDAQ time series
[8] The NASDAQ is traded only from 0900 to 1600 EST, whilst
tweets collected round the clock provide a sentiment index.
We assumed the NASDAQ index to be constant during
non-trading hours, which would have lowered the
correlation.
Predictability of Tweets from Index or Vice Versa
We further perform a simple Granger causality [9]
analysis, testing lead-lag orders of 1 to 3 hours. The
R output for the null hypothesis that sentiment does
not Granger-cause the NASDAQ index is:
grangertest(NASDAQ~Sentiment, order =1,data=data)
Model 1: NAS ~ Lags(NAS, 1:2) + Lags(Sent, 1:2)
Model 2: NAS ~ Lags(NAS, 1:2)
  Res.Df Df      F Pr(>F)
1     25
2     27 -2 2.0128 0.1547
For lags of up to 3, the F-statistic is not significant
at the 5% level, so the null hypothesis is not rejected.
The R output for the null hypothesis that the NASDAQ
index does not Granger-cause sentiment is:
grangertest(Sentiment~NASDAQ, order =1,data=data)
Model 1: Sent ~ Lags(Sent, 1:1) + Lags(NAS, 1:1)
Model 2: Sent ~ Lags(Sent, 1:1)
  Res.Df Df      F  Pr(>F)
1     28
2     29 -1 4.0874 0.05285 .
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’
With the F-statistic significant at approximately the 5%
level (p = 0.053), the null hypothesis is rejected. The
NASDAQ thus Granger-causes the tweet sentiment level.
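The F-statistic behind such a test can be computed by fitting two OLS regressions and comparing their residual sums of squares. The sketch below is an illustrative pure-Python version, not the R routine used in this study: it solves the normal equations directly, uses the number of usable observations minus 2n + 1 as the residual degrees of freedom for order n, and all function names are hypothetical.

```python
def ols_rss(rows, targets):
    """Residual sum of squares of an OLS fit (intercept added here)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(k)]
    for c in range(k):                  # Gauss-Jordan with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y - sum(b * v for b, v in zip(beta, x))) ** 2
               for x, y in zip(X, targets))


def granger_f(y, x, order=1):
    """F-statistic for H0: lags of x do not help predict y."""
    def lags(series, t):
        return [series[t - j] for j in range(1, order + 1)]

    times = range(order, len(y))
    targets = [y[t] for t in times]
    s1 = ols_rss([lags(y, t) for t in times], targets)               # restricted
    s2 = ols_rss([lags(y, t) + lags(x, t) for t in times], targets)  # unrestricted
    dof = len(targets) - 2 * order - 1
    return ((s1 - s2) / order) / (s2 / dof)
```

A large value of granger_f(y, x) relative to the F distribution with (order, dof) degrees of freedom is evidence against the null hypothesis; the p-value lookup is left out of this sketch.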
Examination of tweets
We next examine the tweets that were downloaded.
There are in all 17k tweets over 4 days x 24 hours. This
works out to 177 tweets per hour, the hour being our
unit of analysis.
Most of the tweets are ‘reporting’ in nature, which
supports the study result that NASDAQ
movements Granger-cause the tweet sentiment. Some
examples of the tweets are:
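[9] The test statistic of the Granger test is the F-statistic
$F = \frac{(s_1 - s_2)/n}{s_2/(T - 2n - 1)}$
where s1 is the sum of squared errors of the OLS of y against
lagged y up to order n, s2 is the sum of squared errors of the
OLS of y against lagged y and lagged x up to order n, and T is
the number of observations used in the regressions.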
2014-04-15 18:49:25|@cnbc - newest update on how 29
Nasdaq co bear market Territory. That's significant –
2014-04-15 18:48:19|Nasdaq comp almost positive after (1.9%)
loss earlier - what a joy ride for the liquidity machines. #HFT
$QQQ
2014-04-15 18:46:26|Money_Mystery Alert@ As expected
Nasdaq hits 3966 and took support...now back to 4000
Whilst there are tweets that are ‘analytical’ in nature
and could potentially drive markets, these are few and
far between. Examples are:
2014-04-15 18:46:27|$AAPL $FB $GOOGL $TWTR What-If
NASDAQ falls another 5% from current levels. "CAPM"
analysis on a portfolio.
2014-04-15 18:05:19|RT @hakanKRBN: $NASDAQ watching
for reversal ..I think 1997 scenario in play.
2014-04-15 18:03:37|Deeper selloff it this happens. If yield
curve drops below 2.6 watch out below. #NASDAQ
We further note that the week of 14-17 Apr was a
quiet week, just before the long Easter break. It was
a period of low volatility with no significant news
events. As such, there was no feeding of investor
sentiment back into the NASDAQ or stock prices in a
‘feedback loop’, which we described above as
momentum herding.
4. CONCLUSION
Our Map-Reduce program and subsequent statistical
analysis have shown that in times of low volatility, it is
the stock market (NASDAQ) that drives tweet
sentiment, in a more ‘reporting’ mode. This is premised
on data collected in the week of 14-17 April,
before the Easter break, with no major events.
The technical architecture that we have built
is very scalable, with a HIVE repository, a generalized
Map-Reduce program and a real-time direct API to
Twitter. It may be reused by the authors in other
applications.
5. ACKNOWLEDGMENTS
Our thanks to Prof Tan Kim Leng for his teaching and guidance
during the course of the Big Data module.

More Related Content

What's hot

Tweet sentiment analysis
Tweet sentiment analysisTweet sentiment analysis
Tweet sentiment analysisAnil Shrestha
 
A Survey Of Collaborative Filtering Techniques
A Survey Of Collaborative Filtering TechniquesA Survey Of Collaborative Filtering Techniques
A Survey Of Collaborative Filtering Techniquestengyue5i5j
 
IRJET- Fake News Detection using Logistic Regression
IRJET- Fake News Detection using Logistic RegressionIRJET- Fake News Detection using Logistic Regression
IRJET- Fake News Detection using Logistic RegressionIRJET Journal
 
Fake News Detection using Machine Learning
Fake News Detection using Machine LearningFake News Detection using Machine Learning
Fake News Detection using Machine Learningijtsrd
 
Market Forecasting Twitter Sentiment
Market Forecasting Twitter SentimentMarket Forecasting Twitter Sentiment
Market Forecasting Twitter SentimentNicholasBrown67
 
IRJET - Fake News Detection using Machine Learning
IRJET -  	  Fake News Detection using Machine LearningIRJET -  	  Fake News Detection using Machine Learning
IRJET - Fake News Detection using Machine LearningIRJET Journal
 
Event summarization using tweets
Event summarization using tweetsEvent summarization using tweets
Event summarization using tweetsmoresmile
 
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSISMOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSISEditor Jacotech
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
 
An Approach to Block Negative Posts on Social Media at Server Side
An Approach to Block Negative Posts on Social Media at Server SideAn Approach to Block Negative Posts on Social Media at Server Side
An Approach to Block Negative Posts on Social Media at Server Sideijtsrd
 
News Reliability Evaluation using Latent Semantic Analysis
News Reliability Evaluation using Latent Semantic AnalysisNews Reliability Evaluation using Latent Semantic Analysis
News Reliability Evaluation using Latent Semantic AnalysisTELKOMNIKA JOURNAL
 
IRE2014-Sentiment Analysis
IRE2014-Sentiment AnalysisIRE2014-Sentiment Analysis
IRE2014-Sentiment AnalysisGangasagar Patil
 
Twitter as a personalizable information service ii
Twitter as a personalizable information service iiTwitter as a personalizable information service ii
Twitter as a personalizable information service iiKan-Han (John) Lu
 
[M3A2] Data Analysis and Interpretation Specialization
[M3A2] Data Analysis and Interpretation Specialization [M3A2] Data Analysis and Interpretation Specialization
[M3A2] Data Analysis and Interpretation Specialization Andrea Rubio
 
Twitter text mining using sas
Twitter text mining using sasTwitter text mining using sas
Twitter text mining using sasAnalyst
 
Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...eSAT Journals
 
Sentiment Analysis in R
Sentiment Analysis in RSentiment Analysis in R
Sentiment Analysis in REdureka!
 

What's hot (19)

Tweet sentiment analysis
Tweet sentiment analysisTweet sentiment analysis
Tweet sentiment analysis
 
A Survey Of Collaborative Filtering Techniques
A Survey Of Collaborative Filtering TechniquesA Survey Of Collaborative Filtering Techniques
A Survey Of Collaborative Filtering Techniques
 
IRJET- Fake News Detection using Logistic Regression
IRJET- Fake News Detection using Logistic RegressionIRJET- Fake News Detection using Logistic Regression
IRJET- Fake News Detection using Logistic Regression
 
Fake News Detection using Machine Learning
Fake News Detection using Machine LearningFake News Detection using Machine Learning
Fake News Detection using Machine Learning
 
Market Forecasting Twitter Sentiment
Market Forecasting Twitter SentimentMarket Forecasting Twitter Sentiment
Market Forecasting Twitter Sentiment
 
IRJET - Fake News Detection using Machine Learning
IRJET -  	  Fake News Detection using Machine LearningIRJET -  	  Fake News Detection using Machine Learning
IRJET - Fake News Detection using Machine Learning
 
Event summarization using tweets
Event summarization using tweetsEvent summarization using tweets
Event summarization using tweets
 
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSISMOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
An Approach to Block Negative Posts on Social Media at Server Side
An Approach to Block Negative Posts on Social Media at Server SideAn Approach to Block Negative Posts on Social Media at Server Side
An Approach to Block Negative Posts on Social Media at Server Side
 
News Reliability Evaluation using Latent Semantic Analysis
News Reliability Evaluation using Latent Semantic AnalysisNews Reliability Evaluation using Latent Semantic Analysis
News Reliability Evaluation using Latent Semantic Analysis
 
IRE2014-Sentiment Analysis
IRE2014-Sentiment AnalysisIRE2014-Sentiment Analysis
IRE2014-Sentiment Analysis
 
Twitter as a personalizable information service ii
Twitter as a personalizable information service iiTwitter as a personalizable information service ii
Twitter as a personalizable information service ii
 
[M3A2] Data Analysis and Interpretation Specialization
[M3A2] Data Analysis and Interpretation Specialization [M3A2] Data Analysis and Interpretation Specialization
[M3A2] Data Analysis and Interpretation Specialization
 
Malware analysis
Malware analysisMalware analysis
Malware analysis
 
Twitter text mining using sas
Twitter text mining using sasTwitter text mining using sas
Twitter text mining using sas
 
Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...
 
Sentiment Analysis in R
Sentiment Analysis in RSentiment Analysis in R
Sentiment Analysis in R
 
Pydata Taipei 2020
Pydata Taipei 2020Pydata Taipei 2020
Pydata Taipei 2020
 

Viewers also liked

Restaurant Consultants Middle East
Restaurant Consultants Middle EastRestaurant Consultants Middle East
Restaurant Consultants Middle EastAaron Allen
 
Group 2 , Topic 1. Restaurant Portions And Obesity
Group 2 , Topic 1. Restaurant Portions And ObesityGroup 2 , Topic 1. Restaurant Portions And Obesity
Group 2 , Topic 1. Restaurant Portions And Obesitylshie223
 
Practical Elliott Wave Trading Strategies
Practical Elliott Wave Trading StrategiesPractical Elliott Wave Trading Strategies
Practical Elliott Wave Trading StrategiesNick Radge
 
Facebook for Your Restaurant
Facebook for Your RestaurantFacebook for Your Restaurant
Facebook for Your RestaurantAaron Allen
 
Modern Pizza Promotions
Modern Pizza Promotions Modern Pizza Promotions
Modern Pizza Promotions Aaron Allen
 
An Introduction to the University of Cambridge Computing Service
An Introduction to the University of Cambridge Computing ServiceAn Introduction to the University of Cambridge Computing Service
An Introduction to the University of Cambridge Computing Servicehvs
 
The Lehman Brothers Volatility Screening Tool
The Lehman Brothers Volatility Screening ToolThe Lehman Brothers Volatility Screening Tool
The Lehman Brothers Volatility Screening ToolRYAN RENICKER
 
Technical Analysis: Oscillators by NSFX
Technical Analysis: Oscillators by NSFXTechnical Analysis: Oscillators by NSFX
Technical Analysis: Oscillators by NSFXNSFX
 
Technical Analysis of Major Forex Currencies
Technical Analysis of Major Forex CurrenciesTechnical Analysis of Major Forex Currencies
Technical Analysis of Major Forex CurrenciesInvestingTips
 
Technical Analysis of Gaps in Forex Trading
Technical Analysis of Gaps in Forex TradingTechnical Analysis of Gaps in Forex Trading
Technical Analysis of Gaps in Forex TradingInvestingTips
 
Trade Forex From Home - 10 Biggest Mistakes New Forex Traders Make (And How T...
Trade Forex From Home - 10 Biggest Mistakes New Forex Traders Make (And How T...Trade Forex From Home - 10 Biggest Mistakes New Forex Traders Make (And How T...
Trade Forex From Home - 10 Biggest Mistakes New Forex Traders Make (And How T...ForexTraining
 
09 Fluid Social Media Restaurant Seminar
09 Fluid Social Media Restaurant Seminar09 Fluid Social Media Restaurant Seminar
09 Fluid Social Media Restaurant SeminarMax Connect Marketing
 
Principles of food beverage and labor cost controls
Principles of food  beverage  and labor cost controlsPrinciples of food  beverage  and labor cost controls
Principles of food beverage and labor cost controlslibfsb
 
Employee Rules And Regulations
Employee Rules And RegulationsEmployee Rules And Regulations
Employee Rules And RegulationsNalaka Jayaratne
 

Viewers also liked (15)

Restaurant Consultants Middle East
Restaurant Consultants Middle EastRestaurant Consultants Middle East
Restaurant Consultants Middle East
 
Group 2 , Topic 1. Restaurant Portions And Obesity
Group 2 , Topic 1. Restaurant Portions And ObesityGroup 2 , Topic 1. Restaurant Portions And Obesity
Group 2 , Topic 1. Restaurant Portions And Obesity
 
Practical Elliott Wave Trading Strategies
Practical Elliott Wave Trading StrategiesPractical Elliott Wave Trading Strategies
Practical Elliott Wave Trading Strategies
 
Facebook for Your Restaurant
Facebook for Your RestaurantFacebook for Your Restaurant
Facebook for Your Restaurant
 
Modern Pizza Promotions
Modern Pizza Promotions Modern Pizza Promotions
Modern Pizza Promotions
 
An Introduction to the University of Cambridge Computing Service
An Introduction to the University of Cambridge Computing ServiceAn Introduction to the University of Cambridge Computing Service
An Introduction to the University of Cambridge Computing Service
 
The Lehman Brothers Volatility Screening Tool
The Lehman Brothers Volatility Screening ToolThe Lehman Brothers Volatility Screening Tool
The Lehman Brothers Volatility Screening Tool
 
Technical Analysis: Oscillators by NSFX
Technical Analysis: Oscillators by NSFXTechnical Analysis: Oscillators by NSFX
Technical Analysis: Oscillators by NSFX
 
Technical Analysis of Major Forex Currencies
Technical Analysis of Major Forex CurrenciesTechnical Analysis of Major Forex Currencies
Technical Analysis of Major Forex Currencies
 
Technical Analysis of Gaps in Forex Trading
Technical Analysis of Gaps in Forex TradingTechnical Analysis of Gaps in Forex Trading
Technical Analysis of Gaps in Forex Trading
 
The Technical Analysis Guidebook
The Technical Analysis GuidebookThe Technical Analysis Guidebook
The Technical Analysis Guidebook
 
Trade Forex From Home - 10 Biggest Mistakes New Forex Traders Make (And How T...
Trade Forex From Home - 10 Biggest Mistakes New Forex Traders Make (And How T...Trade Forex From Home - 10 Biggest Mistakes New Forex Traders Make (And How T...
Trade Forex From Home - 10 Biggest Mistakes New Forex Traders Make (And How T...
 
09 Fluid Social Media Restaurant Seminar
09 Fluid Social Media Restaurant Seminar09 Fluid Social Media Restaurant Seminar
09 Fluid Social Media Restaurant Seminar
 
Principles of food beverage and labor cost controls
Principles of food  beverage  and labor cost controlsPrinciples of food  beverage  and labor cost controls
Principles of food beverage and labor cost controls
 
Employee Rules And Regulations
Employee Rules And RegulationsEmployee Rules And Regulations
Employee Rules And Regulations
 

Similar to Real time sentiment analysis of twitter feeds with the NASDAQ index

IRJET- A Real-Time Twitter Sentiment Analysis and Visualization System: Twisent
IRJET- A Real-Time Twitter Sentiment Analysis and Visualization System: TwisentIRJET- A Real-Time Twitter Sentiment Analysis and Visualization System: Twisent
IRJET- A Real-Time Twitter Sentiment Analysis and Visualization System: TwisentIRJET Journal
 
Stock Market Prediction
Stock Market PredictionStock Market Prediction
Stock Market PredictionMRIDUL GUPTA
 
Svm and maximum entropy model for sentiment analysis of tweets
Svm and maximum entropy model for sentiment analysis of tweetsSvm and maximum entropy model for sentiment analysis of tweets
Svm and maximum entropy model for sentiment analysis of tweetsS M Raju
 
Tweet analyzer web applicaion
Tweet analyzer web applicaionTweet analyzer web applicaion
Tweet analyzer web applicaionPrathameshSankpal
 
Twitter sentiment analysis.pptx
Twitter sentiment analysis.pptxTwitter sentiment analysis.pptx
Twitter sentiment analysis.pptxRishita Gupta
 
Political Prediction Analysis using text mining and deep learning.pptx
Political Prediction Analysis using text mining and deep learning.pptxPolitical Prediction Analysis using text mining and deep learning.pptx
Political Prediction Analysis using text mining and deep learning.pptxDineshGaikwad36
 
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET -  	  Twitter Sentiment Analysis using Machine LearningIRJET -  	  Twitter Sentiment Analysis using Machine Learning
IRJET - Twitter Sentiment Analysis using Machine LearningIRJET Journal
 
Sentiment Analysis on Twitter data using Machine Learning
Sentiment Analysis on Twitter data using Machine LearningSentiment Analysis on Twitter data using Machine Learning
Sentiment Analysis on Twitter data using Machine LearningIRJET Journal
 
Political prediction analysis using text mining and deep learning
Political prediction analysis using text mining and deep learningPolitical prediction analysis using text mining and deep learning
Political prediction analysis using text mining and deep learningVishwambhar Deshpande
 
Sentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataSentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataIRJET Journal
 
Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique IJERA Editor
 
A Survey on Analysis of Twitter Opinion Mining using Sentiment Analysis
A Survey on Analysis of Twitter Opinion Mining using Sentiment AnalysisA Survey on Analysis of Twitter Opinion Mining using Sentiment Analysis
A Survey on Analysis of Twitter Opinion Mining using Sentiment AnalysisIRJET Journal
 
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...
Live Twitter Sentiment Analysis and Interactive Visualizations with PyLDAvis ...IRJET Journal
 
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET Journal
 
IRJET - Twitter Sentimental Analysis
IRJET -  	  Twitter Sentimental AnalysisIRJET -  	  Twitter Sentimental Analysis
IRJET - Twitter Sentimental AnalysisIRJET Journal
 
Sensing Trending Topics in Twitter for Greater Jakarta Area
Sensing Trending Topics in Twitter for Greater Jakarta Area Sensing Trending Topics in Twitter for Greater Jakarta Area
Sensing Trending Topics in Twitter for Greater Jakarta Area IJECEIAES
 
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...
IRJET - Sentiment Analysis for Marketing and Product Review using a Hybrid Ap...IRJET Journal
 
Questions about questions
Questions about questionsQuestions about questions
Questions about questionsmoresmile
 
IRJET- Stock Market Prediction using Financial News Articles
IRJET- Stock Market Prediction using Financial News ArticlesIRJET- Stock Market Prediction using Financial News Articles
IRJET- Stock Market Prediction using Financial News ArticlesIRJET Journal
 
Python report on twitter sentiment analysis
Python report on twitter sentiment analysisPython report on twitter sentiment analysis
Python report on twitter sentiment analysisAntaraBhattacharya12
 
