Reactive Realtime Big Data with Open Source Lambda Architecture

An introduction to the philosophy of the Reactive pattern, the Lambda Architecture, and some open source tools for implementing them in practice

Trieu Nguyen

March 06, 2014

Transcript

  1. Reactive Realtime Big Data with Open Source Lambda Architecture

    Make Big Data as simple as possible, but not simpler
  2. About the speakers

    Nguyễn Tấn Triều from FPT Online
    Personal blog: http://nguyentantrieu.info/blog
    Big Data blog: http://www.mc2ads.com
    Lê Kiến Trúc from InfoNam
  3. Contents

    1. Big Data: we will see it in 1 picture
    2. Demands → Realtime
    3. Solutions → Reactive
    4. Dreams of a data-driven world in the 21st century (yes, like the Matrix movies)
  4. Can Big Data solve these problems?

    1. Predicting future disasters?
    2. Understanding our customers better?
    3. Optimizing marketing campaigns in realtime?
    Let’s see 3 pictures
  5. Weather forecast

    “many provinces in the Mekong Delta will be flooded by the year 2030” → Disaster Response System
    Source:
    http://en.wikipedia.org/wiki/Mekong_Delta
    http://www.wired.co.uk/news/archive/2013-10/28/predicting-disasters
  6. Can Big Data solve these problems?

    NO. Big Data is just a buzzword. You need the 3 Rs:
    1. Solve the right problems
    2. Build the right team
    3. Use the right tools
  7. Big Data Ecosystem

    • Frameworks: Hadoop Ecosystem, Apache Spark, Apache Storm, Facebook Presto, ...
    • Patterns: MapReduce, Actor Model, Data Pipeline, ...
    • Platforms: Amazon Redshift, Cloudera, Pivotal, Hortonworks, IBM, Google Compute Engine, ...
    • Best Practices:
      ◦ How Heineken Interacts With Customers Using Big Data
      ◦ How Nestlé Understands Brand Sentiment Of 2.000 Brands In Real-time
    Source:
    http://azadparinda.wordpress.com/2013/10/11/projects-other-than-hadoop/
    http://www.bigdata-startups.com/best-practices
  8. Is Hadoop the best solution?

    Top 4 limitations of MapReduce:
    1. Computations that depend on previously computed values
    2. Full-text indexing or ad hoc searching
    3. Algorithms that depend on shared global state
    4. Online learning, a.k.a. stream mining (Reactive Functor will fix this issue)
    Source: http://csci8980-2.blogspot.com/2012/10/limitations-of-mapreduce-where-not-to.html
    It’s not {Realtime, Responsive} → Let’s find a new creative idea
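Limitation 4 can be illustrated with a minimal sketch, assuming a stream of numeric events. An online learner updates its estimate once per event, while a batch job has to re-read the whole dataset for every answer. The names `RunningMean` and `batch_mean` are hypothetical and do not come from the deck.

```python
class RunningMean:
    """Incrementally maintained mean: constant work per event, no re-scan of history."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Standard incremental update: shift the mean toward the new value.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def batch_mean(values):
    """Batch-style equivalent: needs the complete dataset for every answer."""
    return sum(values) / len(values)


# Illustrative event stream (made-up values).
events = [4.0, 8.0, 6.0, 2.0]
online = RunningMean()
for value in events:
    online.update(value)
```

Both approaches agree on the final result; the difference is that the online version has an up-to-date answer after every single event.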
  9. Lambda Architecture

    data → System → useful data
    query = function(all data)
    Reactive Lambda Architecture: data + context + metadata → System → useful (data + relationship)
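One way to read `query = function(all data)` is that a query is a pure function over the complete, append-only dataset. The sketch below assumes the dataset is a list of page-view events; the event fields and the function name `pageviews_per_url` are illustrative and not taken from the deck.

```python
from collections import Counter

# Hypothetical master dataset: an append-only list of immutable facts.
ALL_DATA = [
    {"user": "a", "url": "/news/1"},
    {"user": "b", "url": "/news/1"},
    {"user": "a", "url": "/news/2"},
]


def pageviews_per_url(all_data):
    """query = function(all data): no hidden state, same input gives same output."""
    return Counter(event["url"] for event in all_data)


views = pageviews_per_url(ALL_DATA)
```

Because the function holds no state of its own, it can be re-run from scratch at any time, which is what makes recomputation in a batch layer safe.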
  10. Core concepts of Reactive Lambda Architecture

    • Reactive Functor: a functional actor that receives data and responds reactively to the event source and context (just like a neuron in your brain)
      ◦ The original idea came from my advisor in 2007
      Source: http://activefunctor.blogspot.com
    • Lambda Architecture: the hybrid model, named by Nathan Marz, a software engineer at twitter.com, for designing Big Data systems with 3 core layers
      ◦ Speed layer: query stream data (realtime processing)
      ◦ Serving layer: query analyzer
      ◦ Batch layer: query all data (batch processing)
      Source: http://www.manning.com/marz
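The three layers can be sketched in a few lines, assuming a simple page-view counting workload. This is a single-process toy, not the speakers' implementation: the class `LambdaSketch` and its method names are hypothetical, and a real system would run each layer on separate infrastructure.

```python
from collections import Counter


class LambdaSketch:
    """Toy model of the batch, speed, and serving layers for counting events."""

    def __init__(self):
        self.master = []                 # batch layer input: append-only master dataset
        self.batch_view = Counter()      # precomputed by the batch layer
        self.realtime_view = Counter()   # speed layer: events since the last batch run

    def ingest(self, url):
        # Every event is stored for later batch processing and counted immediately.
        self.master.append(url)
        self.realtime_view[url] += 1

    def run_batch(self):
        # Batch layer: recompute the view from all data, then reset the speed layer.
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, url):
        # Serving side: answer by merging the batch view with the realtime view.
        return self.batch_view[url] + self.realtime_view[url]


system = LambdaSketch()
for url in ["/a", "/a", "/b"]:
    system.ingest(url)
before_batch = system.query("/a")   # served entirely from the speed layer
system.run_batch()
system.ingest("/a")
after_batch = system.query("/a")    # batch view plus one fresh realtime event
```

The design point is that the realtime view only has to cover the gap since the last batch run, so any errors in it are overwritten when the batch layer next recomputes from all data.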
  11. Why reactive?

    It’s the philosophy and pattern for designing a large application at Internet scale. Focus on:
    1. event-driven
    2. scalable
    3. resilient
    4. responsive
  12. User story and Demo

    Problem: Social Data Processing
    User story:
    • The user goes to the Chrome Web Store and downloads the extension called #save2mycloud
    • The user selects text, clicks save, and pushes the data to the system
    • The user gets responses from the system:
      ◦ Realtime trending (hot news)
      ◦ Personalized trending (hot news for you)
      ◦ Geolocation trending (hot news with context filter)
    → the solution must be realtime and responsive
    Let’s test at http://bit.ly/save2mycloud
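The trending responses in this user story could be computed with a sliding-window counter. The sketch below is an assumption about one possible approach, not the demo's actual code: the class `TrendingWindow`, the event fields, and the sample data are all hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import Counter, deque


class TrendingWindow:
    """Counts topics seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, topic, region), oldest first

    def add(self, timestamp, topic, region):
        self.events.append((timestamp, topic, region))

    def top(self, now, n=3, region=None):
        # Evict events that have fallen out of the time window.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()
        # An optional region filter gives a geolocation-style trending list.
        counts = Counter(
            topic for _, topic, r in self.events if region is None or r == region
        )
        return counts.most_common(n)


# Made-up saved-text events: (seconds, topic, region).
window = TrendingWindow(window_seconds=60)
window.add(0, "flood", "HCM")
window.add(30, "football", "HN")
window.add(50, "football", "HCM")
window.add(55, "flood", "HCM")
overall = window.top(now=70, n=1)
hanoi = window.top(now=70, n=1, region="HN")
```

Personalized trending would follow the same shape, with a per-user filter in place of the region filter.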
  13. Reactive Lambda Architecture for Social Data Processing

    • RxSQL Query Parser (RxGroovy + SQL)
    • Data Collector (Netty)
    • Data Crawler (Crawling Actor)
    • Realtime Database (Redis)
    • Batch Database (HDFS + HBase)
    • Reactive Functor Graph Engine (Actor + OrientDB)
    • Messaging (Kafka)
    • Intelligent Algorithms: Spark + Hive
    • Text Indexing: Elasticsearch + Kibana
    • Client side: HTML5, D3, JavaScript
    • Service-side: Netty, Groovy
    • Stream Topology (Storm API + Akka Actor)
  14. “You may say I'm a dreamer
    But I'm not the only one
    I hope someday you'll join us
    And the world will live as one”
    John Lennon

    Join us at http://mc2ads.com