SlideShare a Scribd company logo
1 of 24
Introduction to
Microsoft Azure HDInsight
Dattatrey Sindhol
2
Agenda
Introduction
Hadoop Distributions
Microsoft Azure HDInsight
Microsoft BI and Data Platform
HDInsight - Use Cases
HDInsight - Typical Implementation
Further Learning
3
Introduction
What is Big Data?
“Big Data is a collection
of data sets so large and
complex that it becomes
difficult to process using
on-hand database
management tools or
traditional data processing
applications”.
4
Introduction
Hadoop is an open source
framework, from Apache foundation,
capable of processing very large
volumes of heterogeneous data sets
in a distributed fashion across clusters
of commodity computers and
hardware using a simplified
programming model.
What is Hadoop?
5
Introduction
Conclusion
In simple terms, Big Data is the Challenge and Hadoop is the Solution.
6
Hadoop Distributions
Amazon Elastic
Map Reduce
(EMR)
Cloudera Hortonworks
IBM
InfoSphere
BigInsights
MapR
Pivotal Teradata Intel
Azure
HDInsight
Reference: How the 9 Leading Commercial Hadoop Distributions Stack Up
7
Which Distribution Should I Use?
Cost
Scalability
Availability
Existing Technology Stack
Existing Infrastructure
Existing Skillset
8
HDInsight - Overview
Microsoft’s
Hadoop
Distribution in
the Cloud
Offers Hadoop
on Windows
Platform
Based on
Hortonworks
Data Platform
(HDP)
Tightly
integrated
with Microsoft
Technology
Stack
9
HDInsight - Architecture
10
Microsoft Data Platform and Enterprise BI Ecosystem
11
Why HDInsight?
Microsoft Stack
Runs on Windows
Create & Destroy
On-Demand
DFS Implementation
in Blob Storage
DFS Implementation
in Blob Storage
Store data on Blob
Storage for Later Use
Automation using
PowerShell
Orchestration/Work
flow using SSIS
Scheduling using
SQL Agent
BI & Analytics with
Power BI
12
Considerations
Requires dropping and
re-creating the cluster to
scale-up/down
Storage and Cluster should be in
the same Data Center
13
HDInsight Versions
COMPONENT VERSION 1.6 VERSION 2.1 VERSION 3.0
VERSION 3.1
(Current/Default)
Hortonworks Data Platform (HDP) 1.1 1.3 2.0 2.1.7
Apache Hadoop & YARN 1.0.3 1.2.0 2.2.0 2.4.0
Tez 0.4.0
Apache Pig 0.9.3 0.11.0 0.12.0 0.12.1
Apache Hive & HCatalog 0.9.0 0.11.0 0.12.0 0.13.1
HBase 0.98.0
Apache Sqoop 1.4.2 1.4.3 1.4.4 1.4.4
Apache Oozie 3.2.0 3.3.2 4.0.0 4.0.0
Apache HCatalog 0.4.1 Merged with Hive Merged with Hive Merged with Hive
Apache Templeton 0.1.4 Merged with Hive Merged with Hive Merged with Hive
Ambari API v1.0 1.4.1 >=1.5.1
Zookeeper 3.4.5 3.4.5
Storm 0.9.1
Mahout 0.9.0
Phoenix 4.0.0.2.1.7.0-2162
14
HDInsight Use Case - Iterative Exploration
15
HDInsight Use Case - Data Warehouse on Demand
16
HDInsight Use Case - ETL Automation
17
HDInsight Use Case - BI Integration
18
Typical Implementation
Transactional
Social
Warehouse
Azure
Blob
Blob Blob
Blob Blob
Multi-Node
HDInsight Cluster
MapReduce
• Hive
• Java
Reporting and Analytics
• SSRS
• Excel
• Power BI
Web Logs
Clickstream
Files
(TXT, XML, JSON, ..)
Collaboration
Office 365 / SharePoint
19
Typical Implementation (Contd…)
E-CommerceInternalSystems
OLTP
Transactional
Internal Systems
Customers
Internal Systems
Team
Sqoop
Or AzCopy
Hive Metastore
MapReduce
Hive
Multi-Node
HDInsight Cluster
MapReduce
• Hive
• Pig
• Java
• Python
Collaboration, Reporting, and Analytics• SSRS
• Excel
• Power BI
PowerShell / SSIS / SQL Agent
Subscription & Cluster Management | Data Movement | Job Execution
Warehouse
Web Logs
Social
Web Logs
Azure
Blob Storage
Blob
Blob Blob
Blob
Blob
Blob
Blob
20
Further Reading and Learning Resources
• HDInsight Emulator
• http://azure.microsoft.com
• Learning map for HDInsight: http://azure.microsoft.com/en-us/documentation/articles/hdinsight-learn-map
21
References
• http://msdn.microsoft.com/en-us/library/dn749804.aspx
• http://azure.microsoft.com/en-us/documentation/articles/hdinsight-
component-versioning/
• http://msdn.microsoft.com/en-us/library/dn749848.aspx
• http://msdn.microsoft.com/en-us/library/dn749787.aspx
• http://msdn.microsoft.com/en-us/library/dn749805.aspx
• http://msdn.microsoft.com/en-us/library/dn749876.aspx
22
Related Apache Projects
Term Description
Ambari / HUE Deployment, Configuration, and Monitoring
Avro / Parquet / RC / Sequence Data serialization system
Flume / S4 / Storm Collection and import of log and event data
Hbase / Cassandra Column-oriented database scaling to billions of rows
HCatalog Schema and Data Type Sharing over Pig, Hive, and MapReduce
Hive / Drill / Impala Data Warehouse with SQL-Like Access
Hive-QL/HQL SQL-Like Language to Query Hive
Mahout Library of machine learning and data mining algorithms
Pig High-level programming for Hadoop computations
Oozie Orchestration and workflow management
Sqoop Imports data from relational databases
Tez Application framework for graph
Whirr Cloud-agnostic deployment of clusters
MapReduce / YARN
MapReduce is a programming model for distributed data processing. MapReduce has undergone a
complete overhaul in hadoop-0.23 and we now have Map-Reduce 2.0 (MRv2) or YARN.
Zookeeper Configuration management and coordination
THANK YOU
24
Top 10
Mobile Companies
Top 5
Outsourced Product Development Companies
2012 Partner of the year
Windows Azure, Finalist
40
GLOBAL OFFICES
7500
EMPLOYEES
23
COUNTRIES
Excellence Award
Technology Agency of the Year

More Related Content

What's hot

Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big DataDataWorks Summit
 
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXHow Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXBMC Software
 
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big Data
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big DataMicrosoft and Hortonworks Delivers the Modern Data Architecture for Big Data
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big DataHortonworks
 
IBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love It
IBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love ItIBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love It
IBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love ItIBM Analytics
 
Data lake – On Premise VS Cloud
Data lake – On Premise VS CloudData lake – On Premise VS Cloud
Data lake – On Premise VS CloudIdan Tohami
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceIBM Cloud Data Services
 
Ibm big data ibm marriage of hadoop and data warehousing
Ibm big dataibm marriage of hadoop and data warehousingIbm big dataibm marriage of hadoop and data warehousing
Ibm big data ibm marriage of hadoop and data warehousing DataWorks Summit
 
The Microsoft BigData Story
The Microsoft BigData StoryThe Microsoft BigData Story
The Microsoft BigData StoryLynn Langit
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudDataWorks Summit
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceWilfried Hoge
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningDataWorks Summit
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHumza Naseer
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopBrock Noland
 

What's hot (20)

Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAXHow Big Data and Hadoop Integrated into BMC ControlM at CARFAX
How Big Data and Hadoop Integrated into BMC ControlM at CARFAX
 
Azure Big data
Azure Big data Azure Big data
Azure Big data
 
Benefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a ServiceBenefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a Service
 
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big Data
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big DataMicrosoft and Hortonworks Delivers the Modern Data Architecture for Big Data
Microsoft and Hortonworks Delivers the Modern Data Architecture for Big Data
 
IBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love It
IBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love ItIBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love It
IBM InfoSphere BigInsights for Hadoop: 10 Reasons to Love It
 
Data lake – On Premise VS Cloud
Data lake – On Premise VS CloudData lake – On Premise VS Cloud
Data lake – On Premise VS Cloud
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a Service
 
Ibm big data ibm marriage of hadoop and data warehousing
Ibm big dataibm marriage of hadoop and data warehousingIbm big dataibm marriage of hadoop and data warehousing
Ibm big data ibm marriage of hadoop and data warehousing
 
Data-In-Motion Unleashed
Data-In-Motion UnleashedData-In-Motion Unleashed
Data-In-Motion Unleashed
 
The Microsoft BigData Story
The Microsoft BigData StoryThe Microsoft BigData Story
The Microsoft BigData Story
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the Cloud
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experience
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine Learning
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing Architectures
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache Hadoop
 

Viewers also liked

Big Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsightBig Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsightHortonworks
 
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingMicrosoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingIlyas F ☁☁☁
 
Drive Smarter Decisions with Hadoop and Windows Azure HDInsight
Drive Smarter Decisions with Hadoop and Windows Azure HDInsightDrive Smarter Decisions with Hadoop and Windows Azure HDInsight
Drive Smarter Decisions with Hadoop and Windows Azure HDInsightDataWorks Summit
 
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013Jen Stirrup
 
The Basics of Getting Started With Microsoft Azure
The Basics of Getting Started With Microsoft AzureThe Basics of Getting Started With Microsoft Azure
The Basics of Getting Started With Microsoft AzureMicrosoft Azure
 
Introduction of Windows azure and overview
Introduction of Windows azure and overviewIntroduction of Windows azure and overview
Introduction of Windows azure and overviewVishal Tandel
 
Introduction to Cloud Computing and Windows Azure
Introduction to Cloud Computing and Windows AzureIntroduction to Cloud Computing and Windows Azure
Introduction to Cloud Computing and Windows AzureKaushal Bhavsar
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BIPrasad Prabhu (PP)
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefitsRicky Barron
 
NuVitae Presentation
NuVitae PresentationNuVitae Presentation
NuVitae Presentationnuvitae
 
A Hadoop Primer
A Hadoop PrimerA Hadoop Primer
A Hadoop Primersogrady
 
Open Source + Big Data = Big Money
Open Source + Big Data = Big Money Open Source + Big Data = Big Money
Open Source + Big Data = Big Money sogrady
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Аліна Шепшелей
 
Cortana Analytics Workshop: Developing for Power BI
Cortana Analytics Workshop: Developing for Power BICortana Analytics Workshop: Developing for Power BI
Cortana Analytics Workshop: Developing for Power BIMSAdvAnalytics
 
A quick introduction to AWS Kinesis
A quick introduction to AWS KinesisA quick introduction to AWS Kinesis
A quick introduction to AWS Kinesisogeisser
 
Introduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureIntroduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureEduardo Castro
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft PlatformAndrew Brust
 

Viewers also liked (20)

Big Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsightBig Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsight
 
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingMicrosoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
 
Drive Smarter Decisions with Hadoop and Windows Azure HDInsight
Drive Smarter Decisions with Hadoop and Windows Azure HDInsightDrive Smarter Decisions with Hadoop and Windows Azure HDInsight
Drive Smarter Decisions with Hadoop and Windows Azure HDInsight
 
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013
 
The Basics of Getting Started With Microsoft Azure
The Basics of Getting Started With Microsoft AzureThe Basics of Getting Started With Microsoft Azure
The Basics of Getting Started With Microsoft Azure
 
Introduction of Windows azure and overview
Introduction of Windows azure and overviewIntroduction of Windows azure and overview
Introduction of Windows azure and overview
 
Introduction to Cloud Computing and Windows Azure
Introduction to Cloud Computing and Windows AzureIntroduction to Cloud Computing and Windows Azure
Introduction to Cloud Computing and Windows Azure
 
Azure Cloud PPT
Azure Cloud PPTAzure Cloud PPT
Azure Cloud PPT
 
Big Data - HDInsight and Power BI
Big Data - HDInsight and Power BIBig Data - HDInsight and Power BI
Big Data - HDInsight and Power BI
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
 
NuVitae Presentation
NuVitae PresentationNuVitae Presentation
NuVitae Presentation
 
A Hadoop Primer
A Hadoop PrimerA Hadoop Primer
A Hadoop Primer
 
Open Source + Big Data = Big Money
Open Source + Big Data = Big Money Open Source + Big Data = Big Money
Open Source + Big Data = Big Money
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
 
Cortana Analytics Workshop: Developing for Power BI
Cortana Analytics Workshop: Developing for Power BICortana Analytics Workshop: Developing for Power BI
Cortana Analytics Workshop: Developing for Power BI
 
A quick introduction to AWS Kinesis
A quick introduction to AWS KinesisA quick introduction to AWS Kinesis
A quick introduction to AWS Kinesis
 
INTERNET OF THINGS
INTERNET OF THINGSINTERNET OF THINGS
INTERNET OF THINGS
 
Introduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureIntroduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage Azure
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 

Similar to Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol

Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop StoryMichael Rys
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)Sascha Dittmann
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSStéphane Fréchette
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKRajesh Jayarman
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02BIWUG
 
How to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePointHow to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePointJoris Poelmans
 
Stéphane Fréchette - Samedi SQL - Introduction to HDInsight
Stéphane Fréchette - Samedi SQL - Introduction to HDInsightStéphane Fréchette - Samedi SQL - Introduction to HDInsight
Stéphane Fréchette - Samedi SQL - Introduction to HDInsightMSDEVMTL
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld
 
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.OW2
 
Ibm info sphere datastage and hadoop two best-of-breed solutions together-f...
Ibm info sphere datastage and hadoop   two best-of-breed solutions together-f...Ibm info sphere datastage and hadoop   two best-of-breed solutions together-f...
Ibm info sphere datastage and hadoop two best-of-breed solutions together-f...ArunshankarArjunan
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...BigDataEverywhere
 
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)GeeksLab Odessa
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSatish Mohan
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinerySteve Loughran
 

Similar to Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol (20)

Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop Story
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APS
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
Spsbepoelmanssharepointbigdataclean 150421080105-conversion-gate02
 
How to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePointHow to build your own Delve: combining machine learning, big data and SharePoint
How to build your own Delve: combining machine learning, big data and SharePoint
 
Stéphane Fréchette - Samedi SQL - Introduction to HDInsight
Stéphane Fréchette - Samedi SQL - Introduction to HDInsightStéphane Fréchette - Samedi SQL - Introduction to HDInsight
Stéphane Fréchette - Samedi SQL - Introduction to HDInsight
 
Introduction to Azure HDInsight
Introduction to Azure HDInsightIntroduction to Azure HDInsight
Introduction to Azure HDInsight
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
 
Ibm info sphere datastage and hadoop two best-of-breed solutions together-f...
Ibm info sphere datastage and hadoop   two best-of-breed solutions together-f...Ibm info sphere datastage and hadoop   two best-of-breed solutions together-f...
Ibm info sphere datastage and hadoop two best-of-breed solutions together-f...
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Pro...
 
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform Concept
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 

More from HARMAN Services

3 Dimensions Of Transformation
3 Dimensions Of Transformation3 Dimensions Of Transformation
3 Dimensions Of TransformationHARMAN Services
 
Testing Strategies to Deliver Consistent App Performance
Testing Strategies to Deliver Consistent App Performance Testing Strategies to Deliver Consistent App Performance
Testing Strategies to Deliver Consistent App Performance HARMAN Services
 
How to Manage APIs in your Enterprise for Maximum Reusability and Governance
How to Manage APIs in your Enterprise for Maximum Reusability and GovernanceHow to Manage APIs in your Enterprise for Maximum Reusability and Governance
How to Manage APIs in your Enterprise for Maximum Reusability and GovernanceHARMAN Services
 
Digital Transformation: Connected API Ecosystems
Digital Transformation: Connected API EcosystemsDigital Transformation: Connected API Ecosystems
Digital Transformation: Connected API EcosystemsHARMAN Services
 
Webinar - Transforming Manufacturing with IoT
Webinar - Transforming Manufacturing with IoTWebinar - Transforming Manufacturing with IoT
Webinar - Transforming Manufacturing with IoTHARMAN Services
 
Microsoft Azure Explained - Hitesh D Kesharia
Microsoft Azure Explained - Hitesh D KeshariaMicrosoft Azure Explained - Hitesh D Kesharia
Microsoft Azure Explained - Hitesh D KeshariaHARMAN Services
 
15 Big Data Billionaires
15 Big Data Billionaires15 Big Data Billionaires
15 Big Data BillionairesHARMAN Services
 
Digital Transformation in Travel
Digital Transformation in TravelDigital Transformation in Travel
Digital Transformation in TravelHARMAN Services
 
Digital Transformation in Retail
Digital Transformation in RetailDigital Transformation in Retail
Digital Transformation in RetailHARMAN Services
 
Digital Transformation in Media
Digital Transformation in MediaDigital Transformation in Media
Digital Transformation in MediaHARMAN Services
 
Digital Transformation in Hospitality
Digital Transformation in HospitalityDigital Transformation in Hospitality
Digital Transformation in HospitalityHARMAN Services
 
Top LinkedIn Influencers Every CIO Must Follow
Top LinkedIn Influencers Every CIO Must Follow Top LinkedIn Influencers Every CIO Must Follow
Top LinkedIn Influencers Every CIO Must Follow HARMAN Services
 
Ladbrokes and Aditi - Digital Transformation Case study
Ladbrokes and Aditi - Digital Transformation Case study Ladbrokes and Aditi - Digital Transformation Case study
Ladbrokes and Aditi - Digital Transformation Case study HARMAN Services
 
How Internet of Things (IoT) is Reshaping the Automotive Sector - Infographic
How Internet of Things (IoT) is Reshaping the Automotive Sector - InfographicHow Internet of Things (IoT) is Reshaping the Automotive Sector - Infographic
How Internet of Things (IoT) is Reshaping the Automotive Sector - InfographicHARMAN Services
 
Finding the important bugs- A talk by John Scarborough, Director of Testing, ...
Finding the important bugs- A talk by John Scarborough, Director of Testing, ...Finding the important bugs- A talk by John Scarborough, Director of Testing, ...
Finding the important bugs- A talk by John Scarborough, Director of Testing, ...HARMAN Services
 
Analyzing Gartner's CIO Study: Fliping to Digital Leadership
Analyzing Gartner's CIO Study: Fliping to Digital Leadership Analyzing Gartner's CIO Study: Fliping to Digital Leadership
Analyzing Gartner's CIO Study: Fliping to Digital Leadership HARMAN Services
 
24 Connected Car features to look out for before the release of Bond 24
24 Connected Car features to look out for before the release of Bond 2424 Connected Car features to look out for before the release of Bond 24
24 Connected Car features to look out for before the release of Bond 24HARMAN Services
 
Webinar: How I Met Your Connected Customer
Webinar: How I Met Your Connected CustomerWebinar: How I Met Your Connected Customer
Webinar: How I Met Your Connected CustomerHARMAN Services
 
5 Takeaways From The UX India Conference
5 Takeaways From The UX India Conference5 Takeaways From The UX India Conference
5 Takeaways From The UX India ConferenceHARMAN Services
 
Cross-channel customer engagement: What 150 C-Level executives think about it!
Cross-channel customer engagement: What 150 C-Level executives think about it!Cross-channel customer engagement: What 150 C-Level executives think about it!
Cross-channel customer engagement: What 150 C-Level executives think about it!HARMAN Services
 

More from HARMAN Services (20)

3 Dimensions Of Transformation
3 Dimensions Of Transformation3 Dimensions Of Transformation
3 Dimensions Of Transformation
 
Testing Strategies to Deliver Consistent App Performance
Testing Strategies to Deliver Consistent App Performance Testing Strategies to Deliver Consistent App Performance
Testing Strategies to Deliver Consistent App Performance
 
How to Manage APIs in your Enterprise for Maximum Reusability and Governance
How to Manage APIs in your Enterprise for Maximum Reusability and GovernanceHow to Manage APIs in your Enterprise for Maximum Reusability and Governance
How to Manage APIs in your Enterprise for Maximum Reusability and Governance
 
Digital Transformation: Connected API Ecosystems
Digital Transformation: Connected API EcosystemsDigital Transformation: Connected API Ecosystems
Digital Transformation: Connected API Ecosystems
 
Webinar - Transforming Manufacturing with IoT
Webinar - Transforming Manufacturing with IoTWebinar - Transforming Manufacturing with IoT
Webinar - Transforming Manufacturing with IoT
 
Microsoft Azure Explained - Hitesh D Kesharia
Microsoft Azure Explained - Hitesh D KeshariaMicrosoft Azure Explained - Hitesh D Kesharia
Microsoft Azure Explained - Hitesh D Kesharia
 
15 Big Data Billionaires
15 Big Data Billionaires15 Big Data Billionaires
15 Big Data Billionaires
 
Digital Transformation in Travel
Digital Transformation in TravelDigital Transformation in Travel
Digital Transformation in Travel
 
Digital Transformation in Retail
Digital Transformation in RetailDigital Transformation in Retail
Digital Transformation in Retail
 
Digital Transformation in Media
Digital Transformation in MediaDigital Transformation in Media
Digital Transformation in Media
 
Digital Transformation in Hospitality
Digital Transformation in HospitalityDigital Transformation in Hospitality
Digital Transformation in Hospitality
 
Top LinkedIn Influencers Every CIO Must Follow
Top LinkedIn Influencers Every CIO Must Follow Top LinkedIn Influencers Every CIO Must Follow
Top LinkedIn Influencers Every CIO Must Follow
 
Ladbrokes and Aditi - Digital Transformation Case study
Ladbrokes and Aditi - Digital Transformation Case study Ladbrokes and Aditi - Digital Transformation Case study
Ladbrokes and Aditi - Digital Transformation Case study
 
How Internet of Things (IoT) is Reshaping the Automotive Sector - Infographic
How Internet of Things (IoT) is Reshaping the Automotive Sector - InfographicHow Internet of Things (IoT) is Reshaping the Automotive Sector - Infographic
How Internet of Things (IoT) is Reshaping the Automotive Sector - Infographic
 
Finding the important bugs- A talk by John Scarborough, Director of Testing, ...
Finding the important bugs- A talk by John Scarborough, Director of Testing, ...Finding the important bugs- A talk by John Scarborough, Director of Testing, ...
Finding the important bugs- A talk by John Scarborough, Director of Testing, ...
 
Analyzing Gartner's CIO Study: Fliping to Digital Leadership
Analyzing Gartner's CIO Study: Fliping to Digital Leadership Analyzing Gartner's CIO Study: Fliping to Digital Leadership
Analyzing Gartner's CIO Study: Fliping to Digital Leadership
 
24 Connected Car features to look out for before the release of Bond 24
24 Connected Car features to look out for before the release of Bond 2424 Connected Car features to look out for before the release of Bond 24
24 Connected Car features to look out for before the release of Bond 24
 
Webinar: How I Met Your Connected Customer
Webinar: How I Met Your Connected CustomerWebinar: How I Met Your Connected Customer
Webinar: How I Met Your Connected Customer
 
5 Takeaways From The UX India Conference
5 Takeaways From The UX India Conference5 Takeaways From The UX India Conference
5 Takeaways From The UX India Conference
 
Cross-channel customer engagement: What 150 C-Level executives think about it!
Cross-channel customer engagement: What 150 C-Level executives think about it!Cross-channel customer engagement: What 150 C-Level executives think about it!
Cross-channel customer engagement: What 150 C-Level executives think about it!
 

Recently uploaded

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 

Recently uploaded (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 

Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol

  • 1. Introduction to Microsoft Azure HDInsight Dattatrey Sindhol
  • 2. 2 Agenda Introduction Hadoop Distributions Microsoft Azure HDInsight Microsoft BI and Data Platform HDInsight - Use Cases HDInsight - Typical Implementation Further Learning
  • 3. 3 Introduction What is Big Data? “Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”.
  • 4. 4 Introduction Hadoop is an open source framework, from Apache foundation, capable of processing very large volumes of heterogeneous data sets in a distributed fashion across clusters of commodity computers and hardware using a simplified programming model. What is Hadoop?
  • 5. 5 Introduction Conclusion In simple terms, Big Data is the Challenge and Hadoop is the Solution.
  • 6. 6 Hadoop Distributions Amazon Elastic Map Reduce (EMR) Cloudera Hortonworks IBM InfoSphere BigInsights MapR Pivotal Teradata Intel Azure HDInsight Reference: How the 9 Leading Commercial Hadoop Distributions Stack Up
  • 7. 7 Which Distribution Should I Use? Cost Scalability Availability Existing Technology Stack Existing Infrastructure Existing Skillset
  • 8. 8 HDInsight - Overview Microsoft’s Hadoop Distribution in the Cloud Offers Hadoop on Windows Platform Based on Hortonworks Data Platform (HDP) Tightly integrated with Microsoft Technology Stack
  • 10. 10 Microsoft Data Platform and Enterprise BI Ecosystem
  • 11. 11 Why HDInsight? Microsoft Stack Runs on Windows Create & Destroy On-Demand DFS Implementation in Blob Storage DFS Implementation in Blob Storage Store data on Blob Storage for Later Use Automation using PowerShell Orchestration/Work flow using SSIS Scheduling using SQL Agent BI & Analytics with Power BI
  • 12. 12 Considerations Requires dropping and re-creating the cluster to scale-up/down Storage and Cluster should be in the same Data Center
  • 13. 13 HDInsight Versions COMPONENT VERSION 1.6 VERSION 2.1 VERSION 3.0 VERSION 3.1 (Current/Default) Hortonworks Data Platform (HDP) 1.1 1.3 2.0 2.1.7 Apache Hadoop & YARN 1.0.3 1.2.0 2.2.0 2.4.0 Tez 0.4.0 Apache Pig 0.9.3 0.11.0 0.12.0 0.12.1 Apache Hive & HCatalog 0.9.0 0.11.0 0.12.0 0.13.1 HBase 0.98.0 Apache Sqoop 1.4.2 1.4.3 1.4.4 1.4.4 Apache Oozie 3.2.0 3.3.2 4.0.0 4.0.0 Apache HCatalog 0.4.1 Merged with Hive Merged with Hive Merged with Hive Apache Templeton 0.1.4 Merged with Hive Merged with Hive Merged with Hive Ambari API v1.0 1.4.1 >=1.5.1 Zookeeper 3.4.5 3.4.5 Storm 0.9.1 Mahout 0.9.0 Phoenix 4.0.0.2.1.7.0-2162
  • 14. 14 HDInsight Use Case - Iterative Exploration
  • 15. 15 HDInsight Use Case - Data Warehouse on Demand
  • 16. 16 HDInsight Use Case - ETL Automation
  • 17. 17 HDInsight Use Case - BI Integration
  • 18. 18 Typical Implementation Transactional Social Warehouse Azure Blob Blob Blob Blob Blob Multi-Node HDInsight Cluster MapReduce • Hive • Java Reporting and Analytics • SSRS • Excel • Power BI Web Logs Clickstream Files (TXT, XML, JSON, ..) Collaboration Office 365 / SharePoint
  • 19. 19 Typical Implementation (Contd…) E-CommerceInternalSystems OLTP Transactional Internal Systems Customers Internal Systems Team Sqoop Or AzCopy Hive Metastore MapReduce Hive Multi-Node HDInsight Cluster MapReduce • Hive • Pig • Java • Python Collaboration, Reporting, and Analytics• SSRS • Excel • Power BI PowerShell / SSIS / SQL Agent Subscription & Cluster Management | Data Movement | Job Execution Warehouse Web Logs Social Web Logs Azure Blob Storage Blob Blob Blob Blob Blob Blob Blob
  • 20. 20 Further Reading and Learning Resources • HDInsight Emulator • http://azure.microsoft.com • Learning map for HDInsight: http://azure.microsoft.com/en-us/documentation/articles/hdinsight-learn-map
  • 21. 21 References • http://msdn.microsoft.com/en-us/library/dn749804.aspx • http://azure.microsoft.com/en-us/documentation/articles/hdinsight- component-versioning/ • http://msdn.microsoft.com/en-us/library/dn749848.aspx • http://msdn.microsoft.com/en-us/library/dn749787.aspx • http://msdn.microsoft.com/en-us/library/dn749805.aspx • http://msdn.microsoft.com/en-us/library/dn749876.aspx
  • 22. 22 Related Apache Projects Term Description Ambari / HUE Deployment, Configuration, and Monitoring Avro / Parquet / RC / Sequence Data serialization system Flume / S4 / Storm Collection and import of log and event data Hbase / Cassandra Column-oriented database scaling to billions of rows HCatalog Schema and Data Type Sharing over Pig, Hive, and MapReduce Hive / Drill / Impala Data Warehouse with SQL-Like Access Hive-QL/HQL SQL-Like Language to Query Hive Mahout Library of machine learning and data mining algorithms Pig High-level programming for Hadoop computations Oozie Orchestration and workflow management Sqoop Imports data from relational databases Tez Application framework for graph Whirr Cloud-agnostic deployment of clusters MapReduce / YARN MapReduce is a programming model for distributed data processing. MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have Map-Reduce 2.0 (MRv2) or YARN. Zookeeper Configuration management and coordination
  • 24. 24 Top 10 Mobile Companies Top 5 Outsourced Product Development Companies 2012 Partner of the year Windows Azure, Finalist 40 GLOBAL OFFICES 7500 EMPLOYEES 23 COUNTRIES Excellence Award Technology Agency of the Year