Honeycomb Under the Hood

HONEYCOMB: UNDER THE HOOD @cyen

Speed of Time Series + Raw Power of Rich Events
= Interactive, Iterative Debugging for Systems

ORIGINS: FACEBOOK’S "SCUBA"
"A fast, scalable, distributed, in-memory database built at Facebook"
"used extensively for interactive, ad hoc, analysis queries that run in under a second over live data"
+ Parse

ORIGINS: FACEBOOK’S "SCUBA"
▸ Flexibility to dive into dependencies and natural partitions in the data
▸ Fast enough to support natural human query patterns
▸ Started out to debug MySQL performance regressions
▸ Now used anywhere repeated ad-hoc analysis is needed

DESIGN GOALS
▸ Read-time aggregation
▸ Ability to reconstruct raw rows
▸ Flexible schemas and sparse rows
▸ Speedy analytical reads ("best effort availability")
▸ Near-realtime behavior
▸ Real-world hardware constraints :)

INGESTION: GETTING THAT SWEET, SWEET DATA IN

INGESTION GOALS
▸ Simple, straightforward, and fast
▸ Don’t spend innovation tokens here
▸ SSDs are cheap enough to support needed speeds
▸ Rely on filesystem to help with things like pruning

INGESTION FLOW
[Diagram, built up over four slides: Internet → API → Kafka → Storage. An event for Dataset: 42 goes to Kafka Partition: 2 and is shown at Offset: 28267; Storage is labeled Partition: 2, Dataset: 42.]
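The partition and offset labels on the flow slides can be illustrated with a toy append-only log. This is a minimal in-memory sketch, not Kafka and not Honeycomb's code: the class `PartitionedLog`, its methods, and the modulo routing rule are all hypothetical. The slides do not say how a dataset is assigned to a partition.

```python
from collections import defaultdict


class PartitionedLog:
    """Toy in-memory stand-in for a partitioned, append-only log."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def partition_for(self, dataset_id):
        # Hypothetical routing rule: every event for a dataset lands on
        # the same partition. The real assignment is not described.
        return dataset_id % self.num_partitions

    def append(self, dataset_id, payload):
        """Append an event and return (partition, offset)."""
        partition = self.partition_for(dataset_id)
        log = self.partitions[partition]
        log.append((dataset_id, payload))
        # The offset is simply the event's position within its partition.
        return partition, len(log) - 1

    def read_from(self, partition, offset):
        """Return every event in a partition at or after the given offset."""
        return self.partitions[partition][offset:]
```

A storage consumer that remembers the last offset it processed can resume with `read_from(partition, last_offset + 1)` after a restart, which is one reason an ordered log is a convenient buffer between the API and storage.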

INGESTION FLOW
Event: 42, 1493025003, { 1345: "POST", 1373: 27.523 … }
  Dataset ID: 42
  Timestamp: 1493025003
  Column ID: 1345, Value: "POST"
  Column ID: 1373, Value: 27.523
[Diagram, built up over three slides: the event's values are appended to per-column files (labels: Col 0, Col 1345, Col 1373) as index:value entries. Entries shown: 0:1493025000, 0:"GET", 0:4.208, 1:1493025002, 1:0.00, 2:1493025003, 3:1493025004, 3:"POST", 3:27.523. The last slide adds a further group of entries (0:1493025006, 0:"POST", 0:22.199, 1:1493025007) and metadata labels: Min/Max Timestamp, Latest Index, Latest Kafka Offset.]
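The write path on these slides can be sketched as appending `index:value` lines to one file per column. This is a rough sketch under assumptions: the `Segment` class, the `col_<id>` file naming, and the JSON value encoding are hypothetical, and the actual on-disk format is not given in the deck. Only the `index:value` shape and the metadata names (Min/Max Timestamp, Latest Index, Latest Kafka Offset) come from the slides.

```python
import json
import os


class Segment:
    """Append-only set of per-column files plus a little metadata."""

    def __init__(self, directory):
        self.directory = directory
        self.latest_index = -1          # no rows written yet
        self.min_ts = None
        self.max_ts = None
        self.latest_kafka_offset = None

    def _append_entry(self, column_name, index, value):
        # Hypothetical layout: one text file per column, one entry per line.
        path = os.path.join(self.directory, f"col_{column_name}")
        with open(path, "a", encoding="utf-8") as f:
            f.write(f"{index}:{json.dumps(value)}\n")

    def append(self, timestamp, columns, kafka_offset):
        """Write one event; only the columns it contains are touched."""
        index = self.latest_index + 1
        self._append_entry("timestamp", index, timestamp)
        for column_id, value in columns.items():
            self._append_entry(column_id, index, value)
        # Metadata named on the slides, updated after each write.
        self.latest_index = index
        self.min_ts = timestamp if self.min_ts is None else min(self.min_ts, timestamp)
        self.max_ts = timestamp if self.max_ts is None else max(self.max_ts, timestamp)
        self.latest_kafka_offset = kafka_offset
        return index
```

Because an event only writes to the columns it actually has, sparse rows cost nothing in the other column files, and the shared row index is what allows a raw row to be reassembled later.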

INGESTION: YOU MAY NOTICE
▸ No indices to maintain on the write path
▸ No compaction or compression
▸ Open road to optimizations

READS: QUERYING THAT DATA BACK OUT

READS GOALS
▸ Only pull minimum data necessary to answer the question
▸ Approximate whenever possible
▸ Even if data ages out, results (previously-run queries) shouldn’t
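One way to read "only pull minimum data necessary" is to skip whole groups of column files whose time range cannot overlap the query window. The sketch below assumes the Min/Max Timestamp metadata from the ingestion slides is used this way, which the deck does not state outright. `SegmentMeta` and `segments_to_scan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SegmentMeta:
    name: str
    min_ts: int   # earliest event timestamp in the segment
    max_ts: int   # latest event timestamp in the segment


def segments_to_scan(segments, start_ts, end_ts):
    """Keep only segments whose [min_ts, max_ts] overlaps the query window."""
    return [
        seg for seg in segments
        if seg.max_ts >= start_ts and seg.min_ts <= end_ts
    ]
```

With this kind of check, a query over a recent time window never opens the column files of older segments.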

READS FLOW
[Diagram: the Web UI sends the query AVG(total_ms) where method = "POST" to nodes 3, 5, 6, and 8.]
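When a query such as AVG fans out to several nodes, the per-node answers have to be combined without averaging the averages. A common approach, shown here as an assumption rather than as Honeycomb's actual protocol, is for each node to return a partial (sum, count) pair. The function names are hypothetical.

```python
def partial_avg(values):
    """What a single node would return: a (sum, count) pair, not an average."""
    return (sum(values), len(values))


def merge_avg(partials):
    """Combine per-node (sum, count) pairs into one overall average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None
```

A node with no matching rows contributes `(0, 0)` and leaves the merged result unchanged, so the caller does not need to special-case empty nodes.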

READS FLOW
AVG(total_ms) where method = "POST" over the last 2 hours
[Diagram, built up over three slides: column files holding index:value entries, grouped here one column per line.]
  0:1493025000 1:1493025002 2:1493025003 3:1493025004
  0:"GET" 2:"POST" 3:"POST"
  0:4.208 2:0.00
  0:app25 1:app7 2:app16 3:app25
  0:ios 1:android 2:js 3:js
  0:0.497 1:1.253 2:0.027 3:2.119
= 1.073
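The result on the last of these slides can be reproduced by reading only two columns. The sketch below assumes the entries 0.497, 1.253, 0.027, and 2.119 belong to the total_ms column. The slides do not label the columns, but this assignment is consistent with the stated result of 1.073. The dict-per-column representation and the function `avg_where` are hypothetical.

```python
# Each column maps row index -> value; a missing index means the row has
# no value for that column (sparse rows).
method = {0: "GET", 2: "POST", 3: "POST"}
total_ms = {0: 0.497, 1: 1.253, 2: 0.027, 3: 2.119}


def avg_where(filter_col, wanted, value_col):
    """Average value_col over the rows where filter_col equals wanted."""
    # Step 1: scan only the filter column to find matching row indices.
    rows = [index for index, value in filter_col.items() if value == wanted]
    # Step 2: fetch only those rows from the value column.
    values = [value_col[index] for index in rows if index in value_col]
    return sum(values) / len(values) if values else None


result = avg_where(method, "POST", total_ms)
print(round(result, 3))  # 1.073, i.e. (0.027 + 2.119) / 2
```

The other columns (timestamps aside, which bound the time range) are never read, which is the point of storing the data column by column.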

READS FLOW
[Diagram: Web UI, nodes 3, 5, 6, and 8, and S3.]

TRADEOFFS
▸ Always prefer availability and speed (from the perspective of the user)
▸ Write-heavy workload means we optimize for write performance (append-only ingest)
▸ Optimizing reads is easier than optimizing writes
▸ Users can input particular patterns to degrade their own query performance

THANKS! Christine Yen @cyen
