B-Trees: The Data Structure That Runs The World
Every time you query a database, open a file, or search for anything on disk—you're using a B-tree. Or more likely, its refined cousin, the B+ tree. This isn't some academic curiosity. This is the backbone of modern computing infrastructure.
What You Just Saw
Play with the visualizer. Insert 20-30 values. Watch what happens:
- BST depth: Grows linearly with data (worst case: N levels for N items)
- B-tree depth: Grows logarithmically with a much larger base
With the same 30 values, your BST might be 10+ levels deep. Your B-tree (order 3)? 2-3 levels. Each level is a disk read. This is the entire point.
What Is a B-Tree?
A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in O(log n) time. It was invented in 1970 by Rudolf Bayer and Edward McCreight while working at Boeing Research Labs.
The key insight: disk I/O is the bottleneck, not CPU cycles. A B-tree is specifically designed to minimize the number of disk reads by:
- Keeping many keys in each node (high fanout)
- Ensuring the tree stays balanced (all leaves at same depth)
- Sizing nodes to match disk block/page sizes
Why B-Trees Are Superior
| Operation | B-Tree | Binary Search Tree | Hash Table |
|---|---|---|---|
| Search | O(log n) |
O(log n) to O(n) |
O(1) avg |
| Insert | O(log n) |
O(log n) to O(n) |
O(1) avg |
| Delete | O(log n) |
O(log n) to O(n) |
O(1) avg |
| Range Query | O(log n + k) |
O(log n + k) |
O(n) 💀 |
| Disk I/O per op | ~3-4 reads | O(log n) reads |
Unpredictable |
"But hash tables are O(1)!" Sure. In RAM. Now try doing a range query. Try ordering results. Try persisting to disk efficiently. Hash tables fall apart the moment you need anything beyond point lookups.
B+ Tree — The Refined Sibling
The B+ tree is the variant that every serious database uses: MySQL/InnoDB, PostgreSQL, SQLite, Oracle, SQL Server, MongoDB (for indexes). Why?
- All data lives in leaves. Internal nodes are pure index. More keys per internal node = shallower tree = fewer disk reads.
- Leaves are linked. Range queries become sequential scans.
SELECT * WHERE price BETWEEN 10 AND 100? Find first match, follow the chain. Trivial. - Better cache utilization. Internal nodes fit better in memory. Leaves can be streamed.
The Numbers That Matter
A B+ tree with order 100 (100 children per node) storing 1 billion keys:
- Tree depth: 5 levels
- Disk reads per search: 5 (often 3-4 with caching)
- Compare to binary tree: 30 levels, 30 disk reads
This is why databases are fast. This is why your SSD doesn't melt. The math works.
The Reality Check
B-trees were invented in 1970. Fifty-five years later, they're still the optimal choice for disk-based indexing. Think about that.
If you can invent something genuinely better for general-purpose disk-based indexing, you will:
- Become extremely wealthy
- Have your name in CS textbooks for the next century
- Change how every database on Earth works
You won't. Smarter people have tried. The constraints of disk I/O physics haven't changed. B-trees are the local maximum, and it's a very high peak.
Why Your Coworkers Don't Get It
CS fundamentals are dying. Bootcamps don't teach data structures beyond arrays and hashmaps. Universities are cutting systems programming courses. The result:
- Developers who think
Array.filter()is efficient for large datasets - "Just use MongoDB" without understanding what indexes actually do
- People who've never profiled an I/O-bound workload
- The belief that RAM is infinite and SSDs are magic
The Bottom Line
B-trees aren't sexy. They don't have a JavaScript framework. You can't put them on your LinkedIn skills. But every piece of software that stores and retrieves data at scale—databases, filesystems, search engines, version control—uses them or a close variant.
If you work with data and you don't understand B-trees, you're building on foundations you don't comprehend. That's not engineering. That's faith-based development.
Play with the visualizer. Watch the depth difference. Then go look at your database query plans with new eyes.