BST vs B-Tree: Same Data, Different Depths

Watch the depth difference grow as you add values. This is why B-trees run the world.

Visualizer
Why B-Trees Are The Shit
3
Tree Depth (disk reads per search)
0 BST
0 B-Tree
Binary Search Tree 0 nodes
B-Tree 0 nodes

B-Trees: The Data Structure That Runs The World

Every time you query a database, open a file, or search for anything on disk—you're using a B-tree. Or more likely, its refined cousin, the B+ tree. This isn't some academic curiosity. This is the backbone of modern computing infrastructure.

"Just use a hashmap." — Someone who's never had to persist data to disk, circa every day of my life.

What You Just Saw

Play with the visualizer. Insert 20-30 values. Watch what happens:

  • BST depth: Grows linearly with data (worst case: N levels for N items)
  • B-tree depth: Grows logarithmically with a much larger base

With the same 30 values, your BST might be 10+ levels deep. Your B-tree (order 3)? 2-3 levels. Each level is a disk read. This is the entire point.

What Is a B-Tree?

A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in O(log n) time. It was invented in 1970 by Rudolf Bayer and Edward McCreight while working at Boeing Research Labs.

The key insight: disk I/O is the bottleneck, not CPU cycles. A B-tree is specifically designed to minimize the number of disk reads by:

  • Keeping many keys in each node (high fanout)
  • Ensuring the tree stays balanced (all leaves at same depth)
  • Sizing nodes to match disk block/page sizes

Why B-Trees Are Superior

Operation B-Tree Binary Search Tree Hash Table
Search O(log n) O(log n) to O(n) O(1) avg
Insert O(log n) O(log n) to O(n) O(1) avg
Delete O(log n) O(log n) to O(n) O(1) avg
Range Query O(log n + k) O(log n + k) O(n) 💀
Disk I/O per op ~3-4 reads O(log n) reads Unpredictable

"But hash tables are O(1)!" Sure. In RAM. Now try doing a range query. Try ordering results. Try persisting to disk efficiently. Hash tables fall apart the moment you need anything beyond point lookups.

B+ Tree — The Refined Sibling

The B+ tree is the variant that every serious database uses: MySQL/InnoDB, PostgreSQL, SQLite, Oracle, SQL Server, MongoDB (for indexes). Why?

  • All data lives in leaves. Internal nodes are pure index. More keys per internal node = shallower tree = fewer disk reads.
  • Leaves are linked. Range queries become sequential scans. SELECT * WHERE price BETWEEN 10 AND 100? Find first match, follow the chain. Trivial.
  • Better cache utilization. Internal nodes fit better in memory. Leaves can be streamed.

The Numbers That Matter

A B+ tree with order 100 (100 children per node) storing 1 billion keys:

  • Tree depth: 5 levels
  • Disk reads per search: 5 (often 3-4 with caching)
  • Compare to binary tree: 30 levels, 30 disk reads

This is why databases are fast. This is why your SSD doesn't melt. The math works.

The Reality Check

B-trees were invented in 1970. Fifty-five years later, they're still the optimal choice for disk-based indexing. Think about that.

"What about LSM-trees?" LSM-trees are a write-optimized trade-off. You gain write throughput, you lose read performance, and you spend your life tuning compaction. They have their place (write-heavy workloads, time-series data). But for general-purpose OLTP? B+ trees win.

If you can invent something genuinely better for general-purpose disk-based indexing, you will:

  • Become extremely wealthy
  • Have your name in CS textbooks for the next century
  • Change how every database on Earth works

You won't. Smarter people have tried. The constraints of disk I/O physics haven't changed. B-trees are the local maximum, and it's a very high peak.

Why Your Coworkers Don't Get It

CS fundamentals are dying. Bootcamps don't teach data structures beyond arrays and hashmaps. Universities are cutting systems programming courses. The result:

  • Developers who think Array.filter() is efficient for large datasets
  • "Just use MongoDB" without understanding what indexes actually do
  • People who've never profiled an I/O-bound workload
  • The belief that RAM is infinite and SSDs are magic
Every time someone says "we can just cache it in Redis" without understanding why they need to cache it, a B-tree node sheds a tear.

The Bottom Line

B-trees aren't sexy. They don't have a JavaScript framework. You can't put them on your LinkedIn skills. But every piece of software that stores and retrieves data at scale—databases, filesystems, search engines, version control—uses them or a close variant.

If you work with data and you don't understand B-trees, you're building on foundations you don't comprehend. That's not engineering. That's faith-based development.

Play with the visualizer. Watch the depth difference. Then go look at your database query plans with new eyes.