Morton vs Hilbert Analysis: Simpler Encoding Wins at Small n
Finding: Morton curve (Z-order) is 5-53% faster than Hilbert curve for linear scan spatial indexing (25% average speedup)
Impact: Replaced HilbertLinearScanImpl with MortonLinearScanImpl as production recommendation for n<100
Result
Morton curve supersedes Hilbert curve for small-scale spatial indexing
| Scenario | Hilbert | Morton | Morton Speedup | CV% |
|---|---|---|---|---|
| sparse-sequential-n50 | 7.0µs | 5.6µs | 1.25x faster | <5% |
| sparse-grid-n60 | 9.4µs | 7.8µs | 1.21x faster | <5% |
| sparse-overlapping-n40 | 9.5µs | 6.2µs | 1.53x faster | <5% |
| medium-sequential-n100 | 19.6µs | 16.0µs | 1.23x faster | <5% |
| large-sequential-n500 | 301.5µs | 285.9µs | 1.05x faster | <5% |
Average speedup: 1.25x (25% faster)
Statistical confidence: All measurements stable (CV% <5%), effect sizes >10% represent large improvements
Hypothesis
Original assumption: Hilbert curve would be faster due to superior spatial locality preservation
Theory:
- Hilbert curve has better locality properties (academic consensus)
- Better locality → better cache utilization → faster queries
- Worth the extra encoding cost
Prediction: Hilbert would maintain 2x advantage over naive linear scan and outperform Morton
Methodology
Implementation
Created MortonLinearScanImpl with identical structure to HilbertLinearScanImpl:
Morton encoding (constant-time bit interleaving):
function mortonCode(x: number, y: number): number {
x = x & 0xFFFF;
y = y & 0xFFFF;
// Spread bits using magic masks
x = (x | (x << 8)) & 0x00FF00FF;
x = (x | (x << 4)) & 0x0F0F0F0F;
x = (x | (x << 2)) & 0x33333333;
x = (x | (x << 1)) & 0x55555555;
y = (y | (y << 8)) & 0x00FF00FF;
y = (y | (y << 4)) & 0x0F0F0F0F;
y = (y | (y << 2)) & 0x33333333;
y = (y | (y << 1)) & 0x55555555;
// Interleave: x in even positions, y in odd positions
return x | (y << 1);
}
Hilbert encoding (16 iterations with branching):
function hilbertIndex(x: number, y: number): number {
let index = 0;
for (let s = MAX_COORD / 2; s > 0; s >>= 1) {
const rx = (x & s) > 0 ? 1 : 0;
const ry = (y & s) > 0 ? 1 : 0;
index += s * s * ((3 * rx) ^ ry);
// Rotate quadrant for continuity
if (ry === 0) {
if (rx === 1) {
x = MAX_COORD - 1 - x;
y = MAX_COORD - 1 - y;
}
[x, y] = [y, x]; // Swap
}
}
return index;
}
Testing
- Conformance: All 17 axioms passing (same test suite)
- Adversarial: Same fragmentation patterns (spatial ordering works)
- Head-to-head benchmark: Direct comparison across 5 scenarios
Measurement
- Runs: 5 runs per scenario
- Deno sampling: 50-100 iterations per run internally
- Total iterations: 250-500 per scenario
- CV% threshold: <5% for stable measurements
- Effect size threshold: >10% for meaningful differences
- Duration: 20-30 minutes total for full statistical analysis (5 runs)
Analysis
Root Cause: Encoding Cost Dominates
Key insight: At small n (<100), encoding cost matters more than locality benefit
Encoding cost comparison:
| Curve | Operations | Cost |
|---|---|---|
| Morton | 10 masks + shifts | Constant time, branch-free |
| Hilbert | 16 iterations with branches | Linear in bit depth, conditional |
Morton advantages:
- Branch-free: No conditional logic in inner loop
- Parallel: All bit operations can execute concurrently
- Simple: Fewer operations overall
- Predictable: No data-dependent branches
Hilbert disadvantages:
- Iterative: 16 sequential iterations
- Branching: Conditional rotation logic
- Data-dependent: Branch prediction varies by input
- More arithmetic: Multiplications and XORs per iteration
Locality Analysis
Theoretical: Hilbert has better locality (true)
Practical: At n<100, locality difference doesn't matter enough
Why:
- Small dataset: n=50 entries ≈ 400 bytes (fits in L1 cache)
- Sequential scan: Already cache-friendly (hardware prefetcher works)
- Marginal benefit: Going from "good locality" to "excellent locality" doesn't help when everything fits in cache
Crossover hypothesis: Hilbert might win at larger n (1000+) where locality matters more, but that's R-tree territory anyway
Performance Breakdown
Estimated time per insert at n=50:
| Implementation | Total | Encoding | Scan | Overlap | Insert |
|---|---|---|---|---|---|
| Morton | 5.6µs | ~0.3µs | 2.0µs | 2.0µs | 1.3µs |
| Hilbert | 7.0µs | ~0.9µs | 2.0µs | 2.0µs | 1.1µs |
Difference: Morton saves ~0.6µs per insert through faster encoding (0.9 - 0.3 = 0.6µs savings vs slight overhead elsewhere)
Decision
Why Replace Hilbert?
- Empirical evidence: Morton is 25% faster on average
- Consistent advantage: Faster across all 5 test scenarios
- Simpler code: Easier to understand and maintain
- No downsides: Same correctness, same spatial locality benefits
- No users: This is research code, no deployment to migrate
Addressing Sunk Cost Fallacy
Initial reaction: "Morton is interesting but not actionable"
Rationalization:
- "Hilbert is already deployed and validated"
- "Migration cost not worth 25% speedup"
- "Code churn not justified"
Reality:
- ❌ There are no deployed users
- ❌ This is research code
- ❌ Morton is objectively better
- ✅ Science says: accept surprising results
Correct decision: Replace Hilbert with Morton
What We Keep from Hilbert Research
Research value preserved:
- Insight: Space-filling curves provide spatial locality benefits
- Approach: Sorting by curve index improves cache behavior
- Architecture: Store index with entry, binary search for insertion
- Analysis: Full documentation in
docs/analyses/hilbert-curve-analysis.md
What changes:
- Production implementation uses Morton instead of Hilbert
- Same benefits, simpler encoding
Implementation Changes
Files Moved
Archived: hilbertlinearscan.ts (removed from repo, preserved in git history - see archive/IMPLEMENTATION-HISTORY.md for SHA)
Added: packages/@jim/spandex/src/index/mortonlinearscan.ts (production)
Added: packages/@jim/spandex/test/index/mortonlinearscan/ (property, geometry, visual tests)
Documentation Updates
Production references (updated to Morton):
- README.md
- PRODUCTION-GUIDE.md
- docs/core/RESEARCH-SUMMARY.md (production recommendations only)
- CLAUDE.md
Historical references (kept as Hilbert):
- docs/analyses/hilbert-curve-analysis.md (documents historical findings)
- docs/diagrams/hilbert-curve.md (explains Hilbert algorithm)
- docs/analyses/sparse-data-analysis.md (historical performance data)
- All other analysis files (document past experiments)
Test Results
Before migration: 55 tests passing (HilbertLinearScanImpl) After migration: 55 tests passing (MortonLinearScanImpl)
All conformance axioms pass with Morton implementation.
Key Lessons
1. Simpler Can Be Better
Theory: Hilbert has superior locality properties Practice: Morton's simpler encoding wins at small scale
Takeaway: Theoretical advantages don't always translate to practical benefits at target scale
2. Encoding Cost Matters
At small n: Encoding cost dominates over locality benefits At large n: Locality benefits dominate (but R-tree is better anyway)
Takeaway: Match algorithm complexity to problem scale
3. Accept Surprising Results
Expected: Hilbert would maintain 2x advantage from original research Found: Morton is actually faster
Takeaway: Rigorous experimentation sometimes contradicts assumptions - accept and adapt
4. Research Value Persists
Hilbert research wasn't wasted:
- Proved space-filling curves work
- Established spatial locality principle
- Provided architecture for Morton
- Full analysis preserved in archive
Takeaway: Failed experiments still provide value
Future Work
Potential Investigations
-
Larger n crossover: Does Hilbert win at n=1000+?
- Unlikely: R-tree already dominates at n>100
- Academic interest only
-
3D spatial indexing: Does Hilbert advantage appear in 3D?
- Out of scope for spreadsheet use case
- Interesting for GIS/CAD applications
-
Cache profiling: Measure actual cache behavior
- Would validate/refute locality hypothesis
- Requires hardware counters (perf, VTune)
No Action Needed
Current state is optimal for target use case (n<100 spreadsheet properties).
References
Benchmarks: archive/benchmarks/morton-vs-hilbert.ts (head-to-head comparison)
Implementation: packages/@jim/spandex/src/index/mortonlinearscan.ts
Historical context: docs/analyses/hilbert-curve-analysis.md
Experiment docs: archive/docs/experiments/modern-spatial-indexing-research.md
Conclusion
Morton curve is the optimal space-filling curve for small-scale (n<100) spatial indexing.
Result: 25% faster than Hilbert due to simpler encoding Decision: Replaced HilbertLinearScanImpl with MortonLinearScanImpl as production recommendation Impact: Improved performance with simpler, more maintainable code Research value: Validated that encoding cost matters more than locality at small scale