Research Summary

Problem: Maintain non-overlapping 2D rectangles with last-writer-wins semantics
Solution: Rectangle decomposition (A \ B → ≤4 fragments)
Target: Pure JavaScript environments (browsers, Node.js, Deno, constrained runtimes like Google Apps Script)

Production Recommendations

n (size) Workload Use Why
Any Multi-attribute LazyPartitioned wrapper Independent attribute updates
< 100 All Morton spatial locality O(n) ≈ O(1), faster via locality
100-200 Write-heavy Context-dependent See transition zone analysis
100-600 High overlap Morton spatial locality Decomposition cost dominates
> 200 Read-heavy R-tree (R* split) O(log n) query pruning wins
> 600 All R-tree (R* split) O(log n) hierarchical indexing

See PRODUCTION-GUIDE for implementation details.

Key Findings

This section summarizes research outcomes. For detailed methodology and data, see individual analysis documents linked below.

1. Morton Curve Optimization (5-53% speedup, 25% average)

Finding: Morton curve (Z-order) via bit interleaving provides 25% average speedup over Hilbert curve at small n (range: 5-53% depending on scenario), while being simpler to implement.

Impact: Production algorithm for n<100. Constant-time encoding vs iterative Hilbert.

See morton-vs-hilbert-analysis.md for detailed comparison.

2. Linear Scan Wins for Sparse Data (n < 100)

Finding: O(n) linear scan with spatial locality is 5-10x faster than O(log n) R-trees for n < 100.

Why: Tree construction overhead (node allocation, bbox updates) exceeds traversal benefits at small scales.

Impact: Use Morton spatial locality for n<100, R-tree for n≥100. Crossover varies by workload (see transition zone analysis).

See sparse-data-analysis.md for performance data.

3. R* Split Algorithm (Construction + Quality)

Finding: R* split (Beckmann 1990) is fastest for tree construction and provides best query performance on overlapping/large datasets.

Performance: 20-25x faster construction than Quadratic split, 30-35% faster queries on overlapping data vs Midpoint.

Impact: Production R-tree algorithm. Optimal balance of construction speed and tree quality.

See r-star-analysis.md for split algorithm comparison.

4. Transition Zone Mapped (100 < n < 600)

Finding: Crossover point between linear scan and R-tree varies by workload:

Impact: Concrete thresholds replace "workload-dependent" guidance.

See transition-zone-analysis.md for 23-scenario benchmark matrix.

5. Failed Experiments

FastRTree (R* axis + midpoint split): 1.29x slower, rejected.
Bulk Insert API: 1.01-1.39x slower due to LWW sequential dependency, rejected.
Learned Indexes: Requires ML runtimes (TensorFlow/PyTorch), impractical for constrained environments.

See archive/docs/experiments/ for full analyses.

6. Implementation Constraints

TypedArrays: Only performance optimization compatible with constrained runtimes (Google Apps Script). WASM/WebGPU/SharedArrayBuffer unavailable.

Implementation style: Imperative + TypedArrays required for production (~14x faster than functional style with .flatMap/.filter).

Bundle sizes: Morton ~2.3KB, R-tree ~5.9KB, LazyPartitioned ~2.1KB (minified). Already optimal.

7. Comprehensive Testing (35 Benchmark Scenarios)

Coverage:

Adversarial validation: Pathological patterns (concentric, diagonal, checkerboard) validate O(n) fragmentation bound. Empirical k ≈ 2.3 overlaps per insert.

See adversarial-patterns.md and benchmark-statistics.md.

8. Production Readiness (October 2025)

Optimization study: No further micro-optimizations viable (<10% impact threshold). Current performance is algorithmically determined.

Test coverage: Axiom-based conformance, ASCII snapshots, cross-implementation consistency, adversarial worst-case.

Conclusion: Implementations production-ready. Future gains require new algorithms, not micro-optimizations.

Algorithms

Linear Scan (O(n)): Flat array with Morton spatial locality. Best for n<100 (2x faster via cache locality). Bundle: ~2.3KB.

R-tree (O(log n)): Hierarchical index with R* split (Beckmann 1990). Best for n≥100. Bundle: ~5.9KB.

See packages/@jim/spandex/src/index/ for implementations.

Methodology

Benchmarks: 35 scenarios (algorithmic patterns + user patterns + workloads), 5 runs, CV% <5%.

Testing: Axiom-based correctness (LWW semantics, disjointness, fragment bounds), adversarial worst-case validation (k ≈ 2.3 overlaps/insert), cross-implementation consistency.

Reproduce: deno task bench:update • deno task test • deno task test:adversarial

Documentation Map

Document Purpose
GETTING-STARTED Tutorial for new users
PRODUCTION-GUIDE Algorithm selection guide
TROUBLESHOOTING Common issues and solutions
theoretical-foundation Proofs, complexity analysis
morton-vs-hilbert-analysis Space-filling curve comparison
sparse-data-analysis Why O(n) wins for n<100
transition-zone-analysis Crossover thresholds by workload
r-star-analysis Split algorithm comparison
adversarial-patterns Worst-case fragmentation validation
benchmark-statistics Statistical methodology
alternatives-analysis Why not quadtrees/grids?
archive/docs/experiments/ Failed experiments (preserved learning)

References

Academic Literature

API Documentation


Result: Algorithm choice depends on n and workload. Morton linear scan for n<100, R-tree for n≥100. See PRODUCTION-GUIDE for decision tree.