Theoretical Foundation
Mathematical model and correctness proofs.
Glossary
- AABB: Axis-Aligned Bounding Box
- Rectangle:
[xmin, ymin, xmax, ymax]using closed intervals[min, max](both endpoints included) - GridRange: Google Sheets API type using half-open intervals
[start, end)(end excluded). Converted via adapter. - Spatial Partition: Disjoint rectangles maintained via geometric set difference
- Rectangle Decomposition:
A \ Bproduces β€4 disjoint fragments - Last Writer Wins (LWW): Most recent insertion wins in overlaps
- Spatial Index: Data structure for geometric queries (R-trees, quadtrees, linear scan)
Problem
Given: Sequence of 2D range insertions with values
Output: Minimal non-overlapping range decomposition
Constraint: Last writer wins
Mathematical Model
Domain: Let S = β Γ β (2D grid of natural numbers: rows Γ columns)
Values: Let V = any value type (strings, objects, etc.)
State: Maintain function f: S β V βͺ {β₯} where:
f(row, col) = vmeans cell(row, col)has valuevf(row, col) = β₯means cell is unset (β₯ = "bottom" = undefined)
Operations:
Insert(R, v): For all points p in rectangle R, setf(p) = vQuery(): Return minimal set of rectangles{(Rβ,vβ), (Rβ,vβ), ...}that covers all cells wheref β β₯
Minimality: No two returned rectangles can be merged without violating value consistency.
Algorithms
Two distinct algorithms, each with multiple implementation strategies:
Algorithm 1: Linear Scan with Rectangle Decomposition
Implementations: See packages/@jim/spandex/src/index/ for current active implementations.
INSERT(R, v):
1. Find overlapping rectangles O (linear scan through flat array)
2. Remove O from storage
3. For each r β O: compute fragments r \ R (set subtraction)
4. Insert R and valid fragments into flat array
Complexity: O(n) insert, O(nΒ²) for n sequential inserts
Space: O(n) entries (β4n worst case)
Best for: Sparse data (n < 100)
Optimizations: Spatial locality (2x faster), compact storage, TypedArrays
Algorithm 2: Hierarchical R*-tree (O(log n))
INSERT(R, v):
1. Traverse tree to find overlapping rectangles O (O(log n) average)
2. For each r β O: compute fragments r \ R (set subtraction)
3. Insert R and fragments into tree with R* node splitting
R* Split (Beckmann et al., 1990): Choose axis minimizing perimeter sum, choose split minimizing overlap. O(m log m) per split (m=10).
Space: O(n) Best for: Large datasets (n β₯ 1000)
R-Tree Insert Complexity
Operation: insert(R_new, v) requires finding and modifying ALL overlapping entries (not just one).
Steps:
- Find overlaps: O(k Γ log n) worst case, O(log n) average (k = overlap count)
- Decompose: O(k) - each produces β€4 fragments
- Reinsert: (4k + 1) Γ O(log n) = O(k Γ log n)
Overall:
- Best (k=0): O(log n)
- Average (constant k): O(log n)
- Worst (k = Ξ(n)): O(n log n)
Empirical: Adversarial tests show k β 2.3 average (sublinear growth), not Ξ(n).
Correctness Proof
Theorem: Rectangle decomposition maintains valid spatial partition with LWW semantics.
Invariants:
- Disjoint: β rα΅’, rβ±Ό β R, i β j βΉ rα΅’ β© rβ±Ό = β
- LWW: β cell c, value(c) from most recent insert
- Coverage: β inserted cell c, β r β R: c β r
- Minimality: No adjacent rectangles with same value
Proof by Induction:
Base case: Empty index trivially satisfies all invariants.
Inductive step: Assume invariants hold before insert(R_new, v_new). Prove they hold after.
Steps:
- Find overlaps O =
- Remove O: R' = R \ O (preserves disjointness)
- Decompose: fragments(r) = r \ R_new for each r β O (β€4 per r)
- Insert: R'' = R' βͺ {R_new} βͺ βfragments(r)
Invariant preservation:
- Disjoint: Fragments disjoint from R_new (by set difference) and from R' (subsets of removed) β
- LWW: R_new overwrites (most recent), fragments retain old values β
- Coverage: R_new + fragments cover all cells previously in O β
- Minimality: Optional (most implementations skip)
Cumulative fragmentation:
- Theoretical worst-case: β€4n rectangles after n inserts
- Empirical typical: ~2.3n rectangles (validated in analyses/adversarial-patterns.md)
- Practical bound: O(n) due to spatial locality and geometric constraints
β
Correctness of Spatial Locality Optimization
Theorem: Space-filling curve ordering preserves all invariants.
Proof: Sorting affects STORAGE ORDER only, not geometric decomposition. Same fragments generated, just stored in different order.
- Disjointness: Geometry unchanged β
- LWW: Sorting happens after value assignment β
- Coverage: Same fragments β
β
Complexity Analysis
Linear Scan Implementations
| Operation | Time | Space | Notes |
|---|---|---|---|
| Insert | O(n) | O(1) | n = existing rectangles |
| Query (all) | O(n) | O(n) | Return all rectangles |
| Query (region) | O(n) | O(k) | k = matching rectangles |
| n Inserts | O(nΒ²) | O(4n) | Worst case: each splits all |
R-tree Implementation
| Operation | Time | Space | Notes |
|---|---|---|---|
| Insert | O(log n) | O(log n) | Average case tree traversal |
| Query (all) | O(n) | O(n) | Full tree walk |
| Query (region) | O(log n + k) | O(k) | k = matching rectangles |
| n Inserts | O(n log n) | O(n) | Balanced tree structure |
Fragmentation Worst-Case
Pathological pattern:
Insert rectangle that overlaps ALL existing rectangles
Each overlap creates up to 4 fragments
Repeat n times
Analysis:
- Insert 1: 1 rectangle, 0 fragments β total: 1
- Insert 2: Overlaps 1, creates β€4 fragments β total: 1 + 4 = 5
- Insert 3: Overlaps 5, creates β€20 fragments β total: 1 + 20 = 21
- Insert 4: Overlaps 21, creates β€84 fragments β total: 1 + 84 = 85
- ...
- Insert n: Overlaps (4^(n-1)), creates β€4^n fragments β total: 4^n
Theoretical worst-case: O(4^n) - EXPONENTIAL (but geometrically impossible)
Why geometrically impossible:
- Bounded area constraint: Total grid area is finite β fragments bounded by minimum rectangle size
- Disjoint requirement: No overlaps allowed β total fragment area β€ grid area
- Minimum fragment size: Each fragment covers β₯1 cell β max fragments β€ total cells
Practical bounds:
- Linear worst-case: O(n) to O(4n) rectangles after n inserts
- Empirical typical: ~2.3n (k β 2.3 overlaps per insert, validated via adversarial patterns)
- Example: 100 pathological inserts β 232 ranges (2.3x), not 4^100
Geometric Bound (Formal Proof)
Theorem: After n insertions into bounded grid (area A), max rectangles R_max β€ A / A_min.
Proof:
- Grid area = A (finite)
- Rectangles disjoint (proven above)
- Each rectangle area β₯ A_min
- R Γ A_min β€ A
- Therefore: R β€ A / A_min
Implication: Exponential O(4^n) geometrically impossible.
Example: 10^6 cell grid β max 10^6 rectangles, not 4^100 β 10^60
Empirical: 100 pathological inserts β 232 ranges (2.3x), not 4^100
Conclusion: Practical O(n) rectangles after n inserts.
β
Geometry
Rectangle Format
Standard: [xmin, ymin, xmax, ymax] per ISO 19107:2019 (x=columns, y=rows)
Layered Architecture:
- Core:
Rectanglewith closed intervals[min, max] - Adapters: Convert external API types (GridRange, A1) to Rectangle
| Layer | Format | Example: Rows 0-9 |
|---|---|---|
| Core (implementations) | Rectangle (closed) | [0, 0, 9, 9] |
| Adapter (GridRange) | GridRange (half-open) | {startRow: 0, endRow: 10} |
| Adapter (A1) | A1 notation | "A1:J10" |
Conversion: GridRange β· Rectangle via Β±1 on end coordinates. undefined β Β±β.
Operations
Intersection: A β© B β β
!(a.xmax < b.xmin || b.xmax < a.xmin || a.ymax < b.ymin || b.ymax < a.ymin);
Subtraction: A \ B β β€4 fragments (cuts rectangle A around B)
Visual example (rectangle A minus overlapping B):
Before: After decomposition:
βββββββββββ βββββββββββ β Top fragment
β A β β A β
β β βββββ¬ββββ¬ββ€
β ββββ β β A β B βAβ β Left, Center (B wins), Right
β β Bβ β βββββ΄ββββ΄ββ
β ββββ β β A β β Bottom fragment
βββββββββββ βββββββββββ
Result: A \ B produces 4 disjoint fragments (Top, Bottom, Left, Right)
Inclusive intervals [min, max]:
- Top:
[a.xmin, a.ymin, a.xmax, b.ymin-1] - Bottom:
[a.xmin, b.ymax+1, a.xmax, a.ymax] - Left:
[a.xmin, max(a.ymin,b.ymin), b.xmin-1, min(a.ymax,b.ymax)] - Right:
[b.xmax+1, max(a.ymin,b.ymin), a.xmax, min(a.ymax,b.ymax)]
Half-open intervals [min, max): No Β±1 adjustments needed (end already exclusive).
Testing
Axioms (must hold β implementations):
- Non-duplication: No identical (bounds, value) pairs
- Last-writer-wins: Insertion order matters for overlaps
- Disjointness: No overlapping rectangles (maintained after each operation)
- Correctness: Property-based random testing
References
- de Berg, M., Cheong, O., van Kreveld, M., & Overmars, M. (2008). Computational Geometry: Algorithms and Applications (3rd ed.). Springer-Verlag. ISBN: 978-3-540-77973-5. (Rectangle decomposition algorithms)
- ISO 19107:2019 Geographic Information β Spatial Schema. International Organization for Standardization. (AABB representation standards)
- Samet, H. (1990). The Design and Analysis of Spatial Data Structures. Addison-Wesley. ISBN: 978-0-201-50255-9. (Comprehensive survey of spatial indexing methods)
- Guttman, A. (1984). "R-trees: A Dynamic Index Structure For Spatial Searching". Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, pp. 47-57. doi:10.1145/602259.602266 (Hierarchical spatial indexing - for comparison)
- Shapiro, M., PreguiΓ§a, N., Baquero, C., & Zawirski, M. (2011). "Conflict-Free Replicated Data Types". In Stabilization, Safety, and Security of Distributed Systems, Lecture Notes in Computer Science vol 6976, pp. 386-400. Springer. doi:10.1007/978-3-642-24550-3_29 (Last-Writer-Wins conflict resolution semantics)
See Also:
- RESEARCH-SUMMARY.md - Executive summary and production recommendations
- PRODUCTION-GUIDE.md - Implementation selection guide
- BENCHMARKS.md - Empirical performance data