Theoretical Foundation

Mathematical model and correctness proofs.

Glossary

AABB: Axis-Aligned Bounding Box
Rectangle: [xmin, ymin, xmax, ymax] using closed intervals [min, max] (both endpoints included)
GridRange: Google Sheets API type using half-open intervals [start, end) (end excluded). Converted via adapter.
Spatial Partition: Disjoint rectangles maintained via geometric set difference
Rectangle Decomposition: A \ B produces ≤4 disjoint fragments
Last Writer Wins (LWW): Most recent insertion wins in overlaps
Spatial Index: Data structure for geometric queries (R-trees, quadtrees, linear scan)

Problem

Given: Sequence of 2D range insertions with values
Output: Minimal non-overlapping range decomposition
Constraint: Last writer wins

Mathematical Model

Domain: Let S = ℕ × ℕ (2D grid of natural numbers: rows × columns)
Values: Let V = any value type (strings, objects, etc.)

State: Maintain function f: S → V ∪ {⊥} where:

f(row, col) = v means cell (row, col) has value v
f(row, col) = ⊥ means cell is unset (⊥ = "bottom" = undefined)

Operations:

Insert(R, v): For all points p in rectangle R, set f(p) = v
Query(): Return minimal set of rectangles {(R₁,v₁), (R₂,v₂), ...} that covers all cells where f ≠ ⊥

Minimality: No two returned rectangles can be merged without violating value consistency.

Algorithms

Two distinct algorithms, each with multiple implementation strategies:

Algorithm 1: Linear Scan with Rectangle Decomposition

Implementations: See packages/@jim/spandex/src/index/ for current active implementations.

INSERT(R, v):
  1. Find overlapping rectangles O (linear scan through flat array)
  2. Remove O from storage
  3. For each r ∈ O: compute fragments r \ R (set subtraction)
  4. Insert R and valid fragments into flat array

Complexity: O(n) insert, O(n²) for n sequential inserts

Space: O(n) entries (≈4n worst case)

Best for: Sparse data (n < 100)

Optimizations: Spatial locality (2x faster), compact storage, TypedArrays

Algorithm 2: Hierarchical R*-tree (O(log n))

INSERT(R, v):
  1. Traverse tree to find overlapping rectangles O (O(log n) average)
  2. For each r ∈ O: compute fragments r \ R (set subtraction)
  3. Insert R and fragments into tree with R* node splitting

R* Split (Beckmann et al., 1990): Choose axis minimizing perimeter sum, choose split minimizing overlap. O(m log m) per split (m=10).

Space: O(n) Best for: Large datasets (n ≥ 1000)

R-Tree Insert Complexity

Operation: insert(R_new, v) requires finding and modifying ALL overlapping entries (not just one).

Steps:

Find overlaps: O(k × log n) worst case, O(log n) average (k = overlap count)
Decompose: O(k) - each produces ≤4 fragments
Reinsert: (4k + 1) × O(log n) = O(k × log n)

Overall:

Best (k=0): O(log n)
Average (constant k): O(log n)
Worst (k = Θ(n)): O(n log n)

Empirical: Adversarial tests show k ≈ 2.3 average (sublinear growth), not Θ(n).

Correctness Proof

Theorem: Rectangle decomposition maintains valid spatial partition with LWW semantics.

Invariants:

Disjoint: ∀ rᵢ, rⱼ ∈ R, i ≠ j ⟹ rᵢ ∩ rⱼ = ∅
LWW: ∀ cell c, value(c) from most recent insert
Coverage: ∀ inserted cell c, ∃ r ∈ R: c ∈ r
Minimality: No adjacent rectangles with same value

Proof by Induction:

Base case: Empty index trivially satisfies all invariants.

Inductive step: Assume invariants hold before insert(R_new, v_new). Prove they hold after.

Steps:

Find overlaps O =
Remove O: R' = R \ O (preserves disjointness)
Decompose: fragments(r) = r \ R_new for each r ∈ O (≤4 per r)
Insert: R'' = R' ∪ {R_new} ∪ ⋃fragments(r)

Invariant preservation:

Disjoint: Fragments disjoint from R_new (by set difference) and from R' (subsets of removed) ✓
LWW: R_new overwrites (most recent), fragments retain old values ✓
Coverage: R_new + fragments cover all cells previously in O ✓
Minimality: Optional (most implementations skip)

Cumulative fragmentation:

Theoretical worst-case: ≤4n rectangles after n inserts
Empirical typical: ~2.3n rectangles (validated in analyses/adversarial-patterns.md)
Practical bound: O(n) due to spatial locality and geometric constraints

∎

Correctness of Spatial Locality Optimization

Theorem: Space-filling curve ordering preserves all invariants.

Proof: Sorting affects STORAGE ORDER only, not geometric decomposition. Same fragments generated, just stored in different order.

Disjointness: Geometry unchanged ✓
LWW: Sorting happens after value assignment ✓
Coverage: Same fragments ✓

∎

Complexity Analysis

Linear Scan Implementations

Operation	Time	Space	Notes
Insert	O(n)	O(1)	n = existing rectangles
Query (all)	O(n)	O(n)	Return all rectangles
Query (region)	O(n)	O(k)	k = matching rectangles
n Inserts	O(n²)	O(4n)	Worst case: each splits all

R-tree Implementation

Operation	Time	Space	Notes
Insert	O(log n)	O(log n)	Average case tree traversal
Query (all)	O(n)	O(n)	Full tree walk
Query (region)	O(log n + k)	O(k)	k = matching rectangles
n Inserts	O(n log n)	O(n)	Balanced tree structure

Fragmentation Worst-Case

Pathological pattern:

Insert rectangle that overlaps ALL existing rectangles
Each overlap creates up to 4 fragments
Repeat n times

Analysis:

Insert 1: 1 rectangle, 0 fragments → total: 1
Insert 2: Overlaps 1, creates ≤4 fragments → total: 1 + 4 = 5
Insert 3: Overlaps 5, creates ≤20 fragments → total: 1 + 20 = 21
Insert 4: Overlaps 21, creates ≤84 fragments → total: 1 + 84 = 85
...
Insert n: Overlaps (4^(n-1)), creates ≤4^n fragments → total: 4^n

Theoretical worst-case: O(4^n) - EXPONENTIAL (but geometrically impossible)

Why geometrically impossible:

Bounded area constraint: Total grid area is finite → fragments bounded by minimum rectangle size
Disjoint requirement: No overlaps allowed → total fragment area ≤ grid area
Minimum fragment size: Each fragment covers ≥1 cell → max fragments ≤ total cells

Practical bounds:

Linear worst-case: O(n) to O(4n) rectangles after n inserts
Empirical typical: ~2.3n (k ≈ 2.3 overlaps per insert, validated via adversarial patterns)
Example: 100 pathological inserts → 232 ranges (2.3x), not 4^100

Geometric Bound (Formal Proof)

Theorem: After n insertions into bounded grid (area A), max rectangles R_max ≤ A / A_min.

Proof:

Grid area = A (finite)
Rectangles disjoint (proven above)
Each rectangle area ≥ A_min
R × A_min ≤ A
Therefore: R ≤ A / A_min

Implication: Exponential O(4^n) geometrically impossible.

Example: 10^6 cell grid → max 10^6 rectangles, not 4^100 ≈ 10^60

Empirical: 100 pathological inserts → 232 ranges (2.3x), not 4^100

Conclusion: Practical O(n) rectangles after n inserts.

∎

Geometry

Rectangle Format

Standard: [xmin, ymin, xmax, ymax] per ISO 19107:2019 (x=columns, y=rows)

Layered Architecture:

Core: Rectangle with closed intervals [min, max]
Adapters: Convert external API types (GridRange, A1) to Rectangle

Layer	Format	Example: Rows 0-9
Core (implementations)	Rectangle (closed)	`[0, 0, 9, 9]`
Adapter (GridRange)	GridRange (half-open)	`{startRow: 0, endRow: 10}`
Adapter (A1)	A1 notation	`"A1:J10"`

Conversion: GridRange ⟷ Rectangle via ±1 on end coordinates. undefined → ±∞.

Operations

Intersection: A ∩ B ≠ ∅

!(a.xmax < b.xmin || b.xmax < a.xmin || a.ymax < b.ymin || b.ymax < a.ymin);

Subtraction: A \ B → ≤4 fragments (cuts rectangle A around B)

Visual example (rectangle A minus overlapping B):

Before:        After decomposition:
┌─────────┐    ┌─────────┐  ← Top fragment
│    A    │    │    A    │
│         │    ├───┬───┬─┤
│    ┌──┐ │    │ A │ B │A│  ← Left, Center (B wins), Right
│    │ B│ │    └───┴───┴─┘
│    └──┘ │    │    A    │  ← Bottom fragment
└─────────┘    └─────────┘

Result: A \ B produces 4 disjoint fragments (Top, Bottom, Left, Right)

Inclusive intervals [min, max]:

Top: [a.xmin, a.ymin, a.xmax, b.ymin-1]
Bottom: [a.xmin, b.ymax+1, a.xmax, a.ymax]
Left: [a.xmin, max(a.ymin,b.ymin), b.xmin-1, min(a.ymax,b.ymax)]
Right: [b.xmax+1, max(a.ymin,b.ymin), a.xmax, min(a.ymax,b.ymax)]

Half-open intervals [min, max): No ±1 adjustments needed (end already exclusive).

Testing

Axioms (must hold ∀ implementations):

Non-duplication: No identical (bounds, value) pairs
Last-writer-wins: Insertion order matters for overlaps
Disjointness: No overlapping rectangles (maintained after each operation)
Correctness: Property-based random testing

References

de Berg, M., Cheong, O., van Kreveld, M., & Overmars, M. (2008). Computational Geometry: Algorithms and Applications (3rd ed.). Springer-Verlag. ISBN: 978-3-540-77973-5. (Rectangle decomposition algorithms)
ISO 19107:2019 Geographic Information — Spatial Schema. International Organization for Standardization. (AABB representation standards)
Samet, H. (1990). The Design and Analysis of Spatial Data Structures. Addison-Wesley. ISBN: 978-0-201-50255-9. (Comprehensive survey of spatial indexing methods)
Guttman, A. (1984). "R-trees: A Dynamic Index Structure For Spatial Searching". Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, pp. 47-57. doi:10.1145/602259.602266 (Hierarchical spatial indexing - for comparison)
Shapiro, M., Preguiça, N., Baquero, C., & Zawirski, M. (2011). "Conflict-Free Replicated Data Types". In Stabilization, Safety, and Security of Distributed Systems, Lecture Notes in Computer Science vol 6976, pp. 386-400. Springer. doi:10.1007/978-3-642-24550-3_29 (Last-Writer-Wins conflict resolution semantics)

See Also:

RESEARCH-SUMMARY.md - Executive summary and production recommendations
PRODUCTION-GUIDE.md - Implementation selection guide
BENCHMARKS.md - Empirical performance data