Theoretical Foundation

Mathematical model and correctness proofs.

Glossary

Problem

Given: Sequence of 2D range insertions with values
Output: Minimal non-overlapping range decomposition
Constraint: Last writer wins

Mathematical Model

Domain: Let S = β„• Γ— β„• (2D grid of natural numbers: rows Γ— columns)
Values: Let V = any value type (strings, objects, etc.)

State: Maintain function f: S β†’ V βˆͺ {βŠ₯} where:

Operations:

Minimality: No two returned rectangles can be merged without violating value consistency.

Algorithms

Two distinct algorithms, each with multiple implementation strategies:

Algorithm 1: Linear Scan with Rectangle Decomposition

Implementations: See packages/@jim/spandex/src/index/ for current active implementations.

INSERT(R, v):
  1. Find overlapping rectangles O (linear scan through flat array)
  2. Remove O from storage
  3. For each r ∈ O: compute fragments r \ R (set subtraction)
  4. Insert R and valid fragments into flat array

Complexity: O(n) insert, O(nΒ²) for n sequential inserts

Space: O(n) entries (β‰ˆ4n worst case)

Best for: Sparse data (n < 100)

Optimizations: Spatial locality (2x faster), compact storage, TypedArrays

Algorithm 2: Hierarchical R*-tree (O(log n))

INSERT(R, v):
  1. Traverse tree to find overlapping rectangles O (O(log n) average)
  2. For each r ∈ O: compute fragments r \ R (set subtraction)
  3. Insert R and fragments into tree with R* node splitting

R* Split (Beckmann et al., 1990): Choose axis minimizing perimeter sum, choose split minimizing overlap. O(m log m) per split (m=10).

Space: O(n) Best for: Large datasets (n β‰₯ 1000)


R-Tree Insert Complexity

Operation: insert(R_new, v) requires finding and modifying ALL overlapping entries (not just one).

Steps:

  1. Find overlaps: O(k Γ— log n) worst case, O(log n) average (k = overlap count)
  2. Decompose: O(k) - each produces ≀4 fragments
  3. Reinsert: (4k + 1) Γ— O(log n) = O(k Γ— log n)

Overall:

Empirical: Adversarial tests show k β‰ˆ 2.3 average (sublinear growth), not Θ(n).


Correctness Proof

Theorem: Rectangle decomposition maintains valid spatial partition with LWW semantics.

Invariants:

  1. Disjoint: βˆ€ rα΅’, rβ±Ό ∈ R, i β‰  j ⟹ rα΅’ ∩ rβ±Ό = βˆ…
  2. LWW: βˆ€ cell c, value(c) from most recent insert
  3. Coverage: βˆ€ inserted cell c, βˆƒ r ∈ R: c ∈ r
  4. Minimality: No adjacent rectangles with same value

Proof by Induction:

Base case: Empty index trivially satisfies all invariants.

Inductive step: Assume invariants hold before insert(R_new, v_new). Prove they hold after.

Steps:

  1. Find overlaps O =
  2. Remove O: R' = R \ O (preserves disjointness)
  3. Decompose: fragments(r) = r \ R_new for each r ∈ O (≀4 per r)
  4. Insert: R'' = R' βˆͺ {R_new} βˆͺ ⋃fragments(r)

Invariant preservation:

Cumulative fragmentation:

∎


Correctness of Spatial Locality Optimization

Theorem: Space-filling curve ordering preserves all invariants.

Proof: Sorting affects STORAGE ORDER only, not geometric decomposition. Same fragments generated, just stored in different order.

∎


Complexity Analysis

Linear Scan Implementations

Operation Time Space Notes
Insert O(n) O(1) n = existing rectangles
Query (all) O(n) O(n) Return all rectangles
Query (region) O(n) O(k) k = matching rectangles
n Inserts O(nΒ²) O(4n) Worst case: each splits all

R-tree Implementation

Operation Time Space Notes
Insert O(log n) O(log n) Average case tree traversal
Query (all) O(n) O(n) Full tree walk
Query (region) O(log n + k) O(k) k = matching rectangles
n Inserts O(n log n) O(n) Balanced tree structure

Fragmentation Worst-Case

Pathological pattern:

Insert rectangle that overlaps ALL existing rectangles
Each overlap creates up to 4 fragments
Repeat n times

Analysis:

Theoretical worst-case: O(4^n) - EXPONENTIAL (but geometrically impossible)

Why geometrically impossible:

  1. Bounded area constraint: Total grid area is finite β†’ fragments bounded by minimum rectangle size
  2. Disjoint requirement: No overlaps allowed β†’ total fragment area ≀ grid area
  3. Minimum fragment size: Each fragment covers β‰₯1 cell β†’ max fragments ≀ total cells

Practical bounds:


Geometric Bound (Formal Proof)

Theorem: After n insertions into bounded grid (area A), max rectangles R_max ≀ A / A_min.

Proof:

  1. Grid area = A (finite)
  2. Rectangles disjoint (proven above)
  3. Each rectangle area β‰₯ A_min
  4. R Γ— A_min ≀ A
  5. Therefore: R ≀ A / A_min

Implication: Exponential O(4^n) geometrically impossible.

Example: 10^6 cell grid β†’ max 10^6 rectangles, not 4^100 β‰ˆ 10^60

Empirical: 100 pathological inserts β†’ 232 ranges (2.3x), not 4^100

Conclusion: Practical O(n) rectangles after n inserts.

∎


Geometry

Rectangle Format

Standard: [xmin, ymin, xmax, ymax] per ISO 19107:2019 (x=columns, y=rows)

Layered Architecture:

Layer Format Example: Rows 0-9
Core (implementations) Rectangle (closed) [0, 0, 9, 9]
Adapter (GridRange) GridRange (half-open) {startRow: 0, endRow: 10}
Adapter (A1) A1 notation "A1:J10"

Conversion: GridRange ⟷ Rectangle via Β±1 on end coordinates. undefined β†’ ±∞.

Operations

Intersection: A ∩ B β‰  βˆ…

!(a.xmax < b.xmin || b.xmax < a.xmin || a.ymax < b.ymin || b.ymax < a.ymin);

Subtraction: A \ B β†’ ≀4 fragments (cuts rectangle A around B)

Visual example (rectangle A minus overlapping B):

Before:        After decomposition:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  ← Top fragment
β”‚    A    β”‚    β”‚    A    β”‚
β”‚         β”‚    β”œβ”€β”€β”€β”¬β”€β”€β”€β”¬β”€β”€
β”‚    β”Œβ”€β”€β” β”‚    β”‚ A β”‚ B β”‚Aβ”‚  ← Left, Center (B wins), Right
β”‚    β”‚ Bβ”‚ β”‚    β””β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”˜
β”‚    β””β”€β”€β”˜ β”‚    β”‚    A    β”‚  ← Bottom fragment
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Result: A \ B produces 4 disjoint fragments (Top, Bottom, Left, Right)

Inclusive intervals [min, max]:

Half-open intervals [min, max): No Β±1 adjustments needed (end already exclusive).

Testing

Axioms (must hold βˆ€ implementations):

  1. Non-duplication: No identical (bounds, value) pairs
  2. Last-writer-wins: Insertion order matters for overlaps
  3. Disjointness: No overlapping rectangles (maintained after each operation)
  4. Correctness: Property-based random testing

References

  1. de Berg, M., Cheong, O., van Kreveld, M., & Overmars, M. (2008). Computational Geometry: Algorithms and Applications (3rd ed.). Springer-Verlag. ISBN: 978-3-540-77973-5. (Rectangle decomposition algorithms)
  2. ISO 19107:2019 Geographic Information β€” Spatial Schema. International Organization for Standardization. (AABB representation standards)
  3. Samet, H. (1990). The Design and Analysis of Spatial Data Structures. Addison-Wesley. ISBN: 978-0-201-50255-9. (Comprehensive survey of spatial indexing methods)
  4. Guttman, A. (1984). "R-trees: A Dynamic Index Structure For Spatial Searching". Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, pp. 47-57. doi:10.1145/602259.602266 (Hierarchical spatial indexing - for comparison)
  5. Shapiro, M., PreguiΓ§a, N., Baquero, C., & Zawirski, M. (2011). "Conflict-Free Replicated Data Types". In Stabilization, Safety, and Security of Distributed Systems, Lecture Notes in Computer Science vol 6976, pp. 386-400. Springer. doi:10.1007/978-3-642-24550-3_29 (Last-Writer-Wins conflict resolution semantics)

See Also: