Design Decision: Rectangle Decomposition
Finding: Rectangle decomposition with LWW semantics is the optimal approach for spreadsheet property storage.
Result
Chosen Algorithm: Maintain non-overlapping rectangles via geometric set difference (A \ B → ≤4 fragments)
| Approach | Match to Problem | Complexity | Status |
|---|---|---|---|
| Rectangle decomp | Perfect | Simple | ✅ Chosen |
| Quadtree | Poor | Complex | ❌ Doesn't fit API |
| Interval tree | Poor | Complex | ❌ Point queries |
| Sparse 2D array | Poor | Memory | ❌ Space inefficient |
| Vector-based (SIMD) | N/A | N/A | ❌ Not GAS-compatible |
Impact: Validated core algorithm choice; alternatives don't fit spreadsheet API pattern.
Why rectangle decomposition works:
- Directly produces
GridRange[]output (API requirement) - LWW semantics via overlap resolution (business logic)
- Efficient for both sparse and dense data (with appropriate storage)
Storage strategies (see sparse-data-analysis.md):
- Linear scan (O(n)): Optimal for sparse data (n < 100)
- R-tree (O(log n)): Optimal for large data (n > 1000)
See also: theoretical-foundation.md
❌ Alternatives Considered (And Why They Didn't Work)
1. Quadtree / Region Quadtree
Concept: Recursively subdivide 2D space into quadrants
┌─────┬─────┐
│ A │ B │
├─────┼─────┤
│ C │ D │
└─────┴─────┘
Pros:
- O(log n) point queries
- Good for sparse data
- Spatial locality
Cons:
- ❌ "Get all ranges" requires full tree walk + reconstruction into rectangles
- ❌ Fragmentation at quad boundaries (not natural rectangle boundaries)
- ❌ Output format mismatch (quadrants ≠ GridRange)
Verdict: Doesn't fit. API expects rectangles, not quadrants.
2. Interval Tree (2D Segment Tree)
Concept: Two-level interval tree (rows × columns)
Pros:
- O(log² n + k) range queries
- Handles overlaps
Cons:
- ❌ Still O(n) for "get all ranges"
- ❌ Doesn't maintain non-overlapping invariant
- ❌ Output still needs decomposition step
- ❌ Complex construction (O(n log n))
Verdict: Adds complexity without solving the core problem.
3. Sparse 2D Array / HashMap
Concept: Store per-cell values in Map<(row,col), T>
Pros:
- O(1) point lookup
- Simple implementation
Cons:
- ❌ O(rows × cols) space for large ranges
- ❌ "Get all ranges" requires scanning ALL cells + grouping (expensive!)
- ❌ Large formatting range (e.g., entire column) = millions of entries
Example: Format column A (rows 0-1M) = 1 million HashMap entries vs 1 rectangle
Verdict: Unacceptable space complexity.
4. Vector-Based (SIMD)
Concept: Use SIMD operations for overlap detection
Pros:
- Potential speedup on overlap checks
- Modern hardware utilization
Cons:
- ❌ Still requires O(n) scan
- ❌ JavaScript/TypeScript has limited SIMD support
- ❌ Fragment generation doesn't parallelize well
- ❌ Modern V8 already optimizes array iteration
Verdict: Marginal gains, not worth the complexity.
5. Skip List / B-tree
Concept: 1D skip list or B-tree per row, then per column
Pros:
- O(log n) lookup within a row
Cons:
- ❌ 2D queries still require scanning multiple rows
- ❌ Doesn't maintain non-overlapping invariant
- ❌ Complex to maintain consistency across dimensions
Verdict: Designed for 1D data, awkward for 2D rectangles.
6. Other R-tree Variants (R*, Hilbert R-tree, STR)
Concept: Improved R-tree variants with better split heuristics or bulk loading
Variants:
- R*-tree: Optimized split algorithm (minimizes overlap + area + perimeter)
- Hilbert R-tree: Uses space-filling curves for better spatial locality
- STR-tree: Sort-Tile-Recursive bulk loading for static datasets
These variants provide better R-tree performance, but they're still O(log n) with hierarchical overhead. For n < 100, flat storage wins regardless. Worth exploring if extending to very large datasets (n > 10,000).
Note: My R-tree uses a simplified splitting strategy. Production systems targeting n > 10,000 should consider R*-tree or Hilbert R-tree.
Why Rectangle Decomposition Wins
Key Insight: API Shape Determines Data Structure
Google Sheets API signature:
applyFormat(ranges: GridRange[], format: Format): void
getAllFormattedRanges(): Array<{gridRange: GridRange, format: Format}>
API requirements:
- Output must be rectangles (not points or quadrants)
- Output must be minimal (users don't want 1000 tiny ranges)
- Output must be non-overlapping (API doesn't support precedence)
Comparison Matrix
| Approach | Insert | Get All | Point Query | Space | Output Format |
|---|---|---|---|---|---|
| Flat Array | O(n) | O(n) | O(n) | O(n) | ✅ GridRange[] |
| R-tree | O(log n) | O(n) | O(log n) | O(n log n) | ✅ GridRange[] |
| Quadtree | O(log n) | O(n) | O(log n) | O(n log n) | ❌ Needs conversion |
| Interval Tree | O(log²n) | O(n) | O(log²n) | O(n log n) | ❌ Needs decomposition |
| HashMap | O(r×c) | O(r×c) | O(1) | O(r×c) | ❌ Needs grouping |
Primary operation: query() (all ranges) happens on every render, export, serialize, undo/redo
Secondary operation: insert() happens during user edits
Rare operation: Point queries (handled with linear scan, fast enough for n < 100)
Real-World Validation
From sparse-data-analysis.md:
- Typical spreadsheet: n < 100 ranges per property
- Flat array: 0.01ms for n=100 (cache-friendly, V8-optimized)
- R-tree overhead only justified when n > 1000
When Would Alternatives Win?
| Use Case | Better Choice | Why |
|---|---|---|
| Point queries dominant | Quadtree | O(log n) point lookup vs O(n) scan |
| Millions of tiny ranges | Spatial hash | Grid-based partitioning |
| 3D+ dimensions | KD-Tree, Octree | Generalizes to higher dimensions |
| Dense, uniform grid | 2D Array | Direct indexing |
But for spreadsheet property storage: Rectangle decomposition is optimal.
Conclusion
Rectangle decomposition (flat + R-tree) is the right choice because:
- ✅ Matches API requirements - Query returns ranges directly
- ✅ Optimal for common case -
query()(all ranges) is O(n), can't do better - ✅ Simple mental model - Easy to reason about and maintain
- ✅ Performance validated - BENCHMARKS.md for empirical data
- ✅ Flexible - Flat for sparse (n < 100), R-tree for dense (n > 1000)
The key insight: Let the API shape determine your data structure. When the API expects rectangles, store rectangles.
See also:
- sparse-data-analysis.md - Why flat storage wins for typical spreadsheet usage
- theoretical-foundation.md - Mathematical foundation and complexity analysis
- BENCHMARKS.md - Empirical performance measurements