Transition Zone Analysis: O(n) → O(log n) Crossover Points

Historical Note: This analysis documents crossover point experiments using the 23-scenario benchmark matrix. Absolute timings reflect experimental conditions at the time. Core findings remain valid: crossover varies by workload (n=100-600), with overlap patterns significantly affecting the transition point. For current measurements, see BENCHMARKS.md.

Finding: Crossover points where R-tree overtakes linear scan vary by workload and overlap pattern, ranging from n=100 to n=600.

Result

Workload Pattern Crossover n Linear Scan Advantage R-tree Advantage
Write-heavy Sequential 200 n < 200 n > 200
Write-heavy Grid 100 n < 100 n > 100
Write-heavy Overlapping 600 n < 600 n > 600
Read-heavy Sequential 100 n < 100 n > 100
Read-heavy Grid 100 n < 100 n > 100
Read-heavy Overlapping 500 n < 500 n > 500
Mixed (80/20) Sequential 100 n < 100 n > 100
Mixed (80/20) Grid 100 n < 100 n > 100
Mixed (80/20) Overlapping 600 n < 600 n > 600

Impact: Replaced vague "100-1000 is workload-dependent" with concrete decision thresholds. Enables data-driven implementation selection.


Key Insights

1. Read-Heavy Favor R-trees Sooner

Crossover at n=100 for most read-heavy scenarios. R-tree query pruning (O(log n)) beats linear O(n).

Example at n=100: Linear 207µs vs R*-tree 65µs (3.2x)

2. Overlapping Delays Crossover

Crossover much later (n=500-600). Rectangle decomposition dominates cost regardless of storage.

Write-heavy: Linear wins at n=100 (97µs vs 326µs), loses at n=600 (2.5ms vs 2.0ms)

3. Sequential Shows Fastest Divergence

Minimal overlap = pure storage cost dominates.


Practical Recommendations

For Spreadsheet Properties (Real Use Case)

Typical scenario: n < 100 per property (backgrounds, borders, etc.)

Recommendation: Use MortonLinearScanImpl exclusively

Rationale: All crossover points occur at n ≥ 100, so sparse data always favors linear scan with spatial locality.

For Consolidated or Heavy Usage

Scenario: Single index with many ranges (n > 100)

Decision matrix:

Your Workload Your Pattern Choose
Mostly reads Any RStarTreeImpl
Mostly writes Low overlap RStarTreeImpl (n > 200)
Mostly writes High overlap MortonLinearScan (n < 600)
Mixed (read + write) Low overlap RStarTreeImpl (n > 100)
Mixed (read + write) High overlap Context-dependent

Methodology

Data sizes: n = 100, 200, 300, ..., 1000 (10 points)

Patterns:

Workloads:

Implementations tested:

Historical note: Original analysis used OptimizedLinearScanImpl and ArrayBufferRTreeImpl. MortonLinearScan is 25% faster than the original OptimizedLinearScan.

Statistical quality: Single-run benchmarks with stable p75/p99 values (CV% < 10%)


Validation

Success Criteria

✅ All 90 scenarios (10 sizes × 3 patterns × 3 workloads) ✅ Clear crossover points per workload/pattern ✅ Low variance (p75/p99 within 1.5x) ✅ Quantified "workload-dependent" zone

Limitations

Single-run data (clear trends, low variance). Discrete sampling every 100. Implementation-specific to MortonLinearScan vs R*-tree.


Comparison to Prior Research

Before This Analysis

Data Size Recommendation
n < 100 MortonLinearScan
100 < n < 1000 "Workload-dependent"
n > 1000 RStarTreeImpl

After This Analysis

Data Size Workload Pattern Recommendation
n < 100 All All MortonLinearScanImpl
n > 100 Read-heavy All RStarTreeImpl
n > 200 Write-heavy Sequential RStarTreeImpl
n > 600 Write-heavy Overlapping RStarTreeImpl
n > 1000 All All RStarTreeImpl

Impact: Transition zone refined from 900-value range to specific thresholds per scenario.


References


Conclusion: The O(n) vs O(log n) crossover depends on workload and overlap. For the primary use case (sparse spreadsheet properties with n < 100), linear scan wins universally. For consolidated indices, choose based on workload: read-heavy favors R-tree at n > 100, write-heavy with high overlap favors linear scan until n > 600.