What breaks if we switch from REST to gRPC for all internal services?
Decision
Adopt a tiered migration strategy rather than switching all services to gRPC simultaneously. Classify internal services into three tiers:
- Class A (performance-critical): migrate to gRPC within 6 months.
- Class B (moderate-performance): run hybrid REST/gRPC.
- Class C (integration-heavy): remain on REST for 12+ months.
Rationale: a blanket REST-to-gRPC migration breaks several things at once. It breaks browser client compatibility, eliminates HTTP caching infrastructure, disrupts debugging workflows (curl, Postman, browser DevTools), adds protobuf schema-management overhead, and forces team reskilling across every service at once. Tiered classification isolates these breakage points into manageable batches while capturing performance gains where they matter most: services requiring <50ms response time.
Key failure modes:
- Inconsistent service boundaries, increasing cognitive load for developers who must maintain both communication patterns.
- Premature optimization of low-traffic services, consuming resources better spent on actual performance bottlenecks.
- Misclassification leading to the wrong protocol choice, e.g., a Class C service that actually has latency-sensitive internal callers.
Thresholds: response time <50ms for Class A services; Class A migration within 6 months; Class C remains on REST for 12+ months.
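The misclassification failure mode can be checked mechanically once tiers and call relationships are known. A minimal sketch, assuming a hypothetical `tiers` map and a `callers` map (callee to calling services) built from tracing or the service inventory; the rule "flag any Class C service with a Class A caller" is an illustrative heuristic, not part of the decision.
```python
# Hypothetical sketch: service names, tiers, and the call graph below are
# illustrative assumptions, not data from the analysis.
from typing import Dict, List, Set


def flag_misclassified(
    tiers: Dict[str, str],          # service -> "A" | "B" | "C"
    callers: Dict[str, List[str]],  # service -> services that call it
) -> Set[str]:
    """Return Class C services that have at least one Class A caller.

    A Class A service targets <50ms responses; if it synchronously calls a
    Class C (REST, integration-heavy) service, that dependency sits on the
    latency-critical path, so the Class C label deserves a second look.
    """
    flagged: Set[str] = set()
    for service, tier in tiers.items():
        if tier != "C":
            continue
        if any(tiers.get(caller) == "A" for caller in callers.get(service, [])):
            flagged.add(service)
    return flagged


if __name__ == "__main__":
    tiers = {"checkout": "A", "pricing": "C", "reports": "C", "search": "B"}
    callers = {"pricing": ["checkout", "reports"], "reports": ["search"]}
    print(flag_misclassified(tiers, callers))  # {'pricing'}
```
A flagged service is not automatically wrong; it is a prompt to confirm whether that call really sits on the Class A latency path.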
Next actions
What usually goes wrong
- Risk assessment focused on known threats and missed novel vectors
- Compliance checkbox passed but operational security remained weak
- Low-probability high-impact scenario treated as negligible
Council notes
Attack grid
Survival rate shows how the recommendation holds under stress scenarios. Low scores indicate conditional vulnerability, not a flaw in the recommendation.
Scenario detail (8)
Evidence boundary
Observed from your filing
- What breaks if we switch from REST to gRPC for all internal services?
Assumptions used for analysis
- The organization runs a microservices architecture with multiple internal services communicating synchronously over REST today
- There exist measurable performance differences between REST/JSON and gRPC/protobuf for the organization's actual payload sizes and call patterns
- The engineering team has capacity to maintain two communication paradigms simultaneously during a multi-month transition
- Service classification into A/B/C tiers can be done objectively based on measurable metrics rather than political negotiation
- The organization's infrastructure (load balancers, service mesh, API gateways, observability stack) can support gRPC — specifically HTTP/2 end-to-end
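The last assumption is easy to violate silently: gRPC runs over HTTP/2, so a single hop that terminates HTTP/2 and forwards HTTP/1.1 breaks calls even when client and server both speak gRPC. A hypothetical sketch that walks a request path and reports the first offending hop; the `Hop` fields and hop names are illustrative assumptions, not an inventory of this organization's infrastructure.
```python
# Hypothetical sketch: hop names and protocol labels are illustrative; real
# values would come from load balancer, gateway, and service mesh config.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    name: str
    inbound: str   # protocol accepted from the previous hop, e.g. "h2"
    outbound: str  # protocol spoken to the next hop


def first_http2_break(path: List[Hop]) -> Optional[str]:
    """Return the first hop that does not speak HTTP/2 on both sides, or None.

    gRPC runs over HTTP/2, so a hop that accepts or forwards HTTP/1.1
    anywhere between client and server breaks the call.
    """
    for hop in path:
        if hop.inbound != "h2" or hop.outbound != "h2":
            return hop.name
    return None


if __name__ == "__main__":
    path = [
        Hop("edge-lb", inbound="h2", outbound="h2"),
        Hop("api-gateway", inbound="h2", outbound="http/1.1"),
        Hop("mesh-sidecar", inbound="http/1.1", outbound="h2"),
    ]
    print(first_http2_break(path))  # api-gateway
```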
Inferred candidate specifics
- Chosen path: the tiered Class A/B/C migration strategy stated under Decision (same wording, failure modes, and thresholds).
- Build a service inventory spreadsheet cataloging every internal service with its current RPS, p99 latency, number of consumer services, external integration dependencies, and any debugging or caching dependencies on REST semantics. Use it to classify each service as Class A, B, or C under the tiered criteria, then draft a concrete migration sequence.
- b002 won by default as the only surviving implementation branch with a concrete recommendation. Its confidence (0.75) exceeded b006's (0.40), and b006 was structurally disqualified as a reframe that provided zero specific breakage points. However, b002's defense quality was low (0.40), and it was correctly criticized for not directly answering 'what breaks.' The killed b003 had stronger technical specifics and higher original confidence but was structurally disqualified in round 4 for providing a migration plan instead of a breakage analysis. The moderate confidence score reflects this tension: the strongest technical branch was killed, while the survivor is adequate but underspecified.
- b006: Optimize REST with HTTP/3, compression, and JSON Schema instead of migrating to gRPC
  - Defense quality 0.10. Does not answer the question asked: it provides zero specific breakage points from a gRPC switch. It dismisses gRPC as 'hype' without engaging its documented advantages for internal communication (strong typing, streaming, code generation). Even as a reframe, it lacks substantiation: it offers no benchmarks comparing REST+HTTP/3 with gRPC for internal service communication.
- b003 (killed): Phased migration using Envoy sidecar proxies with gRPC-JSON transcoding
  - Killed in round 4 because it provided a migration plan without first establishing what breaks. It had the strongest technical specifics (p99 serialization dropping from ~12ms with JSON to ~2ms with protobuf, Envoy transcoding adding ~1-2ms per hop, Buf Schema Registry), and its failure modes (transcoding latency accumulating over 6+ hops, proto schema breaking changes causing silent data corruption) were more specific than the winner's. It was nonetheless structurally disqualified for not answering the diagnostic question.
- b001 (killed): Simple hybrid, with REST for low-priority services and gRPC for high-demand ones
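The inventory-then-classify next action can be sketched in Python. This is a minimal illustration, not a prescribed tool: the CSV column names, the sample rows, the added `latency_budget_ms` column (the required response time, which the inventory list does not name explicitly), the rule order (integration-heavy checked before latency), and busiest-first sequencing are all assumptions.
```python
# Hypothetical sketch: columns mirror the inventory fields above plus an assumed
# latency_budget_ms; rule order and RPS-based sequencing are illustrative only.
import csv
import io
from typing import List, Tuple

CLASS_A_BUDGET_MS = 50  # Class A threshold from the decision (<50ms)


def classify(row: dict) -> str:
    # Integration-heavy services (external integrations, or reliance on HTTP
    # caching / REST debugging semantics) stay on REST: Class C.
    if int(row["external_integrations"]) > 0 or row["rest_semantics_dependency"] == "yes":
        return "C"
    # Services that must answer in under 50ms are performance-critical: Class A.
    if float(row["latency_budget_ms"]) < CLASS_A_BUDGET_MS:
        return "A"
    # Everything else runs hybrid REST/gRPC: Class B.
    return "B"


def migration_plan(inventory_csv: str) -> List[Tuple[str, str, int]]:
    """Return (service, class, rps) sorted by class, then busiest first."""
    rows = csv.DictReader(io.StringIO(inventory_csv))
    plan = [(r["service"], classify(r), int(r["rps"])) for r in rows]
    return sorted(plan, key=lambda item: (item[1], -item[2]))


# Illustrative rows only, not measurements.
SAMPLE = """service,rps,p99_ms,consumers,external_integrations,rest_semantics_dependency,latency_budget_ms
checkout,4200,38,6,0,no,40
pricing,9000,22,11,0,no,30
partner-webhooks,300,120,2,3,yes,500
reports,50,900,1,0,no,2000
"""

if __name__ == "__main__":
    for service, tier, rps in migration_plan(SAMPLE):
        print(tier, service, rps)
```
Checking Class C first keeps externally consumed services on REST even when they are fast; whether that precedence is right is exactly the kind of call the misclassification failure mode warns about.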
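b003's "proto schema breaking changes causing silent data corruption" failure mode usually comes from reusing a field number: protobuf's binary encoding identifies fields by number and wire type, not by name, so old bytes can decode into a new field without any error. A hypothetical sketch that models each schema version as a plain `{field_number: (name, type)}` dict instead of parsing `.proto` files; a production pipeline would rely on a dedicated checker such as the Buf tooling b003 mentioned. Renames are flagged too because, although wire-compatible, they change the JSON field names that gRPC-JSON transcoding exposes.
```python
# Hypothetical sketch: schemas are simplified dicts, not parsed .proto files.
from typing import Dict, List, Tuple

Schema = Dict[int, Tuple[str, str]]  # field number -> (field name, proto type)


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List field numbers whose name or type changed between versions.

    A type change under the same number can misdecode old data silently
    (e.g. int64 -> int32 share the varint wire type); a name change keeps
    the binary format but breaks JSON-mapped clients.
    """
    problems = []
    for number, (old_name, old_type) in sorted(old.items()):
        if number not in new:
            continue  # removed field; mark it `reserved` so it is never reused
        new_name, new_type = new[number]
        if (old_name, old_type) != (new_name, new_type):
            problems.append(
                f"field {number}: {old_name} ({old_type}) -> {new_name} ({new_type})"
            )
    return problems


if __name__ == "__main__":
    v1 = {1: ("user_id", "int64"), 2: ("email", "string")}
    v2 = {1: ("item_count", "int32"), 2: ("email", "string"), 3: ("locale", "string")}
    for problem in breaking_changes(v1, v2):
        print(problem)  # field 1: user_id (int64) -> item_count (int32)
```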
Inferred specifics table
| Value | Kind | Basis | Where introduced |
|---|---|---|---|
| Class A → gRPC within 6 months | estimate | synthetic | chosen_path |
| Class C → remain REST for 12+ months | estimate | synthetic | chosen_path |
| <50ms response time for Class A services | threshold | synthetic | chosen_path |
| 0.75 (b002 confidence) | estimate | synthetic | selection_rationale |
| 0.40 (b006 confidence; b002 defense quality) | estimate | synthetic | selection_rationale |
| b006: Optimize REST with HTTP/3 | estimate | synthetic | rejected_alternatives.path |
| b006 defense quality 0.10 | estimate | heuristic | rejected_alternatives.rationale |
| No benchmarks comparing REST+HTTP/3 vs gRPC for internal service communication | estimate | heuristic | rejected_alternatives.rationale |
| b003 killed in round 4 (migration plan without a breakage analysis) | estimate | synthetic | rejected_alternatives.rationale |
| p99 serialization ~12ms (JSON) → ~2ms (protobuf) | threshold | synthetic | rejected_alternatives.rationale |
| Transcoding latency accumulation over 6+ hops | threshold | synthetic | rejected_alternatives.rationale |
| … in round 3 for assuming clean categorization | estimate | synthetic | rejected_alternatives.rationale |
| … in round 2 for answering a fundamentally … | estimate | synthetic | rejected_alternatives.rationale |
| p99 ~200ms at 100K msg/s | threshold | synthetic | rejected_alternatives.rationale |
| … in round 2 as impractical for retrofitting | estimate | synthetic | rejected_alternatives.rationale |
| Benchmarking reveals REST+HTTP/3 with compression closes the … | estimate | heuristic | reversal_conditions |
| … to within 10% of gRPC for the … | threshold | heuristic | reversal_conditions |
| fewer than 3 services meeting Class A | estimate | synthetic | reversal_conditions |
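The "transcoding latency accumulation over 6+ hops" row becomes concrete when b003's ~1-2 ms/hop Envoy figure is set against the <50ms Class A budget: six hops add roughly 6-12 ms, or 12-24% of the budget. A back-of-envelope sketch that simply adds the per-hop cost; p99 latencies do not add linearly in practice, so treat the result as a rough bound under that assumption, not a prediction.
```python
# Back-of-envelope sketch: assumes b003's ~1-2 ms/hop transcoding figure and
# linear accumulation per hop, which real p99 latencies do not strictly follow.
from typing import Tuple

CLASS_A_BUDGET_MS = 50.0  # Class A threshold from the decision


def transcoding_share(
    hops: int, per_hop_ms: Tuple[float, float] = (1.0, 2.0)
) -> Tuple[float, float]:
    """Return (low, high) fraction of the Class A budget spent on transcoding."""
    low, high = per_hop_ms
    return hops * low / CLASS_A_BUDGET_MS, hops * high / CLASS_A_BUDGET_MS


if __name__ == "__main__":
    low, high = transcoding_share(6)
    print(f"6 hops: {low:.0%} to {high:.0%} of the 50ms budget")  # 12% to 24%
```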
Unknowns blocking a firmer verdict
- The winning branch (b002) was critiqued for not directly inventorying what breaks — it focuses on migration strategy rather than a comprehensive breakage catalog. The killed b003 branch had significantly more specific technical failure modes (proto schema corruption, transcoding latency accumulation) that the winner lacks.
- No branch provided a complete 'what breaks' inventory covering all dimensions: load balancer reconfiguration, observability pipeline changes, testing tool replacement, CI/CD pipeline modifications, service mesh compatibility, and team skill gaps.
- The <50ms threshold for Class A services and the 6/12-month timelines are synthetic — no branch grounded these numbers in measured system data or named engineering heuristics.
- The verdict rests largely on model reasoning: the 3 evidence items (quality mean = 1.00) all mapped to b003, which was killed, so the surviving winner has no external evidence support.
- REST+HTTP/3 optimization (b006's point) was not seriously evaluated against gRPC for internal services — this remains a legitimate unexplored alternative that could change the recommendation if benchmarked.
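The two reversal conditions in the specifics table can be turned into an explicit check once benchmark numbers exist. Both source conditions are truncated, so this hypothetical sketch assumes "closes the … to within 10% of gRPC" means REST+HTTP/3 p99 latency is at most 10% above gRPC p99 on the same workload, and that the Class A count comes from the service inventory; the sample numbers are illustrative, not benchmark results.
```python
# Hypothetical sketch: the interpretation of both truncated reversal conditions
# is an assumption; inputs are illustrative, not measured data.
from typing import List

GAP_THRESHOLD = 0.10  # "to within 10% of gRPC"
MIN_CLASS_A = 3       # "fewer than 3 services meeting Class A"


def relative_gap(rest_p99_ms: float, grpc_p99_ms: float) -> float:
    """How much slower REST+HTTP/3 is than gRPC, as a fraction of gRPC p99."""
    return (rest_p99_ms - grpc_p99_ms) / grpc_p99_ms


def reversal_triggers(rest_p99_ms: float, grpc_p99_ms: float, class_a_count: int) -> List[str]:
    """Return the reversal conditions that currently hold."""
    triggers = []
    if relative_gap(rest_p99_ms, grpc_p99_ms) <= GAP_THRESHOLD:
        triggers.append("REST+HTTP/3 within 10% of gRPC")
    if class_a_count < MIN_CLASS_A:
        triggers.append("fewer than 3 Class A services")
    return triggers


if __name__ == "__main__":
    print(reversal_triggers(rest_p99_ms=21.0, grpc_p99_ms=20.0, class_a_count=5))
```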
Fragility signals
- Hubris: ANNOTATE
Operational signals to watch
Flip conditions
Branch battle map
Battle timeline (4 rounds)
Minority report
What if the opposite were true? What *improves* if we optimize REST with HTTP/3, compression, and JSON Schema instead of chasing gRPC? Both branches fixate on gRPC's hype while ignoring REST's maturity in caching, idempotency, and ecosystem tooling.
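The caching maturity the minority report credits to REST rests on mechanisms like ETag revalidation: a client or shared cache re-sends a GET with `If-None-Match`, and an unchanged resource costs a bodyless 304. gRPC calls are HTTP/2 POSTs, so shared HTTP caches generally do not cache them. A standard-library sketch of the server side, assuming a hypothetical JSON resource and handling only a single exact-match tag (real `If-None-Match` also allows tag lists and `*`).
```python
# Hypothetical sketch: the resource and handler are illustrative; only exact
# single-tag If-None-Match matching is handled.
import hashlib
import json
from typing import Dict, Optional, Tuple


def etag_for(body: bytes) -> str:
    # Strong ETag derived from a hash of the response bytes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(resource: Dict, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a conditional GET."""
    body = json.dumps(resource, sort_keys=True).encode()
    tag = etag_for(body)
    if if_none_match == tag:
        return 304, {"ETag": tag}, b""  # client copy is current; no body sent
    return 200, {"ETag": tag, "Cache-Control": "max-age=60"}, body


if __name__ == "__main__":
    item = {"sku": "A-100", "price": 1999}
    status, headers, _ = handle_get(item, None)
    print(status)  # 200
    status, _, body = handle_get(item, headers["ETag"])
    print(status, body)  # 304 b''
```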
Pre-mortem (3 scenarios)
Censor oversight
REOPEN SPAR
The winning decision (b003) provides a detailed migration plan but fails to directly address the original question 'what breaks'. It also assumes certain expertise and doesn't scope infrastructure coupling. Surviving branch b002 offers a more nuanced approach that was not selected despite higher confidence in some model outputs.
Structural issues
- SELECTION MISMATCH: b002 provides a reasonable classification framework and polyglot persistence approach, which is more nuanced than b003's blanket gRPC migration
- CONSULTING FOG: The winning decision describes a migration plan but doesn't directly address 'what breaks' when switching to gRPC