Why Lift-and-Shift Fails Quietly: Architectural Smells That Appear After Migration
The architectural debt that migrated workloads accumulate — and why it doesn't show up until you're in production, paying real bills, and fielding real complaints.
Introduction
Every cloud migration starts with a promise: “We’ll get onto cloud first, optimize later.” That sentence is where the trouble begins.
Lift-and-shift — rehosting an on-premises workload to cloud VMs or containers with minimal re-architecture — is not inherently wrong. The problem is that it leaves on-premises assumptions baked into a system that is now operating in a fundamentally different environment. The failure doesn’t arrive on day one. It arrives three months later, in a Slack alert at 2am, or in an invoice that made a VP ask uncomfortable questions.
This post is an honest accounting of the patterns I see repeatedly across lifted workloads. Not theoretical anti-patterns from a whitepaper — actual architectural smells that surface after migration, often slowly, and often expensively.
Table of Contents
- Introduction
- The Illusion of a Successful Migration
- 1. Latency Amplification
- 2. Chatty Services: The N+1 Problem at Infrastructure Scale
- 3. Cost Surprises: The Bill That Doesn’t Look Like the PoC
- 4. Stateful Assumptions: The Session State Time Bomb
- 5. The Observability Void: Flying Blind in a New Environment
- 6. The Monolith Wearing Microservice Clothing
- The Pre-Migration Architecture Review Checklist
- A Realistic Migration Philosophy
The Illusion of a Successful Migration
The migration checklist looks clean. The app is running. Your runbook said “verify the app responds on port 443 after cutover”—it does. The infrastructure team celebrates. Two weeks later, a senior engineer notices P95 latency has crept up from 80ms to 340ms. Nobody touched the code. Nothing changed. Or did it?
What changed is everything underneath: the network topology, the storage subsystem, the proximity of services to each other, the cost model, and the failure modes. The application code is the same. The environment it assumes is not.
The core trap: On-premises assumptions about network latency, storage I/O, and service co-location are almost always violated in cloud environments—and the application has no way to tell you.
The architectural smells described below all share this root cause. They don’t register as bugs because nothing broke. They register as drift—subtle, compounding, and expensive.
| Smell | When it surfaces | Who notices first |
|---|---|---|
| Latency amplification | Week 2–4 | End users, support tickets |
| Chatty services | Week 3–6 | On-call engineer, APM alert |
| Cost surprises | End of month 1 | Finance, FinOps |
| Stateful assumptions | First scale-out event | Angry users, random 401s |
| Observability void | First production incident | Everyone, at once |
| Monolith in disguise | First dependency failure | On-call, 2am |
1. Latency Amplification
This is the first smell that appears, and it is almost always misdiagnosed. Engineers see higher response times and assume the cloud hardware is slower. It is not. The hardware is often faster. The network is not.
On a physical LAN, a service call between two rack-mounted servers has sub-millisecond round-trip times. In a cloud VPC, even two services in the same availability zone incur a baseline overhead of 1–3ms per call. Cross-AZ jumps can be 5–15ms. Cross-region calls are 40–120ms depending on geography. These numbers seem trivial until you look at how a typical on-premises service was designed.
```text
On-premises:      40 calls × 0.1 ms avg =   4 ms network overhead
After migration:  40 calls × 4 ms avg   = 160 ms network overhead
```
Before your application runs a single line of business logic.
Same call graph. Same code. 40× the network overhead, purely from network topology.
This is not a contrived example. One e-commerce service I watched through a monolith-to-cloud migration was making 40 synchronous downstream calls per checkout request; aggregate request latency jumped from ~50ms to ~420ms after the move, without any code change. The call count didn't increase. The per-call latency did.
Why engineers miss this
Because latency in on-premises systems is treated as a constant. Engineers design call patterns assuming 0.1–0.5ms round trips and never test for higher values. They also rarely instrument at the individual call level. APM tools get configured after the incident, not before.
Diagnosing it
Pull distributed traces for your slowest P95 requests. Count the spans. If a request is producing more than 10–15 spans and they’re mostly synchronous, you have a latency budget problem.
```bash
# Quick span count check with OpenTelemetry + Jaeger
# For a given trace ID, count the spans in the trace:
curl -s "http://jaeger:16686/api/traces/{traceId}" \
  | jq '[.data[0].spans[] | .operationName] | length'
# If this number is > 20 for a single user-facing request,
# you have a chattiness problem worth investigating.
```
Mitigation
- Consolidate reads with batch APIs — single call, multiple entities
- Introduce async messaging (SNS/SQS, Azure Service Bus) for non-critical paths
- Add Redis/ElastiCache for hot reference data to eliminate repetitive downstream calls
- Enforce connection pooling at the application tier, not just the DB tier
- Audit your `HttpClient` or `fetch` usage for missing `keepAlive` / connection-reuse settings
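That last audit item is worth making concrete. A minimal sketch for Node.js, assuming outbound HTTPS calls through the standard `https` module (the socket limits here are illustrative, not recommendations):

```typescript
// Sketch: enable connection reuse for outbound HTTPS in Node.js.
// Without keep-alive, every downstream call pays a fresh TCP + TLS
// handshake on top of the cloud network round trip.
import https from 'node:https';

export const keepAliveAgent = new https.Agent({
  keepAlive: true,      // reuse sockets across requests
  maxSockets: 50,       // cap concurrent connections per origin (illustrative)
  maxFreeSockets: 10,   // idle sockets kept warm for reuse (illustrative)
});

// Pass the agent explicitly on each request, or install it as the global
// agent. Node's built-in default agent only enables keep-alive by default
// from v19 onward, so older runtimes silently re-handshake on every call.
```

Clients built on `fetch`/undici have an equivalent knob in undici's `Agent`; the point is the same either way: verify that sockets are actually being reused under load, don't assume it.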
2. Chatty Services: The N+1 Problem at Infrastructure Scale
You know the N+1 query problem at the ORM level. Chatty services are the same anti-pattern, one abstraction layer higher. Instead of your ORM issuing one query per entity in a list, your service architecture issues one HTTP call per item in a response.
On LAN, with 0.2ms call latency, a service that makes 60 calls to render a dashboard is annoying but functional. In a cloud VPC, the same pattern is a 300–600ms tax on every page load—before your application logic has done anything.
Where it hides
Chatty patterns hide in places that were designed for synchronous, co-located communication:
- Direct per-row database reads in a loop
- Synchronous REST chains with no batching
- Per-entity audit log writes (one INSERT per action)
- Naive SDK usage that issues separate API calls for each resource lookup
- GraphQL resolvers making independent DB queries for each field
```typescript
// BEFORE migration — looks fine on-prem at 0.1ms per call
async function getOrderSummaries(orderIds: string[]) {
  return Promise.all(
    orderIds.map(id => orderService.getOrder(id)) // N HTTP calls
  );
}
// After cloud migration: 50 orders × 4ms avg = 200ms just for fetching.
// Nothing else has run. No business logic. Just fetching.

// AFTER — batch endpoint, single round trip
async function getOrderSummaries(orderIds: string[]) {
  return orderService.getOrdersBatch({ ids: orderIds });
}
```
The connection pool trap
Chatty services also exhaust connection pools faster than on-prem environments. On-premises, services were often co-located on the same host as their dependencies. In cloud, each service call traverses the network and holds an open connection during transit. Under concurrency, this creates connection exhaustion at the database or downstream service before CPU or memory is anywhere near saturation.
```sql
-- PostgreSQL connection audit — run this during peak load
SELECT
  count(*) AS total_connections,
  state,
  wait_event_type,
  wait_event,
  application_name
FROM pg_stat_activity
GROUP BY state, wait_event_type, wait_event, application_name
ORDER BY total_connections DESC;

-- If "idle in transaction" count > 20% of max_connections,
-- your app is holding connections open unnecessarily.
-- Solution: PgBouncer in transaction mode.
```
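Since the audit above ends at "PgBouncer in transaction mode," here is a minimal `pgbouncer.ini` sketch. The hostname, pool sizes, and timeouts are illustrative; tune them against your own load test, not these numbers:

```ini
[databases]
; hypothetical upstream database
appdb = host=db.internal port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
pool_mode = transaction      ; release the server connection at txn end
default_pool_size = 20       ; server connections per user/db pair
max_client_conn = 500        ; app-side connections PgBouncer will accept
server_idle_timeout = 60     ; close idle server connections after 60s
```

The application then connects to port 6432 instead of 5432. Transaction mode is what lets 500 chatty app connections share 20 real server connections, but note it is incompatible with session-level features like prepared statements held across transactions.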
Mitigation
- Implement batch endpoints on all internal APIs — treat per-entity endpoints as a client convenience, not the default
- Use the `DataLoader` pattern (or an equivalent) to coalesce multiple calls within a single request lifecycle
- Set `idle_in_transaction_session_timeout` on PostgreSQL to detect connection-holding bugs
- Profile connection pool utilization under realistic concurrency before production cutover
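The coalescing idea behind DataLoader is small enough to sketch without the library: every `.load()` issued in the same tick of the event loop gets folded into one batch call. This is a minimal illustration of the pattern only; the real `dataloader` npm package adds per-request caching and richer error handling:

```typescript
// Minimal sketch of the DataLoader pattern: load() calls made within one
// tick of the event loop are coalesced into a single batch request.
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

class TinyLoader<K, V> {
  private queue: Array<{ key: K; resolve: (v: V) => void; reject: (e: unknown) => void }> = [];
  private scheduled = false;

  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (!this.scheduled) {
        this.scheduled = true;
        queueMicrotask(() => void this.flush()); // flush once per tick
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    try {
      // One downstream call for the whole batch, results in key order
      const values = await this.batchFn(batch.map(item => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach(item => item.reject(err));
    }
  }
}
```

Wiring a batch endpoint like the earlier `getOrdersBatch` in as the batch function turns N per-entity fetches inside one request into a single round trip, without changing any call sites.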
3. Cost Surprises: The Bill That Doesn’t Look Like the PoC
The proof of concept ran for two weeks and cost $340. The production migration bill for the first full month is $8,200. Nobody changed the architecture. What happened?
Cloud costs in production bear almost no relationship to PoC costs. Load, data gravity, and idle state are invisible in a two-week test window.
Cost surprises in lifted workloads cluster around three sources that on-premises budgets never accounted for explicitly.
Data egress: the hidden tax on distributed systems
On-premises, data moving between servers is free. In cloud, data leaving a region, leaving an AZ, or leaving the cloud provider’s network is metered. A system designed assuming free internal data movement will generate egress charges that are impossible to predict from architecture diagrams alone.
| Pattern | On-prem cost | Cloud cost | Notes |
|---|---|---|---|
| Log aggregation from 10 nodes | $0 | ~$45/mo egress | Unbounded with node count |
| Cross-AZ DB replication | $0 | ~$0.01/GB both directions | Surprise at high write volumes |
| CDN origin pull (unoptimized) | $0 | $0.085–$0.09/GB | Amplified by cache misses |
| Backup to external storage | $0 | Per GB retrieval + egress | DR drills get expensive fast |
| Inter-service traffic (cross-AZ) | $0 | $0.01/GB per direction | Invisible in single-AZ PoCs |
Mitigation: Map every data flow that crosses an AZ or region boundary. Colocate high-bandwidth communicating services in the same AZ. Use VPC endpoints to keep cloud service traffic off the public internet (and off the egress meter).
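Rough numbers are enough to rank flows for colocation. A back-of-envelope estimator, assuming the $0.01/GB-per-direction cross-AZ rate from the table above (verify against your provider's current pricing; the flow names and volumes are hypothetical):

```typescript
// Back-of-envelope cross-AZ egress estimate.
const RATE_PER_GB_PER_DIRECTION = 0.01; // USD, assumed — check current pricing

// Hypothetical flows — replace with your own traffic map
const crossAzFlows = [
  { name: 'app to db replica (writes)', gbPerDay: 120 },
  { name: 'log shipping to aggregator', gbPerDay: 40 },
];

// Each GB crossing an AZ boundary is billed in both directions
const monthlyUsd = crossAzFlows.reduce(
  (sum, f) => sum + f.gbPerDay * 30 * RATE_PER_GB_PER_DIRECTION * 2,
  0
);
console.log(`Estimated cross-AZ egress: $${monthlyUsd.toFixed(2)}/month`);
// prints: Estimated cross-AZ egress: $96.00/month for these sample flows
```

Even at this crudeness, the output tells you which two or three flows dominate, and those are the ones worth colocating in a single AZ or routing through VPC endpoints.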
Right-sizing: the over-provisioning hangover
On-premises server sizing follows a capital expenditure model: you buy headroom for 3–5 years. That instinct carries into cloud. Engineers provision m5.4xlarge instances because the on-prem equivalent was a 16-core server. Cloud doesn’t reward that behavior—you pay for every idle CPU cycle.
Actionable: Use AWS Compute Optimizer or Azure Advisor after 14+ days of production data. Do not right-size during migration—you need a baseline first. But do not let over-provisioned instances run for more than 30 days without a review.
Idle infrastructure: the midnight shift that never clocks out
On-premises servers run 24/7 because the capital cost is sunk. Cloud charges per hour. Development and staging environments that mirror production—spun up for a migration and left running—are a consistent source of surprise bills.
```yaml
# GitHub Actions: automatic environment teardown
# Scale dev AKS cluster to 0 outside business hours
name: Stop dev cluster
on:
  schedule:
    - cron: '0 20 * * 1-5'  # 8pm weekdays
    - cron: '0 8 * * 6'     # Saturday morning (safety net)
jobs:
  scale-down:
    runs-on: ubuntu-latest
    steps:
      - name: Azure login  # az commands need credentials first
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Scale AKS dev cluster to 0
        run: |
          az aks scale \
            --resource-group rg-dev \
            --name aks-dev \
            --node-count 0
```
Other cost patterns to audit immediately post-migration
- Unattached EBS volumes / managed disks — VMs decommissioned during migration often leave orphaned disks that continue to bill
- NAT Gateway bandwidth — egress through NAT Gateway is billed per GB; replace with VPC endpoints for AWS service traffic
- Licensing surprises — SQL Server or Oracle licenses tied to physical core counts may not map cleanly to cloud vCPU billing; verify with your licensing agreement before migration
4. Stateful Assumptions: The Session State Time Bomb
This smell detonates the moment you try to scale horizontally—which you will eventually do, because cloud makes horizontal scaling trivially easy and it seems like the obvious fix when CPU utilization spikes.
Many applications lifted from on-prem store session state in memory or on the local filesystem. On-prem, a single server or a sticky load balancer was the entire deployment. In cloud, your auto-scaler spins up two new instances alongside the original, and suddenly two out of three requests land on an instance with no session state for that user.
```typescript
// On-prem pattern — works with single server, silent killer in cloud
app.use(session({
  secret: 'keyboard cat',
  resave: false,
  saveUninitialized: true,
  // No store defined — defaults to in-memory MemoryStore
}));
```

```typescript
// Cloud-ready pattern: externalize session to Redis
import RedisStore from 'connect-redis';
import { createClient } from 'redis';

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: process.env.SESSION_SECRET,
  resave: false,
  saveUninitialized: false,
  cookie: { secure: true, httpOnly: true, maxAge: 3600000 }
}));
```
Filesystem assumptions
File system dependencies are equally dangerous. Applications that write uploads to /tmp, generate reports to a local path, or cache computed data on disk will silently break when:
- Containers are rescheduled to different nodes
- Kubernetes pods restart due to OOM or liveness probe failure
- Auto-scaling adds a new instance that has no existing local state
Mitigation: Audit every `File.WriteAllBytes`, `fs.writeFile`, `Path.Combine(AppDomain...)`, or equivalent. Replace with object storage (S3, Azure Blob) at the upload boundary. Use ephemeral storage only for truly transient scratch data within a single request lifecycle.
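The "upload boundary" is easiest to enforce as an interface the rest of the app depends on. A small sketch, with a purely illustrative in-memory implementation standing in for the S3 or Blob-backed one you would ship:

```typescript
// Sketch of an upload boundary: all persistent writes go through an
// object-store interface, never through fs. The production implementation
// would wrap S3 or Azure Blob; this in-memory version is for illustration
// and tests only.
interface ObjectStore {
  put(key: string, data: Buffer): Promise<void>;
  get(key: string): Promise<Buffer | undefined>;
}

class InMemoryObjectStore implements ObjectStore {
  private objects = new Map<string, Buffer>();
  async put(key: string, data: Buffer): Promise<void> {
    this.objects.set(key, data);
  }
  async get(key: string): Promise<Buffer | undefined> {
    return this.objects.get(key);
  }
}

// Handlers depend on ObjectStore — a rescheduled container or restarted
// pod loses only its scratch space, never user data.
async function saveUpload(store: ObjectStore, userId: string, file: Buffer) {
  await store.put(`uploads/${userId}/${Date.now()}`, file);
}
```

The payoff is that "does this survive a pod restart?" becomes a type-level question: anything written through `ObjectStore` does, anything written through `fs` is by definition disposable.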
5. The Observability Void: Flying Blind in a New Environment
On-premises monitoring stacks—Nagios, Zabbix, in-house Grafana dashboards pointed at Prometheus—do not migrate cleanly. The exporters, agents, and dashboards were tuned for physical hardware metrics: disk I/O, NIC throughput, CPU steal time. These metrics mean almost nothing in a cloud context.
What you need to observe in cloud is different:
- Cold start times and pod scheduling latency
- Spot instance interruption rates
- Managed service throttling (Cosmos DB RU exhaustion, SQS throttling)
- Connection pool utilization over time
- Cost-per-request, not just cost-per-hour
- Distributed trace depth and span count
Almost none of this was instrumented on-prem. The danger window is the period immediately after migration when your legacy monitoring reports “all green” because it’s watching things that are fine, while actual user-facing metrics are degrading invisibly.
Do not lift your monitoring stack. Build a new observability layer before cutover. The minimum viable set: distributed tracing (OpenTelemetry), infrastructure metrics (CloudWatch / Azure Monitor), and user-facing synthetic monitoring with realistic traffic patterns.
Prometheus recording rules for post-migration observability
```yaml
# Add these before cutover, not after the first incident
groups:
  - name: migration_signals
    interval: 30s
    rules:
      - record: job:http_request_duration_p95:rate5m
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))
      - record: job:db_connection_pool_saturation:avg
        expr: avg(db_pool_active / db_pool_max) by (service)
      - record: job:downstream_call_depth:max
        expr: max(trace_span_count) by (trace_root_service)
      - record: job:egress_bytes_hourly:rate1h
        expr: sum(rate(network_transmit_bytes_total[1h])) by (zone, region)
```
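A recording rule is only useful if something watches it. A hypothetical alert on the P95 rule above; the 0.3s threshold is an assumption for illustration, and should be derived from your own pre-cutover baseline:

```yaml
groups:
  - name: migration_alerts
    rules:
      - alert: LatencyRegressionPostMigration
        # 0.3s is illustrative — set this from the pre-migration baseline
        expr: job:http_request_duration_p95:rate5m > 0.3
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "P95 latency above 300ms on {{ $labels.service }}"
```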
6. The Monolith Wearing Microservice Clothing
This is the most architecturally insidious smell because it looks correct from the outside. The team containerized the application, deployed it to Kubernetes, and set up separate deployments for each service. On the surface: microservices. Underneath: a distributed monolith.
The telltale signs:
- Shared database schemas across “separate” services
- Synchronous HTTP chains: Service A blocks on B, which blocks on C, which blocks on D
- Shared libraries that bundle business logic and deploy identically with every service
- Database transactions that span multiple service boundaries
- Deployments that must be coordinated — you can’t update Service B without also updating A
This pattern is not always avoidable during migration — full service decomposition has its own cost and risk. But you need to know you have it. A distributed monolith you know about and are managing deliberately is an acceptable migration phase. A distributed monolith you think is a clean microservice architecture is a production incident waiting to happen.
Diagnostic: Draw your actual service dependency graph using your APM’s service map view. If it looks like a star with one service in the center that everything calls—that center is your monolith. If it looks like a linear chain (A → B → C → D → E), you have a synchronous dependency pipeline that will cascade-fail under load.
The Pre-Migration Architecture Review Checklist
The smells above are all detectable before migration if you know what to look for. This is the review I run before advising any lift-and-shift engagement.
Call patterns & latency budget
- Count synchronous downstream calls per request at P95 load — flag if > 15
- Identify any call patterns that loop over collections without batching
- Confirm connection pool sizes are appropriate for expected cloud concurrency
State & storage
- Identify all in-process or in-memory state that must survive a pod restart
- Map every place the app reads from or writes to the local filesystem
- Confirm session management does not rely on server affinity or in-memory stores
Cost
- Estimate cross-AZ and cross-region data flows, calculate egress cost at 2× peak
- Identify any licensing model tied to CPU count or physical host (SQL Server, Oracle)
- Catalogue all non-production environments and confirm shutdown automation exists
Observability
- Map existing monitoring agents — identify cloud equivalents before cutover
- Confirm distributed tracing (OpenTelemetry or equivalent) is instrumented before go-live
- Define SLO targets for P95 latency, error rate, and availability before migration
Architecture
- Identify any shared database schema across logical services
- Check for hardcoded IPs or hostnames that assume on-prem DNS resolution
- Verify secret management — on-prem flat files or config files must not migrate to cloud VMs
- Confirm there is no direct dependency on physical host characteristics (CPU topology, NUMA, local NVMe)
A Realistic Migration Philosophy
Lift-and-shift is not a failure state. It’s a phase. The mistake is treating it as a destination.
Every workload you migrate should have a documented list of known architectural debts created by the lift, an owner for each item, and a timeline to address them—agreed before the migration button is pressed, not discovered six months later during a post-mortem.
The smells in this post are not exotic edge cases. They are the default outcome of a standard lift-and-shift operation. The teams that avoid them are not smarter or more experienced. They are more deliberate. They migrate with their eyes open, they instrument before they cut over, and they treat “it’s running” as the beginning of the work, not the end of it.
Moving to cloud does not modernize your architecture. It gives you a new environment in which your existing architectural decisions—good and bad—will be amplified.
The test of a successful migration is not whether the application starts. It’s whether, 90 days later, your latency profile is understood, your cost trend is predictable, and your on-call team is sleeping through the night.
Part of an ongoing series on production-grade cloud architecture.
Next: When Kubernetes Makes Things Worse — Operational Debt in Over-Orchestrated Systems.