Introduction

Every cloud migration starts with a promise: “We’ll get onto cloud first, optimize later.” That sentence is where the trouble begins.

Lift-and-shift — rehosting an on-premises workload to cloud VMs or containers with minimal re-architecture — is not inherently wrong. The problem is that it leaves on-premises assumptions baked into a system that is now operating in a fundamentally different environment. The failure doesn’t arrive on day one. It arrives three months later, in a Slack alert at 2am, or in an invoice that made a VP ask uncomfortable questions.

This post is an honest accounting of the patterns I see repeatedly across lifted workloads. Not theoretical anti-patterns from a whitepaper — actual architectural smells that surface after migration, often slowly, and often expensively.

The Illusion of a Successful Migration

The migration checklist looks clean. The app is running. Your runbook said “verify the app responds on port 443 after cutover”—it does. The infrastructure team celebrates. Two weeks later, a senior engineer notices P95 latency has crept up from 80ms to 340ms. Nobody touched the code. Nothing changed. Or did it?

What changed is everything underneath: the network topology, the storage subsystem, the proximity of services to each other, the cost model, and the failure modes. The application code is the same. The environment it assumes is not.

The core trap: On-premises assumptions about network latency, storage I/O, and service co-location are almost always violated in cloud environments—and the application has no way to tell you.

The architectural smells described below all share this root cause. They don’t register as bugs because nothing broke. They register as drift—subtle, compounding, and expensive.

Smell | When it surfaces | Who notices first
Latency amplification | Week 2–4 | End users, support tickets
Chatty services | Week 3–6 | On-call engineer, APM alert
Cost surprises | End of month 1 | Finance, FinOps
Stateful assumptions | First scale-out event | Angry users, random 401s
Observability void | First production incident | Everyone, at once
Monolith in disguise | First dependency failure | On-call, 2am

1. Latency Amplification

This is the first smell that appears, and it is almost always misdiagnosed. Engineers see higher response times and assume the cloud hardware is slower. It is not. The hardware is often faster. The network is not.

On a physical LAN, a service call between two rack-mounted servers has sub-millisecond round-trip times. In a cloud VPC, even two services in the same availability zone incur a baseline overhead of 1–3ms per call. Cross-AZ jumps can be 5–15ms. Cross-region calls are 40–120ms depending on geography. These numbers seem trivial until you look at how a typical on-premises service was designed.

On-premises: 40 calls × 0.1ms avg = 4ms network overhead
After migration: 40 calls × 4ms avg = 160ms network overhead

Before your application runs a single line of business logic.
[Diagram: the same call path (API Gateway → Service A → Service B → Database), on-premises vs. after lift-and-shift. On-premises hops of 0.1ms, 0.1ms, and 0.2ms come to ~1.2ms per call, or ~48ms across 40 calls; in the cloud the same hops at 2ms, 3ms, and 5ms come to ~10ms per call, or ~400ms across 40 calls.]

Same call graph. Same code. 8× more latency — purely from network topology.

This is not a contrived example. A typical monolith-to-cloud migration of an e-commerce service that was making 40 synchronous downstream calls per checkout request saw aggregate request latency jump from ~50ms to ~420ms, without any code change. The call count didn’t increase. The per-call latency did.

Why engineers miss this

Because latency in on-premises systems is treated as a constant. Engineers design call patterns assuming 0.1–0.5ms round trips and never test for higher values. They also rarely instrument at the individual call level. APM tools get configured after the incident, not before.

Diagnosing it

Pull distributed traces for your slowest P95 requests. Count the spans. If a request is producing more than 10–15 spans and they’re mostly synchronous, you have a latency budget problem.

# Quick span count check with OpenTelemetry + Jaeger
# For a given trace ID, count the spans it contains:

curl -s "http://jaeger:16686/api/traces/{traceId}" \
  | jq '[.data[0].spans[] | .operationName] | length'

# If this number is > 20 for a single user-facing request,
# you have a chattiness problem worth investigating.

Mitigation

  • Consolidate reads with batch APIs — single call, multiple entities
  • Introduce async messaging (SNS/SQS, Azure Service Bus) for non-critical paths
  • Add Redis/ElastiCache for hot reference data to eliminate repetitive downstream calls
  • Enforce connection pooling at the application tier, not just the DB tier
  • Audit your HttpClient or fetch usage for missing keepAlive / connection reuse settings (a minimal sketch follows below)
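
Here is what that last connection-reuse point can look like in a Node.js/TypeScript service. This is a sketch assuming undici as the HTTP client; the internal endpoint and timeouts are illustrative, not prescriptive.

import { Agent, fetch } from 'undici';

// undici reuses sockets by default; keepAliveTimeout controls how long idle
// connections stay open, and connections caps concurrent sockets per origin.
const keepAliveAgent = new Agent({
  keepAliveTimeout: 30_000, // keep idle sockets for 30s
  connections: 50,          // cap sockets per origin
});

// Hypothetical internal endpoint; the URL is illustrative.
export async function getOrder(id: string) {
  const res = await fetch(`https://orders.internal/api/orders/${id}`, {
    dispatcher: keepAliveAgent, // reuse connections instead of a new TCP+TLS handshake per call
  });
  if (!res.ok) throw new Error(`order service returned ${res.status}`);
  return res.json();
}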

2. Chatty Services: The N+1 Problem at Infrastructure Scale

You know the N+1 query problem at the ORM level. Chatty services are the same anti-pattern, one abstraction layer higher. Instead of your ORM issuing one query per entity in a list, your service architecture issues one HTTP call per item in a response.

On LAN, with 0.2ms call latency, a service that makes 60 calls to render a dashboard is annoying but functional. In a cloud VPC, the same pattern is a 300–600ms tax on every page load—before your application logic has done anything.

Where it hides

Chatty patterns hide in places that were designed for synchronous, co-located communication:

  • Direct per-row database reads in a loop
  • Synchronous REST chains with no batching
  • Per-entity audit log writes (one INSERT per action)
  • Naive SDK usage that issues separate API calls for each resource lookup
  • GraphQL resolvers making independent DB queries for each field
// BEFORE migration — looks fine on-prem at 0.1ms per call
async function getOrderSummaries(orderIds: string[]) {
  return Promise.all(
    orderIds.map(id => orderService.getOrder(id)) // N HTTP calls
  );
}

// After cloud migration: 50 orders × 4ms avg = 200ms just for fetching.
// Nothing else has run. No business logic. Just fetching.

// AFTER — batch endpoint, single round trip
async function getOrderSummaries(orderIds: string[]) {
  return orderService.getOrdersBatch({ ids: orderIds });
}

The connection pool trap

Chatty services also exhaust connection pools faster than on-prem environments. On-premises, services were often co-located on the same host as their dependencies. In cloud, each service call traverses the network and holds an open connection during transit. Under concurrency, this creates connection exhaustion at the database or downstream service before CPU or memory is anywhere near saturation.

-- PostgreSQL connection audit — run this during peak load
SELECT 
  count(*) as total_connections,
  state,
  wait_event_type,
  wait_event,
  application_name
FROM pg_stat_activity
GROUP BY state, wait_event_type, wait_event, application_name
ORDER BY total_connections DESC;

-- If "idle in transaction" count > 20% of max_connections,
-- your app is holding connections open unnecessarily.
-- Solution: PgBouncer in transaction mode.

Mitigation

  • Implement batch endpoints on all internal APIs — treat per-entity endpoints as a client convenience, not the default
  • Use DataLoader (or equivalent) pattern to coalesce multiple calls within a single request lifecycle (sketched after this list)
  • Set idle_in_transaction_session_timeout on PostgreSQL to detect connection-holding bugs
  • Profile connection pool utilization under realistic concurrency before production cutover
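
For the DataLoader item, a minimal sketch, assuming the orderService client and getOrdersBatch endpoint from the earlier example:

import DataLoader from 'dataloader';

// Create one loader per incoming request so the per-key cache is scoped
// to that request's lifecycle.
const orderLoader = new DataLoader(async (ids: readonly string[]) => {
  // One batched round trip instead of N individual calls.
  const orders = await orderService.getOrdersBatch({ ids: [...ids] });
  const byId = new Map(orders.map(o => [o.id, o]));
  // DataLoader requires results in the same order as the requested keys.
  return ids.map(id => byId.get(id) ?? new Error(`order ${id} not found`));
});

// Fifty .load() calls issued in one tick collapse into a single downstream call.
const summaries = await Promise.all(orderIds.map(id => orderLoader.load(id)));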

3. Cost Surprises: The Bill That Doesn’t Look Like the PoC

The proof of concept ran for two weeks and cost $340. The production migration bill for the first full month is $8,200. Nobody changed the architecture. What happened?

Cloud costs in production bear almost no relationship to PoC costs. Load, data gravity, and idle state are invisible in a two-week test window.

Cost surprises in lifted workloads cluster around three sources that on-premises budgets never accounted for explicitly.

Data egress: the hidden tax on distributed systems

On-premises, data moving between servers is free. In cloud, data leaving a region, leaving an AZ, or leaving the cloud provider’s network is metered. A system designed assuming free internal data movement will generate egress charges that are impossible to predict from architecture diagrams alone.

Pattern | On-prem cost | Cloud cost | Notes
Log aggregation from 10 nodes | $0 | ~$45/mo egress | Unbounded with node count
Cross-AZ DB replication | $0 | ~$0.01/GB both directions | Surprise at high write volumes
CDN origin pull (unoptimized) | $0 | $0.085–$0.09/GB | Amplified by cache misses
Backup to external storage | $0 | Per GB retrieval + egress | DR drills get expensive fast
Inter-service traffic (cross-AZ) | $0 | $0.01/GB per direction | Invisible in single-AZ PoCs

Mitigation: Map every data flow that crosses an AZ or region boundary. Colocate high-bandwidth communicating services in the same AZ. Use VPC endpoints to keep cloud service traffic off the public internet (and off the egress meter).
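
A back-of-the-envelope estimate is usually enough to find the expensive flows. The sketch below assumes the commonly published $0.01/GB-per-direction cross-AZ rate; substitute your provider's actual pricing.

// Rough cross-AZ egress estimate. The rate is an assumption; check your
// provider's current pricing before relying on the output.
const CROSS_AZ_USD_PER_GB = 0.01; // charged in each direction

// Each GB crossing an AZ boundary is metered twice: once leaving the
// source AZ and once entering the destination AZ.
function monthlyCrossAzCost(gbPerDay: number, days = 30): number {
  return gbPerDay * 2 * CROSS_AZ_USD_PER_GB * days;
}

// Example: two chatty services exchanging 200 GB/day across AZs
console.log(monthlyCrossAzCost(200).toFixed(2)); // ≈ 120.00 USD/month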

Right-sizing: the over-provisioning hangover

On-premises server sizing follows a capital expenditure model: you buy headroom for 3–5 years. That instinct carries into cloud. Engineers provision m5.4xlarge instances because the on-prem equivalent was a 16-core server. Cloud doesn’t reward that behavior—you pay for every idle CPU cycle.

Actionable: Use AWS Compute Optimizer or Azure Advisor after 14+ days of production data. Do not right-size during migration—you need a baseline first. But do not let over-provisioned instances run for more than 30 days without a review.

Idle infrastructure: the midnight shift that never clocks out

On-premises servers run 24/7 because the capital cost is sunk. Cloud charges per hour. Development and staging environments that mirror production—spun up for a migration and left running—are a consistent source of surprise bills.

# GitHub Actions: automatic environment teardown
# Scale dev AKS cluster to 0 outside business hours

name: Stop dev cluster
on:
  schedule:
    - cron: '0 20 * * 1-5'   # 8pm weekdays (UTC)
    - cron: '0 8 * * 6'      # Saturday morning safety net (UTC)

jobs:
  scale-down:
    runs-on: ubuntu-latest
    steps:
      # Authenticate first; assumes a service principal stored in AZURE_CREDENTIALS
      - name: Azure login
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Scale AKS dev cluster to 0
        run: |
          az aks scale \
            --resource-group rg-dev \
            --name aks-dev \
            --node-count 0

Other cost patterns to audit immediately post-migration

  • Unattached EBS volumes / managed disks — VMs decommissioned during migration often leave orphaned disks that continue to bill (an audit sketch follows this list)
  • NAT Gateway bandwidth — egress through NAT Gateway is billed per GB; replace with VPC endpoints for AWS service traffic
  • Licensing surprises — SQL Server or Oracle licenses tied to physical core counts may not map cleanly to cloud vCPU billing; verify with your licensing agreement before migration
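
For the orphaned-disk item, a minimal audit sketch using the AWS SDK v3 for JavaScript. It only lists candidates; deleting them should remain a human decision.

import { EC2Client, DescribeVolumesCommand } from '@aws-sdk/client-ec2';

// List EBS volumes in the "available" state, i.e. not attached to any instance.
// These keep billing per GB-month until explicitly snapshotted or deleted.
const ec2 = new EC2Client({ region: process.env.AWS_REGION });

const { Volumes = [] } = await ec2.send(new DescribeVolumesCommand({
  Filters: [{ Name: 'status', Values: ['available'] }],
}));

for (const v of Volumes) {
  console.log(`${v.VolumeId}\t${v.Size} GiB\tcreated ${v.CreateTime?.toISOString()}`);
}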

4. Stateful Assumptions: The Session State Time Bomb

This smell detonates the moment you try to scale horizontally—which you will eventually do, because cloud makes horizontal scaling trivially easy and it seems like the obvious fix when CPU utilization spikes.

Many applications lifted from on-prem store session state in memory or on the local filesystem. On-prem, a single server or a sticky load balancer was the entire deployment. In cloud, your auto-scaler spins up three new instances, and suddenly 33% of requests are hitting instances with no session state for that user.

// On-prem pattern — works with single server, silent killer in cloud
app.use(session({
  secret: 'keyboard cat',
  resave: false,
  saveUninitialized: true,
  // No store defined — defaults to in-memory MemoryStore
}));

// Cloud-ready pattern: externalize session to Redis
import session from 'express-session';
import RedisStore from 'connect-redis';
import { createClient } from 'redis';

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: process.env.SESSION_SECRET,
  resave: false,
  saveUninitialized: false,
  cookie: { secure: true, httpOnly: true, maxAge: 3600000 }
}));

Filesystem assumptions

File system dependencies are equally dangerous. Applications that write uploads to /tmp, generate reports to a local path, or cache computed data on disk will silently break when:

  • Containers are rescheduled to different nodes
  • Kubernetes pods restart due to OOM or liveness probe failure
  • Auto-scaling adds a new instance that has no existing local state

Mitigation: Audit every File.WriteAllBytes, fs.writeFile, Path.Combine(AppDomain...), or equivalent. Replace with object storage (S3, Azure Blob) at the upload boundary. Use ephemeral storage only for truly transient scratch data within a single request lifecycle.
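
A minimal sketch of what that looks like at the upload boundary, assuming S3 via the AWS SDK v3; the bucket environment variable and key scheme are illustrative.

import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { randomUUID } from 'node:crypto';

const s3 = new S3Client({ region: process.env.AWS_REGION });

// Instead of fs.writeFile('/tmp/uploads/...'), persist the upload where any
// replica (or the next pod on a different node) can read it back.
export async function storeUpload(body: Buffer, contentType: string): Promise<string> {
  const key = `uploads/${randomUUID()}`;
  await s3.send(new PutObjectCommand({
    Bucket: process.env.UPLOAD_BUCKET, // assumed env var
    Key: key,
    Body: body,
    ContentType: contentType,
  }));
  return key; // store the object key in the database, not a local path
}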


5. The Observability Void: Flying Blind in a New Environment

On-premises monitoring stacks—Nagios, Zabbix, in-house Grafana dashboards pointed at Prometheus—do not migrate cleanly. The exporters, agents, and dashboards were tuned for physical hardware metrics: disk I/O, NIC throughput, CPU steal time. These metrics mean almost nothing in a cloud context.

What you need to observe in cloud is different:

  • Cold start times and pod scheduling latency
  • Spot instance interruption rates
  • Managed service throttling (Cosmos DB RU exhaustion, SQS throttling)
  • Connection pool utilization over time
  • Cost-per-request, not just cost-per-hour
  • Distributed trace depth and span count

Almost none of this was instrumented on-prem. The danger window is the period immediately after migration when your legacy monitoring reports “all green” because it’s watching things that are fine, while actual user-facing metrics are degrading invisibly.

Do not lift your monitoring stack. Build a new observability layer before cutover. The minimum viable set: distributed tracing (OpenTelemetry), infrastructure metrics (CloudWatch / Azure Monitor), and user-facing synthetic monitoring with realistic traffic patterns.
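
For the tracing piece, a Node.js bootstrap can be as small as the sketch below; the service name is a placeholder, and the exporter endpoint is whatever collector you run.

// tracing.ts: load this before the application entry point
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  serviceName: 'checkout-service', // illustrative name
  // Endpoint is taken from OTEL_EXPORTER_OTLP_ENDPOINT (defaults to localhost:4318)
  traceExporter: new OTLPTraceExporter(),
  // Auto-instruments HTTP, Express, and common database clients
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();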

Prometheus recording rules for post-migration observability

# Add these before cutover, not after the first incident

groups:
  - name: migration_signals
    interval: 30s
    rules:
      - record: job:http_request_duration_p95:rate5m
        expr: histogram_quantile(0.95,
          sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))

      - record: job:db_connection_pool_saturation:avg
        expr: avg(db_pool_active / db_pool_max) by (service)

      - record: job:downstream_call_depth:max
        expr: max(trace_span_count) by (trace_root_service)

      - record: job:egress_bytes_hourly:rate1h
        expr: sum(rate(network_transmit_bytes_total[1h])) by (zone, region)

6. The Monolith Wearing Microservice Clothing

This is the most architecturally insidious smell because it looks correct from the outside. The team containerized the application, deployed it to Kubernetes, and set up separate deployments for each service. On the surface: microservices. Underneath: a distributed monolith.

The telltale signs:

  • Shared database schemas across “separate” services
  • Synchronous HTTP chains: Service A blocks on B, which blocks on C, which blocks on D
  • Shared libraries that bundle business logic and deploy identically with every service
  • Database transactions that span multiple service boundaries
  • Deployments that must be coordinated — you can’t update Service B without also updating A

This pattern is not always avoidable during migration — full service decomposition has its own cost and risk. But you need to know you have it. A distributed monolith you know about and are managing deliberately is an acceptable migration phase. A distributed monolith you think is a clean microservice architecture is a production incident waiting to happen.

Diagnostic: Draw your actual service dependency graph using your APM’s service map view. If it looks like a star with one service in the center that everything calls—that center is your monolith. If it looks like a linear chain (A → B → C → D → E), you have a synchronous dependency pipeline that will cascade-fail under load.


The Pre-Migration Architecture Review Checklist

The smells above are all detectable before migration if you know what to look for. This is the review I run before advising any lift-and-shift engagement.

Call patterns & latency budget

  • Count synchronous downstream calls per request at P95 load — flag if > 15
  • Identify any call patterns that loop over collections without batching
  • Confirm connection pool sizes are appropriate for expected cloud concurrency

State & storage

  • Identify all in-process or in-memory state that must survive a pod restart
  • Map every place the app reads from or writes to the local filesystem
  • Confirm session management does not rely on server affinity or in-memory stores

Cost

  • Estimate cross-AZ and cross-region data flows, calculate egress cost at 2× peak
  • Identify any licensing model tied to CPU count or physical host (SQL Server, Oracle)
  • Catalogue all non-production environments and confirm shutdown automation exists

Observability

  • Map existing monitoring agents — identify cloud equivalents before cutover
  • Confirm distributed tracing (OpenTelemetry or equivalent) is instrumented before go-live
  • Define SLO targets for P95 latency, error rate, and availability before migration

Architecture

  • Identify any shared database schema across logical services
  • Check for hardcoded IPs or hostnames that assume on-prem DNS resolution
  • Verify secret management — secrets kept in on-prem flat files or config files must not be carried over to cloud VMs unchanged
  • Confirm there is no direct dependency on physical host characteristics (CPU topology, NUMA, local NVMe)

A Realistic Migration Philosophy

Lift-and-shift is not a failure state. It’s a phase. The mistake is treating it as a destination.

Every workload you migrate should have a documented list of known architectural debts created by the lift, an owner for each item, and a timeline to address them—agreed before the migration button is pressed, not discovered six months later during a post-mortem.

The smells in this post are not exotic edge cases. They are the default outcome of a standard lift-and-shift operation. The teams that avoid them are not smarter or more experienced. They are more deliberate. They migrate with their eyes open, they instrument before they cut over, and they treat “it’s running” as the beginning of the work, not the end of it.

Moving to cloud does not modernize your architecture. It gives you a new environment in which your existing architectural decisions—good and bad—will be amplified.

The test of a successful migration is not whether the application starts. It’s whether, 90 days later, your latency profile is understood, your cost trend is predictable, and your on-call team is sleeping through the night.


Part of an ongoing series on production-grade cloud architecture.
Next: When Kubernetes Makes Things Worse — Operational Debt in Over-Orchestrated Systems.