By Jess Lulka
Content Marketing Manager
Whether you’re streaming a video, playing an online game, shopping online, or accessing a cloud-based application, the responsiveness of that activity plays a critical role in user satisfaction. At the heart of this responsiveness lies network latency, where lower numbers mean better application performance and fewer issues. High network latency results in degraded application performance, slow data gathering, and an overall frustrating user experience. Keeping latency low matters not only to internet users, but also to developers who build mission-critical applications. Applications that rely on low latency to function seamlessly include streaming analytics, real-time data management, API integrations, and AI and ML model training and development.
If you’re dealing with latency issues, you’re not without options. With the right strategies, tools, and know-how, you can reduce latency in cloud networks and improve your application’s performance.
Key takeaways:
Network latency is the amount of time it takes for data packets to move from their origin to a destination.
High network latency affects system reliability, user experience, and the ability of applications to process real-time data. Low latency results in smoother, more responsive applications and better developer experience.
Causes of network latency include physical distance between services, network congestion, application design, and server performance.
Strategies to reduce latency include application optimization, use of a content delivery network, data compression, traffic prioritization, and network monitoring.
Network latency refers to the time it takes for a network packet to travel from its source to its destination. Latency is especially important for applications that rely on real-time data collection and processing, as this can influence developer and user experience:
From a user standpoint, it often refers to how long a page or application takes to load and affects page load times, application response times, and interface interactivity.
For developers, high network latency can result in service outages, reduced API performance, increased server loads, and higher infrastructure costs.
But latency isn’t the only thing that determines application speed and data transfer. You should also be aware of:
Latency vs. bandwidth: Bandwidth is the maximum amount of data you can transfer between points A and B on a network, measured in megabits per second (Mbps). The more bandwidth within your networking infrastructure, the more data you can send at any given time.
Latency vs. throughput: Throughput is the average amount of data that can go through a network over a specific amount of time and successfully reach its destination without any data packet loss. Think of it as how many users can successfully access the network at a given time without experiencing degradation. The goal is high throughput but low latency to support multiple users at a time.
Latency vs. jitter: Jitter tracks all the variations and fluctuations in data packet arrival times. High amounts of jitter signal that even if the data packets are successfully traveling across the network, there’s too much variation in how long they’re taking to travel to deliver a consistent, stable user experience.
Application latency vs. network latency: Network latency isn’t the only type of latency that developers must be aware of or regularly address. Application latency is how long it takes for the system to respond to a user request, such as how quickly your application can complete an API call or load on a user’s mobile phone.
Experiencing unexpected latency and not sure why? Use our application latency troubleshooting guide to walk through common causes, plus practical steps you can take to pinpoint and reduce latency.
Network latency is measured in milliseconds as the time between when a send operation is initiated and when the matching receive operation completes. It’s calculated based on time to first byte (TTFB), round-trip time (RTT), or a ping command (a quick measurement sketch follows this list):
Time to first byte measures the responsiveness of a web server: the duration from when the client sends an HTTP request until the first byte of the response arrives back at the client. The two main factors that determine TTFB are how quickly the web server processes the request and generates a response, and how long the data spends in transit on the network.
Round-trip time is how long it takes for the client to send a request and get a response from the server. Network latency can cause delays and increase your RTT averages.
A ping command is used to determine how long it takes to send data to its destination and get a response. This is a great way to test for overall network reliability.
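If you want a rough, hands-on measurement from the client’s side, the sketch below times an HTTP request in Python. It assumes the third-party requests package is installed, and the URL is a placeholder, so swap in your own endpoint.

```python
# A rough client-side latency check, assuming Python 3 and the third-party
# "requests" package; the URL is a placeholder for your own endpoint.
import time
import requests

def measure_latency(url: str, samples: int = 5) -> None:
    for i in range(samples):
        start = time.perf_counter()
        response = requests.get(url, stream=True)        # returns once headers arrive
        next(response.iter_content(chunk_size=1), None)  # read the first body byte
        ttfb_ms = (time.perf_counter() - start) * 1000
        print(f"sample {i + 1}: ~{ttfb_ms:.1f} ms to first byte (status {response.status_code})")
        response.close()

measure_latency("https://www.example.com")
```

Running several samples and looking at the spread (not just the average) also gives you a feel for jitter on the path you’re testing.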
“What’s a good network latency target in milliseconds?” is a common question that comes up. Though it depends on your main use case and how quickly you need data processed, there are general guidelines for acceptable network performance and specific use cases.
| Use Case | Latency Target | User Impact |
|---|---|---|
| Ultra-low latency systems (AI model training, financial services) | <10 ms | Required for time-critical, deterministic systems where delays directly affect outcomes. |
| Online gaming (fast-paced) | <30 ms ideal; <100 ms acceptable | Very sensitive to delay; latency above ~120 ms noticeably degrades gameplay. |
| Online gaming (strategy) | <50 ms ideal; <120 ms tolerable | Slightly more tolerant than FPS games, but still impacted by higher latency. |
| General web browsing and e-commerce | <100 ms good; 200–500 ms acceptable | Lower latency improves perceived responsiveness, user engagement, and can influence conversions. |
| VoIP & video conferencing | <150 ms smooth; 150–250 ms manageable; >300 ms problematic | Conversation flow degrades as latency increases, especially beyond 300 ms. |
| Streaming on-demand video | <200 ms good; up to 500 ms acceptable | Buffering can mask latency, making higher delays tolerable. |
| Streaming live interactive video | <3 seconds | Lower latency improves interactivity for live events and broadcasts. |
| AR / VR applications | <20 ms | Critical to maintain realism and prevent motion sickness or discomfort. |
Note: Latency targets are based on data from PubNub.
Consumable’s developers needed a cloud provider that could help them address latency issues and provide consistent performance for high-throughput, low-latency applications. DigitalOcean set them up for success with a seamless migration, Load Balancers, and Kubernetes for orchestration.
Before diving into fixes and solutions, it’s necessary to understand the root causes of network latency. By pinpointing their origin, you can make informed decisions to troubleshoot network latency issues and implement the ideal solutions.
The internet might seem to provide an instantaneous browsing experience, but data packets need to travel physically between servers and end-users for proper functioning. The farther the data has to travel or the more intermediate routers it has to go through, the longer it takes—leading to increased latency. For example, if your server is in New York and your user is in Tokyo, the data packets have a longer journey compared to if both were located in the same city.
Just as a highway can get congested during rush hour, networks can become congested when too many users access data simultaneously. This congestion can slow data transmission and increase network latency. Peak usage times, large file transfers, or sudden spikes in traffic can all contribute to network connection congestion.
The type of network that the data must go through has a large impact on data speed and network latency. For instance, a wireless network has higher latency than a fiber-optic cable or wired connection because of its distance to the server, the increased number of connected users, and lower bandwidth. Generally speaking, overall latency increases if data packets must switch between mediums to get to their destinations.
Server performance plays an important role in determining latency. If a server is overwhelmed with requests or lacks the necessary resources (such as sufficient RAM or CPU), it can struggle to process and send data promptly. Outdated hardware or software can further aggravate server performance issues and lead to high network latency.
Transmission Control Protocol (TCP) slow start is a congestion-control mechanism that increases latency at the beginning of a connection by sending only a small amount of data and gradually ramping up throughput. For short-lived microservice calls (such as an HTTP REST API call or a message queue operation), those initial round trips can dominate request time, making service access feel slower to users. Some developers disable slow start in an effort to reduce TCP latency, but doing so weakens congestion control, bandwidth discovery, and fair resource sharing across the network.
Head-of-line (HOL) blocking happens when one delayed packet or request prevents others behind it from being processed, increasing tail latency. Protocols like HTTP/2 reduce HOL blocking with multiplexing, while HTTP/3 (built on QUIC over UDP) addresses it at the transport layer by allowing independent streams to progress without being blocked by packet loss.
The way an application is architected (monolithic or microservice) can influence latency. Complex applications with multiple layers or those that require frequent database queries can introduce delays. Inefficient code or unoptimized databases can also slow data retrieval and processing, further increasing network latency.
Running into slow or inconsistent network performance on your servers? Learn how to identify the root cause of asymmetric network issues, use common diagnostic tools to spot bottlenecks, and apply practical fixes that help restore reliable, predictable network behavior.
Reducing network latency requires knowledge of the application architecture. Depending on what type of architecture you’re using, there are different challenges.
In a monolithic architecture, latency is generally lower and more predictable because function calls occur in-process rather than over the network. This simplicity eliminates serialization and network overhead, making performance easier to understand and optimize. Centralized data access further reduces latency by avoiding cross-service calls, allowing developers to rely on fast, direct queries and in-memory caching.
The trade-off becomes apparent as the system scales. All features share the same runtime and infrastructure, which can lead to resource contention and latency spikes during traffic surges. Scaling the monolith requires scaling the entire application, even if only one component is performance-critical, and database bottlenecks often emerge as the dominant source of latency. As a result, monoliths trade flexibility and fine-grained scalability for simpler, lower-overhead latency characteristics.
Microservices architectures introduce latency tradeoffs because they replace the fast, in-process calls typical of monolithic architectures with network-based communication between independent services. Smaller, more focused services improve scalability, deployment speed, and team ownership. However, each additional network hop adds overhead that can compound across a request path. As a result, service granularity becomes a performance decision rather than just an organizational one, especially as applications grow to include thousands of microservices, multiplying the amount of data sent across networks.
Communication and data patterns further shape latency. Synchronous calls simplify workflows but make the user experience more sensitive to slow or failing downstream services, whereas asynchronous messaging reduces coupling and perceived latency at the cost of eventual consistency and more difficult debugging.
Having each service own its data improves isolation but requires multiple calls to assemble responses, pushing teams toward caching or denormalized data models. Balancing these tradeoffs is essential to gaining the benefits of microservices without experiencing higher latency.
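One common way to keep response assembly from adding up serially is to issue downstream calls concurrently. The sketch below assumes Python 3 with the third-party httpx package; the internal service URLs and the user ID are hypothetical placeholders.

```python
# A minimal sketch of assembling a response with concurrent downstream calls,
# assuming Python 3 and the third-party "httpx" package; the internal service
# URLs and user ID are hypothetical placeholders.
import asyncio
import httpx

async def fetch(client: httpx.AsyncClient, url: str) -> dict:
    response = await client.get(url)
    return response.json()

async def assemble_response() -> dict:
    async with httpx.AsyncClient() as client:
        # Concurrent calls take roughly as long as the slowest service,
        # instead of the sum of every call made in sequence.
        profile, orders = await asyncio.gather(
            fetch(client, "http://profile-service.internal/users/42"),
            fetch(client, "http://order-service.internal/users/42/orders"),
        )
    return {"profile": profile, "orders": orders}

print(asyncio.run(assemble_response()))
```

The tradeoff is the one described above: concurrency hides downstream latency from the user, but error handling and debugging become more involved.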
Understanding cloud metrics—from CPU and memory usage to request latency and error rates— and how they influence operations is essential for spotting performance issues early, optimizing your network, making informed scaling decisions, and keeping applications running smoothly.
There are many ways to reduce packet loss and latency for your network and applications, but you don’t need to implement all of them to get results. Instead, experiment until you find the strategies that best align with your infrastructure needs and application setup, and that adequately address your latency issues.
Efficient code significantly reduces processing times. Regularly review and optimize your application’s code to eliminate bottlenecks. Similarly, optimizing database queries can speed up data retrieval, reducing user wait times. Keep code lean and databases well-structured to prevent unnecessary delays and improve overall performance.
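As one illustration of query optimization, the sketch below replaces an N+1 query loop with a single grouped query. It uses Python’s built-in sqlite3 module, and the orders table and its columns are hypothetical.

```python
# A sketch of replacing an N+1 query pattern with one grouped query, using
# Python's built-in sqlite3 module; the "orders" table and its columns are
# hypothetical.
import sqlite3

conn = sqlite3.connect("app.db")

def orders_per_user_slow(user_ids):
    # One round trip to the database per user adds up quickly.
    return {
        uid: conn.execute(
            "SELECT COUNT(*) FROM orders WHERE user_id = ?", (uid,)
        ).fetchone()[0]
        for uid in user_ids
    }

def orders_per_user_fast(user_ids):
    # A single grouped query returns every count in one round trip.
    placeholders = ",".join("?" * len(user_ids))
    rows = conn.execute(
        f"SELECT user_id, COUNT(*) FROM orders "
        f"WHERE user_id IN ({placeholders}) GROUP BY user_id",
        list(user_ids),
    ).fetchall()
    return dict(rows)
```

The same principle applies to any database or API: each avoided round trip is latency your users never see.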
CDNs store cached versions of your content in multiple locations worldwide. When a user requests data, it’s served from the nearest location, reducing the physical distance data needs to travel. You can strategically place content closer to users with CDNs and multi-CDNs to drastically reduce load times and enhance the user experience.
Note that this differs from implementing edge computing servers to reduce the physical distance between a data source and its destination. While CDNs support static content (images, videos, and files), edge computing is better suited for real-time applications and processing data close to where it’s generated.
Distributed systems spread the load across multiple servers, preventing any single server from becoming a bottleneck. Load balancers distribute incoming traffic across servers, ensuring no single server is overwhelmed. This setup enhances the stability and reliability of your applications.
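To make the idea concrete, here is a minimal sketch of round-robin distribution, the simplest strategy a load balancer can use; the backend addresses are placeholders, and a production load balancer also layers in health checks and failover.

```python
# A minimal sketch of round-robin traffic distribution, the simplest strategy
# a load balancer can use; the backend addresses are placeholders.
from itertools import cycle

backends = cycle([
    "http://10.0.0.11:8080",
    "http://10.0.0.12:8080",
    "http://10.0.0.13:8080",
])

def pick_backend() -> str:
    # Each request is handed to the next server in rotation, so no single
    # instance absorbs all of the traffic.
    return next(backends)

for _ in range(6):
    print(pick_backend())
```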
Investing in high-quality hosting can make a big difference in performance. Opt for reliable hosting providers, like DigitalOcean, that provide high-performance infrastructure with GPU Droplets. Building out your infrastructure can include investing in isolated VMs, increasing overall processing power with GPUs, or increasing storage bandwidth to hold more data for processing.
Caching stores frequently accessed data in a ready-to-serve state. This reduces the need for repeated data processing or retrieval, speeding up response times. Implementing effective caching strategies (both on the server and client side) can lead to lower latency, reduced load times, and improved user experience.
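Here is a minimal sketch of a server-side cache with a time-to-live, written in plain Python; fetch_profile and its slow lookup are hypothetical stand-ins for a database or API call.

```python
# A minimal sketch of server-side caching with a time-to-live, in plain Python;
# fetch_profile and its slow lookup are hypothetical stand-ins for a database
# or API call.
import time
from functools import wraps

def ttl_cache(ttl_seconds: float):
    def decorator(func):
        store = {}  # maps call arguments to (expiry_time, value)
        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            entry = store.get(args)
            if entry and entry[0] > now:
                return entry[1]            # cache hit: skip the slow lookup
            value = func(*args)            # cache miss: do the real work
            store[args] = (now + ttl_seconds, value)
            return value
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=30)
def fetch_profile(user_id: int) -> dict:
    time.sleep(0.2)  # stand-in for a slow database or API call
    return {"id": user_id, "name": "example"}

fetch_profile(42)  # slow first call populates the cache
fetch_profile(42)  # served from memory until the 30-second TTL expires
```

In production you would typically back this with a shared store such as Redis so every application instance benefits from the same cache.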
Reduce file sizes and provide efficient data transfer to offer a smoother user experience. Compressing data reduces the amount of information that needs to be transferred, speeding up transmission times. For example, optimizing media files, such as images and videos, helps them load faster.
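For example, the short sketch below compresses a JSON payload with Python’s built-in gzip module before it is sent over the network; the payload itself is made up for illustration.

```python
# A short sketch of compressing a JSON payload with Python's built-in gzip
# module before transfer; the payload is made up for illustration.
import gzip
import json

payload = json.dumps(
    {"events": [{"id": i, "status": "ok"} for i in range(1000)]}
).encode("utf-8")

compressed = gzip.compress(payload)
print(f"original: {len(payload)} bytes, compressed: {len(compressed)} bytes")

# The receiver restores the original bytes with gzip.decompress(compressed).
```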
QoS settings allow you to prioritize certain types of traffic over others. For example, real-time communication can be prioritized over background tasks. Proactively manage and prioritize your network traffic to ensure optimal performance for critical operations and improve overall responsiveness.
Regularly monitor your network performance and measure network latency to spot and address issues before they affect your users. Tools like DigitalOcean Monitoring provide insights into application performance, helping you make informed decisions about managing online services, network configuration, necessary connected devices, and hardware upgrades. Staying proactive and using network monitoring tools to analyze your network can prevent potential issues and ensure consistent performance.
DigitalOcean’s monitoring function to measure token usage for a deployed AI agent.
HTTP/2 and HTTP/3 are updates to the traditional HTTP protocol. These standards introduce features to increase data packet transfer speed across networks for scalable, cloud-native applications, specifically:
HTTP/2 introduces multiplexing, allowing multiple requests to be sent over a single TCP connection. This is beneficial for web services and microservices that simultaneously handle multiple requests.
HTTP/3 uses Quick UDP Internet Connections (QUIC) to reduce overall network latency and avoid TCP HOL blocking issues. It can fetch multiple objects at once over independent streams, and it combines the transport and TLS handshakes so connections are established in fewer round trips, reducing overall latency and increasing data speed (a short client-side example follows this list).
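As one way to adopt these standards on the client side, the sketch below enables HTTP/2 with the third-party Python httpx library, installed with its http2 extra; the URL is a placeholder.

```python
# A minimal sketch of enabling HTTP/2 on the client side with the third-party
# "httpx" package (install it with its http2 extra: pip install "httpx[http2]");
# the URL is a placeholder.
import httpx

with httpx.Client(http2=True) as client:
    response = client.get("https://www.example.com")
    # http_version reports which protocol was actually negotiated, e.g. "HTTP/2"
    print(response.http_version, response.status_code)
```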
Subnetting involves grouping network endpoints that frequently interact with each other. This network inside a network reduces total data travel time, limits inefficient network routing, and minimizes latency.
DigitalOcean’s Global Load Balancer is designed to improve application availability and reduce latency with multi-region failover, autoscaling, and DDoS protection.
DigitalOcean provides robust, globally distributed infrastructure for your startup’s applications. Our state-of-the-art network infrastructure, built with redundancy and resilience in mind, delivers high availability and consistent performance.
With multiple regions and availability zones, developers can implement DigitalOcean Droplets close to users or dependent services, reducing physical distance and round-trip time. Keeping related resources (Droplets, databases, and object storage) in the same region further minimizes intra-application latency.
To reduce network overhead within an application, DigitalOcean provides Virtual Private Cloud (VPC) networking. This lowers latency and improves consistency, especially for service-to-service communication. Built-in monitoring and alerts help teams observe latency trends and identify network bottlenecks before they impact users.
To reduce data access and transfer time, database connection pools maintain a set of pre-established connections that reduce overhead for repeated data requests. Our Managed Databases run on enterprise-class hardware for fast performance and minimal downtime.
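Here is a hedged sketch of application-side connection pooling with the third-party SQLAlchemy library; the connection URL, credentials, and port are placeholders for your own managed database.

```python
# A sketch of application-side connection pooling with the third-party
# SQLAlchemy library; the connection URL, credentials, and port are
# placeholders for your own managed database.
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://user:password@db-host.example.com:25060/app",
    pool_size=5,         # keep up to 5 connections open and ready to reuse
    max_overflow=10,     # allow short bursts beyond the steady-state pool
    pool_pre_ping=True,  # drop stale connections instead of failing requests
)

with engine.connect() as conn:
    # The connection is borrowed from the pool rather than newly established,
    # skipping the TCP and authentication handshake on every request.
    result = conn.execute(text("SELECT 1"))
    print(result.scalar())
```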
DigitalOcean Load Balancers add another layer of latency control by evenly distributing traffic across multiple Droplets, preventing any single instance from becoming overloaded. Health checks ensure traffic is only routed to responsive backends, preventing slow or failing Droplets from increasing request latency.
Support for protocols like HTTP/2 helps reduce connection overhead through multiplexing, while Global Load Balancers can route users to the nearest healthy region, minimizing latency for globally distributed audiences. DigitalOcean also provides managed Kubernetes, making it easy to efficiently scale applications and maintain performance even with high connectivity and data requirements.
NoBid moved off AWS to DigitalOcean and now runs hundreds of containers across multiple regions with secure VPC networking, network Load Balancers, and private internal networking—giving them the bandwidth performance and traffic handling they need for billions of auctions per month without compromising responsiveness.
What is considered high network latency?
What counts as high network latency depends on the type of application, as some require lower latency than others for ideal operations. As general guidance, ultra-latency-sensitive use cases such as financial services or AI model training should stay under 10 ms, while online browsing and e-commerce operate best with latency under 100 ms.
How do you fix network latency?
There are multiple ways that you can address network latency issues. Top strategies include application code optimization, use of a CDN, load balancer implementation, hardware upgrades, caching strategies, data compression, traffic prioritization with QoS settings, performance monitoring, using HTTP/2 and HTTP/3 standards, and implementing subnetting.
Is latency more important than bandwidth?
Both latency and bandwidth are important, but they measure different things. Latency is the time it takes for a data packet to travel from its source to its destination. Bandwidth is the maximum amount of data you can send over your network at a given time without overloading the system.
What causes sudden spikes in latency?
Causes of sudden network latency spikes include server hardware performance, network congestion, the number of online users, physical distance between data endpoints, and application design. A sudden increase in network activity without sufficient bandwidth or throughput can also result in latency spikes.
How do CDNs reduce latency?
CDNs reduce latency by optimizing content delivery. They cache static content from the origin server and serve it from edge locations across a global data center network, so requests are fulfilled close to the end user regardless of location. This helps distribute server load and reduce overall network traffic.
DigitalOcean offers AI startups and digital native enterprises a simple, reliable way to build and scale applications while keeping latency low and performance predictable. With globally distributed data centers, fast SSD-backed compute, and built-in networking features, DigitalOcean helps reduce the distance—and complexity—between your users and workloads. Whether you’re running APIs, real-time services, or AI inference, DigitalOcean is designed to deliver consistent performance without the overhead of traditional hyperscalers.
Key features include:
Scalable virtual machines (Droplets) optimized for fast boot times and consistent performance
Global data center regions that let you deploy closer to users to reduce network latency
Private networking and VPCs to minimize hop count and improve service-to-service communication
GPU-powered and AI-ready infrastructure for low-latency inference and data processing
Managed AI services for deploying LLM-powered applications with predictable performance
Comprehensive documentation and API references covering networking, performance tuning, and scaling
Step-by-step tutorials and architecture guides for building low-latency, production-ready systems
Transparent support plans with access to responsive technical guidance when performance matters
Predictable, transparent pricing with no long-term contracts
Get started with DigitalOcean to build fast, reliable applications, reduce latency for your users, and spend less time optimizing infrastructure—and more time shipping performance-sensitive code.
Jess Lulka is a Content Marketing Manager at DigitalOcean. She has over 10 years of B2B technical content experience and has written about observability, data centers, IoT, server virtualization, and design engineering. Before DigitalOcean, she worked at Chronosphere, Informa TechTarget, and Digital Engineering. She is based in Seattle and enjoys pub trivia, travel, and reading.
Sign up and get $200 in credit for your first 60 days with DigitalOcean.*
*This promotional offer applies to new accounts only.