Hyperscale data centers host hundreds of thousands of servers connected through a high-performance data network. This powerful cloud infrastructure provides compute resources for many customers, enabling them to control and operate their enterprises reliably and efficiently.
Hosting of data and applications in core and edge clouds requires precise common time for reasons such as:
- Distributed compute processes need to be synchronized
- Transaction sequence must be maintained
- Timestamping assures efficient backup and mirroring of huge amounts of data
Today’s timing architectures commonly use network-hosted NTP servers. These NTP solutions are sufficiently accurate and reliable for best-effort timing with millisecond accuracy but might not meet the increasingly strict requirements for resilient and precise synchronization of mission-critical applications. So where do they fall short?
Challenges for data center timing
Current practices for data center timing have several weaknesses that must be addressed:
- Network-delivered time suffers from delay and delay variation in packet networks, and even more from asymmetric delay. Congested links further reduce the accuracy of packet time protocols. Methods that improve the forwarding characteristics of timestamped packets are needed.
- A network-hosted server might suffer from outages caused by DDoS attacks, attacks on the global DNS system or GNSS issues. Any data center should have independent time sources, using independent network-delivered timing technologies, local clocks with extended holdover, or their own robust GNSS receivers, and preferably a mix of these technologies.
- Accurate time must be provided to applications hosted on standard servers in data centers. The IT network in a data center as well as the server architecture might not be optimized for delivery of accurate timing. Hence, the on-site IT architecture needs a way to deliver accurate time to the software applications running on COTS servers.
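The effect of asymmetric delay described in the first point can be illustrated with the two-way timestamp exchange that both NTP and PTP rely on. The following is a minimal sketch, not a protocol implementation: the function names, the `simulate_exchange` helper and all numeric values are hypothetical and chosen only for illustration. It shows that the offset estimate is exact when both directions have equal delay, and wrong by half the delay difference when they do not.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Two-way time transfer estimate from four timestamps.

    t1: request sent       (client clock)
    t2: request received   (server clock)
    t3: response sent      (server clock)
    t4: response received  (client clock)
    Returns (estimated server-minus-client offset, round-trip delay).
    The estimate assumes forward and reverse path delays are equal.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, forward_delay, reverse_delay,
                      server_hold=0.0, t1=0.0):
    """Hypothetical helper: build the four timestamps for known path delays.

    The server clock reads client time plus true_offset.
    """
    t2 = t1 + forward_delay + true_offset   # arrival, read on the server clock
    t3 = t2 + server_hold                   # server processing time
    t4 = t3 - true_offset + reverse_delay   # arrival, read on the client clock
    return t1, t2, t3, t4


if __name__ == "__main__":
    true_offset_us = 250  # illustrative values, in microseconds

    # Symmetric path: the estimate matches the true offset.
    est, rtt = offset_and_delay(*simulate_exchange(true_offset_us, 100, 100))
    print(f"symmetric:  estimate={est} us, error={est - true_offset_us} us")

    # Asymmetric path (e.g. one congested direction): the error equals
    # half the difference between forward and reverse delay.
    est, rtt = offset_and_delay(*simulate_exchange(true_offset_us, 400, 100))
    print(f"asymmetric: estimate={est} us, error={est - true_offset_us} us")
```

Because the client sees only the round-trip delay, it cannot tell from these timestamps alone how that delay splits between the two directions, which is why asymmetry is harder to correct than delay itself.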
Optimized timing architecture for cloud data centers
While NTP has served us well over the last few decades, the technology has limitations. Accuracy could be significantly improved by serving NTP clients from a GNSS-synchronized NTP server at each data center. This, however, creates an unacceptable vulnerability, as GNSS receivers can easily be compromised by jamming and spoofing attacks.
Applying cesium atomic clocks in data centers offers a solution. These clocks provide excellent holdover capabilities and can ride through even extended unavailability of satellite-delivered timing. An ePRTC solution provides the highest accuracy and availability by combining GNSS receivers with cesium atomic clocks in a redundant configuration. A combination of multi-band GNSS receivers (which mitigate atmospheric propagation issues) with ultra-stable cesium atomic clocks provides UTC-traceable time that can be maintained with an accuracy of better than 100 ns for more than two weeks. This allows a data center to survive even extended periods of GNSS outages.
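The link between holdover duration and clock stability is simple arithmetic. The sketch below assumes a constant fractional frequency offset and ignores aging, temperature effects and noise, so it is a rough bound rather than a clock model. The function names, the time-error budgets and the 1e-11 comparison offset are illustrative assumptions, not values taken from any product or standard.

```python
SECONDS_PER_DAY = 86_400


def max_avg_frequency_offset(time_error_budget_s, holdover_s):
    """Largest constant fractional frequency offset that keeps the
    accumulated time error within the budget over the holdover period."""
    return time_error_budget_s / holdover_s


def drift_after(frequency_offset, elapsed_s):
    """Time error accumulated by a constant fractional frequency offset."""
    return frequency_offset * elapsed_s


if __name__ == "__main__":
    holdover_s = 14 * SECONDS_PER_DAY  # two weeks without GNSS

    # Illustrative time-error budgets in nanoseconds.
    for budget_ns in (30, 100, 1000):
        y = max_avg_frequency_offset(budget_ns * 1e-9, holdover_s)
        print(f"budget {budget_ns:>4} ns over 14 days -> offset <= {y:.2e}")

    # For comparison: a hypothetical oscillator that is off by 1e-11.
    drift_us = drift_after(1e-11, holdover_s) * 1e6
    print(f"offset 1e-11 over 14 days -> {drift_us:.1f} us of drift")
```

Staying within a sub-microsecond budget for two weeks requires an average frequency offset on the order of 1e-13 or better, which is why long holdover is tied to atomic clocks rather than ordinary oscillators.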
With highly accurate timing at core sites, there is a need for a more sophisticated way of delivering synchronization to edge data centers. PTP is superior to NTP as a time protocol, as it provides significantly higher accuracy as well as sophisticated management options. Packet networks that assist PTP delivery with boundary and transparent clock functionality can significantly improve timing accuracy.
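The benefit of transparent clock support can be sketched in a few lines. An end-to-end transparent clock measures how long a PTP event message resides in the switch and accumulates that residence time in the message's correction field, so the receiver can subtract it. The model below is a deliberate simplification with hypothetical names and delay values; it ignores timestamping error and considers only one direction.

```python
def apparent_path_delay_ns(link_delays_ns, residence_times_ns, transparent_clocks):
    """One-way delay as seen by the receiver of a PTP event message.

    link_delays_ns:      propagation delay of each link on the path
    residence_times_ns:  queuing/processing time inside each switch
    transparent_clocks:  True if every switch adds its residence time
                         to the message's correction field
    """
    total = sum(link_delays_ns) + sum(residence_times_ns)
    # With transparent clocks, the accumulated correction equals the sum
    # of residence times and is subtracted by the receiver.
    correction = sum(residence_times_ns) if transparent_clocks else 0
    return total - correction


if __name__ == "__main__":
    links = [500, 500]   # two links, illustrative values in ns
    quiet = [1_000]      # lightly loaded switch
    congested = [90_000] # same switch under congestion

    for label, residence in (("quiet", quiet), ("congested", congested)):
        plain = apparent_path_delay_ns(links, residence, transparent_clocks=False)
        assisted = apparent_path_delay_ns(links, residence, transparent_clocks=True)
        print(f"{label:>9}: without TC = {plain} ns, with TC = {assisted} ns")
```

In this simplified model, the delay seen with transparent clocks stays constant regardless of load, while the unassisted delay varies with queuing. Boundary clocks address the same problem differently, by terminating and regenerating the timing flow at each hop.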
However, packet network behavior still has some impact on the quality of the synchronization network. An independent timing network that uses a separate wavelength in the DWDM transport network, known as an optical timing channel (OTC), is a sensible way to deliver timing with the highest accuracy. As DWDM technology is used for data center interconnect, this OTC can be applied on an out-of-band wavelength above 1600 nm.
A combination of such native optical PTP transport with boundary clock class D functionality provides very precise synchronization, sufficiently accurate for even the most time-sensitive applications.
To deliver accurate time to server-hosted applications, the servers need to be enhanced with the ability to process highly accurate time. There are different approaches, such as integrating timing features into the server hardware and software, adding NICs with timing capability, or inserting time cards or time modules into open servers. With the Open Compute Project’s (OCP) Time Appliance Project (TAP) initiated by Facebook, the third approach is gaining strong momentum. Pluggable time cards with integrated GNSS receivers and reasonably stable oscillators offer a perfect way to provide high-accuracy, UTC-traceable timing in close proximity to the software applications running on the server.
The diagram below compares essential timing technologies. Each has its pros and cons, but combining network-delivered with satellite-delivered synchronization and backing it up with cesium atomic clocks is a highly resilient strategy for meeting the timing requirements of even the most demanding data center applications.