Author’s Note: There’s a lot of talk and plenty of noise related to NFV in the telecom industry. It’s a time of market transformation, with many executives and marketeers making claims about how their companies are doing NFV better than others, or how they are ahead of the competition in one or more ways. I think we are missing an important voice: the CTOs. I am talking with my peers from service providers and suppliers to get a sense of what is real. I will be sharing these conversations in a series called The Real CTOs of NFV. The title is fun but the intent is serious. Following is my first Real CTO conversation, with Martin Taylor of Metaswitch.
Prayson: Martin, you certainly are one of the first Real CTOs of NFV and I am pleased to get this chance to talk with you about virtualization. Can you start by describing Metaswitch’s role in the telecom industry’s NFV transformation?
Martin: Metaswitch is a supplier of virtualized network functions, specifically in the IP multimedia communications space. We have a portfolio of VNFs that includes an IMS core solution, a session border controller solution and a telephony application server solution. Really, it’s all the pieces you need to put together an IMS network for either fixed or mobile voice.
All of those products are available in the form of virtualized network functions ready to go on either OpenStack or VMware. In fact, they have been available since the end of 2014. We are very much at the leading edge in this space of virtualizing IMS.
Prayson: What is it about Metaswitch that allows you to be so far ahead in NFV?
Martin: Metaswitch is a software company and always has been. We’ve been in business for 35 years. For the first 20 years we delivered software, pure and simple, mainly to networking OEMs for incorporation in the boxes they were delivering.
About 15 years ago we started to build software directly for network operators and in almost all cases that software was delivered in the form of an appliance because that’s what network operators buy. That appliance was basically just an Intel x86 server. Essentially, our software was running on completely unmodified x86 hardware.
For some of our products that was relatively easy to do. For some of them it was very difficult. For example, building a session border controller that is really scalable and efficient and carrier-class purely on x86 hardware is an extremely hard thing to do. All of our competitors use proprietary hardware for that.
We always believed in the power of doing things in software on x86, riding the wave of innovation in x86 silicon. Having software that’s designed natively for x86 gives you an enormous leg up when it comes to virtualization compared with vendors who traditionally relied on proprietary hardware to do the heavy lifting.
Certainly in the SBC space that’s very much the situation. All our competitors in the SBC space are scrambling now because, first, they’ve got to learn how to get an SBC implemented on x86, and second, they’ve got to wrestle with the problems of actually making that software perform in a virtualized environment.
Prayson: What is it that’s so hard about SBC?
Martin: It’s dealing with huge numbers of small RTP packets. We are handling on the order of 5 million packets per second on a single server in order to relay carrier-scale numbers of concurrent media streams.
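To put that packet rate in context, a back-of-the-envelope calculation helps. The 20 ms packetization interval below is my assumption (it is typical for codecs like G.711), not a figure from Metaswitch:

```python
# Back-of-the-envelope: what 5M packets/sec means at the RTP layer.
# Assumption (mine, for illustration): 20 ms packetization, so each
# one-way RTP stream carries 1000/20 = 50 packets per second.
pps_per_stream = 1000 // 20        # 50 packets/sec per one-way stream
server_pps = 5_000_000             # total packets/sec on one x86 server

streams = server_pps // pps_per_stream
print(f"~{streams:,} concurrent one-way RTP streams")   # ~100,000
# A two-way call needs at least two such streams, and more packet
# operations if the SBC counts each relayed packet on both ingress
# and egress, so the per-call numbers depend on how you count.
```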
The other aspect where proprietary hardware is typically deployed is network protection. A very important role of an SBC is to handle DDoS attacks. Traditionally, vendors have used network processor chips and content-addressable memories to manage blacklisting of the IP addresses of malicious sources. We are leveraging the x86 architecture by making very clever use of the cache and by writing software that is very close to the metal. Those techniques are hard for people who have traditionally relied on network processor chips.
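The blacklisting logic itself is simple; the engineering challenge is making the lookup run at millions of packets per second inside the CPU cache. As a toy illustration of the logic only, nothing like the close-to-the-metal implementation described above:

```python
# Toy illustration of SBC ingress blacklisting: drop packets from
# known-malicious sources before spending any further cycles on them.
import ipaddress

blacklist = set()                  # in real code this would be a compact,
                                   # cache-resident structure (e.g. a hash
                                   # table or Bloom filter), not a Python set

def note_malicious(src_ip: str) -> None:
    blacklist.add(int(ipaddress.ip_address(src_ip)))

def admit(src_ip: str) -> bool:
    """First check on the packet hot path: is the source blacklisted?"""
    return int(ipaddress.ip_address(src_ip)) not in blacklist

note_malicious("203.0.113.7")
assert not admit("203.0.113.7")    # blacklisted source is dropped
assert admit("198.51.100.1")       # unknown source passes through
```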
Prayson: You were really one of the first in the IP multimedia space if not THE first, correct?
Martin: I think we were one of the first in the delivery of VNFs in the NFV market as a whole. We first demonstrated a session border controller running in a cloud environment in February 2013. That’s about nine months before the white paper that coined the term Network Functions Virtualization came out.
We mystified a lot of people at that time. A lot of people didn’t really understand what that demonstration was about. So I do think we were well ahead of the game in terms of anticipating NFV and its implications.
We also took the step of starting to design VNFs specifically for cloud environments. One of the things that’s becoming very clear to the telcos is they won’t get the real benefits of NFV if they take software that used to run on an ATCA blade physical appliance and just port it without any architectural change to a virtualized environment.
I think they see even more benefits in NFV coming from VNFs that were designed from the ground up for the cloud. You take a very different approach to the architecture and the software when you do that.
To use a current term in this space, “microservices” is an architecture where you decompose a complex application into a lot of smaller individual pieces, each running in its own virtual machine or container, which together deliver a complex network function whose parts can be independently scaled and evolved.
That kind of approach is one that telcos are strongly encouraging their VNF vendors to follow. It’s one that we followed when we designed Clearwater, our IMS product, in early 2012.
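To make that decomposition concrete, here is a deliberately simplified sketch, with hypothetical component names and scaling metrics, of what independently scaled pieces means in practice:

```python
# Hypothetical illustration of microservice decomposition: each
# component of the network function runs in its own VMs/containers
# and scales on its own metric, independently of the others.
components = {
    "edge-proxy":  {"replicas": 4, "scale_on": "concurrent_connections"},
    "sip-router":  {"replicas": 6, "scale_on": "requests_per_second"},
    "state-store": {"replicas": 3, "scale_on": "storage_utilisation"},
}

def scale_decision(name: str, load: float, threshold: float) -> int:
    """Grow one component without touching the rest."""
    c = components[name]
    return c["replicas"] + 1 if load > threshold else c["replicas"]

# The SIP router scales out under load; the other components are untouched.
print(scale_decision("sip-router", load=0.9, threshold=0.8))  # 7
```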
Prayson: So Martin, one of the obstacles to NFV is that it’s not just recompiling software from appliances but re-architecting it to fit this cloud environment.
Martin: That’s not to say there aren’t benefits in deploying traditional software architectures in NFV. But there are even greater benefits if you properly re-architect the software.
I think that network operators do understand that initially they’re not going to have a tremendous choice of VNFs that are properly architected for the cloud. They are going to have to start with VNFs that probably began life as code running on hardware appliances. This is a transition that will take a number of years.
Metaswitch is ahead of the game here, at least with our Clearwater IMS core, because it already embodies that architecture.
Prayson: I think one example of that with Clearwater is its horizontal scalability. It’s one of the few, maybe the only one, able to operate in a cloud environment using additional resources to scale up and down based on policy and demand.
Martin: That’s because we adopted a web-scale architecture in the design of Clearwater where the message processing is all done by stateless elements and that’s backed up by a distributed fault-tolerant state store based on open source. We’re using Apache Cassandra for that. That design pattern is ideal when you want something that both dynamically scales and also scales very large.
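A minimal sketch of that stateless-plus-state-store pattern follows. It assumes the open source Python cassandra-driver package and a hypothetical registrations table; this is my illustration, not Clearwater code:

```python
# Minimal sketch of the stateless-worker pattern: any worker can handle
# any request, because all long-lived state lives in a distributed,
# fault-tolerant store (Cassandra here).
from cassandra.cluster import Cluster

cluster = Cluster(["cass-1", "cass-2", "cass-3"])   # hypothetical nodes
session = cluster.connect("ims")                    # hypothetical keyspace

def handle_register(user: str, contact: str, expires: int) -> None:
    # The worker keeps nothing in memory; it writes the binding to the
    # store and forgets it. Cassandra's TTL handles expiry.
    session.execute(
        "INSERT INTO registrations (user, contact) VALUES (%s, %s) "
        "USING TTL %s",
        (user, contact, expires),
    )

def lookup(user: str):
    # Any worker, including a freshly added one, can serve the lookup,
    # which is what makes dynamic scale-out and scale-in possible.
    rows = session.execute(
        "SELECT contact FROM registrations WHERE user = %s", (user,)
    )
    return [row.contact for row in rows]
```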
We’ve tested Clearwater to tens of millions of subscribers in cloud environments. It scales essentially without limit. That makes it much easier for network operators to deliver services that are as agile and dynamic as those put in place by some of their over-the-top competitors.
Prayson: You mentioned fault tolerance. That’s another aspect that’s different in these cloud-based environments. We’re moving from a system composed of highly available elements to a highly available system composed of unreliable elements.
Martin: Absolutely. One of the fundamental requirements for designing for the cloud is that you’ve got to allow for the fact that any individual physical server can go away at any time. And furthermore that an entire cloud instance can go away at any time. You have to design the application very defensively, allowing for failures at any level of the stack and ensuring that your service continues to be available regardless of what goes wrong.
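In practice that defensiveness shows up as mundane but pervasive patterns like the sketch below, a generic illustration rather than Metaswitch code: never assume a given backend instance still exists, and always have somewhere else to send the request.

```python
# Generic sketch of defensive, failure-tolerant calling: assume any
# single backend instance (or the whole zone hosting it) can vanish
# mid-request, and keep the service up by trying the next one.
import random

class AllBackendsFailed(Exception):
    pass

def call_with_failover(backends, request, attempts_per_backend=2):
    """Try each backend in random order before giving up."""
    for backend in random.sample(backends, len(backends)):
        for _ in range(attempts_per_backend):
            try:
                return backend(request)
            except (ConnectionError, TimeoutError):
                continue           # this instance may be gone; move on
    raise AllBackendsFailed(request)
```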
Prayson: When do you think you’ll see critical mass or large adoption by carriers for these types of solutions?
Martin: We started our first deployments of virtualized network functions back in the middle of 2014. I would hesitate to describe those as full-blown NFV environments. They were more like point solutions that were deployed on cloud software, but without the full vision of orchestration that NFV promises.
I think that represents the very early stage of adoption dating back a year now or more. We expect to see our first fully orchestrated, full implementation of the NFV vision being deployed by the end of this calendar year. For critical mass you’re probably talking about the middle of 2016 for things to really start ramping up. It’s pretty near term.
Prayson: Having been involved in this from some very early days, you’ve had a long history especially on the implementation side. What is the most surprising or unexpected thing that you’ve found during this period?
Martin: The thing that surprised us, and actually gratified us, is the extent to which the promise of NFV has really taken root amongst most of the large telcos. Many of them are embracing the NFV vision with tremendous enthusiasm and engineering top-down transformations not just of their technology base but also of their organization and culture, which are key aspects of this transition.
It was very unclear at the very beginning of NFV, despite the fact that there was a lot of telco backing for the original concept, just how many of them would actually embrace it and do something concrete about it.
It represents a huge challenge for them, but we have been very pleasantly surprised by the rate at which it’s being adopted.
Prayson: I think that cultural and organizational aspect is something some people are talking about, but it’s not clear that the overall industry recognizes how important it is. I think they know how hard it is going to be. But I don’t think they recognize how important it is to change how they work to get the benefits of these cloud-based architectures. There is a fundamental shift in terms of breaking down the silos and moving from a waterfall model to agile and DevOps, both to go much faster and to close the loop with the customer.
Martin: The biggest single challenge they face in that transition is moving away from the idea of a “one throat to choke” relationship with a vendor for a particular service, and accepting responsibility for standing up services where there is a common, generic NFV infrastructure and the services themselves are put together from a collection of virtualized network functions, which will almost certainly come from a different set of vendors.
Making the whole thing work to deliver services does require some kind of system integration, and proving and testing of the combination of the VNFs and the underlying NFV infrastructure. That is one of the biggest things that the network operators have to embrace.
Many network operators are beginning to experience the limitations of the “one throat to choke” approach because in some cases no matter how hard you choke that throat you’re still not necessarily getting what you want out of the service.
Over the last decade we saw quite a strong swing of the pendulum towards complete outsourcing, complete dependence on the vendors. With NFV, we are seeing quite a strong swing of the pendulum back towards more insourcing and more responsibility on the shoulders of the network operator for making the services work.
Prayson: Some of the statements coming from AT&T have been the strongest in that vein, talking about how they were not getting the innovation from their Tier 1 suppliers and needed to broaden the array of suppliers they’re working with to create an ecosystem of suppliers. They have been the loudest, but I’m seeing indications of that from nearly all of them.
Martin: They have written very eloquently and publicly about their plans in that direction. I think a lot of other network operators are paying very close attention to AT&T’s approach and borrowing ideas voraciously.
Prayson: In addition to being a leader in the NFV movement, Metaswitch has been a leader in the open source movement. What have been the benefits, as well as the drawbacks, of that approach?
Martin: The benefit of putting Clearwater into open source has been a tremendous boost in visibility for Metaswitch in the NFV space and in IMS particularly.
It’s proven to be a very powerful tool in helping educate network operators about the possibilities of NFV. One of the things that we found very early on – dating back to early 2014 – is that quite a number of network operators downloaded Clearwater. It’s open source and you can go get it from the ProjectClearwater.org web site.
They really used it as a tool to explore NFV as a whole, but also to pick it apart to understand the architectural principles. One of the reasons that network operators, at least those we have talked to, have gotten quite wised up about web-scale architectures and decomposed applications and that whole approach is because they saw Clearwater as an absolutely prime example of what was possible if you set out to design a virtualized network function with the cloud as its native environment.
I do think we had quite a strong role in educating network operators as to the possibilities of NFV and the fact that this is something that was real and could be deployed. We’ve had Clearwater in production networks since late 2014 working very happily.
The fact that we went open source with it was powerful in raising our visibility. I suppose the drawback is that it limits the amount of revenue we can get specifically from that product. There is no upfront licensing fee. If you want support, we charge you for support based on the volume of calls or subscribers you are going to support from a Clearwater deployment.
We’re not open sourcing our entire product portfolio. Products like Perimeta, our session border controller, and our MetaSphere telephony application server continue to be commercially licensed virtualized network functions, complementing Clearwater. There are upfront license fees for those.
Prayson: Have you been able to get good community involvement in the Clearwater project?
Martin: We’ve had a lot of good feedback from people who have worked with it. We’ve had a smattering of enhancements that third parties have done. I wouldn’t say that we’ve got a massive community around it.
It’s worth pointing out that IMS is a pretty rarefied area of technology. Traditionally, IMS is a technology that’s been embraced by Tier 1 network operators and supplied by Tier 1 network vendors. So, it’s not like you’ve got legions of software tinkerers who have any idea what IMS is all about.
Building a community around it wasn’t our primary objective in putting it out there in open source. It was mainly to give people maximally easy access to the software to help learn about NFV and help learn about Metaswitch’s capabilities.
Prayson: Another way to achieve that same goal would be to deliver Clearwater not as open source but as a zero-cost license and then charge for support. How did you make that decision? And do you believe that making it open source helped with some of the points you mentioned, in terms of helping the operators understand how to structure these types of software products for web-scale cloud deployment?
Martin: If you are going to make a piece of software freely available it makes sense just to open source it. We felt that from the education standpoint the fact that the source is out there and people could pick it apart to see how it works is a good thing.
You get a lot more visibility if it’s open source than if you just say it’s free to try.
Prayson: I have one last topic I’d like to discuss. Licensing has come up as an issue as our customers have moved from trials to being ready for deployment. Operators are interested in varying licensing models, especially pay-as-you-go or paying as revenue is captured. How is Metaswitch handling licensing?
Martin: Licensing is a problem we identified very early on: once you produce a piece of software for the cloud, it can be cloned an effectively unlimited number of times. Obviously, you want to enable that capability because you want the customer to be able to dynamically scale your virtualized network function.
In the past, when we were licensing software for use on appliances, we would use traditional techniques like tying the software license to some unique identifier on the box, such as a MAC address. But you can’t do that if you’re delivering a virtual network function as an image. Typically, network operators don’t want any kind of physical token or authentication factor in the network tied to that VNF image.
One of the things we did quite early on was develop a license management framework. It tracks the virtualized network function components deployed in the network, either by how many instances are out there or by some property of each instance, like how many concurrent sessions it is supporting or how many subscribers are registered, depending on how the network operator wants to pay for the right to use that software. Typically, when we ship our VNFs we ship this licensing framework with them.
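As a rough sketch of the idea, with hypothetical names since the framework’s actual interfaces aren’t described here, usage metering for a VNF might look like this:

```python
# Hypothetical sketch of a VNF license-usage tracker: meter whichever
# dimension the operator pays on (instances, concurrent sessions, or
# registered subscribers), rather than tying the license to hardware.
from dataclasses import dataclass, field

@dataclass
class VnfUsage:
    instances: set = field(default_factory=set)
    concurrent_sessions: int = 0
    registered_subscribers: int = 0

    def instance_up(self, instance_id: str) -> None:
        self.instances.add(instance_id)       # a clone just came online

    def instance_down(self, instance_id: str) -> None:
        self.instances.discard(instance_id)   # scaled in or failed

    def billable_units(self, model: str) -> int:
        # The operator chooses the dimension they pay on.
        return {
            "per_instance": len(self.instances),
            "per_session": self.concurrent_sessions,
            "per_subscriber": self.registered_subscribers,
        }[model]

usage = VnfUsage()
usage.instance_up("sbc-vm-01")
usage.instance_up("sbc-vm-02")
print(usage.billable_units("per_instance"))   # 2
```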
We’ve always said this is a problem that everybody’s going to have. In some of the discussions we’ve had with network operators they’ve realized they don’t really want to be wrestling with each and every vendor’s different license management framework.
The more enlightened ones are putting in place some generic framework that everybody will need to work to. That creates another challenge for vendors. It means they’re going to have to integrate their VNF software with whatever licensing framework the network operators put in place.
There aren’t any standard solutions here. There aren’t any open source solutions for this. We’re seeing network operators talking about writing their own.
There’s an opportunity here for an enterprising software vendor to go out there and build a product that does license management for VNFs.
Prayson: That was great information. Martin, I appreciate your time. Thank you for that.
For other entries in the Real CTOs of NFV series, please click here.