Network service outages happen. It’s not a matter of if but when. Cloud platforms and content delivery networks (CDNs) with 100% uptime SLAs aren’t immune. They experience outages just like everything else.
The question is: what do you do when one of your network services goes down? Will the lack of redundant services knock you offline? Or will you fail over to another provider and maintain a seamless user experience? On the back end, how will that failover process work? Will it be automated or manual?
Most midsize and large organizations have redundant systems in place to help them survive an outage. What they might or might not have in place is the automated mechanism that redirects traffic to those redundant systems when a core service goes down.
IBM NS1 Connect Filter Chain™ technology uses the power of DNS to automatically reroute traffic between service providers when there is a network service disruption. With a few basic rules in place, NS1 Connect monitors your network’s status and switches endpoints as needed. You set the rules and the priorities upfront; everything after that happens automatically.
On the NS1 platform, filter chain configurations are applied to individual records within DNS zones. Filter chains determine how NS1 handles queries against each record—specifically, which answers to return. Each filter chain uses a unique logic to process queries. You can create combinations of filters to achieve a specific outcome based on your operational or business needs.
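As a rough mental model (not NS1's actual implementation), a filter chain can be pictured as an ordered list of functions, each of which takes the record's candidate answers and returns a narrowed or reordered list. The function names, field names and hostnames in this Python sketch are hypothetical.

```python
from functools import reduce

def apply_chain(answers, filters):
    """Run each filter in order, feeding its output to the next one."""
    return reduce(lambda current, f: f(current), filters, answers)

# Two toy filters, for illustration only.
def drop_down(answers):
    """Keep only answers marked as up."""
    return [a for a in answers if a["up"]]

def first_one(answers):
    """Keep only the first remaining answer."""
    return answers[:1]

record_answers = [
    {"host": "a.example.com", "up": False},
    {"host": "b.example.com", "up": True},
]

result = apply_chain(record_answers, [drop_down, first_one])
print(result)
```

In this model the order of filters matters: running `first_one` before `drop_down` would select the down endpoint first and then discard it, leaving no answer at all.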
Of course, not everyone wants to direct failover traffic in the same way. So, we’ve put together a quick guide on how to build active-active, active-passive and manual failover systems by using filter chains.
Active-active failover
In this use case, NS1 or third-party data sources monitor the status of individual endpoints in your application delivery infrastructure. When the data indicates an outage on one system, NS1 automatically routes traffic to the secondary systems you choose. It’s called “active-active” because those secondary systems are probably up and running as part of your load balancing system anyway. When there is an outage in one system, NS1 just rebalances the load toward the already active systems.
The first filter in the chain is “Up”. This filter tells the system whether the service provider’s endpoint is operational or not.
The second filter in the chain is either “Shuffle” or “Weighted Shuffle”. If the “Up” filter marks any endpoint as down, this second filter distributes traffic across the remaining providers. Shuffle distributes traffic randomly, while Weighted Shuffle distributes it based on weights you provide.
Finally, specify how many answers you want DNS to return for inbound queries. The “Select First N” filter sets the number of answers that are returned to the requesting client. RFC 1912 specifies that only one answer should be returned for a CNAME query, so for CNAME records N should be one.
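The active-active chain above can be sketched in plain Python. This is a simplified simulation under stated assumptions, not the NS1 API: the `up` and `weight` fields, the hostnames and all function names are hypothetical.

```python
import random

# Hypothetical in-memory model of one record's answers.
answers = [
    {"host": "cdn-a.example.com", "up": True, "weight": 70},
    {"host": "cdn-b.example.com", "up": True, "weight": 30},
    {"host": "cdn-c.example.com", "up": False, "weight": 50},
]

def up_filter(answers):
    """Keep only answers whose endpoint is marked operational."""
    return [a for a in answers if a["up"]]

def weighted_shuffle(answers, rng):
    """Order answers randomly; higher weights tend to land earlier."""
    pool = list(answers)
    ordered = []
    while pool:
        pick = rng.choices(pool, weights=[a["weight"] for a in pool], k=1)[0]
        pool.remove(pick)
        ordered.append(pick)
    return ordered

def select_first_n(answers, n=1):
    """Return at most the first n answers."""
    return answers[:n]

def resolve(answers, rng, n=1):
    """Apply the chain: Up, then Weighted Shuffle, then Select First N."""
    return select_first_n(weighted_shuffle(up_filter(answers), rng), n)

rng = random.Random()
print(resolve(answers, rng))
```

Because `cdn-c` is marked down, it never appears in a response, and the remaining load is split between `cdn-a` and `cdn-b` roughly in proportion to their weights.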
Active-passive failover
As in the active-active use case, NS1 or third-party data sources monitor the status of your application delivery infrastructure and route traffic to secondary systems in the event of a primary system outage. The difference here is that the secondary systems may not be handling traffic already—they’re only spun up when needed as a redundant option.
As in the previous example, the first filter in this chain is “Up”. Drawing from monitoring data, NS1 figures out which of the underlying services are online.
The second filter in this chain is “Priority”. This filter creates a logic that prioritizes active systems over passive or backup systems. If the higher priority answers are available, they will sort to the first position on the possible answer list. If not, NS1 continues down the priority list until it finds an available resource.
Finally, “Select First N” dictates the number of answers to deliver. In this case, you’d want it to deliver one answer.
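A minimal Python sketch of this active-passive chain follows. It is an illustrative model rather than NS1 code: the field names, the hostnames and the convention that a lower number means higher priority are assumptions made for the example.

```python
def up_filter(answers):
    """Keep only answers whose endpoint is marked operational."""
    return [a for a in answers if a["up"]]

def priority_filter(answers):
    """Sort answers by priority (assumption: lower number ranks first)."""
    return sorted(answers, key=lambda a: a["priority"])

def select_first_n(answers, n=1):
    """Return at most the first n answers."""
    return answers[:n]

def resolve(answers, n=1):
    """Apply the chain: Up, then Priority, then Select First N."""
    return select_first_n(priority_filter(up_filter(answers)), n)

answers = [
    {"host": "backup.example.com", "up": True, "priority": 2},
    {"host": "primary.example.com", "up": True, "priority": 1},
]

normal = resolve(answers)        # primary is up, so it is returned
answers[1]["up"] = False         # simulate monitoring reporting an outage
failed_over = resolve(answers)   # only the backup remains

print(normal[0]["host"], failed_over[0]["host"])
```

In this model, if every endpoint is marked down the chain returns no answers; how the real platform behaves in that situation is not covered here.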
Manual failover
Sometimes you want to make failover decisions only after you know more about the situation. In these cases, the filter chain is the implementation mechanism that you use once you’ve determined where you want traffic to go. Instead of pointing a data feed at NS1, you manually trigger the failover when it’s needed, using the same active-passive logic.
The first filter in this chain is “Up”, with the difference here that you manually define which services are up and down (instead of a data feed doing that for you).
The second filter in this chain is “Priority”, which ranks active systems above passive or backup systems. If the higher priority answers are available, they sort to the first position on the possible answer list. If not, NS1 continues down the priority list until it finds an available resource.
Finally, “Select First N” dictates the number of answers to deliver. In this case, you’d want it to deliver one answer.
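The manual variant can be sketched the same way, with the up/down state held in a table that only an operator changes. This is a hypothetical model: `manual_status`, the hostnames and the lower-number-wins priority convention are assumptions, not NS1 identifiers.

```python
# Operator-maintained status table; no monitoring feed writes to it.
manual_status = {
    "primary.example.com": True,
    "backup.example.com": True,
}

ANSWERS = [
    {"host": "primary.example.com", "priority": 1},
    {"host": "backup.example.com", "priority": 2},
]

def resolve(answers, status, n=1):
    """Up (manual flags), then Priority, then Select First N."""
    up = [a for a in answers if status.get(a["host"], False)]
    ranked = sorted(up, key=lambda a: a["priority"])
    return [a["host"] for a in ranked[:n]]

before = resolve(ANSWERS, manual_status)

manual_status["primary.example.com"] = False  # operator decides to fail over
after = resolve(ANSWERS, manual_status)

manual_status["primary.example.com"] = True   # operator fails back
restored = resolve(ANSWERS, manual_status)

print(before, after, restored)
```

Nothing changes until the operator flips a flag, which is the point of the manual approach: the chain only carries out a decision that a person has already made.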
Multi-cloud or multi-CDN availability
In the “active-active” scenario above, the filter chain uses a simple up/down metric to steer traffic. However, sometimes service availability is more nuanced. For example, services sometimes experience regional outages that result in poor service quality—while the service as a whole is technically “up”, it may not be performing at optimal capacity. This filter chain lets you add some nuance to what is considered “up”, using NS1 Connect’s advanced analytics tool as the data source.
The first filter in this chain is “Pulsar Availability Threshold”. This filter lets you set a percentage threshold for availability; a service is used only if its availability metrics meet that threshold.
The second filter in the chain is “Weighted Shuffle”, which distributes traffic among the providers that meet the definition of “available” from the first filter. Traffic is distributed based on weights that you provide.
The third filter is “Pulsar Performance Sort”, which takes the weighted distribution from the previous filter and directs traffic to the fastest available service, eliminating low-performing services based on a threshold you define.
Finally, “Select First N” dictates the number of answers to deliver. In this case, you’d want it to deliver one answer.
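The four-step chain above can be approximated in Python as follows. This is a sketch under loud assumptions: the `availability` and `latency_ms` fields stand in for Pulsar data, the performance threshold is modeled as a maximum acceptable latency, and none of the names are NS1 identifiers.

```python
import random

# Hypothetical measurements per provider.
answers = [
    {"host": "cdn-a.example.com", "availability": 99.5, "latency_ms": 40, "weight": 50},
    {"host": "cdn-b.example.com", "availability": 92.0, "latency_ms": 25, "weight": 50},
    {"host": "cdn-c.example.com", "availability": 99.9, "latency_ms": 55, "weight": 30},
    {"host": "cdn-d.example.com", "availability": 98.0, "latency_ms": 300, "weight": 20},
]

def availability_threshold(answers, min_percent):
    """Drop answers whose availability is below the threshold."""
    return [a for a in answers if a["availability"] >= min_percent]

def weighted_shuffle(answers, rng):
    """Order answers randomly; higher weights tend to land earlier."""
    pool = list(answers)
    ordered = []
    while pool:
        pick = rng.choices(pool, weights=[a["weight"] for a in pool], k=1)[0]
        pool.remove(pick)
        ordered.append(pick)
    return ordered

def performance_sort(answers, max_latency_ms):
    """Drop slow answers, then sort fastest first.

    sorted() is stable, so equally fast answers keep their shuffled order.
    """
    fast_enough = [a for a in answers if a["latency_ms"] <= max_latency_ms]
    return sorted(fast_enough, key=lambda a: a["latency_ms"])

def resolve(answers, rng, min_percent=95.0, max_latency_ms=200, n=1):
    """Availability threshold, Weighted Shuffle, performance sort, first N."""
    available = availability_threshold(answers, min_percent)
    shuffled = weighted_shuffle(available, rng)
    return performance_sort(shuffled, max_latency_ms)[:n]

print(resolve(answers, random.Random()))
```

With a 95% threshold, `cdn-b` is excluded despite having the lowest latency, because its availability is below the bar; `cdn-d` passes the availability check but is dropped for being too slow. In this model the shuffle only decides the order among equally fast answers.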
For more information on how to use filter chains to improve performance and resilience, decrease costs and more, see the resource below.
Guard against outages with resilient, redundant network services