
Decreasing BGP Failover Time Using IP SLA

Posted in Cisco Networking on June 13th, 2012


Service providers are moving away from TDM-based point-to-point circuits, and Metro Ethernet is increasingly what gets provisioned to the customer site.

This creates a problem. When your BGP peer becomes unreachable, the local Ethernet interface on the CE stays up/up because it is probably connected to some Layer 2 device, so the failure is only noticed when the BGP hold timer expires. With the default hold time of 180 seconds, the customer network could suffer a complete outage for up to 3 minutes. A customer that has been sold a 100M pipe with resilience is not going to be happy with that.

Here is the topology I am using for this example:

Gateways 1 & 2 have an iBGP neighborship over the f0/0 cross-link and provide a virtual default gateway using HSRP on the f0/1 LAN.

The LAN_HOST is not aware of routing and simply has a default route pointed to the HSRP address 192.168.1.1.

Gateway 1 is the primary router for inbound and outbound traffic. This is enforced using the following policies:
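Taken from the full gateway configs at the end of this post, the policy lines look roughly like this. Gateway 1 raises local preference on the route learned from PE1, so outbound traffic prefers it, and the two gateways advertise the LAN with different MEDs, so AS200 prefers Gateway 1 for inbound traffic.

```
! Gateway 1: prefer the eBGP-learned path outbound, advertise a low MED
ip prefix-list INTERNET seq 5 permit 198.77.64.40/32
ip prefix-list PRIMARY seq 5 permit 192.168.1.0/24
!
route-map INTERNET permit 10
 match ip address prefix-list INTERNET
 set local-preference 150
route-map INTERNET permit 20
!
route-map PRIMARY permit 10
 match ip address prefix-list PRIMARY
 set metric 50
route-map PRIMARY permit 20
!
router bgp 100
 neighbor 10.1.1.253 route-map INTERNET in
 neighbor 10.1.1.253 route-map PRIMARY out
!
! Gateway 2: advertise the same LAN prefix with a higher (less preferred) MED
ip prefix-list BACKUP seq 5 permit 192.168.1.0/24
!
route-map BACKUP permit 10
 match ip address prefix-list BACKUP
 set metric 200
route-map BACKUP permit 20
!
router bgp 100
 neighbor 10.2.2.253 route-map BACKUP out
```

The empty `permit 20` entries let every other prefix through unchanged.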

Under normal conditions, the LAN can reach the Internet.

Here is an extended PING while PE1 experiences an outage.

That’s a long outage!

There are 2 problems:

Now let’s speed things up.

Using IP SLA & BGP Failover

On Gateway 1, create an IP SLA operation that pings the eBGP peer 10.1.1.253 every 5 seconds.
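For reference, these are the matching lines from the full Gateway 1 config at the end of this post:

```
! Gateway 1: ICMP echo probe towards the eBGP peer, sent every 5 seconds
ip sla 1
 icmp-echo 10.1.1.253
 frequency 5
! Start the probe immediately and keep it running indefinitely
ip sla schedule 1 life forever start-time now
```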

Next, create a track object that follows this operation. Use number 2, because object 1 is already used by the HSRP interface tracker.
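In the full Gateway 1 config at the end of this post, the track object is a single line. The `rtr` keyword is what this IOS release uses. As far as I know, later releases replace it with `ip sla`, so treat that alternative as an assumption and check the command reference for your release.

```
! Gateway 1: track object 2 follows the state of IP SLA operation 1
! (assumption: on newer IOS releases this may be "track 2 ip sla 1")
track 2 rtr 1
```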

Next, create a /32 static route to the peer, with the peer itself as the next hop, that stays in the routing table only while the tracked object is up. Being more specific, it takes precedence over the /30 connected route.
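The static route, as it appears in the full Gateway 1 config at the end of this post:

```
! Gateway 1: host route to the eBGP peer, installed only while track 2 is up
ip route 10.1.1.253 255.255.255.255 GigabitEthernet1/0 10.1.1.253 track 2
```

When track 2 goes down, this route is withdrawn and only the connected 10.1.1.252/30 remains, which is what the "after" routing table at the end of this post shows.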

Next create a prefix list to match this route.
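From the full Gateway 1 config at the end of this post, the prefix list matches exactly the /32 host route and nothing else:

```
! Gateway 1: matches only the tracked host route, not the connected /30
ip prefix-list PEER_REACHABLE seq 5 permit 10.1.1.253/32
```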

Then create a route-map which matches the prefix-list.
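The route-map from the full Gateway 1 config at the end of this post is a single permit entry:

```
! Gateway 1: only routes permitted here count as a valid route to the peer
route-map PEER_REACHABLE permit 10
 match ip address prefix-list PEER_REACHABLE
```

Because the connected /30 does not match the /32 prefix list, it is not accepted as a route to the peer once the static route is gone.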

Finally, under the BGP process, add a neighbor statement that ties the route-map to the BGP fall-over feature.

The full command is: “neighbor 10.1.1.253 fall-over route-map PEER_REACHABLE”
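In context, using the lines from the full Gateway 1 config at the end of this post, the statement sits alongside the existing eBGP neighbor definition:

```
! Gateway 1: drop the eBGP session as soon as no route permitted by
! PEER_REACHABLE covers the peer address
router bgp 100
 neighbor 10.1.1.253 remote-as 200
 neighbor 10.1.1.253 fall-over route-map PEER_REACHABLE
```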

With this in place, the outage is shorter: the eBGP session on Gateway 1 is torn down as soon as the tracked route to the peer disappears, which purges the stale routes from the routing table instead of leaving them there until the hold timer expires.

A final touch is to switch over the HSRP active role as well, by tracking the same object we created for BGP failover. This avoids suboptimal routing.
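The HSRP lines involved, taken from the full gateway configs at the end of this post, show why a decrement of 10 is enough. Gateway 2 has no explicit priority, so it runs at the HSRP default of 100.

```
! Gateway 1: starts at priority 105 and drops by 10 when track 2 goes down
interface FastEthernet0/1
 standby 1 ip 192.168.1.1
 standby 1 priority 105
 standby 1 preempt
 standby 1 track 2 decrement 10
!
! Gateway 2: default priority 100 with preempt, so it takes over once
! Gateway 1 falls to 105 - 10 = 95
interface FastEthernet0/1
 standby 1 ip 192.168.1.1
 standby 1 preempt
```

Preempt on both routers also lets Gateway 1 reclaim the active role when the track object comes back up and its priority returns to 105.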

This at least removes a hop from our trace.

There is little more we can do on the customer’s network AS100 as the remaining failover delay exists on the service provider network AS200.

PE1 & PE2 peer using loopbacks learned via OSPF, with the next-hop-self option set. The default OSPF dead interval on a broadcast segment is 40 seconds, which applies here to the 172.16.1.0/30 link between the PEs. When the OSPF adjacency dies, the BGP next hop becomes unreachable and the associated routes are removed long before the BGP peering times out.

Just for giggles, changing the OSPF hello & dead intervals to 2 & 6 seconds respectively results in the following improved failover time.
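For reference, the timer change appears in the PE configs at the end of this post as the following interface commands, applied on both PE1 and PE2:

```
! PE1 and PE2: the link between the PEs is FastEthernet2/0 on both routers
interface FastEthernet2/0
 ip ospf hello-interval 2
 ip ospf dead-interval 6
```

The hello and dead intervals have to match on both ends of the link, otherwise the OSPF adjacency will not form.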

We could just avoid all this headache and reduce the BGP hold timers in the first place, but that would be no fun :-)
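As a rough sketch of that simpler alternative, per-neighbor timers could be set on Gateway 1 as shown here. The values of 10 and 30 seconds are illustrative assumptions, not something tested in this lab.

```
! Gateway 1: hypothetical values, not taken from the lab configs
! keepalive 10 seconds, hold time 30 seconds for this neighbor only
router bgp 100
 neighbor 10.1.1.253 timers 10 30
```

The hold time actually used is the lower of the values proposed by the two peers, and the new timers only take effect once the session is re-established.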

I am open to constructive criticism from the senior forum members as to what better designs could be deployed in this scenario.

I hope you enjoyed reading and it has been beneficial for you.

For completeness, please see below the configs for both Gateways and PEs.

Gateway 1

track 2 rtr 1
!
!
!
!
interface FastEthernet0/0
ip address 192.168.255.1 255.255.255.252
duplex auto
speed auto
!
interface FastEthernet0/1
ip address 192.168.1.253 255.255.255.0
duplex full
speed 100
standby 1 ip 192.168.1.1
standby 1 priority 105
standby 1 preempt
standby 1 track GigabitEthernet1/0
standby 1 track 2 decrement 10
!
interface GigabitEthernet1/0
ip address 10.1.1.254 255.255.255.252
negotiation auto
!
router bgp 100
no synchronization
bgp log-neighbor-changes
network 192.168.1.0
neighbor 10.1.1.253 remote-as 200
neighbor 10.1.1.253 fall-over route-map PEER_REACHABLE
neighbor 10.1.1.253 route-map INTERNET in
neighbor 10.1.1.253 route-map PRIMARY out
neighbor 192.168.255.2 remote-as 100
neighbor 192.168.255.2 next-hop-self
no auto-summary
!
ip forward-protocol nd
ip route 10.1.1.253 255.255.255.255 GigabitEthernet1/0 10.1.1.253 track 2
ip route 0.0.0.0 0.0.0.0 198.77.64.40
no ip http server
no ip http secure-server
!
!
!
!
ip prefix-list INTERNET seq 5 permit 198.77.64.40/32
!
ip prefix-list PEER_REACHABLE seq 5 permit 10.1.1.253/32
!
ip prefix-list PRIMARY seq 5 permit 192.168.1.0/24
ip sla 1
icmp-echo 10.1.1.253
frequency 5
ip sla schedule 1 life forever start-time now
logging alarm informational
!
!
!
route-map PEER_REACHABLE permit 10
match ip address prefix-list PEER_REACHABLE
!
route-map INTERNET permit 10
match ip address prefix-list INTERNET
set local-preference 150
!
route-map INTERNET permit 20
!
route-map PRIMARY permit 10
match ip address prefix-list PRIMARY
set metric 50
!
route-map PRIMARY permit 20

Gateway 2

interface FastEthernet0/0
ip address 192.168.255.2 255.255.255.252
duplex auto
speed auto
!
interface FastEthernet0/1
ip address 192.168.1.254 255.255.255.0
duplex auto
speed auto
standby 1 ip 192.168.1.1
standby 1 preempt
standby 1 track GigabitEthernet1/0
!
interface GigabitEthernet1/0
ip address 10.2.2.254 255.255.255.252
negotiation auto
!
router bgp 100
no synchronization
bgp log-neighbor-changes
network 192.168.1.0
neighbor 10.2.2.253 remote-as 200
neighbor 10.2.2.253 route-map BACKUP out
neighbor 192.168.255.1 remote-as 100
neighbor 192.168.255.1 next-hop-self
no auto-summary
!
ip forward-protocol nd
ip route 0.0.0.0 0.0.0.0 198.77.64.40
no ip http server
no ip http secure-server
!
!
!
!
ip prefix-list BACKUP seq 5 permit 192.168.1.0/24
!
ip prefix-list INTERNET seq 5 permit 198.77.64.40/32
logging alarm informational
!
!
!
route-map BACKUP permit 10
match ip address prefix-list BACKUP
set metric 200
!
route-map BACKUP permit 20

PE1

interface Loopback0
ip address 1.1.1.1 255.255.255.255
!
interface FastEthernet0/0
no ip address
shutdown
duplex half
!
interface GigabitEthernet1/0
ip address 10.1.1.253 255.255.255.252
negotiation auto
!
interface FastEthernet2/0
ip address 172.16.1.1 255.255.255.252
ip ospf hello-interval 2
ip ospf dead-interval 6
duplex auto
speed auto
!
interface FastEthernet2/1
no ip address
shutdown
duplex auto
speed auto
!
interface GigabitEthernet3/0
no ip address
shutdown
negotiation auto
!
router ospf 1
log-adjacency-changes
network 1.1.1.1 0.0.0.0 area 0
network 172.16.0.0 0.0.255.255 area 0
!
router bgp 200
no synchronization
bgp log-neighbor-changes
neighbor 2.2.2.2 remote-as 200
neighbor 2.2.2.2 update-source Loopback0
neighbor 2.2.2.2 next-hop-self
neighbor 10.1.1.254 remote-as 100
no auto-summary

PE2

interface Loopback0
ip address 2.2.2.2 255.255.255.255
!
interface FastEthernet0/0
no ip address
shutdown
duplex half
!
interface GigabitEthernet1/0
ip address 10.2.2.253 255.255.255.252
negotiation auto
!
interface FastEthernet2/0
ip address 172.16.1.2 255.255.255.252
ip ospf hello-interval 2
ip ospf dead-interval 6
duplex auto
speed auto
!
interface FastEthernet2/1
no ip address
shutdown
duplex auto
speed auto
!
interface GigabitEthernet3/0
ip address 10.3.3.253 255.255.255.252
negotiation auto
!
router ospf 1
log-adjacency-changes
network 2.2.2.2 0.0.0.0 area 0
network 172.16.0.0 0.0.255.255 area 0
!
router bgp 200
no synchronization
bgp log-neighbor-changes
neighbor 1.1.1.1 remote-as 200
neighbor 1.1.1.1 update-source Loopback0
neighbor 1.1.1.1 next-hop-self
neighbor 10.2.2.254 remote-as 100
neighbor 10.3.3.254 remote-as 40
no auto-summary

Finally, here are the IP routing and BGP tables on Gateway 1 before and after a failover.

Gateway1#show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route

Gateway of last resort is 198.77.64.40 to network 0.0.0.0

10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C       10.1.1.252/30 is directly connected, GigabitEthernet1/0
S       10.1.1.253/32 [1/0] via 10.1.1.253, GigabitEthernet1/0
192.168.255.0/30 is subnetted, 1 subnets
C       192.168.255.0 is directly connected, FastEthernet0/0
C    192.168.1.0/24 is directly connected, FastEthernet0/1
198.77.64.0/32 is subnetted, 1 subnets
B       198.77.64.40 [20/0] via 10.1.1.253, 00:01:04
S*   0.0.0.0/0 [1/0] via 198.77.64.40
Gateway1#show ip bgp sum
Gateway1#show ip bgp
BGP table version is 30, local router ID is 192.168.255.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

Network          Next Hop            Metric LocPrf Weight Path
* i192.168.1.0      192.168.255.2            0    100      0 i
*>                  0.0.0.0                  0         32768 i
*> 198.77.64.40/32  10.1.1.253                    150      0 200 40 i
Gateway1#
*May 28 23:41:08.563: %TRACKING-5-STATE: 2 rtr 1 state Up->Down
*May 28 23:41:08.563: %BGP-5-ADJCHANGE: neighbor 10.1.1.253 Down Route to peer lost
Gateway1#
*May 28 23:41:09.631: %HSRP-5-STATECHANGE: FastEthernet0/1 Grp 1 state Active -> Speak
Gateway1#
Gateway1#
Gateway1#
Gateway1#
Gateway1#
*May 28 23:41:19.631: %HSRP-5-STATECHANGE: FastEthernet0/1 Grp 1 state Speak -> Standby
Gateway1#show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route

Gateway of last resort is 198.77.64.40 to network 0.0.0.0

10.0.0.0/30 is subnetted, 1 subnets
C       10.1.1.252 is directly connected, GigabitEthernet1/0
192.168.255.0/30 is subnetted, 1 subnets
C       192.168.255.0 is directly connected, FastEthernet0/0
C    192.168.1.0/24 is directly connected, FastEthernet0/1
198.77.64.0/32 is subnetted, 1 subnets
B       198.77.64.40 [200/0] via 192.168.255.2, 00:00:16
S*   0.0.0.0/0 [1/0] via 198.77.64.40
Gateway1#show ip bgp
BGP table version is 32, local router ID is 192.168.255.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

Network          Next Hop            Metric LocPrf Weight Path
* i192.168.1.0      192.168.255.2            0    100      0 i
*>                  0.0.0.0                  0         32768 i
*>i198.77.64.40/32  192.168.255.2            0    100      0 200 40 i
