BGPv4 is an Exterior Gateway Protocol (EGP). It was published in 1995 as RFC 1771 and is now defined in RFC 4271. The major difference between BGPv4 and earlier versions of BGP is that BGPv4 is classless and supports CIDR.
BGP is primarily used to propagate and advertise public networks across the Internet. A large majority of Internet communication is made possible by BGP. Autonomous System (AS) numbers are assigned to organizations that want to advertise their networks (IP ranges) to the Internet. AS numbers are allocated by the Internet Assigned Numbers Authority (IANA) to the Regional Internet Registries (RIRs), which then assign specific AS numbers to the ISPs or companies requesting them.
Unlike IGPs, BGP is connection based and uses TCP port 179 to communicate with peers. Since TCP is used, routing via an IGP or static routes must be in place before BGP peering can establish.
Since each BGP node relies on its neighbors to pass along routes, BGP is considered a distance vector protocol. Each node makes route calculations based on the routes advertised by its BGP peering neighbors. Unlike other distance vector protocols, BGP uses a route's AS_PATH to determine the best path for each route. For this reason BGP is commonly called a path vector protocol.
Packet Types/Neighbor States
Open Message - Sent after the TCP connection is established. This message is used to identify the sending router and to specify operational parameters.
- Open message includes:
- BGP version number
- AS number
- Hold time
- BGP ID (highest loopback IP, or highest physical IP if no loopback exists)
- Optional Parameters
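The Open message fields listed above can be illustrated with a minimal Python sketch. It assumes the RFC 4271 wire layout (a 19-octet header followed by version, AS number, hold time, BGP ID and optional parameters) and a 2-octet AS number. The function names `build_open` and `parse_open` are hypothetical and exist only for this example.

```python
import socket
import struct

BGP_MARKER = b"\xff" * 16  # header marker: 16 octets, all ones
MSG_OPEN = 1               # message type code for OPEN
HEADER_LEN = 19            # marker (16) + length (2) + type (1)


def build_open(my_as, hold_time, bgp_id, opt_params=b""):
    """Pack an OPEN message (hypothetical helper, 2-octet AS only)."""
    body = struct.pack(
        "!BHH4sB",
        4,                         # BGP version number
        my_as,                     # AS number
        hold_time,                 # hold time in seconds
        socket.inet_aton(bgp_id),  # BGP ID as 4 octets
        len(opt_params),           # optional parameters length
    ) + opt_params
    header = BGP_MARKER + struct.pack("!HB", HEADER_LEN + len(body), MSG_OPEN)
    return header + body


def parse_open(message):
    """Unpack the fixed part of an OPEN message into a dict."""
    version, my_as, hold_time, raw_id, opt_len = struct.unpack(
        "!BHH4sB", message[HEADER_LEN:HEADER_LEN + 10]
    )
    return {
        "version": version,
        "as": my_as,
        "hold_time": hold_time,
        "bgp_id": socket.inet_ntoa(raw_id),
        "opt_len": opt_len,
    }


msg = build_open(65001, 180, "10.0.0.1")
print(len(msg), parse_open(msg))
```

An Open message without optional parameters is 29 octets long: the 19-octet header plus 10 octets of fixed fields.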
Keepalive Message - Sent once a router accepts the parameters in the neighbor's Open message. Keepalives are then sent periodically.
Update Message - Sent when route changes are made, which can include new routes, withdrawn routes or both.
- Update message includes:
- Network Layer Reachability Information (NLRI) - used to advertise new routes
- Path Attributes
- Withdrawn Routes
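- Note: each update message carries only a single set of path attributes. Multiple prefixes can be advertised in one update only if they share those attributes; routes with different attributes require separate update messages.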
Notification Message - Sent whenever an error is detected between peers. Notification messages always cause the BGP connection to close.
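As a rough illustration of how the four message types are told apart on the wire, the sketch below parses the common BGP header. It assumes the RFC 4271 header layout (16-octet marker, 2-octet length, 1-octet type) and type codes; `parse_header` is a hypothetical name used only here.

```python
import struct

MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}
HEADER_LEN = 19   # marker (16) + length (2) + type (1)
MAX_LEN = 4096    # maximum message size in RFC 4271


def parse_header(data):
    """Return (message type name, total length) for a raw BGP message."""
    if len(data) < HEADER_LEN:
        raise ValueError("truncated header")
    marker, length, msg_type = struct.unpack("!16sHB", data[:HEADER_LEN])
    if marker != b"\xff" * 16:
        raise ValueError("bad marker")
    if not HEADER_LEN <= length <= MAX_LEN:
        raise ValueError("bad length")
    return MESSAGE_TYPES.get(msg_type, "UNKNOWN"), length


# A Keepalive is nothing more than the 19-octet header.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
print(parse_header(keepalive))
```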
BGP Neighbor States
Idle - BGP always begins in the idle state in which it refuses all incoming connections. When a start event occurs the BGP process initializes and starts establishing a BGP connection with its neighbor.
- An error causes BGP to transition back to the idle state. The router can then try to automatically issue another start event. Too many attempts of a start event can cause flapping so limitations should be set to limit the number of retries.
Connect State - In this state BGP is waiting for the TCP connection to be completed. If the connection is successful then an Open message is sent and the router transitions to the OpenSent state.
- If the TCP connection is unsuccessful then BGP continues to listen for TCP connection attempts from the neighbor, resets its ConnectRetry timer and transitions to the Active state.
Active State - BGP is trying to initiate a TCP connection with a neighbor.
OpenSent State - An open message has been sent and BGP is waiting to receive an open message from its neighbor.
- If there are errors in the open message (incorrect AS number or version etc...) an error notification is sent and BGP transitions back to the idle state. If no errors are seen then a keepalive message is sent.
OpenConfirm State - The BGP process is waiting for a keepalive or notification from a neighbor.
- If a notification is received or a TCP disconnect is received the state transitions to idle. If the hold timer expires, an error is detected, or a stop event occurs, a notification is sent and the BGP connection is closed changing the state to idle.
Established State - BGP connection is fully established with a neighbor and update messages are exchanged with the new neighbor
- If any errors are found or the hold timer expires, a notification message is sent and BGP transitions back to Idle.
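The neighbor states above can be summarized as a small transition table. This is a simplified sketch, not the full RFC 4271 state machine: the event names are invented for the example, timers are not modeled, and every error, notification or hold-timer expiry is collapsed into a single `error` event that returns the session to Idle.

```python
# (current state, event) -> next state; event names are illustrative only
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",
    ("Connect", "tcp_failed"): "Active",
    ("Active", "tcp_established"): "OpenSent",
    ("OpenSent", "open_received"): "OpenConfirm",
    ("OpenConfirm", "keepalive_received"): "Established",
    ("Established", "keepalive_received"): "Established",
    ("Established", "update_received"): "Established",
}


def next_state(state, event):
    """Return the next neighbor state for a simplified BGP session."""
    if event == "error":
        return "Idle"  # any error sends the session back to Idle
    # Events that are not listed leave the state unchanged in this sketch.
    return TRANSITIONS.get((state, event), state)


state = "Idle"
for event in ["start", "tcp_established", "open_received", "keepalive_received"]:
    state = next_state(state, event)
    print(event, "->", state)
```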
Path attributes are what allow BGP administrators to control and manipulate routing updates among peers. BGP path attributes allow you to control what routes are preferred, what routes are advertised to peers and what routes are added to the local routing table.
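Path attributes fall into one of four categories:
Well-known Mandatory - must be supported by every BGP implementation and must be included in all updates
Well-known Discretionary - must be supported by every BGP implementation but may or may not be included in updates
Optional Transitive - need not be supported, but a peer that does not recognize the attribute must still accept it and pass it along
Optional Nontransitive - need not be supported, and a peer that does not recognize the attribute can ignore it and does not pass it along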
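On the wire, the category of an attribute follows from two bits in its flags octet plus its type code. The sketch below assumes the RFC 4271 flag layout (Optional bit 0x80, Transitive bit 0x40) and the standard type codes 1 to 3 for ORIGIN, AS_PATH and NEXT_HOP; the function name `category` is hypothetical.

```python
OPTIONAL = 0x80    # high-order bit of the attribute flags octet
TRANSITIVE = 0x40  # second high-order bit

# Type codes of the well-known mandatory attributes:
# ORIGIN (1), AS_PATH (2), NEXT_HOP (3)
MANDATORY_TYPES = {1, 2, 3}


def category(flags, type_code):
    """Classify a path attribute from its flags octet and type code."""
    if not flags & OPTIONAL:
        # The flags alone cannot separate mandatory from discretionary,
        # so the type code decides.
        if type_code in MANDATORY_TYPES:
            return "well-known mandatory"
        return "well-known discretionary"
    if flags & TRANSITIVE:
        return "optional transitive"
    return "optional nontransitive"


print(category(0x40, 2))  # AS_PATH
print(category(0x80, 4))  # MED
```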
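- Well-known Mandatory: ORIGIN, AS_PATH, NEXT_HOP
- Well-known Discretionary: LOCAL_PREF, ATOMIC_AGGREGATE
- Optional Transitive: AGGREGATOR, COMMUNITY
- Optional Nontransitive: MULTI_EXIT_DISC (MED), ORIGINATOR_ID, CLUSTER_LIST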
- ORIGIN - Specifies the origin of the routing update.
- IGP, EGP, Incomplete (preferred in this order)
- Routes learned from redistribution carry Incomplete origins because BGP cannot tell where the route originated.
- AS_PATH - Uses a sequence of AS numbers to describe the AS path to the destination.
- When a BGP speaker advertises a route to an eBGP peer it prepends its AS number to the AS_PATH. When advertising to iBGP peers the AS number is not added.
- NEXT_HOP - Describes the next-hop router on the path to the advertised destination. The NEXT_HOP attribute is not always the address of the neighboring router. The following rules apply:
- If the advertising and receiving routers are in different ASs (external peers), the NEXT_HOP is the IP of the advertising router's interface
- If the advertising and receiving routers are in the same AS (internal peers), and the route refers to an internal destination, the NEXT_HOP is the IP of the neighbor that advertised the route
- If the advertising and receiving routers are in the same AS (internal peers), and the route refers to a route in a different AS, the NEXT_HOP is the IP of the external peer from which the route was learned
- LOCAL_PREF - Used only in updates between iBGP peers. It is used to communicate a BGP router's degree of preference for an advertised route.
- When multiple routes to the same destination are received from different iBGP peers the LOCAL_PREF is used to determine the best path.
- Highest value takes preference.
- Default value is 100
- ATOMIC_AGGREGATE - Used to alert downstream routers that a loss of path info has occurred due to summarization of subnets.
- If an update is received with the ATOMIC_AGGREGATE attribute set, the receiving BGP speaker cannot make the route more specific. The attribute must also be kept when passing the route to other peers.
- AGGREGATOR - Provides information about where the aggregation was performed by including the AS and router ID of the originating aggregating router.
- COMMUNITY - Used to simplify policy enforcement by setting a community value.
- 4 octets are used (AA:NN) where AA represents the AS and NN is an administratively set value. An example would be 65001:70.
- Cisco IOS uses NN:AA by default; "ip bgp-community new-format" must be configured to use AA:NN
- Reserved COMMUNITY values used for policy enforcement
- INTERNET - all routes belong to this community by default and are advertised freely
- NO_EXPORT - routes cannot be advertised to EBGP peers or advertised outside the confederation.
- NO_ADVERTISE - routes cannot be advertised to any peer (EBGP or iBGP)
- LOCAL_AS - (aka NO_EXPORT_SUBCONFED per RFC 1997) routes cannot be advertised to EBGP peers including peers in other ASs within the same confederation.
- MULTI_EXIT_DISC (MED) - used to influence routes entering the local AS.
- Carried in EBGP updates this attribute allows an AS to inform a directly connected AS of its preferred ingress points.
- Lowest value is preferred.
- Default value is 0
- MED is not passed beyond the directly connected AS. To influence ASs further away, the AS_PATH must be manipulated instead.
- By default MEDs are not compared if two routes to the same destination are received from two different ASs.
- ORIGINATOR_ID - 32-bit value created by route reflectors to prevent routing loops.
- The value is the RID of the originating router of a route in the local AS. If a BGP speaker sees its own RID in the ORIGINATOR_ID attribute of a received update it knows a loop has occurred and ignores the update.
- CLUSTER_LIST - A sequence of route reflection cluster IDs used by route reflectors to prevent routing loops.
- CLUSTER_LIST consists of all cluster IDs a specific route has passed through. If a route reflector sees its own cluster ID in this attribute it knows a loop has occurred and ignores the update.
- Administrative Weight - Cisco specific BGP parameter assigned to help prioritize outbound routes.
- Local to router only and not communicated out
- Weight between 0 and 65,535. The higher the weight the more preferable the route
- Weight considered before all other characteristics
- Routes generated by local router = 32,768
- Routes learned from a peer = 0
- AS_SET - Used to prevent loops (just like AS_PATH) by listing, in no particular order, all ASs traversed by the route. It is used when an aggregate summarizes routes and starts the AS_PATH over. The AS_SET is included (with all original ASs) so routers can determine whether a loop has occurred.
- When AS_SET is included an ATOMIC_AGGREGATE does not have to be included with the aggregate.
- When the AS_SET is included, updates are sent whenever the ASs within the aggregate change. Without the AS_SET no update would be sent, since the aggregate itself is unchanged.
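The COMMUNITY encoding described above (AA:NN packed into 4 octets) can be sketched in Python as follows. The reserved values are the ones assigned in RFC 1997; the helper names `encode_community` and `decode_community` are hypothetical.

```python
# Reserved community values from RFC 1997
WELL_KNOWN = {
    0xFFFFFF01: "NO_EXPORT",
    0xFFFFFF02: "NO_ADVERTISE",
    0xFFFFFF03: "NO_EXPORT_SUBCONFED",  # called LOCAL_AS on Cisco
}


def encode_community(text):
    """Convert 'AA:NN' into the 32-bit value carried in the attribute."""
    aa, nn = (int(part) for part in text.split(":"))
    if not (0 <= aa <= 0xFFFF and 0 <= nn <= 0xFFFF):
        raise ValueError("AA and NN must each fit in 16 bits")
    return (aa << 16) | nn  # AS in the high 16 bits, value in the low 16


def decode_community(value):
    """Convert a 32-bit community back to a name or 'AA:NN' text."""
    if value in WELL_KNOWN:
        return WELL_KNOWN[value]
    return f"{value >> 16}:{value & 0xFFFF}"


print(hex(encode_community("65001:70")))
print(decode_community(0xFFFFFF01))
```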
Attribute Order of Preference
- Administrative Weight (Cisco only) - Highest wins
- LOCAL_PREF - Highest wins
- Locally originated routes - Prefer routes originated by the local router (for example, learned locally through the IGP)
- AS_PATH - Shortest path wins
- Origin Code - Lowest wins
- MED - Lowest wins
- Peer type - eBGP routes are preferred over confederation eBGP routes, which are preferred over iBGP routes
- BGP NEXT HOP - Lowest IGP metric to next hop wins
- BGP Router ID - Lowest wins
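The order of preference above maps naturally onto a sort key: each attribute becomes one element of a tuple, and the route with the smallest tuple wins. The following is a minimal sketch under several assumptions: the `Route` class and its field names are invented for the example, MED is always compared (real implementations by default compare it only between routes from the same neighboring AS), and no vendor-specific tie-breakers beyond the list are modeled.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Tuple

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "Incomplete": 2}
PEER_RANK = {"ebgp": 0, "confed-ebgp": 1, "ibgp": 2}


@dataclass
class Route:
    """Hypothetical container for the attributes used in path selection."""
    peer: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: Tuple[int, ...] = ()
    origin: str = "IGP"
    med: int = 0
    peer_type: str = "ebgp"
    igp_metric: int = 0
    router_id: str = "0.0.0.0"


def preference_key(route):
    """Build a tuple where a smaller value means a more preferred route."""
    return (
        -route.weight,                        # highest weight wins
        -route.local_pref,                    # highest LOCAL_PREF wins
        0 if route.locally_originated else 1,
        len(route.as_path),                   # shortest AS_PATH wins
        ORIGIN_RANK[route.origin],            # lowest origin code wins
        route.med,                            # lowest MED wins
        PEER_RANK[route.peer_type],           # eBGP before iBGP
        route.igp_metric,                     # closest next hop wins
        int(IPv4Address(route.router_id)),    # lowest router ID wins
    )


def best_path(routes):
    return min(routes, key=preference_key)


a = Route(peer="A", local_pref=200, as_path=(65010, 65020, 65111))
b = Route(peer="B", local_pref=100, as_path=(65111,))
print(best_path([a, b]).peer)
```

In this usage example route A is selected even though its AS_PATH is longer, because LOCAL_PREF is evaluated before AS_PATH length.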
eBGP and iBGP
Exterior BGP (eBGP) is used to set up BGP peering between peers in different autonomous systems. eBGP peering is most common between ISPs and their customers. ISPs also establish peering points with other service providers via eBGP peering.
When an eBGP peer advertises routes to its neighboring peer, it prepends its own AS number to the AS_PATH. If a router receives the same route from multiple BGP peers then, all other attributes being equal, the route with the shortest AS_PATH is chosen and added to the routing table. Routers then advertise only the best route to other BGP peers. An example AS_PATH would be 65001 65010 65111. From this AS_PATH we can see the route originated in AS 65111, was advertised to AS 65010, and was then advertised to AS 65001. To avoid loops, a BGP peer that sees its own AS number in the AS_PATH of a received route knows a loop would occur and discards the route.
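The prepend and loop-check behavior described above can be sketched in a few lines of Python, using the example AS_PATH 65001 65010 65111 from the text. The function names are hypothetical, and the AS_PATH is modeled as a plain list of AS numbers with no AS_SET segments.

```python
def advertise_to_ebgp(as_path, local_as):
    """Return the AS_PATH sent to an eBGP peer: local AS prepended."""
    return [local_as] + list(as_path)


def accept_from_ebgp(as_path, local_as):
    """Reject a route whose AS_PATH already contains the local AS."""
    return local_as not in as_path


def origin_as(as_path):
    """The originating AS is the last (rightmost) entry in the path."""
    return as_path[-1]


path = advertise_to_ebgp([], 65111)      # originated in AS 65111
path = advertise_to_ebgp(path, 65010)    # passed on by AS 65010
path = advertise_to_ebgp(path, 65001)    # passed on by AS 65001
print(path, origin_as(path))
print(accept_from_ebgp(path, 65010))     # AS 65010 would detect a loop
```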
Internal BGP (iBGP) is used to set up BGP peering between peers in the same AS. Usually iBGP peers fall inside the same company or organization. iBGP is usually seen in multihomed scenarios and in transit ASs, which are used to pass BGP routes from one AS to another.
When routes are advertised between iBGP peers the AS_PATH is not changed, since the routes stay within the same AS. The AS number is not prepended to the AS_PATH until a route is advertised to an eBGP peer. Because BGP relies on the AS_PATH to protect against routing loops, iBGP peers are unable to tell whether a route advertised by another iBGP peer will cause a loop. To solve this issue, iBGP peers do not advertise routes learned from one iBGP peer to other iBGP peers, thus providing loop avoidance within an AS.
The problem with the iBGP loop avoidance rule is that BGP routes learned on one end of an AS are not fully propagated to routers on the other end of the AS. One of three solutions must be used to fully propagate BGP routes across the AS.
- All iBGP peers must be fully meshed and peer with all other iBGP routers within the AS.
- By fully meshing all iBGP peers each BGP router will receive updates from all routers in the AS
- Unfortunately this is not always possible and does not scale with a growing network
- Synchronization must be used and BGP routes must be redistributed into the IGP so the routes can be advertised across the AS.
- Most IGPs are unable to handle large BGP tables much less the full Internet BGP table.
- Route Reflectors must be established.
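To put a number on why the full-mesh option does not scale, the short sketch below counts iBGP sessions. It relies on the standard result that a full mesh of n routers needs n(n-1)/2 sessions, and compares it with a design where every other router peers only with one route reflector. The function names are hypothetical.

```python
def full_mesh_sessions(routers):
    """Number of iBGP sessions when every router peers with every other."""
    return routers * (routers - 1) // 2


def single_reflector_sessions(routers):
    """Sessions when all other routers peer with one route reflector."""
    return routers - 1


for n in (5, 10, 100):
    print(n, full_mesh_sessions(n), single_reflector_sessions(n))
```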
Route Reflectors are defined in RFC 4456 and used primarily in large autonomous systems to propagate routes to all BGP peers without the use of a fully meshed AS. In larger networks it is impractical to set up full mesh peering between all peers within the AS. Route reflection provides a means to centralize iBGP peering on a single router or group of routers known as route reflectors. All routers (known as clients) within the AS peer with a centralized router (route reflector or server). Normally iBGP routers do not advertise routes learned from iBGP peers internally, but route reflectors are the exception to this rule. Route reflectors advertise routes to both iBGP and eBGP peers, thus allowing iBGP learned routes to propagate to all peers within an AS. A group of route reflector(s) and clients is known as a cluster. If multiple route reflectors exist within a single cluster then the cluster ID must be defined on each route reflector.
The key benefit of route reflection over other techniques such as Confederations is that it does not need to be supported by all routers in the cluster or AS. Route reflection only needs to be supported on the route reflectors (servers). Clients do not need any additional configuration to join a cluster; clients are specified on each route reflector servicing the cluster.
- Client peers - routers that are members of the cluster.
- Non-Client peers - routers that are not members of the cluster.
Route reflectors will treat each route differently depending on how a route is received. There are three rules followed by route reflectors.
- Locally originated routes and routes received from eBGP neighbors are propagated to all BGP peers (internal and external).
- Routes received from a client are propagated to all BGP peers (internal and external).
- Routes received from an iBGP non-client peer are propagated to all eBGP peers and all iBGP client peers.
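The three propagation rules above, together with the ORIGINATOR_ID and CLUSTER_LIST loop checks described earlier, can be sketched as follows. The peer-type labels and function names are invented for this example; a real route reflector also excludes the peer the route came from and applies normal best-path selection first, which this sketch does not model.

```python
ALL_PEERS = {"ebgp", "client", "nonclient"}


def reflect_to(source):
    """Return the peer types a route reflector advertises a route to.

    source is where the route came from: 'local', 'ebgp', 'client'
    or 'nonclient' (labels are illustrative only).
    """
    if source in ("local", "ebgp", "client"):
        return set(ALL_PEERS)
    if source == "nonclient":
        # Not reflected to other non-client iBGP peers.
        return {"ebgp", "client"}
    raise ValueError(f"unknown source: {source}")


def is_looped(originator_id, cluster_list, my_router_id, my_cluster_id):
    """Detect a reflection loop from ORIGINATOR_ID or CLUSTER_LIST."""
    return originator_id == my_router_id or my_cluster_id in cluster_list


print(sorted(reflect_to("client")))
print(sorted(reflect_to("nonclient")))
print(is_looped("10.0.0.9", ["0.0.0.1", "0.0.0.2"], "10.0.0.1", "0.0.0.2"))
```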
Route Reflection Design
When designing your BGP network for route reflection you need to consider the location of the route reflectors relative to all client peers. Generally, routers that are central to the network and able to peer with all neighbors should be used as route reflectors. For example, in a star topology the hub router would be used as the route reflector. If it is not possible to have a single centralized route reflector then multiple route reflectors should be used. Multiple route reflectors should also be considered for redundancy in the event of a router failure.
For large networks you might also consider breaking down your network into multiple clusters. Route reflectors can be clients of other route reflectors. This allows you to set up a hierarchical network of clusters. A good example would be to create a separate cluster for each city or geographical area, with each route reflector acting as a client of the backbone cluster. Route reflection can also be used alongside Confederations to improve control over routing updates across the network.
AS-path (Filter List)
Halabi, Sam and McPherson, Danny (2000). Internet Routing Architectures, 2nd Edition. Cisco Press. ISBN 1-57870-233-X