rfc9696v1.txt   rfc9696.txt 
Internet Engineering Task Force (IETF) Y. Wei, Ed.
Request for Comments: 9696 Z. Zhang
Category: Informational ZTE Corporation
ISSN: 2070-1721 D. Afanasiev
Yandex
P. Thubert
Individual
T. Przygienda
Juniper Networks
December 2024
Routing in Fat Trees (RIFT) Applicability and Operational Considerations
Abstract
This document discusses the properties, applicability, and
operational considerations of Routing in Fat Trees (RIFT) in
skipping to change at line 112
8.2. Informative References
Acknowledgments
Contributors
Authors' Addresses
1. Introduction
This document discusses the properties and applicability of "RIFT:
Routing in Fat Trees" [RFC9692] in different deployment scenarios and
highlights the operational simplicity of the technology compared to
classical routing solutions. It also documents special
considerations when RIFT is used with or without overlays and/or
controllers and how RIFT identifies miscablings and reroutes around
node and link failures.
2. Terminology
This document uses the terminology defined in [RFC9692]. The most
frequently used terms and their definitions from that document are
listed here.
skipping to change at line 138
2-leaf shortcuts and multiple level shortcuts are possible and
described further in the document.
Crossbar:
Physical arrangement of ports in a switching matrix without
implying any further scheduling or buffering disciplines.
Directed Acyclic Graph (DAG):
A finite directed graph with no directed cycles (loops). If all links
in a Clos are considered as directed towards the top, or all as
directed towards the bottom, each of the two resulting graphs is a DAG.
Disaggregation:
The process in which a node decides to advertise more specific
prefixes southwards, either positively to attract the
corresponding traffic or negatively to repel it. Disaggregation
is performed to prevent traffic loss and suboptimal routing to the
more specific prefixes.
Leaf:
A node without southbound adjacencies. Level 0 implies a leaf in
skipping to change at line 181
as links and address prefixes. A TIE always has a direction and a
type. North TIEs (sometimes abbreviated as N-TIEs) are used when
dealing with TIEs in the northbound representation, and South TIEs
(sometimes abbreviated as S-TIEs) are used for the southbound
equivalent. TIEs have different types, such as node and prefix
TIEs.
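The DAG definition above can be checked mechanically. The following Python sketch is illustrative only. It assumes a small, hypothetical three-level Clos given as (south node, north node) link pairs. It orients every link northward (or every link southward) and uses the standard library's graphlib to confirm that each orientation has no directed cycle. The node names and helper functions are made up for this example.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical three-level Clos: one (south_node, north_node) pair per link.
CLOS_LINKS = [
    ("leaf1", "spine1"), ("leaf1", "spine2"),
    ("leaf2", "spine1"), ("leaf2", "spine2"),
    ("spine1", "tof1"), ("spine1", "tof2"),
    ("spine2", "tof1"), ("spine2", "tof2"),
]


def orient(links, northbound=True):
    """Return {node: set_of_predecessors} with every link directed
    towards the top (northbound=True) or towards the bottom."""
    graph = {}
    for south, north in links:
        src, dst = (south, north) if northbound else (north, south)
        graph.setdefault(dst, set()).add(src)
        graph.setdefault(src, set())
    return graph


def is_dag(graph):
    """True if the directed graph contains no directed cycle."""
    try:
        # static_order() raises CycleError while being consumed if a cycle exists.
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


print(is_dag(orient(CLOS_LINKS, northbound=True)))   # True
print(is_dag(orient(CLOS_LINKS, northbound=False)))  # True
```

Adding a pair of opposite arcs between two nodes makes `is_dag` return False. Such a pair could model, for example, a horizontal link used in both directions.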
3. Problem Statement of Routing in Modern IP Fabric Fat Tree Networks
Clos [CLOS] topologies (commonly called Fat Tree networks in modern
IP fabric parlance, a term borrowed from the original definition of
Fat Tree [FATTREE]) have gained prominence in today's networking,
primarily as a result of the paradigm shift towards a centralized
data-center-based architecture that delivers a majority of
computation and storage services.
Current routing protocols were geared towards a network with an
irregular topology with isotropic properties and a low degree of
connectivity. When applied to Fat Tree topologies:
* They tend to need extensive configuration or provisioning during
initialization and when nodes are added to or removed from the fabric.
* For link-state routing protocols, all nodes, including spine-and-leaf
nodes, learn the entire network topology and routing
skipping to change at line 276
v ++--++ +-+-++ ++--++ ++--++ +
|LEAF| |LEAF| |LEAF| |LEAF| LEVEL 0
+----+ +----+ +----+ +----+
Figure 1: RIFT Overview
A spine node only has information necessary for its level, which is
all destinations south of the node based on SPF calculation, the
default route, and potentially disaggregated routes.
RIFT combines the advantages of both link-state and distance-vector
protocols:
* Fastest possible convergence
* Automatic detection of topology
* Minimal routes/information on Top-of-Rack (ToR) switches, aka leaf
nodes
* High degree of ECMP
skipping to change at line 300
* Maximum propagation speed with flexible prefixes in an update
There are two types of link-state databases: the "north
representation", made up of North Topology Information Elements
(N-TIEs), and the "south representation", made up of South Topology
Information Elements (S-TIEs). The N-TIEs contain a link-state
topology description of lower levels, and the S-TIEs simply carry
default and disaggregated routes for the lower levels.
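Because the S-TIEs a leaf receives carry only default and disaggregated routes, the leaf's forwarding table stays small. Forwarding then reduces to a longest-prefix match over that table. The sketch below shows this lookup using Python's ipaddress module. The prefixes, next-hop names, and table layout are hypothetical and chosen only for illustration.

```python
import ipaddress

# Hypothetical leaf FIB derived from S-TIEs: a default route towards both
# spines plus one positively disaggregated, more specific prefix.
FIB = {
    ipaddress.ip_network("::/0"): ["spine121", "spine122"],
    ipaddress.ip_network("2001:db8:122::/48"): ["spine122"],
}


def lookup(fib, address):
    """Return the next hops of the longest prefix in fib containing address."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in fib if net.version == addr.version and addr in net]
    if not matches:
        return []
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[best]


print(lookup(FIB, "2001:db8:122::1"))  # ['spine122']
print(lookup(FIB, "2001:db8:999::1"))  # ['spine121', 'spine122']
```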
RIFT also eliminates major disadvantages of link-state and
distance-vector protocols with the following:
* Reduced and balanced flooding
* Level-constrained automatic neighbor discovery
To achieve this, RIFT builds on the art of IGPs, such as OSPF and
IS-IS, and of routing in Mobile Ad Hoc Networks (MANETs) and the
Internet of Things (IoT) to provide unique features:
* Automatic (positive or negative) route disaggregation of northward
skipping to change at line 364
4.2.1. Horizontal Links
RIFT is not limited to pure Clos divided into PoDs and multiple planes
but supports horizontal (East-West) links below the ToF level. Those
links are used only for last-resort northbound forwarding when a
spine loses all its northbound links or cannot compute a default
route through them.
Full-mesh connectivity between nodes on the same level can be
deployed. This allows North SPF (N-SPF) to let any node that loses
all its northbound adjacencies still participate in northbound
forwarding, as long as any of the other nodes on that level are
northbound connected.
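The last-resort use of horizontal links can be sketched as a simple next-hop selection rule. This is a simplified illustration, not the N-SPF computation of [RFC9692]. The Neighbor record, its direction labels, and the has_northbound flag (standing in for whatever the node learns from its peers' TIEs) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neighbor:
    name: str
    direction: str                # "north", "south", or "east-west"
    has_northbound: bool = True   # hypothetical: peer still reaches northwards


def northbound_next_hops(neighbors):
    """Next hops for northbound (default-route) traffic."""
    north = [n.name for n in neighbors if n.direction == "north"]
    if north:
        return north
    # Last resort: horizontal links, but only towards same-level peers
    # that are themselves still northbound connected.
    return [n.name for n in neighbors
            if n.direction == "east-west" and n.has_northbound]


healthy = [Neighbor("tof1", "north"), Neighbor("spine2", "east-west"),
           Neighbor("leaf1", "south")]
isolated = [Neighbor("spine2", "east-west"),
            Neighbor("spine3", "east-west", has_northbound=False),
            Neighbor("leaf1", "south")]
print(northbound_next_hops(healthy))   # ['tof1']
print(northbound_next_hops(isolated))  # ['spine2']
```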
Note that a "ring" of horizontal links at any level below ToF does
not provide a "ring-based protection" scheme since the SPF
computation would have to deal with breaking of "loops", an
application for which RIFT is not intended.
4.2.2. Vertical Shortcuts
Through relaxations of the specified adjacency forming rules, RIFT
skipping to change at line 410
operation specified for East-West links and the southbound
reflection between nodes are not applicable. Also, ZTP will
derive a sense of depth that will eliminate some links.
Variations of ZTP could be derived to meet specific objectives,
e.g., to ensure that most routers have at least two parents to
reach the ToF.
* RIFT applies to any Destination-Oriented DAG (DODAG) where there is
only one ToF node and the problem of disaggregation does not
exist. In that case, RIFT operates very much like RPL [RFC6550],
but uses link-state information for southbound routes (downwards
in RPL's terms). For an arbitrary DAG with multiple destinations
(ToFs), the way disaggregation happens has to be considered.
* Positive Disaggregation expects that most of the ToF nodes reach
most of the leaves, so disaggregation is the exception as opposed
to the rule. When this is no longer true, it makes sense to turn
off disaggregation and route between the ToF nodes over a ring, a
full mesh, a transit network, or a form of area zero. Then again,
this operation is similar to RPL operating as a single DODAG with
a virtual root.
* In order to aggregate and disaggregate routes, RIFT requires that
skipping to change at line 434
fabric. This can be achieved with a ring as suggested by RIFT
[RFC9692], by some preconfiguration, or by using a synchronization
with a common repository where all the active prefixes are
registered.
4.2.4. Reachability of Internal Nodes in the Fabric
RIFT does not require that nodes have reachable addresses in the
fabric, though it is clearly desirable for operational purposes.
Under normal operating conditions, this can be easily achieved by
injecting the node's loopback address into Prefix North TIEs and
Prefix South TIEs, or by other implementation-specific mechanisms.
Special considerations arise when a node loses all northbound
adjacencies but is not at the top of the fabric. If a spine node
loses all northbound links, the spine node doesn't advertise a
default route. But if the level of the spine node is auto-determined
by ZTP, it will "fall down" as depicted in Figure 8.
4.3. Use Cases
4.3.1. Data Center Topologies
4.3.1.1. Data Center Fabrics
RIFT is well suited to underlay routing in data center (DC) IP
fabrics, the vast majority of which are Clos architectures (and will
remain so for the foreseeable future). As described in Section 5, it
significantly simplifies the operation and deployment of such fabrics
compared to extensive proprietary provisioning and operational
solutions.
4.3.1.2. Adaptations to Other Proposed Data Center Topologies
. +-----+ +-----+
. | | | |
.+-+ S0 | | S1 |
.| ++---++ ++---++
.| | | | |
.| | +------------+ |
.| | | +------------+ |
skipping to change at line 508
environments close to content producers (server farms connected via
DC fabrics) but in proximity to content consumers as well. Consumers
are often clustered in metro areas with their own network
architectures that can benefit from simplified, regular Clos
structures. Thus, they can also benefit from RIFT.
4.3.3. Building Cabling
Commercial edifices are often cabled in topologies that are either
Clos or their isomorphic equivalents. Such a Clos can grow rather
high, with many levels. That presents a challenge for classical
routing protocols (except BGP [RFC4271] and Private Network-Network
Interface (PNNI) [PNNI], the latter largely phased out by now) that
do not support an arbitrary number of levels, which RIFT supports
naturally.
Moreover, because forwarding tables in the network elements of
building cabling are limited in size, the minimal FIB that RIFT
maintains under normal conditions is cost-effective in terms of both
hardware and operational costs.
4.3.4. Internal Router Switching Fabrics
skipping to change at line 543
The Cloud Central Office (CloudCO) is a new stage of the telecom
Central Office. It takes advantage of Software-Defined
Networking (SDN) and Network Function Virtualization (NFV) in
conjunction with general-purpose hardware to optimize current
networks. The following figure illustrates this architecture at a
high level. It describes a single instance or macro-node of CloudCO
that provides a number of value-added services (VASes), a Broadband
Access Abstraction (BAA), and virtualized network services. An
Access I/O module faces a CloudCO access node and the Customer
Premises Equipment (CPE) behind it. A Network I/O module faces
the core network. The two I/O modules are interconnected by a
spine-and-leaf fabric [TR-384].
+---------------------+ +----------------------+
| Spine | | Spine |
| Switch | | Switch |
+------+---+------+-+-+ +--+-+-+-+-----+-------+
| | | | | | | | | | | |
| | | | | +-------------------------------+ |
| | | | | | | | | | | |
| | | | +-------------------------+ | | |
| | | | | | | | | | | |
skipping to change at line 616
scenarios.
* RIFT automatically negotiates Bidirectional Forwarding Detection
(BFD) per link. This allows IP and micro-BFD [RFC7130] to
replace Link Aggregation Groups (LAGs) that hide bandwidth
imbalances in case of constituent failures. Further automatic
link validation techniques similar to those in [RFC5357] could be
supported as well.
* RIFT inherently solves many problems associated with the use of
classical routing topologies with dense meshes and high degrees of
ECMP by including automatic bandwidth balancing, flood reduction,
and automatic disaggregation on failures while providing maximum
aggregation of prefixes in default scenarios. ECMP in RIFT
eliminates the need for additional Loop-Free Alternate (LFA)
procedures.
* RIFT reduces FIB size towards the bottom of the IP fabric, where
most nodes reside. This allows for cheaper hardware at the edges
and for the introduction of modern IP fabric architectures that
encompass server multihoming and other mechanisms.
* RIFT provides valley-free routing that is loop free. A valley-free
path reverses direction at most once, from northbound to
southbound, while permitting traversal of horizontal links in the
northbound phase. Any such valley-free path between two
destinations can be used irrespective of its metric, so the
bisectional fabric bandwidth can be used to balance load on the
fabric in different ways. Valley-free routing eliminates the need
for any specific micro-loop avoidance procedures for RIFT.
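The valley-free property can be stated as a check on the sequence of hop directions along a path. In this minimal sketch, each hop is labeled "N" (northbound), "S" (southbound), or "E" (horizontal East-West). Any mix of northbound and horizontal hops may come first, followed only by southbound hops. The labels and the function name are illustrative assumptions.

```python
def is_valley_free(hops):
    """hops: sequence of 'N' (up), 'S' (down), or 'E' (horizontal).
    Valid paths turn from northbound to southbound at most once, and
    horizontal hops appear only in the northbound phase."""
    going_south = False
    for hop in hops:
        if hop == "S":
            going_south = True
        elif hop in ("N", "E"):
            if going_south:
                return False  # turning north or horizontal again: a valley
        else:
            raise ValueError(f"unknown hop label: {hop!r}")
    return True


print(is_valley_free("NNSS"))  # True
print(is_valley_free("NESS"))  # True
print(is_valley_free("NSNS"))  # False
print(is_valley_free("NSES"))  # False
```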
skipping to change at line 699
| +-----------+ | | + +---+linkSL7+-+ | +
| | | | | | | |
+-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+
|Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0
+-+-----+ +-+-----+ +-----+-+ +-+-----+
+ + + +
Prefix111 Prefix112 Prefix121 Prefix122
Figure 4: Suboptimal Routing Upon Link Failure Use Case
As shown in Figure 4, as a result of south reflection, Spine121 and
Spine122 at level 1 know about each other via Leaf121 or Leaf122.
Without disaggregation mechanisms, when linkSL6 fails, a packet from
Leaf121 to Prefix122 will, based purely on the default route,
probably go up through linkSL5 and linkTS3 and then down through
linkTS4 and linkSL8 to Leaf122, or go up through linkSL5 and linkTS6
and then down through linkTS8 and linkSL8 to Leaf122. This is a case
of suboptimal routing, or bow tying.
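Positive disaggregation can be illustrated with a simplified model. A spine compares the leaves it reaches southbound with the leaves each same-level peer reaches, as learned through south reflection. It then advertises more specific prefixes for any leaf that some peer cannot reach. The function, the data layout, and this reduced rule are assumptions made for illustration; [RFC9692] specifies the actual procedure. The example assumes, as the failure description implies, that linkSL6 connects Spine121 to Leaf122.

```python
def prefixes_to_disaggregate(my_leaves, peer_leaves, leaf_prefixes):
    """my_leaves: leaves this spine reaches southbound.
    peer_leaves: {peer_spine: leaves that peer reaches}, e.g., as learned
    from reflected node S-TIEs.
    leaf_prefixes: {leaf: prefix}.
    Returns the prefixes this spine should advertise as more specifics."""
    missing = set()
    for leaves in peer_leaves.values():
        missing |= set(my_leaves) - set(leaves)
    return sorted(leaf_prefixes[leaf] for leaf in missing)


# Spine122's view after linkSL6 fails: Spine121 no longer reaches Leaf122.
print(prefixes_to_disaggregate(
    my_leaves={"Leaf121", "Leaf122"},
    peer_leaves={"Spine121": {"Leaf121"}},
    leaf_prefixes={"Leaf121": "Prefix121", "Leaf122": "Prefix122"},
))  # ['Prefix122']
```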
With disaggregation mechanisms, Spine122 will detect the failure
according to the reflected node S-TIE from Spine121 when linkSL6
skipping to change at line 787
unique in the RIFT network and the level of the node in the Fat Tree,
which determines which peers are northward "parents" and which are
southward "children".
ZTP is always on, but its decisions can be overridden when a network
administrator prefers to impose their own configuration. In that
case, it is the responsibility of the administrator to ensure that
the configured parameters are correct, i.e., that the System ID of
each node is unique and that the administratively set levels truly
reflect the relative position of the nodes in the fabric. It is
recommended to let ZTP configure the network; when ZTP does not
configure the network, it is recommended to configure the level of
all the nodes to avoid an undesirable interaction between ZTP and the
manual configuration.
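When System IDs and levels are configured manually, the checks described above can be scripted. The sketch below assumes a hypothetical inventory format: (name, System ID, level) tuples plus a list of cabled links. It flags duplicate System IDs and links whose endpoints' levels differ by more than one. Treating such a level gap as an error is a simplification made for this sketch. Relaxed adjacency rules, such as the vertical shortcuts of Section 4.2.2, would need to be accounted for.

```python
from collections import Counter


def check_manual_config(nodes, links):
    """nodes: list of (name, system_id, level); links: list of (name_a, name_b).
    Returns human-readable problems (an empty list if none are found)."""
    problems = []
    counts = Counter(system_id for _, system_id, _ in nodes)
    for system_id, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"System ID {system_id:#x} used {count} times")
    level = {name: lvl for name, _, lvl in nodes}
    for a, b in links:
        # Simplifying assumption: adjacent nodes sit at most one level apart.
        if abs(level[a] - level[b]) > 1:
            problems.append(f"link {a}-{b} spans levels {level[a]} and {level[b]}")
    return problems


NODES = [("tof1", 0x1, 2), ("spine1", 0x2, 1), ("leaf1", 0x2, 0)]
LINKS = [("leaf1", "spine1"), ("spine1", "tof1"), ("leaf1", "tof1")]
for problem in check_manual_config(NODES, LINKS):
    print(problem)
# System ID 0x2 used 2 times
# link leaf1-tof1 spans levels 0 and 2
```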
ZTP requires that the administrator identify the ToF nodes to set
the baseline from which the fabric topology is derived. The ToF
nodes are configured with the TOP_OF_FABRIC flag and act as the
initial 'seeds' that other ZTP nodes need to derive their level in
the topology. ZTP computes the level of each node based on the
Highest Available Level (HAL) of the potential parent closest to that
baseline, which represents the superspine. In a fashion, RIFT can be
seen as a distance-vector protocol that computes a set of feasible
successors towards the superspine and autoconfigures the rest of the
skipping to change at line 976
| | | +--------------------------------+
| | | |
| | | |
| | | |
| | | |
+ + + +
+-1--2--3--4--+
| Leaf1 | ......
+-------------+
Figure 9: Additional Cabling Constraint Example
RIFT allows implementations to provide programmable plug-ins that can
adjust ZTP operation or capture information during computation.
While defining such plug-ins is outside the scope of this document,
such a mechanism could be used to extend the miscabling functionality.
Achieving this with other protocols would require additional
operational overhead. Consider a fabric that is using unnumbered
OSPF links; it is still very likely that a miscabled link will form
an adjacency. Each attempt to move cables to the correct port may
skipping to change at line 1134
way, the multiple routes are equally valid and should be conserved in
the case of anycast. Without further information from the
redistributed routing protocol, it is impossible to distinguish a
movement from a redistribution that happens asynchronously on
different leaves. RIFT [RFC9692] expects that anycast addresses are
advertised within the timing precision, which is typically the case
with a low-precision timing and a multihomed node. Beyond that time
interval, RIFT interprets the lag as mobility, and only the freshest
route is retained.
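The timing rule can be modeled in a few lines. This sketch assumes that each advertisement of one prefix is reduced to a (leaf, timestamp) pair and that the timing precision is a hypothetical parameter in seconds. Advertisements within the precision window of the freshest one are all kept, as anycast or multihoming. Older ones are treated as a stale location left behind by a move.

```python
def retained_routes(advertisements, precision):
    """advertisements: list of (leaf, timestamp) for one prefix.
    Keep every advertisement within `precision` seconds of the freshest;
    anything older is interpreted as a stale location after mobility."""
    if not advertisements:
        return []
    freshest = max(ts for _, ts in advertisements)
    return sorted(leaf for leaf, ts in advertisements
                  if freshest - ts <= precision)


print(retained_routes([("leaf1", 100.0), ("leaf2", 100.05)], precision=0.2))
# ['leaf1', 'leaf2']
print(retained_routes([("leaf1", 100.0), ("leaf2", 103.0)], precision=0.2))
# ['leaf2']
```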
When using IPv6 [RFC8200], RIFT suggests to leverage [RFC8505] as the When using IPv6 [RFC8200], RIFT suggests leveraging 6LoWPAN ND
IPv6 ND interaction between the mobile node and the leaf. This not [RFC8505] as the IPv6 ND interaction between the mobile node and the
only provides a sequence counter but also a lifetime and a security leaf. This not only provides a sequence counter but also a lifetime
token that may be used to protect the ownership of an address and a security token that may be used to protect the ownership of an
[RFC8928]. When using [RFC8505], the parallel registration of an address [RFC8928]. When using 6LoWPAN ND [RFC8505], the parallel
anycast address to multiple leaves is done with the same sequence registration of an anycast address to multiple leaves is done with
counter, whereas the sequence counter is incremented when the point the same sequence counter, whereas the sequence counter is
of attachment changes. This way, it is possible to differentiate a incremented when the point of attachment changes. This way, it is
mobile node from a multihomed node, even when the mobility happens possible to differentiate a mobile node from a multihomed node, even
within the timing precision. It is also possible for a mobile node when the mobility happens within the timing precision. It is also
to be multihomed as well, e.g., to change only one of its points of possible for a mobile node to be multihomed as well, e.g., to change
attachment. only one of its points of attachment.
5.9. IPv4 over IPv6 5.9. IPv4 over IPv6
RIFT allows advertising IPv4 prefixes over an IPv6 RIFT network. An RIFT allows advertising IPv4 prefixes over an IPv6 RIFT network. An
IPv6 Address Family (AF) configures via the usual ND mechanisms and IPv6 Address Family (AF) configures via the usual ND mechanisms and
then V4 can use V6 next-hops analogous to [RFC8950]. It is expected then V4 can use V6 next-hops analogous to [RFC8950]. It is expected
that the whole fabric supports the same type of forwarding of AFs on that the whole fabric supports the same type of forwarding of AFs on
all the links. RIFT provides an indication whether a node is capable all the links. RIFT provides an indication whether a node is capable
of V4-forwarding and implementations are possible where different of V4-forwarding and implementations are possible where different
routing tables are computed per AF as long as the computation remains routing tables are computed per AF as long as the computation remains
skipping to change at line 1188 skipping to change at line 1188
+---+----+ +---+----+ +---+----+ +---+----+
| V4 | | V4 | | V4 | | V4 |
| subnet | | subnet | | subnet | | subnet |
+--------+ +--------+ +--------+ +--------+
Figure 10: IPv4 over IPv6 Figure 10: IPv4 over IPv6
5.10. In-Band Reachability of Nodes 5.10. In-Band Reachability of Nodes
RIFT doesn't precondition that nodes of the fabric have reachable RIFT doesn't precondition that nodes of the fabric have reachable
addresses, but the operational reasons to reach the internal nodes addresses, but operational reasons to reach the internal nodes may
may exist. Figure 11 shows an example that the network management exist. Figure 11 shows an example that the network management
station (NMS) attaches to Leaf1. station (NMS) attaches to Leaf1.
+-------+ +-------+ +-------+ +-------+
| ToF1 | | ToF2 | | ToF1 | | ToF2 |
++---- ++ ++-----++ ++---- ++ ++-----++
| | | | | | | |
| +----------+ | | +----------+ |
| +--------+ | | | +--------+ | |
| | | | | | | |
++-----++ +--+---++ ++-----++ +--+---++
skipping to change at line 1224 skipping to change at line 1224
If the NMS wants to access Leaf2, it simply works because the If the NMS wants to access Leaf2, it simply works because the
loopback address of Leaf2 is flooded in its Prefix North TIE. loopback address of Leaf2 is flooded in its Prefix North TIE.
If the NMS wants to access Spine2, it also works because a spine node If the NMS wants to access Spine2, it also works because a spine node
always advertises its loopback address in the Prefix North TIE. The always advertises its loopback address in the Prefix North TIE. The
NMS may reach Spine2 from Leaf1-Spine2 or Leaf1-Spine1-ToF1/ NMS may reach Spine2 from Leaf1-Spine2 or Leaf1-Spine1-ToF1/
ToF2-Spine2. ToF2-Spine2.
If the NMS wants to access ToF2, ToF2's loopback address needs to be If the NMS wants to access ToF2, ToF2's loopback address needs to be
injected into its Prefix South TIE. This TIE must be seen by all injected into its Prefix South TIE. This TIE must be seen by all
nodes at the level below -- the spine nodes in Figure 9 -- that must nodes at the level below -- the spine nodes in Figure 11 -- that must
form a ceiling for all the traffic coming from below (south). form a ceiling for all the traffic coming from below (south).
Otherwise, the traffic from the NMS may follow the default route to Otherwise, the traffic from the NMS may follow the default route to
the wrong ToF Node, e.g., ToF1. the wrong ToF Node, e.g., ToF1.
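The ceiling requirement follows from longest-prefix matching. The following Python sketch illustrates this under stated assumptions: the spine FIB is modeled as a plain dictionary, lookup() is a hypothetical helper, and 2001:db8::f2 is a documentation address standing in for ToF2's loopback. With only the default route, a spine may send NMS traffic destined to ToF2 to either ToF node. Once the /128 from ToF2's Prefix South TIE is installed, the more specific entry wins.

```python
import ipaddress


def lookup(fib, destination):
    """Longest-prefix match: next hops of the most specific prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]


# Hypothetical spine FIB: the default route points north to both ToF nodes (ECMP).
default_only = {ipaddress.ip_network("::/0"): ["ToF1", "ToF2"]}

# Same FIB once ToF2 injects its loopback into its Prefix South TIE.
TOF2_LOOPBACK = "2001:db8::f2"  # documentation address, assumed for illustration
with_loopback = dict(default_only)
with_loopback[ipaddress.ip_network(TOF2_LOOPBACK + "/128")] = ["ToF2"]

print(lookup(default_only, TOF2_LOOPBACK))   # ['ToF1', 'ToF2']: may hash to ToF1
print(lookup(with_loopback, TOF2_LOOPBACK))  # ['ToF2']
```

The same reasoning explains the failure case. If a spine loses its link to ToF2 but still holds the default route, the lookup falls back to ToF1. The /128 must then be disaggregated further south.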
In the case of failure between ToF2 and spine nodes, ToF2's loopback In the case of failure between ToF2 and spine nodes, ToF2's loopback
address must be disaggregated recursively all the way to the leaves. address must be disaggregated recursively all the way to the leaves.
In a partitioned ToF, even with recursive disaggregation, a ToF node In a partitioned ToF, even with recursive disaggregation, a ToF node
is only reachable within its plane. is only reachable within its plane.
A possible alternative to recursive disaggregation is to use a ring A possible alternative to recursive disaggregation is to use a ring
that interconnects the ToF nodes to transmit packets between them for that interconnects the ToF nodes to transmit packets between them for
their loopback addresses only. The idea is that this is mostly their loopback addresses only. The idea is that this is mostly
control traffic and should not alter the load-balancing properties of control traffic and should not alter the load-balancing properties of
the fabric. the fabric.
5.11. Dual-Homing Servers 5.11. Dual-Homing Servers
Each RIFT node may operate in ZTP mode. It has no configuration Each RIFT node may operate in ZTP mode. It has no configuration
(unless it is a ToF at the top of the topology or the must operate in (unless it is a ToF node at the top of the topology or if it must
the topology as leaf and/or support leaf-2-leaf procedures), and it operate in the topology as a leaf and/or support leaf-2-leaf
will fully configure itself after being attached to the topology. procedures), and it will fully configure itself after being attached
to the topology.
+---+ +---+ +---+ +---+ +---+ +---+
|ToF| |ToF| |ToF| ToF |ToF| |ToF| |ToF| ToF
+---+ +---+ +---+ +---+ +---+ +---+
| | | | | | | | | | | |
| +----------------+ | | | +----------------+ | |
| +----------------+ | | +----------------+ |
| | | | | | | | | | | |
+----------+--+ +--+----------+ +----------+--+ +--+----------+
| ToR1 | | ToR2 | Spine | ToR1 | | ToR2 | Spine
skipping to change at line 1270 skipping to change at line 1271
| | | | | +-----------------+ | | | | | | +-----------------+ |
| | | | +--------------+ | | | | | | | +--------------+ | | |
| | | | | | | | | | | | | | | |
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
| | | | | | | | | | | | | | | |
+---+ +---+ ............. +---+ +---+ +---+ +---+ ............. +---+ +---+
SV(1) SV(2) SV(n-1) SV(n) Leaf SV(1) SV(2) SV(n-1) SV(n) Leaf
Figure 12: Dual-Homing Servers Figure 12: Dual-Homing Servers
Sometimes people may prefer to disaggregate from ToR to servers from Sometimes people may prefer to disaggregate from ToR nodes to servers
start on, i.e. the servers have couple tens of routes in FIB from from startup, i.e., the servers have multiple routes in the FIB from
start on beside default routes to avoid breakages at rack level. startup other than default routes to avoid breakages at the rack
Full disaggregation of the fabric could be achieved by configuration level. Full disaggregation of the fabric could be achieved by
supported by RIFT. configuration supported by RIFT.
5.12. Fabric with a Controller 5.12. Fabric with a Controller
There are many different ways to deploy the controller. One There are many different ways to deploy the controller. One
possibility is attaching a controller to the RIFT domain from ToF and possibility is attaching a controller to the RIFT domain from ToF and
another possibility is attaching a controller from the leaf. another possibility is attaching a controller from the leaf.
+------------+ +------------+
| Controller | | Controller |
++----------++ ++----------++
skipping to change at line 1326 skipping to change at line 1327
If the controller is attaching from a leaf to the fabric, no special If the controller is attaching from a leaf to the fabric, no special
provisions are needed. provisions are needed.
5.13. Internet Connectivity Within Underlay 5.13. Internet Connectivity Within Underlay
If global addressing is running without overlay, an external default If global addressing is running without overlay, an external default
route needs to be advertised through the RIFT fabric to achieve route needs to be advertised through the RIFT fabric to achieve
internet connectivity. For the purpose of forwarding of the entire internet connectivity. For the purpose of forwarding of the entire
RIFT fabric, an internal fabric prefix needs to be advertised in the RIFT fabric, an internal fabric prefix needs to be advertised in the
South Prefix TIE by ToF and spine nodes. Prefix South TIE by ToF and spine nodes.
5.13.1. Internet Default on the Leaf 5.13.1. Internet Default on the Leaf
In the case that the internet gateway is a leaf, the leaf node as the In the case that the internet gateway is a leaf, the leaf node as the
internet gateway needs to advertise a default route in its Prefix internet gateway needs to advertise a default route in its Prefix
North TIE. North TIE.
5.13.2. Internet Default on the ToFs 5.13.2. Internet Default on the ToFs
In the case that the internet gateway is a ToF, the ToF and spine In the case that the internet gateway is a ToF, the ToF and spine
skipping to change at line 1674 skipping to change at line 1675
Nanjing Nanjing
210012 210012
China China
Email: zhang.zheng@zte.com.cn Email: zhang.zheng@zte.com.cn
Dmitry Afanasiev Dmitry Afanasiev
Yandex Yandex
Email: fl0w@yandex-team.ru Email: fl0w@yandex-team.ru
Pascal Thubert Pascal Thubert
Cisco Systems, Inc Individual
Building D
45 Allee des Ormes - BP1200
06254 Mougins - Sophia Antipolis
France France
Phone: +33 497 23 26 34 Email: pascal.thubert@gmail.com
Email: pthubert@cisco.com
Tony Przygienda Tony Przygienda
Juniper Networks Juniper Networks
1194 N. Mathilda Ave 1194 N. Mathilda Ave
Sunnyvale, CA 94089 Sunnyvale, CA 94089
United States of America United States of America
Email: prz@juniper.net Email: prz@juniper.net
 End of changes. 26 change blocks. 
72 lines changed or deleted 69 lines changed or added
