rfc9737v1.txt   rfc9737.txt 
Internet Engineering Task Force (IETF) T. Haynes Internet Engineering Task Force (IETF) T. Haynes
Request for Comments: 9737 T. Myklebust Request for Comments: 9737 T. Myklebust
Category: Standards Track Hammerspace Category: Standards Track Hammerspace
ISSN: 2070-1721 February 2025 ISSN: 2070-1721 February 2025
Reporting of Errors via LAYOUTRETURN in NFSv4.2 Reporting Errors in NFSv4.2 via LAYOUTRETURN
Abstract Abstract
The Parallel Network File System (pNFS) allows for a file's metadata The Parallel Network File System (pNFS) allows for a file's metadata
(MDS) and data (DS) to be on different servers. When the metadata and data to be on different servers (i.e., the metadata server (MDS)
server is restarted, the client can still modify the data file and the data server (DS)). When the MDS is restarted, the client can
component. During the recovery phase of startup, the metadata server still modify the data file component. During the recovery phase of
and the data servers work together to recover state (which files are startup, the MDS and the DSs work together to recover state. If the
open, last modification time, size, etc.). If the client has not client has not encountered errors with the data files, then the state
encountered errors with the data files, then the state can be can be recovered and the resilvering of the data files can be
recovered and the resilvering of the data files can be avoided. With avoided. With any errors, there is no means by which the client can
any errors, there is no means by which the client can report errors report errors to the MDS. As such, the MDS has to assume that a file
to the metadata server. As such, the metadata server has to assume needs resilvering. This document presents an extension to RFC 8435
that a file needs resilvering. This document presents an extension to allow the client to update the metadata via LAYOUTRETURN and avoid
to RFC 8435 to allow the client to update the metadata and avoid the resilvering.
resilvering.
Status of This Memo Status of This Memo
This is an Internet Standards Track document. This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has (IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841. Internet Standards is available in Section 2 of RFC 7841.
skipping to change at line 72 skipping to change at line 71
3. Security Considerations 3. Security Considerations
4. IANA Considerations 4. IANA Considerations
5. References 5. References
5.1. Normative References 5.1. Normative References
Acknowledgments Acknowledgments
Authors' Addresses Authors' Addresses
1. Introduction 1. Introduction
In the Network File System version 4 (NFSv4) with a Parallel NFS In the Network File System version 4 (NFSv4) with a Parallel NFS
(pNFS) Flexible File Layout [RFC8435] server, during recovery after a (pNFS) flexible file layout [RFC8435] server, during recovery after a
restart, there is no mechanism for the client to inform the metadata restart, there is no mechanism for the client to inform the metadata
server about an error that occurred during a WRITE operation (see server (MDS) about an error that occurred during a WRITE operation
Section 18.32 of [RFC8881]) to the data servers in the period of the (see Section 18.32 of [RFC8881]) to the data servers (DSs) in the
outage. period of the outage.
Using the process detailed in [RFC8178], the revisions in this Using the process detailed in [RFC8178], the revisions in this
document become an extension of NFSv4.2 [RFC7862]. They are built on document become an extension of NFSv4.2 [RFC7862]. They are built on
top of the External Data Representation (XDR) [RFC4506] generated top of the External Data Representation (XDR) [RFC4506] generated
from [RFC7863]. from [RFC7863].
1.1. Definitions 1.1. Definitions
See Section 1.1 of [RFC8435] for a set of definitions. See Section 1.1 of [RFC8435] for a set of definitions.
1.2. Requirements Language 1.2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in "OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
2. Layout State Recovery 2. Layout State Recovery
When a metadata server restarts, clients are provided a grace When an MDS restarts, clients are provided a grace recovery period
recovery period where they are allowed to recover any state that they where they are allowed to recover any state that they had
had established. With open files, the client can send an OPEN established. With open files, the client can send an OPEN operation
operation (see Section 18.16 of [RFC8881]) with a claim type of (see Section 18.16 of [RFC8881]) with a claim type of CLAIM_PREVIOUS
CLAIM_PREVIOUS (see Section 9.11 of [RFC8881]). The client uses the (see Section 9.11 of [RFC8881]). The client uses the
RECLAIM_COMPLETE operation (see Section 18.51 of [RFC8881]) to notify RECLAIM_COMPLETE operation (see Section 18.51 of [RFC8881]) to notify
the metadata server that it is done reclaiming state. the MDS that it is done reclaiming state.
The NFSv4 Flexible File Layout Type allows for the client to mirror The NFSv4 flexible file layout type allows for the client to mirror
files (see Section 8 of [RFC8435]). With client-side mirroring, it files (see Section 8 of [RFC8435]). With client-side mirroring, it
is important for the client to inform the metadata server of any I/O is important for the client to inform the MDS of any I/O errors
errors encountered with one of the mirrors. This is the only way for encountered with one of the mirrors. This is the only way for the
the metadata server to determine if one or more of the mirrors are MDS to determine if one or more of the mirrors are corrupt and then
corrupt and then repair the mirrors via resilvering (see Section 1.1 repair the mirrors via resilvering (see Section 1.1 of [RFC8435]).
of [RFC8435]). The client can use LAYOUTRETURN (see Section 18.44 of The client can use LAYOUTRETURN (see Section 18.44 of [RFC8881]) and
[RFC8881]) and the ff_ioerr4 structure (see Section 9.1.1 of the ff_ioerr4 structure (see Section 9.1.1 of [RFC8435]) to inform
[RFC8435]) to inform the metadata server of I/O errors. the MDS of I/O errors.
A problem arises when the metadata server restarts and the client has A problem arises when the MDS restarts and the client has errors it
errors it needs to report but cannot do so. Section 12.7.4 of needs to report but cannot do so. Section 12.7.4 of [RFC8881]
[RFC8881] requires that the client MUST stop using layouts. While requires that the client MUST stop using layouts. While the intent
the intent there is that the client MUST stop doing I/O to the there is that the client MUST stop doing I/O to the storage devices,
storage devices, it is also true that the layout stateids are no it is also true that the layout stateids are no longer valid. The
longer valid. The LAYOUTRETURN needs a layout stateid to proceed, LAYOUTRETURN needs a layout stateid to proceed, and the client cannot
and the client cannot get a layout during grace recovery (see get a layout during grace recovery (see Section 12.7.4 of [RFC8881])
Section 12.7.4 of [RFC8881]) to recover layout state. As such, to recover layout state. As such, clients have no choice but to not
clients have no choice but to not recover files with I/O errors. In recover files with I/O errors. In turn, the MDS MUST assume that the
turn, the metadata server MUST assume that the mirrors are mirrors are inconsistent and pick one for resilvering. It is a MUST
inconsistent and pick one for resilvering. It is a MUST because even because even if the MDS can determine that the client did modify data
if the metadata server can determine that the client did modify data
during the outage, it MUST NOT assume those modifications were during the outage, it MUST NOT assume those modifications were
consistent. consistent.
To fix this issue, the metadata server MUST accept the anonymous To fix this issue, the MDS MUST accept the anonymous stateid of all
stateid of all zeros (see Section 8.2.3 of [RFC8881]) for the zeros (see Section 8.2.3 of [RFC8881]) for the lrf_stateid in
lrf_stateid in LAYOUTRETURN (see Section 18.44.1 of [RFC8881]). The LAYOUTRETURN (see Section 18.44.1 of [RFC8881]). The client can use
client can use this anonymous stateid to inform the metadata server this anonymous stateid to inform the MDS of errors encountered. The
of errors encountered. The metadata server can then accurately MDS can then accurately resilver the file by picking the mirror(s)
resilver the file by picking the mirror(s) that does not have any that does not have any associated errors.
associated errors.
During the grace period, if the client sends an lrf_stateid in the During the grace period, if the client sends an lrf_stateid in the
LAYOUTRETURN with any value other than the anonymous stateid of all LAYOUTRETURN with any value other than the anonymous stateid of all
zeros, then the metadata server MUST respond with an error of zeros, then the MDS MUST respond with an error of NFS4ERR_GRACE (see
NFS4ERR_GRACE (see Section 15.1.9.2 of [RFC8881]). After the grace Section 15.1.9.2 of [RFC8881]). After the grace period, if the
period, if the client sends an lrf_stateid in the LAYOUTRETURN with a client sends an lrf_stateid in the LAYOUTRETURN with a value of the
value of the anonymous stateid of all zeros, then the metadata server anonymous stateid of all zeros, then the MDS MUST respond with an
MUST respond with an error of NFS4ERR_NO_GRACE (see Section 15.1.9.3 error of NFS4ERR_NO_GRACE (see Section 15.1.9.3 of [RFC8881]).
of [RFC8881]).
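The two stateid rules above can be sketched as a single check on the MDS side. This is an illustrative sketch only, not text from the RFC: the names `Stateid`, `Status`, `ANONYMOUS_STATEID`, and `check_layoutreturn_stateid` are hypothetical, and the status values are symbolic names rather than wire-level error numbers.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Symbolic status names only; wire-level numeric codes are omitted on purpose.
    NFS4_OK = "NFS4_OK"
    NFS4ERR_GRACE = "NFS4ERR_GRACE"
    NFS4ERR_NO_GRACE = "NFS4ERR_NO_GRACE"


@dataclass(frozen=True)
class Stateid:
    # Hypothetical stand-in for a stateid: a sequence number plus 12 opaque bytes.
    seqid: int
    other: bytes


# The anonymous stateid of all zeros.
ANONYMOUS_STATEID = Stateid(seqid=0, other=bytes(12))


def check_layoutreturn_stateid(lrf_stateid: Stateid, in_grace: bool) -> Status:
    """Apply the grace-period rules for lrf_stateid in LAYOUTRETURN."""
    is_anonymous = lrf_stateid == ANONYMOUS_STATEID
    if in_grace and not is_anonymous:
        # During grace, only the anonymous stateid is acceptable.
        return Status.NFS4ERR_GRACE
    if not in_grace and is_anonymous:
        # After grace, the anonymous stateid is no longer acceptable.
        return Status.NFS4ERR_NO_GRACE
    # NFS4_OK here only means this check passed; normal processing continues.
    return Status.NFS4_OK
```

For example, `check_layoutreturn_stateid(ANONYMOUS_STATEID, in_grace=True)` passes the check, while the same stateid sent after the grace period yields `Status.NFS4ERR_NO_GRACE`.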
Also, when the metadata server builds the reply to the LAYOUTRETURN Also, when the MDS builds the reply to the LAYOUTRETURN with an
when an lrf_stateid with the value of the anonymous stateid of all lrf_stateid with the value of the anonymous stateid of all zeros, it
zeros it MUST NOT bump the seqid of the lorr_stateid. MUST NOT bump the seqid of the lorr_stateid.
If the metadata server detects that the layout being returned in the If the MDS detects that the layout being returned in the LAYOUTRETURN
LAYOUTRETURN does not match the current mirror instances found for does not match the current mirror instances found for the file, then
the file, then it MUST ignore the LAYOUTRETURN and resilver the file it MUST ignore the LAYOUTRETURN and resilver the file in question.
in question.
The metadata server MUST resilver any files that are neither The MDS MUST resilver any files that are neither explicitly recovered
explicitly recovered with a CLAIM_PREVIOUS nor have a reported error with a CLAIM_PREVIOUS nor have a reported error via a LAYOUTRETURN.
via a LAYOUTRETURN. The client has most likely restarted and lost The client has most likely restarted and lost any state.
any state.
2.1. When to Resilver 2.1. When to Resilver
A write intent occurs when a client opens a file and gets a A write intent occurs when a client opens a file and gets a
LAYOUTIOMODE4_RW from the metadata server. The metadata server MUST LAYOUTIOMODE4_RW from the MDS. The MDS MUST track outstanding write
track outstanding write intents, and when it restarts, it MUST track intents, and when it restarts, it MUST track recovery of those write
recovery of those write intents. The method that the metadata server intents. The method that the MDS uses to track write intents is
uses to track write intents is implementation specific, i.e., outside implementation specific, i.e., outside the scope of this document.
the scope of this document.
The decision to resilver a file depends on how the client recovers The decision to resilver a file depends on how the client recovers
the file before the grace period ends. If the client reclaims the the file before the grace period ends. If the client reclaims the
file and reports no errors, the metadata server MUST NOT resilver the file and reports no errors, the MDS MUST NOT resilver the file. If
file. If the client reports an error on the file, then the file MUST the client reports an error on the file, then the file MUST be
be resilvered. If the client does not reclaim or report an error resilvered. If the client does not reclaim or report an error before
before the grace period ends, then under the old behavior, the the grace period ends, then under the old behavior, the MDS MUST
metadata server MUST resilver the file. resilver the file.
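The three outcomes described above can be written as a small decision function evaluated when the grace period ends. This is a sketch under stated assumptions: the function name `must_resilver` and its two boolean inputs are hypothetical, and how the MDS learns these facts is implementation specific.

```python
def must_resilver(reclaimed: bool, error_reported: bool) -> bool:
    """Decide at the end of the grace period whether a file needs resilvering.

    reclaimed:      the client reclaimed the file (e.g., OPEN with CLAIM_PREVIOUS).
    error_reported: the client reported an I/O error via LAYOUTRETURN.
    """
    if error_reported:
        # A reported error always leads to resilvering.
        return True
    if reclaimed:
        # Reclaimed with no errors reported: the mirrors are treated as consistent.
        return False
    # Neither reclaimed nor reported: the MDS has to assume the worst.
    return True
```

Only the "reclaimed, no error" case avoids resilvering; every other combination of inputs returns `True`.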
The resilvering process is broadly to: The resilvering process is broadly to:
1. fence the file (see Section 2.2 of [RFC8435]), 1. fence the file (see Section 2.2 of [RFC8435]),
2. record the need to resilver, 2. record the need to resilver,
3. release the write intent, and 3. release the write intent, and
4. once there are no write intents on the file, start the 4. once there are no write intents on the file, start the
resilvering process. resilvering process.
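The four steps above, together with the rule that resilvering waits until no write intents remain, can be sketched as per-file bookkeeping. The class `FileRecoveryState` and its method names are hypothetical; the document leaves the tracking method to the implementation, and a real MDS would also persist this state so it survives a restart.

```python
class FileRecoveryState:
    """Hypothetical per-file bookkeeping on the MDS (in-memory only)."""

    def __init__(self) -> None:
        self.write_intents: set[str] = set()  # client ids holding a write intent
        self.fenced = False
        self.needs_resilver = False
        self.resilvering = False

    def add_write_intent(self, client_id: str) -> None:
        self.write_intents.add(client_id)

    def mark_for_resilver(self, client_id: str) -> None:
        self.fenced = True                      # step 1: fence the file
        self.needs_resilver = True              # step 2: record the need to resilver
        self.write_intents.discard(client_id)   # step 3: release the write intent
        self._maybe_start()                     # step 4: start if nothing is outstanding

    def release_write_intent(self, client_id: str) -> None:
        self.write_intents.discard(client_id)
        self._maybe_start()

    def _maybe_start(self) -> None:
        # Resilvering starts only once no client holds a write intent.
        if self.needs_resilver and not self.write_intents:
            self.resilvering = True
```

With two clients holding write intents, marking the file for resilvering on behalf of the first client leaves `resilvering` false until the second client's intent is also released.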
The metadata server MUST NOT resilver a file if there are clients The MDS MUST NOT resilver a file if there are clients with
with outstanding write intents, i.e., multiple clients might have the outstanding write intents, i.e., multiple clients might have the file
file open with write intents. As the metadata server MUST track open with write intents. As the MDS MUST track write intents, it
write intents, it MUST also track the need to resilver, i.e., if the MUST also track the need to resilver, i.e., if the MDS restarts
metadata server restarts during the grace period, it MUST restart the during the grace period, it MUST restart the file recovery if it
file recovery if it replays the write intent, or else it MUST start replays the write intent, or else it MUST start the resilvering if it
the resilvering if it replays the resilvering intent. replays the resilvering intent.
Whether the metadata server prevents all I/O to the file until the Whether the MDS prevents all I/O to the file until the resilvering is
resilvering is done, forces all I/O to go through the metadata done, forces all I/O to go through the MDS, or allows a proxy server
server, or allows a proxy server to update the new data file as it is to update the new data file as it is being resilvered is all an
being resilvered is all an implementation choice. The constraint is implementation choice. The constraint is that the MDS is responsible
that the metadata server is responsible for the reconstruction of the for the reconstruction of the data file and for the consistency of
data file and for the consistency of the mirrors. the mirrors.
If the metadata server does allow the client access to the file If the MDS does allow the client access to the file during the
during the resilvering, then the client MUST have the same layout resilvering, then the client MUST have the same layout (set of mirror
(set of mirror instances) after the metadata server as before. One instances) after the MDS as before. One way that such a resilvering
way that such a resilvering can occur is for a proxy server to be can occur is for a proxy server to be inserted into the layout. That
inserted into the layout. That server will be copying a good mirror server will be copying a good mirror instance to a new instance. As
instance to a new instance. As it gets I/O via the layout, it will it gets I/O via the layout, it will be responsible for updating the
be responsible for updating the copy it is performing. This copy it is performing. This requirement is that the proxy server
requirement is that the proxy server MUST stay in the layout until MUST stay in the layout until the grace period is finished.
the grace period is finished.
2.2. Version Mismatch Considerations 2.2. Version Mismatch Considerations
The metadata server has no expectations for the client to use this The MDS has no expectations for the client to use this new
new functionality. Therefore, if the client does not use it, the functionality. Therefore, if the client does not use it, the MDS
metadata server will function normally. will function normally.
If the client does use the new functionality and the metadata server If the client does use the new functionality and the MDS does not
does not support it, then the metadata server MUST reply with a support it, then the MDS MUST reply with a NFS4ERR_BAD_STATEID to the
NFS4ERR_BAD_STATEID to the LAYOUTRETURN. If the client detects a LAYOUTRETURN. If the client detects a NFS4ERR_BAD_STATEID error in
NFS4ERR_BAD_STATEID error in this scenario, it should fall back to this scenario, it should fall back to the old behavior of not
the old behavior of not reporting errors. reporting errors.
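The client-side fallback can be sketched as follows. This is an assumption-laden illustration: `report_io_errors` and the `send_layoutreturn` callback are hypothetical names, and statuses are represented as plain strings instead of protocol error codes.

```python
from typing import Callable, Sequence


def report_io_errors(
    send_layoutreturn: Callable[[Sequence[str]], str],
    io_errors: Sequence[str],
) -> bool:
    """Try to report I/O errors during grace; return True if the MDS accepted them.

    send_layoutreturn is a hypothetical callback that sends a LAYOUTRETURN with
    the anonymous stateid and returns the status name as a string.
    """
    if not io_errors:
        # Nothing to report, so no LAYOUTRETURN is sent.
        return False
    status = send_layoutreturn(io_errors)
    if status == "NFS4ERR_BAD_STATEID":
        # The MDS does not support this extension: fall back to the old
        # behavior of not reporting errors.
        return False
    return status == "NFS4_OK"
```

A client talking to an older MDS sees `False` here and simply proceeds as it did before this extension existed.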
3. Security Considerations 3. Security Considerations
There are no new security considerations beyond those in [RFC7862]. There are no new security considerations beyond those in [RFC7862].
4. IANA Considerations 4. IANA Considerations
This document has no IANA actions. This document has no IANA actions.
5. References 5. References
 End of changes. 21 change blocks. 
107 lines changed or deleted 99 lines changed or added

This html diff was produced by rfcdiff 1.48.