rfc9737.original   rfc9737.txt 
Network File System Version 4 T. Haynes Internet Engineering Task Force (IETF) T. Haynes
Internet-Draft T. Myklebust Request for Comments: 9737 T. Myklebust
Intended status: Standards Track Hammerspace Category: Standards Track Hammerspace
Expires: 25 May 2025 21 November 2024 ISSN: 2070-1721 February 2025
Reporting of Errors via LAYOUTRETURN in NFSv4.2 Reporting of Errors via LAYOUTRETURN in NFSv4.2
draft-ietf-nfsv4-layrec-04
Abstract Abstract
The Parallel Network File System (pNFS) allows for a file's metadata The Parallel Network File System (pNFS) allows for a file's metadata
(MDS) and data (DS) to be on different servers. When the metadata (MDS) and data (DS) to be on different servers. When the metadata
server is restarted, the client can still modify the data file server is restarted, the client can still modify the data file
component. During the recovery phase of startup, the metadata server component. During the recovery phase of startup, the metadata server
and the data servers work together to recover state (which files are and the data servers work together to recover state (which files are
open, last modification time, size, etc.). If the client has not open, last modification time, size, etc.). If the client has not
encountered errors with the data files, then the state can be encountered errors with the data files, then the state can be
recovered, avoiding resilvering of the data files. With any errors, recovered and the resilvering of the data files can be avoided. With
there is no means by which the client can report errors to the any errors, there is no means by which the client can report errors
metadata server. As such, the metadata server has to assume that to the metadata server. As such, the metadata server has to assume
file needs resilvering. This document presents an extension to that a file needs resilvering. This document presents an extension
RFC8435 to allow the client to update the metadata and avoid the to RFC 8435 to allow the client to update the metadata and avoid
resilvering. resilvering.
Note
This note is to be removed before publishing as an RFC.
Discussion of this draft takes place on the NFSv4 working group
mailing list (nfsv4@ietf.org), which is archived at
https://mailarchive.ietf.org/arch/browse/nfsv4/. Working Group
information can be found at https://datatracker.ietf.org/wg/nfsv4/
about/.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This is an Internet Standards Track document.
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.
This Internet-Draft will expire on 25 May 2025. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc9737.
Copyright Notice Copyright Notice
Copyright (c) 2024 IETF Trust and the persons identified as the Copyright (c) 2025 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents
license-info) in effect on the date of publication of this document. (https://trustee.ietf.org/license-info) in effect on the date of
Please review these documents carefully, as they describe your rights publication of this document. Please review these documents
and restrictions with respect to this document. Code Components carefully, as they describe your rights and restrictions with respect
extracted from this document must include Revised BSD License text as to this document. Code Components extracted from this document must
described in Section 4.e of the Trust Legal Provisions and are include Revised BSD License text as described in Section 4.e of the
provided without warranty as described in the Revised BSD License. Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction
1.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Definitions
1.2. Requirements Language . . . . . . . . . . . . . . . . . . 3 1.2. Requirements Language
2. Layout State Recovery . . . . . . . . . . . . . . . . . . . . 3 2. Layout State Recovery
2.1. When to Resilver . . . . . . . . . . . . . . . . . . . . 4 2.1. When to Resilver
2.2. Version Mismatch Considerations . . . . . . . . . . . . . 5 2.2. Version Mismatch Considerations
3. Security Considerations . . . . . . . . . . . . . . . . . . . 6 3. Security Considerations
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 6 4. IANA Considerations
5. References . . . . . . . . . . . . . . . . . . . . . . . . . 6 5. References
5.1. Normative References . . . . . . . . . . . . . . . . . . 6 5.1. Normative References
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 7 Acknowledgments
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 7 Authors' Addresses
1. Introduction 1. Introduction
In the Network File System version4 (NFSv4) with a Parallel NFS In the Network File System version 4 (NFSv4) with a Parallel NFS
(pNFS) Flexible File Layout ([RFC8435]) server, during recovery after (pNFS) Flexible File Layout [RFC8435] server, during recovery after a
a restart, there is no mechanism for the client to inform the restart, there is no mechanism for the client to inform the metadata
metadata server about an error which occurred during a WRITE (see server about an error that occurred during a WRITE operation (see
Section 18.32 of [RFC8881]) operation to the data servers in the Section 18.32 of [RFC8881]) to the data servers in the period of the
period of the outage. outage.
Using the process detailed in [RFC8178], the revisions in this Using the process detailed in [RFC8178], the revisions in this
document become an extension of NFSv4.2 [RFC7862]. They are built on document become an extension of NFSv4.2 [RFC7862]. They are built on
top of the external data representation (XDR) [RFC4506] generated top of the External Data Representation (XDR) [RFC4506] generated
from [RFC7863]. from [RFC7863].
1.1. Definitions 1.1. Definitions
See Section 1.1 of [RFC8435] for a set of definitions. See Section 1.1 of [RFC8435] for a set of definitions.
1.2. Requirements Language 1.2. Requirements Language
The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT', The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'NOT RECOMMENDED', 'MAY', and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
'OPTIONAL' in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in
14 [RFC2119] [RFC8174] when, and only when, they appear in all BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
2. Layout State Recovery 2. Layout State Recovery
When a metadata server restarts, clients are provided a grace When a metadata server restarts, clients are provided a grace
recovery period where they are allowed to recover any state that they recovery period where they are allowed to recover any state that they
had established. With open files, the client can send an OPEN (see had established. With open files, the client can send an OPEN
Section 18.16 of [RFC8881]) operation with a claim type of operation (see Section 18.16 of [RFC8881]) with a claim type of
CLAIM_PREVIOUS (see Section 9.11 of [RFC8881]). The client uses the CLAIM_PREVIOUS (see Section 9.11 of [RFC8881]). The client uses the
RECLAIM_COMPLETE (see Section 18.51 of [RFC8881]) operation to notify RECLAIM_COMPLETE operation (see Section 18.51 of [RFC8881]) to notify
the metadata server that it is done reclaiming state. the metadata server that it is done reclaiming state.
The NFSv4 Flexible File Layout Type allows for the client to mirror The NFSv4 Flexible File Layout Type allows for the client to mirror
files (see Section 8 of [RFC8435]). With client side mirroring, it files (see Section 8 of [RFC8435]). With client-side mirroring, it
is important for the client to inform the metadata server of any I/O is important for the client to inform the metadata server of any I/O
errors encountered with one of the mirrors. This is the only way for errors encountered with one of the mirrors. This is the only way for
the metadata server to determine one or more of the mirrors is the metadata server to determine if one or more of the mirrors are
corrupt and then repair the mirrors via resilvering (see Section 1.1 corrupt and then repair the mirrors via resilvering (see Section 1.1
of [RFC8435]). The client can use LAYOUTRETURN (see Section 18.44 of of [RFC8435]). The client can use LAYOUTRETURN (see Section 18.44 of
[RFC8881]) and the ff_ioerr4 (see Section 9.1.1 of [RFC8435]) [RFC8881]) and the ff_ioerr4 structure (see Section 9.1.1 of
structure to inform the metadata server of I/O errors. [RFC8435]) to inform the metadata server of I/O errors.
A problem is that when the metadata server restarts and the client A problem arises when the metadata server restarts and the client has
has errors it needs to report, it can not do so. Section 12.7.4 of errors it needs to report but cannot do so. Section 12.7.4 of
[RFC8881] requires that the client MUST stop using layouts. While [RFC8881] requires that the client MUST stop using layouts. While
the intent there is that the client MUST stop doing I/O to the the intent there is that the client MUST stop doing I/O to the
storage devices, it is also true that the layout stateids are no storage devices, it is also true that the layout stateids are no
longer valid. The LAYOUTRETURN needs a layout stateid to proceed and longer valid. The LAYOUTRETURN needs a layout stateid to proceed,
the client can not get a layout during grace recovery (see and the client cannot get a layout during grace recovery (see
Section 12.7.4 of [RFC8881]) to recover layout state. As such, Section 12.7.4 of [RFC8881]) to recover layout state. As such,
clients have no choice but to not recover files with I/O errors. In clients have no choice but to not recover files with I/O errors. In
turn, the metadata server MUST assume that the mirrors are turn, the metadata server MUST assume that the mirrors are
inconsistent and pick one for resilvering. It is a MUST because even inconsistent and pick one for resilvering. It is a MUST because even
if the metadata server can determine that the client did modify data if the metadata server can determine that the client did modify data
during the outage, it MUST NOT assume those modifications were during the outage, it MUST NOT assume those modifications were
consistent. consistent.
To fix this issue, the metadata server MUST accept for the To fix this issue, the metadata server MUST accept the anonymous
lrf_stateid in LAYOUTRETURN (see Section 18.44.1 of [RFC8881]) the stateid of all zeros (see Section 8.2.3 of [RFC8881]) for the
anonymous stateid of all zeros (see Section 8.2.3 of [RFC8881]). The lrf_stateid in LAYOUTRETURN (see Section 18.44.1 of [RFC8881]). The
client can use this anonymous stateid to inform the metadata server client can use this anonymous stateid to inform the metadata server
of errors encountered. The metadata server can then accurately of errors encountered. The metadata server can then accurately
resilver the file by picking the mirror(s) that do not have any resilver the file by picking the mirror(s) that does not have any
associated errors. associated errors.
During the grace period, if the client sends a lrf_stateid in the During the grace period, if the client sends an lrf_stateid in the
LAYOUTRETURN with any value other than the anonymous stateid of all LAYOUTRETURN with any value other than the anonymous stateid of all
zeros, then the metadata server MUST now respond with an error of zeros, then the metadata server MUST respond with an error of
NFS4ERR_GRACE (see Section of 15.1.9.2 [RFC8881]). After the grace NFS4ERR_GRACE (see Section 15.1.9.2 of [RFC8881]). After the grace
period, if the client sends a lrf_stateid in the LAYOUTRETURN with a period, if the client sends an lrf_stateid in the LAYOUTRETURN with a
value of the anonymous stateid of all zeros, then the metadata server value of the anonymous stateid of all zeros, then the metadata server
MUST now respond with an error of NFS4ERR_NO_GRACE (see MUST respond with an error of NFS4ERR_NO_GRACE (see Section 15.1.9.3
Section 15.1.9.3 of [RFC8881]). of [RFC8881]).
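The stateid rules above can be read as a small check that runs before any other LAYOUTRETURN processing. The following is a minimal sketch under stated assumptions, not an implementation taken from either version of the document: the names Stateid, is_anonymous, and check_layoutreturn_stateid are hypothetical, error codes are shown as strings rather than protocol values, and a result of "NFS4_OK" only means this particular check passed (normal stateid validation would still follow).

```python
from dataclasses import dataclass

NFS4_OTHER_SIZE = 12  # size of the "other" field of a stateid4


@dataclass(frozen=True)
class Stateid:
    """Hypothetical stand-in for the stateid4 structure (seqid + other)."""
    seqid: int
    other: bytes

    def is_anonymous(self) -> bool:
        # The anonymous stateid has both seqid and "other" set to all zeros.
        return self.seqid == 0 and self.other == bytes(NFS4_OTHER_SIZE)


def check_layoutreturn_stateid(lrf_stateid: Stateid, in_grace: bool) -> str:
    """Apply only the grace-period rules for lrf_stateid in LAYOUTRETURN."""
    if in_grace and not lrf_stateid.is_anonymous():
        # During grace, only the anonymous stateid is acceptable.
        return "NFS4ERR_GRACE"
    if not in_grace and lrf_stateid.is_anonymous():
        # After grace, the anonymous stateid is no longer acceptable.
        return "NFS4ERR_NO_GRACE"
    return "NFS4_OK"
```

For example, check_layoutreturn_stateid(Stateid(0, bytes(12)), in_grace=True) passes the check, while the same stateid sent after the grace period is rejected.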
Also, when the metadata server builds the reply to the LAYOUTRETURN Also, when the metadata server builds the reply to the LAYOUTRETURN
when a lrf_stateid with the value of the anonymous stateid of all when an lrf_stateid with the value of the anonymous stateid of all
zeros it MUST NOT bump the seqid of the lorr_stateid. zeros it MUST NOT bump the seqid of the lorr_stateid.
If the metadata server detects that the layout being returned in the If the metadata server detects that the layout being returned in the
LAYOUTRETURN does not match the current mirror instances found for LAYOUTRETURN does not match the current mirror instances found for
the file, then it MUST ignore the LAYOUTRETURN and resilver the file the file, then it MUST ignore the LAYOUTRETURN and resilver the file
in question. in question.
The metadata server MUST resilver any files which are neither The metadata server MUST resilver any files that are neither
explicitly recovered with a CLAIM_PREVIOUS nor have a reported error explicitly recovered with a CLAIM_PREVIOUS nor have a reported error
via a LAYOUTRETURN. The client has most likely restarted and lost via a LAYOUTRETURN. The client has most likely restarted and lost
any state. any state.
2.1. When to Resilver 2.1. When to Resilver
A write intent occurs when a client opens a file and gets a A write intent occurs when a client opens a file and gets a
LAYOUTIOMODE4_RW from the metadata server. The metadata server MUST LAYOUTIOMODE4_RW from the metadata server. The metadata server MUST
track outstanding write intents and when it restarts, it MUST track track outstanding write intents, and when it restarts, it MUST track
recovery of those write intents. The method that the metadata server recovery of those write intents. The method that the metadata server
uses to track write intents is implementation specific, i.e., outside uses to track write intents is implementation specific, i.e., outside
of the scope of this document. the scope of this document.
The decision to resilver a file depends on how the client recovers The decision to resilver a file depends on how the client recovers
the file before the grace period ends. If the client reclaims the the file before the grace period ends. If the client reclaims the
file and reports no errors, the metadata server MUST NOT resilver the file and reports no errors, the metadata server MUST NOT resilver the
file. If the client reports an error on the file, then the file MUST file. If the client reports an error on the file, then the file MUST
be resilvered. If the client does not reclaim or report an error be resilvered. If the client does not reclaim or report an error
before the grace period ends, then under the old behavior, the before the grace period ends, then under the old behavior, the
metadata server MUST resilver the file. metadata server MUST resilver the file.
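The three cases in the paragraph above can be summarized as a decision function. This is an illustrative sketch only: the Decision enum, the resilver_decision function, and its boolean parameters are hypothetical names, and the sketch assumes the metadata server re-evaluates the decision whenever a reclaim or an error report arrives, up to the end of the grace period.

```python
from enum import Enum


class Decision(Enum):
    PENDING = "pending"          # grace period still open, nothing heard yet
    NO_RESILVER = "no_resilver"  # reclaimed with no errors reported
    RESILVER = "resilver"        # error reported, or never recovered


def resilver_decision(reclaimed: bool, error_reported: bool,
                      grace_ended: bool) -> Decision:
    """Decide what to do with a file that had an outstanding write intent."""
    if error_reported:
        # A reported error means the file MUST be resilvered.
        return Decision.RESILVER
    if reclaimed:
        # Reclaimed and no errors reported: MUST NOT resilver.
        return Decision.NO_RESILVER
    if grace_ended:
        # Neither reclaimed nor reported before grace ended: MUST resilver.
        return Decision.RESILVER
    # Still inside the grace period; the client may yet reclaim or report.
    return Decision.PENDING
```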
The resilvering process is broadly to: The resilvering process is broadly to:
skipping to change at page 5, line 20 skipping to change at line 190
1. fence the file (see Section 2.2 of [RFC8435]), 1. fence the file (see Section 2.2 of [RFC8435]),
2. record the need to resilver, 2. record the need to resilver,
3. release the write intent, and 3. release the write intent, and
4. once there are no write intents on the file, start the 4. once there are no write intents on the file, start the
resilvering process. resilvering process.
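The four steps above can be sketched as bookkeeping on a per-file record. The document leaves the tracking of write intents to the implementation, so everything here is an assumption for illustration: FileRecoveryState, mark_for_resilver, and release_write_intent are hypothetical names, fencing is reduced to a flag, and a real server would persist this state so that it survives a restart.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecoveryState:
    """Hypothetical per-file record kept by the metadata server."""
    write_intents: set = field(default_factory=set)  # clients holding RW layouts
    fenced: bool = False
    needs_resilver: bool = False
    resilvering: bool = False


def release_write_intent(state: FileRecoveryState, client_id: str) -> None:
    """Step 3, plus step 4 once the last write intent is gone."""
    state.write_intents.discard(client_id)
    if state.needs_resilver and not state.write_intents:
        # Step 4: no outstanding write intents remain, so resilvering may start.
        state.resilvering = True


def mark_for_resilver(state: FileRecoveryState, client_id: str) -> None:
    """Run steps 1 to 3 for the client whose recovery triggered the resilver."""
    state.fenced = True          # step 1: fence the file
    state.needs_resilver = True  # step 2: record the need to resilver
    release_write_intent(state, client_id)  # step 3 (and step 4 if possible)
```

With two clients holding write intents, marking the file on behalf of the first leaves resilvering deferred until the second client's intent is released.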
The metadata server MUST NOT resilver a file if there are clients The metadata server MUST NOT resilver a file if there are clients
with outstanding write intents. I.e., multiple clients might have with outstanding write intents, i.e., multiple clients might have the
the file open with write intents. As it MUST track write intents, it file open with write intents. As the metadata server MUST track
MUST also track the need to resilver. I.e., if the metadata server write intents, it MUST also track the need to resilver, i.e., if the
restarts during the grace period, it MUST restart the file recovery metadata server restarts during the grace period, it MUST restart the
if it replays the write intent else it MUST start the resilvering if file recovery if it replays the write intent, or else it MUST start
it replays the resilvering intent. the resilvering if it replays the resilvering intent.
Whether the metadata server prevents all I/O to the file until the Whether the metadata server prevents all I/O to the file until the
resilvering is done or forces all I/O to go through the metadata resilvering is done, forces all I/O to go through the metadata
server or allows a proxy server to update the new data file as it is server, or allows a proxy server to update the new data file as it is
being reslivered is all an implementation choice. The constraint is being resilvered is all an implementation choice. The constraint is
that the metadata server is responsible for the reconstruction of the that the metadata server is responsible for the reconstruction of the
data file and for the consistency of the mirrors. data file and for the consistency of the mirrors.
If the metadata server does allow the client access to the file If the metadata server does allow the client access to the file
during the resilvering, then the client MUST have the same layout during the resilvering, then the client MUST have the same layout
(set of mirror instances) after the metadata server as before. One (set of mirror instances) after the metadata server as before. One
way that such a resilvering can occur is for a proxy server to be way that such a resilvering can occur is for a proxy server to be
inserted into the layout. That server will be copying a good mirror inserted into the layout. That server will be copying a good mirror
instance to a new instance. As it gets I/O via the layout, it will instance to a new instance. As it gets I/O via the layout, it will
be responsible for updating the copy it is performing. This be responsible for updating the copy it is performing. This
skipping to change at page 6, line 17 skipping to change at line 232
NFS4ERR_BAD_STATEID to the LAYOUTRETURN. If the client detects a NFS4ERR_BAD_STATEID to the LAYOUTRETURN. If the client detects a
NFS4ERR_BAD_STATEID error in this scenario, it should fall back to NFS4ERR_BAD_STATEID error in this scenario, it should fall back to
the old behavior of not reporting errors. the old behavior of not reporting errors.
3. Security Considerations 3. Security Considerations
There are no new security considerations beyond those in [RFC7862]. There are no new security considerations beyond those in [RFC7862].
4. IANA Considerations 4. IANA Considerations
There are no IANA considerations for this document. This document has no IANA actions.
5. References 5. References
5.1. Normative References 5.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
skipping to change at page 7, line 10 skipping to change at line 273
[RFC8435] Halevy, B. and T. Haynes, "Parallel NFS (pNFS) Flexible [RFC8435] Halevy, B. and T. Haynes, "Parallel NFS (pNFS) Flexible
File Layout", RFC 8435, DOI 10.17487/RFC8435, August 2018, File Layout", RFC 8435, DOI 10.17487/RFC8435, August 2018,
<https://www.rfc-editor.org/info/rfc8435>. <https://www.rfc-editor.org/info/rfc8435>.
[RFC8881] Noveck, D., Ed. and C. Lever, "Network File System (NFS) [RFC8881] Noveck, D., Ed. and C. Lever, "Network File System (NFS)
Version 4 Minor Version 1 Protocol", RFC 8881, Version 4 Minor Version 1 Protocol", RFC 8881,
DOI 10.17487/RFC8881, August 2020, DOI 10.17487/RFC8881, August 2020,
<https://www.rfc-editor.org/info/rfc8881>. <https://www.rfc-editor.org/info/rfc8881>.
Appendix A. Acknowledgments Acknowledgments
Tigran Mkrtchyan, Jeff Layton, and Rick Macklem provided reviews of Tigran Mkrtchyan, Jeff Layton, and Rick Macklem provided reviews of
the document. the document.
Authors' Addresses Authors' Addresses
Thomas Haynes Thomas Haynes
Hammerspace Hammerspace
Email: loghyr@gmail.com Email: loghyr@gmail.com
 End of changes. 33 change blocks. 
99 lines changed or deleted 86 lines changed or added
