rfc9737.original | rfc9737.txt | |||
---|---|---|---|---|
Network File System Version 4 T. Haynes | Internet Engineering Task Force (IETF) T. Haynes | |||
Internet-Draft T. Myklebust | Request for Comments: 9737 T. Myklebust | |||
Intended status: Standards Track Hammerspace | Category: Standards Track Hammerspace | |||
Expires: 25 May 2025 21 November 2024 | ISSN: 2070-1721 February 2025 | |||
Reporting of Errors via LAYOUTRETURN in NFSv4.2 | Reporting of Errors via LAYOUTRETURN in NFSv4.2 | |||
draft-ietf-nfsv4-layrec-04 | ||||
Abstract | Abstract | |||
The Parallel Network File System (pNFS) allows for a file's metadata | The Parallel Network File System (pNFS) allows for a file's metadata | |||
(MDS) and data (DS) to be on different servers. When the metadata | (MDS) and data (DS) to be on different servers. When the metadata | |||
server is restarted, the client can still modify the data file | server is restarted, the client can still modify the data file | |||
component. During the recovery phase of startup, the metadata server | component. During the recovery phase of startup, the metadata server | |||
and the data servers work together to recover state (which files are | and the data servers work together to recover state (which files are | |||
open, last modification time, size, etc.). If the client has not | open, last modification time, size, etc.). If the client has not | |||
encountered errors with the data files, then the state can be | encountered errors with the data files, then the state can be | |||
recovered, avoiding resilvering of the data files. With any errors, | recovered and the resilvering of the data files can be avoided. With | |||
there is no means by which the client can report errors to the | any errors, there is no means by which the client can report errors | |||
metadata server. As such, the metadata server has to assume that | to the metadata server. As such, the metadata server has to assume | |||
file needs resilvering. This document presents an extension to | that a file needs resilvering. This document presents an extension | |||
RFC8435 to allow the client to update the metadata and avoid the | to RFC 8435 to allow the client to update the metadata and avoid | |||
resilvering. | resilvering. | |||
Note | ||||
This note is to be removed before publishing as an RFC. | ||||
Discussion of this draft takes place on the NFSv4 working group | ||||
mailing list (nfsv4@ietf.org), which is archived at | ||||
https://mailarchive.ietf.org/arch/browse/nfsv4/. Working Group | ||||
information can be found at https://datatracker.ietf.org/wg/nfsv4/ | ||||
about/. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 25 May 2025. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9737. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2024 IETF Trust and the persons identified as the | Copyright (c) 2025 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction | |||
1.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 3 | 1.1. Definitions | |||
1.2. Requirements Language . . . . . . . . . . . . . . . . . . 3 | 1.2. Requirements Language | |||
2. Layout State Recovery . . . . . . . . . . . . . . . . . . . . 3 | 2. Layout State Recovery | |||
2.1. When to Resilver . . . . . . . . . . . . . . . . . . . . 4 | 2.1. When to Resilver | |||
2.2. Version Mismatch Considerations . . . . . . . . . . . . . 5 | 2.2. Version Mismatch Considerations | |||
3. Security Considerations . . . . . . . . . . . . . . . . . . . 6 | 3. Security Considerations | |||
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 6 | 4. IANA Considerations | |||
5. References . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 5. References | |||
5.1. Normative References . . . . . . . . . . . . . . . . . . 6 | 5.1. Normative References | |||
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 7 | Acknowledgments | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 7 | Authors' Addresses | |||
1. Introduction | 1. Introduction | |||
In the Network File System version4 (NFSv4) with a Parallel NFS | In the Network File System version 4 (NFSv4) with a Parallel NFS | |||
(pNFS) Flexible File Layout ([RFC8435]) server, during recovery after | (pNFS) Flexible File Layout [RFC8435] server, during recovery after a | |||
a restart, there is no mechanism for the client to inform the | restart, there is no mechanism for the client to inform the metadata | |||
metadata server about an error which occurred during a WRITE (see | server about an error that occurred during a WRITE operation (see | |||
Section 18.32 of [RFC8881]) operation to the data servers in the | Section 18.32 of [RFC8881]) to the data servers in the period of the | |||
period of the outage. | outage. | |||
Using the process detailed in [RFC8178], the revisions in this | Using the process detailed in [RFC8178], the revisions in this | |||
document become an extension of NFSv4.2 [RFC7862]. They are built on | document become an extension of NFSv4.2 [RFC7862]. They are built on | |||
top of the external data representation (XDR) [RFC4506] generated | top of the External Data Representation (XDR) [RFC4506] generated | |||
from [RFC7863]. | from [RFC7863]. | |||
1.1. Definitions | 1.1. Definitions | |||
See Section 1.1 of [RFC8435] for a set of definitions. | See Section 1.1 of [RFC8435] for a set of definitions. | |||
1.2. Requirements Language | 1.2. Requirements Language | |||
The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT', | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'NOT RECOMMENDED', 'MAY', and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
'OPTIONAL' in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
2. Layout State Recovery | 2. Layout State Recovery | |||
When a metadata server restarts, clients are provided a grace | When a metadata server restarts, clients are provided a grace | |||
recovery period where they are allowed to recover any state that they | recovery period where they are allowed to recover any state that they | |||
had established. With open files, the client can send an OPEN (see | had established. With open files, the client can send an OPEN | |||
Section 18.16 of [RFC8881]) operation with a claim type of | operation (see Section 18.16 of [RFC8881]) with a claim type of | |||
CLAIM_PREVIOUS (see Section 9.11 of [RFC8881]). The client uses the | CLAIM_PREVIOUS (see Section 9.11 of [RFC8881]). The client uses the | |||
RECLAIM_COMPLETE (see Section 18.51 of [RFC8881]) operation to notify | RECLAIM_COMPLETE operation (see Section 18.51 of [RFC8881]) to notify | |||
the metadata server that it is done reclaiming state. | the metadata server that it is done reclaiming state. | |||
The NFSv4 Flexible File Layout Type allows for the client to mirror | The NFSv4 Flexible File Layout Type allows for the client to mirror | |||
files (see Section 8 of [RFC8435]). With client side mirroring, it | files (see Section 8 of [RFC8435]). With client-side mirroring, it | |||
is important for the client to inform the metadata server of any I/O | is important for the client to inform the metadata server of any I/O | |||
errors encountered with one of the mirrors. This is the only way for | errors encountered with one of the mirrors. This is the only way for | |||
the metadata server to determine one or more of the mirrors is | the metadata server to determine if one or more of the mirrors are | |||
corrupt and then repair the mirrors via resilvering (see Section 1.1 | corrupt and then repair the mirrors via resilvering (see Section 1.1 | |||
of [RFC8435]). The client can use LAYOUTRETURN (see Section 18.44 of | of [RFC8435]). The client can use LAYOUTRETURN (see Section 18.44 of | |||
[RFC8881]) and the ff_ioerr4 (see Section 9.1.1 of [RFC8435]) | [RFC8881]) and the ff_ioerr4 structure (see Section 9.1.1 of | |||
structure to inform the metadata server of I/O errors. | [RFC8435]) to inform the metadata server of I/O errors. | |||
A problem is that when the metadata server restarts and the client | A problem arises when the metadata server restarts and the client has | |||
has errors it needs to report, it can not do so. Section 12.7.4 of | errors it needs to report but cannot do so. Section 12.7.4 of | |||
[RFC8881] requires that the client MUST stop using layouts. While | [RFC8881] requires that the client MUST stop using layouts. While | |||
the intent there is that the client MUST stop doing I/O to the | the intent there is that the client MUST stop doing I/O to the | |||
storage devices, it is also true that the layout stateids are no | storage devices, it is also true that the layout stateids are no | |||
longer valid. The LAYOUTRETURN needs a layout stateid to proceed and | longer valid. The LAYOUTRETURN needs a layout stateid to proceed, | |||
the client can not get a layout during grace recovery (see | and the client cannot get a layout during grace recovery (see | |||
Section 12.7.4 of [RFC8881]) to recover layout state. As such, | Section 12.7.4 of [RFC8881]) to recover layout state. As such, | |||
clients have no choice but to not recover files with I/O errors. In | clients have no choice but to not recover files with I/O errors. In | |||
turn, the metadata server MUST assume that the mirrors are | turn, the metadata server MUST assume that the mirrors are | |||
inconsistent and pick one for resilvering. It is a MUST because even | inconsistent and pick one for resilvering. It is a MUST because even | |||
if the metadata server can determine that the client did modify data | if the metadata server can determine that the client did modify data | |||
during the outage, it MUST NOT assume those modifications were | during the outage, it MUST NOT assume those modifications were | |||
consistent. | consistent. | |||
To fix this issue, the metadata server MUST accept for the | To fix this issue, the metadata server MUST accept the anonymous | |||
lrf_stateid in LAYOUTRETURN (see Section 18.44.1 of [RFC8881]) the | stateid of all zeros (see Section 8.2.3 of [RFC8881]) for the | |||
anonymous stateid of all zeros (see Section 8.2.3 of [RFC8881]). The | lrf_stateid in LAYOUTRETURN (see Section 18.44.1 of [RFC8881]). The | |||
client can use this anonymous stateid to inform the metadata server | client can use this anonymous stateid to inform the metadata server | |||
of errors encountered. The metadata server can then accurately | of errors encountered. The metadata server can then accurately | |||
resilver the file by picking the mirror(s) that do not have any | resilver the file by picking the mirror(s) that does not have any | |||
associated errors. | associated errors. | |||
During the grace period, if the client sends a lrf_stateid in the | During the grace period, if the client sends an lrf_stateid in the | |||
LAYOUTRETURN with any value other than the anonymous stateid of all | LAYOUTRETURN with any value other than the anonymous stateid of all | |||
zeros, then the metadata server MUST now respond with an error of | zeros, then the metadata server MUST respond with an error of | |||
NFS4ERR_GRACE (see Section of 15.1.9.2 [RFC8881]). After the grace | NFS4ERR_GRACE (see Section 15.1.9.2 of [RFC8881]). After the grace | |||
period, if the client sends a lrf_stateid in the LAYOUTRETURN with a | period, if the client sends an lrf_stateid in the LAYOUTRETURN with a | |||
value of the anonymous stateid of all zeros, then the metadata server | value of the anonymous stateid of all zeros, then the metadata server | |||
MUST now respond with an error of NFS4ERR_NO_GRACE (see | MUST respond with an error of NFS4ERR_NO_GRACE (see Section 15.1.9.3 | |||
Section 15.1.9.3 of [RFC8881]). | of [RFC8881]). | |||
Also, when the metadata server builds the reply to the LAYOUTRETURN | Also, when the metadata server builds the reply to the LAYOUTRETURN | |||
when a lrf_stateid with the value of the anonymous stateid of all | when an lrf_stateid with the value of the anonymous stateid of all | |||
zeros it MUST NOT bump the seqid of the lorr_stateid. | zeros it MUST NOT bump the seqid of the lorr_stateid. | |||
If the metadata server detects that the layout being returned in the | If the metadata server detects that the layout being returned in the | |||
LAYOUTRETURN does not match the current mirror instances found for | LAYOUTRETURN does not match the current mirror instances found for | |||
the file, then it MUST ignore the LAYOUTRETURN and resilver the file | the file, then it MUST ignore the LAYOUTRETURN and resilver the file | |||
in question. | in question. | |||
The metadata server MUST resilver any files which are neither | The metadata server MUST resilver any files that are neither | |||
explicitly recovered with a CLAIM_PREVIOUS nor have a reported error | explicitly recovered with a CLAIM_PREVIOUS nor have a reported error | |||
via a LAYOUTRETURN. The client has most likely restarted and lost | via a LAYOUTRETURN. The client has most likely restarted and lost | |||
any state. | any state. | |||
2.1. When to Resilver | 2.1. When to Resilver | |||
A write intent occurs when a client opens a file and gets a | A write intent occurs when a client opens a file and gets a | |||
LAYOUTIOMODE4_RW from the metadata server. The metadata server MUST | LAYOUTIOMODE4_RW from the metadata server. The metadata server MUST | |||
track outstanding write intents and when it restarts, it MUST track | track outstanding write intents, and when it restarts, it MUST track | |||
recovery of those write intents. The method that the metadata server | recovery of those write intents. The method that the metadata server | |||
uses to track write intents is implementation specific, i.e., outside | uses to track write intents is implementation specific, i.e., outside | |||
of the scope of this document. | the scope of this document. | |||
The decision to resilver a file depends on how the client recovers | The decision to resilver a file depends on how the client recovers | |||
the file before the grace period ends. If the client reclaims the | the file before the grace period ends. If the client reclaims the | |||
file and reports no errors, the metadata server MUST NOT resilver the | file and reports no errors, the metadata server MUST NOT resilver the | |||
file. If the client reports an error on the file, then the file MUST | file. If the client reports an error on the file, then the file MUST | |||
be resilvered. If the client does not reclaim or report an error | be resilvered. If the client does not reclaim or report an error | |||
before the grace period ends, then under the old behavior, the | before the grace period ends, then under the old behavior, the | |||
metadata server MUST resilver the file. | metadata server MUST resilver the file. | |||
The resilvering process is broadly to: | The resilvering process is broadly to: | |||
skipping to change at page 5, line 20 | skipping to change at line 190 | |||
1. fence the file (see Section 2.2 of [RFC8435]), | 1. fence the file (see Section 2.2 of [RFC8435]), | |||
2. record the need to resilver, | 2. record the need to resilver, | |||
3. release the write intent, and | 3. release the write intent, and | |||
4. once there are no write intents on the file, start the | 4. once there are no write intents on the file, start the | |||
resilvering process. | resilvering process. | |||
The metadata server MUST NOT resilver a file if there are clients | The metadata server MUST NOT resilver a file if there are clients | |||
with outstanding write intents. I.e., multiple clients might have | with outstanding write intents, i.e., multiple clients might have the | |||
the file open with write intents. As it MUST track write intents, it | file open with write intents. As the metadata server MUST track | |||
MUST also track the need to resilver. I.e., if the metadata server | write intents, it MUST also track the need to resilver, i.e., if the | |||
restarts during the grace period, it MUST restart the file recovery | metadata server restarts during the grace period, it MUST restart the | |||
if it replays the write intent else it MUST start the resilvering if | file recovery if it replays the write intent, or else it MUST start | |||
it replays the resilvering intent. | the resilvering if it replays the resilvering intent. | |||
Whether the metadata server prevents all I/O to the file until the | Whether the metadata server prevents all I/O to the file until the | |||
resilvering is done or forces all I/O to go through the metadata | resilvering is done, forces all I/O to go through the metadata | |||
server or allows a proxy server to update the new data file as it is | server, or allows a proxy server to update the new data file as it is | |||
being reslivered is all an implementation choice. The constraint is | being resilvered is all an implementation choice. The constraint is | |||
that the metadata server is responsible for the reconstruction of the | that the metadata server is responsible for the reconstruction of the | |||
data file and for the consistency of the mirrors. | data file and for the consistency of the mirrors. | |||
If the metadata server does allow the client access to the file | If the metadata server does allow the client access to the file | |||
during the resilvering, then the client MUST have the same layout | during the resilvering, then the client MUST have the same layout | |||
(set of mirror instances) after the metadata server as before. One | (set of mirror instances) after the metadata server as before. One | |||
way that such a resilvering can occur is for a proxy server to be | way that such a resilvering can occur is for a proxy server to be | |||
inserted into the layout. That server will be copying a good mirror | inserted into the layout. That server will be copying a good mirror | |||
instance to a new instance. As it gets I/O via the layout, it will | instance to a new instance. As it gets I/O via the layout, it will | |||
be responsible for updating the copy it is performing. This | be responsible for updating the copy it is performing. This | |||
skipping to change at page 6, line 17 | skipping to change at line 232 | |||
NFS4ERR_BAD_STATEID to the LAYOUTRETURN. If the client detects a | NFS4ERR_BAD_STATEID to the LAYOUTRETURN. If the client detects a | |||
NFS4ERR_BAD_STATEID error in this scenario, it should fall back to | NFS4ERR_BAD_STATEID error in this scenario, it should fall back to | |||
the old behavior of not reporting errors. | the old behavior of not reporting errors. | |||
3. Security Considerations | 3. Security Considerations | |||
There are no new security considerations beyond those in [RFC7862]. | There are no new security considerations beyond those in [RFC7862]. | |||
4. IANA Considerations | 4. IANA Considerations | |||
There are no IANA considerations for this document. | This document has no IANA actions. | |||
5. References | 5. References | |||
5.1. Normative References | 5.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
skipping to change at page 7, line 10 | skipping to change at line 273 | |||
[RFC8435] Halevy, B. and T. Haynes, "Parallel NFS (pNFS) Flexible | [RFC8435] Halevy, B. and T. Haynes, "Parallel NFS (pNFS) Flexible | |||
File Layout", RFC 8435, DOI 10.17487/RFC8435, August 2018, | File Layout", RFC 8435, DOI 10.17487/RFC8435, August 2018, | |||
<https://www.rfc-editor.org/info/rfc8435>. | <https://www.rfc-editor.org/info/rfc8435>. | |||
[RFC8881] Noveck, D., Ed. and C. Lever, "Network File System (NFS) | [RFC8881] Noveck, D., Ed. and C. Lever, "Network File System (NFS) | |||
Version 4 Minor Version 1 Protocol", RFC 8881, | Version 4 Minor Version 1 Protocol", RFC 8881, | |||
DOI 10.17487/RFC8881, August 2020, | DOI 10.17487/RFC8881, August 2020, | |||
<https://www.rfc-editor.org/info/rfc8881>. | <https://www.rfc-editor.org/info/rfc8881>. | |||
Appendix A. Acknowledgments | Acknowledgments | |||
Tigran Mkrtchyan, Jeff Layton, and Rick Macklem provided reviews of | Tigran Mkrtchyan, Jeff Layton, and Rick Macklem provided reviews of | |||
the document. | the document. | |||
Authors' Addresses | Authors' Addresses | |||
Thomas Haynes | Thomas Haynes | |||
Hammerspace | Hammerspace | |||
Email: loghyr@gmail.com | Email: loghyr@gmail.com | |||
End of changes. 33 change blocks. | ||||
99 lines changed or deleted | 86 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |
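
The rules that Section 2 and Section 2.1 of the compared text place on the metadata server can be read as two small decision functions. The following is a minimal sketch in Python, not an implementation taken from either document. The names `Status`, `check_layoutreturn_stateid`, and `must_resilver` are hypothetical. The 16-byte encoding assumes the NFSv4.1 stateid4 layout of a 4-byte seqid followed by a 12-byte "other" field. `NFS4_OK` here only means "this check raises no error"; all other LAYOUTRETURN validation is out of scope for the sketch.

```python
from enum import Enum

# Assumption: a stateid4 is a 4-byte seqid plus a 12-byte "other" field,
# so the anonymous stateid of all zeros is 16 zero bytes.
ANONYMOUS_STATEID = bytes(16)


class Status(Enum):
    """Hypothetical result codes; only the names mirror the NFSv4 errors."""
    NFS4_OK = "NFS4_OK"
    NFS4ERR_GRACE = "NFS4ERR_GRACE"
    NFS4ERR_NO_GRACE = "NFS4ERR_NO_GRACE"


def check_layoutreturn_stateid(lrf_stateid: bytes, in_grace: bool) -> Status:
    """Apply the grace-period rule for the lrf_stateid in a LAYOUTRETURN."""
    is_anonymous = lrf_stateid == ANONYMOUS_STATEID
    if in_grace and not is_anonymous:
        # During grace, only the anonymous stateid is acceptable.
        return Status.NFS4ERR_GRACE
    if not in_grace and is_anonymous:
        # After grace, the anonymous stateid is no longer acceptable.
        return Status.NFS4ERR_NO_GRACE
    # Passes this check; regular LAYOUTRETURN processing would continue.
    return Status.NFS4_OK


def must_resilver(reclaimed: bool, reported_error: bool) -> bool:
    """Decide at the end of the grace period whether a file is resilvered.

    reclaimed:      the client recovered the file (e.g., CLAIM_PREVIOUS).
    reported_error: the client reported an I/O error via LAYOUTRETURN.
    """
    if reported_error:
        return True   # a reported error always leads to resilvering
    if reclaimed:
        return False  # reclaimed with no errors: MUST NOT resilver
    return True       # neither reclaimed nor reported: resilver


if __name__ == "__main__":
    print(check_layoutreturn_stateid(ANONYMOUS_STATEID, in_grace=True).value)
    print(must_resilver(reclaimed=True, reported_error=False))
```

The sketch leaves out the case where the returned layout does not match the file's current mirror instances. There the text requires the metadata server to ignore the LAYOUTRETURN and resilver the file, which a caller could model by treating the file as neither reclaimed nor reported.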