rfc9737v1.txt | rfc9737.txt | |||
---|---|---|---|---|
Internet Engineering Task Force (IETF) T. Haynes | Internet Engineering Task Force (IETF) T. Haynes | |||
Request for Comments: 9737 T. Myklebust | Request for Comments: 9737 T. Myklebust | |||
Category: Standards Track Hammerspace | Category: Standards Track Hammerspace | |||
ISSN: 2070-1721 February 2025 | ISSN: 2070-1721 February 2025 | |||
Reporting of Errors via LAYOUTRETURN in NFSv4.2 | Reporting Errors in NFSv4.2 via LAYOUTRETURN | |||
Abstract | Abstract | |||
The Parallel Network File System (pNFS) allows for a file's metadata | The Parallel Network File System (pNFS) allows for a file's metadata | |||
(MDS) and data (DS) to be on different servers. When the metadata | and data to be on different servers (i.e., the metadata server (MDS) | |||
server is restarted, the client can still modify the data file | and the data server (DS)). When the MDS is restarted, the client can | |||
component. During the recovery phase of startup, the metadata server | still modify the data file component. During the recovery phase of | |||
and the data servers work together to recover state (which files are | startup, the MDS and the DSs work together to recover state. If the | |||
open, last modification time, size, etc.). If the client has not | client has not encountered errors with the data files, then the state | |||
encountered errors with the data files, then the state can be | can be recovered and the resilvering of the data files can be | |||
recovered and the resilvering of the data files can be avoided. With | avoided. With any errors, there is no means by which the client can | |||
any errors, there is no means by which the client can report errors | report errors to the MDS. As such, the MDS has to assume that a file | |||
to the metadata server. As such, the metadata server has to assume | needs resilvering. This document presents an extension to RFC 8435 | |||
that a file needs resilvering. This document presents an extension | to allow the client to update the metadata via LAYOUTRETURN and avoid | |||
to RFC 8435 to allow the client to update the metadata and avoid | the resilvering. | |||
resilvering. | ||||
Status of This Memo | Status of This Memo | |||
This is an Internet Standards Track document. | This is an Internet Standards Track document. | |||
This document is a product of the Internet Engineering Task Force | This document is a product of the Internet Engineering Task Force | |||
(IETF). It represents the consensus of the IETF community. It has | (IETF). It represents the consensus of the IETF community. It has | |||
received public review and has been approved for publication by the | received public review and has been approved for publication by the | |||
Internet Engineering Steering Group (IESG). Further information on | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | Internet Standards is available in Section 2 of RFC 7841. | |||
skipping to change at line 72 | skipping to change at line 71 | |||
3. Security Considerations | 3. Security Considerations | |||
4. IANA Considerations | 4. IANA Considerations | |||
5. References | 5. References | |||
5.1. Normative References | 5.1. Normative References | |||
Acknowledgments | Acknowledgments | |||
Authors' Addresses | Authors' Addresses | |||
1. Introduction | 1. Introduction | |||
In the Network File System version 4 (NFSv4) with a Parallel NFS | In the Network File System version 4 (NFSv4) with a Parallel NFS | |||
(pNFS) Flexible File Layout [RFC8435] server, during recovery after a | (pNFS) flexible file layout [RFC8435] server, during recovery after a | |||
restart, there is no mechanism for the client to inform the metadata | restart, there is no mechanism for the client to inform the metadata | |||
server about an error that occurred during a WRITE operation (see | server (MDS) about an error that occurred during a WRITE operation | |||
Section 18.32 of [RFC8881]) to the data servers in the period of the | (see Section 18.32 of [RFC8881]) to the data servers (DSs) in the | |||
outage. | period of the outage. | |||
Using the process detailed in [RFC8178], the revisions in this | Using the process detailed in [RFC8178], the revisions in this | |||
document become an extension of NFSv4.2 [RFC7862]. They are built on | document become an extension of NFSv4.2 [RFC7862]. They are built on | |||
top of the External Data Representation (XDR) [RFC4506] generated | top of the External Data Representation (XDR) [RFC4506] generated | |||
from [RFC7863]. | from [RFC7863]. | |||
1.1. Definitions | 1.1. Definitions | |||
See Section 1.1 of [RFC8435] for a set of definitions. | See Section 1.1 of [RFC8435] for a set of definitions. | |||
1.2. Requirements Language | 1.2. Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in | "OPTIONAL" in this document are to be interpreted as described in | |||
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
2. Layout State Recovery | 2. Layout State Recovery | |||
When a metadata server restarts, clients are provided a grace | When an MDS restarts, clients are provided a grace recovery period | |||
recovery period where they are allowed to recover any state that they | where they are allowed to recover any state that they had | |||
had established. With open files, the client can send an OPEN | established. With open files, the client can send an OPEN operation | |||
operation (see Section 18.16 of [RFC8881]) with a claim type of | (see Section 18.16 of [RFC8881]) with a claim type of CLAIM_PREVIOUS | |||
CLAIM_PREVIOUS (see Section 9.11 of [RFC8881]). The client uses the | (see Section 9.11 of [RFC8881]). The client uses the | |||
RECLAIM_COMPLETE operation (see Section 18.51 of [RFC8881]) to notify | RECLAIM_COMPLETE operation (see Section 18.51 of [RFC8881]) to notify | |||
the metadata server that it is done reclaiming state. | the MDS that it is done reclaiming state. | |||
The NFSv4 Flexible File Layout Type allows for the client to mirror | The NFSv4 flexible file layout type allows for the client to mirror | |||
files (see Section 8 of [RFC8435]). With client-side mirroring, it | files (see Section 8 of [RFC8435]). With client-side mirroring, it | |||
is important for the client to inform the metadata server of any I/O | is important for the client to inform the MDS of any I/O errors | |||
errors encountered with one of the mirrors. This is the only way for | encountered with one of the mirrors. This is the only way for the | |||
the metadata server to determine if one or more of the mirrors are | MDS to determine if one or more of the mirrors are corrupt and then | |||
corrupt and then repair the mirrors via resilvering (see Section 1.1 | repair the mirrors via resilvering (see Section 1.1 of [RFC8435]). | |||
of [RFC8435]). The client can use LAYOUTRETURN (see Section 18.44 of | The client can use LAYOUTRETURN (see Section 18.44 of [RFC8881]) and | |||
[RFC8881]) and the ff_ioerr4 structure (see Section 9.1.1 of | the ff_ioerr4 structure (see Section 9.1.1 of [RFC8435]) to inform | |||
[RFC8435]) to inform the metadata server of I/O errors. | the MDS of I/O errors. | |||
A problem arises when the metadata server restarts and the client has | A problem arises when the MDS restarts and the client has errors it | |||
errors it needs to report but cannot do so. Section 12.7.4 of | needs to report but cannot do so. Section 12.7.4 of [RFC8881] | |||
[RFC8881] requires that the client MUST stop using layouts. While | requires that the client MUST stop using layouts. While the intent | |||
the intent there is that the client MUST stop doing I/O to the | there is that the client MUST stop doing I/O to the storage devices, | |||
storage devices, it is also true that the layout stateids are no | it is also true that the layout stateids are no longer valid. The | |||
longer valid. The LAYOUTRETURN needs a layout stateid to proceed, | LAYOUTRETURN needs a layout stateid to proceed, and the client cannot | |||
and the client cannot get a layout during grace recovery (see | get a layout during grace recovery (see Section 12.7.4 of [RFC8881]) | |||
Section 12.7.4 of [RFC8881]) to recover layout state. As such, | to recover layout state. As such, clients have no choice but to not | |||
clients have no choice but to not recover files with I/O errors. In | recover files with I/O errors. In turn, the MDS MUST assume that the | |||
turn, the metadata server MUST assume that the mirrors are | mirrors are inconsistent and pick one for resilvering. It is a MUST | |||
inconsistent and pick one for resilvering. It is a MUST because even | because even if the MDS can determine that the client did modify data | |||
if the metadata server can determine that the client did modify data | ||||
during the outage, it MUST NOT assume those modifications were | during the outage, it MUST NOT assume those modifications were | |||
consistent. | consistent. | |||
To fix this issue, the metadata server MUST accept the anonymous | To fix this issue, the MDS MUST accept the anonymous stateid of all | |||
stateid of all zeros (see Section 8.2.3 of [RFC8881]) for the | zeros (see Section 8.2.3 of [RFC8881]) for the lrf_stateid in | |||
lrf_stateid in LAYOUTRETURN (see Section 18.44.1 of [RFC8881]). The | LAYOUTRETURN (see Section 18.44.1 of [RFC8881]). The client can use | |||
client can use this anonymous stateid to inform the metadata server | this anonymous stateid to inform the MDS of errors encountered. The | |||
of errors encountered. The metadata server can then accurately | MDS can then accurately resilver the file by picking the mirror(s) | |||
resilver the file by picking the mirror(s) that does not have any | that does not have any associated errors. | |||
associated errors. | ||||
During the grace period, if the client sends an lrf_stateid in the | During the grace period, if the client sends an lrf_stateid in the | |||
LAYOUTRETURN with any value other than the anonymous stateid of all | LAYOUTRETURN with any value other than the anonymous stateid of all | |||
zeros, then the metadata server MUST respond with an error of | zeros, then the MDS MUST respond with an error of NFS4ERR_GRACE (see | |||
NFS4ERR_GRACE (see Section 15.1.9.2 of [RFC8881]). After the grace | Section 15.1.9.2 of [RFC8881]). After the grace period, if the | |||
period, if the client sends an lrf_stateid in the LAYOUTRETURN with a | client sends an lrf_stateid in the LAYOUTRETURN with a value of the | |||
value of the anonymous stateid of all zeros, then the metadata server | anonymous stateid of all zeros, then the MDS MUST respond with an | |||
MUST respond with an error of NFS4ERR_NO_GRACE (see Section 15.1.9.3 | error of NFS4ERR_NO_GRACE (see Section 15.1.9.3 of [RFC8881]). | |||
of [RFC8881]). | ||||
Also, when the metadata server builds the reply to the LAYOUTRETURN | Also, when the MDS builds the reply to the LAYOUTRETURN with an | |||
when an lrf_stateid with the value of the anonymous stateid of all | lrf_stateid with the value of the anonymous stateid of all zeros, it | |||
zeros it MUST NOT bump the seqid of the lorr_stateid. | MUST NOT bump the seqid of the lorr_stateid. | |||
If the metadata server detects that the layout being returned in the | If the MDS detects that the layout being returned in the LAYOUTRETURN | |||
LAYOUTRETURN does not match the current mirror instances found for | does not match the current mirror instances found for the file, then | |||
the file, then it MUST ignore the LAYOUTRETURN and resilver the file | it MUST ignore the LAYOUTRETURN and resilver the file in question. | |||
in question. | ||||
The metadata server MUST resilver any files that are neither | The MDS MUST resilver any files that are neither explicitly recovered | |||
explicitly recovered with a CLAIM_PREVIOUS nor have a reported error | with a CLAIM_PREVIOUS nor have a reported error via a LAYOUTRETURN. | |||
via a LAYOUTRETURN. The client has most likely restarted and lost | The client has most likely restarted and lost any state. | |||
any state. | ||||
2.1. When to Resilver | 2.1. When to Resilver | |||
A write intent occurs when a client opens a file and gets a | A write intent occurs when a client opens a file and gets a | |||
LAYOUTIOMODE4_RW from the metadata server. The metadata server MUST | LAYOUTIOMODE4_RW from the MDS. The MDS MUST track outstanding write | |||
track outstanding write intents, and when it restarts, it MUST track | intents, and when it restarts, it MUST track recovery of those write | |||
recovery of those write intents. The method that the metadata server | intents. The method that the MDS uses to track write intents is | |||
uses to track write intents is implementation specific, i.e., outside | implementation specific, i.e., outside the scope of this document. | |||
the scope of this document. | ||||
The decision to resilver a file depends on how the client recovers | The decision to resilver a file depends on how the client recovers | |||
the file before the grace period ends. If the client reclaims the | the file before the grace period ends. If the client reclaims the | |||
file and reports no errors, the metadata server MUST NOT resilver the | file and reports no errors, the MDS MUST NOT resilver the file. If | |||
file. If the client reports an error on the file, then the file MUST | the client reports an error on the file, then the file MUST be | |||
be resilvered. If the client does not reclaim or report an error | resilvered. If the client does not reclaim or report an error before | |||
before the grace period ends, then under the old behavior, the | the grace period ends, then under the old behavior, the MDS MUST | |||
metadata server MUST resilver the file. | resilver the file. | |||
The resilvering process is broadly to: | The resilvering process is broadly to: | |||
1. fence the file (see Section 2.2 of [RFC8435]), | 1. fence the file (see Section 2.2 of [RFC8435]), | |||
2. record the need to resilver, | 2. record the need to resilver, | |||
3. release the write intent, and | 3. release the write intent, and | |||
4. once there are no write intents on the file, start the | 4. once there are no write intents on the file, start the | |||
resilvering process. | resilvering process. | |||
The metadata server MUST NOT resilver a file if there are clients | The MDS MUST NOT resilver a file if there are clients with | |||
with outstanding write intents, i.e., multiple clients might have the | outstanding write intents, i.e., multiple clients might have the file | |||
file open with write intents. As the metadata server MUST track | open with write intents. As the MDS MUST track write intents, it | |||
write intents, it MUST also track the need to resilver, i.e., if the | MUST also track the need to resilver, i.e., if the MDS restarts | |||
metadata server restarts during the grace period, it MUST restart the | during the grace period, it MUST restart the file recovery if it | |||
file recovery if it replays the write intent, or else it MUST start | replays the write intent, or else it MUST start the resilvering if it | |||
the resilvering if it replays the resilvering intent. | replays the resilvering intent. | |||
Whether the metadata server prevents all I/O to the file until the | Whether the MDS prevents all I/O to the file until the resilvering is | |||
resilvering is done, forces all I/O to go through the metadata | done, forces all I/O to go through the MDS, or allows a proxy server | |||
server, or allows a proxy server to update the new data file as it is | to update the new data file as it is being resilvered is all an | |||
being resilvered is all an implementation choice. The constraint is | implementation choice. The constraint is that the MDS is responsible | |||
that the metadata server is responsible for the reconstruction of the | for the reconstruction of the data file and for the consistency of | |||
data file and for the consistency of the mirrors. | the mirrors. | |||
If the metadata server does allow the client access to the file | If the MDS does allow the client access to the file during the | |||
during the resilvering, then the client MUST have the same layout | resilvering, then the client MUST have the same layout (set of mirror | |||
(set of mirror instances) after the metadata server as before. One | instances) after the MDS as before. One way that such a resilvering | |||
way that such a resilvering can occur is for a proxy server to be | can occur is for a proxy server to be inserted into the layout. That | |||
inserted into the layout. That server will be copying a good mirror | server will be copying a good mirror instance to a new instance. As | |||
instance to a new instance. As it gets I/O via the layout, it will | it gets I/O via the layout, it will be responsible for updating the | |||
be responsible for updating the copy it is performing. This | copy it is performing. This requirement is that the proxy server | |||
requirement is that the proxy server MUST stay in the layout until | MUST stay in the layout until the grace period is finished. | |||
the grace period is finished. | ||||
2.2. Version Mismatch Considerations | 2.2. Version Mismatch Considerations | |||
The metadata server has no expectations for the client to use this | The MDS has no expectations for the client to use this new | |||
new functionality. Therefore, if the client does not use it, the | functionality. Therefore, if the client does not use it, the MDS | |||
metadata server will function normally. | will function normally. | |||
If the client does use the new functionality and the metadata server | If the client does use the new functionality and the MDS does not | |||
does not support it, then the metadata server MUST reply with a | support it, then the MDS MUST reply with a NFS4ERR_BAD_STATEID to the | |||
NFS4ERR_BAD_STATEID to the LAYOUTRETURN. If the client detects a | LAYOUTRETURN. If the client detects a NFS4ERR_BAD_STATEID error in | |||
NFS4ERR_BAD_STATEID error in this scenario, it should fall back to | this scenario, it should fall back to the old behavior of not | |||
the old behavior of not reporting errors. | reporting errors. | |||
3. Security Considerations | 3. Security Considerations | |||
There are no new security considerations beyond those in [RFC7862]. | There are no new security considerations beyond those in [RFC7862]. | |||
4. IANA Considerations | 4. IANA Considerations | |||
This document has no IANA actions. | This document has no IANA actions. | |||
5. References | 5. References | |||
End of changes. 21 change blocks. | ||||
107 lines changed or deleted | 99 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |
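
The rules in Sections 2 and 2.1 (identical in substance in both versions compared above) can be illustrated with a minimal sketch. It is not taken from the RFC or from any implementation: the names `Status`, `Stateid`, `check_layoutreturn_stateid`, and `must_resilver` are hypothetical, the status codes are represented by name only, and a returned `NFS4_OK` here means only that this particular check passed and normal LAYOUTRETURN processing would continue.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Status names as used in the text; numeric wire values are omitted on purpose.
    NFS4_OK = "NFS4_OK"
    NFS4ERR_GRACE = "NFS4ERR_GRACE"
    NFS4ERR_NO_GRACE = "NFS4ERR_NO_GRACE"


@dataclass(frozen=True)
class Stateid:
    # Hypothetical in-memory form of a stateid: a seqid plus 12 opaque bytes.
    seqid: int
    other: bytes

    def is_anonymous(self) -> bool:
        # The anonymous stateid of all zeros (Section 8.2.3 of RFC 8881).
        return self.seqid == 0 and self.other == bytes(12)


def check_layoutreturn_stateid(lrf_stateid: Stateid, in_grace: bool) -> Status:
    """Apply the Section 2 rules for lrf_stateid on the MDS side."""
    if in_grace:
        # During grace, only the anonymous stateid is acceptable.
        if lrf_stateid.is_anonymous():
            # The reply MUST NOT bump the seqid of the lorr_stateid in this case.
            return Status.NFS4_OK
        return Status.NFS4ERR_GRACE
    # After grace, the anonymous stateid is rejected.
    if lrf_stateid.is_anonymous():
        return Status.NFS4ERR_NO_GRACE
    # Assumption: any other stateid goes on to ordinary validation (not shown).
    return Status.NFS4_OK


def must_resilver(reclaimed: bool, error_reported: bool) -> bool:
    """Section 2.1 decision, evaluated when the grace period ends.

    Reclaimed with no reported error: MUST NOT resilver.
    Error reported, or neither reclaimed nor reported: MUST resilver.
    """
    return error_reported or not reclaimed


if __name__ == "__main__":
    anon = Stateid(seqid=0, other=bytes(12))
    print(check_layoutreturn_stateid(anon, in_grace=True).value)
    print(must_resilver(reclaimed=True, error_reported=False))
```

The sketch leaves out the mirror-instance comparison (a LAYOUTRETURN that does not match the file's current mirrors is ignored and the file is resilvered) and the tracking of write intents, both of which the text leaves to the implementation.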