<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE rfc [
 <!ENTITY nbsp "&#160;">
 <!ENTITY zwsp "&#8203;">
 <!ENTITY nbhy "&#8209;">
 <!ENTITY wj "&#8288;">
]>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category='std' docName='draft-ietf-nfsv4-layrec-04' number="9737" ipr='trust200902' obsoletes='' updates="" sortRefs='true' submissionType='IETF' symRefs='true' tocDepth='3' tocInclude='true' consensus='true' version='3' xml:lang='en'>
<front>
<!--[rfced] Title and Short Title
a) May we update the document title for conciseness by removing "of" and rephrasing the text to reflect that the errors are reported "in NFSv4" as shown below?
b) May we update the short title that spans the header of the PDF file to more closely match the document title as shown below?
c) We note that "LAYOUTRETURN" is mentioned in the title but not in the Abstract or Introduction. Should "LAYOUTRETURN" be included in those sections for consistency with the title? If so, please provide the desired text.
Document Title
Original: Reporting of Errors via LAYOUTRETURN in NFSv4.2
Perhaps: Reporting Errors in NFSv4.2 via LAYOUTRETURN
...
Short Title
Original: LAYOUT_RECOVERY
Perhaps: Reporting Errors via LAYOUTRETURN
-->
<title abbrev='LAYOUT_RECOVERY'>Reporting of Errors via LAYOUTRETURN in NFSv4.2</title>
<seriesInfo name='RFC' value='9737'/>
<author fullname='Thomas Haynes' initials='T.' surname='Haynes'> <organization abbrev='Hammerspace'>Hammerspace</organization> <address> <email>loghyr@gmail.com</email> </address> </author>
<author fullname='Trond Myklebust' initials='T.'
surname='Myklebust'> <organization abbrev='Hammerspace'>Hammerspace</organization> <address> <email>trondmy@hammerspace.com</email> </address> </author>
<date year='2025' month='February'/>
<area>WIT</area>
<workgroup>nfsv4</workgroup>
<keyword>NFSv4</keyword>
<abstract>
<t>
<!--[rfced] We note that "MDS" and "DS" are expanded as "metadata server" and "data server", respectively, in RFC 8435. May we expand these terms in the Abstract as shown below (option A) to match RFC 8435? After these terms are expanded, would you like to use the abbreviations? There are 37 instances of "metadata server" and 2 instances of "data server". If not, and it is desired to have the term written out, should "MDS" and "DS" simply be removed since they are not used elsewhere in the document (option B)? Please let us know your preference.
Original: The Parallel Network File System (pNFS) allows for a file's metadata (MDS) and data (DS) to be on different servers. When the metadata server is restarted, the client can still modify the data file component. During the recovery phase of startup, the metadata server and the data servers work together to recover state (which files are open, last modification time, size, etc.).
Perhaps A: The Parallel Network File System (pNFS) allows for a file's metadata and data to be on different servers (i.e., the metadata server (MDS) and the data server (DS)).
or
Perhaps B: The Parallel Network File System (pNFS) allows for a file's metadata and data to be on different servers.
-->
The Parallel Network File System (pNFS) allows for a file's metadata (MDS) and data (DS) to be on different servers. When the metadata server is restarted, the client can still modify the data file component.
<!--[rfced] Please clarify "which files are open, last modification time, size, etc.)". Are these files used by the servers during the recovery phase?
Original: During the recovery phase of startup, the metadata server and the data servers work together to recover state (which files are open, last modification time, size, etc.).
Perhaps: During the recovery phase of startup, the metadata server and the data servers work together to recover state (the files used are "open", "last modification time", "size", etc.).
-->
During the recovery phase of startup, the metadata server and the data servers work together to recover state (which files are open, last modification time, size, etc.). If the client has not encountered errors with the data files, then the state can be recovered and the resilvering of the data files can be avoided. If the client has encountered errors, it has no means of reporting them to the metadata server. As such, the metadata server has to assume that a file needs resilvering. This document presents an extension to RFC 8435 to allow the client to update the metadata and avoid resilvering.
</t>
</abstract>
</front>
<middle>
<section anchor='sec_intro' numbered='true' toc='default'>
<name>Introduction</name>
<t>
In the Network File System version 4 (NFSv4) with a Parallel NFS (pNFS) Flexible File Layout <xref target='RFC8435' format='default' sectionFormat='of'/> server, during recovery after a restart, there is no mechanism for the client to inform the metadata server about an error that occurred during a WRITE operation (see <xref section="18.32" target='RFC8881' format='default' sectionFormat='of'/>) to the data servers in the period of the outage.
</t>
<t>
Using the process detailed in <xref target='RFC8178' format='default' sectionFormat='of'/>, the revisions in this document become an extension of NFSv4.2 <xref target='RFC7862' format='default' sectionFormat='of'/>. They are built on top of the External Data Representation (XDR) <xref target='RFC4506' format='default' sectionFormat='of'/> generated from <xref target='RFC7863' format='default' sectionFormat='of'/>.
</t>
<section anchor='sec_defs' numbered='true' toc='default'>
<name>Definitions</name>
<t>
See <xref section="1.1" target='RFC8435' format='default' sectionFormat='of'/> for a set of definitions.
</t>
</section>
<section numbered='true' toc='default'>
<name>Requirements Language</name>
<t>
The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>", "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as described in BCP&nbsp;14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they appear in all capitals, as shown here.
</t>
</section>
</section>
<section anchor='layout_state_recovery' numbered='true' toc='default'>
<name>Layout State Recovery</name>
<t>
When a metadata server restarts, clients are provided a grace recovery period where they are allowed to recover any state that they had established.
With open files, the client can send an OPEN operation (see <xref section="18.16" target='RFC8881' format='default' sectionFormat='of'/>) with a claim type of CLAIM_PREVIOUS (see <xref section="9.11" target='RFC8881' format='default' sectionFormat='of'/>). The client uses the RECLAIM_COMPLETE operation (see <xref section="18.51" target='RFC8881' format='default' sectionFormat='of'/>) to notify the metadata server that it is done reclaiming state.
</t>
<t>
The NFSv4 Flexible File Layout Type allows for the client to mirror files (see <xref section="8" target='RFC8435' format='default' sectionFormat='of'/>). With client-side mirroring, it is important for the client to inform the metadata server of any I/O errors encountered with one of the mirrors. This is the only way for the metadata server to determine if one or more of the mirrors are corrupt and then repair the mirrors via resilvering (see <xref section="1.1" target='RFC8435' format='default' sectionFormat='of'/>). The client can use LAYOUTRETURN (see <xref section="18.44" target='RFC8881' format='default' sectionFormat='of'/>) and the ff_ioerr4 structure (see <xref section="9.1.1" target='RFC8435' format='default' sectionFormat='of'/>) to inform the metadata server of I/O errors.
</t>
<t>
A problem arises when the metadata server restarts and the client has errors it needs to report but cannot do so. <xref section="12.7.4" target='RFC8881' format='default' sectionFormat='of'/> requires that the client <bcp14>MUST</bcp14> stop using layouts. While the intent there is that the client <bcp14>MUST</bcp14> stop doing I/O to the storage devices, it is also true that the layout stateids are no longer valid.
The LAYOUTRETURN needs a layout stateid to proceed, and the client cannot get a layout during grace recovery (see <xref section="12.7.4" target='RFC8881' format='default' sectionFormat='of'/>) to recover layout state. As such, clients have no choice but to not recover files with I/O errors. In turn, the metadata server <bcp14>MUST</bcp14> assume that the mirrors are inconsistent and pick one for resilvering. It is a <bcp14>MUST</bcp14> because even if the metadata server can determine that the client did modify data during the outage, it <bcp14>MUST NOT</bcp14> assume those modifications were consistent.
</t>
<t>
To fix this issue, the metadata server <bcp14>MUST</bcp14> accept the anonymous stateid of all zeros (see <xref section="8.2.3" target='RFC8881' format='default' sectionFormat='of'/>) for the lrf_stateid in LAYOUTRETURN (see <xref section="18.44.1" target='RFC8881' format='default' sectionFormat='of'/>). The client can use this anonymous stateid to inform the metadata server of errors encountered. The metadata server can then accurately resilver the file by picking the mirror(s) that have no associated errors.
</t>
<t>
During the grace period, if the client sends an lrf_stateid in the LAYOUTRETURN with any value other than the anonymous stateid of all zeros, then the metadata server <bcp14>MUST</bcp14> respond with an error of NFS4ERR_GRACE (see <xref section="15.1.9.2" target='RFC8881' format='default' sectionFormat='of'/>). After the grace period, if the client sends an lrf_stateid in the LAYOUTRETURN with a value of the anonymous stateid of all zeros, then the metadata server <bcp14>MUST</bcp14> respond with an error of NFS4ERR_NO_GRACE (see <xref section="15.1.9.3" target='RFC8881' format='default' sectionFormat='of'/>).
</t>
<t>
<!--[rfced] We are having trouble parsing this sentence. Are words missing after "when a lrf_stateid with the value of the anonymous stateid of all zeros", or should "when a lrf_stateid" perhaps be "with an lrf_stateid"? Please review and let us know how we may clarify.
Original: Also, when the metadata server builds the reply to the LAYOUTRETURN when a lrf_stateid with the value of the anonymous stateid of all zeros it MUST NOT bump the seqid of the lorr_stateid.
Perhaps: Also, when the metadata server builds the reply to the LAYOUTRETURN with an lrf_stateid with an anonymous stateid value of all zeros, it MUST NOT bump the seqid of the lorr_stateid.
-->
Also, when the metadata server builds the reply to a LAYOUTRETURN whose lrf_stateid is the anonymous stateid of all zeros, it <bcp14>MUST NOT</bcp14> bump the seqid of the lorr_stateid.
</t>
<t>
If the metadata server detects that the layout being returned in the LAYOUTRETURN does not match the current mirror instances found for the file, then it <bcp14>MUST</bcp14> ignore the LAYOUTRETURN and resilver the file in question.
</t>
<t>
The metadata server <bcp14>MUST</bcp14> resilver any files that are neither explicitly recovered with a CLAIM_PREVIOUS nor have a reported error via a LAYOUTRETURN. In that case, the client has most likely restarted and lost any state.
</t>
<section anchor='sec_when_to_resilver' numbered='true' toc='default'>
<name>When to Resilver</name>
<t>
A write intent occurs when a client opens a file and gets a LAYOUTIOMODE4_RW from the metadata server. The metadata server <bcp14>MUST</bcp14> track outstanding write intents, and when it restarts, it <bcp14>MUST</bcp14> track recovery of those write intents. The method that the metadata server uses to track write intents is implementation specific, i.e., outside the scope of this document.
</t>
<t>
The decision to resilver a file depends on how the client recovers the file before the grace period ends.
If the client reclaims the file and reports no errors, the metadata server <bcp14>MUST NOT</bcp14> resilver the file. If the client reports an error on the file, then the file <bcp14>MUST</bcp14> be resilvered. If the client neither reclaims the file nor reports an error before the grace period ends, then, as under the old behavior, the metadata server <bcp14>MUST</bcp14> resilver the file.
</t>
<t>
The resilvering process is broadly to:
</t>
<ol>
<li> fence the file (see <xref section="2.2" target='RFC8435' format='default' sectionFormat='of'/>), </li>
<li> record the need to resilver, </li>
<li> release the write intent, and </li>
<li> once there are no write intents on the file, start the resilvering process. </li>
</ol>
<t>
The metadata server <bcp14>MUST NOT</bcp14> resilver a file if there are clients with outstanding write intents, i.e., multiple clients might have the file open with write intents. As the metadata server <bcp14>MUST</bcp14> track write intents, it <bcp14>MUST</bcp14> also track the need to resilver, i.e., if the metadata server restarts during the grace period, it <bcp14>MUST</bcp14> restart the file recovery if it replays the write intent, or else it <bcp14>MUST</bcp14> start the resilvering if it replays the resilvering intent.
</t>
<t>
Whether the metadata server prevents all I/O to the file until the resilvering is done, forces all I/O to go through the metadata server, or allows a proxy server to update the new data file as it is being resilvered is all an implementation choice. The constraint is that the metadata server is responsible for the reconstruction of the data file and for the consistency of the mirrors.
</t>
<t>
If the metadata server does allow the client access to the file during the resilvering, then the client <bcp14>MUST</bcp14> have the same layout (set of mirror instances) after the metadata server as before.
One way that such a resilvering can occur is for a proxy server to be inserted into the layout. That server will be copying a good mirror instance to a new instance. As it gets I/O via the layout, it will be responsible for updating the copy it is performing. The requirement is that the proxy server <bcp14>MUST</bcp14> stay in the layout until the grace period is finished.
</t>
</section>
<section anchor='sec_vers_mismatch' numbered='true' toc='default'>
<name>Version Mismatch Considerations</name>
<t>
The metadata server has no expectations for the client to use this new functionality. Therefore, if the client does not use it, the metadata server will function normally.
</t>
<t>
If the client does use the new functionality and the metadata server does not support it, then the metadata server <bcp14>MUST</bcp14> reply with an NFS4ERR_BAD_STATEID to the LAYOUTRETURN. If the client detects an NFS4ERR_BAD_STATEID error in this scenario, it should fall back to the old behavior of not reporting errors.
</t>
</section>
</section>
<section anchor='sec_security' numbered='true' toc='default'>
<name>Security Considerations</name>
<t>
There are no new security considerations beyond those in <xref target='RFC7862' format='default' sectionFormat='of'/>.
</t>
</section>
<section anchor='sec_iana' numbered='true' toc='default'>
<name>IANA Considerations</name>
<t>This document has no IANA actions.
</t>
</section>
</middle>
<back>
<references>
<name>References</name>
<references>
<name>Normative References</name>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4506.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7862.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7863.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8178.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8435.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8881.xml"/>
</references>
</references>
<section numbered='false' toc='default'>
<name>Acknowledgments</name>
<t><contact fullname="Tigran Mkrtchyan"/>, <contact fullname="Jeff Layton"/>,
and <contact fullname="Rick Macklem"/> provided reviews of the document.</t>
</section>
<!-- [rfced] We note that the following terms appear as lowercase in RFCs 8435 and 8881. Should these terms be made lowercase to match use in those RFCs?
Flexible File Layout
Flexible File Layout Type
-->
<!-- [rfced] Please review the "Inclusive Language" portion of the online Style Guide <https://www.rfc-editor.org/styleguide/part2/#inclusive_language> and let us know if any changes are needed. Updates of this nature typically result in more precise language, which is helpful for readers. Note that our script did not flag any words in particular, but this should still be reviewed as a best practice.
-->
</back>
</rfc>