Adding Support for Arbitrary File Sizes to the Single UNIX Specification

Last Update: 20Mar96
----------------------------------------------------------------------------

Table of Contents

Adding Support for Arbitrary File Sizes to the Single UNIX Specification
1.0 Overview
1.1 The Large File Problem
1.2 Requirements
1.3 Importance
1.4 Concepts
1.5 Changes and Additions
1.6 Conformance
2.0 Changes to the Single UNIX Specification
2.1 Changes to CAE Specification System Interface Definitions, Issue 4,
Version 2
2.2 Changes to CAE Specification System Interfaces and Headers, Issue 4,
Version 2
2.2.1 Changes to System Interfaces
2.2.2 Changes to Headers
2.3 Changes to CAE Specification Commands and Utilities, Issue 4, Version 2
3.0 Transitional Extensions to the Single UNIX Specification
3.1 Transitional Extensions to CAE Specification System Interfaces and
Headers, Issue 4, Version 2
3.1.1 Transitional Extensions to System Interfaces
3.1.2 Transitional Extensions to Headers
3.2 Transitional Extensions to the mount Utility
3.3 Accessing the Extensions to the SUS

Appendix A: Rationale and Notes
A.1 Overview
A.1.1 Guiding Principles
A.1.2 Concepts
A.2 Changes to the Single UNIX Specification
A.2.1 Changes to CAE Specification System Interfaces and Headers, Issue 4,
Version 2
A.2.1.1 Changes to System Interfaces
A.2.2 Changes to CAE Specification Commands and Utilities, Issue 4, Version
2
A.3 Transitional Extensions to the Single UNIX Specification
A.3.1 Transitional Extensions to CAE Specification System Interfaces and
Headers, Issue 4, Version 2
A.3.1.1 Transitional Extensions to System Interfaces
A.3.1.2 Transitional Extensions to Headers
A.3.2 Accessing the Transitional Extensions to the SUS

Acknowledgements

Revision Information
23Feb96 Version 1.1
24Feb96 Version 1.2
01Mar96 Version 1.3
05Mar96 Version 1.4
20Mar96 Version 1.5
----------------------------------------------------------------------------

Acknowledgements

Even with the rise of 64-bit systems, the 32-bit operating system will be
with us for a while yet. However, the need for interoperability with 64-bit
systems, large applications, large databases, and cheap disks has created a
market imperative for the UNIX industry: support large files on 32-bit
systems. Most current UNIX systems support file sizes of at most 2^31-1
bytes. This is not enough for today's applications, which include files
containing videos, sounds, images, and large databases. Today's 32-bit
systems are quite capable of handling the computational needs of these
applications, but they need to be able to support maximum file sizes that
are many orders of magnitude larger.

This support must be compatible with the existing Single UNIX Specification,
and provide a path to conformance with future versions. It must allow
system vendors a cost-effective approach to adding these features to their
existing products, and provide system vendors, software vendors, and users
with a clear path for future products. The independent software vendors
(ISVs) listed below gathered in a set of meetings with the UNIX systems
vendors to develop a common set of APIs and modifications to the Single UNIX
Specification to allow support for large files. We called these meetings the
Large File Summit. For details of the meetings, how the proposals were
developed, and the ISV requirements document, see
http://www.sas.com/standards/large.file.

This work is being sent to the X/Open Base System Working Group so they can
consider the changes that are suggested for the next generation of the
Single UNIX Specification.

The individuals who participated in the Large File Summit meetings and
on-line discussions were:

Amdahl Corp.:  Dennis Chapman, John Haines
Convex Computer Corp.:  Mike Carl, Peter Poorman, Tom White
Cray Research, Inc.:  Rick Matthews
Data General Corp.:  Dean Herington
Digital Equipment Corp.:  Fred Glover, Ray Lanza, Peter Smith
Fujitsu:  Chris Seabrook
HAL Computer Systems, Inc.:  Prashant Dholakia, Howard Gayle,
     David H. Yamada
Hewlett-Packard Co.:  Larry Dwyer, Hal Prince
IBM Corp.:  Bill Baker, Mark Brown
MacNeal-Schwendler Corp.:  David Lombard
NCR:  Kevin Brasche, Shawn Shealy
NEC Systems Laboratory, Inc.:  Jeff Forys
Novell:  Bill Cox, John Kiger, Seth Rosenthal
NOVON Research Inc.:  Brian Boyle
Oracle:  Mark Johnson
Programmed Logic Corp.:  Tim Williams, Steve Rago
Pyramid Technology Corporation:  Ralph Campbell, Henry Robinson
SAS Institute Inc.:  Mark Cates, Leigh Ihnen, Tom Truscott,
     Kelly Wyatt
Sequent Computer Systems:  Gerrit Huizenga, Mike Spitzer
Siemens Nixdorf Inc.:  Ralf Nolting, Klaus Thon
Silicon Graphics:  Steve Cobb, Adam Sweeney
Stratus Computer Inc.:  Tony Luck
Sun Microsystems, Inc.:  Steve Chessin
SunSoft Inc.:  Karen Barnes, Don Cragun, Karl Danz, Andy Roach,
     Glenn Skinner, Peter Van der Linden,
     Srinivasan Viswanathan
Sybase Inc.:  Marc Sugiyama
Syncsort Inc.:  Asokan
Tandem Computers:  David M. VomLehn
The Santa Cruz Operation, Inc.:  John Farley, Kurt Gollhardt,
     Art Herzog, Danielle Lahmani, Wen-Ling Lu, Dave Prosser
Unisoft:  Guy Hadland
Unisys Corp.:  Steve Beck, Bruce Jones, Scott Lurndal,
     Jim Soddy
UTG Inc.:  Michael Dortch, Mark Hatch, Larry Lytle
Veritas:  Craig Harmer, Michael Schmitz

Special thanks go to SAS Institute Inc., SunSoft, Silicon Graphics, and
Convex Computer (now HP) for providing meeting rooms and logistics support.
Hal Prince and Don Cragun provided technical guidance and kept us aware of
the details. Mark Brown helped us understand how important it was to comply
with existing standards. Bill Baker and Tom White worked hard typing early
drafts and providing alternative ways to organize the document. Adam Sweeney
and Howard Gayle kept us within reason. David VomLehn and Tom Truscott kept
good notes and provided the minutes. Ray Lanza gave us rousing encouragement
("Just make everything 64 bits!!"). Mark Johnson quipped excellent
summaries. Kelly Wyatt did the final edits and provided an excellent sanity
check during the endgame. And special thanks go to Mark Hatch (now with
Integrated Computer Solutions, Inc.) who organized the first meetings and
got this effort going.

I really enjoyed participating and would like to express my gratitude to the
members of the large file summit. In particular, I enjoyed participating
with people who were so honestly motivated to make the right technical
decisions. This was a great lesson in UNIX file system semantics and how the
Open Systems Process works.

There are a couple of interesting features of this specification. First, it
contains a method of supporting an industry-wide transition to full 64-bit
APIs. Second, it specifies a set of changes to the Single UNIX Specification
that will allow unlimited file offsets. The transition includes a way to add
64-bit file indexing without breaking current compliance with standards, and
allows software developers to migrate existing sources and binaries to
systems that support 64-bit file indexing.

This document is the result of a collaborative process that was open to all
participants. The efforts of those who participated will best be rewarded by
having this work accepted and used. I believe that this specification is an
example of how well the industry can work together to solve problems that
affect our ability to produce products that compete in the market.

John Carl Zeigler, jcz@utg.org
VP Technology, UTG Inc.
Cary, NC
----------------------------------------------------------------------------

1.0 Overview

1.1 The Large File Problem

As UNIX systems have become increasingly powerful, a number of system
vendors and UNIX independent software vendors have developed a requirement
to access files that contain more information than can be addressed using a
signed long integer. One possible solution could be to convert every program
using files to a larger size for long integers, including the operating
system. However, the work to do this is undesirable for many vendors. A
number of major system vendors and users have been meeting at the "Large
File Summit" (LFS) for over a year to develop a set of changes to the
existing Single UNIX Specification (SUS) that allow both new and converted
programs to address files of arbitrary sizes. This set of changes will be
provided to X/Open for inclusion into the next version of the SUS. In
addition, a set of transitional extensions intended to permit users to
immediately implement large file support on typical 32-bit UNIX operating
systems is proposed. Both the changes and transitional extensions and the
rationale behind their definition are included in this document.

1.2 Requirements

The LFS has worked to develop a solution to the large file problem meeting
the following requirements:

Be implementable at a reasonable cost
     Several of the LFS members are leading efforts to develop and implement
     solutions. Results from their experiences have guided our decisions.
Protect existing programs
     This proposal allows for protection of existing programs. Many of the
     solutions considered would have caused existing programs to fail
     unexpectedly and silently. This proposal has been carefully crafted to
     reduce this possibility.
Provide access to files much larger than 2 gigabytes on 32-bit operating
systems
     This is the requirement that first motivated the LFS activity. The
     proposed changes implement a solution that allows file size and related
     sizes to be uncoupled from the size of the C language data types chosen
     for an operating environment. As a result, systems conforming to the
     proposed changes to the SUS can support files of arbitrary sizes.
Be fully compliant to the SUS
     Systems modified to support the proposed extensions can be configured
     to strictly conform to the existing SUS. These same systems will
     normally be configured to fully meet the proposed changes supporting
     arbitrary file sizes and remain compliant to the SUS with extensions.
     In addition, conforming systems can also support a transitional API
     extension designed to substantially reduce the difficulty of conversion
     to this proposed standard while remaining compliant to the existing
     SUS. This transitional interface is contained in section 3.0
     Transitional Extensions to the Single UNIX Specification.
Provide an extension to the SUS
     While the LFS would like to see this proposal included in the next
     version of the SUS, this specification provides extensions that system
     vendors and independent software vendors need to support this
     functionality in their current compliant products.

1.3 Importance

As noted earlier, several vendors have already begun or completed
implementation because of substantial market pressures. Independent software
vendors are already writing software dependent on large file functionality.
Rapid inclusion into the SUS is necessary to avoid repeating the existing
situation where over 20 different implementations of asynchronous I/O are
available on various UNIX systems. The LFS has chosen design alternatives to
facilitate the needed rapid process of standardization. We believe the
proposed changes will substantially enhance the value of the next revision
of the SUS if they are included.

1.4 Concepts

The proposed changes are motivated by a consistent implementation of a few
very basic technical concepts.

Mixed sizes of off_t
     During a period of transition from existing systems to systems able to
     support an arbitrarily large file size, most systems will need to
     support binaries with two or more sizes of the off_t data type (and
     related data types). This mixed off_t environment may occur on a system
     with an ABI that supports different sizes of off_t. It may occur on a
     system which has both a 64-bit and a 32-bit ABI. Finally, it may occur
     when using a distributed system where clients and servers have
     differing sizes of off_t. In effect, the period of transition will not
     end until we need 128-bit file sizes, requiring yet another transition!
     The proposed changes may also be used as a model for the 64 to 128-bit
     file size transition.
Offset maximum
     Most, but unfortunately not all, of the numeric values in the SUS are
     protected by opaque type definitions. In theory this allows programs to
     use these types rather than the underlying C language data types to
     avoid issues like overflow. However, most existing code maps these
     opaque data types like off_t to long integers that can overflow for the
     values needed to represent the offsets possible in large files.

     To protect existing binaries from arbitrarily large files, a new value
     (offset maximum) will be part of the open file description. An offset
     maximum is the largest offset that can be used as a file offset.
     Operations attempting to go beyond the offset maximum will return an
     error. The offset maximum is normally established as the largest value
     representable in the off_t "extended signed integral type" used by the
     program creating the file description.

     The open() function and other interfaces establish the offset maximum
     for a file description, returning an error if the file size is larger
     than the offset maximum at the time of the call. Returning errors when
     the offset maximum is (or is likely to be) exceeded protects existing
     binaries effectively.
EOVERFLOW
     In a system with binaries compiled to support different sizes of off_t,
     operations such as read() or write() can attempt to reach parts of a
     large file beyond the range of an off_t or other limit. The existing
     SUS does not define an error for this case. EOVERFLOW is an existing
     error type that must be added to a number of system interfaces to
     communicate the new error condition to applications.
Development models
     In addition to supporting environments requiring mixed sizes of off_t,
     the LFS also considered the development model. To maintain older
     programs that have not been converted to support arbitrary file sizes,
     it is necessary to specify the size of off_t and related data types.
     Two compilation models and the means to control them are specified in
     section 3.3 Accessing the Extensions to the SUS. A new set of
     transitional extensions will probably be needed when the next jump to
     larger file sizes occurs. The changes specified for the SUS, however,
     are size neutral.
     Selectable off_t
          In this model, the size of off_t is specified at compile time, and
          the appropriate set of libraries, headers and data types is chosen
          during the compilation and linking process. All existing binaries
          default to an off_t the size of a long integer.
     Explicit off_t
          In this model, the size of off_t is specified during application
          design. The system interface specified explicitly uses an off_t of
          a particular length. On a 32-bit system, for example, use of
          open() implies an off_t of 32 bits and use of open64() implies an
          off64_t of 64 bits. While the model is very useful for supporting
          incremental conversions and writing system software, it is not
          directly supported in the SUS. A proposed set of transitional
          extensions is described in section 3.0 Transitional Extensions to
          the Single UNIX Specification. These transitional interfaces
          support only the 32-bit to 64-bit file offset transition.
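
The offset maximum concept described above can be sketched as a small model
in C. The helper names offset_maximum_for_bits() and open_would_overflow()
are hypothetical and are not part of this proposal. The sketch assumes only
that off_t is a two's-complement signed integer of the stated width, and it
models the check rather than calling the real open().

```c
#include <stdint.h>

/* Largest offset representable in a signed off_t of the given width.
   Hypothetical helper for illustration; intended for 2 <= bits <= 64. */
int64_t offset_maximum_for_bits(int bits)
{
    if (bits >= 64)
        return INT64_MAX;
    return (int64_t)((UINT64_C(1) << (bits - 1)) - 1);
}

/* Models the check made when a file description is created: the open
   is refused (EOVERFLOW) if the current file size does not fit within
   the offset maximum of the program's off_t. Returns 1 if the open
   would be refused, 0 otherwise. */
int open_would_overflow(int64_t file_size, int off_bits)
{
    return file_size > offset_maximum_for_bits(off_bits);
}
```

Under this model a 3,000,000,000-byte file is refused by a binary built with
a 32-bit off_t and accepted by one built with a 64-bit off_t, which is the
protection for existing binaries that the concept is meant to provide.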

1.5 Changes and Additions

The requirements and concepts defined above have been consistently and
completely applied to the SUS to generate the changes and additions
specified in sections 2.0 Changes to the Single UNIX Specification and 3.0
Transitional Extensions to the Single UNIX Specification. The changes are
classified as:

Changes to System Interface Definitions
     The terms extended signed integral type, extended unsigned integral
     type, offset maximum and saved resource limits have been defined.
Changes to System Interfaces and Headers
     EOVERFLOW, EFBIG and EINVAL are added or updated wherever needed.

     The open() and fcntl() functions have been changed to support the
     offset maximum.

     The fseeko() and ftello() functions have been added because the
     existing fseek() and ftell() do not use the required opaque types.

     Data types, declarations and symbolic constants were added to or
     changed in headers.
Changes to Commands and Utilities
     The utilities that must be converted to establish a minimally complete
     system supporting large files are identified. Converting every utility
     is both expensive and unnecessary for effective use of large files.
Transitional Extensions
     The proposed transitional extensions including interfaces, macros and
     data types have been defined.

1.6 Conformance

A conforming implementation will supply all the interfaces that are
specified in 2.0 Changes to the Single UNIX Specification (except that
implementations need not provide the asynchronous I/O interfaces:
aio_read(), aio_write(), and lio_listio()) and will define _LFS_LARGEFILE to
be 1 (see 3.1.2.12 <unistd.h>).

A conforming implementation that provides asynchronous I/O interfaces and
the extensions to them specified in 2.0 Changes to the Single UNIX
Specification will define _LFS_ASYNCHRONOUS_IO to be 1 (see 3.1.2.12
<unistd.h>).

A conforming implementation that provides the explicit 64-bit interfaces
will provide at least those interfaces specified in 3.1.1.1.3 Other
Interfaces, 3.1.1.2 fcntl(), 3.1.1.3 open(), and 3.1.2 Transitional
Extensions to Headers (except that changes specified in 3.1.2.2 <aio.h> and
3.1.2.6 <stdio.h> need not be supported) and will define _LFS64_LARGEFILE to
be 1 (see 3.1.2.12 <unistd.h>).

A conforming implementation that defines _LFS64_LARGEFILE to be 1 and
provides the explicit 64-bit interfaces for asynchronous I/O specified in
3.1.1.1.1 Asynchronous I/O Interfaces will define _LFS64_ASYNCHRONOUS_IO to
be 1 (see 3.1.2.12 <unistd.h>).

A conforming implementation that defines _LFS64_LARGEFILE to be 1 and
provides the explicit 64-bit STDIO interfaces specified in 3.1.1.1.2 STDIO
Interfaces and 3.1.2.6 <stdio.h> will define _LFS64_STDIO to be 1 (see
3.1.2.12 <unistd.h>).

2.0 Changes to the Single UNIX Specification

2.1 Changes to CAE Specification System Interface Definitions, Issue 4,
Version 2

The following definitions will be added to System Interface Definitions,
Chapter 2, Glossary:

extended signed integral type
     a signed integral type or an implementation-specific type with similar
     properties.
extended unsigned integral type
     an unsigned integral type or an implementation-specific type with
     similar properties.
offset maximum
     an attribute of an open file description representing the largest value
     that can be used as a file offset.
saved resource limits
     an attribute of a process that provides some flexibility in the
     handling of unrepresentable resource limits, as described in the exec
     family of functions and setrlimit().

     (Note the attribute "resource limits" as used in the SUS is not
     defined.)

2.2 Changes to CAE Specification System Interfaces and Headers, Issue 4,
Version 2

2.2.1 Changes to System Interfaces

The following changes will be made to System Interfaces and Headers, Chapter
3, System Interfaces. The Asynchronous I/O interfaces (aio_read(),
aio_write() and lio_listio()) should be included when POSIX.1b is added in a
future revision to the SUS.

2.2.1.1 aio_read()

DESCRIPTION

     For regular files, no data transfer will occur past the offset
     maximum established in the open file description associated with
     aiocbp->aio_fildes.

ERRORS

     The following is an additional condition which may be detected
     synchronously or asynchronously:

     [EOVERFLOW]
          The file is a regular file, aiocbp->aio_nbytes is greater
          than 0 and the starting offset in aiocbp->aio_offset is
          before the end-of-file and is at or beyond the offset maximum
          in the open file description associated with
          aiocbp->aio_fildes.

     Note: This is a new error condition.

2.2.1.2 aio_write()

DESCRIPTION

     For regular files, no data transfer will occur past the offset
     maximum established in the open file description associated with
     aiocbp->aio_fildes.

ERRORS

     The following is an additional condition which may be detected
     synchronously or asynchronously:

     [EFBIG]
          The file is a regular file, aiocbp->aio_nbytes is greater
          than 0 and the starting offset in aiocbp->aio_offset is at or
          beyond the offset maximum in the open file description
          associated with aiocbp->aio_fildes.

     Note: This is an additional EFBIG error condition.

2.2.1.3 exec

DESCRIPTION

     The saved resource limits in the new process image are set to be a
     copy of the process's corresponding hard and soft resource limits.

2.2.1.4 fclose(), fflush(), fputwc(), fputws(), fseek(), putwc(), putwchar()

ERRORS

     These functions will fail if:

     [EFBIG]
          The file is a regular file and an attempt was made to write
          at or beyond the offset maximum associated with the
          corresponding stream.

     Note: This is an additional EFBIG error condition.

2.2.1.5 fcntl()

DESCRIPTION

     An unlock (F_UNLCK) request in which l_len is non-zero and the
     offset of the last byte of the requested segment is the maximum
     value for an object of type off_t, when the process has an
     existing lock in which l_len is 0 and which includes the last byte
     of the requested segment, will be treated as a request to unlock
     from the start of the requested segment with an l_len equal to 0.
     Otherwise an unlock (F_UNLCK) request will attempt to unlock only
     the requested segment.

ERRORS

     The fcntl() function will fail if:

     [EOVERFLOW]
          One of the values to be returned cannot be represented
          correctly.
     [EOVERFLOW]
          The cmd argument is F_GETLK, F_SETLK or F_SETLKW and the
          smallest or, if l_len is non-zero, the largest, offset of any
          byte in the requested segment cannot be represented correctly
          in an object of type off_t.

     Note: These are new error conditions.

2.2.1.6 fdopen()

DESCRIPTION

     The fdopen() function will preserve the offset maximum previously
     set for the open file description corresponding to fildes.

2.2.1.7 fgetc(), fgets(), fgetwc(), fgetws(), fread(), fscanf(), getc(),
getchar(), gets(), getw(), getwc(), getwchar(), scanf()

ERRORS

     These functions will fail if data needs to be read and:

     [EOVERFLOW]
          The file is a regular file and an attempt was made to read at
          or beyond the offset maximum associated with the
          corresponding stream.

     Note: This is a new error condition.

2.2.1.8 fgetpos()

ERRORS

     The fgetpos() function will fail if:

     [EOVERFLOW]
          The current value of the file position cannot be represented
          correctly in an object of type fpos_t.

     Note: This is a new error condition.

2.2.1.9 fopen(), freopen(), tmpfile()

DESCRIPTION

     The largest value that can be represented correctly in an object
     of type off_t will be established as the offset maximum in the
     open file description.

ERRORS

     The fopen() and freopen() functions will fail if:

     [EOVERFLOW]
          The named file is a regular file and the size of the file
          cannot be represented correctly in an object of type off_t.

     Note: This is a new error condition.

2.2.1.10 fpathconf() and pathconf()

DESCRIPTION

  Variable          Value of name          Notes
  FILESIZEBITS      _PC_FILESIZEBITS       3,4
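
A minimal sketch of how an application might query this variable follows.
The wrapper name filesize_bits() is hypothetical. The sketch assumes a
system whose <unistd.h> already defines _PC_FILESIZEBITS.

```c
#include <errno.h>
#include <unistd.h>

/* Returns the FILESIZEBITS value for the given directory.
   A return of -1 with errno still 0 means the implementation reports
   no definite value; -1 with errno set means the query itself failed. */
long filesize_bits(const char *dir)
{
    errno = 0;
    return pathconf(dir, _PC_FILESIZEBITS);
}
```

A program could compare the result with the width of its own off_t to decide
whether files in that directory may grow beyond what it can address.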

2.2.1.11 fprintf(), fputc(), fputs(), fwrite(), printf(), putc(), putchar(),
puts(), putw(), vfprintf(), vprintf()

ERRORS

     These functions will fail if either the stream is unbuffered or
     the stream's buffer needed to be flushed and:

     [EFBIG]
          The file is a regular file and an attempt was made to write
          at or beyond the offset maximum.

     Note: This is an additional EFBIG error condition.

2.2.1.12 fseek()

ERRORS

     The fseek() function will fail if:

     [EOVERFLOW]
          The resulting file offset would be a value which cannot be
          represented correctly in an object of type long.

     Note: This is a new error condition.

2.2.1.13 fseeko()

DESCRIPTION

     The fseeko() function is identical to the modified fseek() except
     that the offset argument is of type off_t and the EOVERFLOW error
     is changed as follows:

ERRORS

     [EOVERFLOW]
          The resulting file offset would be a value which cannot be
          represented correctly in an object of type off_t.

     Note: This is a new function.

2.2.1.14 fstat(), lstat() and stat()

ERRORS

     These functions will fail if:

     [EOVERFLOW]
          The file size in bytes or the number of blocks allocated to
          the file or the file serial number cannot be represented
          correctly in the structure pointed to by buf.

     Note: This is an additional EOVERFLOW error condition.
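
As an illustration, a caller might separate this condition from other
failures as sketched below. The function name size_of_open_file() and its
return convention (0, -1, -2) are assumptions made for this example only.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Stores the size of the file open on fd in *size_out.
   Returns 0 on success, -2 if fstat() failed with EOVERFLOW (a value
   does not fit in struct stat), and -1 for any other failure. */
int size_of_open_file(int fd, off_t *size_out)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return errno == EOVERFLOW ? -2 : -1;
    *size_out = st.st_size;
    return 0;
}
```

Reporting the overflow case separately lets an unconverted program tell the
user that the file is too large for it, rather than reporting a generic
I/O failure.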

2.2.1.15 fstatvfs() and statvfs()

ERRORS

     These functions will fail if:

     [EOVERFLOW]
          One of the values to be returned cannot be represented
          correctly in the structure pointed to by buf.

     Note: This is a new error condition.

2.2.1.16 ftell()

ERRORS

     The ftell() function will fail if:

     [EOVERFLOW]
          The current file offset cannot be represented correctly in an
          object of type long.

     Note: This is a new error condition.

2.2.1.17 ftello()

DESCRIPTION

     The ftello() function is identical to the modified ftell() except
     that the return value is of type off_t and the EOVERFLOW error is
     changed as follows:

ERRORS

     [EOVERFLOW]
          The current file offset cannot be represented correctly in an
          object of type off_t.

     Note: This is a new function.
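
The following sketch shows fseeko() and ftello() used together with off_t
in place of long. The helper name seek_and_tell() is hypothetical, and the
_POSIX_C_SOURCE definition is an assumption about the build environment:
some current compilers need it before they will declare these two functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>

/* Seeks to an absolute offset and reports where the stream ended up.
   Returns (off_t)-1 on failure, with errno set by fseeko() or
   ftello() (for example EOVERFLOW if the offset is not representable). */
off_t seek_and_tell(FILE *stream, off_t target)
{
    if (fseeko(stream, target, SEEK_SET) != 0)
        return (off_t)-1;
    return ftello(stream);
}
```

Because both the argument and the result are of type off_t, the same source
works unchanged whether off_t is 32 or 64 bits wide.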

2.2.1.18 ftruncate()

ERRORS

     The ftruncate() function will fail if:

     [EFBIG]
          The file is a regular file and length is greater than the
          offset maximum established in the open file description
          associated with fildes.

     Note: This is an additional EFBIG error condition.

2.2.1.19 getrlimit() and setrlimit()

DESCRIPTION

     When using the getrlimit() function, if a resource limit can be
     represented correctly in an object of type rlim_t then its
     representation is returned; otherwise if the value of the resource
     limit is equal to that of the corresponding saved hard limit the
     value returned is RLIM_SAVED_MAX; otherwise the value returned is
     RLIM_SAVED_CUR.

     When using the setrlimit() function, if the requested new limit is
     RLIM_INFINITY the new limit will be "no limit"; otherwise if the
     requested new limit is RLIM_SAVED_MAX the new limit will be the
     corresponding saved hard limit; otherwise if the requested new
     limit is RLIM_SAVED_CUR the new limit will be the corresponding
     saved soft limit; otherwise the new limit will be the requested
     value. In addition, if the corresponding saved limit can be
     represented correctly in an object of type rlim_t then it will be
     overwritten with the new limit.

     The result of setting a limit to RLIM_SAVED_MAX or RLIM_SAVED_CUR
     is unspecified unless a previous call to getrlimit() returned that
     value as the soft or hard limit for the corresponding resource
     limit.

     The determination of whether a limit can be correctly represented
     in an object of type rlim_t is implementation-dependent. For
     example, some implementations permit a limit whose value is
     greater than RLIM_INFINITY and others do not.

     The exec family of functions also causes resource limits to be
     saved. (See 2.2.1.3 exec.)
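
A short sketch of how a program might interpret a value returned by
getrlimit() under these rules follows. The function name classify_limit()
and the strings it returns are hypothetical. RLIM_INFINITY is tested first
because, on implementations where every limit is representable,
RLIM_SAVED_MAX and RLIM_SAVED_CUR may have the same value as RLIM_INFINITY.

```c
#include <sys/resource.h>

/* Classifies one rlim_t value as returned in rlim_cur or rlim_max. */
const char *classify_limit(rlim_t value)
{
    if (value == RLIM_INFINITY)
        return "no limit";
    if (value == RLIM_SAVED_MAX)
        return "saved hard limit";   /* unrepresentable, equals saved hard */
    if (value == RLIM_SAVED_CUR)
        return "saved soft limit";   /* unrepresentable, equals saved soft */
    return "representable";
}
```

A program that receives one of the saved values can pass it back to
setrlimit() unchanged to restore the limit without needing to represent it.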

2.2.1.20 lio_listio()

DESCRIPTION

     For regular files, no data transfer will occur past the offset
     maximum established in the open file description associated with
     aiocbp->aio_fildes.

ERRORS

     The following are additional error codes which may be set for each
     aiocb control block:

     [EOVERFLOW]
          The aiocbp->aio_lio_opcode is LIO_READ, the file is a regular
          file, aiocbp->aio_nbytes is greater than 0, and the
          aiocbp->aio_offset is before the end-of-file and is greater
          than or equal to the offset maximum in the open file
          description associated with aiocbp->aio_fildes.
     [EFBIG]
          The aiocbp->aio_lio_opcode is LIO_WRITE, the file is a
          regular file, aiocbp->aio_nbytes is greater than 0, and the
          aiocbp->aio_offset is greater than or equal to the offset
          maximum in the open file description associated with
          aiocbp->aio_fildes.

     Note: These are additional EFBIG and EOVERFLOW error conditions.

2.2.1.21 lockf()

DESCRIPTION

     An F_ULOCK request in which size is non-zero and the offset of the
     last byte of the requested section is the maximum value for an
     object of type off_t, when the process has an existing lock in
     which size is 0 and which includes the last byte of the requested
     section, will be treated as a request to unlock from the start of
     the requested section with a size equal to 0. Otherwise an F_ULOCK
     request will attempt to unlock only the requested section.

ERRORS

     The lockf() function will fail if:

     [EINVAL]
          The function argument is not one of F_LOCK, F_TLOCK, F_TEST
          or F_ULOCK; or size plus the current file offset is less than
          0.
     [EOVERFLOW]
          The offset of the first, or if size is not 0 then the last,
          byte in the requested section cannot be represented correctly
          in an object of type off_t.

     Note: This is a clarification of the EINVAL error condition.
     Note: EOVERFLOW is a new error condition.

2.2.1.22 lseek()

ERRORS

     The lseek() function will fail if:

     [EOVERFLOW]
          The resulting file offset would be a value which cannot be
          represented correctly in an object of type off_t.

     Note: This is a new error condition.

2.2.1.23 mmap()

ERRORS

     The mmap() function will fail if:

     [EOVERFLOW]
          The file is a regular file and the value of off plus len
          exceeds the offset maximum established in the open file
          description associated with fildes.

     Note: This is a new error condition.

2.2.1.24 open()

DESCRIPTION

     The largest value that can be represented correctly in an object
     of type off_t will be established as the offset maximum in the
     open file description.

ERRORS

     The open() function will fail if:

     [EOVERFLOW]
          The named file is a regular file and the size of the file
          cannot be represented correctly in an object of type off_t.

     Note: This is a new error condition.
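
The sketch below shows one way a caller might detect this condition. The
function name open_for_reading() and the too_large output parameter are
assumptions made for illustration and are not part of the proposal.

```c
#include <errno.h>
#include <fcntl.h>

/* Opens path read-only. Returns the file descriptor, or -1 on failure.
   *too_large is set to 1 only when open() failed with EOVERFLOW, that
   is, when the file size does not fit in this program's off_t. */
int open_for_reading(const char *path, int *too_large)
{
    int fd = open(path, O_RDONLY);

    *too_large = (fd < 0 && errno == EOVERFLOW);
    return fd;
}
```

An unconverted utility could use the flag to print a specific "file too
large" diagnostic instead of silently mishandling the file.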

2.2.1.25 read() and readv()

DESCRIPTION

     For regular files, no data transfer will occur past the offset
     maximum established in the open file description associated with
     fildes.

ERRORS

     The read() and readv() functions will fail if:

     [EOVERFLOW]
          The file is a regular file, nbyte is greater than 0, the
          starting position is before the end-of-file and the starting
          position is greater than or equal to the offset maximum
          established in the open file description associated with
          fildes.

     Note: This is a new error condition.

2.2.1.26 readdir()

ERRORS

     The readdir() function will fail if:

     [EOVERFLOW]
          One of the values in the structure to be returned cannot be
          represented correctly.

     Note: This is a new error condition.

2.2.1.27 write() and writev()

DESCRIPTION

     For regular files, no data transfer will occur past the offset
     maximum established in the open file description associated with
     fildes.

ERRORS

     These functions will fail if:

     [EFBIG]
          The file is a regular file, nbyte is greater than 0 and the
          starting position is greater than or equal to the offset
          maximum established in the open file description associated
          with fildes.

     Note: This is an additional EFBIG error condition.
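
The transfer limit for read() and write() can be sketched as plain
arithmetic. This is one reading of the rule, not text from the
specification: a request that starts below the offset maximum is assumed to
be shortened so that it ends at the offset maximum. The function name
bytes_before_offset_max() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Number of bytes of an nbyte request starting at 'start' that lie
 * below offset_max.  A result of 0 for nbyte > 0 corresponds to the
 * case in which the call fails (EFBIG for write(), EOVERFLOW for a
 * read() that starts before end-of-file). */
size_t bytes_before_offset_max(off_t start, size_t nbyte, off_t offset_max)
{
    assert(offset_max >= 0);

    if (start < 0 || start >= offset_max)
        return 0;

    uintmax_t room = (uintmax_t)offset_max - (uintmax_t)start;
    return room < nbyte ? (size_t)room : nbyte;
}
```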

2.2.2 Changes to Headers

The following changes will be made to System Interfaces and Headers, Chapter
4, Headers.

2.2.2.1 <limits.h>

The following symbolic constant is defined as a Pathname Variable Value:

Name             Description                Acceptable Value
FILESIZEBITS     Minimum number of bits             *
                 needed to represent,
                 as a signed integer
                 value, the maximum size
                 of a regular file
                 allowed in the
                 specified directory.

2.2.2.2 <stdio.h>

The following are declared as functions and may also be defined as macros:

int         fseeko(FILE *stream, off_t offset, int whence);
off_t       ftello(FILE *stream);

The type off_t is defined through typedef as described in <sys/types.h>.
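
As a hedged illustration of these two functions, the sketch below measures
the size of a stream and restores the original position. The helper name
stream_size() is hypothetical; _LARGEFILE_SOURCE is defined as described in
3.3.1.

```c
#define _LARGEFILE_SOURCE 1     /* must precede every #include */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Returns the size of the stream in bytes, or (off_t)-1 on error.
 * The file position is restored before returning. */
off_t stream_size(FILE *fp)
{
    assert(fp != NULL);

    off_t saved = ftello(fp);
    if (saved == (off_t)-1)
        return (off_t)-1;
    if (fseeko(fp, 0, SEEK_END) != 0)
        return (off_t)-1;

    off_t end = ftello(fp);
    if (fseeko(fp, saved, SEEK_SET) != 0)
        return (off_t)-1;
    return end;
}
```

Unlike fseek() and ftell(), these calls carry the offset in an off_t, so the
result is not limited by the range of long.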

2.2.2.3 <sys/resource.h>

The following symbolic constants are defined:

RLIM_SAVED_MAX     A value of type rlim_t indicating an
                   unrepresentable saved hard limit.
RLIM_SAVED_CUR     A value of type rlim_t indicating an
                   unrepresentable saved soft limit.

On implementations where all resource limits are representable in an object
of type rlim_t, RLIM_SAVED_MAX and RLIM_SAVED_CUR need not be distinct from
RLIM_INFINITY.
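
A sketch of how an application might interpret a limit value follows. The
enumeration and function names are hypothetical. Because the three constants
need not be distinct, RLIM_INFINITY is tested first; on an implementation
where they coincide, the saved cases are never reported.

```c
#include <assert.h>
#include <sys/resource.h>

/* Assumption of this sketch: rlim_t is an unsigned type. */
static_assert((rlim_t)-1 > 0, "rlim_t is expected to be unsigned");

enum limit_kind {
    LIMIT_FINITE,
    LIMIT_NONE,         /* RLIM_INFINITY */
    LIMIT_SAVED_MAX,    /* unrepresentable saved hard limit */
    LIMIT_SAVED_CUR     /* unrepresentable saved soft limit */
};

enum limit_kind classify_limit(rlim_t value)
{
    if (value == RLIM_INFINITY)
        return LIMIT_NONE;
    if (value == RLIM_SAVED_MAX)
        return LIMIT_SAVED_MAX;
    if (value == RLIM_SAVED_CUR)
        return LIMIT_SAVED_CUR;
    return LIMIT_FINITE;
}

/* Classify the soft file size limit of the calling process.
 * Returns a limit_kind value, or -1 if getrlimit() fails. */
int fsize_soft_limit_kind(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_FSIZE, &rl) != 0)
        return -1;
    return (int)classify_limit(rl.rlim_cur);
}
```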

2.2.2.4 <sys/stat.h>

The type of st_blocks in the stat structure will be changed to:

blkcnt_t    st_blocks   number of blocks allocated for this
                        object.

2.2.2.5 <sys/statvfs.h>

The types of the fields below in the statvfs structure will be changed to:

fsblkcnt_t  f_blocks    total number of blocks in the file
                        system in units of f_frsize.
fsblkcnt_t  f_bfree     total number of free blocks.
fsblkcnt_t  f_bavail    number of free blocks available to
                        non-privileged process.
fsfilcnt_t  f_files     total number of file serial numbers.
fsfilcnt_t  f_ffree     total number of free file serial
                        numbers.
fsfilcnt_t  f_favail    number of free file serial numbers
                        available to non-privileged process.

2.2.2.6 <sys/types.h>

The following data types will be defined:

blkcnt_t                Used for file block counts.
fsblkcnt_t              Used for file system block counts.
fsfilcnt_t              Used for file system file counts.

The types blkcnt_t and off_t are defined as extended signed integral types.

The types fsblkcnt_t, fsfilcnt_t, and ino_t are defined as extended unsigned
integral types.

2.2.2.7 <unistd.h>

The following symbolic constant is defined for pathconf():

     _PC_FILESIZEBITS
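
A minimal sketch of querying this value for a directory. The wrapper name
filesize_bits() is hypothetical; pathconf() and _PC_FILESIZEBITS are the
interfaces named above.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Returns the FILESIZEBITS value for the directory dir, or -1.
 * errno is cleared first: if -1 is returned and errno is still 0, no
 * value is defined for the directory; otherwise an error occurred. */
long filesize_bits(const char *dir)
{
    assert(dir != NULL);

    errno = 0;
    return pathconf(dir, _PC_FILESIZEBITS);
}
```

An application could compare the result with the width of its own off_t to
decide whether files in that directory may grow beyond what it can address.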

2.3 Changes to CAE Specification Commands and Utilities, Issue 4, Version 2

The following changes will be made to Commands and Utilities, Chapter 3,
Utilities.

2.3.1 Considerations for Utilities in Support of Files of Arbitrary Size

Note: This is a new section and should be added to Commands and Utilities,
Issue 4, Version 2, Chapter 3 after section 1.2.1, Symbolic Links.

The following utilities will support files of any size up to the maximum
that can be created by the implementation. This support includes correct
writing of file size related values (such as file sizes and offsets, line
numbers, and block counts) and correct interpretation of command line
arguments that contain such values.

basename   return non-directory portion of pathname
cat        concatenate and print files
cd         change working directory
chgrp      change file group ownership
chmod      change file modes
chown      change file ownership
cksum      write file checksums and sizes
cmp        compare two files
cp         copy files
dd         convert and copy a file
df         report free disk space
dirname    return directory portion of pathname
du         estimate file space usage
find       find files
ln         link files
ls         list directory contents
mkdir      make directories
mv         move files
pathchk    check pathnames
pwd        return working directory name
rm         remove directory entries
rmdir      remove directories
sh         shell, the standard command language interpreter
sum        print checksum and block or byte count of a file
test       evaluate expression
touch      change file access and modification times
ulimit     set or report file size limit

Exceptions to the requirement that utilities support files of any size up to
the maximum are:

  1. Utilities such as tar and cpio cannot support arbitrary file sizes due
     to limitations imposed by fixed file formats.
  2. Uses of files as command scripts, or for configuration or control, are
     exempt. For example, it is not required that sh be able to read an
     arbitrarily large ".profile".
  3. Shell input and output redirection are exempt. For example, it is not
     required that the redirections sum < file or echo foo > file succeed
     for an arbitrarily large existing file.

2.3.2 The sh Utility

DESCRIPTION

     Pathname expansion will not fail due to the size of a file.

     Shell input and output redirections will have an
     implementation-specific offset maximum that will be established in
     the open file description.

2.3.3 The pax Utility

APPLICATION USAGE

     The pax utility is not able to handle arbitrary file sizes. There
     is currently a proposal in ballot in IEEE Project 1003.2b to
     address this issue.

3.0 Transitional Extensions to the Single UNIX Specification

The interfaces, macros and data types in this section are explicitly 64-bit
instances of the corresponding SUS and POSIX.1b interfaces, macros and data
types. The function prototype and semantics of a transitional interface will
be equivalent to those of the SUS version of the call. Version test macros
announcing extensions to the SUS are also defined.

The transitional extensions in this section are intended to be temporary.
While an application using this specification may be using non-POSIX
conforming transitional extensions to operating system functions, this does
not require that system vendors break their POSIX compliance. This
specification is intended to be compatible with the standards. The
transitional extensions are provided so that system vendors may define a
common set of large file capable extensions to their current compliant
systems without violating that compliance.

3.1 Transitional Extensions to CAE Specification System Interfaces and
Headers, Issue 4, Version 2

3.1.1 Transitional Extensions to System Interfaces

3.1.1.1 64-bit Versions of Interfaces

The following interfaces are explicitly 64-bit versions of the corresponding
Single UNIX Specification and POSIX.1b interfaces. There is no functional
difference between these and the corresponding Single UNIX Specification and
POSIX.1b interfaces.

3.1.1.1.1 Asynchronous I/O Interfaces

aio_cancel64()         aio_error64()
aio_fsync64()          aio_read64()
aio_return64()         aio_suspend64()
aio_write64()          lio_listio64()

3.1.1.1.2 STDIO Interfaces

fgetpos64()            fopen64()
freopen64()            fseeko64()
fsetpos64()            ftello64()
tmpfile64()

3.1.1.1.3 Other Interfaces

creat64()             fstat64()
fstatvfs64()          ftruncate64()
ftw64()               getrlimit64()
lockf64()             lseek64()
lstat64()             mmap64()
nftw64()              open64()
readdir64()           setrlimit64()
stat64()              statvfs64()
truncate64()

3.1.1.2 fcntl()

DESCRIPTION

     The following additional value may be used in constructing oflag:

     O_LARGEFILE
          If set, the offset maximum in the open file description will
          be the largest value that can be represented correctly in an
          object of type off64_t.

     The behavior of each of the following additional values is
     equivalent to that of the corresponding Single UNIX Specification
     value (F_GETLK, F_SETLK, F_SETLKW), but they take a struct flock64
     argument rather than a struct flock argument.

     F_GETLK64
     F_SETLK64
     F_SETLKW64

3.1.1.3 open()

DESCRIPTION

     The following additional value may be used in constructing oflag:

     O_LARGEFILE
          If set, the offset maximum in the open file description will
          be the largest value that can be represented correctly in an
          object of type off64_t.

ERRORS

     The open() function will fail if:

     [EOVERFLOW]
          The named file is a regular file and either O_LARGEFILE is
          not set and the size of the file cannot be represented
          correctly in an object of type off_t or O_LARGEFILE is set
          and the size of the file cannot be represented correctly in
          an object of type off64_t.

APPLICATION USAGE

     Note that using open64() is equivalent to using open() with
     O_LARGEFILE set in oflag.

Note: For the transitional extensions these changes to open() are in place
of the changes described in 2.2.1.24 open() relating to the changes to the
SUS.
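
A sketch of the transitional open() behavior, assuming an implementation
that provides the transitional extensions (for example, one that defines
_LFS64_LARGEFILE). The helper names open_large_readonly() and size_of_fd64()
are hypothetical.

```c
#define _LARGEFILE64_SOURCE 1   /* must precede every #include */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open path read-only with the off64_t offset maximum.
 * Returns a file descriptor, or -1 with errno set. */
int open_large_readonly(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY | O_LARGEFILE);
    if (fd == -1 && errno == EOVERFLOW) {
        fprintf(stderr, "%s: size not representable in off64_t\n", path);
        errno = EOVERFLOW;      /* fprintf() may have changed errno */
    }
    return fd;
}

/* Size in bytes of the file open on fd, or (off64_t)-1 on error. */
off64_t size_of_fd64(int fd)
{
    struct stat64 st;

    if (fstat64(fd, &st) != 0)
        return (off64_t)-1;
    return st.st_size;
}
```

Per the APPLICATION USAGE note above, calling open64(path, O_RDONLY) would
be equivalent to the open() call in this sketch.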

3.1.2 Transitional Extensions to Headers

The modifications to the headers in this section are necessary to implement
the transitional extensions as described in 3.0 Transitional Extensions to
the Single UNIX Specification.

3.1.2.1 64-bit Versions of Headers

In summary, the changes to the headers involve the following data types,
structures and symbolic constants:

3.1.2.1.1 Data Types

blkcnt_t               fsblkcnt_t
fsfilcnt_t             fpos_t
ino_t                  off_t
rlim_t

3.1.2.1.2 Structures

struct dirent          struct flock
struct rlimit          struct stat
struct statvfs

3.1.2.1.3 Symbolic Constants

F_GETLK                F_SETLK
F_SETLKW               RLIM_INFINITY
RLIM_SAVED_MAX         RLIM_SAVED_CUR

3.1.2.2 <aio.h>

The aiocb64 structure is defined in the same way as the aiocb structure in
POSIX.1b with the exception of the following member:

off64_t        aio_offset

The following are declared as functions and may also be defined as macros:

int     aio_read64(struct aiocb64 *aiocbp);
int     aio_write64(struct aiocb64 *aiocbp);
int     lio_listio64(int mode, struct aiocb64 *const list[],
            int nent, struct sigevent *sig);
int     aio_error64(const struct aiocb64 *aiocbp);
ssize_t aio_return64(struct aiocb64 *aiocbp);
int     aio_cancel64(int fildes, struct aiocb64 *aiocbp);
int     aio_suspend64(const struct aiocb64 *const list[],
            int nent, const struct timespec *timeout);
int     aio_fsync64(int op, struct aiocb64 *aiocbp);

3.1.2.3 <dirent.h>

The dirent64 structure is defined in the same way as the dirent structure in
the Single UNIX Specification with the exception of the following member:

ino64_t       d_ino     file serial number.

The following is declared as a function and may also be defined as a macro:

struct dirent64 *readdir64(DIR *dirp);

3.1.2.4 <fcntl.h>

The flock64 structure is defined in the same way as the flock structure in
the Single UNIX Specification with the exception of the following members:

off64_t       l_start relative offset in bytes.
off64_t       l_len   size.

Additional values for cmd used by fcntl():

F_GETLK64     Get record locking information using struct
              flock64.
F_SETLK64     Establish a record lock using struct flock64.
F_SETLKW64    Establish a record lock, blocking, using struct
              flock64.
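
A sketch of a lock query that uses struct flock64 with F_GETLK64, assuming
an implementation that provides the transitional extensions. The helper name
range_locked64() is hypothetical.

```c
#define _LARGEFILE64_SOURCE 1   /* must precede every #include */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask whether a write lock on [start, start + len) would conflict with
 * a lock held by another process (len == 0 means "to end of file").
 * Returns 1 if it would, 0 if it would not, -1 on error. */
int range_locked64(int fd, off64_t start, off64_t len)
{
    struct flock64 fl;

    assert(start >= 0);

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    if (fcntl(fd, F_GETLK64, &fl) == -1) {
        perror("fcntl(F_GETLK64)");
        return -1;
    }
    return fl.l_type != F_UNLCK;
}
```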

An additional file status flag, used by open() and fcntl(), is defined:

O_LARGEFILE     The offset maximum in the open file description
                is the largest value that can be represented
                correctly in an object of type off64_t.

The following are declared as functions and may also be defined as macros:

int     creat64(const char *path, mode_t mode);
int     open64(const char *path, int oflag, ...);

3.1.2.5 <ftw.h>

The following are declared as functions and may also be defined as macros:

int ftw64(const char *path,
    int (*fn)(const char *, const struct stat64 *, int),
    int ndirs);
int nftw64(const char *path,
    int (*fn)(const char *, const struct stat64 *, int,
               struct FTW *),
    int depth, int flags);

3.1.2.6 <stdio.h>

The following data type is defined through typedef:

fpos64_t  Type containing all information needed to specify
          uniquely every position within a file in which the
          largest offset can be represented in an object of type
          off64_t.

The following are declared as functions and may also be defined as macros:

int       fgetpos64(FILE *stream, fpos64_t *pos);
FILE     *fopen64(const char *filename, const char *mode);
FILE     *freopen64(const char *filename, const char *mode,
               FILE *stream);
int       fseeko64(FILE *stream, off64_t offset, int whence);
int       fsetpos64(FILE *stream, const fpos64_t *pos);
off64_t   ftello64(FILE *stream);
FILE     *tmpfile64(void);

3.1.2.7 <sys/mman.h>

The following is declared as a function and may also be defined as a macro:

void     *mmap64(void *addr, size_t len, int prot, int flags,
                int fd, off64_t offset);

3.1.2.8 <sys/resource.h>

The following data type is defined through typedef:

rlim64_t    type used for limit values.

The type rlim64_t must be an extended unsigned arithmetic type that can
represent correctly any non-negative value of an off64_t.

The following symbolic constants are defined:

RLIM64_INFINITY    A value of type rlim64_t indicating no limit.
RLIM64_SAVED_MAX   A value of type rlim64_t indicating an
                   unrepresentable saved hard limit.
RLIM64_SAVED_CUR   A value of type rlim64_t indicating an
                   unrepresentable saved soft limit.

On implementations where all resource limits are representable in an object
of type rlim64_t, RLIM64_SAVED_MAX and RLIM64_SAVED_CUR need not be distinct
from RLIM64_INFINITY.

The rlimit64 structure is defined in the same way as the rlimit structure in
the Single UNIX Specification with the exception of the following members:

rlim64_t  rlim_cur      the current (soft) limit.
rlim64_t  rlim_max      the hard limit.

The following are declared as functions and may also be defined as macros:

int       getrlimit64(int resource, struct rlimit64 *rlp);
int       setrlimit64(int resource, const struct rlimit64 *rlp);

3.1.2.9 <sys/stat.h>

The stat64 structure is defined in the same way as the stat structure in the
Single UNIX Specification with the exception of the following members:

ino64_t     st_ino      file serial number.
off64_t     st_size     file size in bytes.
blkcnt64_t  st_blocks   number of blocks allocated for this
                        object.

The following are declared as functions and may also be defined as macros:

int         fstat64(int fildes, struct stat64 *buf);
int         lstat64(const char *, struct stat64 *buf);
int         stat64(const char *, struct stat64 *buf);

3.1.2.10 <sys/statvfs.h>

The statvfs64 structure is defined in the same way as the statvfs structure
in the Single UNIX Specification with the exception of the following
members:

fsblkcnt64_t  f_blocks  total number of blocks in the file
                        system in units of f_frsize.
fsblkcnt64_t  f_bfree   total number of free blocks.
fsblkcnt64_t  f_bavail  number of free blocks available to
                        non-privileged process.
fsfilcnt64_t  f_files   total number of file serial numbers.
fsfilcnt64_t  f_ffree   total number of free file serial
                        numbers.
fsfilcnt64_t  f_favail  number of free file serial numbers
                        available to non-privileged process.

The following are declared as functions and may also be defined as macros:

int         statvfs64(const char *path, struct statvfs64 *buf);
int         fstatvfs64(int fildes, struct statvfs64 *buf);

3.1.2.11 <sys/types.h>

The following data types are defined through typedef:

blkcnt64_t      Used for file block counts.
fsblkcnt64_t    Used for file system block counts.
fsfilcnt64_t    Used for file system file counts.
ino64_t         Used for file serial numbers.
off64_t         Used for file sizes.

The types blkcnt64_t and off64_t are defined as extended signed integral
types.

The types fsblkcnt64_t, fsfilcnt64_t, and ino64_t are defined as extended
unsigned integral types.

3.1.2.12 <unistd.h>

The following are declared as functions and may also be defined as macros:

int         lockf64(int fildes, int function, off64_t size);
off64_t     lseek64(int fildes, off64_t offset, int whence);
int         ftruncate64(int fildes, off64_t length);
int         truncate64(const char *path, off64_t length);

Version Test Macros:
_LFS_LARGEFILE   is defined to be 1 if the implementation
                 supports the interfaces as specified in
                 2.2.1 Changes to System Interfaces
                 except that implementations need not provide
                 the asynchronous I/O interfaces: aio_read(),
                 aio_write(), and lio_listio().
_LFS_ASYNCHRONOUS_IO
                 is defined to be 1 if the implementation
                 supports the asynchronous IO interfaces:
                 aio_read(), aio_write(), and lio_listio() as
                 specified in 2.2.1 Changes to
                 System Interfaces.
_LFS64_ASYNCHRONOUS_IO
                 is defined to be 1 if the implementation
                 supports all the transitional extensions
                 listed in 3.1.1.1.1 Asynchronous I/O Interfaces
                 and 3.1.2.2 <aio.h>.
_LFS64_LARGEFILE is defined to be 1 if the implementation
                 supports all the transitional extensions
                 listed in 3.1.1.1.3 Other Interfaces,
                 3.1.1.2 fcntl(), 3.1.1.3 open() and
                 3.1.2 Transitional Extensions to Headers,
                 except changes specified in 3.1.2.2 <aio.h>
                 and 3.1.2.6 <stdio.h> need not be supported.
_LFS64_STDIO     is defined to be 1 if the implementation
                 supports all the transitional extensions
                 listed in 3.1.1.1.2 STDIO Interfaces
                 and 3.1.2.6 <stdio.h>.

                 If _LFS64_STDIO is not defined to be 1 and the
                 underlying file description associated with
                 stream has O_LARGEFILE set then the behavior
                 of the Standard I/O functions is unspecified.
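
A sketch of testing these macros at compile time. The structure and function
names are hypothetical; the macro names are those listed above. The sketch
assumes that each macro, when defined, expands to an integer constant.

```c
#define _LARGEFILE64_SOURCE 1   /* request the transitional API, see 3.3.2 */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

struct lfs_support {
    int largefile;      /* _LFS_LARGEFILE */
    int aio;            /* _LFS_ASYNCHRONOUS_IO */
    int largefile64;    /* _LFS64_LARGEFILE */
    int aio64;          /* _LFS64_ASYNCHRONOUS_IO */
    int stdio64;        /* _LFS64_STDIO */
};

/* Record which version test macros <unistd.h> defined to be 1. */
void lfs_compile_time_support(struct lfs_support *out)
{
    assert(out != NULL);

    out->largefile = out->aio = 0;
    out->largefile64 = out->aio64 = out->stdio64 = 0;
#if defined(_LFS_LARGEFILE) && _LFS_LARGEFILE == 1
    out->largefile = 1;
#endif
#if defined(_LFS_ASYNCHRONOUS_IO) && _LFS_ASYNCHRONOUS_IO == 1
    out->aio = 1;
#endif
#if defined(_LFS64_LARGEFILE) && _LFS64_LARGEFILE == 1
    out->largefile64 = 1;
#endif
#if defined(_LFS64_ASYNCHRONOUS_IO) && _LFS64_ASYNCHRONOUS_IO == 1
    out->aio64 = 1;
#endif
#if defined(_LFS64_STDIO) && _LFS64_STDIO == 1
    out->stdio64 = 1;
#endif
}
```

In portable code the same #if tests would normally guard the calls to the
transitional interfaces themselves.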

Constants for Functions:
    _CS_LFS_CFLAGS       for confstr().
    _CS_LFS_LDFLAGS      for confstr().
    _CS_LFS_LIBS         for confstr().
    _CS_LFS_LINTFLAGS    for confstr().

    _CS_LFS64_CFLAGS     for confstr().
    _CS_LFS64_LDFLAGS    for confstr().
    _CS_LFS64_LIBS       for confstr().
    _CS_LFS64_LINTFLAGS  for confstr().
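
A sketch of retrieving one of these strings from within a program, assuming
the implementation defines _CS_LFS_CFLAGS in <unistd.h>. The wrapper name
lfs_cflags() and its return convention are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Copy the LFS_CFLAGS configuration string into buf.
 * Returns 1 if a value (possibly empty) was stored, 0 if no value is
 * defined, and -1 on error or if buf is too small. */
int lfs_cflags(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    errno = 0;
    size_t need = confstr(_CS_LFS_CFLAGS, buf, len);
    if (need == 0) {
        buf[0] = '\0';
        return errno == 0 ? 0 : -1;
    }
    return need <= len ? 1 : -1;
}
```

On implementations where off_t is already large by default, the stored
string may well be empty.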

3.2 Transitional Extensions to the mount Utility

3.2.1 Optional Additional Option for the mount utility

If the -o nolargefiles option is specified and is supported by the file
system, then for the duration of the mount it is guaranteed that all regular
files in the file system have a file size that will fit in the smallest
object of type off_t supported by the system performing the mount. The mount
will fail if there are any files in the file system not meeting this
criterion.

If -o largefiles is specified then there is no such guarantee.

The default behavior is implementation-dependent.

3.3 Accessing the Extensions to the SUS

3.3.1 Compilation Environment - Visibility of Additions to the API

Applications which define the macro _LARGEFILE_SOURCE to be 1 before
inclusion of any header will enable at least the functionality described in
2.0 Changes to the Single UNIX Specification on implementations that support
these features. Implementations that support these features will define
_LFS_LARGEFILE to be 1 in <unistd.h>, as described in 3.1.2.12 <unistd.h>.

3.3.2 Compilation Environment - Visibility of Transitional API

Applications which define the macro _LARGEFILE64_SOURCE to be 1 before
inclusion of any header will enable at least the fseeko(), ftello()
extensions to the SUS (see 2.2.1.13 fseeko(), 2.2.1.17 ftello() and 2.2.2.2
<stdio.h>) and the transitional extensions described in 3.1 Transitional
Extensions to CAE Specification System Interfaces and Headers, Issue 4,
Version 2 on implementations that support these features. Implementations
that support these features will define _LFS64_LARGEFILE,
_LFS64_ASYNCHRONOUS_IO and _LFS64_STDIO to be 1 in <unistd.h>, as described
in 3.1.2.12 <unistd.h>.

3.3.3 Mixed API and Compile Environments Within a Single Process

It is permitted to use both the Single UNIX Specification and the
transitional APIs within the same executable, including within the same
source file, and to use both on the same file descriptor whether in the same
process or in different processes (when an open file descriptor is passed or
inherited).
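
A sketch of mixing the two APIs on one file descriptor, assuming an
implementation that provides the transitional extensions. The function name
mixed_seek_demo() is hypothetical. The offset is set through lseek64() and
read back through lseek().

```c
#define _LARGEFILE64_SOURCE 1   /* must precede every #include */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if lseek() reports the offset set by lseek64() on the same
 * descriptor, 0 if the two disagree, -1 on error. */
int mixed_seek_demo(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    int fd = fileno(fp);
    assert(fd >= 0);

    int result = -1;
    if (lseek64(fd, (off64_t)4096, SEEK_SET) != (off64_t)-1) {
        off_t via_sus = lseek(fd, 0, SEEK_CUR);
        if (via_sus != (off_t)-1)
            result = (via_sus == (off_t)4096);
    }
    fclose(fp);
    return result;
}
```

The offset used here is small enough to fit in either type; with a larger
offset, the lseek() call would be expected to fail with EOVERFLOW wherever
off_t is narrower than off64_t.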

3.3.4 Utilities: Optional Method for Specifying the Size of an off_t

For programs to take advantage of different environments, it is necessary to
compile them for each particular environment. For programs to make use of
the features described in this section they must be compiled with new
compiler and linker options. The getconf utility called with the new
arguments can be used to generate compiler and linker options.

Example 1:

An example of compiling a program with a "large" off_t and that uses
fseeko() and ftello() and uses yacc:

   c89 -D_LARGEFILE_SOURCE     -o foo      \
        $(getconf LFS_CFLAGS)  y.tab.c b.o \
        $(getconf LFS_LDFLAGS)             \
        -ly $(getconf LFS_LIBS)

Example 2:

An example of compiling a program with a "large" off_t and that does not use
fseeko() and ftello() and has no application specific libraries:

   c89  $(getconf LFS_CFLAGS)  a.c         \
        $(getconf LFS_LDFLAGS)             \
        $(getconf LFS_LIBS)

Example 3:

An example of compiling a program with a "default" off_t and that uses
fseeko() and ftello():

   c89 -D_LARGEFILE_SOURCE     a.c

Example 4:

An example of compiling a program using transitional versions of SUS
interfaces such as lseek64() and fopen64():

   c89  -D_LARGEFILE64_SOURCE              \
        $(getconf LFS64_CFLAGS)  a.c       \
        $(getconf LFS64_LDFLAGS)           \
        $(getconf LFS64_LIBS)

Example 5:

An example of running lint on a program with a "large" off_t:

   lint -D_LARGEFILE_SOURCE                \
        $(getconf LFS_LINTFLAGS) ...       \
        $(getconf LFS_LIBS)

Example 6:

An example of running lint on a program using the transitional API:

   lint -D_LARGEFILE64_SOURCE              \
        $(getconf LFS64_LINTFLAGS) ...     \
        $(getconf LFS64_LIBS)

These examples show the need for the additional variables LFS_CFLAGS,
LFS_LDFLAGS, LFS_LIBS, LFS_LINTFLAGS, LFS64_CFLAGS, LFS64_LDFLAGS,
LFS64_LIBS and LFS64_LINTFLAGS to be reported by getconf.

Implementations may permit the linking of object files that are compiled
with differing off_t environments. For example, an object module compiled
with a 32-bit off_t can be linked with an object module compiled with a
64-bit off_t. In such a case, both 32-bit off_t and 64-bit off_t API calls
may be used on the same file descriptor. Implementations may instead
disallow this linking.

Appendix A: Rationale and Notes

In a mixed environment the size of an off_t (and other types) might differ
from program to program, and in a transitional environment (see 3.0
Transitional Extensions to the Single UNIX Specification) it might differ
even from routine to routine within a single program. Each specific use of
an off_t has an invariant size that is determined by the compilation
environment. This is referred to below as the size which is "in use".

A.1 Overview

A.1.1 Guiding Principles

A.1.1.1 "No Lies" Rule

An error will be returned whenever a function cannot return the correct
result of an operation.

Returning a "lie" to allow for common uses of a function (e.g. use of stat()
to determine if a file exists) could inadvertently cause a correctly written
application to operate incorrectly.

It is conceivable that returning a "lie" could keep an incorrectly written
application from malfunctioning in a way that creates a serious problem, but
no such applications are known to exist. (Of course it would be easy to
contrive one.)

PASC Interpretation reference 1003.1-90 #38 completed by the POSIX.1
interpretations committee confirms that POSIX.1 conforming implementations
are not allowed to lie to applications. This interpretation explicitly
states that if the file size will not fit in an object of type off_t,
fstat() must fail. In addition, PASC Interpretation reference 1003.1-90 #75
went on to clarify that EOVERFLOW would be a legal extension to report this
condition.

A.1.1.2 "Open Protection" Rule

An open() will fail if the size of the (regular) file cannot be represented
correctly in an object of type off_t.

The size of the file on which a program is able to operate is determined by
the off_t in use for the open(). The open protection rule ensures that old
binaries do not operate on files that are too large to handle correctly, and
prevents the binaries from generating incorrect results or corrupting the
data in the file.

An argument against open protection is that requiring opens to fail will
break some binaries that would have worked perfectly well otherwise. For
example, a cat program does a loop of open(), read()/write() pairs, and
close() for each input file. This program would unnecessarily break due to
open protection. But this "Let it Run" argument is flawed in that there is
no known utility which fails due to open protection but would work
"perfectly well" if only we "let it run". Real versions of the cat program
use fstat() to determine whether the input and output files are the same,
have a -n option (count newlines) which will fail on sufficiently large
files and so on.

Another argument against open protection is that it is unnecessary because
an error will be returned as soon as a function cannot return the correct
result of an operation ("No Lies" rule). However, most programs check for
the success of the open() call, but many do not check for overflow or error
after lseek() and other calls. An audit of the standard utilities uncovered
numerous examples.

An argument for open protection is that it increases the likelihood of an
immediate and informative error message. The error message is likely to
include the name of the file that could not be opened. It is much less
likely that an lseek() error message will be as immediate or as informative.
The delay in, or complete lack of, reporting such errors may result in
"silent failure".

Another argument for open protection is that there are numerous plausible
scenarios in which this rule avoids serious harm. It prevents typical
implementations of the touch utility from truncating large files to 0 length
(see A.2.1.1.4 creat()). It can prevent silent failure, which has been
demonstrated to occur in at least one commercial data management system.
With open protection a commercial backup/restore system will report errors
on files that might otherwise result in a corrupted backup tape. It prevents
typical implementations of dbm/ndbm from returning incorrect results from a
database whose size exceeds the off_t in use for the dbm routines.

A.1.1.3 "Read/Write Limit" Rule

For regular files, no data transfer will occur past the offset maximum
established in the open file description.

There are two separate issues for this rule, which are that there is an
application-dependent limit on read() and write(), and that the limit is
"the offset maximum established in the open file description". The second
issue is deferred to A.1.2.1 Offset Maximum. The first issue, that there be
an application-dependent limit, is considered here.

There are two assertions upon which many applications rely:

  1. A file can be read until end-of-file and written until the file system
     is full or some other implementation limit is reached.
  2. The current file offset can be stored correctly in an object of type
     off_t, and any file position that can be reached with read() and
     write() can also be reached with lseek().

In a mixed off_t environment these assertions are true only for the largest
supported size of off_t. An audit of typical applications revealed that most
check return codes from read() and write() in order to guard against
end-of-file, full file systems, and the like, but that most do not check for
overflow of file offsets or errors returned by lseek(). This suggests that
it is more important to maintain the truth of the second assertion. In order
to maintain the second assertion, read() and write() must not be permitted
to move the file offset past the largest offset representable by the
application's off_t.

The write limit avoids the unintuitive situation in which a program could
create a file too large for it to open (due to open protection). This could
result in a serious problem. "Can you imagine the reaction of someone who
has 1.9G of data, and all of a sudden, the DBMS can no longer open the file?
I wouldn't want to be working in tech support that day."

An argument for the write limit is that it keeps a program from creating a
file too large for it to handle properly. An argument for the read limit is
that it is a simple way to cover the hole where a file grows after it is
opened.

An argument for the read/write limit rule is that generating an error at
this limit provides the earliest possible warning of an incompatibility
problem that could result in lost or corrupted data if the application were
to continue.

An argument against the read/write limit rule is that it results in
unnecessary breakage of binaries that would have worked perfectly well
otherwise. This is the "Let it Run" argument, but as noted earlier few if
any such programs exist.

Another argument against the read/write limit rule is that implementing it
is expensive and complex. But it has already been implemented and found not
to be either expensive or complex (an analysis appears in A.1.2.1 Offset
Maximum).

Another argument against the read/write limit rule is that it can result in
a truncated log file record (hence corrupting the log file). But this
truncation and corruption can also occur due to insufficient disk space or
RLIMIT_FSIZE, and indeed the standards require that this occur.

Another argument against the read/write limit rule is that instead one can
use the existing file size resource limit (RLIMIT_FSIZE). But this is not a
useful defense in a mixed off_t environment because it unnecessarily
restricts the size of files created by programs which support a larger
off_t. The practical effect will be that use of RLIMIT_FSIZE in this way
will inconvenience users and they will unlimit themselves and then there
will be no write limit. So this is a false, although attractive, argument.

Another argument against the read/write limit rule is that instead there can
be a mount option which limits the maximum size of a file created in the
file system. But regardless of other merits for such an option, it does not
provide a useful defense in a mixed off_t environment because it
unnecessarily restricts the size of files created by programs which support
a larger off_t. The practical effect will be that the system administrator
will be pressured into remounting the file system with no limit and then
there will be no write limit. So this is another false, although attractive,
argument.

A.1.1.4 Holes in the Protection Mechanism

The following holes in the protection mechanism are discussed in other
sections of this document:

   * While a "small" application has a file open another "large" application
     can extend the file (see A.1.2.1 Offset Maximum).
   * The fcntl() function may inadvertently clear O_LARGEFILE (see A.3.1.1.1
     fcntl()).
   * The lseek() failure may result in corruption of log file or database
     (see A.2.1.1.6 fgetpos(), fseek(), ftell(), lseek()).
   * An open file description with a "large" offset maximum may be inherited
     by a "small" application (see A.1.2.2 Inheritance).

A.1.2 Concepts

A.1.2.1 Offset Maximum

The offset maximum is used to implement the read/write limit (see A.1.1.3
"Read/Write Limit" Rule). It is basically a hack to avoid the need to
provide transitional versions of read()/write() and the numerous routines
which call them (getchar(), putchar(), printf(), etc.). For consistency it
also affects the semantics of ftruncate() and mmap().

The offset maximum is an unusual part of this specification as it is
associated with the file description whereas in all other cases the limit is
determined by the size of the type that is used for the call. But
determining the latter for read/write would be extremely difficult in an
environment in which a single process contains calls with differing sizes of
off_t in use (this environment is not part of this section of the
specification, but it is part of the transitional specification). In such an
environment it would be necessary to determine the size of off_t for every
function that might result in a read() or write(). That would include
putchar(), fwrite(), fputs(), fprintf(), puts(), etc. The number of the
routines that might potentially do a read() or write() is too large for such
an implementation to be practical.

It is possible that while a "small" application has a file open another
application with a larger off_t can extend the file beyond the size of the
small application's off_t. This leads to a situation where the small
application has a file descriptor which refers to a file too large for it to
be able to process correctly. That is, open protection has been lost. The
application will still have some protection due to "No Lies" and the
"Read/Write Limit", but these are less effective protections. It is believed
that this case is sufficiently unlikely that it may be safely ignored.

As an added protection, it has been suggested that all file calls should
fail whenever the size of the file cannot be represented correctly in an
object of type off_t. This would defend against the file growth scenario
described above. But checking file size on each read/write might hurt
performance in some cases and also it was not considered an important
defense. It would also have the putchar(), fwrite(), etc. implementation
problem.

It has been suggested that a file should not be permitted to be extended
beyond the size of the smallest offset maximum in any open file description
that refers to the file. It is believed that this is an unnecessary
complication, cannot be enforced for some distributed file systems and
applies only to a situation that it is believed may be safely ignored.

The value of the offset maximum in an open file description will not affect
the semantics of operations related to other open file descriptions or of
operations which create new open file descriptions, including other open
file descriptions which refer to the same file.

An argument against offset maximum is that it is expensive and complex. But
that is not the case. The only implementation that will matter for years is
for 64-bit off_t which

   * can be implemented as an open file flag (O_LARGEFILE -- see 3.1.2.4
     <fcntl.h>).
   * will require about 5 lines in headers (e.g. <sys/fcntl.h>).
   * will require about 0 lines to set it during a 64-bit open().
   * will require about 5 lines of code to check and enforce it in each of
     the kernel implementations of read() and write().
   * will require about 2 lines of code to display it in each of the
     programs which display file flags (e.g. pstat utility).

Documentation would add a dozen or so lines of text, but this part of the
specification does not require such documentation.

A.1.2.1.1 Offset Maximum and the 2G-1 File Size Limit

On implementations where type off_t is a 32-bit two's complement integer,
the maximum value that can be correctly represented in an object of type
off_t is 2^31-1 (2G-1). Because of this, the maximum file size and maximum
file offset of a small file are 2G-1, but the maximum offset of any byte
contained in a small file is 2G-2. An illustration of the offsets (0, 1,
...) of a file, with the bytes (b, B and L) shown as small boxes and the
offset shown as "^" is:

        <- "small" -> | <- "large" ->
    ----------   -----------------------
    | b | b | ::: | b | B | L | L | L | :::
    ^---^---^-   -^---^---^---^---^---^-
    0   1   2     2G  2G  2G
                  -2  -1

Although an lseek() can be done to the 2G-1 offset, a read() or write()
cannot be performed at that position because when B (counting number 2G, but
offset 2G-1) is read or written, the resulting pointer to the next offset
address and the file size itself would overflow.
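
As an illustration only, the arithmetic above can be sketched in C. The names
transferable_bytes() and SMALL_OFFSET_MAX are hypothetical and are not part
of this specification; the sketch assumes a 32-bit off_t environment whose
offset maximum is 2G-1 and uses 64-bit arithmetic so that the comparison
itself cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset maximum for a 32-bit two's complement off_t: 2^31 - 1 (2G-1). */
#define SMALL_OFFSET_MAX INT64_C(2147483647)

/*
 * Hypothetical helper: number of bytes that a read or write of nbyte
 * bytes starting at offset may transfer without moving the file offset
 * past offset_max. Returns 0 when nothing can be transferred.
 */
uint64_t transferable_bytes(int64_t offset, uint64_t nbyte,
                            int64_t offset_max)
{
    uint64_t room;

    if (offset < 0 || offset >= offset_max)
        return 0;
    room = (uint64_t)(offset_max - offset);
    return nbyte < room ? nbyte : room;
}
```

Under these assumptions a 10-byte request at offset 2G-2 transfers 1 byte
(leaving the file offset at 2G-1), and any request at offset 2G-1 transfers
nothing.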

A.1.2.2 Inheritance

The offset maximum will be inherited via fork(), the exec family of
functions, dup(), and fcntl() called with F_DUPFD, and its value will not be
altered by them. The value of the offset maximum will not affect any
semantics related to inheritance.

An application can inherit, via the exec family of functions, a file
descriptor that is associated with a file whose size exceeds the largest
value that can be represented correctly by the off_t that is in use by the
application. An example is if a shell that was compiled with a 64-bit off_t
does input or output redirection of a 10 gigabyte file and then executes a
program which was compiled with a 32-bit off_t. In such a case the large
file unaware application will function until attempting an operation from
which the results cannot be correctly returned.

Most inherited files are due to shell redirection; the other cases are rare
and typically under the complete control of a single application provider.
The cases that are of primary concern are:

     old_binary < large_file

and

     old_binary > large_file

In these cases a pre-existing application binary, old_binary, is given a
file descriptor to a file that it would not have been able to open for
itself and would be able to read and write past the limit that would have
been established by the open(). The concern is that the application will do
something destructive or generate incorrect results since it is not
expecting a file to be so large.

In comparison, consider the following cases:

     a.out | old_binary

and

     old_binary | a.out

There is no limit to the amount of data that may be passed through a pipe.
In the first case the application named a.out may push more data through the
pipe than can be contained in a small file. In the second case a.out may be
willing to read more data than can be contained in a small file. If a
pre-existing application binary has problems with inherited file descriptors
that refer to large files then it is likely to have a pre-existing problem
when using a pipe for large amounts of data. While it is true that the two
sets of cases are not completely equivalent, the above examples show that
pre-existing binaries have had the potential to see data streams larger than
the amount of data that can be contained in a small file.

Another reason it is believed that the inheritance of file descriptors does
not cause problems is that the majority of existing applications do not
perform seek operations on standard input or standard output.

A.1.2.3 Non-Requirements

Open protection and the read/write limit apply only to regular files, and
are not specified to apply to block or character special files such as raw
disk partitions.

A.1.2.4 Non-Changes

The following are to clarify, not to change, existing practice: Different
files may have different maximum permitted sizes even when they are on the
same system, or are on the same type of file system, or are on the same file
system. The maximum permitted file sizes are independent of the offset
maximum. The maximum permitted file sizes do not have specified minimum or
maximum values. Attempts to grow a file via write(), writev(), or truncate()
may fail even when statvfs() reports that space is available.

A.1.2.5 NFS Quality of Implementation Issue

NFS does not fall within the confines of this specification since there are
no relevant NFS interfaces. However, here are some suggestions for NFS
implementations.

The NFS version 2 protocol is effectively a 32-bit application since it
cannot handle file sizes larger than 2^31-1 bytes. Any attempt by an NFS V2
client to access a large file (read(), write(), stat(), etc.) should be
rejected by the server since the server knows the file is large and knows
the application (NFS V2) is not "large file aware". This test is trivial and
requires no more performance penalty than the tests for any other file
system type.

The NFS version 3 protocol is "large file aware" since it can handle file
sizes up to 2^63-1 bytes. An NFS V3 server would handle all requests without
change, even if the request involves a large file. It is up to the NFS V3
client code to determine if the application accessing a file is "large file
aware" or not. This should be handled in the standard fashion in the OS on
the client side machine using the attributes returned by the NFS operation
or the cached file attributes. While this does not provide perfect
protection or immediate detection of files that have grown beyond 2^31-1
bytes since being opened, it is no more broken than the rest of NFS. (See
below for more discussion of cached file attributes).

This does not address the issue of NFS V3 clients that are not prepared to
handle "large files". If they are carefully written and obey the NFS V3
protocol they should realize that files can be larger than 2^31-1 bytes and
handle this condition appropriately, probably by failing the operation (they
would know this when a stat(), read(), write(), etc. operation returned a
file size larger than 2^31-1). However, there are probably NFS V3 clients
that are not carefully written. We really can't do much about that.

Cached Attributes: with the NFS V3 protocol, clients are not required to
cache the file attributes, and servers are not required to return the file
attributes with each operation. If the file attributes are returned with
each operation, it is easy to determine if the file has grown past the large
file limit. If not, the cached attributes can be consulted.

If the client does not cache attributes, then it will either have to request
the attributes from the server over the wire (adversely affecting
performance) or assume the file has not grown in size since it was opened.
This specification pretty much requires the client code to check the file
size at open.

Because of the stateless nature of NFS, it is difficult to ensure that a
large-file unaware application cannot operate on a file that has grown from
small to large. This is for the same reasons that NFS cannot implement
standard UNIX file semantics. However, it is easy to ensure that a
large-file unaware application does not grow a small file to become large
(since the offset and length of each write are determined at the client, the
client can fail any operation where the offset plus length exceeds the small
file limit). It is also easy to ensure that a large-file unaware application
does not read past the small file limit.

A.2 Changes to the Single UNIX Specification

A.2.1 Changes to CAE Specification System Interfaces and Headers, Issue 4,
Version 2

A.2.1.1 Changes to System Interfaces

A.2.1.1.1 Notes on Functions not Modified by this Proposal

The following functions do not require modification to meet the terms of
this proposal:

aio_error(), aio_cancel(), aio_return() and aio_suspend()
     No large file implications were identified for these functions.
aio_fsync()
     It is possible that an aio_fsync() could try to write out file blocks
     that are beyond the offset maximum, just as fsync() could. There is no
     compelling reason for either to fail. Clearly, the original write
     request had to be within the offset maximum for the file description
     used. The aio_fsync() function will not enforce the offset maximum on
     the blocks which it writes out.
glob() and wordexp()
     The subroutines that expand file name wild cards need to be large file
     capable.

A.2.1.1.2 aio_read()

The aio_read() function enforces the offset maximum rules for consistency
with read() and readv().

A.2.1.1.3 aio_write()

The aio_write() function enforces the offset maximum rules for consistency
with write() and writev().

A.2.1.1.4 creat()

The creat() function will fail if the named file is a regular file and the
size of the file cannot be represented correctly in an object of type off_t
(see 2.2.1.24 open()). This offers protection from the following coding
style:

     if (stat(path, ...) < 0) {
         /* assume file does not exist, so create it */
         if ((fd = creat(path, ...)) < 0) {
            /* print out error text */
         }
     }

In this example the stat() function is being used to determine the existence
of a file. But if the file size cannot be represented correctly in an object
of type off_t then stat() will fail (see 2.2.1.14 fstat(), lstat() and
stat()) and if creat() did not then fail it would have the unintended effect
of truncating the file to 0 length. Many applications and standard utilities
have code similar to this example, including typical implementations of the
touch utility.
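
An application can also protect itself by treating only ENOENT as evidence
that the file is missing. The following is a minimal sketch under that
assumption; path_status() is a hypothetical helper, not an interface
defined by this specification.

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical helper.
 * Returns  1 if stat() succeeds (the file exists),
 *          0 if stat() fails with ENOENT (no such file),
 *         -1 if stat() fails for any other reason, for example
 *            EOVERFLOW on a file too large for this off_t.
 * Only a return of 0 makes it safe to assume the file is absent.
 */
int path_status(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == 0)
        return 1;
    return (errno == ENOENT) ? 0 : -1;
}
```

A caller would create the file only when path_status() returns 0, and would
report an error rather than call creat() when it returns -1.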

A.2.1.1.5 fcntl() and lockf()

Unlock requests are sometimes "rounded to infinity" so that a process can
create a whole-file lock and then successfully issue a request to clip off
the beginning of the lock without leaving behind an unrepresentable lock.
This is to avoid breaking any existing 32-bit applications which might
happen to do this.

Several existing implementations of fcntl() permit locking the byte whose
offset is the maximum value that can be represented correctly in an object of
type off_t, even though write() cannot write to that offset. This
specification permits that behavior.

The fcntl() function will fail if the cmd argument is F_GETLK and the first
lock which blocks the lock description has a starting offset or length which
cannot be represented correctly in an object of type off_t. Information
about such a lock cannot be correctly returned.

Discussion of the semantics of fcntl() locks that cross the off_t boundary
resulted in six competing proposals:

  1. An unlock request fails if it would create an unrepresentable lock.
  2. If any lock request includes the byte whose offset is the maximum value
     that fits in an off_t, then the request is equivalent to a request
     where l_len is 0 and l_start refers to the first byte of the affected
     area.
  3. (proposal was dropped)
  4. If l_len is 0 then the lock is through and including the maximum value
     of off_t (and not beyond).
  5. Just no lies.
  6. If an unlock request includes the byte whose offset is the maximum
     value that fits in an off_t, and there is an existing lock with l_len
     equal to 0 which also includes that byte, then the request is
     equivalent to a request where l_len is 0 and l_start refers to the
     first byte of the affected area.

An advantage of 2, 4, and 6 is that they do not change existing behavior of
a 32-bit application.

Proposals 1 and 5 can result in a new type of failure in the case where the
program creates a lock with l_len equal to 0 and then clips off the
beginning leaving behind an unrepresentable lock.

Proposal 4 precludes truly "whole file" locking.

Proposal 6 was adopted because it preserves existing 32-bit behavior and
is less disruptive than proposal 2 (which extends lock requests in addition
to unlock requests).

The fcntl() and lockf() functions will fail if the offset of the first byte
in the region or, if l_len (size) is non-zero, the offset of the last byte
in the region exceeds the largest possible value in an object of type
off_t. Otherwise the process could create a lock which would be "beyond" the
ability of the program to represent.
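
The check described in the preceding paragraph can be sketched as follows.
This is an illustration only: lock_region_representable() is a hypothetical
name, the region is assumed to have a non-negative start and length, and
64-bit arithmetic stands in for whatever wider type an implementation uses
internally.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if a lock region starting at start
 * with length len (0 meaning "to end of file") can be described using
 * offsets no larger than off_max, and 0 otherwise.
 */
int lock_region_representable(int64_t start, int64_t len, int64_t off_max)
{
    if (start < 0 || len < 0 || start > off_max)
        return 0;
    if (len == 0)
        return 1;   /* open-ended lock: only the start must fit */
    /* Last byte is start + len - 1; compare without overflowing. */
    return (len - 1) <= (off_max - start);
}
```

With off_max equal to 2G-1, a one-byte lock at offset 2G-1 is accepted,
which matches the existing fcntl() behavior permitted above, while a
two-byte lock at the same offset is rejected.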

A.2.1.1.6 fgetpos(), fseek(), ftell(), lseek()

These functions will fail if the resulting file offset would exceed the
largest value that can be represented correctly in the related type which is
in use for the call, and will set errno to EOVERFLOW (permitted by PASC
Interpretation 1003.1-90 #75).

Programs typically, but incorrectly, fail to check the return value of these
functions, which renders the error return less useful. On the other hand,
returning an incorrect offset can result in serious malfunction as well.

An lseek() to the end of a file using

     lseek(fd, 0, SEEK_END);

is quite common. It is unfortunate that such calls fail on a too-large file
since the return value is usually ignored. One alternative considered was
for lseek() to move the file offset for all valid requests and then return
an error if the resulting offset is too large. That is, the call would
succeed for applications that do not check the return code, but also fail
for applications that do check. This option was deemed too bizarre to adopt.
For example, it might be difficult to implement using a remote procedure
call system that was constructed to return either results or an error, but
not both. In addition, the POSIX 1003.1 standard requires the file offset to
remain unchanged if an error is returned by lseek(). It was felt that the
open protection (see A.1.1.2 "Open Protection" Rule) and the read/write
limit (see A.1.1.3 "Read/Write Limit" Rule) are more effective defenses
against this problem.

Another potentially serious consequence of ignoring the return value of
lseek() is that programs which extend data files by attempting to seek
beyond the end-of-file and then writing may instead overwrite existing data.

For example, typical implementations of the dbm and ndbm libraries contain
code such as:

     (void) lseek(db->dbm_pagf, blkno*PBLKSIZ, L_SET);
     if (write(db->dbm_pagf, pagebuf, PBLKSIZ) != PBLKSIZ)
                ... error handling ...

The problem is that the return code of lseek() is not checked and so if
"blkno*PBLKSIZ" overflows the lseek() will fail (or will seek to an
unintended offset) and the data will be written to an unintended offset.
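
A defensive version of this pattern might look like the sketch below. It is
not taken from any dbm implementation: block_offset(), write_block() and
OFF_T_MAX are hypothetical names, and OFF_T_MAX assumes that off_t is a
two's complement type without padding bits.

```c
#include <limits.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed largest off_t value (two's complement, no padding bits). */
#define OFF_T_MAX \
    ((int64_t)((UINT64_C(1) << (sizeof(off_t) * CHAR_BIT - 1)) - 1))

/*
 * Hypothetical helper: stores blkno * blksize in *out, or returns -1
 * if the product would exceed off_max.
 */
int block_offset(int64_t blkno, int64_t blksize, int64_t off_max,
                 int64_t *out)
{
    if (blkno < 0 || blksize <= 0 || blkno > off_max / blksize)
        return -1;
    *out = blkno * blksize;
    return 0;
}

/* Seeks to block blkno and writes it, checking every return value. */
int write_block(int fd, int64_t blkno, const void *buf, size_t blksize)
{
    int64_t off;

    if (block_offset(blkno, (int64_t)blksize, OFF_T_MAX, &off) != 0)
        return -1;
    if (lseek(fd, (off_t)off, SEEK_SET) == (off_t)-1)
        return -1;      /* do not write at an unintended offset */
    if (write(fd, buf, blksize) != (ssize_t)blksize)
        return -1;
    return 0;
}
```

The multiplication is checked before it is performed, and the write is
skipped whenever the seek fails, so data is never written to an unintended
offset.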

A.2.1.1.7 fpathconf() and pathconf()

The reference "See Note 3,4" refers to notes in the X/Open specification for
fpathconf() and pathconf(). These notes indicate that this option
(_PC_FILESIZEBITS) is valid only for a directory, and the results are for
files that exist or may be created in that directory.

The _PC_FILESIZEBITS option makes it possible for a process to determine how
large a file can be created in a given directory. It takes into account
implementation limitations in the file system (e.g. due to the size of file
size and block count variables), and it takes into account long term policy
limitations (e.g. due to the mount utility's -o nolargefiles option). It
does not take into account dynamic restrictions such as the RLIMIT_FSIZE
resource limit or the number of available file blocks, so the process must
perform appropriate checks.

When the current directory is on a typical large file capable file system
and is mounted with the -o nolargefiles option,

     pathconf(".", _PC_FILESIZEBITS);

will return 32. In general, if the maximum size file that could ever exist
on the mounted file system is maxsize then the returned value is 2 plus the
floor of the base 2 logarithm of maxsize.
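
The formula can be worked through in a few lines of C. The function name
filesizebits_for() is hypothetical and the sketch assumes maxsize is greater
than zero; it illustrates the arithmetic only and is not how any particular
implementation computes the value.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 2 + floor(log2(maxsize)) for
 * maxsize > 0, the value described above for _PC_FILESIZEBITS.
 */
int filesizebits_for(uint64_t maxsize)
{
    int log2_floor = -1;

    while (maxsize != 0) {
        maxsize >>= 1;
        log2_floor++;
    }
    return log2_floor + 2;
}
```

For maxsize equal to 2^31-1 the logarithm's floor is 30, giving 32; for
2^63-1 it is 62, giving 64.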

A.2.1.1.8 fseeko() and ftello()

These functions are needed because fseek() and ftell() are limited by the
long offset type required by ISO C. The fsetpos() and fgetpos() functions,
although they do use an opaque offset type, are not complete replacements
for fseek() and ftell() because they do not allow relative seeks or
arithmetic on fpos_t values.
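
As a brief illustration of what fseeko() and ftello() make possible, the
following sketch determines the size of an open stream and then restores the
original position. The helper name stream_size() is hypothetical, and the
feature test macro is an assumption about the compilation environment.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns the size of the file behind fp, or
 * (off_t)-1 on failure. The stream position is restored on success.
 */
off_t stream_size(FILE *fp)
{
    off_t saved, end;

    saved = ftello(fp);
    if (saved == (off_t)-1)
        return (off_t)-1;
    if (fseeko(fp, 0, SEEK_END) != 0)
        return (off_t)-1;
    end = ftello(fp);
    if (fseeko(fp, saved, SEEK_SET) != 0)
        return (off_t)-1;
    return end;
}
```

Because the offsets are of type off_t rather than long, the same source
works for large files when compiled in a large off_t environment.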

A.2.1.1.9 fsetpos()

Since fsetpos() sets an absolute file position, which is always legal
regardless of the implementation-supported sizes of off_t, there are no new
error returns or other new semantics.

A.2.1.1.10 fstatvfs() and statvfs()

These functions will fail if the total, or free, or available number of
blocks or files cannot be represented correctly in the structure to be
returned (f_blocks, f_bfree, f_bavail, f_files, f_ffree, f_favail).

A.2.1.1.11 ftruncate(), truncate(), unlink()

These functions are used only on pre-existing files and so do not have the
potential programming hazard that creat() has (see A.2.1.1.4 creat()).

When ftruncate() is used to increase the size of a file, the semantics are
similar to a write() of zeroes to the file. For consistency with write(),
the ftruncate() function will fail when the request is beyond the offset
maximum (even if the effect of the request would be to shorten the file).

A.2.1.1.12 ftw() and nftw()

The ftw() and nftw() functions may fail if a stat() in the underlying
implementation fails with EOVERFLOW. This is unfortunate because "small"
binaries using these functions cannot reasonably be used on file trees
containing "large" files. Some systems have a non-standard extension to
nftw() which permits it to continue when stat() fails (typical failures also
include ESTALE and ELOOP).

A.2.1.1.13 getrlimit() and setrlimit()

These functions map limits that they cannot represent correctly to and from
RLIM_SAVED_MAX and RLIM_SAVED_CUR. These values do not require any special
handling by programs. They may be thought of as tokens that the kernel hands
out to programs that can't handle the real answer, and that remind the
kernel, when the tokens come back from the user, of what value is really
meant.

If setrlimit() fails for any reason (for example, EPERM), the resource
limits and saved resource limits remain unchanged.

This proposal does not specify any particular value for RLIM_INFINITY,
RLIM_SAVED_MAX or RLIM_SAVED_CUR. Typical current implementations use the
value 0x7FFFFFFF for RLIM_INFINITY, and it is recommended that
RLIM_SAVED_MAX and RLIM_SAVED_CUR have similar large values.

Few, if any, programs will need to refer explicitly to RLIM_SAVED_MAX or
RLIM_SAVED_CUR. Those that do should not use them in C-language switch cases
since they may have the same value in some implementations (see 2.2.2.3
<sys/resource.h>).
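
For the rare program that does need to distinguish these values, an if/else
chain avoids the duplicate case label problem. This is a sketch only; the
enum and the function classify_limit() are hypothetical names.

```c
#include <sys/resource.h>

enum limit_kind {
    LIMIT_NONE,        /* RLIM_INFINITY: no limit */
    LIMIT_SAVED_MAX,   /* unrepresentable saved hard limit */
    LIMIT_SAVED_CUR,   /* unrepresentable saved soft limit */
    LIMIT_NUMERIC      /* an ordinary representable value */
};

/*
 * Hypothetical helper. The comparisons are ordered so that the code
 * still compiles and gives a defined answer on implementations where
 * two or more of the special values are equal.
 */
enum limit_kind classify_limit(rlim_t value)
{
    if (value == RLIM_INFINITY)
        return LIMIT_NONE;
    else if (value == RLIM_SAVED_MAX)
        return LIMIT_SAVED_MAX;
    else if (value == RLIM_SAVED_CUR)
        return LIMIT_SAVED_CUR;
    return LIMIT_NUMERIC;
}
```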

A limit that can be represented correctly in an object of type rlim_t is
either "no limit", which is represented with RLIM_INFINITY, or has a value
not equal to any of RLIM_INFINITY or RLIM_SAVED_MAX or RLIM_SAVED_CUR and
which can be represented correctly in an object of type rlim_t and which
meets any additional implementation-specific criteria for correct
representation.

A rejected alternative proposal was to map limits that could not be
represented to and from RLIM_INFINITY. This would avoid the need for the new
symbols RLIM_SAVED_MAX and RLIM_SAVED_CUR. But such mapping would arguably
be a lie, and the resulting information loss would cause unintuitive program
behavior, especially in programs running with appropriate privileges needed
to raise hard limits.

A rejected alternative proposal was that if getrlimit() could not correctly
return a current limit then it should instead return -1 and set errno to
EOVERFLOW. But that would result in unnecessary breakage of programs. (Note
that this breakage occurs even when no large files are present.) It would
also result in malfunction of programs that assume that they are calling
getrlimit() properly and so failure "cannot happen". For example, in the 4.4
BSD-Lite distribution, there are at least 15 unchecked calls to getrlimit().
When the 4.4 BSD csh limit function is used to report the current limits,
there is no check of the return code and so the reported results can be
entirely incorrect. Also, non-superuser programs typically unlimit
themselves with:

     getrlimit(RLIMIT_STACK, &rl);
     rl.rlim_cur = rl.rlim_max;
     setrlimit(RLIMIT_STACK, &rl);

If getrlimit() fails then garbage is passed to setrlimit() which may
result in an unwanted and extremely restricted limit. Several utilities that
are part of the GNU C compiler have this problem.
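
For comparison, a version of the same sequence that checks both return
values is sketched below. The function name raise_soft_limit_to_hard() is
hypothetical.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: raises the soft limit for resource to the
 * current hard limit. Returns 0 on success and -1 on failure, in
 * which case the limits are left as they were.
 */
int raise_soft_limit_to_hard(int resource)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0)
        return -1;      /* rl is not valid; do not pass it on */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(resource, &rl) != 0)
        return -1;
    return 0;
}
```

A caller would use, for example, raise_soft_limit_to_hard(RLIMIT_STACK) and
report an error if the result is not 0.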

A.2.1.1.14 lio_listio()

The lio_listio() function enforces the offset maximum rules since it is
logically equivalent to aio_read() and aio_write(), which enforce them.

A.2.1.1.15 mmap()

For consistency with read() and write(), the mmap() function will fail when
the request extends beyond the offset maximum.

A.2.1.1.16 open()

The open() function called with O_TRUNC set will fail without truncation if
the named file is a regular file and the size of the file cannot be
represented correctly in an object of type off_t. (See A.2.1.1.4 creat()).

A.2.1.1.17 read(), readv(), write() and writev()

These functions may do a "partial read or write" due to the offset maximum.
That is, the value returned may be less than nbyte if the number of bytes
remaining which may be transferred is less than nbyte.
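
Applications that already loop on short counts handle this case without
change. A minimal sketch of such a loop follows; write_all() is a
hypothetical helper and the retry on EINTR is a common convention rather
than something this specification requires.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes all nbyte bytes from buf, continuing
 * after partial writes. Returns 0 when everything was written and -1
 * on error (errno is set by write()).
 */
int write_all(int fd, const void *buf, size_t nbyte)
{
    const char *p = buf;
    size_t done = 0;

    while (done < nbyte) {
        ssize_t n = write(fd, p + done, nbyte - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any transfer */
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```

When a partial write stops at the offset maximum, the next iteration's
write() fails and the loop reports the error instead of silently dropping
the remaining data.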

A.2.1.1.18 ulimit()

The ulimit() function will return an unspecified result if the result cannot
be represented correctly in an object of type long. As this function is
already obsolescent, the use of getrlimit() and setrlimit() is recommended
for getting and setting process limits.

A.2.2 Changes to CAE Specification Commands and Utilities, Issue 4, Version
2

A.2.2.1 General Porting Suggestions

When porting a program to be large file capable, general areas of concern in
addition to the issues mentioned in A.1.1.4 Holes in the Protection
Mechanism include:

   * command line arguments
   * API conversion
   * type conversion
   * output formatting
   * fixed format media issues
   * other languages

A.2.2.1.1 Command Line Arguments

Numeric arguments which are file size related, such as a file offset or
block count, need to be handled as an appropriately large type. Converting
arguments into an off_t that is larger than a long may need to be
accomplished with non-standard scanf() formats, if available, or with
portable user-written functions that convert ASCII to a large off_t
analogous to the strtol() function.
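
A user-written conversion of the kind described might be sketched as
follows. The name parse_offset() is hypothetical, int64_t stands in for a
large off_t, and only unsigned decimal input is accepted.

```c
#include <stdint.h>

/*
 * Hypothetical helper: converts a string of decimal digits to a
 * 64-bit value. Returns 0 and stores the result in *out on success,
 * or -1 if the string is empty, contains a non-digit, or overflows.
 */
int parse_offset(const char *s, int64_t *out)
{
    int64_t value = 0;

    if (*s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        int digit;

        if (*s < '0' || *s > '9')
            return -1;
        digit = *s - '0';
        /* Reject the digit if value * 10 + digit would overflow. */
        if (value > (INT64_MAX - digit) / 10)
            return -1;
        value = value * 10 + digit;
    }
    *out = value;
    return 0;
}
```

With this sketch an argument such as "5000000000", which does not fit in a
32-bit long, is converted correctly, and an out-of-range argument is
reported as an error rather than silently wrapped.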

A.2.2.1.2 API Conversion

The program should be recompiled in a large off_t environment or,
alternatively, should be converted to use the transitional API. In either
case the source must be scanned for the functions listed in 3.1.1.1 64-bit
Versions of Interfaces and the data types listed in 3.1.2.1 64-bit Versions
of Headers to ensure that all types are properly converted.

A.2.2.1.3 Type Conversion

Whenever a new 64-bit function is used, the argument types and function
result will need to be converted as appropriate. Whenever a variable's type
is converted (whether via the large off_t compilation environment or the
transitional API), all uses of the variable must be checked to determine if
further type conversions are warranted. For example, wherever there is a
struct stat, all uses of st_size must be checked. If the st_size value is
assigned or compared with a variable "v" the variable "v" must be converted
if necessary and all uses of "v" must in turn be checked. This is also true
of type conversions required for command line arguments.

In addition, the program needs to be checked for file size related variables
such as offsets, line numbers, and block counts that must be converted to a
large off_t or related type. These variables typically appear inside loops
that are performing input and/or output.

A.2.2.1.4 Output Formatting

Output of types that have been converted will probably involve using a
different printf() format or using a revised user-written conversion
routine. Since there is a larger range of values which take up more space,
revision of the output layout may be required.
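
One possible approach, shown here only as a sketch, is to convert the value
to the widest integer type and print that. It assumes a C library that
provides intmax_t and the "%jd" conversion, which are later additions to the
C language and are not required by this specification; format_offset() is a
hypothetical name.

```c
#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: formats value as decimal text in buf.
 * Returns the number of characters written (excluding the
 * terminating null), or -1 if buf is too small.
 */
int format_offset(char *buf, size_t size, off_t value)
{
    int n = snprintf(buf, size, "%jd", (intmax_t)value);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

A 64-bit value can need up to 19 digits plus a sign, so column widths sized
for 10-digit values need to be revisited.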

A.2.2.1.5 Fixed Format Media Issues

Current implementations of the tar and cpio utilities are defective in their
support of arbitrarily large files. The pax utility is equally
defective, but is the subject of a proposal in ballot. (See 2.3.3 The pax
Utility for discussion of this topic.)

Vendor and third-party backup software is also unable to support large files
and will require modification in order to do so.

A.2.2.1.6 Other Languages

This specification is for the C language only. Other languages have
different support requirements. For example, the Fortran I/O API has a limit
on the number of records, not bytes.

A.2.2.2 Considerations for Utilities in Support of Files of Arbitrary Size

The utilities listed in 2.3.1 Considerations for Utilities in Support of
Files of Arbitrary Size are utilities which are used to perform
administrative tasks such as to create, move, copy, remove, change the
permissions, or measure the resources of a file. They are useful both as
end-user tools and as utilities invoked by applications during software
installation and operation.

Typical core utilities must be compiled in a "large" off_t compilation
environment or must use the transitional APIs. Using the compilation
environment reduces the number of editing changes required to port a
program, but it does not reduce the effort required to ensure the
correctness of the port.

The chgrp, chmod, chown, ln, and rm utilities probably require use of large
file capable versions of stat(), lstat(), ftw(), and the stat structure.

The cat, cksum, cmp, cp, dd, mv, sum, and touch utilities probably require
use of large file capable versions of creat(), open(), and fopen().

The cat, cksum, cmp, dd, df, du, ls, and sum utilities may require writing
large integer values. For example,

   * The cat utility might have a -n option which counts newlines.
   * The cksum and ls utilities report file sizes.
   * The cmp utility reports the line number at which the first difference
     occurs, and also has a -l option which reports file offsets.
   * The dd, df, du, ls, and sum utilities report block counts.

The dd, find and test utilities may need to interpret command arguments that
contain 64-bit values. For dd the arguments include skip=n, seek=n, and
count=n. For find the arguments include -size n. For test the arguments are
those associated with algebraic comparisons.

The df utility might need to access large file systems with statvfs().

The ulimit utility will need to use large file capable versions of
getrlimit() and setrlimit() and be able to read and write large integer
values.

Conversion between off_t (or other derived types) and ASCII is unspecified,
which is a significant practical deficiency. This is being considered by
other groups. For example, see:
ftp://ftp.dmk.com/DMK/sc22wg14/c9x/extended-integers/

A.2.2.3 Additional Requirements for the sh Utility - Porting Recommendations

Pathname expansion (e.g. expanding */foo.c to a/foo.c b/foo.c c/foo.c) and
pathname completion might in some cases use the stat() function which would
need to be large file capable.

The offset maximum used for shell input and output redirections is
implementation-specific. Some vendors prefer to use the smallest supported
off_t, others prefer the largest.

A.3 Transitional Extensions to the Single UNIX Specification

A.3.1 Transitional Extensions to CAE Specification System Interfaces and
Headers, Issue 4, Version 2

Prior experience with transitional access is reported by SGI, Convex,
(http://www.sas.com/standards/large.file/background) and Programmed Logic
Corporation
(http://www.sas.com/standards/large.file/proposals).

A.3.1.1 Transitional Extensions to System Interfaces

A.3.1.1.1 fcntl()

The O_LARGEFILE flag may be set or cleared with F_SETFL. An incorrectly
written program may inadvertently clear this flag. For example, some
programs put a file into append mode with:

      fcntl(fd, F_SETFL, O_APPEND);

This is incorrect because it turns off all the other open flags, including
O_LARGEFILE. Instead, to turn on append mode one should first use F_GETFL to
get the current flags:

     int oflag = fcntl(fd, F_GETFL, 0);

then include O_APPEND in the flags:

     oflag |= O_APPEND;

and then set the new flags:

     fcntl(fd, F_SETFL, oflag);

A more complete example would also check for fcntl() failures.
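
A minimal sketch of such an example follows. The helper name set_append()
is hypothetical; the fcntl() calls are the ones shown above.

```c
#include <fcntl.h>

/* Hypothetical helper: turn on append mode without disturbing the
 * other file status flags (including O_LARGEFILE, where it exists).
 * Returns 0 on success, -1 on failure with errno set by fcntl(). */
int set_append(int fd)
{
    int oflag = fcntl(fd, F_GETFL, 0);   /* read the current flags */

    if (oflag == -1)
        return -1;
    if (fcntl(fd, F_SETFL, oflag | O_APPEND) == -1)
        return -1;
    return 0;
}
```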

A.3.1.1.2 No fcntl64()

A rejected alternative to extending fcntl() with F_GETLK64 (and so on) would
be to specify fcntl64() with F_GETLK (and so on). The former has prior art
and less functional redundancy, whereas the latter is more consistent with
other transitional functions. This specification does not preclude vendors
from supplying an fcntl64().
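
For illustration, the chosen approach might be used as sketched below. The
helper name region_is_locked() is hypothetical. The sketch assumes that
defining _LARGEFILE64_SOURCE makes struct flock64 visible wherever
F_GETLK64 is defined, and it falls back to the standard interface where
F_GETLK64 is absent.

```c
#define _LARGEFILE64_SOURCE 1   /* must precede every #include */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if another process holds a lock that
 * conflicts with a write lock on len bytes starting at start, 0 if
 * none does, and -1 on error with errno set. */
int region_is_locked(int fd, long long start, long long len)
{
#ifdef F_GETLK64
    struct flock64 fl;              /* l_start and l_len are off64_t */

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    if (fcntl(fd, F_GETLK64, &fl) == -1)
        return -1;
#else
    struct flock fl;                /* l_start and l_len are off_t */

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = (off_t)start;
    fl.l_len = (off_t)len;
    if ((long long)fl.l_start != start || (long long)fl.l_len != len) {
        errno = EOVERFLOW;          /* region not expressible in off_t */
        return -1;
    }
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
#endif
    return fl.l_type != F_UNLCK;
}
```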

A.3.1.2 Transitional Extensions to Headers

A.3.1.2.1 <aio.h>

The aio control block has an embedded offset which is of type off_t. A large
file enabled aio control block needs a 64-bit offset. For consistency with
the other transitional interfaces, a new control block with a 64-bit offset
is defined. The offset is of the type off64_t.

Since a new control block is needed, new interfaces are required for all of
the existing aio interfaces since every one takes a pointer to the control
block as an argument.

A.3.1.2.2 <sys/resource.h>

This proposal does not specify any particular value for RLIM64_INFINITY,
RLIM64_SAVED_MAX or RLIM64_SAVED_CUR. Typical implementations should use the
value 0x7FFFFFFFFFFFFFFF or 0xFFFFFFFFFFFFFFFF for RLIM64_INFINITY, and it is
recommended that RLIM64_SAVED_MAX and RLIM64_SAVED_CUR have similar large
values. Even though all limit values will be represented in 64-bit types for
a few years, specifying them as distinct values now will reduce
compatibility problems in the future when the next transition to a still
larger type occurs.
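
Because the values are unspecified, portable code should compare against
the symbolic names rather than against a particular number. The sketch
below illustrates this with the standard names (RLIM_INFINITY and
RLIM_SAVED_CUR); the same pattern would apply to the RLIM64_ names with
getrlimit64(). The helper name within_fsize_limit() is hypothetical.

```c
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if a file of size bytes is within the
 * soft RLIMIT_FSIZE limit, 0 if it is not, and -1 if the limit cannot
 * be read or cannot be represented in rlim_t. */
int within_fsize_limit(unsigned long long size)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_FSIZE, &rl) != 0)
        return -1;

    /* Test the symbolic values first; never treat them as byte counts. */
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
#ifdef RLIM_SAVED_CUR
    if (rl.rlim_cur == RLIM_SAVED_CUR)
        return -1;                  /* real limit is not representable */
#endif
    return size <= rl.rlim_cur;
}
```

On implementations where RLIM_SAVED_CUR has the same value as
RLIM_INFINITY, the second test is never reached.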

A.3.1.2.3 <sys/types.h>

It is not required that ino64_t be a 64-bit type. However, the NFS version 3
protocol allows for 64-bit file serial numbers. For NFS interoperability
with systems making use of 64-bit file serial numbers, 64-bit ino_t support
is necessary. DCE also may make use of 64-bit file serial numbers.

A.3.2 Accessing the Transitional Extensions to the SUS

A.3.2.1 Compilation Environment - Visibility of Additions to the API

Applications which use the fseeko() and ftello() interfaces should define
_LARGEFILE_SOURCE to be 1, then include <unistd.h> and then test that
_LFS_LARGEFILE is 1 to determine if the additional functionality is indeed
available. This additional functionality may be available even when
_LARGEFILE_SOURCE is not defined, but it will not be available to strictly
conforming X/Open programs.

This macro does not affect the size of off_t (see 3.3.3 Mixed API and
Compile Environments Within a Single Process).
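
The sequence described above (define the macro, include <unistd.h>, test
the version macro) can be sketched as follows. The helper name
stream_position() is hypothetical, and the fallback to ftell() is an
assumption about what an application might do when the functionality is
absent.

```c
#define _LARGEFILE_SOURCE 1     /* must precede every #include */
#include <unistd.h>
#include <stdio.h>

/* Hypothetical helper: returns the current position of stream,
 * or -1 on error. */
long long stream_position(FILE *stream)
{
#if defined(_LFS_LARGEFILE) && _LFS_LARGEFILE == 1
    return (long long)ftello(stream);   /* off_t-based interface */
#else
    return (long long)ftell(stream);    /* long-based fallback */
#endif
}
```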

A.3.2.2 Visibility of Transitional API

Applications which wish to use this transitional functionality should define
_LARGEFILE64_SOURCE to be 1, then include <unistd.h>, and then test that
_LFS64_LARGEFILE, _LFS64_ASYNCHRONOUS_IO and _LFS64_STDIO are set to 1 to
determine if the corresponding transitional functionality is indeed
available. This transitional functionality may be available even when
_LARGEFILE64_SOURCE is not defined, but it will not be available to strictly
conforming X/Open programs.

This macro does not affect the size of off_t (see 3.3.4 Utilities: Optional
Method for Specifying the Size of an off_t).

If _LARGEFILE64_SOURCE is defined then _LARGEFILE_SOURCE is implied so it
need not also be defined (see 3.3.1 Compilation Environment - Visibility of
Additions to the API). Similarly, if _LFS64_LARGEFILE is defined then
_LFS_LARGEFILE will be defined so it need not also be tested.
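
A minimal sketch of this test for the transitional interfaces follows. The
helper name seek_to() is hypothetical. It assumes that lseek64() and
off64_t are declared whenever _LFS64_LARGEFILE is 1, and it falls back to
lseek() with an explicit range check otherwise.

```c
#define _LARGEFILE64_SOURCE 1   /* must precede every #include */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: seek fd to an absolute offset. Returns the
 * resulting offset, or -1 on error with errno set. */
long long seek_to(int fd, long long offset)
{
#if defined(_LFS64_LARGEFILE) && _LFS64_LARGEFILE == 1
    return (long long)lseek64(fd, (off64_t)offset, SEEK_SET);
#else
    off_t narrow = (off_t)offset;

    if ((long long)narrow != offset) {
        errno = EOVERFLOW;          /* offset does not fit in off_t */
        return -1;
    }
    return (long long)lseek(fd, narrow, SEEK_SET);
#endif
}
```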

A.3.2.3 Mixed API and Compile Environments within a Single Process

Mixing objects from differing compile environments can be dangerous, since
some types have different sizes in the differing environments. The types
might be used in a way where the size difference causes problems. A system
may disallow this mixing. To avoid these problems, don't mix such objects in
the same executable, or at least ensure that data shared between files
compiled differently does not use any of the types whose meaning may change.
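
One way to follow the last piece of advice is to describe shared data with
fixed-width types and convert at the boundary. The sketch below is an
illustration under that assumption; struct extent and to_local_offset()
are hypothetical names, and int64_t comes from the C99 <stdint.h> header
rather than from this specification.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>

/* Shared between objects: int64_t has the same size in every compile
 * environment, unlike off_t. */
struct extent {
    int64_t start;
    int64_t length;
};

/* Hypothetical helper: convert a shared offset to this object's own
 * off_t. Returns 0 on success, or -1 with errno set to EOVERFLOW if
 * the value does not fit. */
int to_local_offset(int64_t shared, off_t *local)
{
    /* The narrowing result is implementation-defined when the value is
     * out of range; the comparison below detects that case. */
    off_t narrowed = (off_t)shared;

    if ((int64_t)narrowed != shared) {
        errno = EOVERFLOW;
        return -1;
    }
    *local = narrowed;
    return 0;
}
```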

Mixing the standard and transitional APIs is relatively safe, since data
types have the same meaning in every file. This mixing permits a smoother
and faster migration to a larger off_t environment, because it permits
asynchronous upgrades. For example, it permits libraries to be made large
file aware without requiring large file awareness in all the programs which
use the library or in all the libraries which the library uses. (This is
true both for static and for shared libraries.) This is particularly
beneficial for situations in which the system vendor, one or more
third-party suppliers, and the end user may all be supplying libraries or
other objects that are components of a complete program.

A.3.2.4 Utilities: Optional Method for Specifying the Size of an off_t

The LFS_CFLAGS variable is used to obtain implementation-specific compiler
options, such as flags and preprocessor variable definitions, so that the
compiled program will be using a "large" off_t. Similarly the LFS_LDFLAGS
variable supplies link editor options, the LFS_LIBS variable supplies link
library names, and the LFS_LINTFLAGS variable supplies lint options.
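
These variables are normally read with the getconf utility, but a program
can read them too. The sketch below assumes that the implementation
exposes LFS_CFLAGS through confstr() under the name _CS_LFS_CFLAGS; where
that name is not defined, the hypothetical helper lfs_cflags() simply
reports that no value is available.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc()ed copy of the LFS_CFLAGS
 * value (an empty string if the variable has no value), or NULL if
 * the variable is not supported or memory is exhausted. The caller
 * frees the result. */
char *lfs_cflags(void)
{
#ifdef _CS_LFS_CFLAGS
    size_t need = confstr(_CS_LFS_CFLAGS, NULL, 0);  /* size incl. null */
    char *buf;

    if (need == 0)
        need = 1;                   /* no value: return "" */
    buf = malloc(need);
    if (buf == NULL)
        return NULL;
    buf[0] = '\0';
    confstr(_CS_LFS_CFLAGS, buf, need);
    return buf;
#else
    return NULL;                    /* assumption: name not provided */
#endif
}
```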

If the size of off_t is controlled by a preprocessor macro variable then it
is recommended that the macro be named _FILE_OFFSET_BITS and be supported as
follows:

   * If this symbol is not defined then an implementation-defined default
     size will be used.
   * Otherwise, if this symbol has a decimal value equal to the number of
     bits in one of the implementation-supported sizes of off_t then that
     size of off_t will be used.
   * Otherwise, an error message will be written to the standard error and
     compilation will terminate with a non-zero status.

For POSIX compatibility this method must not be affected by the #undef
preprocessor directive. For example:

     #undef lseek

must not alter the size of type off_t in use for a call to lseek().
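
As a sketch of the recommended macro in use, the fragment below requests a
64-bit off_t and reports the size actually in effect. It assumes an
implementation that honors _FILE_OFFSET_BITS as recommended above; the
helper name off_t_bits() is hypothetical.

```c
#define _FILE_OFFSET_BITS 64    /* must precede every #include */
#include <limits.h>
#include <sys/types.h>

/* Hypothetical helper: number of bits in this translation unit's
 * off_t. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

An implementation that does not recognize the macro would silently use its
default size, so a program that depends on a large off_t should check the
result rather than assume it.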

The functions that might be affected by this option are listed in 3.1.1.1
64-bit Versions of Interfaces.

The types, structures and symbolic constants that might be affected by this
option are listed in 3.1.2.1 64-bit Versions of Headers.

It has been argued that there should be a new mode bit (or "magic number")
on executable images to indicate whether or not the application is large
file aware. This is not precluded by this specification. However, an
argument against it is that it requires significant work. Specifically,
kernel, compiler, loader, and library changes are needed. It is unclear how
the mode bit would support a large file aware application that makes calls
to a non-aware shared library.

----------------------------------------------------------------------------

Revision Information

23Feb96 Version 1.1

The 23Feb96 changes include:

  1. Unix changed to UNIX throughout
  2. Section 1.5 (Changes and Additions) second bullet (Changes to System
     Interfaces and Headers) added EFBIG
  3. Section 2.2.1 (Changes to System Interfaces) changed "as a future" to
     "in a future".
  4. Section 2.2.1.1 (aio_read), 2.2.1.1 (aio_write) and 2.2.1.20
     (lio_listio) changed nbyte to aiocbp->aio_nbytes; added "is before the
     end-of-file and" before "is at or beyond" in the EOVERFLOW error.
  5. Section 2.2.1.1 (aio_read), 2.2.1.1 (aio_write) and 2.2.1.20
     (lio_listio) changed "greater than or equal to" to "greater than".
  6. Section 2.2.1.4 (fclose, etc.), 2.2.1.7 (fgetc, etc.) and 2.2.1.11
     (fprintf, etc.) changed "write beyond" to "write at or beyond".
  7. Section 2.2.1.20 (lio_listio) prefixed lio_opcode with aiocbp->;
     changed order of phrases in EOVERFLOW and EFBIG (moved "the
     aiocbp->aio_lio_opcode is LIO_READ" to the front of the sentences);
     removed "before EOF" in the EOVERFLOW error condition; added "is before
     the end-of-file and" before "is greater than or equal to the offset
     maximum".
  8. Section 2.2.2.6 (sys/types.h) and 3.1.2.11 (sys/types.h) changed "must
     be" to "are defined as" in the sentences starting "The types..".
  9. Section 3.1.1.1 (64-bit Versions of SUS Interfaces) changed title of
     section to "64-bit Versions of Interfaces". Changed titles in
     references to match.
 10. Section A.1.1.4.1 (fcntl) moved into A.3.1.1.1 (fcntl).
 11. Section A.1.1.4 (Holes in the Protection Mechanism) body added.
 12. Section A.1.2.1.1 (Offset Maximum and the 2G-1 File Size Limit)
     boldfaced "B" in "byte line"; changed "a lseek" to "an lseek"; changed
     "the resulting pointer to the next offset address will overflow" to
     "the resulting pointer to the next offset address and the file size
     itself would overflow"; changed title from "Offset Maximum - 2G-1 File
     Size Limit" to "Offset Maximum and the 2G-1 File Size Limit"; changed
     "cannot be performed because" to "cannot be performed at that position
     because" in the last paragraph.
 13. Section A.2.1.1.4 (creat) changed sample code from if (creat(path, ...)
     < 0) { to if ((fd = creat(path, ...)) < 0) {.
 14. Section A.2.1.1.6 (fgetpos, etc.) changed "this function" to "these
     functions" in second paragraph; added paragraph beginning "Another
     potentially serious..." and all that follows to the end of the section.
 15. Section A.3.1.1 (Transitional Extensions...) changed "B.3.1.1.2" to
     "A.3.1.1.2" in subsection.
 16. Section A.3.1.1.1 (fcntl) Merged sentence "The O_LARGEFILE flag may be
     set..." with the sentence "The O_LARGEFILE flag can expose..." moved in
     from A.1.1.4.1 (fcntl).
 17. Section A.3.2.2 (Visibility of Transitional API) changed "Note that if"
     to "If" in fourth paragraph.
 18. Section A.3.2.4 (Utilities:...) corrected reference to 3.1.2 in the
     second to the last paragraph to 3.1.2.1 64-bit Versions of Headers.
 19. Table of Contents corrected A.3 and A.3.1 heading titles.

24Feb96 Version 1.2

The 24Feb96 changes include:

  1. Added link to Foreword and section.
  2. Section 1.6 (Conformance) removed list, added text for section.
  3. Section 2.2.1.11 (fprintf) changed "needs" to "needed" in the error
     text.
  4. Section 3.1.2.12 (unistd.h) added _LFS_ASYNCHRONOUS_IO version test
     macro.

01Mar96 Version 1.3

The 01Mar96 changes include:

  1. Changed "Foreword" to "Acknowledgements".
  2. Added body of Acknowledgements.
  3. Section 1.6 (Conformance) 1st paragraph changed "may fail to" to "need
     not".
  4. Section 3.3.4, Example 2 changed "had" to "has".
  5. Section A.1.2.1.1 (Offset Maximum...) swapped "-" and ">" in top line.
  6. Section A.2.1.1.4 (creat) corrected reference for fstat.
  7. Section 3.3.3 (Utilities:...) corrected reference for Compilation
     Environment...

05Mar96 Version 1.4

The 05Mar96 changes include:

  1. Changed Version 1.2 in 01Mar96 revision section to Version 1.3
  2. Added additional contributors in the Acknowledgements.

20Mar96 Version 1.5

The 20Mar96 changes include:

  1. Back by popular demand.... Larger fonts in the PostScript Version!
  2. Section 1.2 (Requirements) In the text for "Be fully compliant to the
     SUS" changed "conversion to the proposed standard" to "conversion to
     this proposed standard" in the second from the last paragraph.
  3. Section 1.4 (Concepts) Changed "file is larger" to "file size is
     larger" and changed "only support" to "support only".
  4. Section 1.6 (Conformance) LOTS of changes. In summary: each statement
     of conformance ("A conforming implementation...") was separated into
     an individual paragraph and in each the phrases "described in" and
     "listed in" were changed to "specified in"; the version test macro
     required for each statement of conformance was added along with a
     reference to the section where the changes to the interfaces and/or
     headers are described; in the first statement of conformance
     parentheses were added around "except...lio_listio()" for clarity.
     Also deleted the last paragraph (beginning "Implementations which
     provide...").
  5. Section 2.2.1.7 (fgetc) Removed extra period at end of EOVERFLOW
     description.
  6. Section 2.2.1.19 (getrlimit) Changed commas before "otherwise" to
     semicolons in first and second paragraphs; changed "permit" to "might
     permit" and "do not" to "might not" in the fourth paragraph.
  7. Section 3.0 (Transitional Extensions...) first paragraph: Added
     sentence beginning "Version test macros..." after the first sentence
     ("The interfaces...").
  8. Section 3.1.2.8 (sys/resource.h) Added period after description of
     RLIM64_INFINITY.
  9. Section 3.1.2.12 (unistd.h) In Version Test Macros section added to
     description of _LFS_ASYNCHRONOUS_IO beginning with "as specified
     in..."; added "and 3.1.2.2..." to description of
     _LFS64_ASYNCHRONOUS_IO; added "3.1.1.2 fcntl()..." to description of
     _LFS64_LARGEFILE; added "and 3.1.2.6..." to description of
     _LFS64_STDIO. The last paragraph of A.3.2.2 ("If _LFS64_STDIO...") was
     moved to 3.1.2.12 as a new paragraph in the description of
     _LFS64_STDIO. In the description of _LFS_LARGEFILE the phrase "the
     fseeko() and ftello()" was removed and the text beginning with "as
     specified in..." through the end of the sentence was added.
 10. Section 3.2.1 (Optional Additional...) Changed criteria to criterion
     (last word of first paragraph).
 11. Section A.1.1.2 (Open Protection...) Removed comma before "and so on"
     in the third paragraph.
 12. Section A.1.2.1 (Offset Maximum) Added "it" between "that" and "is
     believed" in last sentence of the fifth paragraph. Also in the fifth
     paragraph, changed "only applies" to "applies only".
 13. Section A.2.1.1.13 (getrlimit()...) Added text beginning "These values
     do not..." through "...is really meant." to the end of the first
     paragraph.
 14. Section A.2.2.1 (General Porting...) In the first paragraph removed the
     phrase "there are four" and added "include" at the end of the
     sentence.
 15. Section A.2.2.3 (Type Conversion...) Removed the last sentence of the
     last paragraph ("Utilities not directly...").
 16. Section A.3.2.2 (Visibility of...) Second paragraph: added missing
     parenthesis at end of the sentence. Also moved last paragraph ("If
     _LFS64_STDIO is not defined...") to section 3.1.2.12 (unistd.h) as an
     additional paragraph in the _LFS64_STDIO description.
 17. Acknowledgements first paragraph: changed "files sizes" to "file sizes"
     in two places and changed "at least 2**32-1" to "at most 2^31-1". In
     the list of contributors changed "Hewlett-Packard Inc." to
     "Hewlett-Packard Co."; changed "Sun Microsystems Corp." to "Sun
     Microsystems, Inc."; changed "Srimivasam" to "Srinivasan"; removed Art
     Herzog from Novell list; removed Carl Zeigler from SAS list; added The
     Santa Cruz Operation, Inc. contributors. Added "(now with Integrated
     Computer Solutions, Inc.)" after "Mark Hatch".
 18. General: Changed "define[s,d] XXX as 1" to "define[s,d] XXX to be 1".