Commit 0bb3e26a authored by Stewart Brodie

fwrite performance improved significantly.

  Another getenv() bug fixed.
Detail:
  stdio.c contains a vastly improved implementation of fwrite.
  hostsys.h declares _terminate_getenv to remove build warnings.
  armsys.c contains fix to getenv() to stop Omni dying.
Admin:
  Tested on desktop machine for over a week without incident, including
    several heavy fwrite users (WebServe, C compiler)
  fwrite change is documented in Doc/fwrite
  getenv() bug is Bugzilla bug #28

Version 5.27. Tagged as 'RISC_OSLib-5_27'
parent 0f6d9975
I've played with the implementation of fwrite, and had several different
implementations to compare them:
1) "for (i=0; i<nbytes; ++i) putc(*ptr++, stream);" (roughly). This is what
current C libraries up to and including 5.26 do.
2) align write buffer using (1) or (4); OS_GBPB blocks_to_write * block_size
bytes in one go; write remainder using (1) or (4)
3) as (2), but call OS_GBPB to write block_size bytes blocks_to_write times.
In addition, some alternative strategies to do the buffer alignment:
4) if (space_in_write_buffer >= nbytes) memcpy the data into the write buffer
and exit (note, you don't flush the buffer until the next write is done)
else memcpy space_in_write_buffer bytes into the write buffer;
then force the buffer to be flushed.
When writing the remainder in (2) or (3), the first condition in (4) will
always be true.
Choice between methods (2) and (3) is controlled by the WRITE_ONCE macro
in stdio.c.
Methods (2) and (3) avoid the bytewise data copy from the caller's buffer
into the stdio buffer.
It's always going to be necessary to align the write buffer before attempting
the block write because there might be unflushed data in the write cache
already or this might be the first write in which case the internal data
structures are not set up to do writing.
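As a rough illustration of how (4) combines with (2) or (3), here is a toy
model in C. It is a sketch under stated assumptions, not the code in stdio.c:
toy_stream, toy_sink, toy_flush, toy_write and TOY_BUFSIZ are hypothetical
names, the sink only counts calls instead of calling OS_GBPB, and the handling
that the real _write needs for ttys, unbuffered streams and uninitialised
streams is left out.

```c
#include <stddef.h>
#include <string.h>

#define TOY_BUFSIZ 16   /* hypothetical buffer size, tiny for illustration */

/* Hypothetical stand-in for a stream: a write buffer plus counters that
 * record what was handed to the (simulated) filing system. */
typedef struct {
    unsigned char buf[TOY_BUFSIZ];
    size_t used;        /* bytes currently held in buf */
    size_t sink_calls;  /* number of simulated low-level write calls */
    size_t sink_bytes;  /* total bytes handed to the sink */
} toy_stream;

/* Simulated low-level write: counts one call, whatever the size. */
static void toy_sink(toy_stream *s, const unsigned char *p, size_t n)
{
    (void)p;
    s->sink_calls++;
    s->sink_bytes += n;
}

/* Hand any buffered bytes to the sink and empty the buffer. */
static void toy_flush(toy_stream *s)
{
    if (s->used > 0) {
        toy_sink(s, s->buf, s->used);
        s->used = 0;
    }
}

/* write_once != 0 models strategy (2); write_once == 0 models strategy (3). */
static size_t toy_write(toy_stream *s, const void *data, size_t nbytes,
                        int write_once)
{
    const unsigned char *p = data;
    size_t space = TOY_BUFSIZ - s->used;
    size_t nblocks, rest;

    if (space >= nbytes) {
        /* Strategy (4), first branch: everything fits, no flush yet. */
        memcpy(s->buf + s->used, p, nbytes);
        s->used += nbytes;
        return nbytes;
    }

    /* Strategy (4), second branch: top up the buffer, then flush it. */
    memcpy(s->buf + s->used, p, space);
    s->used += space;
    p += space;
    toy_flush(s);

    /* The buffer is now empty, so whole blocks can bypass it. */
    rest = nbytes - space;
    nblocks = rest / TOY_BUFSIZ;
    if (nblocks > 0) {
        if (write_once) {
            toy_sink(s, p, nblocks * TOY_BUFSIZ);          /* one big call */
        } else {
            size_t i;
            for (i = 0; i < nblocks; ++i)                  /* one call per block */
                toy_sink(s, p + i * TOY_BUFSIZ, TOY_BUFSIZ);
        }
        p += nblocks * TOY_BUFSIZ;
        rest -= nblocks * TOY_BUFSIZ;
    }

    /* Remainder: rest < TOY_BUFSIZ and the buffer is empty, so it fits. */
    memcpy(s->buf, p, rest);
    s->used = rest;
    return nbytes;
}
```

In this model a 100-byte write into a buffer already holding 5 bytes reaches
the sink in 2 calls with write_once set and in 6 calls without it. The same
number of bytes is handed over either way.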
The test was to write an 8M file to ADFS, NFS, ATAFS, SCSIFS and RAMFS.
Times are in seconds. The block size is the size passed to fwrite by the test
program. Times varied by up to around 2%. The machine was a 200MHz StrongARM
RiscPC running our RISC OS 4 builds.
Strategy  Block size   ADFS     NFS   RAMFS   ATAFS
   1          5K       7.50   20.68    1.87    7.16
             32K       7.51   20.76    1.94    7.32
              8M       7.47   20.85    1.95    7.42
   2          5K       7.44   19.89    0.76    6.40
             32K       7.46   14.39    0.62    3.40
              8M       7.45   10.29    0.52    2.40
   3          5K       7.44   20.29    0.78    6.40
             32K       7.47   19.35    0.70    6.10
              8M       7.47   18.92    0.63    6.09
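A test program of the kind described above might look roughly like the
following sketch. It is not the program used for these measurements:
time_fwrite is a hypothetical name, and clock() is used only because it is
standard C. On many systems clock() reports processor time rather than
elapsed time, so a real benchmark may need a different timer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Write `total` bytes to fp using fwrite calls of `block` bytes each
 * (the last call may be shorter). Returns the time taken in seconds as
 * reported by clock(), or -1.0 on failure. */
static double time_fwrite(FILE *fp, size_t total, size_t block)
{
    char *data;
    size_t done = 0;
    clock_t start, end;

    if (fp == NULL || block == 0) return -1.0;
    data = malloc(block);
    if (data == NULL) return -1.0;
    memset(data, 'x', block);

    start = clock();
    while (done < total) {
        size_t n = (total - done < block) ? total - done : block;
        if (fwrite(data, 1, n, fp) != n) {
            free(data);
            return -1.0;
        }
        done += n;
    }
    /* Include the final flush so buffered data is counted too. */
    if (fflush(fp) != 0) {
        free(data);
        return -1.0;
    }
    end = clock();

    free(data);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with total set to 8M and block set to 5K, 32K or 8M would
correspond to the three rows shown for each strategy.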
We tried SCSIFS on an A540, and the timings are of course completely
different from my StrongARM RiscPCs - but the relative stats were that
strategy 3 was twice as fast as strategy 1, and strategy 2 was twice as fast
as strategy 3 for 5K blocks, but 9-10 times as fast for 32K and 8M blocks! We
also tried
SCSIFS on a StrongARM Risc PC and found that 32K or larger blocks in strategy
(2) made a significant difference, but (3) didn't and neither did (2) at 5K.
Using or not using (4) didn't make any measurable difference. These are all
word-aligned writes.
Misaligning the stdio buffer by writing a few bytes before the large write
made no measurable difference.
The parallelism exploit in the NFS module where multiple transactions can be
run in parallel when >8K of data is to be transferred in a single call
through the filing system entry point accounts for the variation in NFS;
RAMFS benefits a great deal from removing the extra memory copy. The bulk
transfer really helps SCSIFS - but that may be down to the architecture of an
A540. Any or all changes seem to make no difference to ADFS
(ADFSBuffers didn't affect the speed measurably).
The major downside is that you lose TaskWindow multi-tasking during writes of
large blocks. If applications used setvbuf to set up a 32K buffer, then they
should benefit quite a bit - particularly on non-Risc PC IDE bus bound filing
systems.
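For applications that want to try the larger buffer, a minimal sketch of the
setvbuf call follows. use_big_buffer and BIG_BUFFER are hypothetical names,
and passing NULL asks the library to allocate the buffer itself. setvbuf has
to be called after the stream is opened and before any other operation on it.

```c
#include <stdio.h>

#define BIG_BUFFER (32 * 1024)   /* the 32K suggested in the notes above */

/* Request a fully buffered 32K buffer on a freshly opened stream.
 * Returns 0 on success, non-zero if the request could not be honoured. */
static int use_big_buffer(FILE *fp)
{
    if (fp == NULL) return -1;
    return setvbuf(fp, NULL, _IOFBF, BIG_BUFFER);
}
```

Typical use would be to call use_big_buffer(fp) immediately after fopen and
then issue fwrite calls as usual. Whether this helps will depend on the
filing system, as the timings above suggest.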
--
Stewart Brodie, Senior Software Engineer
Pace Micro Technology PLC
645 Newmarket Road
Cambridge, CB5 8PB, United Kingdom WWW: http://www.pacemicro.com/
@@ -11,14 +11,14 @@
GBLS Module_HelpVersion
GBLS Module_ComponentName
GBLS Module_ComponentPath
Module_MajorVersion SETS "5.26"
Module_Version SETA 526
Module_MajorVersion SETS "5.27"
Module_Version SETA 527
Module_MinorVersion SETS ""
Module_Date SETS "13 Dec 2000"
Module_ApplicationDate2 SETS "13-Dec-00"
Module_ApplicationDate4 SETS "13-Dec-2000"
Module_Date SETS "22 Feb 2001"
Module_ApplicationDate2 SETS "22-Feb-01"
Module_ApplicationDate4 SETS "22-Feb-2001"
Module_ComponentName SETS "RISC_OSLib"
Module_ComponentPath SETS "RiscOS/Sources/Lib/RISC_OSLib"
Module_FullVersion SETS "5.26"
Module_HelpVersion SETS "5.26 (13 Dec 2000)"
Module_FullVersion SETS "5.27"
Module_HelpVersion SETS "5.27 (22 Feb 2001)"
END
/* (5.26)
/* (5.27)
*
* This file is automatically maintained by srccommit, do not edit manually.
*
*/
#define Module_MajorVersion_CMHG 5.26
#define Module_MajorVersion_CMHG 5.27
#define Module_MinorVersion_CMHG
#define Module_Date_CMHG 13 Dec 2000
#define Module_Date_CMHG 22 Feb 2001
#define Module_MajorVersion "5.26"
#define Module_Version 526
#define Module_MajorVersion "5.27"
#define Module_Version 527
#define Module_MinorVersion ""
#define Module_Date "13 Dec 2000"
#define Module_Date "22 Feb 2001"
#define Module_ApplicationDate2 "13-Dec-00"
#define Module_ApplicationDate4 "13-Dec-2000"
#define Module_ApplicationDate2 "22-Feb-01"
#define Module_ApplicationDate4 "22-Feb-2001"
#define Module_ComponentName "RISC_OSLib"
#define Module_ComponentPath "RiscOS/Sources/Lib/RISC_OSLib"
#define Module_FullVersion "5.26"
#define Module_HelpVersion "5.26 (13 Dec 2000)"
#define Module_FullVersion "5.27"
#define Module_HelpVersion "5.27 (22 Feb 2001)"
@@ -413,7 +413,7 @@ char *getenv(const char *name)
if (_getenv_value == NULL)
{
_getenv_size=256;
if ( (_getenv_value = malloc(_getenv_size)) == NULL)
if ( (_getenv_value = _kernel_RMAalloc(_getenv_size)) == NULL)
{
_getenv_size = 0;
return NULL; /* Could not allocate buffer */
@@ -440,8 +440,8 @@ char *getenv(const char *name)
return NULL; /* It wasn't buffer overflow, so return NULL */
/* Buffer overflow occurred, so try to reallocate the buffer */
free(_getenv_value);
_getenv_value = malloc(_getenv_size += 256);
_kernel_RMAfree(_getenv_value);
_getenv_value = _kernel_RMAalloc(_getenv_size += 256);
if (_getenv_value == NULL) {
_getenv_size = 0;
return NULL;
@@ -458,7 +458,7 @@ char *getenv(const char *name)
void _terminate_getenv(void)
{
if (_getenv_value)
free(_getenv_value);
_kernel_RMAfree(_getenv_value);
_getenv_value = NULL;
}
@@ -28,6 +28,7 @@
/* the #include <stdio.h> imports macros getc/putc etc. Note that we
must keep the two files in step (more details in ctype.c).
NOTE (sb, 09/02/01): This includes the _write function
*/
#define __system_io 1 /* makes stdio.h declare more */
@@ -42,7 +43,7 @@
#include "kernel.h" /* debug */
#include "hostsys.h"
#include "swis.h"
extern char *_kernel_getmessage(char *msg, char *tag);
extern int _fprintf_lf(FILE *fp, const char *fmt, ...);
extern int _sprintf_lf(char *buff, const char *fmt, ...);
@@ -63,6 +64,15 @@ int __backspace(FILE *stream); /* strict right inverse of getc() */
#define NO_DEBUG
/* This macro is part of the fwrite performance improvement. It selects
* which strategy is being used for large block writes. If this macro
* is defined, then _writebuf is asked to write n*buffersize bytes in one
* go; if it is undefined, it writes buffersize bytes n times (which
* allows callbacks to go off during large writes ... but that's the only
* real benefit). Best performance is achieved by defining this macro.
*/
#define WRITE_ONCE
/* in the shared library world, __iob and _errno are generated by s.clib, */
/* in order to make it easier to keep their position fixed */
@@ -802,16 +812,102 @@ dbmsg("fread %d\n", count);
static int _write(const char *ptr, int nbytes, FILE *stream)
{ int i;
for(i=0; i<nbytes; i++)
if (_sys_istty(stream) || (stream->__flag & _IONBF)) {
for(i=0; i<nbytes; i++)
if (putc(*ptr++, stream) == EOF) return 0;
/* H&S say 0 on error */
}
else if (nbytes > 0) {
if (stream->__ocnt >= nbytes) {
/* output will fit completely into our output buffer */
memcpy(stream->__ptr, ptr, nbytes);
stream->__ptr += nbytes;
stream->__ocnt -= nbytes;
ptr += nbytes;
}
else {
int so_far = 0;
/* Fill the existing buffer */
if (stream->__ocnt > 0) {
/* Space exists - write the data to the buffer en bloc */
memcpy(stream->__ptr, ptr, stream->__ocnt);
stream->__ptr += stream->__ocnt;
ptr += stream->__ocnt;
so_far += stream->__ocnt;
stream->__ocnt = 0;
}
/* To get here, there are more bytes to write, and the current write buffer is full
* OR non-existent. We need to write (nbytes - so_far) bytes, at ptr to 'stream'.
* Buffer does exist if __ocnt is zero. Thus the call to __flsbuf will flush any
* filled buffers AND/OR initialise the data structures to accept written data,
* so we can legally check __ocnt, __flags and __bufsiz.  This call to __flsbuf is
* basically simulating putc, so it MUST always decrement __ocnt before the call.
*/
--stream->__ocnt;
if (__flsbuf(*ptr++, stream) == EOF) return 0;
so_far++;
/* Now we have guaranteed a single character write, we can inspect the stream data
* to see the size of the buffer and start bypassing the buffering
*/
if (stream->__ocnt > 0 && (stream->__flag & _IOFBF) && stream->__bufsiz > 0) {
/* Looks like it is worth attempting a direct write to the file */
int nblocks, count, loop;
nblocks = (nbytes - so_far - 1) / stream->__bufsiz;
if (nblocks > 0) {
/* There cannot be any _IOLAZY pending as __flsbuf will have taken care of it.
* We want _writebuf to simply pass this data directly to _sys_write (and under
* RISC OS to OS_GBPB)
*/
/* Wind back one character so we can attempt block writes */
--ptr;
--so_far;
++stream->__ocnt;
--stream->__ptr;
--stream->__extrap->__extent;
#ifdef WRITE_ONCE
count = nblocks * stream->__bufsiz;
(void) loop;
#else
count = stream->__bufsiz;
for (loop = 0; loop < nblocks; ++loop) {
#endif
if (_writebuf((unsigned char *)ptr, count, stream)) {
return 0;
}
so_far += count;
ptr += count;
#ifndef WRITE_ONCE
}
#endif
}
i = nbytes - so_far; /* Number of bytes remaining for partial buffer */
if (i > 0 && stream->__ocnt >= i) {
/* Second condition above *should* be always true */
memcpy(stream->__ptr, ptr, i);
stream->__ptr += i;
ptr += i;
so_far += i;
stream->__ocnt -= i;
}
}
/* Finally, the last few bytes are written in the naive loop */
for (i=so_far; i<nbytes; ++i) {
if (putc(*ptr++, stream) == EOF) return 0;
}
}
}
return nbytes;
}
size_t fwrite(const void *ptr, size_t itemsize, size_t count, FILE *stream)
{
/* The comments made about fread apply here too */
dbmsg("fwrite %d\n", count);
dbmsg_noNL("fwrite %d ", count);
dbmsg("itemsize %d\n", itemsize);
return itemsize == 0 ? count
: _write(ptr, itemsize*count, stream) / itemsize;
}
@@ -100,6 +100,7 @@ extern void _init_user_alloc(void);
extern void _terminate_user_alloc(void);
extern void _sys_msg(const char *);
extern void _exit(int n);
extern void _terminate_getenv(void);
#ifdef __ARM
typedef int FILEHANDLE;