Commit 3b1e56ec authored by Stewart Brodie

Import from SrcFiler of Browser fetchers

*,ffb gitlab-language=bbcbasic linguist-language=bbcbasic linguist-detectable=true
c/** gitlab-language=c linguist-language=c linguist-detectable=true
h/** gitlab-language=c linguist-language=c linguist-detectable=true
cmhg/** gitlab-language=cmhg linguist-language=cmhg linguist-detectable=true
*,fe1 gitlab-language=make linguist-language=make linguist-detectable=true
Implementation Details
======================
Author: Stewart Brodie
Date: 29th October '97
1. Introduction
2. "Session.headers"
2.1 During Requests
2.2 During Responses
3. Reading data
4. Chunked data transfers
5. Data buffers
6. Counters
7. Errors
1. Introduction
The HTTP module is one of the more complex fetcher modules. It performs many
different operations required by the HTTP specifications and attempts to
fulfil them reliably and in a manner which is non-destructive to remote
servers. This document details *some* of the implementation decisions which
somebody maintaining this module may need to be aware of.
HTTP fetches are started with a call to SWI HTTP_GetData (via URL_GetURL).
Then the client application repeatedly calls ReadData and Status (again
via the URL module) until all the data is returned, and then HTTP_Stop (via
URL_DeregisterURL or URL_Stop) to clean up.
This is the standard fetcher interface. However, many things are done
internally to optimise the use of resources; these are described below.
2. "Session.headers"
The Session data structure contains a "headers" member which is used for
accumulating HTTP headers. This is used for BOTH the request headers and
the response headers.
2.1 During Requests
When the request is being formulated (during HTTP_GetData), headers are added
to the list with "http_add_header". This function adds headers to a header
list (the function itself is a generic list builder). The header list can be
thought of as an associative array (duplicate indices allowed) which is kept
in the order in which items were added. Routines are provided for locating a
header, deleting a header, and deleting all the headers. Once all the standard
headers have been added, attention turns to the client-supplied data (pointer
in R4 on entry). These are also parsed and any entity-body in the client data
is then duplicated into Session.data (length in Session.data_len).
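The ordered, duplicate-tolerant list described above could be sketched
roughly as follows. This is only an illustration: the type "hdr" and the
functions hdr_add, hdr_find, hdr_delete and hdr_delete_all are hypothetical
names, not the module's own, and the case-insensitive name matching is an
assumption based on normal HTTP practice.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type; the module's real structure may differ. */
typedef struct hdr {
    struct hdr *next;
    char *name;
    char *value;
} hdr;

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL) memcpy(p, s, n);
    return p;
}

/* ASCII case-insensitive comparison of two header names. */
static int name_eq(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    }
    return *a == *b;
}

/* Append at the tail so insertion order is kept; duplicates are allowed. */
hdr *hdr_add(hdr **list, const char *name, const char *value)
{
    hdr *h = malloc(sizeof *h);
    if (h == NULL) return NULL;
    h->name = dup_str(name);
    h->value = dup_str(value);
    h->next = NULL;
    if (h->name == NULL || h->value == NULL) {
        free(h->name);
        free(h->value);
        free(h);
        return NULL;
    }
    while (*list != NULL) list = &(*list)->next;
    *list = h;
    return h;
}

/* Return the first header with the given name, or NULL. */
hdr *hdr_find(hdr *list, const char *name)
{
    for (; list != NULL; list = list->next) {
        if (name_eq(list->name, name)) return list;
    }
    return NULL;
}

/* Delete the first header with the given name; returns 1 if one was removed. */
int hdr_delete(hdr **list, const char *name)
{
    for (; *list != NULL; list = &(*list)->next) {
        if (name_eq((*list)->name, name)) {
            hdr *dead = *list;
            *list = dead->next;
            free(dead->name);
            free(dead->value);
            free(dead);
            return 1;
        }
    }
    return 0;
}

/* Delete every header, leaving an empty list. */
void hdr_delete_all(hdr **list)
{
    while (*list != NULL) {
        hdr *dead = *list;
        *list = dead->next;
        free(dead->name);
        free(dead->value);
        free(dead);
    }
}
```

Appending at the tail costs a walk of the list on every add, which seems a
reasonable trade for the short lists an HTTP request normally carries.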
Once the list is complete, the headers are all examined and inappropriate
ones are removed. If the method is an idempotent read (GET or HEAD), any
entity body is discarded completely and related headers are also ditched
(Content-Length, Content-Type, Transfer-Encoding). This can help clients
which "forget" to zero R4 (eg. !Browse (as of October 1997) on requests
generated by a redirect response to a POST).
If the method was a POST or PUT, then a content-length header is sought, and
is added if it was missing. If it was already there but indicated an
amount of data exceeding Session.data_len, it is rewritten to match
Session.data_len.
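The Content-Length rules above can be condensed into one decision function.
The following is a sketch under stated assumptions: the function name and
its parameters are hypothetical, the method codes are the ones documented
for the bottom 8 bits of R2 in HTTP_GetData, and the -1 result for other
methods is purely this sketch's choice, since the text only covers GET,
HEAD, POST and PUT.

```c
#include <stddef.h>

/* Method codes as listed for the bottom 8 bits of R2 in HTTP_GetData. */
enum { METHOD_GET = 1, METHOD_HEAD = 2, METHOD_POST = 4, METHOD_PUT = 8 };

/*
 * Decide which Content-Length, if any, accompanies the request.
 *   returns  0: body and body-related headers are dropped (GET, HEAD)
 *   returns  1: *out holds the Content-Length to send (POST, PUT)
 *   returns -1: method not covered by this sketch
 * have_header is non-zero if the client supplied a Content-Length,
 * in which case declared is its value.
 */
int fix_content_length(int method, int have_header, size_t declared,
                       size_t data_len, size_t *out)
{
    if (method == METHOD_GET || method == METHOD_HEAD)
        return 0;
    if (method != METHOD_POST && method != METHOD_PUT)
        return -1;
    if (!have_header || declared > data_len)
        *out = data_len;   /* header missing, or claims too much data */
    else
        *out = declared;   /* header present and not excessive: keep it */
    return 1;
}
```

Note that a declared length smaller than data_len is left alone, matching
the description, which only mentions rewriting an excessive value.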
Finally, a buffer is created, filled with the complete request, and stored
in Session.fullreq. The act of pushing everything into the buffer
also EMPTIES the Session.headers list. The first "header" line is separated
from its "value" by a space, subsequent lines use a colon. The first
"header" is actually the HTTP method being used (eg. "GET" or "POST") and the
value is the URI plus a space plus the HTTP version token - hence the reason
for the special treatment of the first line.
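As an illustration of the first-line special case, the sketch below flattens
an array of name/value pairs into a request buffer. The "pair" type and
build_request are hypothetical names, the array stands in for the real
header list, and the CRLF line endings with a terminating blank line are
taken from the HTTP specification rather than from this document.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the header list. */
typedef struct {
    const char *name;
    const char *value;
} pair;

/*
 * Write the request into buf. Entry 0 is the request line (method, space,
 * URI and version); every later entry is "Name: value".
 * Returns the number of bytes written, excluding the terminating NUL,
 * or 0 if the buffer is too small.
 */
size_t build_request(char *buf, size_t size, const pair *h, size_t count)
{
    size_t used = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        const char *sep = (i == 0) ? " " : ": ";
        int n = snprintf(buf + used, size - used, "%s%s%s\r\n",
                         h[i].name, sep, h[i].value);
        if (n < 0 || (size_t)n >= size - used)
            return 0;
        used += (size_t)n;
    }
    if (size - used < 3)
        return 0;
    memcpy(buf + used, "\r\n", 3);   /* blank line ends the header block */
    return used + 2;
}
```

For example, the pairs {"GET", "/index.html HTTP/1.1"} and {"Host", "bran"}
would produce "GET /index.html HTTP/1.1" followed by "Host: bran".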
2.2 During Responses
A similar process is followed when reading the response. Whilst the headers
are still being retrieved (Session.donehead is zero or Session.chunking is
non-zero and Session.chunkstate is reading headers or footers) a COPY of the
data ready on an incoming socket is requested (recv is called with MSG_PEEK
in the fourth parameter) and written into an internally allocated buffer.
Headers are parsed from this just like they were from the client data on a
request - it uses the same routine. Content-length is trapped and parsed,
the value stored and the header then DISCARDED.
Once all the headers are known, then they are filtered as appropriate.
Content-length is ignored if there was a transfer-encoding. The HTTP version
is reset to 1.0 for the benefit of the client. Transfer-Encoding is parsed
to look for the string "chunked" which sets Session.chunking to non-zero and
the header is then removed. Connection headers are removed and any headers
matching tokens in the connection header are also removed (they do not
concern our client, as they refer solely to characteristics of the connection
between this module and the remote server). Incoming headers are sanitised
(continuation lines are rejoined, and leading/trailing spaces are removed and
generally tidied up). Cookies, if compiled in (need -DCOOKIE in Makefile),
are then parsed and stored if appropriate. However, they are not removed
from the header list.
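The Connection-token filtering mentioned above needs a test of whether a
given header name is listed in the Connection header's value. A minimal
sketch of such a test follows; named_in_connection is a hypothetical name,
and the comma-separated, case-insensitive token syntax is assumed from the
HTTP specification.

```c
#include <ctype.h>
#include <string.h>

/* ASCII case-insensitive comparison of the first n characters. */
static int same_token(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/*
 * Return 1 if name appears as a whole token in a comma-separated
 * Connection header value, else 0.
 */
int named_in_connection(const char *connection_value, const char *name)
{
    size_t want = strlen(name);
    const char *p = connection_value;

    if (want == 0) return 0;
    while (*p != '\0') {
        const char *start;
        const char *end;

        /* Skip separators and leading white space. */
        while (*p == ',' || isspace((unsigned char)*p)) p++;
        start = p;
        while (*p != '\0' && *p != ',') p++;
        end = p;
        /* Trim trailing white space from the token. */
        while (end > start && isspace((unsigned char)end[-1])) end--;
        if ((size_t)(end - start) == want && same_token(start, name, want))
            return 1;
    }
    return 0;
}
```

A caller would walk the response headers and drop each one for which this
returns 1, before dropping the Connection header itself.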
After all of that, if not chunking, the content-length header is added back
to the response headers but at the end of the header list, and a buffer is
generated containing the complete response header. Thus not all of the
headers will be seen by the client, and those that are present are not
necessarily in the same order that the server sent them. This should not
concern the client.
As each header line is read from the buffer (including reconstructing
split lines), the data representing that header is read and discarded from
the TCP/IP stack (recv is called again but without MSG_PEEK). (Actually, the
header parser keeps a running total of how much data has been "consumed", and
only discards the data in one go when it has processed as many headers as it
could.)
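The peek-then-discard pattern can be sketched with the standard BSD socket
calls. This is a simplified illustration, not the module's code:
consume_complete_lines is a hypothetical name, the includes are the POSIX
ones rather than the TCPIPLibs headers, and it only counts complete lines
without rejoining continuation lines or parsing anything.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Peek at the pending data, work out how many bytes belong to complete
 * lines, then discard exactly that many with a single non-peeking recv.
 * Any trailing partial line stays queued in the stack for the next call.
 * Returns the number of bytes consumed, 0 if the peer has closed or no
 * complete line is available yet, or -1 on error.
 */
ssize_t consume_complete_lines(int sock, char *buf, size_t size)
{
    ssize_t got = recv(sock, buf, size, MSG_PEEK);
    size_t consumed = 0;
    size_t i;

    if (got <= 0)
        return got;

    /* Running total: everything up to and including the last newline. */
    for (i = 0; i < (size_t)got; i++) {
        if (buf[i] == '\n')
            consumed = i + 1;
    }

    /* One real read removes all the processed lines in one go. */
    if (consumed > 0 && recv(sock, buf, consumed, 0) != (ssize_t)consumed)
        return -1;
    return (ssize_t)consumed;
}
```

Peeking first means a header line that arrives split across two TCP segments
is never half-removed from the stack, so no private reassembly state is
needed between calls.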
3. Reading data
Headers are read into a private buffer (Session.buffer). When reading body
data, data is read directly into the client's buffer. The function
"http_write_data_to_client" must always be called in order to update R2, R3
and R4 when client data has been generated; that function copies data into
the client buffer only if it needs to.
R2, R3 and R4 are updated so that future calls to write more data are
appended to the buffer (if they occur during the execution of the same SWI
call, of course). This can be done because the wrapper function
http_readdata enforces the spec requirement that R2 and R3 be preserved
across the call. Currently, the only time when data will be written to the
client buffer in multiple calls to this function is when the end of a "chunk"
has been reached and there is also room to store the start of the next chunk.
4. Chunked data transfers
The header reading & parsing code is used to parse the chunk-size
declarations although the Session.headers structure is not used to store the
data. When in "data reading" mode, data is read directly to the client
buffer, although the size of the transfer is limited to the minimum of the
buffer size and the amount of data left in the current chunk. This greatly
simplifies the processing of chunked transfers, since chunking headers can
never end up in the client's buffer.
The conditions in the ReadData handler are made much more complex by the
chunking checks, but unless lots of code is to be duplicated, this cannot be
helped.
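Two small pieces of the chunk handling can be illustrated in isolation:
reading the hexadecimal chunk-size declaration, and limiting each read to
the smaller of the free buffer space and the data left in the chunk. Both
function names are hypothetical, and the parser is a sketch that ignores
chunk extensions and does not guard against overflow.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Parse the hexadecimal size at the start of a chunk-size line.
 * Parsing stops at the first non-hex character (CR, ';', space, ...).
 * Returns 0 on success with the size in *size, or -1 if the line does
 * not start with a hex digit.
 */
int parse_chunk_size(const char *line, unsigned long *size)
{
    unsigned long v = 0;
    const char *p = line;

    while (isxdigit((unsigned char)*p)) {
        int c = tolower((unsigned char)*p);
        v = v * 16 + (unsigned long)(c <= '9' ? c - '0' : c - 'a' + 10);
        p++;
    }
    if (p == line)
        return -1;
    *size = v;
    return 0;
}

/* Never read past the end of the current chunk or of the client buffer. */
size_t next_read_size(size_t buffer_free, size_t chunk_left)
{
    return buffer_free < chunk_left ? buffer_free : chunk_left;
}
```

A size line of "69" therefore declares a chunk of 105 bytes, and a size of
zero marks the last chunk.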
5. Data buffers
All of the data buffers allocated by the module are kept in the RMA. They
are managed with malloc/free/module_realloc. module_realloc is a wrapper
around the C library's realloc, which is broken in the RISC OS 3.1 ROM build.
The code expects "broken_os" to be set non-zero if it needs to call OS_Module
13 directly instead of relying on the library. "broken_os" is initialised
with the truth value of "R1 result of OS_Byte 129,0,255" <= &A4.
6. Counters
Session.sent indicates the amount of entity-body sent to the client, and NOT
the total number of bytes provided to the client. This is critical; otherwise
you can accidentally cut off some transfers prematurely, as explained here:
(All lengths are arbitrarily chosen numbers to illustrate the problem). A
client requests a URL, goes through all the startup and then calls the
ReadData SWI with details of its buffer - which is 8192 bytes long. The
module receives a block of data from the remote server which is 8192 bytes
long. The headers take up the first 1000 bytes, and the next 7192 bytes are
the first part of the object being transferred. The content-length of the
object being downloaded specifies its size as 8000 bytes. Suppose we write the
headers and the first 7192 bytes of the object to the client's buffer. If the
header size were included in the Session.sent count, then when the exit value
for R5 is calculated, the module would notice that the 8192 bytes which have
been "sent" exceed the object's content-length. It would therefore determine
that the transfer is complete and set R5 to zero, which the client is
watching for to indicate completion. Thus we would "lose" 808 bytes off the
end of the object. By ignoring the header size, R5 is set correctly to 808
and the browser continues to poll for the remaining data.
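The arithmetic in this example can be checked with a few lines of C. The
function name bytes_remaining is hypothetical, and the assumption is that
the R5 exit value is simply the content-length minus the count of bytes
sent, floored at zero.

```c
#include <stddef.h>

/*
 * Bytes of entity-body still outstanding, as reported to the client.
 * sent must count entity-body bytes only, never header bytes.
 */
size_t bytes_remaining(size_t content_length, size_t sent)
{
    return sent >= content_length ? 0 : content_length - sent;
}
```

With the figures above, bytes_remaining(8000, 7192) gives 808, whereas
wrongly counting the 1000 header bytes gives bytes_remaining(8000, 8192),
which is 0 and would end the transfer early.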
7. Errors
All errors are stored in the Messages file and read into a static
_kernel_oserror buffer at run-time.
&83F80 SWI "HTTP_GetData"
&83F81 SWI "HTTP_Status"
&83F82 SWI "HTTP_ReadData"
&83F83 SWI "HTTP_Stop"
&83FBF SWI "HTTP_EnumerateCookies"
&83FBE SWI "HTTP_ConsumeCookie"
&83FBD SWI "HTTP_AddCookie"
#&83F88 SWI "HTTP_UserAgent" ## WITHDRAWN ##
Protocol specific information
-----------------------------
SWI HTTP_GetData
================
The meanings of the flags in R0 on entry to this SWI are:
bit 0 set: R6 points to alternative user agent identifier
bit 1 set: R5 is the length of data pointed to by R4 (if R4 not zero)
Methods in the bottom 8 bits of R2 are (NOT a bitfield):
1: GET
2: HEAD
4: POST
8: PUT
Bits 8-15 of R2 are the type of data wanted (if R0:1 set, else in bits 0-31 of R5)
0: Body only
1: Head only
2: Both
Bits 16-31 are reserved and should be zero
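A caller or handler might unpack these fields as in the sketch below. The
type request_fields and the function decode_request are hypothetical names.
The handling of the data type follows the wording above literally: it is
taken from bits 8-15 of R2 when bit 1 of R0 is set, and from R5 otherwise.

```c
#include <stdint.h>

/* Hypothetical holder for the decoded entry registers. */
typedef struct {
    unsigned method;        /* 1 GET, 2 HEAD, 4 POST, 8 PUT */
    uint32_t data_wanted;   /* 0 body only, 1 head only, 2 both */
    int reserved_ok;        /* non-zero if bits 16-31 of R2 are zero */
} request_fields;

request_fields decode_request(uint32_t r0, uint32_t r2, uint32_t r5)
{
    request_fields f;

    f.method = (unsigned)(r2 & 0xFFu);
    if (r0 & 2u)
        f.data_wanted = (r2 >> 8) & 0xFFu;  /* R5 is busy as a length */
    else
        f.data_wanted = r5;                 /* all 32 bits of R5 */
    f.reserved_ok = (r2 >> 16) == 0;
    return f;
}
```

Because the method values are not a bitfield, the method should be compared
for equality against 1, 2, 4 or 8 rather than tested bit by bit.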
SWI "HTTP_EnumerateCookies"
===========================
On entry:
R0 = flags (bit 0 set means action, unset means info only)
ie only reset bit 16 if bit zero is set
(bit 1 set means cookies in list, unset means in queue)
R1 = unique cookie handle, 0 for initial call
On exit:
R0 = flags (bit 0 set means secure channel should be used)
R1 = unique cookie handle, 0 for no more cookies
R2 = total number of cookies created
R3 = number of cookies not read
R4 = pointer to domain name string
R5 = pointer to NAME string
R6 = pointer to VALUE string
R7 = pointer to path string
SWI "HTTP_ConsumeCookie"
========================
On entry:
R0 - flags, bit zero set for accept or unset for reject
R1 - session, forget what this is for
R2 - cookie handle
No exit defined
SWI "HTTP_AddCookie"
====================
On entry:
R0 - flags, bit 0 set for secure
R1 - name
R2 - value
R3 - expires
R4 - path
R5 - domain
No exit defined
The following SWI has been withdrawn in favour of an additional parameter
supplied to HTTP_Start.
## SWI "HTTP_UserAgent"
## ====================
## On entry:
## R0 - flags (currently reserved)
## R1 - session ID (ie pollword address, as per all other normal SWIs)
## R2 - pointer to a null terminated string giving the user agent string
##
## No exit defined
OPTIONS * HTTP/1.1
Host: bran
Connection: close
HTTP/1.1 200 OK
Date: Wed, 22 Oct 1997 13:35:26 GMT
Server: Apache/1.2.4
Content-Length: 0
Allow: GET, HEAD, OPTIONS, TRACE
Connection: close
POST /cgi-bin/post-query HTTP/1.1
Host: dsse.ecs.soton.ac.uk:8080
Content-Length: 1
Content-Type: application/x-www-form-urlencoded
Connection: close
1
HTTP/1.1 100 Continue
HTTP/1.1 200 OK
Date: Wed, 22 Oct 1997 14:20:32 GMT
Server: Apache/1.2.4
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
69
<H1>Query Results</H1>You submitted the following name/value pairs:<p>
<ul>
<li> <code>1 = </code>
</ul>
0
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# Project: httptest
# Toolflags:
CCflags = -c -depend !Depend -IC: -throwback
C++flags = -c -depend !Depend -IC: -throwback
Linkflags = -aif -c++ -o $@
ObjAsmflags = -throwback -NoCache -depend !Depend
CMHGflags =
LibFileflags = -c -o $@
Squeezeflags = -o $@
# Final targets:
@.test: @.o.test c:stubs.o
link $(linkflags) @.o.test c:stubs.o
# User-editable dependencies:
@.dates: @.c.dates
cc $(ccflags) -DTEST -o @.o.testdates @.c.dates
link $(linkflags) @.o.testdates c:stubs.o
# Static dependencies:
@.o.test: @.c.test
cc $(ccflags) -o @.o.test @.c.test
# Dynamic dependencies:
# Copyright 1998 Acorn Computers Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Project: HTTPMOD
# Toolflags:
CCflags = -c -depend !Depend -ITCPIPLibs:,C:,Debug: -ffahu -throwback -DCOMPAT_INET4 -DCOOKIE -UTRACE -UTML -zps1 -zM -Wp
CCtflags = -c -depend !Depend -ITCPIPLibs:,C:,Debug: -fahn -throwback -DCOMPAT_INET4 -DCOOKIE -DTRACE -UTML -zps1 -zM -Wp
C++flags = -c -depend !Depend -IC: -throwback
Linkflags = -rmf -c++ -o $@
ObjAsmflags = -throwback -NoCache -depend !Depend
CMHGflags = -depend !Depend -p -throwback -IC: -d HTTP.h
LibFileflags = -c -o $@
Squeezeflags = -o $@
ExtraLibs = o.syslog-lib
SyslogExtraLibs = <syslog$dir>.c-veneer.o.syslog
# Final targets:
@.HTTP: @.o.module @.o.readdata @.o.ses_ctrl @.o.start @.o.status @.o.stop @.o.protocol \
@.o.writedata @.o.hosttrack @.o.connpool @.o.res @.o.utils @.o.generic \
@.o.URLclient @.o.config @.o.dates \
@.o.connect @.o.dns @.o.cookie @.o.header @.o.HttpHdr C:o.Stubs TCPIPLibs:o.inetlibzm \
TCPIPLibs:o.socklibzm
link $(linkflags) @.o.module @.o.readdata @.o.ses_ctrl @.o.start @.o.protocol \
@.o.writedata @.o.hosttrack @.o.connpool @.o.res @.o.utils @.o.generic \
@.o.URLclient @.o.config @.o.dates \
@.o.status @.o.dns @.o.cookie @.o.stop @.o.HttpHdr @.o.connect @.o.header C:o.stubs \
TCPIPLibs:o.inetlibzm TCPIPLibs:o.socklibzm
Access $@ WR/R
# User-editable dependencies:
@.o.res: @.Res.List @.Res.Messages
ResGen messages_file @.o.res -via @.Res.List
normal: @.HTTP
@|
trace: @.HTTP-tr
@|
@.HTTP-tr: @.od.module @.od.readdata @.od.ses_ctrl @.od.start @.od.status @.od.stop @.od.protocol \
@.od.writedata @.od.hosttrack @.od.connpool @.o.res @.od.utils @.od.generic \
$(ExtraLibs) @.od.URLclient @.od.config @.od.dates \
@.od.connect @.od.dns @.od.cookie @.od.header @.o.HttpHdr C:o.Stubs TCPIPLibs:o.inetlibzm \
TCPIPLibs:o.socklibzm
link $(linkflags) @.od.module @.od.readdata @.od.ses_ctrl @.od.start @.od.protocol \
@.od.writedata @.od.hosttrack @.od.connpool @.o.res @.od.utils @.od.generic \
$(ExtraLibs) @.od.URLclient @.od.config @.od.dates \
@.od.status @.od.dns @.od.cookie @.od.stop @.o.HttpHdr @.od.connect @.od.header C:o.Stubs \
TCPIPLibs:o.inetlibzm TCPIPLibs:o.socklibzm
.SUFFIXES: .c .o .od
.c.od:; cc $(cctflags) -o od.$* $*.c
.c.o:; cc $(ccflags) -o o.$* $*.c