Fundamentals
HTTP messages
Structure
An HTTP message consists of a head and an optional body. The message head of an HTTP
request consists of a request line and a collection of header fields. The message head
of an HTTP response consists of a status line and a collection of header fields. All
HTTP messages must include the protocol version. Some HTTP messages can optionally
enclose a content body.
HttpCore defines the HTTP message object model that closely follows this definition and
provides extensive support for serialization (formatting) and deserialization
(parsing) of HTTP message elements.
Basic operations
HTTP request message
An HTTP request is a message sent from the client to the server. The first line of
that message includes the method to be applied to the resource, the identifier of
the resource, and the protocol version in use.
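The structure of the request line can be illustrated with a minimal sketch in plain Java. The class RequestLineSketch and its parse method are hypothetical names introduced for illustration only; they are not part of the HttpCore API, and the sketch assumes the three parts are separated by single spaces.

```java
/** Hypothetical illustration only; HttpCore provides its own message classes. */
final class RequestLineSketch {
    final String method;
    final String uri;
    final String protocolVersion;

    private RequestLineSketch(String method, String uri, String protocolVersion) {
        this.method = method;
        this.uri = uri;
        this.protocolVersion = protocolVersion;
    }

    /** Splits "method SP request-URI SP protocol-version" into its three parts. */
    static RequestLineSketch parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        return new RequestLineSketch(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        RequestLineSketch line = parse("GET /index.html HTTP/1.1");
        System.out.println(line.method);
        System.out.println(line.uri);
        System.out.println(line.protocolVersion);
    }
}
```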
HTTP response message
An HTTP response is a message sent by the server back to the client after having
received and interpreted a request message. The first line of that message
consists of the protocol version followed by a numeric status code and its
associated textual phrase.
HTTP message common properties and methods
An HTTP message can contain a number of headers describing properties of the
message such as the content length, content type and so on. HttpCore provides
methods to retrieve, add, remove and enumerate headers.
There is an efficient way to obtain all headers of a given type using the
HeaderIterator interface.
HttpCore also provides convenience methods to parse HTTP headers into individual
header elements.
HTTP headers get tokenized into individual header elements only on demand. HTTP
headers received over an HTTP connection are stored internally as an array of
chars and parsed lazily only when their properties are accessed.
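The idea of on-demand tokenization can be sketched in plain Java as follows. LazyHeaderSketch is a hypothetical class, not HttpCore's implementation; it keeps the raw characters and splits the value on commas only at first access. Splitting on commas is a simplifying assumption that ignores quoted strings and element parameters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of lazy header parsing; not HttpCore code. */
final class LazyHeaderSketch {
    private final char[] raw;       // header line exactly as received
    private List<String> elements;  // filled in on first access only
    int parseCount = 0;             // counts how often the value was tokenized

    LazyHeaderSketch(String line) {
        if (line.indexOf(':') < 0) {
            throw new IllegalArgumentException("Not a header line: " + line);
        }
        this.raw = line.toCharArray();
    }

    String getName() {
        String line = new String(raw);
        return line.substring(0, line.indexOf(':')).trim();
    }

    /** Tokenizes the value on first call and returns the cached result afterwards. */
    List<String> getElements() {
        if (elements == null) {
            parseCount++;
            String line = new String(raw);
            String value = line.substring(line.indexOf(':') + 1);
            List<String> parsed = new ArrayList<>();
            for (String part : value.split(",")) {
                String element = part.trim();
                if (!element.isEmpty()) {
                    parsed.add(element);
                }
            }
            elements = Collections.unmodifiableList(parsed);
        }
        return elements;
    }
}
```

A header that is only stored or forwarded never pays the parsing cost in this scheme, which is the motivation the paragraph above describes.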
HTTP entity
HTTP messages can carry a content entity associated with the request or response.
Entities can be found in some requests and in some responses, as they are optional.
Requests that use entities are referred to as entity enclosing requests. The HTTP
specification defines two entity enclosing methods: POST and PUT. Responses are
usually expected to enclose a content entity. There are exceptions to this rule, such
as responses to the HEAD method and 204 No Content, 304 Not Modified, 205 Reset Content
responses.
HttpCore distinguishes three kinds of entities, depending on where their content
originates:
streamed:
The content is received from a stream, or generated on the fly. In particular,
this category includes entities being received from a connection. Streamed
entities are generally not repeatable.
self-contained:
The content is in memory or obtained by means that are independent from
a connection or other entity. Self-contained entities are generally repeatable.
wrapping:
The content is obtained from another entity.
This distinction is important for connection management with incoming entities. For
entities that are created by an application and only sent using the HttpCore framework,
the difference between streamed and self-contained is of little importance. In that
case, it is suggested to consider non-repeatable entities as streamed, and those that
are repeatable as self-contained.
Repeatable entities
An entity can be repeatable, meaning its content can be read more than once. This
is only possible with self-contained entities (like
ByteArrayEntity or StringEntity).
Using HTTP entities
Since an entity can represent both binary and character content, it has support
for character encodings (to support the latter, i.e. character content).
The entity is created when executing a request with enclosed content or when the
request was successful and the response body is used to send the result back to
the client.
To read the content from the entity, one can either retrieve the input stream via
the HttpEntity#getContent() method, which returns a
java.io.InputStream, or one can supply an output stream to
the HttpEntity#writeTo(OutputStream) method, which will
return once all content has been written to the given stream.
The EntityUtils class exposes several static methods to
more easily read the content or information from an entity. Instead of reading
the java.io.InputStream directly, one can retrieve the whole
content body in a string / byte array by using the methods from this class.
When the entity has been received with an incoming message, the
HttpEntity#getContentType() and
HttpEntity#getContentLength() methods can be used for
reading the common metadata such as the Content-Type and
Content-Length headers (if they are available). Since the
Content-Type header can contain a character encoding for text
mime-types like text/plain or text/html,
the HttpEntity#getContentEncoding() method is used to
read this information. If the headers aren't available, a length of -1 will be
returned, and null for the content type. If the
Content-Type header is available, a Header object will be
returned.
When creating an entity for an outgoing message, this metadata has to be supplied
by the creator of the entity.
Ensuring release of system resources
In order to ensure proper release of system resources one must close the content
stream associated with the entity.
Please note that the HttpEntity#writeTo(OutputStream)
method is also required to ensure proper release of system resources once the
entity has been fully written out. If this method obtains an instance of
java.io.InputStream by calling
HttpEntity#getContent(), it is also expected to close
the stream in a finally clause.
When working with streaming entities, one can use the
EntityUtils#consume(HttpEntity) method to ensure that
the entity content has been fully consumed and the underlying stream has been
closed.
Creating entities
There are a few ways to create entities. The following implementations are provided
by HttpCore:
BasicHttpEntity
ByteArrayEntity
StringEntity
InputStreamEntity
FileEntity
EntityTemplate
HttpEntityWrapper
BufferedHttpEntity

BasicHttpEntity
This is exactly as the name implies, a basic entity that represents an underlying
stream. This is generally used for the entities received from HTTP messages.
This entity has an empty constructor. After construction it represents no content,
and has a negative content length.
One needs to set the content stream, and optionally the length. This can be done
with the BasicHttpEntity#setContent(InputStream) and
BasicHttpEntity#setContentLength(long) methods
respectively.
ByteArrayEntity
ByteArrayEntity is a self-contained, repeatable entity
that obtains its content from a given byte array. This byte array is supplied
to the constructor.
StringEntity
StringEntity is a self-contained, repeatable entity that
obtains its content from a java.lang.String object. It has
three constructors: one simply constructs with a given java.lang.String
object; the second also takes a character encoding for the data in the
string; the third allows the mime type to be specified.
import java.util.Map;
import java.util.Map.Entry;
import org.apache.http.HttpEntity;
import org.apache.http.entity.StringEntity;

// collect the process environment as "name: value" lines
StringBuilder sb = new StringBuilder();
Map<String, String> env = System.getenv();
for (Entry<String, String> envEntry : env.entrySet()) {
    sb.append(envEntry.getKey()).append(": ")
      .append(envEntry.getValue()).append("\n");
}

// construct without a character encoding (defaults to ISO-8859-1)
HttpEntity myEntity1 = new StringEntity(sb.toString());
// alternatively construct with an encoding (mime type defaults to "text/plain")
HttpEntity myEntity2 = new StringEntity(sb.toString(), "UTF-8");
// alternatively construct with an encoding and a mime type
HttpEntity myEntity3 = new StringEntity(sb.toString(), "text/html", "UTF-8");
InputStreamEntity
InputStreamEntity is a streamed, non-repeatable entity that
obtains its content from an input stream. It is constructed by supplying the input
stream and the content length. The content length is used to limit the amount of
data read from the java.io.InputStream. If the length matches
the content length available on the input stream, then all data will be sent.
Alternatively, a negative content length will read all data from the input stream,
which has the same effect as supplying the exact content length, so the length is most
often used to limit the amount of data read.
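The length semantics described above can be sketched with a bounded stream copy in plain Java. BoundedCopySketch and its methods are hypothetical names used for illustration; this is not how InputStreamEntity is implemented internally, only a sketch of the same rule: a non-negative length caps the number of bytes transferred, a negative length copies until end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of length-limited copying; not HttpCore code. */
final class BoundedCopySketch {

    /** Copies at most {@code length} bytes; a negative length copies everything. */
    static long copy(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        while (length < 0 || total < length) {
            // never request more than the remaining allowance
            int max = length < 0
                    ? buffer.length
                    : (int) Math.min(buffer.length, length - total);
            int n = in.read(buffer, 0, max);
            if (n == -1) {
                break; // end of stream reached before the limit
            }
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Convenience wrapper for in-memory data, used to try the method out. */
    static String copyToString(String data, long length) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copy(new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8)), out, length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```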
FileEntity
FileEntity is a self-contained, repeatable entity that
obtains its content from a file. Since this is mostly used to stream large files
of different types, one needs to supply the content type of the file; for
instance, sending a zip file would require the content type
application/zip, and XML would require application/xml.
EntityTemplate
This is an entity which receives its content from a
ContentProducer interface. Content producers are
objects which produce their content on demand, by writing it out to an output
stream. They are expected to be able to produce their content every time they are
requested to do so. When creating an EntityTemplate, one is
expected to supply a reference to a content producer, which effectively creates
a repeatable entity.
There are no standard content producers in HttpCore. It is basically just a
convenience interface to allow wrapping up complex logic into an entity. To use
this entity one needs to create a class that implements
ContentProducer and overrides the
ContentProducer#writeTo(OutputStream) method. Then, an instance of
the custom ContentProducer will be used to write the
full content body to the output stream. For instance, an HTTP server would serve
static files with the FileEntity, but running CGI programs
could be done with a ContentProducer, inside which
one could implement custom logic to supply the content as it becomes available.
This way one does not need to buffer it in a string and then use a
StringEntity or ByteArrayEntity.
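The producer idea can be sketched without the library as follows. The nested Producer interface is a hypothetical stand-in that mirrors the shape of a writeTo(OutputStream) callback, and GreetingProducer and render are made-up names; none of them are HttpCore types. The point of the sketch is that content is generated anew on every call, which is what makes the resulting entity repeatable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of an on-demand content producer; not HttpCore code. */
final class ProducerSketch {

    /** Stand-in for a content producer: writes its content to the given stream. */
    interface Producer {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Generates the same body on every call instead of holding it in memory. */
    static final class GreetingProducer implements Producer {
        private final String name;

        GreetingProducer(String name) {
            this.name = name;
        }

        @Override
        public void writeTo(OutputStream out) throws IOException {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            writer.write("<html><body>Hello, " + name + "</body></html>");
            writer.flush(); // flush, but leave closing the stream to the caller
        }
    }

    /** Runs a producer against an in-memory stream and returns what it wrote. */
    static String render(Producer producer) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            producer.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```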
HttpEntityWrapper
This is the base class for creating wrapped entities. The wrapping entity holds
a reference to a wrapped entity and delegates all calls to it. Implementations
of wrapping entities can derive from this class and need to override only those
methods that should not be delegated to the wrapped entity.
BufferedHttpEntity
BufferedHttpEntity is a subclass of
HttpEntityWrapper. It is constructed by supplying another entity. It
reads the content from the supplied entity, and buffers it in memory.
This makes it possible to make a repeatable entity from a non-repeatable entity.
If the supplied entity is already repeatable, calls are simply passed through to the
underlying entity.
Blocking HTTP connections
HTTP connections are responsible for HTTP message serialization and deserialization. One
should rarely need to use HTTP connection objects directly. There are higher level protocol
components intended for execution and processing of HTTP requests. However, in some cases
direct interaction with HTTP connections may be necessary, for instance, to access
properties such as the connection status, the socket timeout or the local and remote
addresses.
It is important to bear in mind that HTTP connections are not thread-safe. It is strongly
recommended to limit all interactions with HTTP connection objects to one thread. The only
method of the HttpConnection interface and its sub-interfaces
which is safe to invoke from another thread is HttpConnection#shutdown().
Working with blocking HTTP connections
HttpCore does not provide full support for opening connections because the process of
establishing a new connection - especially on the client side - can be very complex
when it involves one or more authenticating and/or tunneling proxies. Instead, blocking
HTTP connections can be bound to any arbitrary network socket.
HTTP connection interfaces, both client and server, send and receive messages in two
stages. The message head is transmitted first. Depending on properties of the message
head it may be followed by a message body. Please note it is very important to always
close the underlying content stream in order to signal that the processing of
the message is complete. HTTP entities that stream out their content directly from the
input stream of the underlying connection must ensure the content of the message body
is fully consumed for that connection to be potentially re-usable.
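An over-simplified process of client side request execution: bind the connection to an
open socket, send the request head with HttpClientConnection#sendRequestHeader(), send
the request entity (if any) with HttpClientConnection#sendRequestEntity(), call
HttpClientConnection#flush(), read the response head with
HttpClientConnection#receiveResponseHeader(), attach the response entity with
HttpClientConnection#receiveResponseEntity(), consume the entity content, and close the
connection.
An over-simplified process of server side request handling: bind the connection to an
accepted socket, read the request head with HttpServerConnection#receiveRequestHeader(),
attach the request entity (if any) with HttpServerConnection#receiveRequestEntity(),
consume the entity content, send the response head with
HttpServerConnection#sendResponseHeader(), send the response entity with
HttpServerConnection#sendResponseEntity(), call HttpServerConnection#flush(), and close
the connection.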
Please note that one should rarely need to transmit messages using these low level
methods and should use appropriate higher level HTTP service implementations instead.
Content transfer with blocking I/O
HTTP connections manage the process of the content transfer using the
HttpEntity interface. HTTP connections generate an entity object that
encapsulates the content stream of the incoming message. Please note that
HttpServerConnection#receiveRequestEntity() and
HttpClientConnection#receiveResponseEntity() do not retrieve or buffer any
incoming data. They merely inject an appropriate content codec based on the properties
of the incoming message. The content can be retrieved by reading from the content input
stream of the enclosed entity using HttpEntity#getContent().
The incoming data will be decoded automatically and completely transparently for the data
consumer. Likewise, HTTP connections rely on the
HttpEntity#writeTo(OutputStream) method to generate the content of an
outgoing message. If an outgoing message encloses an entity, the content will be
encoded automatically based on the properties of the message.
Supported content transfer mechanisms
Default implementations of HTTP connections support three content transfer mechanisms
defined by the HTTP/1.1 specification:
Content-Length delimited:
The end of the content entity is determined by the value of the
Content-Length header. Maximum entity length:
Long#MAX_VALUE.
Identity coding:
The end of the content entity is demarcated by closing the underlying
connection (end of stream condition). For obvious reasons the identity encoding
can only be used on the server side. Max entity length: unlimited.
Chunk coding:
The content is sent in small chunks. Max entity length: unlimited.
The appropriate content stream class will be created automatically depending on
properties of the entity enclosed with the message.
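To make chunk coding concrete, the following plain Java sketch frames a byte array the way the HTTP/1.1 chunked transfer coding does: each chunk is preceded by its size in hexadecimal and followed by CRLF, and a zero-sized chunk terminates the body. ChunkedSketch is a hypothetical helper for illustration, not the codec HttpCore uses; it omits chunk extensions and trailers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of chunked framing; not HttpCore's codec. */
final class ChunkedSketch {

    private static void put(ByteArrayOutputStream out, String ascii) {
        byte[] bytes = ascii.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }

    /** Encodes content as a sequence of chunks of at most chunkSize bytes. */
    static byte[] encode(byte[] content, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < content.length; off += chunkSize) {
            int len = Math.min(chunkSize, content.length - off);
            put(out, Integer.toHexString(len) + "\r\n"); // chunk size in hex
            out.write(content, off, len);                // chunk data
            put(out, "\r\n");
        }
        put(out, "0\r\n\r\n"); // last chunk of size zero, no trailers
        return out.toByteArray();
    }

    /** Convenience wrapper for ASCII text, used to inspect the framing. */
    static String encodeAscii(String text, int chunkSize) {
        byte[] encoded = encode(text.getBytes(StandardCharsets.US_ASCII), chunkSize);
        return new String(encoded, StandardCharsets.US_ASCII);
    }
}
```

Because the terminating chunk marks the end of the body explicitly, the sender does not need to know the total length in advance and the connection can stay open afterwards, unlike identity coding.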
Terminating HTTP connections
HTTP connections can be terminated either gracefully by calling
HttpConnection#close() or forcibly by calling
HttpConnection#shutdown(). The former tries to flush all buffered data
prior to terminating the connection and may block indefinitely. The
HttpConnection#close() method is not thread-safe. The latter terminates
the connection without flushing internal buffers and returns control to the caller as
soon as possible without blocking for long. The HttpConnection#shutdown()
method is thread-safe.
HTTP exception handling
All HttpCore components potentially throw two types of exceptions: IOException
in case of an I/O failure such as a socket timeout or a socket reset, and
HttpException that signals an HTTP failure such as a violation of
the HTTP protocol. Usually I/O errors are considered non-fatal and recoverable, whereas
HTTP protocol errors are considered fatal and cannot be automatically recovered from.
Protocol exception
ProtocolException signals a fatal HTTP protocol violation that
usually results in an immediate termination of the HTTP message processing.
HTTP protocol processors
An HTTP protocol interceptor is a routine that implements a specific aspect of the HTTP
protocol. Usually protocol interceptors are expected to act upon one specific header or a
group of related headers of the incoming message or populate the outgoing message with one
specific header or a group of related headers. Protocol interceptors can also manipulate
content entities enclosed with messages, transparent content compression / decompression
being a good example. Usually this is accomplished by using the 'Decorator' pattern where
a wrapper entity class is used to decorate the original entity. Several protocol
interceptors can be combined to form one logical unit.
An HTTP protocol processor is a collection of protocol interceptors that implements the
'Chain of Responsibility' pattern, where each individual protocol interceptor is expected
to work on the particular aspect of the HTTP protocol it is responsible for.
Usually the order in which interceptors are executed should not matter as long as they do
not depend on a particular state of the execution context. If protocol interceptors have
interdependencies and therefore must be executed in a particular order, they should be
added to the protocol processor in the same sequence as their expected execution order.
Protocol interceptors must be implemented as thread-safe. Similarly to servlets, protocol
interceptors should not use instance variables unless access to those variables is
synchronized.
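The relationship between interceptors and a processor can be sketched in plain Java. ProcessorSketch, its nested Interceptor interface, the "target.host" attribute name and the header values are all hypothetical and chosen for illustration; they only mimic the pattern described above. Each interceptor is stateless, so it is safe to share between threads, and interceptors run in the order in which they were added.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of an interceptor chain; not HttpCore code. */
final class ProcessorSketch {

    /** One aspect of the protocol: adds or adjusts headers of a message. */
    interface Interceptor {
        void process(Map<String, String> headers, Map<String, Object> context);
    }

    private final List<Interceptor> interceptors = new ArrayList<>();

    ProcessorSketch add(Interceptor interceptor) {
        interceptors.add(interceptor);
        return this;
    }

    /** Applies every interceptor in insertion order. */
    void process(Map<String, String> headers, Map<String, Object> context) {
        for (Interceptor interceptor : interceptors) {
            interceptor.process(headers, context);
        }
    }

    /** Builds a two-interceptor processor and applies it to an empty header set. */
    static Map<String, String> demo() {
        ProcessorSketch processor = new ProcessorSketch()
                .add((h, c) -> h.put("Host", String.valueOf(c.get("target.host"))))
                .add((h, c) -> h.put("User-Agent", "Sketch/1.0"));
        Map<String, Object> context = new HashMap<>();
        context.put("target.host", "example.com");
        Map<String, String> headers = new LinkedHashMap<>();
        processor.process(headers, context);
        return headers;
    }
}
```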
Standard protocol interceptors
HttpCore comes with a number of the most essential protocol interceptors for client and
server HTTP processing.
RequestContent
RequestContent is the most important interceptor for
outgoing requests. It is responsible for delimiting content length by adding
Content-Length or Transfer-Encoding headers
based on the properties of the enclosed entity and the protocol version. This
interceptor is required for correct functioning of client side protocol processors.
ResponseContent
ResponseContent is the most important interceptor for
outgoing responses. It is responsible for delimiting content length by adding
Content-Length or Transfer-Encoding headers
based on the properties of the enclosed entity and the protocol version. This
interceptor is required for correct functioning of server side protocol processors.
RequestConnControl
RequestConnControl is responsible for adding the
Connection header to the outgoing requests, which is essential
for managing persistence of HTTP/1.0 connections. This
interceptor is recommended for client side protocol processors.
ResponseConnControl
ResponseConnControl is responsible for adding the
Connection header to the outgoing responses, which is essential
for managing persistence of HTTP/1.0 connections. This
interceptor is recommended for server side protocol processors.
RequestDate
RequestDate is responsible for adding the
Date header to the outgoing requests. This interceptor is
optional for client side protocol processors.
ResponseDate
ResponseDate is responsible for adding the
Date header to the outgoing responses. This interceptor is
recommended for server side protocol processors.
RequestExpectContinue
RequestExpectContinue is responsible for enabling the
'expect-continue' handshake by adding the Expect header. This
interceptor is recommended for client side protocol processors.
RequestTargetHost
RequestTargetHost is responsible for adding the
Host header. This interceptor is required for client side
protocol processors.
RequestUserAgent
RequestUserAgent is responsible for adding the
User-Agent header. This interceptor is recommended for client
side protocol processors.
ResponseServer
ResponseServer is responsible for adding the
Server header. This interceptor is recommended for server side
protocol processors.
Working with protocol processors
Usually HTTP protocol processors are used to pre-process incoming messages prior to
executing application specific processing logic and to post-process outgoing messages.
On the client side, for instance, the outgoing request is run through the protocol
processor, the request is sent to the target host and a response is received, and the
response is then run through the same protocol processor.
Please note the BasicHttpProcessor class does not synchronize
access to its internal structures and therefore may not be thread-safe.
HTTP context
Protocol interceptors can collaborate by sharing information - such as a processing
state - through an HTTP execution context. HTTP context is a structure that can be
used to map an attribute name to an attribute value. Internally HTTP context
implementations are usually backed by a HashMap. The primary
purpose of the HTTP context is to facilitate information sharing among various
logically related components. HTTP context can be used to store a processing state for
one message or several consecutive messages. Multiple logically related messages can
participate in a logical session if the same context is reused between consecutive
messages.
HttpContext instances can be linked together to form a
hierarchy. In the simplest form one context can use content of another context to
obtain default values of attributes not present in the local context.
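The linking of contexts can be sketched as a map with an optional parent that supplies defaults. ContextSketch and the attribute names below are hypothetical and serve only to illustrate the lookup rule described above: a value set locally wins, otherwise the parent is consulted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of a context hierarchy; not HttpCore code. */
final class ContextSketch {
    private final ContextSketch parent; // may be null for a root context
    private final Map<String, Object> attributes = new HashMap<>();

    ContextSketch(ContextSketch parent) {
        this.parent = parent;
    }

    void setAttribute(String id, Object value) {
        attributes.put(id, value);
    }

    /** Looks in the local map first and falls back to the parent for defaults. */
    Object getAttribute(String id) {
        Object value = attributes.get(id);
        if (value == null && parent != null) {
            return parent.getAttribute(id);
        }
        return value;
    }

    public static void main(String[] args) {
        ContextSketch parent = new ContextSketch(null);
        parent.setAttribute("param1", 1);
        parent.setAttribute("param2", "parent value");
        ContextSketch local = new ContextSketch(parent);
        local.setAttribute("param2", "local value");
        System.out.println(local.getAttribute("param1")); // inherited from the parent
        System.out.println(local.getAttribute("param2")); // overridden locally
    }
}
```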
HTTP parameters
The HttpParams interface represents a collection of immutable
values that define a runtime behavior of a component. In many ways HttpParams
is similar to HttpContext. The main
distinction between the two lies in their use at runtime. Both interfaces represent a
collection of objects that are organized as a map of textual names to object values, but
serve distinct purposes:
HttpParams is intended to contain simple objects:
integers, doubles, strings, collections and objects that remain immutable at
runtime. HttpParams is expected to be used in the
'write once - read many' mode. HttpContext is
intended to contain complex objects that are very likely to mutate in the course of
HTTP message processing.
The purpose of HttpParams is to define a behavior of
other components. Usually each complex component has its own
HttpParams object. The purpose of HttpContext
is to represent an execution state of an HTTP process. Usually
the same execution context is shared among many collaborating objects.
HttpParams instances, like HttpContext instances,
can be linked together to form a hierarchy. In the simplest form one set of parameters can
use content of another one to obtain default values of parameters not present in the local
set.
Please note the BasicHttpParams class does not synchronize access to
its internal structures and therefore may be thread-unsafe.
HTTP parameter beans
The HttpParams interface allows for a great deal of
flexibility in handling configuration of components. Most importantly, new parameters
can be introduced without affecting binary compatibility with older versions. However,
HttpParams also has a certain disadvantage compared to
regular Java beans: HttpParams cannot be assembled using
a DI framework. To mitigate the limitation, HttpCore includes a number of bean classes
that can be used in order to initialize HttpParams objects
using standard Java bean conventions.
Blocking HTTP protocol handlers
HTTP service
HttpService is a server side HTTP protocol handler based on the
blocking I/O model that implements the essential requirements of the HTTP protocol for
the server side message processing as described by RFC 2616.
HttpService relies on an HttpProcessor
instance to generate mandatory protocol headers for all outgoing
messages and apply common, cross-cutting message transformations to all incoming and
outgoing messages, whereas HTTP request handlers are expected to take care of
application specific content generation and processing.
HTTP request handlers
The HttpRequestHandler interface represents a
routine for processing of a specific group of HTTP requests. HttpService
is designed to take care of protocol specific aspects, whereas
individual request handlers are expected to take care of application specific HTTP
processing. The main purpose of a request handler is to generate a response object
with a content entity to be sent back to the client in response to the given
request.
Request handler resolver
HTTP request handlers are usually managed by an
HttpRequestHandlerResolver that matches a request URI to a request
handler. HttpCore includes a very simple implementation of the request handler
resolver based on a trivial pattern matching algorithm:
HttpRequestHandlerRegistry supports only three formats:
*, <uri>* and
*<uri>.
Users are encouraged to provide more sophisticated implementations of
HttpRequestHandlerResolver - for instance, based on
regular expressions.
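The three supported pattern formats can be illustrated with a small matcher in plain Java. PatternRegistrySketch is a hypothetical class, handlers are represented by plain strings, and the precedence rule used here (exact match first, then the longest matching pattern) is an assumption made for the sketch rather than a statement about how HttpRequestHandlerRegistry resolves conflicts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of wildcard URI matching; not HttpCore code. */
final class PatternRegistrySketch {
    private final Map<String, String> handlers = new LinkedHashMap<>();

    void register(String pattern, String handlerName) {
        handlers.put(pattern, handlerName);
    }

    /** Supports "*", "<uri>*" (prefix match) and "*<uri>" (suffix match). */
    static boolean matches(String pattern, String path) {
        if (pattern.equals("*")) {
            return true;
        }
        if (pattern.endsWith("*")
                && path.startsWith(pattern.substring(0, pattern.length() - 1))) {
            return true;
        }
        return pattern.startsWith("*") && path.endsWith(pattern.substring(1));
    }

    /** Returns the handler name for the path, or null if nothing matches. */
    String lookup(String path) {
        String exact = handlers.get(path);
        if (exact != null) {
            return exact;
        }
        String best = null;
        for (String pattern : handlers.keySet()) {
            // assumed precedence: the longest matching pattern wins
            if (matches(pattern, path) && (best == null || pattern.length() > best.length())) {
                best = pattern;
            }
        }
        return best == null ? null : handlers.get(best);
    }
}
```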
Using HTTP service to handle requests
When fully initialized and configured, the HttpService can
be used to execute and handle requests for active HTTP connections. The
HttpService#handleRequest() method reads an incoming
request, generates a response and sends it back to the client. This method can be
executed in a loop to handle multiple requests on a persistent connection. The
HttpService#handleRequest() method is safe to execute from
multiple threads. This allows processing of requests on several connections
simultaneously, as long as all the protocol interceptors and requests handlers used
by the HttpService are thread-safe.
HTTP request executor
HttpRequestExecutor is a client side HTTP protocol handler based
on the blocking I/O model that implements the essential requirements of the HTTP
protocol for the client side message processing, as described by RFC 2616.
HttpRequestExecutor relies on an HttpProcessor
instance to generate mandatory protocol headers for all outgoing
messages and apply common, cross-cutting message transformations to all incoming and
outgoing messages. Application specific processing can be implemented outside
HttpRequestExecutor once the request has been executed and a
response has been received.
Methods of HttpRequestExecutor are safe to execute from multiple
threads. This allows execution of requests on several connections simultaneously, as
long as all the protocol interceptors used by the HttpRequestExecutor
are thread-safe.
Connection persistence / re-use
The ConnectionReuseStrategy interface is intended to
determine whether the underlying connection can be re-used for processing of further
messages after the transmission of the current message has been completed. The default
connection re-use strategy attempts to keep connections alive whenever possible.
Firstly, it examines the version of the HTTP protocol used to transmit the message.
HTTP/1.1 connections are persistent by default, while
HTTP/1.0 connections are not. Secondly, it examines the value of the
Connection header. The peer can indicate whether it intends to
re-use the connection on the opposite side by sending Keep-Alive or
Close values in the Connection header. Thirdly,
the strategy makes the decision whether the connection is safe to re-use based on the
properties of the enclosed entity, if available.
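The decision described above can be condensed into a small plain Java sketch. KeepAliveSketch and its parameters are hypothetical and greatly simplified: the bodyDelimited flag stands in for the entity check and is assumed to be true when the end of the message body is known from a Content-Length header or from chunk coding. The checks are evaluated in a different order than in the prose, but are meant to combine to the same outcome: an explicit Connection token overrides the protocol default, and a body whose end can only be signalled by closing the connection always prevents re-use.

```java
/** Hypothetical illustration of a keep-alive decision; not HttpCore code. */
final class KeepAliveSketch {

    /**
     * @param protocolVersion  for example "HTTP/1.0" or "HTTP/1.1"
     * @param connectionHeader value of the Connection header, or null if absent
     * @param bodyDelimited    whether the end of the body is known without closing
     */
    static boolean keepAlive(String protocolVersion, String connectionHeader,
                             boolean bodyDelimited) {
        if (!bodyDelimited) {
            return false; // the end of the body is signalled by closing the connection
        }
        if (connectionHeader != null) {
            for (String token : connectionHeader.split(",")) {
                String t = token.trim();
                if (t.equalsIgnoreCase("close")) {
                    return false;
                }
                if (t.equalsIgnoreCase("keep-alive")) {
                    return true;
                }
            }
        }
        // no explicit instruction: HTTP/1.1 is persistent by default, HTTP/1.0 is not
        return protocolVersion.equals("HTTP/1.1");
    }
}
```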