 ------
 Guide to Advanced HTTP Wagon Configuration
 ------
 John Casey
 ------
 2011-12-12
 ------

~~ Licensed to the Apache Software Foundation (ASF) under one
~~ or more contributor license agreements. See the NOTICE file
~~ distributed with this work for additional information
~~ regarding copyright ownership. The ASF licenses this file
~~ to you under the Apache License, Version 2.0 (the
~~ "License"); you may not use this file except in compliance
~~ with the License. You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing,
~~ software distributed under the License is distributed on an
~~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~~ KIND, either express or implied. See the License for the
~~ specific language governing permissions and limitations
~~ under the License.

~~ NOTE: For help with the syntax of this file, see:
~~ http://maven.apache.org/doxia/references/apt-format.html

Advanced Configuration of the HttpClient HTTP Wagon

%{toc}

*Notice on Maven Versioning and Availability

**Maven 2.2.0

  Starting in <<Maven 2.2.0>>, the HttpClient wagon became the HTTP wagon implementation in use.
  The remainder of this document deals specifically with the differences between the
  HttpClient- and Sun-based HTTP wagons.

**Maven 2.2.1

  Due to several critical issues introduced by the HttpClient-based HTTP wagon,
  <<Maven 2.2.1>> reverted to the Sun implementation (a.k.a. 'lightweight') of the HTTP wagon
  as the default for HTTP/HTTPS transfers. The issues with the HttpClient-based wagon were mainly
  related to checksums, transfer timeouts, and NTLM proxies, and were the primary reason for
  releasing <<2.2.1>> in the first place.

  However, starting in <<Maven 2.2.1>> you have a choice: you can use the default wagon
  implementation for a given protocol, or you can select an alternative wagon <<<provider>>>
  you'd like to use on a per-protocol basis.
  For more information, see the {{{./guide-wagon-providers.html}Guide to Wagon Providers}} \[3\].

**Maven 3.0.4

  With 3.0.4, the default wagon for http(s) is now the HttpClient-based wagon, built on
  {{{http://hc.apache.org/httpcomponents-client-ga}Apache Http Client 4.1.2}}.
  It now uses HTTP connection pooling, which avoids reopening an http(s) connection to the
  remote server for each request. The pool is configurable through some parameters \[4\].

  This new default wagon comes with some default configuration:

  * http(s) connection pool: defaults to 20.

  * readTimeout: defaults to 1800000ms (~30 minutes) (see section <<<Read time out>>> below)

  * preemptive authentication: used by default only with PUT (GET no longer uses preemptive
    authentication by default)

*Introduction

  Using the HttpClient-based HTTP wagon, you have a lot more control over the configuration used
  to access HTTP-based Maven repositories. For starters, you have fine-grained control over which
  HTTP headers are used when resolving artifacts. In addition, you can configure a wide range of
  parameters to control the behavior of HttpClient itself. Best of all, you can control these
  headers and parameters for all requests, or for individual request types (Maven issues GET, HEAD,
  and PUT requests for different parts of the artifact-management subsystem).

*The Basics

  Without any special configuration, Maven's HTTP wagon will use some default HTTP headers and
  client parameters when managing artifacts. The default headers are:

+---+
Cache-control: no-cache
Cache-store: no-store
Pragma: no-cache
Expires: 0
Accept-Encoding: gzip
+---+

  In addition, PUT requests made with the HTTP wagon will use the following HttpClient parameter:

+---+
http.protocol.expect-continue=true
+---+

  From the HttpClient documentation\[2\], this parameter provides the following functionality:

-----
  Activates 'Expect: 100-Continue' handshake for the entity enclosing methods.
  The 'Expect: 100-Continue' handshake allows a client that is sending a request
  message with a request body to determine if the origin server is willing to
  accept the request (based on the request headers) before the client sends the
  request body.

  The use of the 'Expect: 100-continue' handshake can result in noticeable performance
  improvement for entity enclosing requests (such as POST and PUT) that require
  the target server's authentication.

  'Expect: 100-continue' handshake should be used with caution, as it may cause
  problems with HTTP servers and proxies that do not support HTTP/1.1 protocol.
-----

  Without this setting, PUT requests that require authentication will transfer their entire
  payload to the server before that server issues an authentication challenge. In order to
  complete the PUT request, the client must then re-send the payload with the proper credentials
  specified in the HTTP headers. This results in twice the bandwidth usage, and twice the time to
  transfer each artifact.

  Another option to avoid this double transfer is what's known as preemptive authentication,
  which involves sending the authentication headers along with the original PUT request. However,
  there are a few potential issues with this approach. For one thing, in the event you have an
  unused <<<\<server\>>>> entry that specifies an invalid username/password combination, some
  servers may respond with a <<<401 Unauthorized>>> even if the server doesn't actually require
  any authentication for the request. In addition, blindly sending authentication credentials
  with every request, regardless of whether the server has made a challenge, can result in a
  security hole, since the server may not make provisions to secure credentials for paths that
  don't require authentication.

  We'll discuss preemptive authentication in another example, below.
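
  If a server or proxy in front of your repository does not cope well with the
  'Expect: 100-Continue' handshake, one option is to switch the parameter off for PUT requests.
  The fragment below is a minimal sketch rather than a tested configuration. It assumes the
  <<<\<httpConfiguration\>>>> layout and the <<<%b,>>> boolean prefix described later in this
  guide, and <<<my-server>>> is a placeholder server id.

+---+
<settings>
  <servers>
    <server>
      <!-- placeholder id; it must match the id of the repository you deploy to -->
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <!-- only PUT requests are affected -->
          <put>
            <params>
              <param>
                <name>http.protocol.expect-continue</name>
                <!-- the %b, prefix marks the value as a boolean -->
                <value>%b,false</value>
              </param>
            </params>
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+

  Note the trade-off: with the handshake disabled, an authenticated PUT may again send its
  payload twice, as described above.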

*Configuring GET, HEAD, PUT, or All of the Above

  In all of the examples below, it's important to understand that we can configure the HTTP
  settings for all requests made to a given server, or for only one method. To configure all
  methods for a server, you'd use the following section of the <<<settings.xml>>> file:

+---+
<settings>
  [...]
  <servers>
    <server>
      <id>the-server</id>
      <configuration>
        <httpConfiguration>
          <all>
            [ Your configuration here. ]
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+

  On the other hand, if you can live with the default configuration for most requests - say,
  HEAD and GET requests, which are used to check for the existence of a file and retrieve a file
  respectively - maybe you only need to configure the PUT method:

+---+
<settings>
  [...]
  <servers>
    <server>
      <id>the-server</id>
      <configuration>
        <httpConfiguration>
          <put>
            [ Your configuration here. ]
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+

  For clarity, the other two sections are <<<\<get\>>>> for GET requests, and <<<\<head\>>>> for
  HEAD requests. I know that's going to be hard to remember...

*Taking Control of Your HTTP Headers

  As you may have noticed above, the default HTTP headers do have the potential to cause
  problems. For instance, some websites set the encoding for downloading GZipped files as
  <<<gzip>>>, in spite of the fact that the HTTP request itself isn't being sent using GZip
  compression. If the client is using the <<<Accept-Encoding: gzip>>> header, this can result in
  the client itself decompressing the GZipped file and writing the decompressed file to the local
  disk with the original filename. This can be misleading to say the least, and can use up an
  inordinate amount of disk space on the local computer.

  To turn off this default behavior, we'll simply disable the default headers. Then, we'll need
  to respecify the other headers that we are still interested in, like this:

+---+
<settings>
  [...]
  <servers>
    <server>
      <id>openssl</id>
      <configuration>
        <httpConfiguration>
          <put>
            <useDefaultHeaders>false</useDefaultHeaders>
            <headers>
              <header>
                <name>Cache-control</name>
                <value>no-cache</value>
              </header>
              <header>
                <name>Cache-store</name>
                <value>no-store</value>
              </header>
              <header>
                <name>Pragma</name>
                <value>no-cache</value>
              </header>
              <header>
                <name>Expires</name>
                <value>0</value>
              </header>
              <header>
                <name>Accept-Encoding</name>
                <value>*</value>
              </header>
            </headers>
          </put>
        </httpConfiguration>
      </configuration>
    </server>
    [...]
  </servers>
  [...]
</settings>
+---+

*Fine-Tuning HttpClient Parameters

  Going beyond the power of HTTP request parameters, HttpClient provides a host of other
  configuration options. In most cases, you won't need to customize these. But in case you do,
  Maven lets you specify your own fine-grained configuration for HttpClient. Again, you can
  specify these parameter customizations per-method (HEAD, GET, or PUT), or for all methods of
  interacting with a given server. For a complete list of supported parameters, see the link
  \[2\] in the Resources section below.

**Non-String Parameter Values

  Many of the configuration parameters for HttpClient have simple string values; however, there
  are important exceptions to this. In some cases, you may need to specify boolean, integer, or
  long values. In others, you may even need to specify a collection of string values. You can
  specify these using a simple formatting syntax, as follows:

  [[1]] <<boolean:>> <<<%b,\<value\>>>>

  [[2]] <<integer:>> <<<%i,\<value\>>>>

  [[3]] <<long:>> <<<%l,\<value\>>>> (yes, that's an 'L', not a '1')

  [[4]] <<double:>> <<<%d,\<value\>>>>

  [[5]] <<collection:>> <<<%c,\<value1\>,\<value2\>,\<value3\>,...>>>, which could also be
        specified as:

+---+
%c,
<value1>,
<value2>,
<value3>,
...
+---+

  []

  As you may have noticed, this syntax is similar to the format-and-data strategy used by
  functions like <<<sprintf()>>> in many languages. The syntax has been chosen with this
  similarity in mind, to make it a little more intuitive to use.
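
  To make the prefixes above concrete, here is a minimal sketch that sets one integer and one
  long parameter for all request methods. It is an illustration under assumptions, not a
  recommended configuration: <<<my-server>>> is a placeholder id, the timeout values are
  arbitrary, and the parameter names are taken from the HttpClient 3.x preference guide \[2\],
  so they should be checked against the HttpClient version your wagon actually uses.

+---+
<settings>
  <servers>
    <server>
      <!-- placeholder id -->
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <all>
            <params>
              <param>
                <!-- integer value, hence the %i, prefix (assumed: milliseconds) -->
                <name>http.socket.timeout</name>
                <value>%i,10000</value>
              </param>
              <param>
                <!-- long value, hence the %l, prefix (assumed: milliseconds) -->
                <name>http.connection-manager.timeout</name>
                <value>%l,5000</value>
              </param>
            </params>
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+

  A parameter that expects a plain string needs no prefix at all; the prefix is only there to
  tell the wagon which type to convert the text after the comma into.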

**Example: Using Preemptive Authentication

  Using the above syntax, we can configure preemptive authentication for PUT requests using the
  boolean HttpClient parameter <<<http.authentication.preemptive>>>, like this:

+---+
<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <put>
            <params>
              <param>
                <name>http.authentication.preemptive</name>
                <value>%b,true</value>
              </param>
            </params>
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+

**Ignoring Cookies

  Like the example above, telling the HttpClient to ignore cookies for all request methods is a
  simple matter of configuring the <<<http.protocol.cookie-policy>>> parameter (it uses a regular
  string value, so no special syntax is required):

+---+
<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <all>
            <params>
              <param>
                <name>http.protocol.cookie-policy</name>
                <value>ignore</value>
              </param>
            </params>
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+

  The configuration above can be useful in cases where the repository is using cookies - like the
  session cookies that are often mistakenly turned on or left on in appservers - alongside HTTP
  redirection. In these cases, it becomes far more likely that the cookie issued by the appserver
  will use a <<<Path>>> that is inconsistent with the one used by the client to access the
  server. If you have this problem, and know that you don't need to use this session cookie, you
  can ignore cookies from this server with the above configuration.

*Support for General-Wagon Configuration Standards

  It should be noted that configuration options previously available in the HttpClient-driven
  HTTP wagon are still supported in addition to this new, fine-grained approach. These include
  the configuration of HTTP headers and connection timeouts. Let's examine each of these briefly:

**HTTP Headers

  In all HTTP Wagon implementations, you can add your own HTTP headers like this:

+---+
<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpHeaders>
          <httpHeader>
            <name>Foo</name>
            <value>Bar</value>
          </httpHeader>
        </httpHeaders>
      </configuration>
    </server>
  </servers>
</settings>
+---+

  It's important to understand that the above approach doesn't allow you to turn off all of the
  default HTTP headers; nor does it allow you to specify headers on a per-method basis. However,
  this configuration remains available in both the lightweight and HttpClient-based Wagon
  implementations.

**Connection Timeouts

  All wagon implementations that extend the <<<AbstractWagon>>> class, including those for SCP,
  HTTP, FTP, and more, allow the configuration of a connection timeout, which tells Maven how
  long to wait before giving up on a connection that has not responded. This option is preserved
  in the HttpClient-based wagon, but this wagon also provides a fine-grained alternative
  configuration that lets you specify timeouts per-method for a given server. The old
  configuration option - which is still supported - looks like this:

+---+
<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <timeout>6000</timeout> <!-- milliseconds -->
      </configuration>
    </server>
  </servers>
</settings>
+---+

  ...while the new configuration option looks more like this:

+---+
<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <put>
            <connectionTimeout>10000</connectionTimeout> <!-- milliseconds -->
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+

  If all you need is a per-server timeout configuration, you still have the option to use the old
  <<<\<timeout\>>>> parameter. If you need to separate timeout preferences according to HTTP
  method, you can use one more like that specified directly above.

** Read time out

  With Wagon 2.0 and Apache Maven 3.0.4, the read timeout defaults to 30 minutes. If you want to
  change this value, you can add the following setup to your settings:

+---+
<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <all>
            <readTimeout>120000</readTimeout> <!-- milliseconds -->
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+

*Resources

  [[1]] {{{http://hc.apache.org/httpclient-3.x/}HttpClient website}}

  [[2]] {{{http://hc.apache.org/httpclient-3.x/preference-api.html}HttpClient preference architecture and configuration guide}}

  [[3]] {{{./guide-wagon-providers.html}Guide to Wagon Providers}}

  [[4]] {{{http://maven.apache.org/wagon/wagon-providers/wagon-http/}Wagon Http}}

  []