FEATURE INTRODUCTION
====================

Feature Name:
-------------
	Congestion Control

Synopsis of Feature: 
-------------------- 

The main purpose of this congestion control feature is to keep track
of which hosts are congested so that TS will not forward requests to
those congested hosts; instead, TS will send back to clients a
Retry-After response to tell them to retry congested hosts at a later
time.

CASE 1: Connection Failures
---------------------------
(1) For each request to a live (non-congested) server, TS will try at most m
    times to connect to the server, and the timeout is n seconds for each try.
    If TS does not succeed with m tries, then one connection failure is counted
    towards the server.

    Note that if a client aborts a request before a timeout occurs, it does not
    count as a connection failure.

(2) A server is marked congested if there are more than M connection failures
    within N seconds.

(3) If a server is marked congested, then TS will not forward requests to it
    until the Proxy Retry After Time (PRAT), which is the time at which the
    server was marked congested + t.

(4) For a request to a congested server before the server's PRAT time, TS sends
    a Retry-After response to tell the client to retry the request after
    Client Retry After Time (CRAT) (= PRAT - current time + T + a random
    integer from 0 to alpha).

(5) For a request to a congested server after the server's PRAT time, TS will
    try at most m' times to connect to the server, and the timeout is n' 
    seconds for each try.

(6) A congested server stays congested if TS cannot make a successful
    connection; otherwise, the server becomes live again.
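
The CRAT formula in step (4) can be sketched as follows. This is a minimal
illustration only, not the TS implementation; the function name and its
parameter names are hypothetical, and all times are assumed to be in seconds.

```python
import random

def client_retry_after(prat, now, client_wait_interval, alpha, rng=random):
    """Seconds to advertise in Retry-After: PRAT - now + T + random int in [0, alpha].

    Hypothetical helper: prat and now are timestamps in seconds,
    client_wait_interval is T, and rng only needs a randint(a, b) method.
    """
    return (prat - now) + client_wait_interval + rng.randint(0, alpha)

# A server marked congested at time 1000 with t = 10 has PRAT = 1010.
prat = 1000 + 10
# A request arriving at time 1004 with T = 300 and alpha = 30 is told to
# retry after somewhere between 306 and 336 seconds.
delay = client_retry_after(prat, now=1004, client_wait_interval=300, alpha=30)
```

The random term spreads client retries out so that they do not all return at
the same instant.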

CASE 2: Maximum Number of Connections
-------------------------------------
TS will temporarily mark a server as congested when the number of connections
to the server reaches "max_connection". If a new client request comes in and
needs a new connection to the server, the client gets a 503 Retry-After
response back. There is no PRAT for servers that have reached "max_connection".

Here a server can be identified by IP address (per_ip) or by host name 
(per_host). For example, www.inktomi.com has two IP addresses, 
209.131.63.206 and 209.131.63.207. If per_ip is used, then each IP address
has its own number of connection failures, and each IP address will be marked
congested or not by itself. That is, if 206 is marked congested (but 207 is 
not), requests can still be forwarded to 207. On the other hand, if per_host is
used, then one connection failure to either 206 or 207 will be counted toward
the number of connection failures of host www.inktomi.com. If the host
www.inktomi.com is marked congested, then essentially both 206 and 207 are
marked congested. 
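
The difference between the two schemes comes down to which key a failure is
counted under. The sketch below illustrates that with the example addresses
above; the function names and the tuple keys are assumptions made for the
illustration and do not reflect the internal TS data layout.

```python
from collections import Counter

def failure_key(scheme, hostname, ip):
    """Pick the counter key for one connection failure (hypothetical helper)."""
    if scheme == "per_ip":
        return ("ip", ip)
    if scheme == "per_host":
        return ("host", hostname)
    raise ValueError("unknown congestion_scheme: %r" % (scheme,))

def count_failures(scheme, failures):
    """Tally (hostname, ip) failure events under the given scheme."""
    counts = Counter()
    for hostname, ip in failures:
        counts[failure_key(scheme, hostname, ip)] += 1
    return counts

failures = [
    ("www.inktomi.com", "209.131.63.206"),
    ("www.inktomi.com", "209.131.63.206"),
    ("www.inktomi.com", "209.131.63.207"),
]
per_ip = count_failures("per_ip", failures)      # 206 and 207 counted separately
per_host = count_failures("per_host", failures)  # one shared count for the host
```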

A prefix can also be used as a secondary specifier to narrow the scope of
congestion control to a sub-host (service) area. For example,
   dest_host=www.inktomi.com prefix=/cgi/search.exe
This rule can detect a failure of the CGI program or of the database it
depends on. Each rule has an independent counter: errors of requests to
www.inktomi.com/index.html are not counted toward the counter of this rule.
A prefix=/cgi/ rule means that all requests for objects under /cgi/ share
one common counter with the specified parameters. It does not mean that each
URI under the directory has its own counter.
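
The "one counter per rule, not per URI" behavior can be sketched as below.
This is a simplified illustration that assumes plain string-prefix matching
on the URL path; the function name and the returned key shape are
hypothetical.

```python
def counter_key(host, path, prefixes):
    """Return the shared counter key for a request, or None if no prefix rule applies.

    Assumption for this sketch: a rule matches when the request path starts
    with the rule's prefix, and the first matching prefix wins.
    """
    for prefix in prefixes:
        if path.startswith(prefix):
            return (host, prefix)
    return None

rules = ["/cgi/"]
a = counter_key("www.inktomi.com", "/cgi/search.exe", rules)
b = counter_key("www.inktomi.com", "/cgi/other.exe", rules)
c = counter_key("www.inktomi.com", "/index.html", rules)
# a and b are the same key, so both requests share one counter;
# c is None, so /index.html is not counted by the /cgi/ rule.
```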

The TS administrator can specify a customizable error_page that returns the
reason for the congestion (for example: "The site is under maintenance") with
the '503' response. The error_page can also include the URL of the congested
page and the Retry-After time.

ENGINEERING DESCRIPTION:
========================

Risk points of feature:
-----------------------

The set of "origin server connect attempts" configuration variables in
records.config will be affected by this feature. See the "Management
Implications" section below and the "Synopsis of Feature" section above
for more information.

Also, there are some known problematic/unexpected behaviors of this feature.
See the following "Problematic behavior" section.

Effect on SDK/API:
------------------
	None.

Management Implications:
-----------------------

<Record Changes> 
There are two new config variables: one to enable/disable/test congestion
control, and one to name the congestion control config file.
	proxy.config.http.congestion_control.enabled INT 0|1|2
	proxy.config.http.congestion_control.filename STRING congestion.config


<Statistics Changes>
1.  Number of congestions because of connection failures
    stat name: proxy.process.congestion.congested_on_conn_failures
2.  Number of congestions because of max_connection reached 
    stat name: proxy.process.congestion.congested_on_max_connection


<Config File>
A new .config file "congestion.config" is used to specify the parameters for 
different servers.

Each rule has one primary key to identify the servers. The primary key
can be one of:
	dest_host=
	dest_domain=
	dest_ip=
	regex_host=

Each rule can also have secondary keys. The secondary keys are:
	prefix=         // for different directory / service
	port=           // for different server ports

The tag=value pairs are used to specify the rules:

	max_connection_failures=<integer>	//  M
	fail_window=<integer>			//  N
	proxy_retry_interval=<integer>		//  t
	client_wait_interval=<integer>		//  T
	wait_interval_alpha=<integer>		//  alpha
	live_os_conn_timeout=<integer>		//  n
	live_os_conn_retries=<integer>		//  m
	dead_os_conn_timeout=<integer>		//  n'
	dead_os_conn_retries=<integer>		//  m'
	max_connection=<integer>		// -1 means unlimited
	error_page=<page uri>          
	congestion_scheme=per_ip|per_host
	snmp=on|off

The suggested default values are as follows:
	max_connection_failures=5
	fail_window=120
	proxy_retry_interval=10
	client_wait_interval=300
	wait_interval_alpha=30
	live_os_conn_timeout=60
	live_os_conn_retries=2
	dead_os_conn_timeout=15
	dead_os_conn_retries=1
	max_connection=-1
	error_page="congestion#retryAfter"
	congestion_scheme="per_ip"
	snmp="on"

The above tag values will be used as default if the tag is not specified 
in the rule.

The default values can be overridden by setting the records.config variables
CONFIG proxy.config.http.congestion_control.default.<tag> <INT|STRING> <value>
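
The layering described above (suggested defaults, then records.config
overrides, then the tags given in the rule itself) can be sketched as
follows. This is only an illustration: the function names are hypothetical,
and it assumes a rule is a single line of whitespace-separated tag=value
pairs without quoted values.

```python
SUGGESTED_DEFAULTS = {
    "max_connection_failures": 5,
    "fail_window": 120,
    "proxy_retry_interval": 10,
    "client_wait_interval": 300,
    "wait_interval_alpha": 30,
    "live_os_conn_timeout": 60,
    "live_os_conn_retries": 2,
    "dead_os_conn_timeout": 15,
    "dead_os_conn_retries": 1,
    "max_connection": -1,
    "error_page": "congestion#retryAfter",
    "congestion_scheme": "per_ip",
    "snmp": "on",
}

def parse_rule(line):
    """Split one rule line into a dict; integer tags are converted to int."""
    rule = {}
    for token in line.split():
        tag, _, value = token.partition("=")
        default = SUGGESTED_DEFAULTS.get(tag)
        rule[tag] = int(value) if isinstance(default, int) else value
    return rule

def effective_settings(rule_line, records_overrides=None):
    """Defaults first, then records.config overrides, then the rule's own tags."""
    settings = dict(SUGGESTED_DEFAULTS)
    settings.update(records_overrides or {})
    settings.update(parse_rule(rule_line))
    return settings

settings = effective_settings(
    "dest_host=www.inktomi.com prefix=/cgi/ max_connection_failures=3",
    records_overrides={"fail_window": 240},
)
# max_connection_failures comes from the rule, fail_window from the
# override, and every other tag from the suggested defaults.
```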

The following "origin server connect attempts" configuration variables may
be affected by this congestion control feature:
	proxy.config.http.connect_attempts_max_retries
	proxy.config.http.connect_attempts_max_retries_dead_server
	proxy.config.http.connect_attempts_rr_retries
	proxy.config.http.connect_attempts_timeout
	proxy.config.http.down_server.cache_time
	proxy.config.http.down_server.abort_threshold

For a request to a server that does not have an applicable rule in
congestion.config, TS uses the values of these "origin server connect
attempts" variables. Otherwise, the corresponding values specified in
congestion.config override them.


<Alarm Changes>
Add two new alarm types to Traffic Manager:
1) MGMT_SIGNAL_HTTP_CONGESTED_SERVER  
	used to indicate a congested server
2) MGMT_SIGNAL_HTTP_ALLEVIATED_SERVER 
	used to indicate a congested server is no longer congested
These alarms are not processed like the other Traffic Manager alarms.
Whenever these alarms are signalled (even if they are repeat alarms),
*only* an SNMP trap is sent. Note that this means users can potentially
be flooded with SNMP traps if a congested server keeps signalling an
alarm.


<SNMP Enhancement>
        - When TS detects a congested server, an alarm is signaled
          and an SNMP trap is sent (to the configured console).
        - When a congested server is alleviated, an alarm is signaled
          and an SNMP trap is sent.
        - The user can enable/disable SNMP support as a whole. SNMP cannot
          be enabled/disabled on a per-feature basis (e.g., turning off only
          the congestion control traps), but it can be enabled/disabled on a
          per-rule basis (e.g., ignoring traps generated by congested servers
          that match a given rule).
        - Two traps are defined in the Inktomi vendor MIB tree:
            httpCongestedServer    1.3.6.1.4.1.1967.3.1.3.3.2
            httpAlleviatedServer   1.3.6.1.4.1.1967.3.1.3.3.3


<Web UI Enhancement>
For configuration purposes, we will add a new "Congestion Control" tab to the 
"Configure -> Networking -> Connection Management" section of the web UI.  
Within this tab users can:
1. enable/disable the congestion control feature
2. edit the congestion.config file (which will be displayed in an HTML text box)


<Command-Line Interface Enhancement>
Use the traffic_line command-line interface to retrieve the congestion 
statistics and monitoring information.
1. "traffic_line -r <statistic_name>" 
   Returns the value of the statistic specified 
2. "traffic_line -q" 
   Returns a list of currently congested sites (one site per line); 
   for each congested site, displays the information in the following format: 
   '<time>|<rule #>|<hostname>|<ip_address>|<scheme>|<prefix>|<congestion reason>|<F#>|<M#>'
	- time: congestion detected time,
	        in seconds since 00:00:00 UTC, January 1, 1970
	- rule #
	- hostname
	- ip address
	- scheme: per_ip or per_host
	- prefix (if none, leave blank)
	- congestion reason: M or F
	    M - congestion caused by exceeding max connections
	    F - congestion caused by OS response timeout/failure
	- F#: number of congested requests because of F
	- M#: number of congested requests because of M

NOTE: In order to use "traffic_line -q", raf must be enabled and have a 
      raf port specified. These are the default values used.
	CONFIG proxy.config.raf.enabled INT 1
	CONFIG proxy.config.raf.port INT 9000
      If the raf port conflicts with another port, then change it by:
           traffic_line -s "proxy.config.raf.port" <new-port> 
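
For scripts that consume the "traffic_line -q" output, one line of the
9-field format described above could be parsed as sketched below. The class
and function names are hypothetical, the sample line is made up for
illustration, and the sketch assumes that the rule number and the two
counters are decimal integers and that no field contains a '|' character.

```python
from typing import NamedTuple

class CongestedSite(NamedTuple):
    time: int          # congestion detected time, seconds since the epoch
    rule: int          # rule #
    hostname: str
    ip_address: str
    scheme: str        # per_ip or per_host
    prefix: str        # empty string if the rule has no prefix
    reason: str        # M or F
    f_count: int       # F#
    m_count: int       # M#

def parse_congested_line(line):
    """Parse one short-format line into a CongestedSite record."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 9:
        raise ValueError("expected 9 fields, got %d" % len(fields))
    t, rule, host, ip, scheme, prefix, reason, f_count, m_count = fields
    return CongestedSite(int(t), int(rule), host, ip, scheme, prefix,
                         reason, int(f_count), int(m_count))

# Made-up sample line, not captured from a real TS instance.
sample = "982374000|3|www.inktomi.com|209.131.63.206|per_ip||F|12|0"
site = parse_congested_line(sample)
```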


Engineering description of feature:
-----------------------------------
Data structure and algorithm:

Congestion Control Database (in memory and disk):

Using the Multicache implementation:

struct CongestEntry {
	unsigned int   ip;
	int            hostname_offset;
	int            prefix_offset;
	int            last_failure;             // time of the last connection failure
	unsigned short fail_history[17];         // 17 16-bit failure counters (see below)
	unsigned int   congestion_scheme;        // per_ip | per_host
	unsigned int   congested;                // 0 | 1
	short          max_connection;
	short          num_connection;
	short          max_connection_failures;
	unsigned long  num_congested;            // reserved for per-server stats
};

For each server, TS uses an array of 17 entries to record the number of
connection failures. Each entry is 16 bits long and records the number of
connection failures for 1/16 of the fail_window. For example, for
fail_window=240 seconds, the granularity of recording is 240/16=15 seconds.
That is, the first entry records the number of connection failures from time
t+0 to t+15, the second entry records the number of connection failures
from time t+15 to t+30, and so on. TS will mark the server as congested if
the sum of the 17 entries is greater than the specified
max_connection_failures. Note that with 17 entries this algorithm counts the
failures of the past 240 to 255 seconds rather than of exactly 240 seconds.
For higher accuracy, the number of entries needs to be increased.
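
The binned failure history described above can be sketched as a small ring
of counters. This is an illustrative model, not the TS code: the class and
method names are hypothetical, timestamps are assumed to be integer seconds
that never go backwards, and fail_window is assumed to be at least 16.

```python
class FailHistory:
    """17 bins, each covering fail_window/16 seconds, reused as a ring."""

    BINS = 17
    MAX_COUNT = 0xFFFF  # each entry is 16 bits long

    def __init__(self, fail_window, max_connection_failures):
        self.bin_len = max(1, fail_window // 16)
        self.max_failures = max_connection_failures
        self.bins = [0] * self.BINS
        self.last_slot = None  # absolute bin number of the newest event

    def _advance(self, now):
        """Clear the bins that have aged out and return the current slot."""
        slot = now // self.bin_len
        if self.last_slot is None:
            self.last_slot = slot
        elif slot > self.last_slot:
            gap = min(slot - self.last_slot, self.BINS)
            for s in range(self.last_slot + 1, self.last_slot + 1 + gap):
                self.bins[s % self.BINS] = 0
            self.last_slot = slot
        return self.last_slot

    def record_failure(self, now):
        """Count one failure; return True if the server should be marked congested."""
        idx = self._advance(now) % self.BINS
        self.bins[idx] = min(self.bins[idx] + 1, self.MAX_COUNT)
        return sum(self.bins) > self.max_failures

history = FailHistory(fail_window=240, max_connection_failures=5)
results = [history.record_failure(t) for t in (0, 10, 20, 30, 40, 50)]
# The sixth failure pushes the sum above 5, so only the last result is True.
after_gap = history.record_failure(400)
# By time 400 every earlier bin has aged out, so this single failure
# does not mark the server congested.
```

Because a whole bin is dropped at once, the window effectively covers
between 240 and 255 seconds in this configuration, which is the granularity
limit noted above.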


Operation:
----------

The following is an overview of the operation of this congestion control
feature in TS. After parsing a valid request, TS calls its HostDB module
to get the HostDBInfo record for the host name. If the host name has more
than one IP address, then TS selects one of them as usual.
Then, TS uses the host name, the selected IP address, and the request URL to
look up the first matching rule in congestion.config.

TS then looks up the CongestionDB and handles one of four cases:

case 1: "congested" is true and
	(current time <= "last_failure" + "proxy_retry_interval"):
	TS sends to the client a Retry-After response.
case 2: "congested" is true and
	(current time > "last_failure" + "proxy_retry_interval"):
	TS makes a connection to this congested server.
case 3: "congested" is false and
	(current connections >= "max_connection"):
	TS sends to the client a Retry-After response.
case 4: "congested" is false and
	(current connections < "max_connection"):
	TS makes a connection to this non-congested server.

Various timeouts and max_retry numbers are set up according to the matched 
rule in congestion.config.

If a connection failure is detected, TS updates the CongestionDB record. If it
is case 2, and the connection succeeds, we need to mark the server live again.
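
The four cases above can be summarized in a single decision function. This
is a sketch for illustration only; the function name and the returned action
labels are hypothetical, and max_connection = -1 is treated as unlimited, as
stated in the config file description.

```python
def decide(congested, now, last_failure, proxy_retry_interval,
           current_connections, max_connection):
    """Return the action TS would take for one request (hypothetical labels)."""
    if congested:
        if now <= last_failure + proxy_retry_interval:
            return "retry-after"      # case 1: still before PRAT
        return "connect-dead"         # case 2: try again with dead_os_* settings
    if max_connection >= 0 and current_connections >= max_connection:
        return "retry-after"          # case 3: max_connection reached
    return "connect-live"             # case 4: use live_os_* settings

# A server that failed at time 100 with proxy_retry_interval = 10:
before_prat = decide(True, 105, 100, 10, 0, -1)
after_prat = decide(True, 111, 100, 10, 0, -1)
```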


Problematic Behavior:
---------------------
For a host with multiple nicknames, we might miscalculate the number
of failures.

For example, 
	www.berkeley.edu is a nickname for amber.berkeley.edu
	amber.berkeley.edu has address 128.32.25.12. 
In this simple case, if only www.berkeley.edu is specified in the rule and the
per_host scheme is used, TS will miss the failures of requests that use
amber.berkeley.edu as the host name.

Another problem is the granularity of connection failure recording. In
some cases, TS will mark a server congested even though it is not actually
congested according to the rules in congestion.config.

TS cannot distinguish a busy origin server from TS itself being busy.

Implementation Limits:
----------------------
	1. granularity of connection failure records (17 entries).
	2. maximum number of failures that can be recorded
	   (16-bit counters: at most 2^16 - 1 = 65535).
	3. potential performance hit because of updating congestion info
	   (need to take locks to update the info)

Modules need to be touched:
---------------------------
ControlMatcher
	one new primary field is added ---- host_regex

HttpSM / HttpTransact
	for obvious reasons

KNOWN PROBLEM
=============
(1) The config filename for congestion control must be congestion.config;
    this is a known bug in TS.
(2) The test mode
    proxy.config.http.congestion_control.enabled INT 2
    is not implemented because of limited coding time.


TEST DESCRIPTIONS
=================

Test description:
-----------------
(1) Enable congestion control and specify rules for a few servers in
    congestion.config. Then run (existing) tests to verify that the
    "origin server connect attempts" configuration variables are still
    working for servers that are not specified in congestion.config.
(2) Enable congestion control and, in congestion.config, specify a rule
    for a server that can be controlled (up/down) and has only one IP
    address. Verify that TS follows the rule for the server by sending
    requests for the server through TS while controlling whether the server
    is up or down.
(3) Specify a prefix rule and a dest_host rule on the same host, and
    control the service specified by the prefix. Check that the prefix rule
    is in effect.
(4) Repeat test (2) with a server that has more than one IP address. Both
    congestion schemes (per_ip and per_host) should be tested.

(5) Test all possible combinations of rules. Check the error_page and
    error logs.

(6) Kill one of the servers connected to TS so that the server becomes
    "congested". Ensure that the congested alarm is signaled (check the
    WebUI) and an SNMP trap is sent. Restart the dead server so that the
    server is alive again. Ensure that the alleviated alarm is signaled and
    an SNMP trap is sent.

Test tool:
----------
	syntest could be a good candidate for functional tests.
	For load tests, try jtest combined with syntest.

Test configurations:
--------------------


Change Log:
===========

Removed Feature(s):
***  Saving congestion control information to the disk.

New/Modified Feature(s):

*** traffic_line -q output format (short and long):

short format
   <time>|<rule #>|<hostname>|<ip_address>|<scheme>|<prefix>|<congestion reason>|<F#>|<M#>
long format
   <time>|<rule #>|<hostname>|<ip_address>|<scheme>|<prefix>|<congestion reason>|<F#>|<M#>|<local/GMT time>|<key>|<last_failure>|<num_fail_events>|<internal_ref count>|<num_connections>

	- time: congestion detected time,
	        in seconds since 00:00:00 UTC, January 1, 1970
	- rule #
	- hostname
	- ip address
	- scheme: per_ip or per_host
	- prefix (if none, leave blank)
	- congestion reason: M or F
	    M - congestion caused by exceeding max connections
	    F - congestion caused by OS response timeout/failure
	- F#: number of congested requests because of F
	- M#: number of congested requests because of M
	- key: the internal key in congestion control table
	- local/GMT time: YYYY/MM/DD hh:mm:ss
		CONFIG proxy.config.http.congestion_control.localtime INT 1 //localtime format
		CONFIG proxy.config.http.congestion_control.localtime INT 0 //GMT format


*** telnet localhost <Raf port>
0 congest list  
		-- list congested servers at the moment (short format)
0 congest list long [0-4]
		-- list congested servers at the moment (long format)
0 query deadhosts
		-- list congested servers at the moment

0 congest remove key=XXXXXXXXXXXXXX {key=XXXXXXXXXXXXXX} 
		-- remove the entries whose keys are listed
		   (manually reactivates the congested servers)

0 congest remove host=<hostname>[/prefix]

0 congest remove ip=<xxx.xxx.xxx.xxx>[/prefix]

0 congest remove all
		-- remove all entries in the congestion control internal table

*** SNMP trap enable/disable per congestion rules