Apache Knox gateway is a specialized reverse proxy gateway for various Hadoop REST APIs. However, the gateway is built entirely upon a fairly generic framework. This framework is used to “plug-in” all of the behavior that makes it specific to Hadoop in general and to any particular Hadoop REST API. It would be equally possible to create a customized reverse proxy for non-Hadoop HTTP endpoints. This approach is taken to ensure that the Apache Knox gateway can scale with the rapidly evolving Hadoop ecosystem.
Throughout this guide we will use a publicly available REST API, OpenWeatherMap (http://openweathermap.org/), to demonstrate the development of various extension mechanisms.
The gateway itself is a layer over an embedded Jetty JEE server. At the very highest level, the gateway processes requests by using the request URL to look up the specific JEE Servlet Filter chain that is used to process the request. The gateway framework provides extensible mechanisms to assemble chains of custom filters that support secured access to services.
The gateway has two primary extensibility mechanisms: Service and Provider. The Service extensibility framework provides a way to add support for new HTTP/REST endpoints. For example, the support for WebHdfs is plugged into the Knox gateway as a Service. The Provider extensibility framework allows adding new features to the gateway that can be used across Services. An example of a Provider is an authentication provider. Providers can also expose APIs that other service and provider extensions can utilize.
Service and Provider integrations interact with the gateway framework in two distinct phases: Deployment and Runtime. The gateway framework can be thought of as a layer over the JEE Servlet framework. Specifically, all runtime processing within the gateway is performed by JEE Servlet Filters. The two phases interact with this JEE Servlet Filter based model in very different ways. The first phase, Deployment, is responsible for converting a fairly simple-to-understand configuration, called a topology, into JEE Web Archive (WAR) based implementation details. The second phase, Runtime, is the processing of requests via a set of Filters configured in the WAR.
From an “ethos” perspective, Service and Provider extensions should attempt to push the complexity associated with configuration into the deployment phase. This allows request processing to be streamlined, high-performance, and easily testable. At runtime the preference, in OO style, is for small classes that each perform a specific function. The ideal set of implementation classes is then assembled by the Service and Provider plugins during deployment.
A second critical design consideration is streaming. The processing infrastructure is built around JEE Servlet Filters because they provide a natural streaming interception model. All Provider implementations should make every attempt to maintain this streaming characteristic.
The table below describes the purpose of the current modules in the project. Of particular importance are the root pom.xml and the gateway-release module. The root pom.xml is critical because this is where all dependency versions must be declared. There should be no dependency version information in module pom.xml files. The gateway-release module is critical because the dependencies declared there essentially define the classpath of the released gateway server. This is also true of the other -release modules in the project.
File/Module | Description |
---|---|
LICENSE | The license for all source files in the release. |
NOTICE | Attributions required by dependencies. |
README | A brief overview of the Knox project. |
CHANGES | A description of the changes for each release. |
ISSUES | The known issues for the current release. |
gateway-util-common | Common low level utilities used by many modules. |
gateway-util-launcher | The launcher framework. |
gateway-util-urltemplate | The URL template and rewrite utilities. |
gateway-i18n | The i18n logging and resource framework. |
gateway-i18n-logging-log4j | The integration of i18n logging with log4j. |
gateway-i18n-logging-sl4j | The integration of i18n logging with SLF4J. |
gateway-spi | The SPI for service and provider extensions. |
gateway-provider-identity-assertion-common | The identity assertion provider base |
gateway-provider-identity-assertion-concat | An identity assertion provider that facilitates prefix and suffix concatenation. |
gateway-provider-identity-assertion-pseudo | The default identity assertion provider. |
gateway-provider-jersey | The jersey display provider. |
gateway-provider-rewrite | The URL rewrite provider. |
gateway-provider-rewrite-func-hostmap-static | Host mapping function extension to rewrite. |
gateway-provider-rewrite-func-service-registry | Service registry function extension to rewrite. |
gateway-provider-rewrite-step-secure-query | Crypto step extension to rewrite. |
gateway-provider-security-authz-acls | Service level authorization. |
gateway-provider-security-jwt | JSON Web Token utilities. |
gateway-provider-security-preauth | Preauthenticated SSO header support. |
gateway-provider-security-shiro | Shiro authentication integration. |
gateway-provider-security-webappsec | Filters to prevent common webapp security issues. |
gateway-service-as | The implementation of the Access service POC. |
gateway-service-definitions | The implementation of the Service definition and rewrite files. |
gateway-service-hbase | The implementation of the HBase service. |
gateway-service-hive | The implementation of the Hive service. |
gateway-service-oozie | The implementation of the Oozie service. |
gateway-service-tgs | The implementation of the Ticket Granting service POC. |
gateway-service-webhdfs | The implementation of the WebHdfs service. |
gateway-server | The implementation of the Knox gateway server. |
gateway-shell | The implementation of the Knox Groovy shell. |
gateway-test-ldap | Pulls in all of the dependencies of the test LDAP server. |
gateway-server-launcher | The launcher definition for the gateway. |
gateway-shell-launcher | The launcher definition for the shell. |
knox-cli-launcher | A module to pull in all of the dependencies of the CLI. |
gateway-test-ldap-launcher | The launcher definition for the test LDAP server. |
gateway-release | The definition of the gateway binary release. Contains content and dependencies to be included in binary gateway package. |
gateway-test-utils | Various utilities used in unit and system tests. |
gateway-test | The functional tests. |
pom.xml | The top level pom. |
build.xml | A collection of utilities for building and releasing. |
The project uses Maven in general, with a few convenience Ant targets.
The project can be built via Maven or Ant. The two commands below are equivalent.
```bash
mvn clean install
ant
```
A more complete build can be done that also generates the unsigned ZIP release artifacts. You will find these in the target/{version} directory (e.g. target/0.7.0-SNAPSHOT).
```bash
mvn -Prelease clean install
ant release
```
There are a few other Ant targets that are especially convenient for testing.
This command installs the gateway into the install directory of the project. Note that this command does not first build the project.
```bash
ant install-test-home
```
This command starts the gateway and LDAP servers installed by the command above into a test GATEWAY_HOME (i.e. install). Note that this command does not first install the test home.
```bash
ant start-test-servers
```
Putting things together, the following Ant command will build a release, install it and start the servers, ready for manual testing.
```bash
ant release install-test-home start-test-servers
```
There are two distinct phases in the behavior of the gateway: the deployment and runtime phases. The deployment phase is responsible for converting topology descriptors into an executable JEE style WAR. The runtime phase is the processing of requests via the WAR created during the deployment phase.
The deployment phase is arguably the more complex of the two phases. This is because runtime relies on well known JEE constructs while deployment introduces new framework concepts. The base concept of the deployment framework is that of a “contributor”. In the framework, contributors are pluggable components responsible for generating JEE WAR artifacts from topology files.
The goal of the deployment phase is to take easy-to-understand topology descriptions and convert them into optimized runtime artifacts. The topology descriptors should be not only easy to understand but also easy for a management system (e.g. Ambari) to generate. Think of deployment as compiling an assembly descriptor into a JEE WAR. WARs are then deployed to an embedded JEE container (i.e. Jetty).
Consider the results of starting the gateway the first time. There are two sets of files that are relevant for deployment. The first is the topology file <GATEWAY_HOME>/conf/topologies/sandbox.xml. The second set is the WAR structure created during the deployment of the topology file.
```
data/deployments/sandbox.war.143bfef07f0/WEB-INF
  web.xml
  gateway.xml
  shiro.ini
  rewrite.xml
  hostmap.txt
```
Notice that the directory sandbox.war.143bfef07f0 is an “unzipped” representation of a JEE WAR file. Specifically, this means that it contains a WEB-INF directory which contains a web.xml file. For the curious, the strange number (i.e. 143bfef07f0) in the name of the WAR directory is an encoded timestamp. This is the timestamp of the topology file (i.e. sandbox.xml) at the time the deployment occurred. This value is used to determine when topology files have changed and redeployment is required.
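The timestamp suffix can be illustrated with a small Java sketch. It assumes that the suffix is the topology file's last-modified time in epoch milliseconds written in hexadecimal. That is consistent with the example value (143bfef07f0 decodes to 1390494550000, a time in January 2014), but it is an assumption, not a description of the Knox source. The WarDirectoryName class and its methods are hypothetical.

```java
/** Hypothetical helper for WAR directory names such as "sandbox.war.143bfef07f0". */
final class WarDirectoryName {

  private static final String MARKER = ".war.";

  // Assumption: the suffix is the topology timestamp (epoch millis) in hexadecimal.
  static String forTopology(String topologyName, long timestampMillis) {
    return topologyName + MARKER + Long.toHexString(timestampMillis);
  }

  // Recovers the timestamp from a directory name; returns -1 if the name does not match.
  static long timestampOf(String directoryName) {
    int index = directoryName.lastIndexOf(MARKER);
    if (index < 0) {
      return -1L;
    }
    try {
      return Long.parseLong(directoryName.substring(index + MARKER.length()), 16);
    } catch (NumberFormatException e) {
      return -1L;
    }
  }

  // Redeployment is needed when the topology file's timestamp differs from the recorded one.
  static boolean needsRedeploy(long topologyLastModified, String deployedDirectoryName) {
    return topologyLastModified != timestampOf(deployedDirectoryName);
  }
}
```

Under this assumption, forTopology("sandbox", 1390494550000L) yields sandbox.war.143bfef07f0, and touching sandbox.xml changes its timestamp so that needsRedeploy reports true.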
Here is a brief overview of the purpose of each file in the WAR structure.
The deployment framework follows “visitor” style patterns. Each topology file is parsed and the various constructs within it are “visited”. The appropriate contributor for each visited construct is selected by the framework. The contributor is then passed the construct from the topology file and asked to update the JEE WAR artifacts. Each contributor is free to inspect and modify any portion of the WAR artifacts.
The diagram below provides an overview of the deployment processing. Detailed descriptions of each step follow the diagram.
The gateway server loads a topology file from conf/topologies into an internal structure.
The gateway server delegates to a deployment factory to create the JEE WAR structure.
The deployment factory first creates a basic WAR structure with WEB-INF/web.xml.
Each provider and service in the topology is visited and the appropriate deployment contributor invoked. Each contributor is passed the appropriate information from the topology and modifies the WAR structure.
A complete WAR structure is returned to the gateway server.
The gateway server uses internal container APIs to dynamically deploy the WAR.
The Java method below is the actual code from the DeploymentFactory that implements this behavior. You will note the initialize, contribute, finalize sequence. Each contributor is given three opportunities to interact with the topology and archive. This allows the various contributors to interact if required. For example, the service contributors use the deployment descriptor added to the WAR by the rewrite provider.
```java
public static WebArchive createDeployment( GatewayConfig config, Topology topology ) {
  Map<String,List<ProviderDeploymentContributor>> providers;
  Map<String,List<ServiceDeploymentContributor>> services;
  DeploymentContext context;
  providers = selectContextProviders( topology );
  services = selectContextServices( topology );
  context = createDeploymentContext( config, topology.getName(), topology, providers, services );
  initialize( context, providers, services );
  contribute( context, providers, services );
  finalize( context, providers, services );
  return context.getWebArchive();
}
```
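To make the ordering concrete, here is a simplified, hypothetical Java model of the initialize, contribute, finalize sequence. It is not the DeploymentFactory code, and it merges providers and services into a single list of contributors. What it illustrates is that every contributor completes one phase before any contributor starts the next, which is what lets one contributor rely on artifacts another added in an earlier phase.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the three-phase contribution sequence. */
final class ContributionSequence {

  interface Contributor {
    void initializeContribution(List<String> log);
    void contribute(List<String> log);
    void finalizeContribution(List<String> log);
  }

  // Each phase runs over all contributors before the next phase begins.
  static List<String> run(List<Contributor> contributors) {
    List<String> log = new ArrayList<>();
    for (Contributor c : contributors) c.initializeContribution(log);
    for (Contributor c : contributors) c.contribute(log);
    for (Contributor c : contributors) c.finalizeContribution(log);
    return log;
  }

  // A contributor that only records which phase it was called in.
  static Contributor recording(String name) {
    return new Contributor() {
      public void initializeContribution(List<String> log) { log.add("initialize " + name); }
      public void contribute(List<String> log) { log.add("contribute " + name); }
      public void finalizeContribution(List<String> log) { log.add("finalize " + name); }
    };
  }
}
```

Running it with two recording contributors named rewrite and WEATHER produces the log initialize rewrite, initialize WEATHER, contribute rewrite, contribute WEATHER, finalize rewrite, finalize WEATHER.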
Below is a diagram that provides more detail. This diagram focuses on the interactions between the deployment factory and the service deployment contributors. Detailed descriptions of each step follow the diagram.
The gateway server loads global configuration (i.e.
The gateway server loads a topology descriptor file.
The gateway server delegates to the deployment factory to create a deployable WAR structure.
The deployment factory creates a runtime descriptor to configure the gateway servlet.
The deployment factory creates a basic WAR structure and adds the gateway servlet runtime descriptor to it.
The deployment factory creates a deployment context object and adds the WAR structure to it.
For each service defined in the topology descriptor file, the appropriate service deployment contributor is selected and invoked. The correct service deployment contributor is determined by matching the role of a service in the topology descriptor to a value provided by the getRole() method of the ServiceDeploymentContributor interface. The initializeContribution method of each service identified in the topology is called. Each service deployment contributor is expected to set up any runtime artifacts in the WAR that other services or providers may need.
The contributeService method from each service identified in the topology is called. This is where the service deployment contributors will modify any runtime descriptors.
One of the ways that a service deployment contributor can modify the runtime descriptors is by asking the framework to contribute filters. This is how services are loosely coupled to the providers of features. For example, a service deployment contributor might ask the framework to contribute the filters required for authorization. The deployment framework will then delegate to the correct provider deployment contributor to add filters for that feature.
Finally the finalizeContribution method for each service is invoked. This provides an opportunity to react to anything done via the contributeService invocations and tie up any loose ends.
The populated WAR is returned to the gateway server.
The following diagram provides expanded detail on the behavior of provider deployment contributors. Much of the beginning and end of the sequence shown overlaps with the service deployment sequence above. Those steps (i.e. 1-6, 17) will not be described below for brevity. The remaining steps have detailed descriptions following the diagram.
For each provider, the appropriate provider deployment contributor is selected and invoked. The correct provider deployment contributor is determined by first matching the role of a provider in the topology descriptor to a value provided by the getRole() method of the ProviderDeploymentContributor interface. If this is ambiguous, the name from the topology is used to match the value provided by the getName() method of the ProviderDeploymentContributor interface. The initializeContribution method of each provider identified in the topology is called. Each provider deployment contributor is expected to set up any runtime artifacts in the WAR that other services or providers may need. Note: in addition, other providers not explicitly referenced in the topology may have their initializeContribution method called. In this case only one default instance for each role declared via the getRole() method will be used. The method used to determine the default instance is non-deterministic, so it is best to select a particular named instance of a provider for each role.
Each provider deployment contributor will typically add any runtime deployment descriptors it requires for operation. These descriptors are added to the WAR structure within the deployment context.
The contributeProvider method of each configured or default provider deployment contributor is invoked.
Each provider deployment contributor populates any runtime deployment descriptors based on information in the topology.
Provider deployment contributors are never asked to contribute to the deployment directly. Instead a service deployment contributor will ask to have a particular provider role (e.g. authentication) contribute to the deployment.
A service deployment contributor asks the framework to contribute filters for a given provider role.
The framework selects the appropriate provider deployment contributor and invokes its contributeFilter method.
During this invocation the provider deployment contributor populates service-specific information. In particular, it adds filters to the gateway servlet’s runtime descriptor by adding JEE Servlet Filters. These filters will be added to the resources (or URLs) identified by the service deployment contributor.
The finalizeContribution method of all referenced and default provider deployment contributors is invoked.
The provider deployment contributor is expected to perform any final modifications to the runtime descriptors in the WAR structure.
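The role-then-name matching described for provider deployment contributors can be sketched in Java as follows. ProviderSelection and its Candidate type are hypothetical stand-ins for ProviderDeploymentContributor instances (exposing only getRole() and getName() style values), and the sketch does not model how default instances are chosen for roles the topology does not reference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of selecting a provider deployment contributor for a topology entry. */
final class ProviderSelection {

  // Minimal stand-in for the role and name a contributor reports.
  static final class Candidate {
    final String role;
    final String name;
    Candidate(String role, String name) {
      this.role = role;
      this.name = name;
    }
  }

  // Match on role first; only when several candidates share the role, narrow by name.
  static Optional<Candidate> select(List<Candidate> available, String role, String name) {
    List<Candidate> byRole = new ArrayList<>();
    for (Candidate c : available) {
      if (c.role.equals(role)) byRole.add(c);
    }
    if (byRole.size() <= 1) {
      return byRole.stream().findFirst();
    }
    for (Candidate c : byRole) {
      if (c.name.equals(name)) return Optional.of(c);
    }
    return Optional.empty(); // ambiguous role and no matching name
  }
}
```

With two authentication candidates, the topology entry has to supply a matching name; a role with a single candidate is resolved without consulting the name at all.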
The runtime behavior of the gateway is somewhat simpler as it more or less follows well known JEE models. There is one significant wrinkle. The filter chains are managed within the GatewayServlet as opposed to being managed by the JEE container. This is the result of an early decision made in the project. The intention is to allow more powerful URL matching than is provided by the JEE Servlet mapping mechanisms.
The diagram below provides a high level overview of the runtime processing. An explanation for each step is provided after the diagram.
A REST client makes an HTTP request that is received by the embedded JEE container.
A filter chain is looked up in a map of URLs to filter chains.
The filter chain, which is itself a filter, is invoked.
Each filter invokes the filters that follow it in the chain. The request and response objects can be wrapped in typical JEE Filter fashion. A filter may also stop chain processing and return early if that is appropriate.
Eventually the last filter in the chain is invoked. Typically this is a special “dispatch” filter that is responsible for dispatching the request to the ultimate endpoint. Dispatch filters are also responsible for reading the response.
The response may be in the form of a number of content types (e.g. application/json, text/xml).
The response entity is streamed through the various response wrappers added by the filters. These response wrappers may rewrite various portions of the headers and body as per their configuration.
The return of the response entity to the client is ultimately “pulled through” the filter response wrapper by the container.
The response entity is returned to the original client.
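The chain behavior described in these steps can be modeled without the Servlet API. The sketch below is hypothetical: MiniChain and its Filter interface stand in for javax.servlet.FilterChain and javax.servlet.Filter, a StringBuilder stands in for the request, and the final step plays the role of the dispatch filter. A filter that never calls chain.doFilter ends processing early.

```java
import java.util.List;

/** Hypothetical, servlet-free model of a gateway filter chain ending in a dispatch step. */
final class MiniChain {

  interface Filter {
    void doFilter(StringBuilder request, List<String> trace, MiniChain chain);
  }

  private final List<Filter> filters;
  private final Filter dispatch;
  private int position = 0; // per-request state, so use one MiniChain per request

  MiniChain(List<Filter> filters, Filter dispatch) {
    this.filters = filters;
    this.dispatch = dispatch;
  }

  // Invoke the next filter, or the dispatch step once every filter has continued the chain.
  void doFilter(StringBuilder request, List<String> trace) {
    if (position < filters.size()) {
      filters.get(position++).doFilter(request, trace, this);
    } else {
      dispatch.doFilter(request, trace, this);
    }
  }
}
```

For example, an authentication filter followed by a rewrite filter that replaces /weather/2.5 with the OpenWeatherMap URL produces the trace authentication, rewrite, dispatch; a filter that rejects the request without calling the chain means neither later filters nor the dispatch ever run.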
This diagram provides a more detailed breakdown of the request processing. Again, descriptions of each step follow the diagram.
A REST client makes an HTTP request that is received by the embedded JEE container.
The embedded container looks up the servlet mapped to the URL and invokes the service method. In our case the GatewayServlet is mapped to /* and therefore receives all requests for a given topology. Keep in mind that the WAR itself is deployed on a root context path that typically contains a level for the gateway and the name of the topology. This means that there is a single GatewayServlet per topology and it is effectively mapped to that topology's context path.
The GatewayServlet holds a single reference to a GatewayFilter, which is a specialized JEE Servlet Filter. This choice was made to allow the GatewayServlet to dynamically deploy modified topologies. This is done by building a new GatewayFilter instance and atomically replacing the old one.
The GatewayFilter contains another layer of URL mapping as defined in the gateway.xml runtime descriptor. The various service deployment contributors add these mappings at deployment time. Each service may add a number of different sub-URLs depending on its requirements. These sub-URLs will all be mapped to independently configured filter chains.
The GatewayFilter invokes the doFilter method on the selected chain.
The chain invokes the doFilter method of the first filter in the chain.
Each filter in the chain continues processing by invoking doFilter on the next filter in the chain. Ultimately, a dispatch filter forwards the request to the real service instead of invoking another filter. This is sometimes referred to as pivoting.
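The GatewayServlet’s approach of building a new GatewayFilter and swapping it in atomically can be sketched with java.util.concurrent.atomic.AtomicReference. SwappableHandler and Handler are hypothetical names; Handler stands in for the GatewayFilter, and the sketch makes no claim about how Knox actually implements the swap.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of "build a new handler, then swap it in atomically". */
final class SwappableHandler {

  interface Handler {
    String handle(String path);
  }

  private final AtomicReference<Handler> current;

  SwappableHandler(Handler initial) {
    this.current = new AtomicReference<>(initial);
  }

  // Each request reads the reference once, so it runs entirely against one version
  // even if a redeployment happens concurrently.
  String service(String path) {
    return current.get().handle(path);
  }

  // Redeployment: the fully built replacement becomes visible to all threads in one step.
  void redeploy(Handler replacement) {
    current.set(replacement);
  }
}
```

A volatile field would give the same visibility guarantee; AtomicReference is used here only to make the intent explicit.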
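Abbreviated examples of the web.xml and gateway.xml runtime descriptors follow, along with a unit test showing how the /weather/**?** pattern from gateway.xml matches a request path.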
```xml
<web-app>
  <servlet>
    <servlet-name>sandbox</servlet-name>
    <servlet-class>org.apache.hadoop.gateway.GatewayServlet</servlet-class>
    <init-param>
      <param-name>gatewayDescriptorLocation</param-name>
      <param-value>gateway.xml</param-value>
    </init-param>
  </servlet>
  <servlet-mapping>
    <servlet-name>sandbox</servlet-name>
    <url-pattern>/*</url-pattern>
  </servlet-mapping>
  <listener>
    <listener-class>org.apache.hadoop.gateway.services.GatewayServicesContextListener</listener-class>
  </listener>
  ...
</web-app>
```
```xml
<gateway>
  <resource>
    <role>WEATHER</role>
    <pattern>/weather/**?**</pattern>
    <filter>
      <role>authentication</role>
      <name>sample</name>
      <class>...</class>
    </filter>
    <!-- zero or more additional <filter> elements -->
  </resource>
</gateway>
```
```java
@Test
public void testDevGuideSample() throws Exception {
  Template pattern, input;
  Matcher<String> matcher;
  Matcher<String>.Match match;
  // GET http://api.openweathermap.org/data/2.5/weather?q=Palo+Alto
  pattern = Parser.parse( "/weather/**?**" );
  input = Parser.parse( "/weather/2.5?q=Palo+Alto" );
  matcher = new Matcher<String>();
  matcher.add( pattern, "fake-chain" );
  match = matcher.match( input );
  assertThat( match.getValue(), is( "fake-chain") );
}
```
There are a number of extension points available in the gateway: services, providers, rewrite steps and functions, etc. All of these use the Java ServiceLoader mechanism for their discovery. There are two ways to make these extensions available on the class path at runtime. The first is to add a new module to the project and have the extension “built-in”. The second is to add the extension to the class path of the server after it is installed. Both mechanisms are described in more detail below.
Extensions are discovered via Java’s ServiceLoader mechanism (http://docs.oracle.com/javase/6/docs/api/java/util/ServiceLoader.html). There are good tutorials (http://docs.oracle.com/javase/tutorial/ext/basics/spi.html) available for learning more about this. The basics come down to two things.
Implement the service contract interface (e.g. ServiceDeploymentContributor, ProviderDeploymentContributor)
Create a file in META-INF/services of the JAR that will contain the extension. This file will be named as the fully qualified name of the contract interface (e.g. org.apache.hadoop.gateway.deploy.ProviderDeploymentContributor). The contents of the file will be the fully qualified names of all implementations of that contract interface in that JAR, one per line.
One tip is to include a simple test with each of your extensions to ensure that it will be properly discovered. This is very helpful in situations where a refactoring fails to update a class name in the META-INF/services files. An example of one such test from the project is shown below.
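```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.junit.Assert.fail;

import java.util.Iterator;
import java.util.ServiceLoader;

import org.apache.hadoop.gateway.deploy.ProviderDeploymentContributor;
import org.junit.Test;
// ShiroDeploymentContributor is imported from the Shiro provider module.

@Test
public void testServiceLoader() throws Exception {
  ServiceLoader<ProviderDeploymentContributor> loader = ServiceLoader.load( ProviderDeploymentContributor.class );
  Iterator<ProviderDeploymentContributor> iterator = loader.iterator();
  assertThat( "Service iterator empty.", iterator.hasNext() );
  while( iterator.hasNext() ) {
    // Succeed as soon as the expected implementation is discovered.
    if( iterator.next() instanceof ShiroDeploymentContributor ) {
      return;
    }
  }
  fail( "Failed to find " + ShiroDeploymentContributor.class.getName() + " via service loader." );
}
```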
One way to extend the functionality of the server without having to recompile is to add the extension JARs to the server’s class path. Because the server is designed to be extensible this is straightforward, but it requires some understanding of how the server’s class path is set up. In the
The bin directory contains very small “launcher” JARs that contain only enough code to read configuration and set up a class path. By default the configuration of a launcher is embedded in the launcher JAR, but it may also be extracted into a .cfg file. In that file you will see how the class path is defined.
```
class.path=../lib/*.jar,../dep/*.jar;../ext;../ext/*.jar
```
The paths are all relative to the directory that contains the launcher JAR.
Note that order is significant. The lib JARs take precedence over dep JARs and they take precedence over ext classes and JARs.
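The following Java sketch shows one way to interpret such a class.path value: split it into entries, resolve each entry relative to the launcher JAR's directory, and keep the declaration order so that earlier entries take precedence. LauncherClassPath is hypothetical and is not the Knox launcher code. Accepting both ',' and ';' as separators is an assumption based only on the sample value above, and the /opt/knox install location in the usage note is likewise assumed. Wildcards such as *.jar are left unexpanded.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of interpreting a launcher's class.path value. */
final class LauncherClassPath {

  // Assumption: entries may be separated by ',' or ';'.
  // Entries keep their declaration order, since earlier entries take precedence.
  static List<Path> resolve(Path launcherDir, String classPath) {
    List<Path> entries = new ArrayList<>();
    for (String raw : classPath.split("[,;]")) {
      String entry = raw.trim();
      if (!entry.isEmpty()) {
        // Relative entries are resolved against the launcher JAR's directory.
        entries.add(launcherDir.resolve(entry).normalize());
      }
    }
    return entries;
  }
}
```

For a launcher in /opt/knox/bin, the sample value resolves to /opt/knox/lib/*.jar, /opt/knox/dep/*.jar, /opt/knox/ext and /opt/knox/ext/*.jar, in that order.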
Integrating an extension into the project follows well established Maven patterns for adding modules. Below are several points that are somewhat unique to the Knox project.
Add the module to the root pom.xml file’s
Any new dependencies must be represented in the root pom.xml file’s
If the extension is to be “built into” the released gateway server it needs to be added as a dependency to the gateway-release module. This is done by adding to the
More detailed examples of adding both a service and a provider extension are provided in subsequent sections.
Services are extensions that are responsible for converting information in the topology file to runtime descriptors. Typically services do not require their own runtime descriptors. Rather, they modify either the gateway runtime descriptor (i.e. gateway.xml) or descriptors of other providers (e.g. rewrite.xml).
The service provider interface for a Service is ServiceDeploymentContributor and is shown below.
```java
package org.apache.hadoop.gateway.deploy;

import org.apache.hadoop.gateway.topology.Service;

public interface ServiceDeploymentContributor {
  String getRole();
  void initializeContribution( DeploymentContext context );
  void contributeService( DeploymentContext context, Service service ) throws Exception;
  void finalizeContribution( DeploymentContext context );
}
```
Each service provides an implementation of this interface that is discovered via the ServiceLoader mechanism previously described. The meaning of this is best understood in the context of the structure of the topology file. A fragment of a topology file is shown below.
```xml
<topology>
  <gateway>
    ....
  </gateway>
  <service>
    <role>WEATHER</role>
    <url>http://api.openweathermap.org/data</url>
  </service>
  ....
</topology>
```
With these two pieces in mind, a more detailed description of the purpose of each ServiceDeploymentContributor method should be helpful.
The getRole() method associates the <service><role> value in the topology file with a particular ServiceDeploymentContributor implementation. See below how the example WeatherDeploymentContributor implementation returns the role WEATHER, which matches the value in the topology file. This will result in the WeatherDeploymentContributor’s methods being invoked when a WEATHER service is encountered in the topology file.
```java
public class WeatherDeploymentContributor extends ServiceDeploymentContributorBase {
  private static final String ROLE = "WEATHER";
  @Override
  public String getRole() {
    return ROLE;
  }
  ...
}
```
In order to understand the job of the ServiceDeploymentContributor, a few runtime descriptors need to be introduced: gateway.xml and rewrite.xml.
```xml
<gateway>
  <resource>
    <role>WEATHER</role>
    <pattern>/weather/**?**</pattern>
    <filter>
      <role>authentication</role>
      <name>sample</name>
      <class>...</class>
    </filter>
    <!-- zero or more additional <filter> elements -->
    ...
  </resource>
</gateway>
```
```xml
<rules>
  <rule dir="IN" name="WEATHER/openweathermap/inbound/versioned/file"
      pattern="*://*:*/**/weather/{version}?{**}">
    <rewrite template="{$serviceUrl[WEATHER]}/{version}/weather?{**}"/>
  </rule>
</rules>
```
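As a worked example, the rule above would turn a gateway request for the weather in Palo Alto into the corresponding OpenWeatherMap call. The Java sketch below hand-codes that single rule with a regular expression purely for illustration; Knox evaluates rules with its own URL template engine, not with this code. The gateway address https://localhost:8443/gateway/sandbox is an assumed example, and the serviceUrl argument plays the role of {$serviceUrl[WEATHER]}, i.e. the url of the WEATHER service in the topology.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical hand-written equivalent of the WEATHER inbound rule, for illustration only. */
final class WeatherInboundRule {

  // Corresponds to the "**/weather/{version}" part of the rule's pattern.
  private static final Pattern PATH = Pattern.compile("^.*/weather/([^/]+)$");

  // Returns the rewritten backend URL, or null if the request does not match the rule.
  static String rewrite(URI request, String serviceUrl) {
    Matcher m = PATH.matcher(request.getRawPath());
    if (!m.matches()) {
      return null;
    }
    String version = m.group(1);
    String query = request.getRawQuery(); // {**}: all query parameters carried over as-is
    return serviceUrl + "/" + version + "/weather" + (query == null ? "" : "?" + query);
  }
}
```

Called with https://localhost:8443/gateway/sandbox/weather/2.5?q=Palo+Alto and http://api.openweathermap.org/data, it returns http://api.openweathermap.org/data/2.5/weather?q=Palo+Alto.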
With these two descriptors in mind a detailed breakdown of the WeatherDeploymentContributor’s contributeService method will make more sense. At a high level the important concept is that contributeService is invoked by the framework for each
```java
public class WeatherDeploymentContributor extends ServiceDeploymentContributorBase {
  ...
  @Override
  public void contributeService( DeploymentContext context, Service service ) throws Exception {
    contributeResources( context, service );
    contributeRewriteRules( context );
  }
  // Adds a resource to gateway.xml that maps the /weather/**?** pattern to a filter chain.
  private void contributeResources( DeploymentContext context, Service service ) throws URISyntaxException {
    ResourceDescriptor resource = context.getGatewayDescriptor().addResource();
    resource.role( service.getRole() );
    resource.pattern( "/weather/**?**" );
    addAuthenticationFilter( context, service, resource );
    addRewriteFilter( context, service, resource );
    addDispatchFilter( context, service, resource );
  }
  // Merges this service's rewrite rules into the rewrite provider's descriptor.
  private void contributeRewriteRules( DeploymentContext context ) throws IOException {
    UrlRewriteRulesDescriptor allRules = context.getDescriptor( "rewrite" );
    UrlRewriteRulesDescriptor newRules = loadRulesFromClassPath();
    allRules.addRules( newRules );
  }
  ...
}
```
The DeploymentContext parameter contains information about the deployment as well as the WAR structure being created via deployment. The Service parameter is the object representation of the
```java
protected void addRewriteFilter( DeploymentContext context, Service service, ResourceDescriptor resource ) {
  context.contributeFilter( service, resource, "rewrite", null, null );
}
```
```xml
<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>gateway</artifactId>
    <version>0.7.0-SNAPSHOT</version>
  </parent>
  <artifactId>gateway-service-weather</artifactId>
  <name>gateway-service-weather</name>
  <description>A sample extension to the gateway for a weather REST API.</description>
  <licenses>
    <license>
      <name>The Apache Software License, Version 2.0</name>
      <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
      <distribution>repo</distribution>
    </license>
  </licenses>
  <dependencies>
    <dependency>
      <groupId>${gateway-group}</groupId>
      <artifactId>gateway-spi</artifactId>
    </dependency>
    <dependency>
      <groupId>${gateway-group}</groupId>
      <artifactId>gateway-provider-rewrite</artifactId>
    </dependency>
    <!-- ... test dependencies ... -->
  </dependencies>
</project>
```
As of release 0.6.0, the gateway also supports a declarative way of plugging in a new Service. A Service can be defined with a combination of two files:
service.xml
rewrite.xml
The rewrite.xml file contains the rewrite rules as defined in other sections of this guide, and the service.xml file contains the various routes (paths) to be provided by the Service and the rewrite rule bindings to those paths. This will be described in further detail in this section.
While the service.xml file is absolutely required, the rewrite.xml file in theory is optional (though it is highly unlikely that no rewrite rules are needed).
To add a new service, simply add a service.xml and rewrite.xml file in an appropriate directory (see Service Definition Directory Structure) in the module gateway-service-definitions to make the new service part of the Knox build.
Below is a sample of a very simple service.xml file, using the same weather API example.
```xml
<service role="WEATHER" name="weather" version="0.1.0">
  <routes>
    <route path="/weather/**?**"/>
  </routes>
</service>
```
```xml
<topology>
  <gateway>
    ....
  </gateway>
  <service>
    <role>WEATHER</role>
    <name>weather</name>
    <version>0.1.0</version>
    <url>http://api.openweathermap.org/data</url>
  </service>
  ....
</topology>
```
If only the role is specified in the topology file (the only required element other than url), then the first service definition found for that role is used, with the highest version for that role and name. Similarly, if only the version is omitted from the topology's service entry, the service definition with the highest version is used. It is therefore important to specify a version for a service if a topology needs to be locked down to a specific version of that service.
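The selection rules above can be expressed as a short Java sketch. ServiceDefinitionSelector is hypothetical, and it assumes versions are purely numeric dotted strings compared component by component (so 0.10.0 is higher than 0.2.0); how Knox actually orders versions is not specified here.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of choosing a service definition when name and/or version are omitted. */
final class ServiceDefinitionSelector {

  static final class Definition {
    final String role;
    final String name;
    final String version;
    Definition(String role, String name, String version) {
      this.role = role;
      this.name = name;
      this.version = version;
    }
  }

  // Numeric, component-wise comparison; assumes versions like "2.4.0" (no qualifiers).
  static int compareVersions(String a, String b) {
    String[] x = a.split("\\.");
    String[] y = b.split("\\.");
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
      int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
      if (xi != yi) return Integer.compare(xi, yi);
    }
    return 0;
  }

  // role is required; name and version may be null when omitted from the topology.
  static Optional<Definition> select(List<Definition> all, String role, String name, String version) {
    String effectiveName = name;
    if (effectiveName == null) {
      // Use the name of the first definition found for the role.
      effectiveName = all.stream().filter(d -> d.role.equals(role)).map(d -> d.name).findFirst().orElse(null);
      if (effectiveName == null) return Optional.empty();
    }
    String n = effectiveName;
    return all.stream()
        .filter(d -> d.role.equals(role) && d.name.equals(n))
        .filter(d -> version == null || d.version.equals(version))
        .max((d1, d2) -> compareVersions(d1.version, d2.version)); // highest version wins
  }
}
```

Given WEATHER definitions weather 0.2.0, weather 0.10.0 and weather-alt 9.0.0, a role-only topology entry selects weather 0.10.0, while an entry that names weather-alt selects 9.0.0.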
Below is a snippet from the WebHDFS service definition.
```xml
<route path="/webhdfs/v1/**?**">
  <rewrite apply="WEBHDFS/webhdfs/inbound/namenode/file" to="request.url"/>
  <rewrite apply="WEBHDFS/webhdfs/outbound/namenode/headers" to="response.headers"/>
</route>
```
The dispatch element can be used at the service level (i.e. as a child of the service tag) or at the route level. A dispatch specified at the route level takes precedence over a dispatch specified at the service level. By default, the dispatch used is org.apache.hadoop.gateway.dispatch.DefaultDispatch.
The dispatch tag has four attributes that can be specified.
contributor-name : This attribute can be used to specify a deployment contributor to be invoked for a custom dispatch.
classname : This attribute can be used to specify a custom dispatch class.
ha-contributor-name : This attribute can be used to specify a deployment contributor to be invoked for custom HA dispatch functionality.
ha-classname : This attribute can be used to specify a custom dispatch class with HA functionality.
Only one of contributor-name or classname should be specified, and likewise only one of ha-contributor-name or ha-classname.
If you provide a custom dispatch, either supply a jar (see Class Path) or create a Maven module.
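The precedence rules for dispatch can be sketched in a few lines. This is an illustrative model only: DispatchSpec and DispatchResolver are hypothetical names, the HA attributes are omitted, and the result is a descriptive string rather than an actual dispatch instance. The contributor names http-client and custom-client come from the FOO example in this section.

```java
// Hypothetical model of a <dispatch> element; the ha-* attributes are left out.
final class DispatchSpec {
  final String contributorName; // contributor-name attribute, or null
  final String className;       // classname attribute, or null

  DispatchSpec(String contributorName, String className) {
    // In this sketch exactly one of contributor-name or classname must be given.
    if ((contributorName == null) == (className == null)) {
      throw new IllegalArgumentException("Specify exactly one of contributor-name or classname");
    }
    this.contributorName = contributorName;
    this.className = className;
  }
}

final class DispatchResolver {
  static final String DEFAULT_DISPATCH = "org.apache.hadoop.gateway.dispatch.DefaultDispatch";

  // Route-level dispatch wins over service-level; with neither, DefaultDispatch is used.
  static String resolve(DispatchSpec routeLevel, DispatchSpec serviceLevel) {
    DispatchSpec chosen = routeLevel != null ? routeLevel : serviceLevel;
    if (chosen == null) {
      return "classname=" + DEFAULT_DISPATCH;
    }
    return chosen.contributorName != null
        ? "contributor-name=" + chosen.contributorName
        : "classname=" + chosen.className;
  }
}
```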
The policies element can contain one or more policy elements. The order of the policy elements is important, as it determines the order of execution.
For example,
<service role="FOO" name="foo" version="1.0.0">
<policies>
<policy role="webappsec"/>
<policy role="authentication"/>
<policy role="rewrite"/>
<policy role="identity-assertion"/>
<policy role="authorization"/>
</policies>
<routes>
<route path="/foo/?**">
<rewrite apply="FOO/foo/inbound" to="request.url"/>
<policies>
<policy role="webappsec"/>
<policy role="federation"/>
<policy role="identity-assertion"/>
<policy role="authorization"/>
<policy role="rewrite"/>
</policies>
<dispatch contributor-name="http-client" />
</route>
</routes>
<dispatch contributor-name="custom-client" ha-contributor-name="ha-client"/>
</service>
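Because the policies execute as a servlet filter chain, the first policy listed sees the request first and the response last. The sketch below models that ordering with plain recursive calls. PolicyChainSketch is a hypothetical name, and two behaviors are assumptions rather than statements from this guide: that a route's own policies element replaces the service-level list for that route, and that the dispatch runs at the end of the chain.

```java
import java.util.ArrayList;
import java.util.List;

final class PolicyChainSketch {
  // Assumption: a route's own <policies> list, when present, replaces the service-level list.
  static List<String> effectivePolicies(List<String> routePolicies, List<String> servicePolicies) {
    if (routePolicies != null && !routePolicies.isEmpty()) {
      return routePolicies;
    }
    return servicePolicies != null ? servicePolicies : new ArrayList<>();
  }

  // Runs the policies as a nested chain and records the order of events:
  // each policy sees the request on the way in and the response on the way out.
  static List<String> execute(List<String> policies) {
    List<String> trace = new ArrayList<>();
    invoke(policies, 0, trace);
    return trace;
  }

  private static void invoke(List<String> policies, int index, List<String> trace) {
    if (index == policies.size()) {
      trace.add("dispatch"); // assumed: the dispatch sits at the end of the chain
      return;
    }
    String role = policies.get(index);
    trace.add("enter " + role);
    invoke(policies, index + 1, trace);
    trace.add("exit " + role);
  }
}
```

For the policies authentication followed by rewrite, execute records enter authentication, enter rewrite, dispatch, exit rewrite, exit authentication.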
The rewrite.xml file that accompanies the service.xml file follows the same rules as described in the section Rewrite Provider.
On installation of the Knox gateway, the following directory structure can be found under ${GATEWAY_HOME}/data. This is a mirror of the directories and files under the module gateway-service-definitions.
services
    |______ service name
                |______ version
                            |______ service.xml
                            |______ rewrite.xml
For example,
services
    |______ webhdfs
                |______ 2.4.0
                            |______ service.xml
                            |______ rewrite.xml
To try out a new service, you can simply add the appropriate files (service.xml and rewrite.xml) to a directory under ${GATEWAY_HOME}/data/services. To contribute the service to the Knox build, the files need to go in the gateway-service-definitions module.
Neither the runtime artifacts nor the runtime behavior differ between a service plugged in via deployment descriptors and one defined through a service.xml file.
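The directory layout above maps directly onto a path under GATEWAY_HOME. The helper below builds that path with java.nio.file; ServiceDefinitionPaths is a hypothetical name, and GATEWAY_HOME is passed in as a plain string because how it is obtained depends on the installation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ServiceDefinitionPaths {
  // Builds <gatewayHome>/data/services/<service name>/<version>/<file>,
  // following the services/service name/version layout shown above.
  static Path definitionFile(String gatewayHome, String serviceName, String version, String file) {
    return Paths.get(gatewayHome, "data", "services", serviceName, version, file);
  }

  // Convenience pair for the two files a service normally ships.
  static Path[] definitionFiles(String gatewayHome, String serviceName, String version) {
    return new Path[] {
        definitionFile(gatewayHome, serviceName, version, "service.xml"),
        definitionFile(gatewayHome, serviceName, version, "rewrite.xml")
    };
  }
}
```

For the weather example at version 0.1.0, the files would go in data/services/weather/0.1.0 under GATEWAY_HOME.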
When writing a custom dispatch class, one often needs configuration or gateway services. A lightweight dependency injection system is used that can inject instances of classes or primitives available in the filter configuration’s init params or as a servlet context attribute.
Details can be found in the gateway-util-configinjector module, and an example of its use is in the class org.apache.hadoop.gateway.dispatch.DefaultDispatch. Consider, for example, the following method:
@Configure
protected void setReplayBufferSize(@Default("8") int size) {
replayBufferSize = size;
}
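The method above receives a value from the configuration, or the @Default value of 8 when none is configured. The following is a minimal, hypothetical sketch of that kind of injection using plain reflection. Configure and Default here are local stand-ins, not the gateway-util-configinjector annotations, and the rule that a setter named setReplayBufferSize reads the key replayBufferSize is an assumption made for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;

// Hypothetical stand-ins for the configinjector annotations.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
@interface Configure {}

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
@interface Default { String value(); }

class SketchDispatch {
  int replayBufferSize;

  @Configure
  protected void setReplayBufferSize(@Default("8") int size) {
    replayBufferSize = size;
  }
}

final class MiniInjector {
  // Calls every @Configure setter on target with a value from config, falling back to @Default.
  static void configure(Object target, Map<String, String> config) {
    for (Method m : target.getClass().getDeclaredMethods()) {
      if (!m.isAnnotationPresent(Configure.class) || m.getParameterCount() != 1) {
        continue;
      }
      // Assumed naming rule: setReplayBufferSize -> key "replayBufferSize".
      String key = m.getName();
      if (key.startsWith("set") && key.length() > 3) {
        key = Character.toLowerCase(key.charAt(3)) + key.substring(4);
      }
      String raw = config.get(key);
      if (raw == null) {
        Default def = m.getParameters()[0].getAnnotation(Default.class);
        if (def == null) {
          throw new IllegalStateException("No value configured for " + key);
        }
        raw = def.value();
      }
      // Only int and String parameters are converted in this sketch.
      Class<?> type = m.getParameterTypes()[0];
      Object value;
      if (type == int.class) {
        value = Integer.parseInt(raw);
      } else if (type == String.class) {
        value = raw;
      } else {
        throw new IllegalStateException("Unsupported parameter type " + type);
      }
      try {
        m.setAccessible(true);
        m.invoke(target, value);
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Could not call " + m.getName(), e);
      }
    }
  }
}
```

In Knox the values come from filter init params or servlet context attributes; here a Map stands in for both.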
public interface ProviderDeploymentContributor {
String getRole();
String getName();
void initializeContribution( DeploymentContext context );
void contributeProvider( DeploymentContext context, Provider provider );
void contributeFilter(
DeploymentContext context,
Provider provider,
Service service,
ResourceDescriptor resource,
List<FilterParamDescriptor> params );
void finalizeContribution( DeploymentContext context );
}
<project>
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.hadoop</groupId>
<artifactId>gateway</artifactId>
<version>0.7.0-SNAPSHOT</version>
</parent>
<artifactId>gateway-provider-security-authn-sample</artifactId>
<name>gateway-provider-security-authn-sample</name>
<description>A simple sample authorization provider.</description>
<licenses>
<license>
<name>The Apache Software License, Version 2.0</name>
<url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
<distribution>repo</distribution>
</license>
</licenses>
<dependencies>
<dependency>
<groupId>${gateway-group}</groupId>
<artifactId>gateway-spi</artifactId>
</dependency>
</dependencies>
</project>
package org.apache.hadoop.gateway.deploy;
import ...
public interface DeploymentContext {
GatewayConfig getGatewayConfig();
Topology getTopology();
WebArchive getWebArchive();
WebAppDescriptor getWebAppDescriptor();
GatewayDescriptor getGatewayDescriptor();
void contributeFilter(
Service service,
ResourceDescriptor resource,
String role,
String name,
List<FilterParamDescriptor> params );
void addDescriptor( String name, Object descriptor );
<T> T getDescriptor( String name );
}
public class Topology {
public URI getUri() {...}
public void setUri( URI uri ) {...}
public String getName() {...}
public void setName( String name ) {...}
public long getTimestamp() {...}
public void setTimestamp( long timestamp ) {...}
public Collection<Service> getServices() {...}
public Service getService( String role, String name ) {...}
public void addService( Service service ) {...}
public Collection<Provider> getProviders() {...}
public Provider getProvider( String role, String name ) {...}
public void addProvider( Provider provider ) {...}
}
public interface GatewayDescriptor {
List<GatewayParamDescriptor> params();
GatewayParamDescriptor addParam();
GatewayParamDescriptor createParam();
void addParam( GatewayParamDescriptor param );
void addParams( List<GatewayParamDescriptor> params );
List<ResourceDescriptor> resources();
ResourceDescriptor addResource();
ResourceDescriptor createResource();
void addResource( ResourceDescriptor resource );
}
TODO - Describe the service registry and other global services.
gateway-provider-rewrite: org.apache.hadoop.gateway.filter.rewrite.api.UrlRewriteRulesDescriptor
<rules>
<rule
dir="IN"
name="WEATHER/openweathermap/inbound/versioned/file"
pattern="*://*:*/**/weather/{version}?{**}">
<rewrite template="{$serviceUrl[WEATHER]}/{version}/weather?{**}"/>
</rule>
</rules>
<rules>
<filter name="WEBHBASE/webhbase/status/outbound">
<content type="*/json">
<apply path="$[LiveNodes][*][name]" rule="WEBHBASE/webhbase/address/outbound"/>
</content>
<content type="*/xml">
<apply path="/ClusterStatus/LiveNodes/Node/@name" rule="WEBHBASE/webhbase/address/outbound"/>
</content>
</filter>
</rules>
@Test
public void testDevGuideSample() throws Exception {
URI inputUri, outputUri;
Matcher<Void> matcher;
Matcher<Void>.Match match;
Template input, pattern, template;
inputUri = new URI( "http://sample-host:8443/gateway/topology/weather/2.5?q=Palo+Alto" );
input = Parser.parse( inputUri.toString() );
pattern = Parser.parse( "*://*:*/**/weather/{version}?{**}" );
template = Parser.parse( "http://api.openweathermap.org/data/{version}/weather?{**}" );
matcher = new Matcher<Void>();
matcher.add( pattern, null );
match = matcher.match( input );
outputUri = Expander.expand( template, match.getParams(), null );
assertThat(
outputUri.toString(),
is( "http://api.openweathermap.org/data/2.5/weather?q=Palo+Alto" ) );
}
@Test
public void testDevGuideSampleWithEvaluator() throws Exception {
URI inputUri, outputUri;
Matcher<Void> matcher;
Matcher<Void>.Match match;
Template input, pattern, template;
Evaluator evaluator;
inputUri = new URI( "http://sample-host:8443/gateway/topology/weather/2.5?q=Palo+Alto" );
input = Parser.parse( inputUri.toString() );
pattern = Parser.parse( "*://*:*/**/weather/{version}?{**}" );
template = Parser.parse( "{$serviceUrl[WEATHER]}/{version}/weather?{**}" );
matcher = new Matcher<Void>();
matcher.add( pattern, null );
match = matcher.match( input );
evaluator = new Evaluator() {
@Override
public List<String> evaluate( String function, List<String> parameters ) {
return Arrays.asList( "http://api.openweathermap.org/data" );
}
};
outputUri = Expander.expand( template, match.getParams(), evaluator );
assertThat(
outputUri.toString(),
is( "http://api.openweathermap.org/data/2.5/weather?q=Palo+Alto" ) );
}
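The two tests above exercise Knox's template Parser, Matcher and Expander. To make the effect of the WEATHER rule easier to see, here is a rough approximation using only java.util.regex. It is not how Knox evaluates templates: the regular expression is a hand translation of *://*:*/**/weather/{version}?{**}, it requires a query string to be present, and the map passed to the constructor plays the role of the Evaluator that resolves {$serviceUrl[WEATHER]}. WeatherRewriteSketch is a hypothetical name.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WeatherRewriteSketch {
  // Rough regex equivalent of *://*:*/**/weather/{version}?{**}:
  // scheme, host and port are wildcards, ** is any path prefix,
  // group 1 is the {version} segment and group 2 is the whole query string.
  private static final Pattern INBOUND =
      Pattern.compile("^[^:/]+://[^:/]+:[^/]+/(?:.*/)?weather/([^/?]+)\\?(.*)$");

  // Stand-in for {$serviceUrl[ROLE]}: role -> backend URL from the topology.
  private final Map<String, String> serviceUrls;

  WeatherRewriteSketch(Map<String, String> serviceUrls) {
    this.serviceUrls = serviceUrls;
  }

  // Expands {$serviceUrl[WEATHER]}/{version}/weather?{**}; empty if the URL does not match.
  Optional<String> rewrite(String inbound) {
    Matcher m = INBOUND.matcher(inbound);
    String base = serviceUrls.get("WEATHER");
    if (!m.matches() || base == null) {
      return Optional.empty();
    }
    return Optional.of(base + "/" + m.group(1) + "/weather?" + m.group(2));
  }
}
```

With WEATHER mapped to http://api.openweathermap.org/data, the inbound URL from the tests rewrites to the same expected value, http://api.openweathermap.org/data/2.5/weather?q=Palo+Alto.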
TODO - Cover the supported content types.
TODO - Provide an XML and JSON “properties” example where one NVP is modified based on the value of another name.
<rules>
<filter name="WEBHBASE/webhbase/regions/outbound">
<content type="*/json">
<apply path="$[Region][*][location]" rule="WEBHBASE/webhbase/address/outbound"/>
</content>
<content type="*/xml">
<apply path="/TableInfo/Region/@location" rule="WEBHBASE/webhbase/address/outbound"/>
</content>
</filter>
</rules>
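Assuming the apply path under */xml content is an XPath expression (as /TableInfo/Region/@location suggests), the effect of such a filter on an XML response body can be approximated with the JDK's XPath API. This is a sketch, not Knox's content filter: the named rule is modeled as a plain function, and XmlBodyFilterSketch, the host names and the port in the usage note are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlBodyFilterSketch {
  // Applies rule to the value of every node selected by xpath, then re-serializes the body.
  static String apply(String xmlBody, String xpath, UnaryOperator<String> rule) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xmlBody)));
      NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
          .evaluate(xpath, doc, XPathConstants.NODESET);
      for (int i = 0; i < nodes.getLength(); i++) {
        Node node = nodes.item(i);
        // For attribute nodes such as @location, the node value is the attribute value.
        node.setNodeValue(rule.apply(node.getNodeValue()));
      }
      Transformer t = TransformerFactory.newInstance().newTransformer();
      t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      t.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    } catch (Exception e) {
      throw new IllegalStateException("Could not filter XML body", e);
    }
  }
}
```

For example, a rule that replaces internal-host with external-host turns location="internal-host:60030" into location="external-host:60030" on every Region element, leaving the rest of the body untouched.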
<gateway>
...
<resource>
<role>WEBHBASE</role>
<pattern>/hbase/*/regions?**</pattern>
...
<filter>
<role>rewrite</role>
<name>url-rewrite</name>
<class>org.apache.hadoop.gateway.filter.rewrite.api.UrlRewriteServletFilter</class>
<param>
<name>response.body</name>
<value>WEBHBASE/webhbase/regions/outbound</value>
</param>
</filter>
...
</resource>
...
</gateway>
HBaseDeploymentContributor
params = new ArrayList<FilterParamDescriptor>();
params.add( regionResource.createFilterParam().name( "response.body" ).value( "WEBHBASE/webhbase/regions/outbound" ) );
addRewriteFilter( context, service, regionResource, params );
TODO - Provide a lowercase function as an example.
<rules>
<functions>
<hostmap config="/WEB-INF/hostmap.txt"/>
</functions>
...
</rules>
TODO - Provide a lowercase step as an example.
<rules>
<rule dir="OUT" name="WEBHDFS/webhdfs/outbound/namenode/headers/location">
<match pattern="{scheme}://{host}:{port}/{path=**}?{**}"/>
<rewrite template="{gateway.url}/webhdfs/data/v1/{path=**}?{scheme}?host={$hostmap(host)}?{port}?{**}"/>
<encrypt-query/>
</rule>
</rules>
Adding a new identity assertion provider is as simple as extending the AbstractIdentityAsserterDeploymentContributor and the CommonIdentityAssertionFilter from the gateway-provider-identity-assertion-common module, initializing any specific configuration from filter init params, and implementing two methods:
String mapUserPrincipal(String principalName);
String[] mapGroupPrincipals(String mappedPrincipalName, Subject subject);
To implement a simple toUpper or toLower identity assertion provider:
package org.apache.hadoop.gateway.identityasserter.caseshifter.filter;
import org.apache.hadoop.gateway.identityasserter.common.filter.AbstractIdentityAsserterDeploymentContributor;
public class CaseShifterIdentityAsserterDeploymentContributor extends AbstractIdentityAsserterDeploymentContributor {
@Override
public String getName() {
return "CaseShifter";
}
@Override
protected String getFilterClassname() {
return CaseShifterIdentityAssertionFilter.class.getName();
}
}
We merely need to provide the provider name for use in the topology and the filter classname for the contributor to add to the filter chain.
For the identity assertion filter itself it is just a matter of extension and the implementation of the two methods described earlier:
package org.apache.hadoop.gateway.identityasserter.caseshifter.filter;
import javax.security.auth.Subject;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import org.apache.hadoop.gateway.identityasserter.common.filter.CommonIdentityAssertionFilter;
public class CaseShifterIdentityAssertionFilter extends CommonIdentityAssertionFilter {
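private boolean toUpper = false;
@Override
public void init(FilterConfig filterConfig) throws ServletException {
// Let the common identity assertion filter perform its own initialization first.
super.init(filterConfig);
// caseshift.upper=true selects upper case; any other value (or none) selects lower case.
toUpper = "true".equals(filterConfig.getInitParameter("caseshift.upper"));
}
@Override
public String[] mapGroupPrincipals(String mappedPrincipalName, Subject subject) {
// Returning null contributes no group principal mappings.
return null;
}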
@Override
public String mapUserPrincipal(String principalName) {
if (toUpper) {
principalName = principalName.toUpperCase();
}
else {
principalName = principalName.toLowerCase();
}
return principalName;
}
}
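Note that the filter above contributes no group mappings (mapGroupPrincipals returns null) and uses the caseshift.upper init parameter to choose between upper and lower case for the user principal.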
That is the extent of what is needed to implement a new identity assertion provider module.
TODO
public class AuditingSample {
private static Auditor AUDITOR = AuditServiceFactory.getAuditService().getAuditor(
"sample-channel", "sample-service", "sample-component" );
public void sampleMethod() {
...
AUDITOR.audit( Action.AUTHORIZATION, sourceUrl, ResourceType.URI, ActionOutcome.SUCCESS );
...
}
}
@Messages( logger = "org.apache.project.module" )
public interface CustomMessages {
@Message( level = MessageLevel.FATAL, text = "Failed to parse command line: {0}" )
void failedToParseCommandLine( @StackTrace( level = MessageLevel.DEBUG ) ParseException e );
}
public class CustomLoggingSample {
private static CustomMessages MSG = MessagesFactory.get( CustomMessages.class );
public void sampleMethod() {
...
MSG.failedToParseCommandLine( e );
...
}
}
@Resources
public interface CustomResources {
@Resource( text = "Apache Hadoop Gateway {0} ({1})" )
String gatewayVersionMessage( String version, String hash );
}
public class CustomResourceSample {
private static CustomResources RES = ResourcesFactory.get( CustomResources.class );
public void sampleMethod() {
...
String s = RES.gatewayVersionMessage( "0.0.0", "XXXXXXX" );
...
}
}
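Both MessagesFactory.get and ResourcesFactory.get hand back implementations of annotated interfaces. One common way to build such a factory is a JDK dynamic proxy that reads the method's annotation and formats its text with java.text.MessageFormat, which understands the {0} and {1} placeholders used above. The sketch below shows that idea for resources only; it is an assumption about the general technique, not Knox's implementation, and Text, SampleResources and SketchResourcesFactory are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Proxy;
import java.text.MessageFormat;

// Hypothetical stand-in for @Resource: holds a MessageFormat pattern.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
@interface Text { String value(); }

interface SampleResources {
  @Text("Apache Hadoop Gateway {0} ({1})")
  String gatewayVersionMessage(String version, String hash);
}

final class SketchResourcesFactory {
  // Returns a proxy whose methods format their @Text pattern with the call arguments.
  static <T> T get(Class<T> type) {
    Object proxy = Proxy.newProxyInstance(
        type.getClassLoader(),
        new Class<?>[] { type },
        (p, method, args) -> {
          Text text = method.getAnnotation(Text.class);
          if (text == null) {
            // Includes Object methods such as toString, which this sketch does not handle.
            throw new UnsupportedOperationException(method.getName() + " has no @Text pattern");
          }
          return MessageFormat.format(text.value(), args == null ? new Object[0] : args);
        });
    return type.cast(proxy);
  }
}
```

With this factory, gatewayVersionMessage("0.0.0", "XXXXXXX") returns "Apache Hadoop Gateway 0.0.0 (XXXXXXX)". A messages factory could follow the same pattern, sending the formatted text to a logger at the annotated level instead of returning it.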