Apache Slider: Service Registry End-to-End Scenarios
AM startup
- AM starts, reads in its configuration, and creates the provider.
- AM builds its web site, involving the provider in the process. (There is a possible race condition here, due to the AM registration sequence.)
- AM registers itself with the RM, including its web and IPC ports, and receives the list of existing containers. Container loss notifications arrive asynchronously, which is why the AM startup process runs in a synchronized block.
- AM initializes its ApplicationState instance with the configuration, the instance description, and the RM-supplied container list.
- AM creates a service registry client using the ZK quorum and path provided when the AM was started.
- AM registers the standard endpoints: RPC, Web UI, and REST APIs.
- AM registers the standard content it can serve (e.g. yarn-site.xml).
- AM passes the registry to the provider in the bind() operation.
- AM triggers a review of the application state, requesting or releasing nodes as appropriate.
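The startup ordering, and the reason for the synchronized block, can be sketched in Python. This is a minimal illustration, not Slider's Java implementation: AppMaster, Provider, rm_register, and the config keys web_port, ipc_port, zk_quorum, and zk_path are all hypothetical, and the registry is a plain dict rather than a ZooKeeper client. One lock wraps startup, so a container-loss callback arriving on another thread waits until the AM has a consistent view of its containers.

```python
import threading


class Provider:
    """Hypothetical provider; bind() hands it the registry, as in the steps above."""

    def __init__(self):
        self.registry = None

    def bind(self, registry):
        self.registry = registry


class AppMaster:
    """Sketch of the AM startup ordering; every name here is hypothetical."""

    def __init__(self, config):
        self.config = config            # configuration read in at start
        self.provider = Provider()      # provider created from the config
        self.registry = None
        self.containers = set()
        self.steps = []                 # records the order startup ran in
        # Shared by startup and the asynchronous container-loss callback.
        self._lock = threading.Lock()

    def start(self, rm_register):
        with self._lock:
            self.steps.append("build-web-site")
            # rm_register stands in for RM registration: it is given the web
            # and IPC ports and returns the containers that already exist.
            existing = rm_register(self.config["web_port"], self.config["ipc_port"])
            self.steps.append("register-with-rm")
            self.containers = set(existing)  # stands in for ApplicationState
            self.steps.append("init-application-state")
            # A dict stands in for the ZK-backed registry client.
            self.registry = {
                "quorum": self.config["zk_quorum"],
                "path": self.config["zk_path"],
                "endpoints": {"rpc": self.config["ipc_port"],
                              "webui": self.config["web_port"],
                              "rest": self.config["web_port"]},
                "content": ["yarn-site.xml"],
            }
            self.steps.append("register-endpoints-and-content")
            self.provider.bind(self.registry)
            self.steps.append("bind-provider")
            self.steps.append("review-state")  # request/release nodes here

    def on_container_lost(self, container_id):
        # Runs on an RM callback thread; blocks until start() has finished.
        with self._lock:
            self.containers.discard(container_id)
```

Because the callback takes the same lock, a loss notification delivered mid-startup is applied only after the container list from the RM has been installed, rather than being overwritten by it.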
Agent startup: standalone
- A container is issued to the AM.
- AM chooses a component and launches an agent on it, with the URL of the AM as a parameter. (TODO: add registry binding of the ZK quorum and path.)
- Agent starts up.
- Agent locates the AM via the URL or ZK information.
- Agent heartbeats in with its state.
- AM gives the agent its next state command.
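A minimal sketch of this exchange, assuming a simple three-state agent lifecycle (INIT, INSTALLED, STARTED) and the command names INSTALL and START. These names, along with locate_am and AgentEndpoint, are hypothetical and not taken from Slider's agent protocol. The agent prefers the AM URL it was launched with and falls back to a registry lookup.

```python
from enum import Enum


class AgentState(Enum):
    # Hypothetical lifecycle states, for illustration only.
    INIT = "INIT"
    INSTALLED = "INSTALLED"
    STARTED = "STARTED"


# Hypothetical command that moves an agent on from each reported state.
NEXT_COMMAND = {
    AgentState.INIT: "INSTALL",
    AgentState.INSTALLED: "START",
    AgentState.STARTED: None,  # nothing to do; the agent keeps heartbeating
}


def locate_am(am_url=None, registry=None, registry_path=None):
    """Agent side: prefer the launch-time URL, else look it up in a registry."""
    if am_url:
        return am_url
    if registry is not None and registry_path in registry:
        return registry[registry_path]
    raise LookupError("cannot locate the AM")


class AgentEndpoint:
    """AM side: records each agent's reported state and replies with a command."""

    def __init__(self):
        self.agents = {}

    def heartbeat(self, agent_id, reported_state):
        self.agents[agent_id] = reported_state
        return NEXT_COMMAND[reported_state]
```

Keeping the decision in a table keyed by the reported state means the AM never has to trust its own memory of where an agent is; each heartbeat carries the state the next command is derived from.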
AM gets state from agent
- Agent heartbeats in.
- AM decides whether it wants to receive config.
- AM issues a request for state information: all (dynamic) config data.
- Agent receives it.
- Agent returns all config state, including hostnames, allocated ports, and generated values (e.g. database connection strings, URLs), as a two-level structure of document, then config option. This allows the agent to define which config options are relevant to which document.
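The two-level shape of the returned state (document, then config option) maps naturally onto nested dicts. A minimal sketch, assuming the AM simply merges each report into what it already holds; merge_agent_state and the sample document and option names are hypothetical.

```python
from typing import Dict

# document name -> (config option -> value)
ConfigState = Dict[str, Dict[str, str]]


def merge_agent_state(store: ConfigState, reported: ConfigState) -> ConfigState:
    """Merge an agent's two-level config report into the AM's store.

    Later reports overwrite earlier values for the same document and option;
    options the new report does not mention are kept.
    """
    for document, options in reported.items():
        store.setdefault(document, {}).update(options)
    return store


store: ConfigState = {}
merge_agent_state(store, {"core-site": {"fs.defaultFS": "hdfs://nn:8020"}})
merge_agent_state(store, {"core-site": {"io.file.buffer.size": "4096"},
                          "app": {"port": "8080"}})
```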
AM saves state for serving
- AM saves the state in RAM (assumptions: it is small, and it will be rebuilt on restart).
- AM updates the service registry with the list of content that can be served and the URLs to retrieve it.
- AM fields HTTP GET requests on the content.
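A minimal sketch of the in-RAM store and the listing the AM would publish to the registry, assuming a hypothetical URL layout of <base>/publisher/<document>; the class and method names are illustrative. Nothing is persisted, so a restarted AM starts empty and repopulates the store from agent reports.

```python
class PublishedContent:
    """In-memory store of published documents (lost, and rebuilt, on AM restart)."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.documents = {}

    def publish(self, name, values):
        # Copy so later changes by the caller do not leak into served content.
        self.documents[name] = dict(values)

    def registry_listing(self):
        # What the AM would write into its service registry entry:
        # document name -> URL where it can be fetched (hypothetical layout).
        return {name: f"{self.base_url}/publisher/{name}"
                for name in sorted(self.documents)}

    def get(self, name):
        # None for unknown documents; an HTTP layer would map this to a 404.
        return self.documents.get(name)
```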
AM serves content
A simple REST service serves up content on paths published to the service registry. It is also possible to enumerate the published documents with GET operations on parent paths.
- On a GET command, the AM locates the referenced agent values.
- AM builds up the response document from the K-V pairs. A limited set of formats is supported: Hadoop XML, Java properties, YAML, CSV, HTML, and JSON, chosen by a ?type query parameter. (This generation is done by template processing in the AM, using the slider.core.template module.)
- The response is streamed with headers for content-type, content-length, do-not-cache-in-proxy, and expires (with the expiry date chosen as ??).
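The format conversion and response headers can be sketched as follows. Slider generates output through slider.core.template; this sketch swaps in plain string building for three of the formats (Hadoop XML, Java properties, JSON) and is only illustrative. The 60-second TTL is an assumption, since the expiry policy is left open above, and Cache-Control: private is one standard way to tell shared proxies not to store a response.

```python
import json
import time
from email.utils import formatdate
from xml.sax.saxutils import escape


def to_hadoop_xml(values):
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for key in sorted(values):
        lines += ["  <property>",
                  f"    <name>{escape(key)}</name>",
                  f"    <value>{escape(values[key])}</value>",
                  "  </property>"]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


def to_properties(values):
    # Simplified escaping: backslashes everywhere; '=', ':' and spaces in keys.
    def esc(s, is_key):
        s = s.replace("\\", "\\\\")
        if is_key:
            s = s.replace("=", "\\=").replace(":", "\\:").replace(" ", "\\ ")
        return s
    return "".join(f"{esc(k, True)}={esc(values[k], False)}\n" for k in sorted(values))


# format name -> (renderer, MIME type); the names are hypothetical.
RENDERERS = {
    "xml": (to_hadoop_xml, "application/xml"),
    "properties": (to_properties, "text/plain"),
    "json": (lambda v: json.dumps(v, sort_keys=True, indent=2), "application/json"),
}


def build_response(values, fmt, ttl_seconds=60, now=None):
    """Render K-V pairs in the requested format and compute response headers."""
    render, content_type = RENDERERS[fmt]
    body = render(values).encode("utf-8")
    now = time.time() if now is None else now
    headers = {
        "Content-Type": f"{content_type}; charset=utf-8",
        "Content-Length": str(len(body)),        # byte length, not characters
        "Cache-Control": "private",              # shared proxies must not store it
        "Expires": formatdate(now + ttl_seconds, usegmt=True),
    }
    return headers, body
```

Content-Length is computed from the encoded bytes, which is what a client can later validate against the data it actually received.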
Slider Client
Currently the Slider client enumerates the YARN registry looking for Slider instances, including any running instances of the same application, before launching a cluster.
This approach:
- has race conditions
- has scale limitations: O(apps-in-YARN-cluster) + O(completed-apps-in-RM-memory)
- only retrieves configuration information from Slider-deployed application instances. We do not need to restrict ourselves here.
Slider Client lists applications
slider registry --list [--servicetype <application-type>]
- Client starts.
- Client creates a service registry client using the ZK quorum and path provided in the client config properties (slider-client.xml).
- Client enumerates the registered services and lists them.
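A minimal sketch of the client side, assuming slider-client.xml uses the Hadoop <configuration>/<property> layout and that services are registered as <path>/<servicetype>/<instance>. The property names slider.zookeeper.quorum and slider.registry.path, and that path layout, are assumptions. To keep the example self-contained, the ZooKeeper tree is modelled as a dict from path to child names; a real client would list children through a ZooKeeper library instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical property names; check slider-client.xml for the real ones.
QUORUM_KEY = "slider.zookeeper.quorum"
PATH_KEY = "slider.registry.path"


def read_hadoop_conf(xml_text):
    """Parse Hadoop-style <configuration><property> XML into a dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name] = prop.findtext("value", default="")
    return conf


def list_services(tree, base_path, servicetype=None):
    """Return sorted (servicetype, instance) pairs registered under base_path.

    `tree` maps a ZK-style path to its child names (a stand-in for ZooKeeper).
    """
    types = [servicetype] if servicetype else tree.get(base_path, [])
    result = []
    for stype in sorted(types):
        for instance in sorted(tree.get(f"{base_path}/{stype}", [])):
            result.append((stype, instance))
    return result
```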
Slider Client lists content published by an application instance
slider registry <instance> --listconf [--servicetype <application-type>]
- Client starts.
- Client creates a service registry client using the ZK quorum and path provided in the client config properties (slider-client.xml).
- Client locates the registered service entry, or fails.
- Client retrieves the service data, specifically the listing of published documents.
- Client displays the list of content.
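The "locate the entry, or fail" step can be sketched as a lookup that raises when no entry matches. A minimal sketch, assuming each registry entry is a dict with hypothetical instance, servicetype, and published (document name to URL) fields; ServiceNotFound is likewise illustrative. When several service types share an instance name, the sketch refuses to guess and asks for --servicetype.

```python
class ServiceNotFound(LookupError):
    """Raised when no registry entry exists for the requested instance."""


def find_service(entries, instance, servicetype=None):
    matches = [e for e in entries
               if e["instance"] == instance
               and (servicetype is None or e["servicetype"] == servicetype)]
    if not matches:
        raise ServiceNotFound(f"no registered service named {instance!r}")
    if len(matches) > 1:
        raise ValueError(f"{instance!r} is ambiguous; pass --servicetype")
    return matches[0]


def list_published(entry):
    # Names of the documents this instance publishes, for display.
    return sorted(entry.get("published", {}))
```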
Slider Client retrieves content published by an application instance
slider registry <instance> --getconf <document> [--format (xml|properties|text|html|csv|yaml|json,...)] [--dest <file>] [--servicetype <application-type>]
- Client starts.
- Client creates a service registry client using the ZK quorum and path provided in the client config properties (slider-client.xml).
- Client locates the registered service entry, or fails.
- Client retrieves the service data, specifically the listing of published documents.
- Client locates the URL of the content.
- Client builds a GET request that includes the format.
- Client executes the request, follows redirects, and validates the declared content length against the data received.
- Client prints the response to the console or saves it to the output file: the path specified as the destination or, if that path refers to a directory, a file underneath it.
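The request, validation, and destination steps can be sketched with the standard library. A minimal sketch, not the Slider client: the query parameter name ("format") and the <document>.<format> file name used for directory destinations are assumptions. urllib's default opener follows HTTP redirects, which covers that step.

```python
import os
import urllib.parse
import urllib.request


def build_url(doc_url, fmt, param="format"):
    """Append the requested format as a query parameter (name is hypothetical)."""
    parts = urllib.parse.urlsplit(doc_url)
    query = urllib.parse.parse_qsl(parts.query)
    query.append((param, fmt))
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))


def check_length(headers, body):
    """Fail if the declared Content-Length does not match the bytes received."""
    declared = headers.get("Content-Length")
    if declared is not None and int(declared) != len(body):
        raise IOError(f"expected {declared} bytes, got {len(body)}")
    return body


def resolve_dest(dest, document, fmt):
    # A directory destination gets a file named after the document.
    if os.path.isdir(dest):
        return os.path.join(dest, f"{document}.{fmt}")
    return dest


def fetch(doc_url, fmt, timeout=30):
    # The default opener behind urlopen follows HTTP redirects.
    with urllib.request.urlopen(build_url(doc_url, fmt), timeout=timeout) as resp:
        return check_length(resp.headers, resp.read())
```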
Slider Client retrieves content set published by an application instance
Here the set of documents published by an application instance is retrieved in the desired format.
Slider Client retrieves document and applies template to it
Here a single published document is retrieved, and a template is applied to it to generate the final output.
slider registry <instance> --source <document> [--template <path-to-template>] [--outfile <file>] [--servicetype <application-type>]
- The document is retrieved as before, using a simple format such as JSON.
- The document is parsed and converted back into K-V pairs.
- A template written for a common/defined template library is applied to the content, generating the final output.
Template paths may be local filesystem paths or (somehow) something in a package file.
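The parse-then-template steps can be sketched with Python's string.Template standing in for the "common/defined template library", which the text does not name. A minimal sketch, assuming the retrieved JSON document is a flat object of scalar values; to_pairs, apply_template, and DottedTemplate are hypothetical names. The subclass widens the placeholder pattern so Hadoop-style dotted keys such as ${fs.defaultFS} can be referenced.

```python
import json
from string import Template


class DottedTemplate(Template):
    # Allow dots and dashes in placeholder names, e.g. ${fs.defaultFS}.
    idpattern = r"(?a:[_a-z][_a-z0-9.\-]*)"


def to_pairs(document_json):
    """Convert a retrieved JSON document back into flat K-V pairs.

    Assumes a JSON object of scalars; values are stringified for templating.
    """
    data = json.loads(document_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {str(k): str(v) for k, v in data.items()}


def apply_template(template_text, pairs):
    # substitute() raises KeyError if the template names a key that is missing.
    return DottedTemplate(template_text).substitute(pairs)
```

Using substitute() rather than safe_substitute() makes a template that references an unpublished key fail loudly instead of emitting a literal placeholder into the output.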