Apache Jackrabbit : Oakathon November 2018

Where and When

  • November 5th - 9th 2018
  • Location: Bucharest
  • Meeting Rooms: Game of Thrones 7C (5. - 7., 9.), Breaking Bad 7A (8.)

Attendees

  • Andrei Dulceanu: 5. - 8.
  • Bogdan Ieran Draghiciu: 5. - 9.
  • Michael Dürig: 6. - 9.
  • Marcel Reutegger: 5. - 8. (remote)
  • Francesco Mari: 6. - 9.
  • Axel Hanikel: 5. - 9.
  • Matt Ryan: 5. - 9.
  • Tomek Rękawek: 5. - 9.
  • Thomas Müller: 6. - 8. (remote)
  • Stefan Egli: 5. - 8. (remote)
  • Julian Reschke: 5. - 9. (remote; Tuesday and Thursday afternoon only)

Topics/Discussions/Goals

  • Independently-Releasable Oak Module
    Summary: In the last Oakathon we discussed making it possible to release some Oak modules independently. We could try to split one out during this Oakathon to see what is required to make this work and determine the feasibility of the effort. Another outcome could be an evaluation of which modules make sense to release independently and which do not.
    Effort: 2-5d
    Proposed by: Matt Ryan

  • Enable CI for cloud blob storage
    Summary: In the last Oakathon we talked about CI for the cloud-dependent bundles (e.g. oak-blob-cloud, oak-blob-cloud-azure). We could pick one and try to get CI working for it using a storage emulator (e.g. S3Mock for oak-blob-cloud - OAK-7743, Azurite for oak-blob-cloud-azure - OAK-7742) and document any gaps in the emulator that would need addressing (e.g. direct binary access capabilities).
    Effort: 2-5d
    Participants: Matt Ryan and at least one more person
    Proposed by: Matt Ryan

  • Oak Capabilities
    Summary: The Sling Capabilities module is a fairly new Sling module that allows a user to query the system's capabilities. I'd like to explore how this functionality could determine repository capabilities and provide them to users - or whether we should do this in Oak at all. One use case is a remote client determining whether direct binary access is available.
    Effort: 2h
    Participants: Everyone interested, Stefan Egli
    Proposed by: Matt Ryan

  • OAK-7511
    Summary: 1.8, null annotations, and impact on backports (tracked in OAK-7669)
    Effort: 1h
    Proposed by: Julian Reschke

  • Branching Oak 1.10
    Summary: ...and Jackrabbit 2.18
    Effort: 1h
    Participants: Davide Giannella, Julian Reschke
    Proposed by: Julian Reschke

Agenda Proposal

Monday

  • Morning: Enable CI for cloud blob storage, OAK-7511, and Oak Capabilities

Tuesday

  • After lunch: Independently-Releasable Oak Module.

Wednesday

Thursday

Friday

Prep Work

Notes from the Oakathon

Enable CI for Cloud Data Stores

We discussed this issue and the proposal to try S3Mock for the S3 data store and Azurite for the Azure data store. The following points came up:

  • Mock frameworks by definition lag behind the official implementations. Using a mock framework can be helpful, but we need to be sure we also continue running the tests against the real services so we are not blind to critical implementation differences.
  • One of the issues we have is that it is not obvious when the S3 and Azure data store unit tests are ignored rather than executed. It is reasonable to expect that a developer working on the S3 data store also has an S3 account and is therefore running all the unit tests. However, changes can also be made lower in the stack by someone not working directly on the cloud data stores; for them, a green full test suite gives a false sense that everything is fine when the change may in fact have broken the cloud data stores.
  • These tests are also not executed during release voting. Voters run the release check and report their results, but the S3 and Azure data store tests are not part of that run. The release check report needs to state which backends were tested (if any), and those running the check need instructions for running the S3 and Azure data store tests.
  • Adding a mock framework for S3 and Azure seems like the wrong priority. The most important thing is to make sure the tests run at all; we should solve that first and worry about mocks afterwards.

The proposal for moving forward is:

  1. In addition to the Apache CI, Adobe also runs CI for Oak internally. We cannot provide S3 or Azure credentials to the public-facing Apache CI system, but we can do so internally at Adobe, and that is better than no testing at all. We should therefore start by checking the Adobe CI and making sure the S3 and Azure data store tests run as part of it.
  2. We need to make it clearer, when running the test suites, that the S3 and Azure data store tests are not being executed, and indicate what needs to be done to enable them (see the sketch after this list).
  3. The release check needs to report which backends were tested, if any, and for every untested backend tell the user how to configure it so that it is tested.
  4. Once all of that is done we can revisit the implementation of mocks, which we still think would be worthwhile.
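
As a concrete illustration of point 2, a minimal sketch in JUnit 4 (which Oak uses) of making the skip loud rather than silent; the test class name and messages are hypothetical, not existing Oak code:

    import static org.junit.Assume.assumeTrue;

    import java.io.File;

    import org.junit.BeforeClass;
    import org.junit.Test;

    public class S3DataStoreIT {

        @BeforeClass
        public static void requireS3Config() {
            String config = System.getProperty("s3.config");
            boolean available = config != null && new File(config).exists();
            if (!available) {
                // Print an explicit warning instead of silently ignoring the tests.
                System.err.println("SKIPPING S3 data store tests: no -Ds3.config=<file> given. "
                        + "Provide an S3 config file to enable them.");
            }
            assumeTrue("s3.config not configured", available);
        }

        @Test
        public void testSomething() {
            // the actual S3 data store tests would go here
        }
    }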

Prototype Results

We tried this out in Jenkins, with a storage container and access tokens configured for each service. The config was provided to the build using the -Ds3.config option for S3DataStore and -Dazure.config for AzureDataStore. We were able to perform successful CI builds of the Oak codebase using these config files and saw that all the tests for the cloud data stores were executed (not ignored as before).
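
Such a config file is a plain Java properties file. A minimal sketch for the S3 case, using the property names from the Oak S3DataStore documentation (all values here are hypothetical placeholders):

    accessKey=<aws-access-key>
    secretKey=<aws-secret-key>
    s3Bucket=oak-ci-blobs
    s3Region=us-east-1

The Azure config file works the same way, using the AzureDataStore's connection properties.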

OAK-7887 was found during this process. The problem was isolated to the creation of the container in the init() function of the S3 backend, where presumably some test threads try to delete the container at the same time as others try to create it. A quick prototype fix, wrapping the create-bucket call in a loop with a limited number of retries, solved the problem. Further investigation is warranted to determine the best way of handling this in the production and test code.
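
A minimal sketch of that retry approach, assuming the AWS SDK v1 client used by the S3 backend (the helper class, method name, and retry parameters are hypothetical, not the actual prototype code):

    import com.amazonaws.AmazonServiceException;
    import com.amazonaws.services.s3.AmazonS3;

    public final class S3InitUtil {

        // Retry bucket creation a few times to ride out races where another
        // thread deletes the bucket while this one is trying to create it.
        public static void createBucketWithRetry(AmazonS3 s3, String bucketName)
                throws InterruptedException {
            final int maxRetries = 5;
            for (int attempt = 1;; attempt++) {
                try {
                    if (!s3.doesBucketExist(bucketName)) {
                        s3.createBucket(bucketName);
                    }
                    return;
                } catch (AmazonServiceException e) {
                    if (attempt >= maxRetries) {
                        throw e; // give up after a bounded number of attempts
                    }
                    Thread.sleep(1000L * attempt); // simple linear backoff
                }
            }
        }
    }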

The cloud data stores are subject to occasional test failures caused by network or service glitches. These do not happen frequently, but they can cause a build to report a failure even though there is nothing wrong with the code itself. We could instead create separate CI jobs in Jenkins, one per cloud data store, that build only that data store and run its tests, leaving the regular CI build alone. Doing this in separate builds lets us get CI going for these data stores without failing the main build due to external issues.
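
For example (the oak-blob-cloud invocation is the one discussed above; the oak-blob-cloud-azure invocation is the analogous command and is an assumption):

    mvn clean test -pl oak-blob-cloud -am -Ds3.config=<file>
    mvn clean test -pl oak-blob-cloud-azure -am -Dazure.config=<file>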

Oak Capabilities

Sling has a new Sling Capabilities module which allows an HTTP client to determine specific capabilities supported by the system.

The first question discussed was whether we should support this in Oak at all. After some discussion we seemed to align on the position that it would be useful. The argument was made that while it is currently possible to infer that a capability is available (for example, by calling an API and checking for an exception or a null return value), it would be preferable to have a method specifically designed to answer the question, rather than relying on a convention or on the side effect of an API not designed for it.

The next thing discussed was an approach to the problem. We want to avoid tight coupling between Sling and Oak. In particular, we do not want a pattern with a 1:1 relationship between an Oak capability and a Sling Capabilities provider. At best that means lots of Sling providers; at worst, it means potentially long lead times between when a capability is available in Oak and when it can be queried through Sling, since Sling depends on a stable Oak release (e.g. direct binary access: the feature is in Oak trunk now, but a Sling provider specifically for it could not be released until after the next stable Oak release). We discussed a couple of approaches, including JMX and repository descriptors.

After discussing JMX we started leaning away from it because of inconsistencies between when the repository becomes available in OSGi and when a JMX MBean becomes available. This could lead to situations where a capability is reported as unavailable even though it is available. It would be preferable to tie the lifecycle of a capability to the lifecycle of the service providing it. This puts us more in favor of repository descriptors, which are simply registered with OSGi when the service starts.
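
A minimal sketch of that direction, assuming Oak's GenericDescriptors helper and that Oak picks up Descriptors services registered in OSGi (the package names, descriptor key, and wiring below are assumptions for illustration, not Oak's actual direct binary access code):

    import org.apache.jackrabbit.commons.SimpleValueFactory;
    import org.apache.jackrabbit.oak.api.Descriptors;
    import org.apache.jackrabbit.oak.spi.descriptors.GenericDescriptors;
    import org.osgi.framework.BundleContext;
    import org.osgi.framework.ServiceRegistration;

    public class DirectBinaryAccessCapability {

        private ServiceRegistration<?> registration;

        // Called when the service providing the capability starts, so the
        // descriptor's lifecycle matches the service's lifecycle.
        public void activate(BundleContext context) {
            GenericDescriptors descriptors = new GenericDescriptors();
            descriptors.put("oak.directBinaryAccess",              // hypothetical key
                    new SimpleValueFactory().createValue(true),
                    true,   // single-valued
                    true);  // standard descriptor
            registration = context.registerService(
                    Descriptors.class.getName(), descriptors, null);
        }

        // Called when the service stops: the descriptor disappears with it.
        public void deactivate() {
            if (registration != null) {
                registration.unregister();
            }
        }
    }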

Prototype of 'Apache Sling Capabilities - Oak Repository Descriptors'

Note that 'Sling Capabilities' is still a very young project, and there is currently a discussion going on about it on the Sling list.

Part 1: exposing the descriptors as capabilities

https://github.com/apache/sling-whiteboard/tree/master/capabilities-oak
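
In essence, the prototype provides a Sling CapabilitiesSource that republishes the repository descriptors. A simplified sketch of that idea (Sling Capabilities is still young, so treat the exact interface signatures, the namespace, and the component wiring as assumptions):

    import java.util.HashMap;
    import java.util.Map;

    import javax.jcr.Repository;

    import org.apache.sling.capabilities.CapabilitiesSource;
    import org.osgi.service.component.annotations.Component;
    import org.osgi.service.component.annotations.Reference;

    @Component(service = CapabilitiesSource.class)
    public class OakDescriptorsSource implements CapabilitiesSource {

        @Reference
        private Repository repository;

        @Override
        public String getNamespace() {
            return "org.apache.jackrabbit.oak.descriptors"; // hypothetical namespace
        }

        @Override
        public Map<String, Object> getCapabilities() {
            // Republish each repository descriptor as a capability value.
            Map<String, Object> caps = new HashMap<>();
            for (String key : repository.getDescriptorKeys()) {
                caps.put(key, repository.getDescriptor(key));
            }
            return caps;
        }
    }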

Part 2: actually creating a descriptor that declares availability of S3 binary upload

store.data.binary.upload.descriptor

Modularization

Different modules were discussed as candidates for trying out an independent module release. Modules considered easier and a better starting point are oak-api and oak-commons. Similarly, modules in Jackrabbit could be released independently (e.g. jackrabbit-api), which may make it easier to add new features in the future: adding a new method or interface to jackrabbit-api would then only require a new release of the API bundle, not of the entire Jackrabbit project. However, to keep things simple for now, an Oak module will be picked for a prototype.

Previous discussions, with open questions and concerns, can be found on the September 2018 Oakathon wiki page. The goal of the prototype should be to find answers to those questions and to identify which of them do not apply to the chosen module.

The prototype is on GitHub.

Attachments:

store.data.binary.upload.descriptor (application/octet-stream)