News Aggregation Framework for the ASF
======================================

Distributed Atom files, aggregated into a single place. Projects maintain
and publish their own news items. Once aggregated, the information is
published in a number of forms.

The anticipated channels for publishing the information are

- website
- email announce lists
- RSS
- Atom
- Twitter
- internal redirections
- internal, periodic board reporting

The main idea behind the proposal is to simplify the process for
individual projects publishing news and to provide a more unified approach
to press and marketing relationships by the ASF. In addition to publishing
a website with the information, feeds will be produced to allow people to
keep up to date with the ASF projects.

Details
-------

Each news item will be in an Atom entry. It is proposed that items will
need to be cryptographically signed before being included in the output,
to provide

- definitive proof of authorship
- the ability to verify the integrity of the file

The exact type of cryptographic system has yet to be decided, but the
following two options seem to be the front-runners.

PGP - use of existing keys to sign items, with the PGP keyrings then used
to provide authentication.

X.509 - creating a dedicated CA to provide X.509 certificates for use when
creating/publishing news items is possible.

In either case, managing the authorisation mechanism would be done by the
PRC. Once approved, items would be published without further intervention
by the PRC, with the exception of press releases (see below).

Types of News Items
-------------------

Each news item will be created as a standard Atom file, within one or more
of the categories below. Each category is designed to map to a use of the
information and its likely applicability.

Press Release - items that are created as press releases are intended to
be an initial indication to the PRC that a project wishes to have a press
release issued.
The content of the file should contain the project's initial text and some
idea of the reasons for the press release.

Release - the release of a project, whether a major, minor or bug fix
release. The text should contain the release text that the project wishes
to be sent to the relevant announce lists.

Project - news about the project. This could be as simple as a new
committer or as complex as announcing significant issues.

PMC - changes to the PMC membership.

Events - information about upcoming events that may be of interest to
committers of the ASF.

Internal Redirections
---------------------

The news items will automatically trigger all necessary external
communications. Each project will have a profile that contains the email
addresses/lists that need to be contacted when a new release is available.
It is anticipated that this mechanism will be flexible enough to allow
different distribution profiles for different categories of news.

This mechanism will be used for sending emails to the announce lists,
removing the requirement for the project release managers to know which
lists should be notified and providing a consistency that is often missing
today.

Basic Framework
---------------

Following the methodology used for projects.a.o and people.a.o, the
essential layout will be very similar and will work as follows.

1. Where do we get the files?

   A single, small XML file, kept in a central SVN repository, will contain
   a list of all locations that should be retrieved when processing.
   Locations can be either a single file within SVN or a directory. If an
   SVN directory is specified, the directory will be checked out and each
   file found will be processed.

2. Can we include the file?

   Each file that has been found is checked for the inclusion of a
   cryptographic signature. Once the signature is found it is checked for
   validity, and if valid the file is included.
   In addition the file checksum is checked, and any file that has an
   invalid checksum is rejected. If the GUID has already been seen (see
   step 3) and the checksum has changed or is invalid, the existing file
   is removed.

3. Build the cache

   An internal cache is maintained by retrieving the files from their
   external locations. Files that are retrieved have their GUID stored
   with a checksum of the file as stored. If the same file is retrieved
   again and the checksum has changed, it is assumed that the file has
   been modified.

4. Which files do we process?

   In default mode all files stored will be processed into the relevant
   outputs, but it is also anticipated that the processing can be
   restricted to certain dates. Once all files are available locally (in
   the cache), a list of files to be processed is built and used from this
   point forward.

5. Process the files.

   It's anticipated that XSLT transformation will be used to create HTML
   and feeds as required.
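
The GUID and checksum bookkeeping described in steps 2 and 3 could be
sketched roughly as follows. This is only an illustration under stated
assumptions, not a proposed implementation: the `NewsCache` class, its
`update` method, the returned status strings and the use of SHA-256 are all
hypothetical, and the `valid` flag stands in for whatever signature and
checksum verification is eventually chosen.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


def checksum(data: bytes) -> str:
    """Hex digest of the file contents (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class NewsCache:
    """Hypothetical in-memory cache mapping GUID -> checksum as stored."""

    entries: Dict[str, str] = field(default_factory=dict)

    def update(self, guid: str, data: bytes, valid: bool) -> str:
        """Record one retrieved file and report what happened to it.

        `valid` is the result of the signature/checksum verification,
        which is outside the scope of this sketch.
        """
        if not valid:
            # An invalid file is rejected; any copy already cached under
            # the same GUID is removed as well.
            self.entries.pop(guid, None)
            return "rejected"

        digest = checksum(data)
        previous = self.entries.get(guid)
        # Store the checksum of the file as now stored.
        self.entries[guid] = digest

        if previous is None:
            return "added"
        # Same GUID, different checksum: assume the file was modified.
        return "unchanged" if previous == digest else "modified"
```

With this sketch, retrieving the same bytes twice under one GUID reports
"added" and then "unchanged", a change in content reports "modified", and a
file that fails verification is dropped from the cache. Replacing the old
copy with the modified one is one reading of "the existing file is
removed"; the proposal leaves the exact behaviour open.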