Understanding Repository Configuration of Apache Archiva

Archiva has two types of repository configuration: managed repository and remote repository.

Managed Repository

A managed repository is a repository that resides locally on the server where Archiva is running. It can serve as a proxy repository, an internal deployment repository, or a local mirror repository.

Managed repository fields:

Id: The identifier of the repository. This must be unique.
Name: The display name of the repository.
Directory: The location of the repository. If the specified path does not exist, Archiva will create the missing directories.
Index Directory: The location of the index files generated by Archiva. If no location is specified, the index directory (named .indexer) is created at the root of the repository directory. This directory contains the packaged/bundled index, which is consumed by tools such as M2Eclipse.
Type: The repository layout (Maven 2 or Maven 1).
Cron Expression: The cron schedule on which repository scanning is executed.
Days Older: The first option for repository purge. Archiva checks how old each artifact is and deletes it if it is older than the number of days set in this field, while still respecting the retention count. To disable the purge by number of days old and have Archiva purge by retention count only, set this field to 0. The maximum number of days that can be set here is 1000. See the Repository Purge section below for more details.
Retention Count: The second option for repository purge. When running the repository purge, Archiva retains only the number of artifacts set in this field for a specific snapshot version. See the Repository Purge section below for more details.
Description: Additional information about the repository.
Releases: Specifies whether there are released artifacts in the repository.
Snapshots: Specifies whether there are snapshot artifacts in the repository.
Block Redeployments: Specifies whether released artifacts that already exist in the repository are protected from being overwritten. Note that this only takes effect for non-snapshot deployments.
Scanned: Specifies whether the repository can be scanned, meaning it is a local repository that should be indexed, purged, etc.
Delete Released Snapshots: Specifies whether snapshot artifacts that already have a released version in the repository are removed during repository purge.
Staging Repository: Automatically creates a staging repository for this local repository.
Skip Packed Index Creation: Skips creation of the compressed index used by IDEs.

[Figure: Managed Repositories]

Each repository has its own HTTP(S)/WebDAV URL, which allows users to browse and access the repository over HTTP(S)/WebDAV. The URL has the following format:

http://[URL TO ARCHIVA]/repository/[REPOSITORY ID] (e.g. http://localhost:8080/repository/releases).
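
As a minimal sketch of the URL format above, the helper below joins an Archiva base URL and a repository id. The function name `repository_url` is hypothetical and is not part of Archiva; percent-encoding the id is a defensive assumption, not documented behaviour.

```python
from urllib.parse import quote


def repository_url(archiva_base_url: str, repository_id: str) -> str:
    """Build a repository URL following the documented pattern (hypothetical helper)."""
    # Drop a trailing slash so the result never contains "//repository".
    base = archiva_base_url.rstrip("/")
    # Percent-encode the id defensively; simple ids such as "releases" are unchanged.
    return f"{base}/repository/{quote(repository_id, safe='')}"


print(repository_url("http://localhost:8080", "releases"))
# prints: http://localhost:8080/repository/releases
```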

A POM snippet is also available for each repository. The <distributionManagement> section can be copied and pasted into a project's pom.xml to specify that the project will be deployed to that managed repository. The <repositories> section, on the other hand, can be copied and pasted into a project's pom.xml or into Maven's settings.xml to tell Maven to get artifacts from the managed repository when building the project.

Remote Repository

A remote repository is a repository that resides remotely. These are usually the proxied repositories. See Proxy Connectors for how to proxy a repository.

Remote repository fields:

Id: The identifier of the remote repository.
Name: The name of the remote repository.
Url: The URL of the remote repository. A 'file://' URL can also be used to proxy a local repository. Note that if this local repository is an Archiva managed repository that has proxy connectors of its own, those connectors will not be triggered.
Username: The username (if authentication is needed) used to access the repository.
Password: The password (if authentication is needed) used to access the repository.
Download Timeout: The time in seconds after which a download from the remote repository is stopped.
Type: The layout (Maven 2 or Maven 1) of the remote repository.
Download Remote Index: Enables downloading the remote index so that available remote artifacts are included in search queries.
Remote Index Url: The path of the remote index directory. It can be relative to Url.
Cron expression: The cron expression for downloading the remote index (default: weekly on Sunday).
Index Directory: The path where the index directory is stored. The default is ${appserver.base}/data/remotes/${repositoryId}/.indexer
Download Remote Index Timeout: The time in seconds after which the download of remote index files is stopped (default 300).
Proxy for Remote Download Index: The proxy to use for downloading remote index files.
Download Remote Index on Startup: If selected, the remote index is downloaded on Archiva startup.
Description: Can be used to store additional information about the repository.
Connection Check Path: If set, the connection to the remote repository is checked by validating the existence of the given file or artifact. Some repositories do not allow browsing the base directory, so the standard check may fail. The path is relative to the repository Url.
Additional Url Parameters: Key/value pairs to add to the URL when querying the remote repository.
Additional Http Headers: Key/value pairs to add as HTTP headers when querying the remote repository.

[Figure: Remote Repositories]

You can also trigger an immediate download of remote index files.

Maven Index from Remote repositories

Since 1.4-M4: If you have configured downloading of the remote index, the index files (in Maven Indexer project format) are available at http://[URL TO ARCHIVA]/repository/id/.index, where they can be consumed by IDEs.

Scanning a Repository

A repository scan can be executed on a schedule, or it can be started explicitly by clicking the 'Scan Repository Now' button in the repositories page. By default, Archiva only processes artifacts that are new with respect to the last run of the repository scanner: if an artifact's last modified date is newer than the last repository scan, the artifact is processed; otherwise, it is skipped. You can override this behavior and force Archiva to process all artifacts regardless of their age by ticking the 'Process All Artifacts' checkbox in the repositories page and clicking the 'Scan Repository Now' button.
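
The selection rule described above can be sketched as follows. This is only an illustration of the rule, not Archiva's scanner code; the function name `artifacts_to_process`, the `process_all` flag (standing in for the 'Process All Artifacts' checkbox) and the sample paths and dates are all hypothetical.

```python
from datetime import datetime


def artifacts_to_process(last_modified, last_scan, process_all=False):
    """Return the artifact paths a scan would process, in sorted order.

    last_modified maps an artifact path to its last modified time.
    """
    if process_all:
        # 'Process All Artifacts' ticked: age is ignored.
        return sorted(last_modified)
    # Default: only artifacts modified after the previous scan are processed.
    return sorted(path for path, modified in last_modified.items()
                  if modified > last_scan)


# Hypothetical repository contents and scan time.
files = {
    "org/example/app/1.0/app-1.0.jar": datetime(2007, 1, 10, 9, 0),
    "org/example/app/1.1/app-1.1.jar": datetime(2007, 2, 20, 9, 0),
}
previous_scan = datetime(2007, 2, 1, 0, 0)

print(artifacts_to_process(files, previous_scan))
# prints: ['org/example/app/1.1/app-1.1.jar']
print(len(artifacts_to_process(files, previous_scan, process_all=True)))
# prints: 2
```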

[Figure: Repositories]

For every artifact found by the repository scanner, processing is done on this artifact by different consumers. Examples of the processing done are: indexing, repository purge and database update. Details about consumers are available in the Consumers page.

Repository Purge

Repository purge is the process of cleaning old snapshots out of the repository. When deploying a snapshot to a repository, Maven deploys the project/artifact with a timestamped version, so daily/nightly builds of the project tend to bloat the repository. If the artifacts are large, disk space becomes a problem. That's where Archiva's repository purge feature comes in. Given a criterion to use (number of days old or retention count), it cleans up the repository by removing old snapshots.

Please note that the number-of-days-old criterion is activated by default (set to 100 days). To deactivate it and use the retention count criterion instead, you must set the Repository Purge By Days Older field to 0. Also note that when the number-of-days-old criterion is active, the retention count is still respected (see the Repository Purge By Number of Days Older section below for more details), but not the other way around.

Let's take a look at different behaviours for repository purge using the following scenario:

Artifacts in the repository:

../artifact-x/2.0-SNAPSHOT/artifact-x-20061118.060401-2.jar
../artifact-x/2.0-SNAPSHOT/artifact-x-20061118.060401-2.pom
../artifact-x/2.0-SNAPSHOT/artifact-x-20070113.034619-3.jar
../artifact-x/2.0-SNAPSHOT/artifact-x-20070113.034619-3.pom
../artifact-x/2.0-SNAPSHOT/artifact-x-20070203.028902-4.jar
../artifact-x/2.0-SNAPSHOT/artifact-x-20070203.028902-4.pom

  1. Repository Purge By Number of Days Older

    Using this criterion for the purge, Archiva checks how old an artifact is. If the artifact is older than the value set in the repository purge by days older field, it is deleted, subject to the retention count.

    If repository purge by days older is set to 100 days (with the repository purge by retention count field set to 1) and the current date is, say, 03-01-2007 (March 1, 2007), then given the scenario above the following artifacts will be retained: artifact-x-20070113.034619-3.jar, artifact-x-20070113.034619-3.pom, artifact-x-20070203.028902-4.jar and artifact-x-20070203.028902-4.pom. The version timestamps show that these 4 artifacts are not more than 100 days old on the current date, so they are all retained, while the 20061118 snapshot is 103 days old and is deleted. In this case the retention count has no effect, since the age of the artifact takes priority.

    Now, if repository purge by days older is set to 30 days (with the repository purge by retention count field still set to 1) and the current date is still 03-01-2007, then given the same scenario only the following artifacts will be retained: artifact-x-20070203.028902-4.jar and artifact-x-20070203.028902-4.pom. In this case, the retained artifacts are not older than the number of days set in the repository purge by days older field, and the retention count is still met.

    Now, let's set repository purge by days older to 10 days (with the repository purge by retention count field still set to 1) with the current date still 03-01-2007. Given the same repository contents, the following artifacts will still be retained: artifact-x-20070203.028902-4.jar and artifact-x-20070203.028902-4.pom. The version timestamps show that these artifacts are older than the repository purge by days older value of 10 days. Why are they still retained? Recall that the repository purge by retention count is 1. This ensures that there is always 1 timestamped version of the artifact retained for every unique snapshot version directory of an artifact.

  2. Repository Purge By Retention Count

    If the repository purge by retention count field is set to 2, then only the artifacts artifact-x-20070113.034619-3.jar, artifact-x-20070113.034619-3.pom, artifact-x-20070203.028902-4.jar and artifact-x-20070203.028902-4.pom will be retained in the repository. The oldest snapshots will be deleted maintaining only a number of snapshots equivalent to the set retention count (regardless of how old or new the artifact is).
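
The two criteria can be checked against the scenario with a short simulation. This is a sketch of the rules as described in this section, not Archiva's purge implementation; the function names `snapshot_date` and `retained` are hypothetical, and only the date part of each timestamp is used.

```python
from datetime import date, datetime, timedelta

# Timestamped snapshot builds from the scenario (each has a .jar and a .pom).
SNAPSHOTS = [
    "artifact-x-20061118.060401-2",
    "artifact-x-20070113.034619-3",
    "artifact-x-20070203.028902-4",
]


def snapshot_date(name: str) -> date:
    """Extract the yyyyMMdd part of the timestamp in a snapshot file name."""
    timestamp = name.rsplit("-", 2)[1]  # e.g. "20061118.060401"
    return datetime.strptime(timestamp.split(".")[0], "%Y%m%d").date()


def retained(snapshots, today, days_older, retention_count):
    """Return the snapshots that survive a purge, oldest first."""
    ordered = sorted(snapshots, key=snapshot_date)
    # The newest `retention_count` snapshots are never purge candidates.
    split = max(len(ordered) - retention_count, 0)
    candidates, protected = ordered[:split], ordered[split:]
    if days_older > 0:
        # Days-older criterion: a candidate survives if it is not older than the limit.
        cutoff = today - timedelta(days=days_older)
        survivors = [s for s in candidates if snapshot_date(s) >= cutoff]
    else:
        # Days older set to 0: purge by retention count only.
        survivors = []
    return survivors + protected


today = date(2007, 3, 1)
for days, count in [(100, 1), (30, 1), (10, 1), (0, 2)]:
    kept = [s.rsplit("-", 1)[1] for s in retained(SNAPSHOTS, today, days, count)]
    print(f"days older={days}, retention count={count}: keeps builds {kept}")
# days older=100, retention count=1: keeps builds ['3', '4']
# days older=30, retention count=1: keeps builds ['4']
# days older=10, retention count=1: keeps builds ['4']
# days older=0, retention count=2: keeps builds ['3', '4']
```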

Deleting Released Snapshots

You can also configure Archiva to clean up snapshot artifacts that have already been released. This can be done by ticking the Delete Released Snapshots checkbox in the Repository Configuration form.

Once this feature is enabled, whenever Archiva encounters a snapshot artifact during repository scanning, it checks all configured repositories for a released version of that snapshot. If it finds one, it deletes the entire snapshot version directory.
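
A rough sketch of that check is shown below. It relies on the Maven convention that the release of X-SNAPSHOT is version X; the function names and the repository contents are hypothetical, and the real check in Archiva may differ in detail.

```python
def released_version(snapshot_version):
    """Return the release version for 'X-SNAPSHOT', or None for a non-snapshot."""
    suffix = "-SNAPSHOT"
    if not snapshot_version.endswith(suffix):
        return None
    return snapshot_version[:-len(suffix)]


def snapshot_dirs_to_delete(snapshot_versions, released_by_repository):
    """Pick the snapshot version directories whose release exists in any repository."""
    # Look across all configured repositories, not only the one being scanned.
    all_released = set().union(*released_by_repository.values())
    return [v for v in snapshot_versions if released_version(v) in all_released]


# Hypothetical released versions of artifact-x, keyed by repository id.
released = {"releases": {"2.0"}, "internal": {"1.0"}}
print(snapshot_dirs_to_delete(["2.0-SNAPSHOT", "2.1-SNAPSHOT"], released))
# prints: ['2.0-SNAPSHOT']
```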

It should be noted that this feature is entirely separate from the repository purge by number of days older and by retention count.