Maven Artifact is supposed to be a general artifact mechanism for retrieving, installing, and deploying artifacts to repositories. It was originally decoupled from Maven proper, and as such it carries a lot of baggage, including many notions that are very specific to Maven itself, which prevents it from being used generally. Artifacts currently have a notion of scope, classifiers, and behavioral attributes such as whether scopes should be inherited. For any mechanism to work generally, these baked-in notions need to be removed, vetted, and then made compatible with the notions currently in Maven.

A list of things that should not be in the Artifact:

* scope
* classifier
* dependency filter
* dependency trail
* resolved
* released
* optional
* available versions

These are all attributes of the target system.

*Removal of the ArtifactFactory

3 February 2008 (Sunday)

I have removed the factory and left only a small set of constructors (which I would like to reduce to one) so that you have a valid artifact after construction. I have also started to hide the VersionRange creation: you just pass in a string and the constructor for the DefaultArtifact will do the right thing. This will ultimately need to be more pluggable as different versioning strategies emerge, but variations on the theme, like Maven and OSGi, will have their own subclasses and tools to operate on the graphs of dependencies.

4 February 2008 (Monday)

Some notes about classifiers, taken from a mailing list discussion with John.

John: I'd tend to disagree about classifier not being a 'core' part of the artifact system...it distinguishes a main artifact from one of its derivatives, and serves as a pretty foundational part of how we retrieve artifacts from existing remote repositories. Without it, I doubt that you can reconstruct the path to some existing artifacts (like sources or javadocs) reliably without bastardizing the version string.
We can see that the artifact system has certain inescapable identity attributes. Scope is obviously more related to how an artifact is used, since you can't see any trace of scope in the artifact as it's been deployed on a remote repository. Classifier, however, doesn't fit this criterion...it's not a usage marker, but an identity marker. The rest I agree with.

Jason: This is where I think you've already baked in what you think about Maven. Look at how we deploy our derivative artifacts, like javadocs or sources, right now. We don't track any of them in the metadata when we deploy; we toss them up there and hope they are there. I think what's more important is that the coordinate be unique and that we have a way to associate whatever artifacts together in a scalable way. So you say "I want to associate this artifact with that one, and this is how I would like to record that relationship in the metadata." Subsequently you can query the metadata and know these relationships. We currently don't do this. It generally boils down to a bunch of coordinates in the repository and how we choose to relate them via the metadata. We have all sorts of problems with classifiers currently because they were an ad hoc method of association. A general model of association would be a superset of what we currently do for classifiers. I agree we need a mechanism for association; I don't think classifiers have worked all that well.

5 February 2008 (Tuesday)

The rework of the artifact resolution mechanism is an attempt to entirely separate 1) retrieving metadata into a tree, 2) converting the tree to a graph by a process of conflict resolution, 3) retrieving the complete set of artifacts, and ultimately 4) doing something in a particular fashion with the retrieved set, like making a classpath.
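The separation of phases described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the `Node` type, the method names, and the "occurrence nearest the root wins" policy are all hypothetical choices made for the example, not the design being worked on here. Phase 3 (retrieval) is omitted, and the resolved result is reduced to a flat map rather than a full graph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ResolutionSketch {

    /** Phase 1 output: one node of the metadata tree (hypothetical type). */
    static final class Node {
        final String key;          // "groupId:artifactId"
        final String version;
        final List<Node> children;

        Node(String key, String version, Node... children) {
            this.key = key;
            this.version = version;
            this.children = Arrays.asList(children);
        }
    }

    /**
     * Phase 2: collapse the tree so that each key maps to exactly one version.
     * The breadth-first walk means the occurrence nearest the root wins
     * (an assumed policy, chosen only to keep the example short).
     */
    static Map<String, String> resolveConflicts(Node root) {
        Map<String, String> selected = new LinkedHashMap<>();
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            if (selected.containsKey(node.key)) {
                continue; // a nearer occurrence already won; skip this subtree
            }
            selected.put(node.key, node.version);
            queue.addAll(node.children);
        }
        return selected;
    }

    /** Phase 4: turn the resolved set into a classpath-style string. */
    static String classpath(Map<String, String> selected) {
        List<String> entries = new ArrayList<>();
        for (Map.Entry<String, String> e : selected.entrySet()) {
            String artifactId = e.getKey().substring(e.getKey().indexOf(':') + 1);
            entries.add(artifactId + "-" + e.getValue() + ".jar");
        }
        return String.join(":", entries);
    }
}
```

Because the conflict resolution step only ever sees the complete tree, a different policy can be swapped in without touching tree construction or classpath generation, which is the point of keeping the phases apart.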
Currently we have an incremental processing model that doesn't let a complete graph be formed for analysis, which greatly complicates the process. Having a graph and using standard graph analysis and graph optimization techniques is the only reasonable way forward. There should be no doubt about what needs to be retrieved once the analysis is complete. We could actually create an aggregate request where instructions are sent to retrieve everything required, and the server could stream all the artifacts back in one shot.

What Oleg is attempting to do is create a working solution for 1) and 2) above. Along with the implementation we also have a visualization tool that will help us determine what exactly the correct analysis is. The beauty of this is that, regardless of the analysis we arrive at, a representation of the complete set can be modeled and we can start working on the optimized retrieval mechanism. We still need to do some work to separate out 4), as we're already doing some classpath calculations which we will need to decouple further, but that should be relatively straightforward.

7 February 2008 (Friday)

The number of methods in the artifact factory is simply insane. Each type that we ended up with in Maven was effectively hard-coded in the factory, which is totally unscalable: any new types with handlers become a nightmare to maintain. I have reduced everything to two constructors in the DefaultArtifact and I would like to reduce that to one. Right now I have to account for callers either using a version string or creating a range, which is completely confusing to anyone using the API. You should just need one constructor with a version string, and everything else should be taken care of for you. Right now there are bits of code all over the place that do the if/else versionRange detection.
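To make the "one constructor with a version string" idea concrete, here is a minimal sketch. `SketchArtifact` and its fields are hypothetical names invented for this example; it is not the DefaultArtifact code. It assumes that a string starting with `[` or `(` denotes a range in Maven's bracket notation, and it only handles a single interval.

```java
final class SketchArtifact {
    final String groupId;
    final String artifactId;
    final String versionSpec; // exactly what the caller passed in
    final boolean range;      // true for "[1.0,2.0)", false for "1.0"
    final String lowerBound;  // null when unbounded or not a range
    final String upperBound;  // null when unbounded or not a range

    /** The only constructor: callers never build a range object themselves. */
    SketchArtifact(String groupId, String artifactId, String versionSpec) {
        if (groupId == null || artifactId == null
                || versionSpec == null || versionSpec.isEmpty()) {
            throw new IllegalArgumentException("groupId, artifactId and version are required");
        }
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.versionSpec = versionSpec;

        // The if/else range detection lives here, once, instead of in every caller.
        char first = versionSpec.charAt(0);
        this.range = first == '[' || first == '(';
        String lower = null;
        String upper = null;
        if (range) {
            char last = versionSpec.charAt(versionSpec.length() - 1);
            if (versionSpec.length() < 3 || (last != ']' && last != ')')) {
                throw new IllegalArgumentException("malformed range: " + versionSpec);
            }
            String body = versionSpec.substring(1, versionSpec.length() - 1);
            if (body.indexOf('[') >= 0 || body.indexOf('(') >= 0
                    || body.indexOf(']') >= 0 || body.indexOf(')') >= 0) {
                throw new IllegalArgumentException("only one interval is supported in this sketch");
            }
            int comma = body.indexOf(',');
            if (comma < 0) {
                lower = body.trim();   // "[1.0]" pins exactly one version
                upper = lower;
            } else {
                lower = emptyToNull(body.substring(0, comma).trim());
                upper = emptyToNull(body.substring(comma + 1).trim());
            }
        }
        this.lowerBound = lower;
        this.upperBound = upper;
    }

    private static String emptyToNull(String s) {
        return s.isEmpty() ? null : s;
    }
}
```

With this shape, `new SketchArtifact("org.example", "demo", "1.0")` and `new SketchArtifact("org.example", "demo", "[1.0,2.0)")` go through the same entry point, and a pluggable versioning strategy could later replace the parsing inside the constructor without changing callers.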
inheritedScope goes away entirely from the model when a graph is used, because the scope selected will be a function of how the graph is processed.

24 May 2008

1. Retrieval & Storage

There is the task of retrieving a set of resources from a data source atomically. Simple, and safe retrieval. Period. This has nothing to do with dependency management per se, but it is the basis of any safe and reliable dependency management system. We need to deal with repository corruption and recovery as well. The method employed by Git, with hierarchical checksums, provides an efficient means of detecting where in a repository corruption has occurred, so that the problem can be corrected, routed around, or simply brought to the user's attention.

2. Representation Processing

There is the task of processing the representation of an artifact. In the case of Maven, an artifact's representation is encapsulated in a POM. If the representation refers to other representations, i.e. dependencies, then these have to be taken into account as well. The system may allow transitive processing, and this is where the real power of a dependency management system comes into play. The representations are gathered into a tree structure, where the flavour of the system imparts special processing on this tree to yield a graph.

Once the representation has been processed and we have a graph, we fall back to the retrieval mechanism to place the desired artifacts in the storage system. Ultimately, from this graph and according to the desired purpose, we have a set of artifacts that we can do something with.

Processing

I have come to the conclusion that providing the necessary support for version ranges cannot be done without a SAT solver. We are approaching an NP-complete problem, without a solver we are going to end up with an approximation, and all the heavy lifting is already being done by SAT4J.
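To illustrate why version ranges map naturally onto satisfiability, the sketch below encodes a tiny range problem as boolean clauses and solves it by brute force. Everything here is an assumption made for the example: the class name, the artifacts, and the encoding (one variable per artifact version, with negative numbers meaning "not selected"). The exhaustive search merely stands in for a real solver such as SAT4J and would not scale beyond toy inputs.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeAsSat {

    /**
     * Tries every assignment of variables 1..variableCount and returns the first
     * one that satisfies all clauses, or null if none does. A clause is an int[]
     * of literals; a negative literal means the variable must be false.
     */
    static boolean[] solve(int variableCount, List<int[]> clauses) {
        for (long bits = 0; bits < (1L << variableCount); bits++) {
            boolean[] model = new boolean[variableCount + 1]; // index 0 unused
            for (int v = 1; v <= variableCount; v++) {
                model[v] = ((bits >> (v - 1)) & 1L) == 1L;
            }
            if (satisfiesAll(model, clauses)) {
                return model;
            }
        }
        return null;
    }

    static boolean satisfiesAll(boolean[] model, List<int[]> clauses) {
        for (int[] clause : clauses) {
            boolean satisfied = false;
            for (int literal : clause) {
                if (literal > 0 ? model[literal] : !model[-literal]) {
                    satisfied = true;
                    break;
                }
            }
            if (!satisfied) {
                return false;
            }
        }
        return true;
    }

    /**
     * Hypothetical problem: 1 = app, 2 = a, 3 = b, 4 = c:1.0, 5 = c:2.0, 6 = c:3.0.
     * a accepts c in [1.0,2.0], b accepts c in [2.0,3.0].
     */
    static List<int[]> exampleClauses() {
        List<int[]> clauses = new ArrayList<>();
        clauses.add(new int[] {1});          // app must be selected
        clauses.add(new int[] {-1, 2});      // app requires a
        clauses.add(new int[] {-1, 3});      // app requires b
        clauses.add(new int[] {-2, 4, 5});   // a requires c:1.0 or c:2.0
        clauses.add(new int[] {-3, 5, 6});   // b requires c:2.0 or c:3.0
        clauses.add(new int[] {-4, -5});     // at most one version of c
        clauses.add(new int[] {-4, -6});
        clauses.add(new int[] {-5, -6});
        return clauses;
    }
}
```

In this example the only satisfying assignment selects c:2.0, the single version inside both ranges. Adding a clause that forbids c:2.0 makes the problem unsatisfiable, which is the kind of conflict a solver can report directly instead of the resolver approximating an answer.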
The Quality of providers

The new processing model will allow the complete manipulation of the resultant set either in memory or in a location that does not pollute the local repository. So any provider must guarantee the safe retrieval of the set of artifacts from remote sources, but it must also guarantee the safe placement of that set into the local repository. If the user aborts or the machine crashes, the provider must supply a means for the process to clean up anything half-baked. We need to provide a single place where a journal can be written, so that partially completed operations can be easily detected and acted upon. We know what can go wrong, and every possible measure needs to be taken to make sure the repository cannot be corrupted.
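One way the journal idea could look is sketched below. This is an assumption-laden illustration, not a provider implementation: the class name, the `install.journal` file name, and the roll-back-on-recovery policy are all hypothetical. It also assumes the targets do not already exist in the repository, and it leaves out durability details such as forcing the journal to disk before moving files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class JournaledInstall {
    static final String JOURNAL_NAME = "install.journal"; // hypothetical file name

    /**
     * Moves staged files into the repository. The journal is written before the
     * first move and deleted after the last one, so a leftover journal always
     * means the operation was interrupted.
     */
    static void install(Path repoRoot, Map<Path, Path> stagedToRelativeTarget) throws IOException {
        for (Path relative : stagedToRelativeTarget.values()) {
            if (Files.exists(repoRoot.resolve(relative))) {
                throw new IOException("target already exists: " + relative);
            }
        }
        // 1. Record intent: every path this operation is about to create.
        List<String> lines = new ArrayList<>();
        for (Path relative : stagedToRelativeTarget.values()) {
            lines.add(relative.toString());
        }
        Path journal = repoRoot.resolve(JOURNAL_NAME);
        Files.write(journal, lines, StandardCharsets.UTF_8);

        // 2. Place the artifacts.
        for (Map.Entry<Path, Path> entry : stagedToRelativeTarget.entrySet()) {
            Path target = repoRoot.resolve(entry.getValue());
            Files.createDirectories(target.getParent());
            Files.move(entry.getKey(), target);
        }

        // 3. Commit: removing the journal marks the operation as complete.
        Files.delete(journal);
    }

    /**
     * Run before any other repository access. Returns true if an interrupted
     * install was found and rolled back by deleting whatever it had placed.
     */
    static boolean recover(Path repoRoot) throws IOException {
        Path journal = repoRoot.resolve(JOURNAL_NAME);
        if (!Files.exists(journal)) {
            return false;
        }
        for (String relative : Files.readAllLines(journal, StandardCharsets.UTF_8)) {
            Files.deleteIfExists(repoRoot.resolve(relative));
        }
        Files.delete(journal);
        return true;
    }
}
```

Keeping the journal at a single well-known location in the repository root means the check for half-finished work is one file lookup, which matches the requirement that partially complete operations be easy to detect.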