Corpus: gzipped file, formatted like this:
XXX-1
Date: Tue, 09 Dec 2003 22:39:08 GMT
blah blah blah
yackedy smackedy
...
Note: The date uses the SimpleDateFormat pattern "EEE, dd MMM yyyy kk:mm:ss z"
(or is completely blank).
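
As a rough illustration of the Date line, here is a minimal Python sketch that parses it or returns None when it is blank. The name parse_date_line and the strptime format string are assumptions made for this example; they are not part of the corpus specification. The format string is only an approximate equivalent of the SimpleDateFormat pattern: "kk" counts hours 1-24 while "%H" counts 0-23, so a timestamp written as hour 24 would need extra handling. It also relies on English day and month names, and the result is a naive datetime (the zone name is matched but not attached).

```python
from datetime import datetime

# Assumed strptime approximation of "EEE, dd MMM yyyy kk:mm:ss z".
DATE_FORMAT = "%a, %d %b %Y %H:%M:%S %Z"
DATE_PREFIX = "Date:"


def parse_date_line(line):
    """Return the datetime from a 'Date: ...' line, or None if it is blank."""
    value = line.strip()
    # Drop the "Date:" label if it is present.
    if value.startswith(DATE_PREFIX):
        value = value[len(DATE_PREFIX):].strip()
    # The date may be completely blank.
    if not value:
        return None
    return datetime.strptime(value, DATE_FORMAT)


if __name__ == "__main__":
    print(parse_date_line("Date: Tue, 09 Dec 2003 22:39:08 GMT"))
    print(parse_date_line("Date:"))
```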
Queries: text file, formatted like this:
Number: nnn
yackedy smackedy
Description:
foo bar foo bar
blah blah blah
Narrative:
blah blah blah
yackedy smackedy
...
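
The query layout above could be read with something like the following Python sketch. It is only one possible reading of the format: the function name parse_queries is hypothetical, and treating the lines between "Number:" and "Description:" as a "title" is an assumption, since the format does not name that part. It also assumes that "Description:" and "Narrative:" each appear alone on a line, as in the sample.

```python
def parse_queries(lines):
    """Parse query lines into a list of dicts, one per 'Number:' entry."""
    queries = []
    current = None
    section = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Number:"):
            # A "Number:" line starts a new query.
            current = {
                "number": line[len("Number:"):].strip(),
                "title": [],        # assumed name for the unlabeled lines
                "description": [],
                "narrative": [],
            }
            queries.append(current)
            section = "title"
        elif current is None or not line:
            continue  # skip blank lines and anything before the first query
        elif line == "Description:":
            section = "description"
        elif line == "Narrative:":
            section = "narrative"
        else:
            current[section].append(line)
    return queries


if __name__ == "__main__":
    sample = [
        "Number: 101",
        "yackedy smackedy",
        "Description:",
        "foo bar foo bar",
        "Narrative:",
        "blah blah blah",
    ]
    for query in parse_queries(sample):
        print(query["number"], query["title"], query["description"])
```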
Judgements: text file, tab-separated, formatted like this:
Query# Iteration# DOC# Judgement
Query# corresponds to the Number: nnn in queries.txt
Iteration# is not useful.
DOC# corresponds to the DOCNO from the corpus.
Judgement is some numeric value (such as 0 or 1) indicating relevance.
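
To make the four columns concrete, the sketch below loads the judgements into a nested dictionary keyed by query number and then by document number. The function name parse_judgements is hypothetical. Two choices here are assumptions rather than part of the format: judgements are stored as floats because the text only says "some numeric value", and any line whose last field is not numeric (for example a header row, if the file has one) is skipped.

```python
from collections import defaultdict


def parse_judgements(lines):
    """Map each query number to a {doc_id: judgement} dict."""
    judgements = defaultdict(dict)
    for raw in lines:
        fields = raw.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        # The Iteration# column is read but not used.
        query, _iteration, doc, value = (f.strip() for f in fields)
        try:
            score = float(value)
        except ValueError:
            continue  # non-numeric judgement, e.g. a header row
        judgements[query][doc] = score
    return dict(judgements)


if __name__ == "__main__":
    rows = ["101\t0\tXXX-1\t1", "101\t0\tXXX-2\t0"]
    print(parse_judgements(rows))
```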