The homepage of this test collection is http://ilps.science.uva.nl/resources/bahasa

The Tempo corpus contains daily news documents from the Tempo online daily newspaper (http://www.tempo.com). The documents span from June 2000 to July 2002. The corpus has the following statistics:

  Size (MB):                     45.57
  # of documents:                22,944
  avg. doc length (bytes):       1549.59
  avg. unique words (terms):     155.00

The document collection was parsed to remove all HTML tags and transformed into an SGML-like structure. Manual correction was performed in some cases.

There are 35 queries provided, covering widely known events that happened in Indonesia during this timeframe. The queries have the following statistics:

  # of queries:                        35
  avg. query length (words):           5.2
  avg. # of unique words:              5.17
  avg. # of relevant docs per query:   66.971

The set of relevant documents for each query was constructed manually by one person and then assessed again by a second person. In the case of disagreement, the document was considered not relevant.

More information regarding this corpus is available in this paper: http://www.illc.uva.nl/Publications/ResearchReports/MoL-2003-02.text.pdf
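
The two-assessor rule described above (a document counts as relevant only when both people agree) can be illustrated with a minimal Python sketch. This is not the tooling used to build the collection. The function name, the dictionary layout, and the query and document IDs are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def consensus_qrels(
    first_pass: Dict[str, Set[str]],
    second_pass: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Keep, per query, only the documents both assessors marked relevant.

    Both arguments map a query ID to the set of document IDs that one
    assessor judged relevant (hypothetical layout). Any document on which
    the assessors disagree is dropped, i.e. treated as not relevant.
    """
    return {
        query_id: docs & second_pass.get(query_id, set())
        for query_id, docs in first_pass.items()
    }


# Hypothetical judgments: the assessors disagree on d1 for query q1.
first = {"q1": {"d1", "d2"}, "q2": {"d3"}}
second = {"q1": {"d2"}, "q2": {"d3"}}
final = consensus_qrels(first, second)
```

Here `final` keeps only `d2` for `q1`, because `d1` was accepted by the first assessor but rejected by the second.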