Hardware Environment

  • Dedicated machine for indexing: Self-explanatory (yes/no)
  • CPU: Self-explanatory (Type, Speed and Quantity)
  • RAM: Self-explanatory
  • Drive configuration: Self-explanatory (IDE, SCSI, RAID-1, RAID-5)

Software Environment

  • Lucene Version: Self-explanatory
  • Java Version: Version of Java SDK/JRE that is run
  • Java VM: Server or client VM, and the vendor (e.g. Sun VM, JRockit)
  • OS Version: Self-explanatory
  • Location of index: Is the index stored in filesystem or database? Is it on the same server (local) or over the network?
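
Several of the software environment values above can be read from the running JVM rather than typed in by hand. The following is a minimal sketch using only standard Java system properties; the class name EnvironmentReport and the label strings are illustrative assumptions, and the Lucene version is not detected here and still has to be filled in manually.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene): collects JVM and OS details
// for the "Software Environment" part of a benchmark report.
public class EnvironmentReport {

    static Map<String, String> collect() {
        Map<String, String> env = new LinkedHashMap<>();
        // Version of the Java runtime actually executing the benchmark.
        env.put("Java Version", System.getProperty("java.version"));
        // VM name usually indicates server/client; vendor identifies the VM maker.
        env.put("Java VM", System.getProperty("java.vm.name")
                + " (" + System.getProperty("java.vm.vendor") + ")");
        env.put("OS Version", System.getProperty("os.name") + " "
                + System.getProperty("os.version")
                + " (" + System.getProperty("os.arch") + ")");
        // Logical processors visible to the JVM, not the physical CPU model.
        env.put("CPU count", String.valueOf(Runtime.getRuntime().availableProcessors()));
        // Maximum heap the JVM may use, in megabytes (depends on -Xmx).
        env.put("Max heap (MB)",
                String.valueOf(Runtime.getRuntime().maxMemory() / (1024 * 1024)));
        return env;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : collect().entrySet()) {
            System.out.println("  • " + e.getKey() + ": " + e.getValue());
        }
    }
}
```

The output can be pasted into the report; values such as CPU type, drive configuration and index location are not available this way and must be described by hand.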

Lucene Indexing Variables

  • Number of source documents: Number of documents being indexed
  • Total filesize of source documents: Self-explanatory
  • Average filesize of source documents: Self-explanatory
  • Source documents storage location: Where are the documents being indexed located? Filesystem, DB, HTTP, etc.
  • File type of source documents: Types of files being indexed, e.g. HTML files, XML files, PDF files, etc.
  • Parser(s) used, if any: Parsers used for parsing the various files for indexing, e.g. XML parser, HTML parser, etc.
  • Analyzer(s) used: Type of Lucene analyzer used
  • Number of fields per document: Number of Fields each Document contains
  • Type of fields: Type of each field (e.g. stored, indexed, tokenized)
  • Index persistence: Where the index is stored, e.g. FSDirectory, SqlDirectory, etc.

Figures

  • Time taken (in ms/s as an average of at least 3 indexing runs): Time taken to index all files
  • Time taken / 1000 docs indexed: Time taken to index 1000 files
  • Memory consumption: Self-explanatory
  • Query speed: Average time a query takes and the type of queries measured (e.g. simple one-term query, phrase query), excluding any overhead outside Lucene
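
The timing figures above can be gathered with a small harness. The sketch below is one possible approach, not a prescribed method: the class IndexingFigures and its method names are hypothetical, and the Runnable stands in for whatever code performs a full indexing run (or executes a query) in your setup.

```java
// Hypothetical helper (not part of Lucene): computes the figures
// requested in the "Figures" part of a benchmark report.
public class IndexingFigures {

    // Runs the given task `runs` times and returns the average
    // wall-clock time per run in milliseconds.
    static double averageMillis(Runnable task, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }

    // Scales the time for a full run to the time per 1000 documents.
    static double millisPerThousandDocs(double totalMillis, long docCount) {
        if (docCount < 1) {
            throw new IllegalArgumentException("docCount must be at least 1");
        }
        return totalMillis * 1000.0 / docCount;
    }

    // Rough estimate of heap currently in use by the JVM, in bytes.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long docCount = 10_000; // assumption: number of source documents
        // Placeholder workload; replace with a real indexing run.
        Runnable indexingRun = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i % 10);
            }
        };
        double avg = averageMillis(indexingRun, 3); // at least 3 runs
        System.out.println("Time taken (ms, avg of 3 runs): " + avg);
        System.out.println("Time taken / 1000 docs (ms): "
                + millisPerThousandDocs(avg, docCount));
        System.out.println("Memory consumption (bytes, approx.): " + usedHeapBytes());
    }
}
```

The same averageMillis method can wrap a search call to report query speed; keep parsing of results and any I/O outside Lucene out of the timed Runnable so that only Lucene's own work is measured. The heap figure is only an approximation, since it depends on when the garbage collector last ran.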

Notes

  • Notes: Any comments that don't belong in the sections above, e.g. special tuning or strategies