 ------
 Database Maintenance
 ------
 Brent Atkinson
 ------
 May 5 2015
 ------

~~ Licensed to the Apache Software Foundation (ASF) under one
~~ or more contributor license agreements.  See the NOTICE file
~~ distributed with this work for additional information
~~ regarding copyright ownership.  The ASF licenses this file
~~ to you under the Apache License, Version 2.0 (the
~~ "License"); you may not use this file except in compliance
~~ with the License.  You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing,
~~ software distributed under the License is distributed on an
~~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~~ KIND, either express or implied.  See the License for the
~~ specific language governing permissions and limitations
~~ under the License.

Database Maintenance

  If properly tuned and maintained, Continuum should be able to run efficiently even with a large
  database. The following guide describes how to tune an existing installation running on the
  embedded Derby database. While the exact steps will differ for the other supported databases,
  the general approach still applies.

* Common Bottlenecks and Tuning Advice

  Before attempting to upgrade or modify the database, it is advisable to take a full backup. With
  Derby, this is as simple as shutting down Continuum and copying the continuum_database and
  users_database directories. Before attempting any risky changes, confirm the backup copies are
  functional by connecting to them with a database client. After securing and confirming the
  backups, maintenance or tuning can safely be attempted.

  Exactly how to go about this will depend on your individual scenario, but the following is a
  conservative approach that allows you to continue running your production build server while you
  determine how best to tune performance:

  * Create a tuning installation of Continuum with a copy of the backup databases

  * Using the tuning installation, examine query execution plans for slow operations and create
    relevant indices. Once created, run the same operations and confirm the new indices appear in
    the execution plans (see the execution plan sketch below)

  * Once you are satisfied with the overall performance, you can attempt the same changes on your
    production system

  []

  While you should only attempt to optimize based on actual measurements, if you have many build
  results you might start by looking at queries that use the BUILDRESULT table. I found that
  Continuum 1.5 ran quite well with a build result table containing over a million records after
  creating the following indices. Note that the unique index is likely not possible with data
  created before 1.5 due to differences in the build number semantics.

----
CREATE UNIQUE INDEX BUILDRESULT_BUILD_NUMBER_IDX ON BUILDRESULT (PROJECT_ID_OID, BUILD_NUMBER);
CREATE INDEX BUILDRESULT_ID_DESC ON BUILDRESULT (ID DESC);
CREATE INDEX BUILDRESULT_PID_ID_DESC ON BUILDRESULT (PROJECT_ID_OID, ID DESC);
CREATE INDEX BUILDRESULT_STIME_ID_DESC ON BUILDRESULT (START_TIME, ID DESC);
----
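  To confirm that the optimizer actually uses the new indices, you can inspect execution plans
  from an ij session using Derby's runtime statistics procedures. The following is a minimal
  sketch: the SELECT statement and the project id value 1 are only illustrations, so substitute a
  slow statement captured in your own query logs.

----
-- Enable runtime statistics for this connection.
CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);

-- Run the operation you want to examine (hypothetical example; use one of
-- your own slow statements here).
SELECT ID, BUILD_NUMBER, START_TIME FROM BUILDRESULT
  WHERE PROJECT_ID_OID = 1 ORDER BY ID DESC;

-- Print the plan and statistics for the last statement; the plan text should
-- reference an index such as BUILDRESULT_PID_ID_DESC rather than a table scan.
VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();

-- Turn runtime statistics back off when finished.
CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0);
----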
* Enabling the Network Server

  While Continuum is running against the embedded database, you cannot inspect or manipulate it
  with an out-of-process tool. It can often be helpful to query the database while Continuum is
  running against it. To do this, you can reconfigure your system to use the Derby network server
  instead. The steps required to do this are:

  * Shut down Continuum

  * Add the derbynet jar to your installation's classpath

  * Start the derbynet server with a suitable version for your database files (the same Derby
    version as your files or greater)

  * Reconfigure the datasources to use the network client driver, specifying the file system path
    to the relevant database files

  * Start Continuum

  []

  If you have done this correctly, Continuum will start successfully. You should then be able to
  connect to the database with your development tool, as long as it also uses the network client
  driver. A network client data source connecting to a network server running on the local host
  usually looks something like the following:

  * JDBC driver class: org.apache.derby.jdbc.ClientDataSource

  * Connection URL: jdbc:derby://localhost:1527//continuum_database

  * Username: sa

  []

* Enabling Query Logging

  Performance optimization can be difficult and frustrating when attempted by trial and error. In
  contrast, making changes based on measurement can save you a lot of time and effort. While the
  output is a little cumbersome to work with, Derby can be configured to log executed SQL
  statements along with the details of the execution plans it used to run the queries. To do this,
  create a file named derby.properties and add the following:

----
derby.language.logStatementText=true
derby.language.logQueryPlan=true
----

  The correct location for this file depends on how you are running the database. When Derby is
  embedded within Continuum, the location is the parent directory of your continuum_database and
  users_database. When running the network server, the correct location is the directory where you
  started the network server.

* Upgrading Your Database

  By using statistics about a table's data distribution, newer versions of Derby can make more
  intelligent query execution plans. However, you may need to upgrade your database files to gain
  full capabilities such as computing and updating statistics. Fortunately, this is a simple
  process:

  * Ensure you make backups of your databases and verify the backups are good before continuing

  * Using the new version of the Derby driver, connect to the database with the upgrade=true
    attribute specified (for example, by appending ;upgrade=true to the connection URL)

  []

  By connecting with this attribute, the Derby driver will attempt to upgrade your database files
  to the newer format. To confirm the upgrade process added the statistics capabilities, connect
  to the database with a development tool and verify the existence of the
  SYSCS_UTIL.SYSCS_UPDATE_STATISTICS procedure.

* Maintenance (Rebuilding Indexes and Updating Statistics)

  Indices can become fragmented over time, especially when rows are frequently added to and
  removed from a table. This fragmentation can cause performance to degrade. You can re-optimize
  the indices by periodically rebuilding them, which in Derby means dropping and recreating them.
  For example:

----
DROP INDEX BUILDRESULT_ID_DESC;
CREATE INDEX BUILDRESULT_ID_DESC ON BUILDRESULT (ID DESC);
----

  Performance can also degrade when the characteristics of your stored data diverge from the
  computed statistics. When this occurs, updating the statistics can help. The following example
  recomputes statistics for the build results table under Derby:

----
call SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('SA', 'BUILDRESULT', null);
----
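  To check that the call produced (or refreshed) statistics, you can query Derby's system
  catalogs. The following sketch is based on the documented SYS.SYSSTATISTICS,
  SYS.SYSCONGLOMERATES and SYS.SYSTABLES catalogs; adjust the table name if you are checking a
  different table.

----
-- List the statistics rows Derby currently holds for the BUILDRESULT table,
-- along with the index each row belongs to and when it was created.
SELECT c.CONGLOMERATENAME, s.CREATIONTIMESTAMP, s.VALID
  FROM SYS.SYSSTATISTICS s
  JOIN SYS.SYSCONGLOMERATES c ON s.REFERENCEID = c.CONGLOMERATEID
  JOIN SYS.SYSTABLES t ON s.TABLEID = t.TABLEID
 WHERE t.TABLENAME = 'BUILDRESULT';
----

  If the update succeeded, you should see recent creation timestamps for the indices on the table.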