Release Notes - Apache Accumulo - Version 1.4.1 ** Bug * [ACCUMULO-355] - functional tests should run under a non-hadoop user * [ACCUMULO-398] - Table tablets not evenly spread. * [ACCUMULO-428] - loggers are not closed when high hold times are detected * [ACCUMULO-449] - Failed log copy is not restarted * [ACCUMULO-468] - low-memory warning message is incorrect * [ACCUMULO-487] - Batch Scanner can get stuck when external thread closes scanner * [ACCUMULO-488] - InputFormats' RecordReaders should call Context.progress * [ACCUMULO-496] - Cloudtrace.thrift needs to have it's namespace updated to the new package * [ACCUMULO-499] - Garbage collector should call Tracer.offNoFlush instead of Tracer.off * [ACCUMULO-505] - "% of Used DFS" is overflowing on a large cluster * [ACCUMULO-509] - default walog copy/sort uses replication of 1 * [ACCUMULO-512] - DEFAULT_MAX_LATENCY in AccumuloOutputFormat wrong units * [ACCUMULO-516] - Column family search with sparse files is painfully long * [ACCUMULO-520] - assigned tablets are not considered offline * [ACCUMULO-521] - examples functional test is using the wrong class name * [ACCUMULO-533] - make system iterators thread-safer * [ACCUMULO-542] - with a large root tablet, tablet servers will get stuck loading other metadata tablets * [ACCUMULO-561] - Monitor scan mb/s does not show batch scan activity * [ACCUMULO-562] - Multi term grep in shell fails * [ACCUMULO-566] - monitor should display zero tablet servers as red, even if there's only one tablet server configured * [ACCUMULO-570] - dirlist example README is a little inconsistent * [ACCUMULO-585] - Complain loudly when env file is missing * [ACCUMULO-591] - Need to clear tablet location cache before computing input splits in input format * [ACCUMULO-592] - We lost the security policy example * [ACCUMULO-601] - client fails to resolve master hostname when not using fully qualified domain name * [ACCUMULO-615] - accumulo runs fine on Hadoop 1.0 * [ACCUMULO-633] - FirstEntryInRowIterator is broken and has no test ** Improvement * [ACCUMULO-337] - Make logging an example file * [ACCUMULO-364] - Security Policy file should be an example * [ACCUMULO-399] - need a randomwalk node that checks tablet balance * [ACCUMULO-459] - Accumulo init should fail fast * [ACCUMULO-471] - allow wikisearch ingest to run on uncompressed input * [ACCUMULO-480] - config.html should be more clear with regards to "needs restart" * [ACCUMULO-489] - Input Format puts Base64 encoded passwords in Configuration, which is world readable * [ACCUMULO-494] - Document relationship between max walogs and max tserver memory * [ACCUMULO-502] - Document what replaces the deprecated PerColumnIteratorConfig class. * [ACCUMULO-503] - start-here does unecessary SSHs * [ACCUMULO-510] - Rename scanner threads to identify table being scanned * [ACCUMULO-550] - Colocate rfile index entries within file * [ACCUMULO-554] - monitor should display red when time-since last report is higher than a few minutes * [ACCUMULO-573] - Fix grammar in shell commands * [ACCUMULO-574] - Document reseek of iterators * [ACCUMULO-637] - Make entries written configurable for continuous ingest ** New Feature * [ACCUMULO-49] - optionally monitor swappiness on every server * [ACCUMULO-404] - Support running on-top of Kerberos-enabled HDFS * [ACCUMULO-492] - Provide method for gathering system stats to API ** Task * [ACCUMULO-491] - Manual has UTF8 characters * [ACCUMULO-546] - User manual inconsistencies Release notes for Apache Accumulo 1.4.0 Major features added prior to Incubation * Tablet merging * Efficient deletion of row range * Compaction of row range * Table cloning * FATE : Fault Tolerant Executor. Used to make table operation survive master restart. * Concurrent table operation execute correctly * Bulk load is now done by master and tablet servers and uses FATE to survive server restarts. * Multi-level RFile index * Merging minor compactions * Logical time for bulk import Minor features added prior to Incubation * Monitor will display tablet server and loggers that have died * Shell and CloudbaseInputFormat will test extension classes before using them * Empty tablet directories will be removed * ZooKeeper ACLs prevent accidental disclosure of user passwords * Compact and flush can wait for completion * Additional test to verify master persistent state and concurrent table operations * Added security checks to limit runtime environment of pluggable components * Made monitor page display cache info * Made file limit configuration less confusing * Easily iterate over rows with new row iterator that wraps a scanner * New row input format for mapre reduce jobs * Execute iterators in a client client side iterator * Introduced file archive example * Added getMaxRow table operation * Replaced Aggregators with Combiners which are iterators * Made Filters Iterators * New API for configuring iterators * Added visibility constraint, it prevents users from writing data they can not read * Removed dependency on accumulo-site.xml from client library. * Improved tracing granularty * Added information about running scans to monitor page Preformance improvements added prior to incubation * Cache no longer accesses Namenode when data is in cache. Now all file names generated by Accumulo are unique over time. * No longer use file flage in HDFS, reducing pressure on Namenode. * Shorter tablet dir names, saves namenode memory. * Logger can write to multiple drives. The following is a list of changes made during incubation. ** Sub-task * [ACCUMULO-305] - Log context before starting major compaction ** Bug * [ACCUMULO-3] - MockConnector should return a MockDeleter * [ACCUMULO-5] - Log recovery fails with IllegalStateException * [ACCUMULO-6] - Files deleted while in use by scan * [ACCUMULO-7] - tablet is both assigned and hosted * [ACCUMULO-9] - FATE Threads die when zookeeper shutdown * [ACCUMULO-11] - unique server id is not constructed consistently * [ACCUMULO-16] - Master uses wrong path to remove tserver lock from zookeeper * [ACCUMULO-17] - MockScanner and MockBatchScanner do not seek their Filters * [ACCUMULO-32] - Clean up bin dir * [ACCUMULO-46] - Fix functional tests * [ACCUMULO-53] - Multiple deletes cause a RuntimeException * [ACCUMULO-54] - monitor doesn't provide web service if zookeeper is down * [ACCUMULO-55] - Accumulo Output Format can create numerous empty files * [ACCUMULO-56] - Merge tablet fails when tablet have empty files * [ACCUMULO-60] - the digest used to protect data in zookeeper is "cb:password"; change this to "accumulo:password" * [ACCUMULO-62] - Random walk logging config wrong * [ACCUMULO-63] - Unable to build git mirror * [ACCUMULO-64] - general.dynamic.classpaths property exists but is never used * [ACCUMULO-65] - missing minor compaction files under heavy namenode load * [ACCUMULO-71] - developer's manual is out of date * [ACCUMULO-72] - Logical time not considered in merge * [ACCUMULO-79] - LiveTServerSet.scanServers makes connections at scan time * [ACCUMULO-93] - listscans in the shell attempts to contact tablet servers that do not hold locks * [ACCUMULO-94] - createMultiTableBatchWriter has arguments that are inconsistent with createTableBatchWriter * [ACCUMULO-95] - MockConnector does not implement createMultiTableBatchWriter * [ACCUMULO-98] - Bloom filter should ignore duplicate inserts * [ACCUMULO-100] - Accumulo system tests (SunnyDay at least) are locking up * [ACCUMULO-107] - shell's GrepCommand sets options under same flag * [ACCUMULO-108] - simple.gc.GCLotsOfCandidatesTest does not use the TestUtils settings properly * [ACCUMULO-109] - KilledTabletServerSplit functional test fails * [ACCUMULO-110] - Scanner creates threads too frequently * [ACCUMULO-113] - servers are exiting without logging an unexpected exception * [ACCUMULO-117] - all files in the tarball are executable. * [ACCUMULO-126] - ZooKeeperInstance should use site configuration for ZK timeout * [ACCUMULO-128] - close consistency check fails: Start key must be less than end key * [ACCUMULO-130] - accumulo_sample/ingest/bin/ingest.sh is not executable * [ACCUMULO-151] - Combiner default behavior is dangerous * [ACCUMULO-158] - Add aggregator adds versioning iterator to the iterator tree. * [ACCUMULO-165] - spell check the documentation * [ACCUMULO-168] - Bulk import may not use configured file system * [ACCUMULO-170] - NPE in shell when trying to get help for non-existant command * [ACCUMULO-176] - subheading text falls outside header border * [ACCUMULO-178] - Off-by-one error in FamilyIntersectingIterator * [ACCUMULO-179] - Install script is missing * [ACCUMULO-184] - update documentation and prompt for the initial configuration of the trace table * [ACCUMULO-189] - RegExFilter deepCopy NullPointerException * [ACCUMULO-192] - if accumulo is run as a non-hadoop user, the monitor says the NameNode is down * [ACCUMULO-193] - key.followingKey(PartialKey.ROW_COLFAM_COLQUAL_COLVIS) can produce a key with an invalid COLVIS * [ACCUMULO-194] - need a simple way to add tracing to clients * [ACCUMULO-195] - FATE operations do not preserve trace information * [ACCUMULO-209] - RegExFilter does not properly regex when using multi-byte characters * [ACCUMULO-213] - compact tables by pattern only compacts the METADATA table * [ACCUMULO-214] - took a table offline, remaining tablets did not balance * [ACCUMULO-215] - tracer continues to get errors writing to an online trace table * [ACCUMULO-216] - MockAccumulo does not support InstanceOperations * [ACCUMULO-217] - MockAccumulo doesn't throw informative errors * [ACCUMULO-218] - Mock Accumulo Inverts order of mutations w/ same timestamp * [ACCUMULO-226] - BatchScanner iterator implementation erroneously returns true for hasNext upon subsequent hasNext calls * [ACCUMULO-230] - problems found in accumulo_sample * [ACCUMULO-264] - Users with the create permission but no tthe grant permission have the ability to create a new user with arbitrary scan authorizations * [ACCUMULO-267] - Mapreduce API should not use JobContext to set configuration information * [ACCUMULO-270] - Cache consistency bug in ClientServiceHandler.checkTableId() * [ACCUMULO-271] - WholeRowIterator may call hasTop on unseeked source * [ACCUMULO-281] - ArrayIndexOutOfBoundsException running examples.shard.ContinuousQuery. * [ACCUMULO-292] - Accumulo examples tests are loud and (partially) broken * [ACCUMULO-293] - bulk loading causing an consistency check failure * [ACCUMULO-297] - User could delete table after permission to do so was removed * [ACCUMULO-299] - tracing not working * [ACCUMULO-300] - monitor page warns if the number of tablets goes into the hundreds of thousands: this is no longer a significant limitation to scalability * [ACCUMULO-301] - TApplicationException running example in README.shard * [ACCUMULO-304] - shell not connecting * [ACCUMULO-308] - Isolation example is broken * [ACCUMULO-310] - AccumuloInputFormat and AccumuloOutputFormat configuration methods don't match * [ACCUMULO-314] - Re-queue tablets immediately after major compaction if there is more work * [ACCUMULO-315] - Hole in metadata table occurred during random walk test * [ACCUMULO-316] - Master has thousands of threads while running random walk * [ACCUMULO-317] - Managing FATE operations is difficult * [ACCUMULO-327] - master lost all tablet servers * [ACCUMULO-328] - NPE when getting user authorizations * [ACCUMULO-329] - tablet being unassigned/reassigned frequently * [ACCUMULO-333] - Shell 'help' should wrap more cleanly * [ACCUMULO-334] - Bulk random walk test failed * [ACCUMULO-338] - Combiners lose data when reading off of disk * [ACCUMULO-345] - fix random walk start script to not require environment variables * [ACCUMULO-346] - button color is too dark * [ACCUMULO-356] - merge failed to complete * [ACCUMULO-357] - NPE in hasSystemPermission * [ACCUMULO-365] - randomwalk bulk test verify failed * [ACCUMULO-366] - master killed a tablet server * [ACCUMULO-368] - tablet had location but was not loaded * [ACCUMULO-373] - file missing during a major compaction * [ACCUMULO-374] - wikisearch-ingest stop list should be removed * [ACCUMULO-380] - UnsupportedOperation exception on wikisearch example * [ACCUMULO-393] - Master not balancing after agitation * [ACCUMULO-395] - Map reduce reading from Accumulo is not running mapper locally * [ACCUMULO-405] - References to accumulo-examples still exist * [ACCUMULO-412] - importdirectory failing on split table * [ACCUMULO-413] - DeleteMany breaks with aggregators/combiners * [ACCUMULO-414] - Make sure iterators handle deletion entries properly * [ACCUMULO-422] - Bulk import failing when tablet server dies * [ACCUMULO-424] - data lost during merge * [ACCUMULO-427] - Data lost when tablets moving around frequently * [ACCUMULO-435] - Accumulo debian packaging does not handle JAVA_HOMEs properly * [ACCUMULO-436] - tablet merge stuck * [ACCUMULO-444] - Data loss possible when tablet killed immediately after recovery * [ACCUMULO-446] - boolean logic iterators do not correctly "jump()" * [ACCUMULO-447] - BooleanLogicIterator cannot handle ORs correctly * [ACCUMULO-476] - java.lang.ClassCastException: com.mapr.fs.MapRFileSystem cannot be cast to org.apache.hadoop.hdfs.DistributedFileSystem * [ACCUMULO-484] - root tablet fails to load after recovery when all minor compaction threads are busy * [ACCUMULO-486] - tablet server runs out of memory after 8 hours of continuous ingest ** Improvement * [ACCUMULO-10] - tablet server bulk import methods should only be callable by system user * [ACCUMULO-15] - Define expected iterator behavior * [ACCUMULO-20] - Tablet constructor and run cleanup * [ACCUMULO-21] - Remove extra ScannerOptions creation in BatchReader * [ACCUMULO-22] - top level pom.xml should point to URLs that are useful * [ACCUMULO-24] - Improve messaging regarding non-native map memory use * [ACCUMULO-27] - issues found during scale random-walk testing * [ACCUMULO-28] - make tserver client timeout configurable * [ACCUMULO-30] - thrift generated code produces hundreds of warnings * [ACCUMULO-38] - Add svnignores for eclipse specific files/folders * [ACCUMULO-44] - RowIterator and AccumuloRowInputFormat require row to fit in memory * [ACCUMULO-50] - Add Security Randomwalk to All Randomwalk test * [ACCUMULO-67] - zookeeper session id encoded in metadata in two different ways * [ACCUMULO-68] - Document new 1.4 features in user manual * [ACCUMULO-69] - Document how Accumulo uses Zookeeper and HDFS * [ACCUMULO-87] - Accumulo needs a logo * [ACCUMULO-90] - add ":" to the legal character making up the term in a visibility expression * [ACCUMULO-101] - Include tserver.memory.maps.max in the example xml file * [ACCUMULO-103] - Unify examples * [ACCUMULO-106] - non-native In memory map warning does not accurate check memory * [ACCUMULO-114] - hide passwords when logging the configuration at start-up * [ACCUMULO-127] - Environment settings should be set in the accumulo-env.sh and we should document as such * [ACCUMULO-129] - shell requires iterators to implement OptionDescriber * [ACCUMULO-139] - Add Snappy support * [ACCUMULO-153] - Iterator options for input formats can't contain certain characters * [ACCUMULO-154] - Combiner configuration is confusing * [ACCUMULO-156] - Refactor Trie * [ACCUMULO-162] - TimestampFilter requires both start and end timestamps * [ACCUMULO-167] - Add static configuration methods to all user iterators * [ACCUMULO-188] - Need functional test to test reloading iterators * [ACCUMULO-224] - Add configurable banner to monitor page * [ACCUMULO-239] - Add .gitignore to svn tree * [ACCUMULO-245] - Add convenient methods to TypedValueCombiner * [ACCUMULO-248] - In user manual, please define 'Mutation' before using it. * [ACCUMULO-251] - Add wording to README.bloom about reason for flushing. * [ACCUMULO-252] - Improve start-up scripts to avoid common errors * [ACCUMULO-256] - Monitor page needs uptime * [ACCUMULO-265] - Fix iterator priority conflict in README.combiner * [ACCUMULO-273] - Report stuck random walk test * [ACCUMULO-274] - Add descriptions to README.filedata * [ACCUMULO-275] - README.filter refers to non-existing interface: org.apache.accumulo.core.iterators.iterators.filter.Filter * [ACCUMULO-277] - Change helloworld.InsertWithBatchWriter example so table name is last command-line parameter * [ACCUMULO-278] - Example in README.mapred produces "aggregators are deprecated" message. * [ACCUMULO-280] - Create an example that explains visibilities and authorizations * [ACCUMULO-284] - Provide usage example for isolation.InterferenceTest * [ACCUMULO-287] - Enable AccumuloOutputFormat to use a mock instance * [ACCUMULO-289] - Allow Combiner to work on all columns * [ACCUMULO-291] - Iterator attachment and removal should be more atomic * [ACCUMULO-303] - Implement per-table persistent formatters * [ACCUMULO-307] - Combiner needs a deepCopy method * [ACCUMULO-318] - Add Bulk random walk test to All.xml * [ACCUMULO-319] - Provide optional stricness for SummingCombiner * [ACCUMULO-326] - Lower replication on write ahead log archive * [ACCUMULO-336] - Change default amount of logs kept * [ACCUMULO-339] - Add lib and target directories to .gitignore file. * [ACCUMULO-341] - Use -e shell command to make README.bloom simpler * [ACCUMULO-342] - Incorrect path to accumulo in README.bloom * [ACCUMULO-344] - In user manual mention why [] is displayed by scan command. * [ACCUMULO-354] - Create combiners for wikisearch example * [ACCUMULO-371] - Please discuss client classpath in the doc * [ACCUMULO-372] - Deprecate selectrow add row option to scan * [ACCUMULO-377] - Shell scan command needs support for multiple columns * [ACCUMULO-379] - lower the noise for randomwalk tests * [ACCUMULO-381] - wikisearch-ingest should not have runtime dependencies on ACCUMULO_HOME or ZOOKEEPER_HOME * [ACCUMULO-389] - Update .gitignore * [ACCUMULO-390] - Provide examples for different configurations * [ACCUMULO-407] - Look into on the fly log4j configuration * [ACCUMULO-460] - zookeeper jar and ubuntu * [ACCUMULO-474] - wikisearch against an unindexed field causes tablet servers to run out of memory, or stop-the-world-gc * [ACCUMULO-477] - inconsistent names and duplicate methods in IteratorSettings * [ACCUMULO-485] - reduce the debugging level for the examples ** New Feature * [ACCUMULO-19] - Debian packaging support * [ACCUMULO-387] - Support map reduce directly over files * [ACCUMULO-403] - Create general row selection iterator * [ACCUMULO-431] - Add server status visualization to the monitor ** Task * [ACCUMULO-4] - Remove jfreechart dependency * [ACCUMULO-41] - Conform to a uniform style * [ACCUMULO-42] - Apply apache license to code * [ACCUMULO-45] - Remove unneeded MockEntry * [ACCUMULO-61] - Add files generated by c++ code to svn ignore * [ACCUMULO-92] - Contrib accumulo_sample for 1.4 has GPL dependencies * [ACCUMULO-116] - Enabling debugging in shell does not work * [ACCUMULO-145] - Release Accumulo 1.4 * [ACCUMULO-148] - Transition aggregation and filter documentation * [ACCUMULO-155] - Deprecate things in 1.4 * [ACCUMULO-169] - Create 1.3 to 1.4 upgrade process * [ACCUMULO-200] - Undefined behavior when adding an iterator that conflicts with existing iterators * [ACCUMULO-242] - Make appropriate references to Apache Accumulo * [ACCUMULO-285] - move accumulo_sample into the examples directory * [ACCUMULO-306] - Deprecate mapfileoperations functionality removed in 1.5 * [ACCUMULO-411] - Improve javadocs and post to web site * [ACCUMULO-438] - Rename cloudtrace package to org.apache.accumulo.cloudtrace * [ACCUMULO-440] - Check 1.4 examples before release * [ACCUMULO-469] - Java files are missing license headers * [ACCUMULO-472] - Simple examples sources jar is in the wrong place * [ACCUMULO-475] - Update LICENSE and NOTICE * [ACCUMULO-481] - accumulo graduates: remove "incubating" flavor from project name ** Test * [ACCUMULO-426] - Create shuffling load balancer * [ACCUMULO-433] - Create functional test using chaotic load balancer