Release Notes - Flume - Version v1.2.0 ** New Feature * [FLUME-896] - Implement a Durable Memory Channel * [FLUME-971] - Create developer guide for Flume NG * [FLUME-988] - Client SDK * [FLUME-1085] - Implement a durable FileChannel * [FLUME-1157] - Implement Interceptors (previously known as Decorators) for Flume 1.x * [FLUME-1183] - Implement an HBase Sink which supports table level access * [FLUME-1215] - Implement Timestamp Interceptor * [FLUME-1252] - Asynchronous HBase Sink ** Improvement * [FLUME-828] - LoggerSink representation of the event's body isn't too useful * [FLUME-881] - Would be nice if HDFS Sink would automatically choose best writableFormat based on fileType * [FLUME-979] - ExecSource should optionally restart the command when it exits * [FLUME-985] - All HDFS Operations in HDFSEventSink should have a timeout * [FLUME-1001] - Support custom processors * [FLUME-1011] - AvroSource should have a configurable max thread count * [FLUME-1020] - Implement Kerberos security for HDFS Sink * [FLUME-1030] - Retry logic for failover sink processor to handle downstream exceptions in a predictable manner. * [FLUME-1032] - Fix / clean up Flume NG build * [FLUME-1043] - SDK should mark slf4j deps as optional * [FLUME-1048] - speed up mvn package: stop building .zip packages * [FLUME-1049] - Use hadoop-1.0.0 as basis for default Flume build instead of 0.20.205 * [FLUME-1078] - flume-ng script has no way to add, not replace, classpath * [FLUME-1090] - JDBC Channel: Minimize logging under nominal conditions * [FLUME-1117] - Support output to files in Avro container format * [FLUME-1122] - Flume documentation layout should be refactored * [FLUME-1126] - Support RFC 3164 and 5424 syslog format timestamp parsing * [FLUME-1127] - Add configuration support to AbstractAvroEventSerializer for Avro sync interval and compression support * [FLUME-1132] - HDFSEventSink has spurious and verbose log message * [FLUME-1140] - Adding Xms value in flume-env.sh * [FLUME-1160] - ComponentConfigurationFactory catches NullPointerException * [FLUME-1196] - Allow different HDFS Sinks within the same agent to write to HDFS as different users * [FLUME-1198] - Implement a load-balancing sink processor * [FLUME-1212] - Flume should pick HBase jars from HBASE_HOME * [FLUME-1238] - Support active rolling of files created by HDFS Event Sink * [FLUME-1242] - Make flume user & dev guides easily editable * [FLUME-1275] - Add Regex Serializer for HBaseSink * [FLUME-1287] - Add Standalone Example to Docs * [FLUME-1330] - Avro Source should not use Fixed thread pool for boss threads when pool size is specified * [FLUME-1338] - Produce helpful error message in case that timestamp header is missing when time based bucketing is in use * [FLUME-1343] - Improve user guide * [FLUME-1345] - Use apache-flume for the artifact instead of flume-ng-dist * [FLUME-1351] - Add release version to Flume documentation ** Bug * [FLUME-862] - AvroSource breaks when config properties changes different service * [FLUME-1002] - FailoverSinkProcessor replaces sinks with same priority * [FLUME-1017] - syslog classes missing * [FLUME-1026] - Document Thread Safety Guarantees * [FLUME-1027] - Missing log4j library in Flume distribution * [FLUME-1031] - Depricate code generated by Thrift and Avro OG sources that is under com.cloudera package * [FLUME-1035] - slf4j error in flume sdk unit tests * [FLUME-1036] - Reconfiguration of AVRO or NETCAT source causes port bind exception * [FLUME-1037] - NETCAT handler theads terminate under stress test * [FLUME-1040] - Release-Notes says Apache Ivy instead of Apache Flume * [FLUME-1041] - maven warns of duplicate dependencies * [FLUME-1046] - invoking flume-ng script from bin directory fails * [FLUME-1047] - Client SDK has dependency on apache commons * [FLUME-1070] - Fix javadoc for configuring hdfsSink * [FLUME-1074] - AvroSink if any non-caught exception is thrown, an exception is thrown in finally clause * [FLUME-1075] - HDFSEventSink begin is called when transaction opened due to other error * [FLUME-1079] - Flume agent reconfiguration enters permanent bad state * [FLUME-1080] - Issue with HDFSEventSink for append support * [FLUME-1083] - Why does flume binary archive produces the following empty directories: bin/{ia64,amd64} ? * [FLUME-1087] - Restore Client API compat with v1.1.0 * [FLUME-1088] - TestWAL.testThreadedAppend fails on jenkins build server * [FLUME-1094] - hadoop.profile=23 build is broken by slf4j-jcl dependencies * [FLUME-1096] - Add support to pass headers through AvroCLIClient * [FLUME-1098] - Hadoop jars from compilation step included in assembly build * [FLUME-1099] - copy-paste issue with RecoverableMemoryChannel * [FLUME-1102] - HDFSEventSink rollInterval is broken * [FLUME-1104] - HDFS rolls the first file incorrectly * [FLUME-1108] - FILE_ROLL sink doesn't accept value 0 for unlimited wait time before rolling file * [FLUME-1109] - Syslog sources need to be refactored * [FLUME-1112] - HDFSCompressedDataStream append does not work * [FLUME-1114] - Syslog Sources does not implement maxsize * [FLUME-1116] - Extra event created for max payload size of 2500 bytes in Flume syslogtcp source * [FLUME-1119] - Remove default ports for syslog sources * [FLUME-1121] - Recoverable Memory Channel cannot recover data * [FLUME-1124] - Lifecycle supervisor can cause thread contention, sometimes causing components to not startup. * [FLUME-1125] - flume-ng script allows flume-env.sh to clobber some command-line arguments * [FLUME-1128] - Conf poller should use schedule with fixed delay * [FLUME-1129] - change foo to agent in sample config * [FLUME-1130] - flume-ng script bad ordering on FLUME_HOME var * [FLUME-1135] - flume-docs exclude is not sufficient for rat * [FLUME-1136] - Remove from executor service does not always remove the runnables from the queue * [FLUME-1142] - Seq source fails with multiplexing channel selector * [FLUME-1148] - Refactor logging * [FLUME-1149] - All sources get same channel list even if configuration is different. * [FLUME-1154] - flume-ng script should first try finding java from PATH and then try using bigtop, instead of vice-versa * [FLUME-1156] - If config file has empty sources, then throws NPE * [FLUME-1163] - HDFSEventSink leaves .tmp files in place when Flume is stopped * [FLUME-1164] - Configure should be called after stopping all events. * [FLUME-1177] - Maven deps on flume-ng-configuration module are brought in transitively instead of directly * [FLUME-1180] - ChannelSelectorFactory creates incorrect selector for multiplexing selector type * [FLUME-1181] - Context must enforce dot-separated prefix for sub-properties. * [FLUME-1182] - Syslog source cannot read format correctly from configuration * [FLUME-1184] - TestFileChannel.testThreaded fails sometimes * [FLUME-1188] - TestRecoverableMemoryChannel.testThreaded can fail sometimes * [FLUME-1190] - DurableFileChannel requires FILE enum definition in ChannelConfigurationType * [FLUME-1194] - RecoverableMemoryChannel prop misspelled -- "rentention" should be "retention" * [FLUME-1200] - HDFSEventSink causes *.snappy file to be created in HDFS even when snappy isn't used (due to missing lib) * [FLUME-1202] - Too many approved licenses * [FLUME-1204] - Add more unit tests for hbase sink * [FLUME-1205] - NPE related to checkpointing when using FileChannel * [FLUME-1213] - HDFS sink should allow bucketpath rounding down. * [FLUME-1216] - Need useful error message when keytab does not exist * [FLUME-1217] - HDFS Event Sink generates warnings due to recent change * [FLUME-1219] - Race conditions in BucketWriter / HDFSEventSink * [FLUME-1220] - Load balancing channel selector needs to be in the configuration type * [FLUME-1221] - ThriftLegacySource doesn't handle fields -> headers conversions for bytebuffers * [FLUME-1226] - FailoverRpcClient should check for NULL batch-size property * [FLUME-1229] - System.nanoTime incorrectly used in filename for HDFS file rolling * [FLUME-1230] - Sink gets initialized even when not active * [FLUME-1231] - Deadlock between BucketWriter and LeaseChecker on shutdown * [FLUME-1232] - Cannot start agent a 3rd time when using FileChannel * [FLUME-1234] - Can't use %P escape sequence for bucket path of HDFS sink * [FLUME-1236] - File channel has a race condition between start and create transaction method * [FLUME-1240] - Add version info to Flume NG * [FLUME-1241] - Flume dist should include the flume-ng-doc directory * [FLUME-1244] - Implement a load-balancing RpcClient with round/robin and random distribution capabilties. * [FLUME-1245] - HDFSCompressedDataStream calls finish() on sync instead of flush() * [FLUME-1246] - FileChannel hangs silently when Hadoop libs not found * [FLUME-1248] - flume-ng script gets broken when it tried to load hbase classpath * [FLUME-1253] - Support for running integration tests * [FLUME-1254] - RpcClient can hang when communication is broken with the source. * [FLUME-1270] - Incorrect default hdfs.callTimeout and hdfs.fileType of HDFSEventSink in FlumeUserGuide.rst * [FLUME-1271] - Incorrect configuration causes NPE * [FLUME-1280] - Make all config properties of Hbase sinks public constants * [FLUME-1284] - Need host interceptor for hdfs bucket path escape sequence * [FLUME-1288] - Async hbase sink should throw exception when hbase reports failure and check hbase table correctness * [FLUME-1290] - HDFS Sink should accept fileType parameters of arbitrary case * [FLUME-1297] - Flume tests should wait for a few seconds for agent to start. * [FLUME-1301] - HDFSCompressedDataStream can lose data * [FLUME-1303] - java.library.path value is being truncated at first 'n' char * [FLUME-1304] - Allow for faster allocation of checkpoint file. * [FLUME-1306] - LoadBalancingRpcClient should catch exception for invalid RpcClient and failover to valid one * [FLUME-1309] - Integration tests not included in assembly build artifacts * [FLUME-1312] - Host interceptor should support custom headers * [FLUME-1314] - File channel log file can grow beyond max size which causes startup failure * [FLUME-1315] - Null sink should support batching * [FLUME-1316] - AvroSink should be configurable for connect-timeout and request-timeout * [FLUME-1317] - Assembly build pulls in target folder from flume-ng-tests * [FLUME-1319] - File Channel optimize replay of logs when a checkpoint is present * [FLUME-1320] - Add safeguard for checkpoint corruption detection * [FLUME-1322] - ChannelProcessor should catch Throwable to work around close() clobbering uncaught Exceptions * [FLUME-1323] - Remove shutdown hook from FileChannel * [FLUME-1324] - File Channel Log can contain unassigned blocks * [FLUME-1325] - Components should be stopped in the reverse order that they were started * [FLUME-1327] - File Channel can deadlock in when checkpoint happens in between a put/take/commit * [FLUME-1329] - AvroSink can hang during Avro RPC handshake * [FLUME-1331] - Start method of components throwing NoClassDefFoundError are not caught * [FLUME-1333] - Disable running of saveVersion.sh on Windows * [FLUME-1341] - Build fails on jenkins because a file exists in the environment * [FLUME-1344] - AvroSink JMX does not report connection created count accurately * [FLUME-1346] - Build warning from missing maven-sphinx version in reporting section * [FLUME-1347] - Deprecate RecoverableMemoryChannel * [FLUME-1348] - Update the documentation, correcting links and removing incubation. * [FLUME-1349] - Document Hbase sinks. * [FLUME-1352] - Add documentation for HDFS path rounddown. * [FLUME-1355] - Improve user guide section about sink processors * [FLUME-1356] - Document interceptors ** Task * [FLUME-840] - Update project committers in pom file * [FLUME-991] - Make flume configuration validation component specific at time rather than at runtime * [FLUME-1028] - Fix jenkins build after addition of submodule * [FLUME-1050] - Update version of surefire plugin * [FLUME-1073] - Default Log4j configuration file should have a rolling policy * [FLUME-1082] - Add User and dev guide to Flume site * [FLUME-1151] - Exclude docs directory from rat * [FLUME-1189] - Test ReoverableMemoryChannel throughput versus FileChannel * [FLUME-1300] - Update user guide for File Channel * [FLUME-1353] - Ensure license headers are consistent ** Sub-task * [FLUME-748] - Create metric collection infrastructure * [FLUME-962] - Failover capability for Client SDK * [FLUME-992] - Create configuration stubs for sources, channels, sinks etc * [FLUME-999] - Updating init scripts and variables to fit the term agent * [FLUME-1052] - Core configuration component * [FLUME-1053] - Basic SourceConfiguration * [FLUME-1054] - Basic ChannelConfiguration * [FLUME-1055] - Basic SinkConfiguration * [FLUME-1105] - Allow the optional disabling of foreign keys * [FLUME-1107] - Configuration keys for JDBC channel contain redundant prefix. * [FLUME-1113] - JDBC Channel invokes size query on every put ---- Release Notes - Flume - Version v1.1.0 ** Sub-task * [FLUME-989] - Factor Flume Avro RPC interfaces out into separate Client SDK ** Bug * [FLUME-11] - Tests are setting logger level and should not be. * [FLUME-889] - All events in memory channel are lost on reconfiguration * [FLUME-920] - flume-ng script does not work on Ubuntu Maverick * [FLUME-933] - Default[Source|Sink|Channel]Factory implementation should do reference counting for create/unregistering instances. * [FLUME-936] - MemoryChannel is not thread safe * [FLUME-955] - Rat failure: Legacy Avro Source missing Apache license header * [FLUME-957] - Remove unused flume json config file * [FLUME-960] - TestAvroSink.testFailedConnect is racy and fails often * [FLUME-963] - Add additional tests to TestHDFSEventSink and demystify existing tests * [FLUME-972] - Missing dep when attempting to prepare flume dir for import into Eclipse * [FLUME-987] - LoggerSink prints garbage for body * [FLUME-1003] - The memory channel does not seem to respect the capacity * [FLUME-1005] - Several issues with flume-ng script * [FLUME-1009] - HDFSEventSink should return BACKOFF when the channel returns null * [FLUME-1018] - Context can cause NullPointerException ** Improvement * [FLUME-886] - Create Log4j Appender * [FLUME-919] - flume-ng script should use exec when spawning the java process * [FLUME-922] - Straighten up branches for development * [FLUME-925] - Update build infrastructure to follow Apache Maven guidelines * [FLUME-931] - Flume NG TailSource * [FLUME-932] - Making flume-ng components pluggage and name aware * [FLUME-935] - Create abstract implementations of basic channel/transaction semantics * [FLUME-939] - Load flume-env.sh from flume_conf_dir environment variable / system property as opposed to bin directory * [FLUME-945] - Add the ability to specify a default channel for multiplexing channel selector. * [FLUME-956] - Configuration File Template * [FLUME-958] - Miscellaneous build improvements * [FLUME-964] - Remove compiler warnings where possible * [FLUME-978] - Context interface is too basic requiring boilerplate user code * [FLUME-984] - SinkRunner should catch unhanded exceptions and log them like PollingSourceRunner * [FLUME-990] - Hive sink * [FLUME-1019] - Document Sink and related interfaces, defining expected behaviors * [FLUME-1021] - Document API contracts and expected behavior in additional interfaces, including Source ** New Feature * [FLUME-865] - Implement failover sink * [FLUME-892] - Support for SysLog as source * [FLUME-914] - Port the IRC sink to flume ng * [FLUME-930] - Support for multiplexing into different channels from single source. * [FLUME-942] - Support event compatibility with Flume 0.9x * [FLUME-970] - Create user guide for Flume NG * [FLUME-1015] - S3 sink on flumeNG ** Task * [FLUME-940] - Remove unused code from Flume * [FLUME-949] - Collapse PollableSink into Sink interface. * [FLUME-977] - Migrate trunk to 0.9.5 branch and move branch flume-728 over to trunk ---- Release Notes - Flume - Version v1.0.0 - 20111230 ** Bug * [FLUME-830] - flume uber jar is missing files from flume-file-channel project * [FLUME-831] - flume-jdbc-channel project has unnecessary direct dependency on log4j API * [FLUME-833] - Audit Direct Library Deps for Flume NG * [FLUME-835] - Issues during clean build of Flume NG * [FLUME-850] - Upgrade the version of Hadoop we use for HDFS sink * [FLUME-858] - HDFSWriterFactory is using operation == for string comparison * [FLUME-863] - Use of unknown sink type leads to NullPointerException * [FLUME-868] - RAT checks fail on builds.apache.org due to local maven repo * [FLUME-869] - JDBC channel tests leave derby.log in module directory * [FLUME-870] - LoggerSink contains two calls to Transaction#commit() * [FLUME-880] - HDFSFormatterFactory is using == operator for String objects * [FLUME-887] - Add maven assembly to build a source only artifact * [FLUME-891] - flume-ng script doesn't build the classpath properly * [FLUME-894] - Add log4j as part of the build * [FLUME-898] - Create DISCLAIMER file for Flume project * [FLUME-900] - RELEASENOTES file needs to be ignored by RAT * [FLUME-902] - Remove thrift references in NG build * [FLUME-903] - Update project metadata in main pom * [FLUME-904] - Update plugin and dependency repos referenced in the main pom * [FLUME-905] - ExecSource silently fails after first transaction with channel * [FLUME-906] - Maven Avro plugin missing an entry in plugin dep management * [FLUME-907] - Maven assembly missing CHANGELOG and other misc files * [FLUME-908] - Clean out old bin and conf contents * [FLUME-909] - org.apache.flume.node.TestAbstractLogicalNodeManager is failing on some machines * [FLUME-910] - Typo in maven avro plugin groupId in plugin dep management * [FLUME-911] - README should reference Apache Flume rather than just Flume * [FLUME-912] - DEVNOTES contains outdated info * [FLUME-913] - slf4j-log4j binding is excluded from the dist assembly due to test scope * [FLUME-915] - Incorrect license information in various files * [FLUME-916] - DISCLAIMER file has an incorrect URL ** Improvement * [FLUME-846] - Bump Avro version to 1.6.x * [FLUME-867] - Pollable source and sink runners should reduce poll interval after several BACKOFFs * [FLUME-871] - HDFS sink needs to handle blocked/hung append operation ** Question * [FLUME-856] - Move data across hosts ** Task * [FLUME-876] - Update README, NOTICE, LICENSE, and DEVNOTES files * [FLUME-878] - Write release-ready Maven assembly descriptor * [FLUME-879] - Document Flume's ASF release process * [FLUME-885] - Set version number of project to 1.0.0-SNAPSHOT for NG branch * [FLUME-899] - Add Release notes * [FLUME-901] - Make Flume NG build and pass tests against Hadoop 0.23 branch Release Notes - Flume - Version NG alpha 2 - 20111107 ** Bug * [FLUME-773] - ExecSource doesn't rollback transactions on errors * [FLUME-805] - HDFS sink should mangle the names of incomplete files till they are closed * [FLUME-815] - Test json config file has missing bind property * [FLUME-816] - TestJdbcChannelProvider throws OOME based on RNG values * [FLUME-817] - JdbcChannel can not be created by DefaultChannelFactory * [FLUME-818] - PropertiesFileConfigurationProvider doesn't properly log exceptions * [FLUME-822] - JDBC channel lock acquisition failure during take() * [FLUME-823] - The properties configuration provider should fail if the configuration file is not found * [FLUME-825] - Need to remove dependency on hadoop core from flume-ng-core project * [FLUME-827] - Avro client conn failure results in 60-second wait before terminating * [FLUME-848] - Typo is TestHDFSEventSink path * [FLUME-861] - AvroSource is failing on ClosedChannelException ** Improvement * [FLUME-819] - JDBC channel plugin is not registered with Flume node * [FLUME-820] - JDBC channel should support capacity specification. * [FLUME-821] - Derby schema handler should create the necessary indexes for fast lookups. * [FLUME-826] - Update libs ** New Feature * [FLUME-775] - Support text output * [FLUME-814] - Add support for multiple channels to sources ** Task * [FLUME-812] - Enable Apache RAT checks during Maven build * [FLUME-866] - Remove old plugins and log4j appender Release Notes - Flume - Version NG alpha 1 - 20111021 ** Sub-task * [FLUME-737] - Port Flume OG sources and sinks to NG interfaces * [FLUME-739] - Create NG node configuration components * [FLUME-747] - Create NG command line launchers and daemon infrastructure * [FLUME-760] - Implement JDBC based channel implementation * [FLUME-761] - Implement HDFS Flume NG sink * [FLUME-777] - Support text output for HDFS sink * [FLUME-795] - Replace the non-transactional memory channel with new transactional memory channel ** Bug * [FLUME-769] - FLUME-728 - TestJsonFileConfigurationProvider fails due to timing issue * [FLUME-784] - MemoryChannel should poll with timeout on take() rather than block indefinitely * [FLUME-788] - Add more test cases to Flume-NG HDFS test * [FLUME-803] - support re-entrant transaction for memory channel * [FLUME-806] - Fix cast exception in MemoryChannel due to config type changes * [FLUME-807] - Fix tests broken by FLUME-802 changes * [FLUME-809] - Fix channel syntax javadoc bug in PropertiesFileConfigurationProvider * [FLUME-811] - Remove logging of avro client that causes errors with proxy object methods ** Epic * [FLUME-728] - Flume NG refactoring ** Improvement * [FLUME-772] - MemoryChannel should push events back into channel on rollback * [FLUME-774] - Move HDFS sink into a separate module * [FLUME-781] - Add error checking to AvroCLICilent * [FLUME-782] - Instrument AvroSource with counters * [FLUME-783] - Add a batch event RPC call to AvroSource * [FLUME-804] - Support help and node name arguments from the command line * [FLUME-810] - Add help command line options to AvroCLIClient ** New Feature * [FLUME-771] - Implement NG Avro source * [FLUME-778] - Implement NG Avro sink * [FLUME-779] - Create an Avro CLI for sending data to the Avro source ** Task * [FLUME-780] - Reduce default log levels for chatty libraries * [FLUME-785] - Write javadoc for builtin channels * [FLUME-786] - Write javadoc for builtin sources * [FLUME-787] - Write javadoc for builtin sinks * [FLUME-801] - Write NG getting started guide * [FLUME-802] - Complete PropertyFileConfigurationProvider implementation