# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

End to end tests
----------------
The end to end tests in templeton run against an existing templeton server.
They cover hcat, mapreduce, streaming, hive and pig.

It's a good idea to look at the current versions of
http://hive.apache.org/docs/hcat_r0.5.0/rest_server_install.html and
http://hive.apache.org/docs/hcat_r0.5.0/configuration.html before proceeding.

(Note that by default templeton.hive.properties in webhcat-default.xml sets
hive.metastore.uris=thrift://localhost:9933, thus WebHCat will expect an
external metastore to be running.  To start the hive metastore:
  ./bin/hive --service metastore -p 9933)

To launch the templeton server:
  ./hcatalog/sbin/webhcat_server.sh start

To control which DB the metastore uses, put something like the following in
hive-site.xml:

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/Users/ekoifman/dev/data/tmp/metastore_db_e2e;create=true</value>
    <description>Controls which DB engine the metastore will use for
    persistence.  In particular, where Derby will create its data files.</description>
  </property>

!!!! NOTE !!!!
--------------
USE SVN TO CHECK OUT THE CODE FOR RUNNING TESTS, AS THE TEST HARNESS IS PULLED
IN FROM PIG VIA AN SVN EXTERNAL.  GIT WILL NOT IMPORT IT.

(If you are using git, check out http://svn.apache.org/repos/asf/hive/trunk
(or whichever branch; see http://hive.apache.org/version_control.html) with
SVN and symlink hcatalog/src/test/e2e/harness/ to the corresponding harness/
directory in the SVN tree.)

Test cases
----------
The tests are defined in src/test/e2e/templeton/tests/*.conf.

Test framework
--------------
The test framework is derived from the one used in Pig.  There is more
documentation on the framework at
https://cwiki.apache.org/confluence/display/PIG/HowToTest

Setup
-----
1. Templeton needs to be installed and set up so that it can run hcat,
   mapreduce, hive and pig commands.

2. Install perl and the following perl modules (cpan -i <module name>):
   * IPC::Run
   * JSON
   * JSON::Path
   * Data::Dump
   * Number::Compare
   * Text::Glob
   * Data::Compare
   * File::Find::Rule
   * HTTP::Daemon
   * Parallel::ForkManager

   Tips:
   * Using perlbrew (http://perlbrew.pl) should make installing perl modules
     easier.
   * Use 'yes | cpan -i <module names>' to avoid answering the hundreds of
     questions cpan asks.

3. Copy the contents of src/test/e2e/templeton/inpdir to HDFS (e.g.
   ./bin/hadoop fs -put ~/dev/hive/hcatalog/src/test/e2e/templeton/inpdir/ webhcate2e).

4. You will need two jars in the same HDFS directory as the contents of
   inpdir: piggybank.jar, which can be obtained from Pig, and
   hadoop-examples.jar, which can be obtained from your Hadoop distribution.
   The latter should be named hexamples.jar when it is uploaded to HDFS.
   Also see
   http://hive.apache.org/docs/hcat_r0.5.0/rest_server_install.html#Hadoop+Distributed+Cache
   for notes on additional JAR files to copy to HDFS.  (A sketch of steps 3
   and 4 appears after this list.)

5. Make sure the TEMPLETON_HOME environment variable is set.

6. hadoop/conf/core-site.xml should have the items described in
   http://hive.apache.org/docs/hcat_r0.5.0/rest_server_install.html#Permissions
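The following is a minimal sketch of steps 3 and 4 plus a quick check that the
templeton server is reachable.  The local jar paths are placeholders, the
webhcate2e target directory is just the example used above, and the server is
assumed to be on the default port 50111; /templeton/v1/status is WebHCat's
status endpoint.

  # Step 3: upload the test input data (example paths).
  ./bin/hadoop fs -put ~/dev/hive/hcatalog/src/test/e2e/templeton/inpdir/ webhcate2e

  # Step 4: upload the two extra jars next to the inpdir contents.
  # hadoop-examples.jar must be stored as hexamples.jar.
  ./bin/hadoop fs -put /path/to/piggybank.jar webhcate2e/piggybank.jar
  ./bin/hadoop fs -put /path/to/hadoop-examples.jar webhcate2e/hexamples.jar

  # Sanity check: the server should answer with a JSON status of "ok".
  curl -s 'http://localhost:50111/templeton/v1/status'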
Running the tests
-----------------
Use the following command to run the tests:

  ant test -Dinpdir.hdfs=<location of inpdir on HDFS> -Dtest.user.name=<user to run the tests as> \
    -Dsecure.mode=<yes/no> -Dharness.webhdfs.url=<webhdfs url up to the port number> \
    -Dharness.templeton.url=<templeton url up to the port number>

If you want to run a specific test group you can specify the group, for example:
  -Dtests.to.run='-t TestHive'
If you want to run a specific test within a group you can specify the test, for example:
  -Dtests.to.run='-t TestHive_1'
The group names come from the .conf files; for example, tests/ddl.conf has
several groups such as 'name' => 'REST_DDL_TABLE_BASIC'; use REST_DDL_TABLE_BASIC
as the name.
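As a concrete illustration, a non-secure run of just the TestHive group might
look like the following.  The HDFS path, host names and ports are the example
values used elsewhere in this file, and the user name is a placeholder; adjust
all of them to your cluster.

  ant test -Dinpdir.hdfs=/user/ekoifman/webhcate2e \
    -Dtest.user.name=ekoifman \
    -Dsecure.mode=no \
    -Dharness.webhdfs.url=http://localhost:8085 \
    -Dharness.templeton.url=http://localhost:50111 \
    -Dtests.to.run='-t TestHive'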
Running the hcat authorization tests
------------------------------------
The hcat authorization tests run commands as different users to verify that
authorization is enforced correctly.

  ant test-hcat-authorization -Dkeytab.dir=<directory containing keytab files> \
    -Dsecure.mode=<yes/no> -Dtest.group.name=<group the test users belong to> \
    -Dinpdir.hdfs=<location of inpdir on HDFS> -Dtest.user.name=<primary test user> \
    -Dtest.group.user.name=<user in the same group> -Dtest.other.user.name=<user outside the group> \
    -Dharness.webhdfs.url=<webhdfs url up to the port number> \
    -Dharness.templeton.url=<templeton url up to the port number>

The keytab directory is expected to contain keytab files with names of the
form user_name.*keytab.

Running WebHCat doAs tests
--------------------------
  ant clean test-doas -Dinpdir.hdfs=/user/ekoifman/webhcate2e -Dsecure.mode=no \
    -Dharness.webhdfs.url=http://localhost:8085 -Dharness.templeton.url=http://localhost:50111 \
    -Dtests.to.run='-t doAsTests' -Dtest.user.name=hue -Ddoas.user=joe

The canonical example: the WebHCat server runs as user 'hcat', and end user
'joe' is using Hue, which generates a request to WebHCat.  If Hue specifies
doAs=joe, then the commands that WebHCat submits to Hadoop will run as user
'joe'.

In order for this test suite to work, webhcat-site.xml should have
webhcat.proxyuser.hue.groups and webhcat.proxyuser.hue.hosts defined, i.e.
'hue' should be allowed to impersonate 'joe'.  (Of course, the 'hcat'
proxyuser should be configured in core-site.xml for the command to succeed.)

Furthermore, metastore-side file-based security should be enabled by
configuring these four properties in hive-site.xml:
1) hive.security.metastore.authorization.manager set to
   org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
2) hive.security.metastore.authenticator.manager set to
   org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator
3) hive.metastore.pre.event.listeners set to
   org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener
4) hive.metastore.execute.setugi set to true

Notes
-----
Enable webhdfs by adding the following to your hadoop hdfs-site.xml:

  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>127.0.0.1:8085</value>
    <final>true</final>
  </property>

You can build a server that will measure test coverage of templeton by running:
  ant clean; ant e2e
This assumes you've got webhdfs at the address above, the inpdir contents in
/user/templeton, and templeton running on the default port.  You can change
any of those properties in the build file.

It's best to set HADOOP_HOME_WARN_SUPPRESS=true everywhere you can.

It is also useful to add the following to conf/hadoop-env.sh

  export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

to prevent warnings about SCDynamicStore, which may throw some tests off
(http://stackoverflow.com/questions/7134723/hadoop-on-osx-unable-to-load-realm-info-from-scdynamicstore).

Performance
-----------
It's a good idea to set fork.factor.conf.file={number of .conf files} and
fork.factor.group to something > 1 (see build.xml) to make these tests run
faster.  If you do this, make sure the Hadoop cluster has enough map slots
(10?) (mapred.tasktracker.map.tasks.maximum), otherwise the test parallelism
won't help.

Adding Tests
------------
ToDo: add some guidelines
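In the meantime the existing files under tests/ are the best reference for the
expected structure.  As a rough illustration only (the group name TestMyFeature
is hypothetical, and the keys and the :TEMPLETON_URL: substitution shown below
are patterned on the existing *.conf files rather than being a definitive
spec), a new test group generally looks something like:

  {
   'name' => 'TestMyFeature',   # group name; run with -Dtests.to.run='-t TestMyFeature'
   'tests' =>
   [
    {
     'num' => 1,                # test number; referred to as TestMyFeature_1
     'method' => 'GET',         # HTTP method the harness should issue
     'url' => ':TEMPLETON_URL:/templeton/v1/status',  # harness substitutes the server URL
     'status_code' => 200,      # expected HTTP status code
    },
   ]
  },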