~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  C API libhdfs
  ---
  ---
  ${maven.build.timestamp}

C API libhdfs

%{toc|section=1|fromDepth=0}

* Overview

   libhdfs is a JNI based C API for Hadoop's Distributed File System
   (HDFS). It provides C APIs to a subset of the HDFS APIs to manipulate
   HDFS files and the filesystem. libhdfs is part of the Hadoop
   distribution and comes pre-compiled in
   <<<${HADOOP_PREFIX}/libhdfs/libhdfs.so>>>.

* The APIs

   The libhdfs APIs are a subset of the Hadoop FileSystem APIs.

   The header file for libhdfs describes each API in detail and is
   available in <<<${HADOOP_PREFIX}/src/c++/libhdfs/hdfs.h>>>.

* A Sample Program

----
\#include "hdfs.h"

\#include <stdio.h>   /* fprintf */
\#include <stdlib.h>  /* exit */
\#include <string.h>  /* strlen */

int main(int argc, char **argv) {
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) {
        fprintf(stderr, "Failed to connect to HDFS!\n");
        exit(-1);
    }
    const char* writePath = "/tmp/testfile.txt";
    hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
    if (!writeFile) {
        fprintf(stderr, "Failed to open %s for writing!\n", writePath);
        exit(-1);
    }
    /* Write the string including its trailing NUL byte. */
    const char* buffer = "Hello, World!";
    tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
    if (num_written_bytes == -1) {
        fprintf(stderr, "Failed to write to %s!\n", writePath);
        exit(-1);
    }
    if (hdfsFlush(fs, writeFile)) {
        fprintf(stderr, "Failed to 'flush' %s\n", writePath);
        exit(-1);
    }
    hdfsCloseFile(fs, writeFile);
    hdfsDisconnect(fs);
    return 0;
}
----

   (A complementary read-back sketch appears at the end of this document.)

* How To Link With The Library

   See the Makefile for <<<hdfs_test.c>>> in the libhdfs source directory
   (<<<${HADOOP_PREFIX}/src/c++/libhdfs/Makefile>>>) or something like:
   <<<gcc above_sample.c -I${HADOOP_PREFIX}/src/c++/libhdfs -L${HADOOP_PREFIX}/libhdfs -lhdfs -o above_sample>>>

* Common Problems

   The most common problem is that <<<CLASSPATH>>> is not set properly when
   calling a program that uses libhdfs. Make sure you set it to all the
   Hadoop jars needed to run Hadoop itself. Currently, there is no way to
   programmatically generate the classpath, but a good bet is to include
   all the jar files in <<<${HADOOP_PREFIX}>>> and <<<${HADOOP_PREFIX}/lib>>>
   as well as the right configuration directory containing
   <<<hdfs-site.xml>>>.

* Thread Safe

   libhdfs is thread safe.

   * Concurrency and Hadoop FS "handles"

   The Hadoop FS implementation includes an FS handle cache which caches
   based on the URI of the namenode along with the user connecting. So,
   all calls to <<<hdfsConnect>>> will return the same handle, but calls to
   <<<hdfsConnectAsUser>>> with different users will return different
   handles. But, since HDFS client handles are completely thread safe, this
   has no bearing on concurrency. (A short sketch of per-user connections
   appears at the end of this document.)

   * Concurrency and libhdfs/JNI

   The libhdfs calls to JNI should always be creating thread local storage,
   so (in theory), libhdfs should be as thread safe as the underlying calls
   to the Hadoop FS.
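
* Example: Connecting as Different Users

   To make the handle-cache behaviour above concrete, here is a minimal
   sketch, assuming a reachable default filesystem and the three-argument
   form of <<<hdfsConnectAsUser>>> declared in <<<hdfs.h>>> for this
   release; the user names <<<alice>>> and <<<bob>>> and the directory
   paths are hypothetical and purely illustrative.

----
\#include "hdfs.h"

\#include <stdio.h>   /* fprintf */
\#include <stdlib.h>  /* exit */

int main(int argc, char **argv) {
    /* Both connections name the same default filesystem, so the Java-side
       FS handle cache keys on (namenode URI, user): repeated connects for
       the same user share a cached instance, while different users get
       distinct handles. The user names here are hypothetical. */
    hdfsFS asAlice = hdfsConnectAsUser("default", 0, "alice");
    hdfsFS asBob   = hdfsConnectAsUser("default", 0, "bob");
    if (!asAlice || !asBob) {
        fprintf(stderr, "Failed to connect as one of the users!\n");
        exit(-1);
    }

    /* Operations run with each user's permissions; since the handles are
       thread safe, either one may also be shared across threads. */
    if (hdfsCreateDirectory(asAlice, "/tmp/alice-dir")) {
        fprintf(stderr, "mkdir as alice failed!\n");
    }
    if (hdfsCreateDirectory(asBob, "/tmp/bob-dir")) {
        fprintf(stderr, "mkdir as bob failed!\n");
    }

    hdfsDisconnect(asAlice);
    hdfsDisconnect(asBob);
    return 0;
}
----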
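
* Example: Reading Back a File

   To round out the write-only sample program above, here is a minimal
   read-back sketch under the same assumptions (a reachable default
   filesystem and the <<</tmp/testfile.txt>>> written earlier). It is an
   illustration, not one of the shipped test programs.

----
\#include "hdfs.h"

\#include <stdio.h>   /* fprintf, printf */
\#include <stdlib.h>  /* exit */

int main(int argc, char **argv) {
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) {
        fprintf(stderr, "Failed to connect to HDFS!\n");
        exit(-1);
    }
    const char* readPath = "/tmp/testfile.txt";
    hdfsFile readFile = hdfsOpenFile(fs, readPath, O_RDONLY, 0, 0, 0);
    if (!readFile) {
        fprintf(stderr, "Failed to open %s for reading!\n", readPath);
        exit(-1);
    }

    char buffer[32];
    /* hdfsRead returns the number of bytes actually read, or -1 on error. */
    tSize num_read_bytes = hdfsRead(fs, readFile, (void*)buffer, sizeof(buffer) - 1);
    if (num_read_bytes == -1) {
        fprintf(stderr, "Failed to read from %s!\n", readPath);
        exit(-1);
    }
    buffer[num_read_bytes] = '\0';
    printf("Read %d bytes: %s\n", (int)num_read_bytes, buffer);

    hdfsCloseFile(fs, readFile);
    hdfsDisconnect(fs);
    return 0;
}
----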