Update documentation - testing/development
 * add sphinx_rtd_theme theme to dev requirements
 * change theme to sphinx_rtd_theme
 * fix section headers - use chapters where needed
 * refresh documentation where needed
 * add docs on development and testing

Closes #81 and #82
Rafal Wojdyla committed Oct 9, 2014
1 parent b687347 commit f3ba22d
Showing 10 changed files with 225 additions and 68 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -11,3 +11,5 @@ dist/
.tox
*.swp
.DS_Store
doc/html
doc/doctrees
60 changes: 49 additions & 11 deletions doc/source/cli.rst
@@ -1,14 +1,31 @@
**********
CLI client
==========
**********
A command line interface for HDFS using :mod:`snakebite.client <client>`.

The CLI client first tries to parse the path, and in case it's in the form
``hdfs://namenode:port/path`` it will use that configuration.
Otherwise it will use the -n and -p command line arguments.
If those aren't set, it tries to read the config from ``~/.snakebiterc`` and
if that doesn't exist, it will check ``$HADOOP_HOME/core-site.xml``.
Config
======

A config looks like
Snakebite CLI can accept configuration in a couple of different ways,
with a strict priority order among them.
The methods, in priority order, are:

1. via a path in the command line - e.g. ``hdfs://namenode_host:port/path``
2. via the ``-n``, ``-p``, ``-V`` command line flags
3. via ``~/.snakebiterc`` file
4. via ``/etc/snakebiterc`` file
5. via ``$HADOOP_HOME/core-site.xml`` and/or ``$HADOOP_HOME/hdfs-site.xml`` files
6. via ``core-site.xml`` and/or ``hdfs-site.xml`` in default locations

More about methods 3 to 6 below.
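
For illustration, methods 1 and 2 might look like this on the command line
(host and port are placeholders):

::

    $ snakebite ls hdfs://mynamenode:8020/tmp
    $ snakebite -n mynamenode -p 8020 -V 9 ls /tmp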

Config files
^^^^^^^^^^^^

Snakebite config can exist in ``~/.snakebiterc`` (a per-user config) or in
``/etc/snakebiterc`` (a system-wide config).

A config looks like:

::

@@ -22,10 +39,31 @@ A config looks like
}


The version property denotes the protocol version used. CDH 4.1.3 uses protocol 7, while
HDP 2.0 uses protocol 8. Snakebite defaults to 7.
The version property denotes the protocol version used. CDH 4.1.3 uses protocol 7, while
HDP 2.0 uses protocol 9. Snakebite defaults to 9. The default namenode port is 8020.
The default value of ``skiptrash`` is ``true``.
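
As a sketch (not the project's elided example above; the keys follow the
properties just described, with placeholder values), a minimal
``~/.snakebiterc`` might look like:

::

    {
        "namenode": "mynamenode.example.com",
        "port": 8020,
        "version": 9,
        "skiptrash": true
    }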

Hadoop config files
^^^^^^^^^^^^^^^^^^^

The last two methods of providing config to snakebite are through Hadoop config files.
If the ``HADOOP_HOME`` environment variable is set, snakebite will try to find ``core-site.xml``
and/or ``hdfs-site.xml`` files in the ``$HADOOP_HOME`` directory. If ``HADOOP_HOME`` is not set,
snakebite will try to find those files in a couple of default Hadoop config locations:
* /etc/hadoop/conf/core-site.xml
* /usr/local/etc/hadoop/conf/core-site.xml
* /usr/local/hadoop/conf/core-site.xml
* /etc/hadoop/conf/hdfs-site.xml
* /usr/local/etc/hadoop/conf/hdfs-site.xml
* /usr/local/hadoop/conf/hdfs-site.xml
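
For reference, the part of ``core-site.xml`` that snakebite can derive the
namenode address from typically looks like the following (a sketch; the value
is a placeholder):

::

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mynamenode.example.com:8020</value>
      </property>
    </configuration>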

Bash completion
===============

Snakebite cli comes with bash completion inf /scripts.
Snakebite CLI comes with a bash completion file in /scripts. If snakebite is installed
via the debian package, the completion file is installed automatically. If snakebite
is installed via pip/setup.py, it is not, since that would require write access
to /etc (usually root); in that case the completion script has to be installed manually.
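
A manual install might look like this (the completion filename under
``/scripts`` is an assumption - check the repository):

::

    $ sudo cp scripts/snakebite-completion.bash /etc/bash_completion.d/snakebite
    $ . /etc/bash_completion.d/snakebite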

Usage
=====
@@ -67,4 +105,4 @@ Usage
touchz [paths] creates a file of zero length
usage <cmd> show cmd usage

to see command-specific options use: snakebite [cmd] --help
to see command-specific options use: snakebite [cmd] --help
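
For example, a couple of hypothetical invocations:

::

    $ snakebite ls /user
    $ snakebite mkdir /user/foo/bar
    $ snakebite ls --help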
5 changes: 3 additions & 2 deletions doc/source/client.rst
@@ -1,5 +1,6 @@
**************
Client library
==============
**************
.. automodule:: client

.. autoclass:: Client
@@ -9,4 +10,4 @@ Client library
:members:

.. autoclass:: HAClient
:members:
:members:
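
For context, a minimal usage sketch of the client library (host and port are
placeholders; methods such as ``ls`` return generators):

::

    from snakebite.client import Client

    client = Client("mynamenode.example.com", 8020)
    for entry in client.ls(["/user"]):
        print entry["path"]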
7 changes: 5 additions & 2 deletions doc/source/conf.py
@@ -13,6 +13,9 @@

import sys
import os

import sphinx_rtd_theme

import snakebite.version

# If extensions (or modules to document with autodoc) are in another directory,
@@ -96,15 +99,15 @@

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'default'
html_theme = "sphinx_rtd_theme"

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}

# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
37 changes: 37 additions & 0 deletions doc/source/development.rst
@@ -0,0 +1,37 @@
***********
Development
***********

How to start
============

We try to make it as easy as possible to start development on snakebite.
We recommend using virtualenv (+ virtualenvwrapper) for development;
it's not required, but highly recommended. To install it and create a
development environment for snakebite:

1. install virtualenvwrapper:
``$ pip install virtualenvwrapper``
2. create development environment:
``$ mkvirtualenv snakebite_dev``

More about virtualenvwrapper and virtualenv can be found `here <http://virtualenvwrapper.readthedocs.org/en/latest/>`_.

Below is the list of recommended steps to start development:

1. clone repo:
``$ git clone git@github.com:spotify/snakebite.git``
2. fetch all developer requirements:
``$ pip install -r requirements-dev.txt``
3. run tests:
``$ python setup.py test``

If the tests succeed, you are ready to hack! Remember to always test
your changes, and please come back with a PR <3
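
To run a single test module, something like this should work (the test
filename is an assumption about the repository layout):

::

    $ nosetests test/client_test.py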

Open issues
===========

If you're looking for open issues, please take a look `here <https://github.com/spotify/snakebite/issues>`_.

Thanks!
12 changes: 5 additions & 7 deletions doc/source/hadoop_rpc.rst
@@ -1,12 +1,12 @@
*******************************
Hadoop RPC protocol description
===============================
*******************************

Snakebite currently implements the following protocol in
:py:data:`snakebite.channel.SocketRpcChannel` to communicate with the NameNode.

=============
Connection
=============
==========
The Hadoop RPC protocol works as described below. On connection, headers are
sent to set up a session. After that, multiple requests can be sent within the session.

@@ -26,9 +26,8 @@ sent to set up a session. After that, multiple requests can be sent within the session.
| IpcConnectionContextProto | :py:data:`bytes` | |
+----------------------------------+------------------+----------------------------------------+
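
As a rough sketch of the session setup in Python (not snakebite's actual
code; the exact header fields depend on the protocol version and are listed
in the table above):

::

    import socket
    import struct

    version = 9  # protocol version, e.g. 7 or 9
    sock = socket.create_connection(("mynamenode", 8020))
    sock.send("hrpc")                     # RPC header "magic"
    sock.send(struct.pack("B", version))  # protocol version byte
    # ... remaining header fields and a delimited
    # IpcConnectionContextProto follow, per the table above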

==================
Sending messages
==================
================

When sending a message, the following is sent to the server:

@@ -62,9 +61,8 @@ The :py:data:`HadoopRpcRequestProto` contains a :py:data:`methodName` field that
what server method is called and has a property :py:data:`request` that contains the
actual serialized request message.
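
As a rough illustration of this framing (a sketch, not snakebite's actual
code; the message objects come from the generated protobuf modules):

::

    import struct

    def encode_varint(n):
        # protobuf-style varint encoding of a non-negative integer
        out = ""
        while True:
            bits = n & 0x7f
            n >>= 7
            if n:
                out += chr(bits | 0x80)
            else:
                return out + chr(bits)

    def send_rpc_request(sock, header_proto, request_proto):
        # each message is written as a varint length followed by its
        # serialized bytes; the packet is prefixed with the total length
        # as a big-endian uint32
        header = header_proto.SerializeToString()
        request = request_proto.SerializeToString()
        body = (encode_varint(len(header)) + header +
                encode_varint(len(request)) + request)
        sock.send(struct.pack("!I", len(body)) + body)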

====================
Receiving messages
====================
==================

After a message is sent, the response can be read in the following way:

53 changes: 9 additions & 44 deletions doc/source/index.rst
@@ -1,16 +1,20 @@
#######################
Snakebite documentation
=======================
#######################
Snakebite is a python package that provides:

.. toctree::
:hidden:

client
cli
development
testing
minicluster
hadoop_rpc

* :doc:`A pure python HDFS client library that uses protobuf messages over Hadoop RPC to communicate with the namenode. <client>`

* :doc:`A pure python HDFS client library that uses protobuf messages over Hadoop RPC to communicate with HDFS. <client>`
* :doc:`A command line interface (CLI) for HDFS that uses the pure python client library. <cli>`
* :doc:`A hadoop minicluster wrapper. <minicluster>`
* :doc:`Hadoop RPC specification. <hadoop_rpc>`
@@ -21,10 +25,10 @@ Since the 'normal' Hadoop HDFS client (``hadoop fs``) is written in Java and has
a lot of dependencies on Hadoop jars, startup times are quite high (> 3 secs).
This isn't ideal for integrating Hadoop commands in python projects.

At Spotify we use the `luigi job scheduler <http://github.com/spotify/luigi>`_
At Spotify we use the `luigi job scheduler <http://github.com/spotify/luigi>`_
that relies on doing a lot of existence checks and moving data around in HDFS.
And since calling ``hadoop`` from python is expensive, we decided to write a
pure python HDFS client that only relies on protobuf. The current
pure python HDFS client that only relies on protobuf. The current
:mod:`snakebite.client <client>` library uses protobuf messages and
implements the Hadoop RPC protocol for talking to the NameNode.

@@ -40,45 +44,6 @@ we've implemented a :doc:`cli` as well.
CRC during transfer, but this is disabled by default because of performance
reasons. This is the opposite behaviour from the stock Hadoop client.

Testing
=======
.. warning:: :mod:`snakebite.client <client>` hasn't been tested in the wild
a lot! **USE AT YOUR OWN RISK!**

Tests can be run with ``nosetests``. Currently, only integration tests are
provided and use ``minicluster.py`` to spawn an HDFS minicluster.

When running the tests, make sure that the ``HADOOP_HOME`` environment variable is set.
The minicluster uses the ``hadoop-mapreduce-client-jobclient.<version>-tests.jar`` and
assumes this is located in ``HADOOP_HOME``. The job client test jar can also be specified
by using the ``HADOOP_JOBCLIENT_JAR`` environment variable.

Also, make sure the ``JAVA_HOME`` environment variable is set.

.. note:: Different Hadoop distributions use different protocol versions. Snakebite 1.3.x
and the tests default to version 7 (CDH 4.1.3).
Snakebite 2.x **ONLY** supports Hadoop > 2.2.0 (protocol version >9, e.g. HDP2.0/CDH5)!
If you want to test with different protocol versions, set the ``HADOOP_PROTOCOL_VER``
environment variable to the appropriate version number.


.. note:: A hadoop installation is only required for testing.

TODO
====
* Only supports Auth method SIMPLE. We might want to have SASL or KERBEROS as well
* More tests.
* Return correct exit codes from cli client.
* Improve speed of CRC verification.
* Improve methods:
* [-rm [-f] [-r|-R] [-skipTrash] <src> ...] (implement -f)

* Implement more methods (those need interaction with DataNodes):
* [-expunge]
* put [paths] dst copy sources from local file system to destination



LICENSE
=======
Copyright (c) 2013 - 2014 Spotify AB
5 changes: 3 additions & 2 deletions doc/source/minicluster.rst
@@ -1,6 +1,7 @@
***********
Minicluster
===========
***********
.. automodule:: minicluster

.. autoclass:: MiniCluster
:members:
:members:
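
For context, a rough usage sketch (the constructor argument and method names
are assumptions - see the autoclass docs above):

::

    from snakebite.minicluster import MiniCluster

    cluster = MiniCluster("/path/to/testfiles")  # spawns an HDFS minicluster
    cluster.put("/local/file", "/on/cluster")
    cluster.terminate()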
