Update documentation - testing/development
 * add sphinx_rtd_theme theme to dev requirements
 * change theme to sphinx_rtd_theme
 * fix section headers - use chapters where needed
 * refresh documentation where needed
 * add docs on development and testing

Closes #81 and #82
Rafal Wojdyla committed Oct 9, 2014
1 parent b687347 commit f3ba22d
Showing 10 changed files with 225 additions and 68 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -11,3 +11,5 @@ dist/
.tox
*.swp
.DS_Store
doc/html
doc/doctrees
60 changes: 49 additions & 11 deletions doc/source/cli.rst
@@ -1,14 +1,31 @@
**********
CLI client
==========
**********
A command line interface for HDFS using :mod:`snakebite.client <client>`.

The CLI client first tries to parse the path, and in case it's in the form
``hdfs://namenode:port/path`` it will use that configuration.
Otherwise it will use the -n and -p command line arguments.
If those aren't set, it tries to read the config from ``~/.snakebiterc`` and
if that doesn't exist, it will check ``$HADOOP_HOME/core-site.xml``.
Config
======

A config looks like
Snakebite CLI can accept configuration in a couple of different ways,
with a strict priority order among them.
The methods, in priority order, are:

1. via a path in the command line - e.g. ``hdfs://namenode_host:port/path``
2. via the ``-n``, ``-p``, ``-V`` command line flags
3. via ``~/.snakebiterc`` file
4. via ``/etc/snakebiterc`` file
5. via ``$HADOOP_HOME/core-site.xml`` and/or ``$HADOOP_HOME/hdfs-site.xml`` files
6. via ``core-site.xml`` and/or ``hdfs-site.xml`` in default locations

More about methods 3 to 6 below.
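
For illustration, methods 1 and 2 might look like this on the command line
(host and port are placeholders):

::

    $ snakebite ls hdfs://mynamenode:8020/tmp
    $ snakebite -n mynamenode -p 8020 -V 9 ls /tmp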

Config files
^^^^^^^^^^^^

Snakebite config can exist in ``~/.snakebiterc`` (a per-user config) or in
``/etc/snakebiterc`` (a system-wide config).

A config looks like:

::

@@ -22,10 +39,31 @@ A config looks like
}


The version property denotes the protocol version used. CDH 4.1.3 uses protocol 7, while
HDP 2.0 uses protocol 8. Snakebite defaults to 7.
The version property denotes the protocol version used. CDH 4.1.3 uses protocol 7, while
HDP 2.0 uses protocol 9. Snakebite defaults to 9. The default namenode port is 8020.
The default value of ``skiptrash`` is ``true``.
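
As a sketch (not the project's elided example above; the keys follow the
properties just described, with placeholder values), a minimal
``~/.snakebiterc`` might look like:

::

    {
        "namenode": "mynamenode.example.com",
        "port": 8020,
        "version": 9,
        "skiptrash": true
    }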

Hadoop config files
^^^^^^^^^^^^^^^^^^^

The last two methods of providing config to snakebite are through Hadoop config files.
If the ``HADOOP_HOME`` environment variable is set, snakebite will try to find ``core-site.xml``
and/or ``hdfs-site.xml`` files in the ``$HADOOP_HOME`` directory. If ``HADOOP_HOME`` is not set,
snakebite will try to find those files in a couple of default Hadoop config locations:
* /etc/hadoop/conf/core-site.xml
* /usr/local/etc/hadoop/conf/core-site.xml
* /usr/local/hadoop/conf/core-site.xml
* /etc/hadoop/conf/hdfs-site.xml
* /usr/local/etc/hadoop/conf/hdfs-site.xml
* /usr/local/hadoop/conf/hdfs-site.xml
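
For reference, the part of ``core-site.xml`` that snakebite can derive the
namenode address from typically looks like the following (a sketch; the value
is a placeholder):

::

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mynamenode.example.com:8020</value>
      </property>
    </configuration>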

Bash completion
===============

Snakebite cli comes with bash completion inf /scripts.
Snakebite CLI comes with a bash completion file in /scripts. If snakebite is installed
via the debian package, the completion file is installed automatically. If snakebite
is installed via pip/setup.py, it is not, since that would require write access
to /etc (usually root); in that case the completion script has to be installed manually.
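
A manual install might look like this (the completion filename under
``/scripts`` is an assumption - check the repository):

::

    $ sudo cp scripts/snakebite-completion.bash /etc/bash_completion.d/snakebite
    $ . /etc/bash_completion.d/snakebite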

Usage
=====
@@ -67,4 +105,4 @@ Usage
touchz [paths] creates a file of zero length
usage <cmd> show cmd usage

to see command-specific options use: snakebite [cmd] --help
to see command-specific options use: snakebite [cmd] --help
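
For example, a couple of hypothetical invocations:

::

    $ snakebite ls /user
    $ snakebite mkdir /user/foo/bar
    $ snakebite ls --help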
5 changes: 3 additions & 2 deletions doc/source/client.rst
@@ -1,5 +1,6 @@
**************
Client library
==============
**************
.. automodule:: client

.. autoclass:: Client
@@ -9,4 +10,4 @@ Client library
:members:

.. autoclass:: HAClient
:members:
:members:
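
For context, a minimal usage sketch of the client library (host and port are
placeholders; methods such as ``ls`` return generators):

::

    from snakebite.client import Client

    client = Client("mynamenode.example.com", 8020)
    for entry in client.ls(["/user"]):
        print entry["path"]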
7 changes: 5 additions & 2 deletions doc/source/conf.py
@@ -13,6 +13,9 @@

import sys
import os

import sphinx_rtd_theme

import snakebite.version

# If extensions (or modules to document with autodoc) are in another directory,
@@ -96,15 +99,15 @@

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'default'
html_theme = "sphinx_rtd_theme"

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}

# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
37 changes: 37 additions & 0 deletions doc/source/development.rst
@@ -0,0 +1,37 @@
***********
Development
***********

How to start
============

We try to make it as easy as possible to start development on snakebite.
We recommend using virtualenv (+ virtualenvwrapper) for development;
it's not required, but highly recommended. To install it and create a
development environment for snakebite:

1. install virtualenvwrapper:
``$ pip install virtualenvwrapper``
2. create development environment:
``$ mkvirtualenv snakebite_dev``

More about virtualenvwrapper and virtualenv can be found `here <http://virtualenvwrapper.readthedocs.org/en/latest/>`_.

Below is the list of recommended steps to start development:

1. clone repo:
``$ git clone git@github.com:spotify/snakebite.git``
2. fetch all developer requirements:
``$ pip install -r requirements-dev.txt``
3. run tests:
``$ python setup.py test``

If the tests succeed, you are ready to hack! Remember to always test
your changes, and please come back with a PR <3
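
To run a single test module, something like this should work (the test
filename is an assumption about the repository layout):

::

    $ nosetests test/client_test.py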

Open issues
===========

If you're looking for open issues, please take a look `here <https://github.com/spotify/snakebite/issues>`_.

Thanks!
12 changes: 5 additions & 7 deletions doc/source/hadoop_rpc.rst
@@ -1,12 +1,12 @@
*******************************
Hadoop RPC protocol description
===============================
*******************************

Snakebite currently implements the following protocol in
:py:data:`snakebite.channel.SocketRpcChannel` to communicate with the NameNode.

=============
Connection
=============
==========
The Hadoop RPC protocol works as described below. On connection, headers are
sent to set up a session. After that, multiple requests can be sent within the session.

@@ -26,9 +26,8 @@ sent to set up a session. After that, multiple requests can be sent within the session.
| IpcConnectionContextProto | :py:data:`bytes` | |
+----------------------------------+------------------+----------------------------------------+
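
As a rough sketch of the session setup in Python (not snakebite's actual
code; the exact header fields depend on the protocol version and are listed
in the table above):

::

    import socket
    import struct

    version = 9  # protocol version, e.g. 7 or 9
    sock = socket.create_connection(("mynamenode", 8020))
    sock.send("hrpc")                     # RPC header "magic"
    sock.send(struct.pack("B", version))  # protocol version byte
    # ... remaining header fields and a delimited
    # IpcConnectionContextProto follow, per the table above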

==================
Sending messages
==================
================

When sending a message, the following is sent to the server:

@@ -62,9 +61,8 @@ The :py:data:`HadoopRpcRequestProto` contains a :py:data:`methodName` field that
what server method is called and has a property :py:data:`request` that contains the
actual serialized request message.
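
As a rough illustration of this framing (a sketch, not snakebite's actual
code; the message objects come from the generated protobuf modules):

::

    import struct

    def encode_varint(n):
        # protobuf-style varint encoding of a non-negative integer
        out = ""
        while True:
            bits = n & 0x7f
            n >>= 7
            if n:
                out += chr(bits | 0x80)
            else:
                return out + chr(bits)

    def send_rpc_request(sock, header_proto, request_proto):
        # each message is written as a varint length followed by its
        # serialized bytes; the packet is prefixed with the total length
        # as a big-endian uint32
        header = header_proto.SerializeToString()
        request = request_proto.SerializeToString()
        body = (encode_varint(len(header)) + header +
                encode_varint(len(request)) + request)
        sock.send(struct.pack("!I", len(body)) + body)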

====================
Receiving messages
====================
==================

After a message is sent, the response can be read in the following way:

53 changes: 9 additions & 44 deletions doc/source/index.rst
@@ -1,16 +1,20 @@
#######################
Snakebite documentation
=======================
#######################
Snakebite is a python package that provides:

.. toctree::
:hidden:

client
cli
development
testing
minicluster
hadoop_rpc

* :doc:`A pure python HDFS client library that uses protobuf messages over Hadoop RPC to communicate with the namenode. <client>`

* :doc:`A pure python HDFS client library that uses protobuf messages over Hadoop RPC to communicate with HDFS. <client>`
* :doc:`A command line interface (CLI) for HDFS that uses the pure python client library. <cli>`
* :doc:`A hadoop minicluster wrapper. <minicluster>`
* :doc:`Hadoop RPC specification. <hadoop_rpc>`
@@ -21,10 +25,10 @@ Since the 'normal' Hadoop HDFS client (``hadoop fs``) is written in Java and has
a lot of dependencies on Hadoop jars, startup times are quite high (> 3 secs).
This isn't ideal for integrating Hadoop commands in python projects.

At Spotify we use the `luigi job scheduler <http://github.com/spotify/luigi>`_
At Spotify we use the `luigi job scheduler <http://github.com/spotify/luigi>`_
that relies on doing a lot of existence checks and moving data around in HDFS.
And since calling ``hadoop`` from python is expensive, we decided to write a
pure python HDFS client that only relies on protobuf. The current
pure python HDFS client that only relies on protobuf. The current
:mod:`snakebite.client <client>` library uses protobuf messages and
implements the Hadoop RPC protocol for talking to the NameNode.

@@ -40,45 +44,6 @@ we've implemented a :doc:`cli` as well.
CRC during transfer, but this is disabled by default because of performance
reasons. This is the opposite behaviour from the stock Hadoop client.

Testing
=======
.. warning:: :mod:`snakebite.client <client>` hasn't been tested in the wild
a lot! **USE AT YOUR OWN RISK!**

Tests can be run with ``nosetests``. Currently, only integration tests are
provided and use ``minicluster.py`` to spawn an HDFS minicluster.

When running the tests, make sure that the ``HADOOP_HOME`` environment variable is set.
The minicluster uses the ``hadoop-mapreduce-client-jobclient.<version>-tests.jar`` and
assumes this is located in ``HADOOP_HOME``. The job client test jar can also be specified
by using the ``HADOOP_JOBCLIENT_JAR`` environment variable.

Also, make sure the ``JAVA_HOME`` environment variable is set.

.. note:: Different Hadoop distributions use different protocol versions. Snakebite 1.3.x
and the tests default to version 7 (CDH 4.1.3).
Snakebite 2.x **ONLY** supports Hadoop > 2.2.0 (protocol version >9, e.g. HDP2.0/CDH5)!
If you want to test with different protocol versions, set the ``HADOOP_PROTOCOL_VER``
environment variable to the appropriate version number.


.. note:: A hadoop installation is only required for testing.

TODO
====
* Only supports Auth method SIMPLE. We might want to have SASL or KERBEROS as well
* More tests.
* Return correct exit codes from cli client.
* Improve speed of CRC verification.
* Improve methods:
* [-rm [-f] [-r|-R] [-skipTrash] <src> ...] (implement -f)

* Implement more methods (those need interaction with DataNodes):
* [-expunge]
* put [paths] dst copy sources from local file system to destination



LICENSE
=======
Copyright (c) 2013 - 2014 Spotify AB
5 changes: 3 additions & 2 deletions doc/source/minicluster.rst
@@ -1,6 +1,7 @@
***********
Minicluster
===========
***********
.. automodule:: minicluster

.. autoclass:: MiniCluster
:members:
:members:
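
For context, a rough usage sketch (the constructor argument and method names
are assumptions - see the autoclass docs above):

::

    from snakebite.minicluster import MiniCluster

    cluster = MiniCluster("/path/to/testfiles")  # spawns an HDFS minicluster
    cluster.put("/local/file", "/on/cluster")
    cluster.terminate()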
