My IPython-powered semi-automatic git workflow
This is the last post of this year, so I will try my best to give you something interesting to think about...
In this case, I will show you my git workflow... and you know there are a lot of workflows out there, probably better than mine, but I just want to share the place where I find myself comfortable.
And yes... my git workflow is also powered by IPython (I am very repetitive when I love a project!). And it is a semi-automatic one, using IPython notebooks (ipynbs) as a sort of template, transforming them into a new conceptual entity: the ipytmpl (and yes, I love to invent names too!).
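To give you a quick idea of the mechanism that makes this template idea possible (just a minimal sketch, where the branch name is only an illustration): IPython expands Python variables prefixed with $ inside ! shell commands, so the same git cell can be reused by simply changing the variables you define at the top of the notebook.
# A Python variable defined in one cell...
branch = "some_feature_branch"
# ...is expanded with $ inside any shell command run with !,
# so the same cell works for whatever value you fill in
# (echo is used here just to show the expansion without running git).
!echo git checkout -b $branch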
Because my workflow has essentially two cycles, in this post I will show you the general setup of the environment and the first Short cycle, leaving the second Extended cycle (and other details) for another post (after the new year, of course).
I will also show you my workflow with a real PR (pull-request) to the IPython project.
Are you ready? Here we go...
Get everything ready...
First, we need to set up the environment to do our work:
- Check the current working directory:
%pwd
- Make a new folder to isolate our work and cd into it:
!mkdir devel_example
%cd devel_example/
NOTE: You can avoid these steps by opening the notebook in the proper directory, but in this case I want to keep the example isolated so it does not overwrite my current development environment.
- Load variables with useful information:
project_name = "ipython"
project_remote = "git://github.com/ipython/ipython.git"
project_remote_name = "origin"
my_fork_remote = "git@github.com:damianavila/ipython.git"
my_fork_remote_name = "damianavila"
- Clone the project and connect the local repo with my Github fork:
# Get a read-only copy of the project
!git clone $project_remote
# cd into the local project folder
%cd $project_name
# Link the local repo with my Github fork
!git remote add $my_fork_remote_name $my_fork_remote
# Check remotes
!git remote -v
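Just as a rough reference (the exact formatting may vary with your git version), the output of the last command should look something like this, showing both remotes properly configured:
damianavila	git@github.com:damianavila/ipython.git (fetch)
damianavila	git@github.com:damianavila/ipython.git (push)
origin	git://github.com/ipython/ipython.git (fetch)
origin	git://github.com/ipython/ipython.git (push)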
NOTE: A lot of git workflows use `origin` to point to our fork and `upstream` to point to the project repo. But I do not like that configuration. It seems more natural to me to clone the project repo (the `origin` repo) and add a connection to my fork called `damianavila`... and the next steps take this last approach into consideration.
Short cycle
This short cycle just creates a new branch to work on, makes the needed changes in the source code, and uploads the local changes to our Github fork to finally submit a pull-request:
- Set up the `master` and `development` branch names:
master_branch = "master"
feature_branch = "doc_post_serve"
- Create a new branch from `master`:
# Make sure we are in master branch
!git checkout $master_branch
# Pull the changes from origin/master
!git pull $project_remote_name
# Start a new branch to work on
!git checkout -b $feature_branch
# Check where we are
!git status
- Make the changes you want to do:
NOTE: In this example, I will update the IPython docs about some details of using the IPython slides and the `post-serve` post-processor (`IPython.nbconvert`).
# List the file structure to find the needed files
%ls
%load docs/source/interactive/nbconvert.rst
# After executing %load, a new cell containing the source code will be added.
# Be sure to add the next line (with the proper path) to overwrite the file
# with your changes.
#
# %%writefile docs/source/interactive/nbconvert.rst
%%writefile docs/source/interactive/nbconvert.rst
.. _nbconvert:
Converting notebooks to other formats
=====================================
Newly added in the 1.0 release of IPython is the ``nbconvert`` tool, which
allows you to convert an ``.ipynb`` notebook document file into various static
formats.
Currently, ``nbconvert`` is provided as a command line tool, run as a script
using IPython. A direct export capability from within the
IPython Notebook web app is planned.
The command-line syntax to run the ``nbconvert`` script is::
$ ipython nbconvert --to FORMAT notebook.ipynb
This will convert the IPython document file ``notebook.ipynb`` into the output
format given by the ``FORMAT`` string.
The default output format is html, for which the ``--to`` argument may be
omitted::
$ ipython nbconvert notebook.ipynb
IPython provides a few templates for some output formats, and these can be
specified via an additional ``--template`` argument.
The currently supported export formats are:
* ``--to html``
- ``--template full`` (default)
A full static HTML render of the notebook.
This looks very similar to the interactive view.
- ``--template basic``
Simplified HTML, useful for embedding in webpages, blogs, etc.
This excludes HTML headers.
* ``--to latex``
Latex export. This generates ``NOTEBOOK_NAME.tex`` file,
ready for export. You can automatically run latex on it to generate a PDF
by adding ``--post PDF``.
- ``--template article`` (default)
Latex article, derived from Sphinx's howto template.
- ``--template book``
Latex book, derived from Sphinx's manual template.
- ``--template basic``
Very basic latex output - mainly meant as a starting point for custom templates.
* ``--to slides``
This generates a Reveal.js HTML slideshow.
It must be served by an HTTP server. The easiest way to do this is adding
``--post serve`` on the command-line. The ``--post serve`` post-processor
proxies Reveal.js requests to a CDN if no local Reveal.js library is present.
For low connectivity environments, just place the Reveal.js library in the
same directory where your_talk.slides.html is located or point to another
directory using the ``--reveal-prefix`` alias.
* ``--to markdown``
Simple markdown output. Markdown cells are unaffected,
and code cells are placed in triple-backtick (```````) blocks.
* ``--to rst``
Basic reStructuredText output. Useful as a starting point for embedding notebooks
in Sphinx docs.
* ``--to python``
Convert a notebook to an executable Python script.
This is the simplest way to get a Python script out of a notebook.
If there were any magics in the notebook, this may only be executable from
an IPython session.
.. note::
nbconvert uses pandoc_ to convert between various markup languages,
so pandoc is a dependency of most nbconvert transforms,
excluding Markdown and Python.
.. _pandoc: http://johnmacfarlane.net/pandoc/
The output file created by ``nbconvert`` will have the same base name as
the notebook and will be placed in the current working directory. Any
supporting files (graphics, etc) will be placed in a new directory with the
same base name as the notebook, suffixed with ``_files``::
$ ipython nbconvert notebook.ipynb
$ ls
notebook.ipynb notebook.html notebook_files/
For simple single-file output, such as html, markdown, etc.,
the output may be sent to standard output with::
$ ipython nbconvert --to markdown notebook.ipynb --stdout
Multiple notebooks can be specified from the command line::
$ ipython nbconvert notebook*.ipynb
$ ipython nbconvert notebook1.ipynb notebook2.ipynb
or via a list in a configuration file, say ``mycfg.py``, containing the text::
c = get_config()
c.NbConvertApp.notebooks = ["notebook1.ipynb", "notebook2.ipynb"]
and using the command::
$ ipython nbconvert --config mycfg.py
.. _notebook_format:
LaTeX citations
---------------
``nbconvert`` now has support for LaTeX citations. With this capability you
can:
* Manage citations using BibTeX.
* Cite those citations in Markdown cells using HTML data attributes.
* Have ``nbconvert`` generate proper LaTeX citations and run BibTeX.
For an example of how this works, please see the citations example in
the nbconvert-examples_ repository.
.. _nbconvert-examples: https://github.com/ipython/nbconvert-examples
Notebook JSON file format
-------------------------
Notebook documents are JSON files with an ``.ipynb`` extension, formatted
as legibly as possible with minimal extra indentation and cell content broken
across lines to make them reasonably friendly to use in version-control
workflows. You should be very careful if you ever manually edit this JSON
data, as it is extremely easy to corrupt its internal structure and make the
file impossible to load. In general, you should consider the notebook as a
file meant only to be edited by the IPython Notebook app itself, not for
hand-editing.
.. note::
Binary data such as figures are also saved directly in the JSON file.
This provides convenient single-file portability, but means that the
files can be large; a ``diff`` of binary data is also not very
meaningful. Since the binary blobs are encoded in a single line, they
affect only one line of the ``diff`` output, but they are typically very
long lines. You can use the ``Cell | All Output | Clear`` menu option to
remove all output from a notebook prior to committing it to version
control, if this is a concern.
The notebook server can also generate a pure Python version of your notebook,
using the ``File | Download as`` menu option. The resulting ``.py`` file will
contain all the code cells from your notebook verbatim, and all Markdown cells
prepended with a comment marker. The separation between code and Markdown
cells is indicated with special comments and there is a header indicating the
format version. All output is removed when exporting to Python.
As an example, consider a simple notebook called ``simple.ipynb`` which
contains one Markdown cell, with the content ``The simplest notebook.``, one
code input cell with the content ``print "Hello, IPython!"``, and the
corresponding output.
The contents of the notebook document ``simple.ipynb`` is the following JSON
container::
{
"metadata": {
"name": "simple"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "The simplest notebook."
},
{
"cell_type": "code",
"collapsed": false,
"input": "print \"Hello, IPython\"",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "Hello, IPython\n"
}
],
"prompt_number": 1
}
],
"metadata": {}
}
]
}
The corresponding Python script is::
# -*- coding: utf-8 -*-
# <nbformat>3.0</nbformat>
# <markdowncell>
# The simplest notebook.
# <codecell>
print "Hello, IPython"
Note that indeed the output of the code cell, which is present in the JSON
container, has been removed in the ``.py`` script.
- Check the status and diff of your modifications:
# Check status
!git status
# See the diff
!git diff
- Add the changes and commit them:
# Add the modified files to the stage
!git add .
# And do your commit
!git commit -am "Added --post-serve explanation into the nbconvert docs."
- Finally, push your local development branch to your Github fork:
# Push updates from your local branch to your github branch
!git push $my_fork_remote_name $feature_branch
NOTE: The merging of your Github development branch into the master is done via pull-request on the Github website. For reference, you can see the proposed PR here: https://github.com/ipython/ipython/pull/4751
As you can see, this workflow is very simple... and with the aid of this ipytmpl it is easier than before (before = doing the same but in your traditional console).
You set up the environment, fill the variables to use a posteriori, and you only have to be concerned about the changes you want to introduce into (or remove from) the source code. All the other steps, all those git calls, are predetermined and will be executed as you advance through the workflow...
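In fact, if you wanted to push the automation a bit further (just a hypothetical sketch, not part of the ipytmpl I actually use, and the helper names are made up), the predetermined git calls could even be wrapped in a couple of Python functions, relying on the fact that IPython also expands $ variables from the local scope inside ! commands:
def start_feature(feature_branch, master_branch="master",
                  project_remote_name="origin"):
    """Hypothetical helper: create a fresh development branch from an updated master."""
    !git checkout $master_branch
    !git pull $project_remote_name
    !git checkout -b $feature_branch

def publish_feature(feature_branch, message,
                    my_fork_remote_name="damianavila"):
    """Hypothetical helper: commit the current changes and push the branch to the fork."""
    !git commit -am "$message"
    !git push $my_fork_remote_name $feature_branch
Keeping the calls as separate cells, as above, has the advantage that you can inspect each step before running the next one.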
After making the PR at the Github website, you will receive some feedback, and if you have to modify something, just start the short cycle again... Sometimes you will need more... I mean, because you are working on a community project, if somebody changes the same file as you, there will be some conflicts at the merge step, so it will be necessary to rebase the "thing". But this is the central idea of the second Extended cycle, which I will describe in the second part of this post.
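Just to anticipate the flavor of it (this is only a generic git sketch, not the actual recipe of the Extended cycle I will cover in the next post), updating your development branch on top of the latest master usually involves something like:
# Fetch the latest changes from the project repo
!git fetch $project_remote_name
# Replay your development branch on top of the updated master,
# resolving the conflicts if they show up
!git rebase $project_remote_name/$master_branch $feature_branch
# After a rebase, the branch in your fork needs a forced push
!git push -f $my_fork_remote_name $feature_branch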
As always, I am waiting for your comments and critiques!
OK, too long... I hope you did not get bored!
Have a nice New Year! And I see you in 2014 ;-)
Cheers.
Damián
Btw, don't forget this blog post is an ipynb file itself! So, you can download it from the "Source" link at the top of the post if you want to play with it ;-)