Best Practices for Reproducible Research

Session 4

Ingmar Steiner

2017-05-24

Build Lifecycle

Build, rinse, repeat

Developing and building software is an iterative process.

1. Write code
2. Compile code
3. Test code
4. Go to 1.

Eventually, the software is ready to use (according to some specification), and can be “shipped”, i.e., released.

Round two

Inevitably, more development work needs to be done, due to:

Unexpected failures
- Bugs in the code
- Bugs in the data
- Bugs in the system
Changed specifications
etc.

A new version is developed and released.

Consumer perspective

Identify need for software
Obtain software, version 1
Use software
Discover bugs
Report bugs
Wait for version 2

Developer perspective

Release version 1
Receive bug report, etc.
Build lifecycle
1. Write code
2. Compile code
3. Test code
4. Go to 1.
Release version 2

SCM perspective

Commit new code
Tag release version 1
Create bugfix branch
1. Write test code to reproduce bug
2. Write code until test passes
Merge bugfix branch
Tag release version 2
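
The two steps on the bugfix branch (write a test that reproduces the bug, then write code until the test passes) can be sketched in Python as follows. This is only an illustration: the function `word_count` and the bug it once had (counting repeated spaces as extra words) are hypothetical and not part of the course material.

```python
import unittest


def word_count(text):
    """Hypothetical function under repair: count whitespace-separated words."""
    # Fixed behaviour: str.split() without arguments collapses runs of whitespace.
    # The (hypothetical) buggy version used text.split(" ") and counted empty strings.
    return len(text.split())


class WordCountRegressionTest(unittest.TestCase):
    """Written first, on the bugfix branch, to reproduce the reported bug."""

    def test_repeated_spaces_are_not_counted_as_words(self):
        self.assertEqual(word_count("build  rinse   repeat"), 3)

    def test_empty_input_has_no_words(self):
        self.assertEqual(word_count(""), 0)
```

Saved as a test module, this can be run with `python -m unittest`. Keeping the test after the merge guards against the same bug returning in a later version.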

Open-source software (OSS) models can blur the boundaries between these perspectives.

Build management

Specialized build tools can help developers automate this build/test/release lifecycle
They should produce reproducible builds given the same source code
SCM hooks can further automate the lifecycle (for continuous integration testing)

But it’s a good idea to manage build tools and source code separately
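
One simple way to check whether a build is reproducible is to build twice from the same source and compare checksums of the outputs. The sketch below assumes the two build outputs already exist as files; the helper names `sha256_of` and `same_output` are made up for this illustration.

```python
import hashlib


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large build outputs do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(first_path, second_path):
    """True if both files have byte-identical content."""
    return sha256_of(first_path) == sha256_of(second_path)
```

A byte-for-byte comparison is strict: outputs that embed timestamps or absolute paths will differ even when the build is otherwise deterministic.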

Software build tools

Build tools specialized for various programming languages automate or simplify the build/test/release lifecycle for those languages… or in general!

Java: Ant, Maven, Gradle
Python: SCons, Waf, PyBuilder
Ruby: Rake
JavaScript: Grunt, Gulp
C/C++: GNU Make (The Original Build Tool.™)

Build tool wishlist

Fast
Efficient
Easy to use
Cross-platform
Minimal requirements
Flexible
Automate build/test/release lifecycle (duh!)

Build Tool Examples

Build lifecycle

Remember:

Build
Test
Release

These are just tasks.

We might as well just

Do this
Do that
Do something else

Build script…?

A build script describes the tasks which must be performed to build a project.

A “build script” could just be a README file:

The user would need to manually follow the instructions.

Build shell scripts

A “build script” could also be an actual shell script:

build.sh

#!/bin/sh

do_this() { echo "doing this"; }

do_that() { echo "doing that"; }

do_something_else() { echo "doing something else"; }

# main
do_this
do_that
do_something_else

Build script automation

“Real” build scripts can be parsed by build tools for automatic build execution.
This requires a specific format/syntax (depending on the tool)

GNU Make

The GNU implementation of Make
Make first released in 1976 (GNU Make followed in the late 1980s)
Still widely used
Custom build script language
Builds anything (traditionally C/C++) via shell commands

`Makefile`

do_this:
    @echo 'doing this'

do_that:
    @echo 'doing that'

do_something_else:
    @echo 'doing something else'

Makefile basics

GNU Make build scripts (Makefiles) define a number of rules (i.e., build tasks).

target: prerequisite
    recipe

Running make target executes recipe
If a prerequisite is listed, make first brings it up to date by running its own rule

Pitfall: The leading whitespace before recipe must be an actual tab!

Task dependencies

Prerequisites establish relations between rules; task dependencies can be represented as a directed acyclic graph (DAG)
If make is run without a target argument, the first target in the Makefile is built.

Makefile

do_something_else: do_that
    @echo 'doing something else'

do_that: do_this
    @echo 'doing that'

do_this:
    @echo 'doing this'
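
How a build tool might derive an execution order from such a dependency graph can be sketched in Python with a depth-first traversal. This is not how GNU Make is implemented; the function `execution_order` and the `rules` dictionary are assumptions made for this illustration, mirroring the three rules in the Makefile above.

```python
def execution_order(rules, goal):
    """Return the tasks in an order that runs every prerequisite before its dependent."""
    order = []
    visiting = set()

    def visit(task):
        if task in order:
            return  # already scheduled
        if task in visiting:
            # A cycle means the graph is not a DAG; this sketch simply gives up.
            raise ValueError("circular dependency at " + task)
        visiting.add(task)
        for prerequisite in rules.get(task, []):
            visit(prerequisite)
        visiting.discard(task)
        order.append(task)

    visit(goal)
    return order


# Same dependencies as in the Makefile: target -> list of prerequisites.
rules = {
    "do_something_else": ["do_that"],
    "do_that": ["do_this"],
    "do_this": [],
}

print(execution_order(rules, "do_something_else"))
# prints ['do_this', 'do_that', 'do_something_else']
```

Although do_something_else is listed first, its prerequisites are scheduled before it, which matches the order in which the three messages are echoed.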

Task outputs

Rules are normally used to create file targets.

somethingelse: that
    touch somethingelse

that: this
    touch that

this:
    touch this

# clean creates no file of that name, so declare it phony
.PHONY: clean
clean:
    @rm -f this that somethingelse

File modification timestamps are used to determine which rules are up-to-date.
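
The timestamp comparison can be sketched in Python as follows. This is a simplified illustration, not GNU Make's actual logic; the function name `needs_rebuild` is an assumption, and the sketch expects all prerequisites to exist already.

```python
import os


def needs_rebuild(target, prerequisites):
    """Decide whether `target` is out of date with respect to its prerequisites."""
    if not os.path.exists(target):
        return True  # nothing has been built yet
    target_time = os.path.getmtime(target)
    # Rebuild if any prerequisite was modified after the target was written.
    # os.path.getmtime raises OSError if a prerequisite is missing.
    return any(os.path.getmtime(path) > target_time for path in prerequisites)
```

With the rules above, `needs_rebuild("that", ["this"])` would be true after running `touch this`, which is why a subsequent `make somethingelse` re-runs the recipes for that and somethingelse but not for this.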

Apache Ant

First released in 2000
Runs on Java
XML-based build scripts
Builds Java, extensible to build anything

`build.xml`

<project default="something_else">
    <target name="this">
        <echo message="doing this"/>
    </target>
    <target name="that" depends="this">
        <echo message="doing that"/>
    </target>
    <target name="something_else" depends="that">
        <echo message="doing something else"/>
    </target>
</project>

Rake

First released in 2003
“Ruby port” of Make
Runs on Ruby
Builds anything via Ruby-based build scripts

`Rakefile`

task :this do
    puts "doing this"
end

task :that => :this do
    puts "doing that"
end

task :something_else => :that do
    puts "doing something else"
end

task :default => :something_else

Gradle

First released in 2007
Runs on Java
Groovy-based build scripts with custom DSL
Builds Java, C/C++, Android, anything else
Extensible via plugins

`build.gradle`

defaultTasks "something_else"

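// The << shorthand for doLast was deprecated and then removed in Gradle 5.0,
// so each task declares its action in an explicit doLast block.
task('this') {
    doLast {
        println "doing this"
    }
}

task that(dependsOn: 'this') {
    doLast {
        println "doing that"
    }
}

task something_else(dependsOn: 'that') {
    doLast {
        println "doing something else"
    }
}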

PyBuilder

First released in 2011
Inspired by Ant, Maven, Gradle
Python-based build scripts
Builds Python, anything else
Extensible via plugins

`build.py`

from pybuilder.core import task, depends

# print() is called as a function so the script runs on Python 3 as well as Python 2
@task
def this():
    print("doing this")

@task
@depends("this")
def that():
    print("doing that")

@task
@depends("that")
def something_else():
    print("doing something else")

default_task = "something_else"

Conclusion

Build tools can emulate Make via shell execution features
But leveraging their “native”, object-oriented language makes them
- more efficient
- more powerful
- cross-platform
Modular build logic (e.g., plugins) can be externalized and re-used

Build Tools for Research

Research differs from software engineering…

Typical workflow:

Get data
Convert data
Run experiments
Collect results

But these are also just tasks!

Mixing external tools and custom scripts is common

Adding tests is a good idea!

Real-world example

Bob, Kevin, and Stuart want to analyze word distribution for books from Project Gutenberg
This time, they version the build script in SCM – not the data

Tasks include:

Download text resources
Strip formatting
Convert to lower case
Count word frequencies
Generate a barplot for the 20 most frequent words

`Makefile`

plot.svg: data_words.txt
    @gnuplot -e '\
    set terminal svg;\
    set output "plot.svg";\
    set size ratio 0.5;\
    set boxwidth 0.5;\
    set style fill solid;\
    plot "data_words.txt" using 1:xtic(2) with boxes'

data_words.txt: data_lower.txt
    @perl -ne '\
    @w = split /[^a-z]+/;\
    foreach (@w) {\
      $$w{$$_}++;\
    }\
    END {\
      foreach (keys(%w)) {\
        printf "%d\t%s\n", $$w{$$_}, $$_\
      }\
    }' < data_lower.txt | sort -nr | head -n 20 > data_words.txt

data_lower.txt: data_stripped.txt
    @tr '[:upper:]' '[:lower:]' < data_stripped.txt > data_lower.txt

data_stripped.txt: data.txt
    @perl -pe 's/_(.+?)_/$1/g' < data.txt > data_stripped.txt

data.txt:
    @wget 'http://aleph.gutenberg.org/1/7/9/5/17958/17958-8.zip' -O - | funzip | recode latin1..utf8 > data.txt
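
For comparison, the lower-casing and word-counting steps can be sketched in plain Python. This is only an illustration of the same idea, not a translation the course prescribes; the function name `top_words` and its `limit` parameter are assumptions.

```python
import re
from collections import Counter


def top_words(text, limit=20):
    """Return the `limit` most frequent words in `text` as (word, count) pairs."""
    # Lower-case first, then keep runs of a-z, as the tr and perl steps do.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(limit)
```

Unlike the Makefile, this sketch combines two tasks in one function and keeps no intermediate files, so a build tool could not skip the lower-casing step when only the counting logic changes.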

Assignment

Do this with another build tool (Ant, Rake, Gradle, PyBuilder, etc.)
Leverage build script language if possible (Ruby, Groovy, Python, respectively)
Plot using any OSS framework (e.g., R, GNU Octave, matplotlib, etc.)
Write a brief README
Version all code with SCM

Up next

Emacs (guest star: Sébastien Le Maguer)

Note: Alternate location!

Questions?
