Best Practices for Reproducible Research

Session 7

Ingmar Steiner

2017-06-21

Dependencies

Concepts

  • A project can depend on external data, software libraries, etc.

  • Such dependencies can be declared in the build script and are then managed by the build tool.

  • Dependencies are resolved from repositories, and retrieved for use.

  • Dependencies can declare dependencies of their own; these become transitive dependencies of the project.
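How transitive dependencies arise can be sketched as a simple graph traversal. This is only an illustration, not how any particular build tool implements resolution: the class name, the map-based graph, and the coordinates in `main` are hypothetical, and real resolvers additionally handle version conflicts, exclusions, and repository lookups.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TransitiveDependencies {

  // Collects every dependency reachable from the given project by
  // following the declared dependencies breadth-first.
  public static Set<String> resolve(String project, Map<String, List<String>> declared) {
    Set<String> resolved = new LinkedHashSet<>();
    Deque<String> queue = new ArrayDeque<>(
        declared.getOrDefault(project, Collections.<String>emptyList()));
    while (!queue.isEmpty()) {
      String dependency = queue.removeFirst();
      // add() returns false for entries seen before, so each one is expanded only once
      if (resolved.add(dependency)) {
        queue.addAll(declared.getOrDefault(dependency, Collections.<String>emptyList()));
      }
    }
    return resolved;
  }

  public static void main(String[] args) {
    // Hypothetical coordinates, for illustration only
    Map<String, List<String>> declared = new HashMap<>();
    declared.put("my.group:my-app:0.1", Arrays.asList("my.group:my-lib:1.0"));
    declared.put("my.group:my-lib:1.0", Arrays.asList("my.group:my-util:2.0"));
    // my-util is never declared by my-app, but is pulled in through my-lib
    System.out.println(resolve("my.group:my-app:0.1", declared));
    // prints [my.group:my-lib:1.0, my.group:my-util:2.0]
  }
}
```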

Metadata

Dependencies are uniquely identified by:

  • name: required
  • group: the “organization” releasing the dependency; by convention, in reverse domain-name notation
  • version: distinguishes different revisions of the dependency
  • extension (default: jar): packaging or file format (Jar for Java classes, Zip or Tarball for generic packages, etc.)
  • classifier (optional): auxiliary artifacts (source code, documentation), as applicable

Shorthand notation: group:name:version:classifier@ext
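To make the shorthand notation concrete, here is a minimal sketch that splits such a string into its parts. The class `DependencyNotation` and its regular expression are assumptions made for illustration; the sketch requires group, name, and version, and it does not claim to match the parsing rules of any real build tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DependencyNotation {

  // group:name:version, optionally followed by :classifier and/or @ext
  private static final Pattern SHORTHAND = Pattern.compile(
      "([^:@]+):([^:@]+):([^:@]+)(?::([^:@]+))?(?:@([^:@]+))?");

  public final String group;
  public final String name;
  public final String version;
  public final String classifier; // null if absent
  public final String extension;  // "jar" if absent

  private DependencyNotation(String group, String name, String version,
                             String classifier, String extension) {
    this.group = group;
    this.name = name;
    this.version = version;
    this.classifier = classifier;
    this.extension = extension;
  }

  public static DependencyNotation parse(String notation) {
    Matcher m = SHORTHAND.matcher(notation);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a dependency notation: " + notation);
    }
    // fall back to the default extension when no @ext part is given
    String ext = m.group(5) != null ? m.group(5) : "jar";
    return new DependencyNotation(m.group(1), m.group(2), m.group(3), m.group(4), ext);
  }

  public static void main(String[] args) {
    DependencyNotation lang3 = parse("org.apache.commons:commons-lang3:3.6");
    System.out.println(lang3.name + " " + lang3.version + " " + lang3.extension);
    // prints commons-lang3 3.6 jar
  }
}
```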

Dependency types

We distinguish

  • software dependencies
  • buildscript dependencies
  • data dependencies

Software Dependencies

Java project dependencies

build.gradle

apply plugin: 'application'
repositories.jcenter()
dependencies.compile 'org.apache.commons:commons-lang3:3.6'
mainClassName = 'OSDetector'

src/main/java/OSDetector.java

public class OSDetector {
  public static void main(String[] args) {
    System.out.println(org.apache.commons.lang3.SystemUtils.OS_NAME);
  }
}

Buildscript Dependencies

buildscript block

  • Since the Gradle DSL is actually Groovy/Java, the build script can declare dependencies and use them in tasks, beyond the built-in powers of Groovy!

build.gradle

buildscript {
  repositories.jcenter()
  dependencies.classpath 'org.apache.commons:commons-lang3:3.6'
}

task osName {
  doLast {
    println org.apache.commons.lang3.SystemUtils.OS_NAME
  }
}

buildSrc build script

  • The build output of a buildSrc subproject will be put on the main buildscript classpath:

buildSrc/build.gradle

repositories.jcenter()
dependencies.runtime 'org.apache.commons:commons-lang3:3.6'

build.gradle

task osName {
  doLast {
    println org.apache.commons.lang3.SystemUtils.OS_NAME
  }
}

Example: Convert CSV to JSON

buildSrc/build.gradle

repositories.jcenter()
dependencies.compile 'com.xlson.groovycsv:groovycsv:1.0'

buildSrc/src/main/groovy/Csv2Json.groovy

import com.xlson.groovycsv.CsvParser
import groovy.json.JsonBuilder
import org.gradle.api.DefaultTask
import org.gradle.api.tasks.*

class Csv2Json extends DefaultTask {
  @InputFile
  File csvFile

  @OutputFile
  File jsonFile

  @TaskAction
  void convert() {
    // withReader closes the file once parsing is done
    def data = csvFile.withReader { reader ->
      CsvParser.parseCsv(reader).collect { row ->
        [name : row.Name,
         eyes : row.Eyes as int,
         short: row.Height ==~ /(?i)short/]
      }
    }
    def json = new JsonBuilder(data)
    jsonFile.text = json.toPrettyString()
  }
}

build.gradle

task convertCsvToJson(type: Csv2Json) {
  csvFile = file('minions.csv')
  jsonFile = file("$buildDir/minions.json")
}

Example in action

minions.csv
Name    Email               Height  Eyes
Bob     bob@minions.com     Short   2
Kevin   kevin@minions.com   tall    2
Stuart  stuart@minions.com  short   1

Example: YAML-based dynamic tasks

build.gradle

buildscript {
  repositories.jcenter()
  dependencies.classpath 'org.yaml:snakeyaml:1.18'
}

import org.yaml.snakeyaml.Yaml

task describeAll

// withReader closes the file once the YAML is loaded
def yaml = file('minions.yaml').withReader { new Yaml().load(it) }
yaml.each { minion ->
  task("describe$minion.name") {
    describeAll.dependsOn it
    doLast {
      println "$minion.eyes-eyed $minion.name is ${minion.short ? 'short' : 'tall'}."
    }
  }
}

minions.yaml

- name: Bob
  eyes: 2
  short: true
- name: Kevin
  eyes: 2
  short: false
- name: Stuart
  eyes: 1
  short: true

Repositories

Local repositories

  • flat directories
  • mounted network shares
  • mavenLocal()

Example:

repositories {
  flatDir dirs: '/Volumes/mounted network drive/repository'
}

Remote repositories (Maven)

  • Resolves from Maven Central by default, or from custom repositories declared in the POM (XML):
<repositories>
  <repository>
    <url>https://jcenter.bintray.com</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>my.group</groupId>
    <artifactId>my-name</artifactId>
    <version>0.1</version>
  </dependency>
</dependencies>

Remote repositories (Ivy)

  • Maven-like superpowers for Ant, Groovy, etc.
  • Resolves from any repository, using a flexible, pattern-based layout

Best of all worlds (Gradle)

build.gradle

repositories {
  // local Maven cache
  mavenLocal()
  // JCenter, subsumes Maven Central
  jcenter()
  // Random other hosted Maven repository
  maven {
    url 'https://oss.jfrog.org/artifactory/repo'
  }
  // Random web server with Maven-compatible file tree
  ivy {
    url 'http://group.my/random/subdirectory'
    layout 'maven'
  }
  // some local subdirectory with Jar files
  flatDir {
    dirs "$rootDir/libs"
  }
  // Custom Ivy repo for manually installed Grapes
  ivy {
    url "${System.properties['user.home']}/.groovy/grapes"
    layout 'pattern', {
      artifact '[organisation]/[module]/[type]s/[artifact]-[revision].[ext]'
      ivy '[organisation]/[module]/ivy-[revision].xml'
    }
  }
}

Data Dependencies

Download data (TIMTOWTDI)

Retrieve file from URL via

  • Groovy stream

    new File('data.zip').withOutputStream { out ->
      // withInputStream closes the connection after copying
      new URL('http://group.my/random/data.zip').withInputStream { out << it }
    }
  • Groovy process

    // wait for the download to finish before continuing
    'wget http://group.my/random/data.zip'.execute().waitForProcessOutput()
  • Gradle Exec spec

    exec {
      commandLine 'wget', 'http://group.my/random/data.zip'
    }
  • Gradle Download plugin

    plugins {
      id 'de.undercouch.download' version '3.2.0'
    }
    download {
      src 'http://group.my/random/data.zip'
      dest projectDir
    }

Declare data as dependency

  • Custom Ivy repository, configuration, dependency metadata
// base plugin
apply plugin: 'base'

// custom repository
repositories {
  ivy {
    url 'https://github.com/GITenberg'
    layout 'pattern', {
      artifact '[module]/archive/master.[ext]'
    }
  }
}

// custom configuration
configurations {
  data
}

// declare dependency for custom configuration
dependencies {
  data group: 'org.gutenberg', name: 'Warlord-of-Kor_17958', ext: 'zip', changing: true
}

// task to copy data into project directory
task getData(type: Copy) {
  from configurations.data
  into projectDir
}
Advantages
  • Caching
  • Metadata
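To see where the data dependency declared above is fetched from, the following sketch expands the artifact pattern with the values from the dependency declaration and appends the result to the repository URL. The class `IvyPattern` is hypothetical and only substitutes plain `[token]` placeholders; actual Ivy patterns support more features (such as optional parts), which are ignored here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IvyPattern {

  // Replaces each [token] placeholder in the pattern with its value
  public static String expand(String pattern, Map<String, String> tokens) {
    String result = pattern;
    for (Map.Entry<String, String> token : tokens.entrySet()) {
      result = result.replace("[" + token.getKey() + "]", token.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    // values taken from the dependency declaration in the build script
    Map<String, String> tokens = new LinkedHashMap<>();
    tokens.put("organisation", "org.gutenberg");
    tokens.put("module", "Warlord-of-Kor_17958");
    tokens.put("ext", "zip");

    String repositoryUrl = "https://github.com/GITenberg";
    String artifactPattern = "[module]/archive/master.[ext]";
    System.out.println(repositoryUrl + "/" + expand(artifactPattern, tokens));
    // prints https://github.com/GITenberg/Warlord-of-Kor_17958/archive/master.zip
  }
}
```

Since the pattern uses neither `[organisation]` nor `[revision]`, the group given in the declaration does not affect the resulting location in this sketch.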

Publishing

Publications

  • Dependencies: project “inputs”, resolved from repositories
  • Publications: project “outputs”, published to repositories

settings.gradle

rootProject.name = 'my-data'

gradle.properties

group=my.group
version=0.1

build.gradle

plugins {
  id 'distribution'
  id 'maven-publish'
}

// configure contents to distribute
distributions {
  main {
    contents {
      from 'data'
      include '*.json'
    }
  }
}

publishing {
  // where to publish
  repositories {
    maven {
      url "$buildDir/repo"
    }
  }
  // what to publish
  publications {
    data(MavenPublication) {
      artifact distZip
    }
  }
}

Publishes my.group:my-data:0.1@zip

Publishing in action

Versioning

Cross-project dependencies

Meta-project

  • Project my-fnord depends on project my-data

  • Nest both in a meta-project (e.g., via Git submodules)

settings.gradle

include 'my-fnord', 'my-data'

build.gradle

project(':my-fnord') {
  dependencies {
    data project(':my-data')
  }
}

Complexity creep!

Share via local repository

  • my-data/build.gradle

    apply plugin: 'maven-publish'

    Run gradle publishToMavenLocal

  • my-fnord/build.gradle

    repositories {
      mavenLocal()
    }
    dependencies {
      data 'my.group:my-data:0.1@zip'
    }

Manual publish/resolve cycle!

Composite build

  • my-fnord/build.gradle

    dependencies {
      data 'my.group:my-data:0.1@zip'
    }

    Run with gradle --include-build=../my-data

Agile FTW!

Next

Upcoming topics

  • End-to-end workflows
  • Build cache

Questions?