Best Practices for Reproducible Research (SS 2017)

Description

This project seminar will introduce students to techniques and workflows for streamlining research projects in ways that increase productivity, enhance collaboration, and promote portability and reproducibility.

At its core, the seminar transfers best practices from software engineering to research problems in speech and language processing, and adapts them to challenges specific to natural language research, including corpus management, analysis pipelines, and report generation.

Over the course of the seminar, participants will learn about topics including

  • Build automation
  • Source code management (SCM)
  • Remote repositories, dependency resolution, and artifact publishing
  • Data wrangling
  • Automated testing and continuous integration
  • Literate programming and document generation

To receive credit, participants must complete regular assignments and submit a final written report.

Prerequisites

This seminar has no formal prerequisites, but experience with object-oriented programming (Java, Python, etc.), SCM (Git, etc.), LaTeX, and Linux (particularly shell interaction) will be invaluable.

Sessions

The project seminar takes place in building C7.1, room U15, on Wednesdays, 8:30 to 10:00.

2017-04-26

Slides

Assignment

Play around with JFortune on GitHub

2017-05-03

Slides

Assignment

  • Recreate the “real-world example” using any distributed SCM (except Git).
  • Then submit a short written report (PDF format) and present your experience in the next session.
  • Bonus points if you manage the process of writing the report using SCM!

2017-05-10

Canceled!

2017-05-17

Slides

2017-05-24

Slides

2017-05-31

Slides

2017-06-07

Canceled!

2017-06-14

Slides

2017-06-21

Slides

2017-06-28

Slides

Assignment

Based on https://bitbucket.org/psibre/bestpract-flaml:

  1. Develop a plugin that adds build logic to any project
  2. Resolve the specified FLAC+YAML file pair as data dependencies
  3. Extract the utterances from the YAML file to text and label files
  4. Extract the audio of each utterance from the FLAC file to WAV files
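
As a rough illustration of steps 3 and 4, the following Python sketch shows one way the per-utterance output could be derived once the YAML file has been parsed into a list of records. Everything specific here is an assumption made for illustration: the record keys (`id`, `text`, `start`, `end`), the one-line label format, and the helper names `write_utterance_files` and `sample_range` are hypothetical and not taken from the bestpract-flaml repository. Parsing the YAML and decoding the FLAC audio are left out, since both need tooling beyond the standard library.

```python
from pathlib import Path


def write_utterance_files(utterances, out_dir):
    """Write one text file and one label file per utterance record.

    Each record is assumed (hypothetically) to be a dict with the keys
    'id', 'text', 'start', and 'end', where the times are in seconds.
    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for utt in utterances:
        # Plain text file: just the utterance text.
        txt_path = out_dir / f"{utt['id']}.txt"
        txt_path.write_text(utt["text"] + "\n", encoding="utf-8")
        # Label file: hypothetical one-line format "start end text".
        lab_path = out_dir / f"{utt['id']}.lab"
        lab_path.write_text(
            f"{utt['start']:.3f} {utt['end']:.3f} {utt['text']}\n",
            encoding="utf-8",
        )
        written.extend([txt_path, lab_path])
    return written


def sample_range(start, end, sample_rate):
    """Convert start/end times in seconds to sample indices.

    The resulting (first, last) pair marks the slice of decoded audio
    samples that would be written out as the utterance's WAV file.
    """
    return round(start * sample_rate), round(end * sample_rate)
```

In an actual solution, this logic would presumably live inside the plugin's build tasks, so that the text, label, and WAV files are regenerated whenever the resolved FLAC+YAML pair changes.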

2017-07-05

Canceled!

2017-07-12

Slides

2017-07-19

2017-07-26

Canceled!