Operations Research

A site for hosting software and data repositories associated with papers appearing in the journal _Operations Research_

Replication Tutorial

This tutorial was presented at the Workshop on Replication at INFORMS Pubs on October 25, 2025. It covers the basic principles of making experiments replicable and the process of putting together a high-quality replication package. It is aimed at authors submitting papers to INFORMS journals. The basic content follows.

Table of Contents

Overview

Levels of Reproducibility

  1. Code only: Not reproducible (dependencies change)
  2. Code + dependency list: Somewhat reproducible (versions drift)
  3. Code + lock file: Good reproducibility (specific versions)
  4. Code + lock file + container: Very reproducible (includes OS)
  5. Code + Nix/Guix: Bit-for-bit reproducible (all dependencies pinned)

Version control

See more details here.

  1. Code
  2. Dependency files
  3. Container definitions (Dockerfile)
  4. Environment specs
  5. Documentation

Batch Processing

See more details here.

  1. Use scripting to automate workflows.

     #!/bin/bash
     # run_experiments.sh
    
     # Create results directory
     mkdir -p results
    
     for lr in 0.001 0.01 0.1; do
         echo "Running with lr=$lr"
        
         # Redirect output to log file
         python experiment.py --lr $lr \
             > results/experiment_lr${lr}.log 2>&1
        
         # Check exit status
         if [ $? -eq 0 ]; then
             echo "✓ Success: lr=$lr"
         else
             echo "✗ Failed: lr=$lr"
         fi
     done
    
     echo "All experiments complete!"
    
  2. Track experiments with unique IDs.

     import uuid
     from datetime import datetime
    
     # Timestamp-based
     exp_id = f"exp_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
    
     # UUID-based
     exp_id = f"exp_{uuid.uuid4().hex[:8]}"
    
     # Parameter-based
     exp_id = f"lr{lr}_bs{bs}_seed{seed}"
    
  3. Keep logs of all procedures.

     import logging
     from pathlib import Path
    
     def setup_logging(exp_dir):
         """Setup logging for experiment"""
         log_file = exp_dir / 'experiment.log'
        
         logging.basicConfig(
             level=logging.INFO,
             format='%(asctime)s - %(levelname)s - %(message)s',
             handlers=[
                 logging.FileHandler(log_file),
                 logging.StreamHandler()
             ]
         )
    
     # In experiment script
     setup_logging(exp_dir)
     logging.info(f"Starting experiment with params: {params}")
    
  4. Handle errors carefully to avoid missing data and silently failing runs.

     def safe_run_experiment(params):
         """Run experiment with error handling"""
         try:
             result = run_experiment(params)
             return {'success': True, 'result': result}
         except Exception as e:
             logging.error(f"Experiment failed: {e}")
             return {'success': False, 'error': str(e)}
    
  5. Track resource usage so that hardware requirements can be specified accurately.

     import psutil
     import time
    
     def monitor_resources(interval=60):
         """Monitor CPU and memory usage"""
         while True:
             cpu = psutil.cpu_percent(interval=1)
             mem = psutil.virtual_memory().percent
             logging.info(f"CPU: {cpu}%, Memory: {mem}%")
             time.sleep(interval)
    
     # Run in background thread
     import threading
     monitor_thread = threading.Thread(target=monitor_resources, daemon=True)
     monitor_thread.start()
    
  6. Keep results organized.

     experiments/
     ├── exp_001_20250101_120000/
     │   ├── params.json
     │   ├── metrics.json
     │   ├── model.pkl
     │   ├── logs/
     │   │   └── training.log
     │   └── plots/
     │       └── learning_curve.png
     ├── exp_002_20250101_130000/
     │   └── ...
     └── summary.csv
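
The layout above can be created programmatically so that every run is stored the same way. The following is a minimal Python sketch; the helper names (create_experiment_dir, record_results) and file names are illustrative, and parameters and metrics are assumed to be JSON-serializable.

    import csv
    import json
    from datetime import datetime
    from pathlib import Path

    def create_experiment_dir(base_dir, exp_id, params):
        """Create the per-experiment layout shown above and save the parameters."""
        exp_dir = Path(base_dir) / f"{exp_id}_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
        for sub in ("logs", "plots"):
            (exp_dir / sub).mkdir(parents=True, exist_ok=True)
        (exp_dir / "params.json").write_text(json.dumps(params, indent=2))
        return exp_dir

    def record_results(exp_dir, metrics, summary_file):
        """Save metrics for one run and append a row to the shared summary."""
        (exp_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
        write_header = not Path(summary_file).exists()
        with open(summary_file, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["exp_dir", *metrics.keys()])
            if write_header:
                writer.writeheader()
            writer.writerow({"exp_dir": exp_dir.name, **metrics})

    # Example usage
    exp_dir = create_experiment_dir("experiments", "exp_001", {"lr": 0.01, "seed": 42})
    record_results(exp_dir, {"accuracy": 0.892}, "experiments/summary.csv")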
    

Build and Dependency Management

See more details here.

The goal of build and dependency management is to allow reviewers and other users to build (if required) and install your code, and to replicate the environment in which the experiments were performed as closely as possible (same versions of the dependencies). This is typically done using language-specific tools.

  1. Always Pin Versions

    Don’t:

    requests
    numpy
    boost
    

    Do:

    requests==2.31.0
    numpy>=1.24.0,<2.0.0
    boost~=1.82.0
    

    Why: Unpinned versions lead to “works on my machine” problems as dependencies update.

  2. Use Lock Files

    • Python: requirements.txt + poetry.lock or Pipfile.lock
    • Julia: Manifest.toml (auto-generated)
    • JavaScript: package-lock.json or yarn.lock
    • R: renv.lock
    • Rust: Cargo.lock

    Rule: Commit lock files to version control for applications, optional for libraries.

  3. Separate Direct and Transitive Dependencies

    Direct dependencies (what you import):

    # pyproject.toml
    [project]
    dependencies = [
        "requests>=2.28.0",
        "pandas>=2.0.0"
    ]
    

    Transitive dependencies (dependencies of dependencies):

    # Captured in lock file automatically
    urllib3==2.0.7  # dependency of requests
    numpy==1.26.0   # dependency of pandas
    
  4. Isolate Environments Per Project

    Never:

    • Install everything globally
    • Share environments between projects
    • Use system Python/R/Node for development

    Always:

    • Use project-specific environments (venv, conda, renv, Pkg)
    • One environment per project
    • Keep environments reproducible with lock files
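
As a complement to the rules above, a replication package can check at start-up that the installed environment matches the pinned one. The Python sketch below compares installed package versions against the == pins in a requirements.txt; it is a simplified illustration that ignores extras and environment markers.

    from importlib.metadata import PackageNotFoundError, version

    def check_pinned_versions(requirements_file="requirements.txt"):
        """Warn when installed package versions differ from the '==' pins in the file."""
        with open(requirements_file) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "==" not in line:
                    continue
                name, pinned = line.split("==", 1)
                try:
                    installed = version(name)
                except PackageNotFoundError:
                    print(f"MISSING  {name} (pinned {pinned})")
                    continue
                if installed != pinned:
                    print(f"MISMATCH {name}: installed {installed}, pinned {pinned}")

    check_pinned_versions()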

Using Isolated Environments

See more details here.

The most reliable way to ensure replicability is to provide the reviewer with a way to construct an isolated environment where all dependencies can be automatically installed and the experiments run in a “clean” environment. This also makes it easy for authors to test their own replication package.

  1. Development: Language-specific tool (conda, renv, Pkg)
  2. Sharing: Docker or Jupyter Notebook
  3. Publishing: Docker + code on GitHub/Zenodo
  4. Optional: Nix for maximum reproducibility
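
For Python projects, one minimal way to build such a clean environment is with the standard venv module, as in the sketch below. This is illustrative only: it assumes a requirements.txt with pinned versions, other ecosystems would use their own tools (renv, Pkg), and a container is still needed for full OS-level isolation.

    import subprocess
    import sys
    import venv
    from pathlib import Path

    def build_clean_env(env_dir="replication-env", requirements="requirements.txt"):
        """Create an isolated virtual environment and install pinned dependencies into it."""
        venv.create(env_dir, with_pip=True)
        # Use the environment's own interpreter so packages do not leak into the system Python
        python = Path(env_dir) / ("Scripts" if sys.platform == "win32" else "bin") / "python"
        subprocess.run([str(python), "-m", "pip", "install", "-r", requirements], check=True)

    build_clean_env()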

Compiling Tables and Figures

See more details here.

Use scripted workflows (Python/R) for reproducibility. Export to LaTeX for tables, PDF for figures. Automate with Make or scripts. Never create tables/figures manually.
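
As an illustration, the Python sketch below reads a hypothetical experiments/summary.csv, exports a LaTeX table with pandas, and saves a vector PDF figure with matplotlib. The column names and output paths are placeholders, and the tables/ and figures/ directories are assumed to exist.

    import matplotlib.pyplot as plt
    import pandas as pd

    # Read the compiled results produced by the experiment scripts
    results = pd.read_csv("experiments/summary.csv")

    # Table: export directly to LaTeX so numbers are never retyped by hand
    results.to_latex("tables/results.tex", index=False, float_format="%.3f")

    # Figure: save as a vector PDF with a fixed size so formatting stays consistent
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot(results["lr"], results["accuracy"], marker="o")
    ax.set_xlabel("Learning rate")
    ax.set_ylabel("Accuracy")
    fig.tight_layout()
    fig.savefig("figures/accuracy_vs_lr.pdf")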

General Principles

  1. Separate data from presentation
    • Raw data → Processing → Tables/Figures
    • Never modify raw data files
  2. Version control everything
    • Scripts for generating tables/figures
    • Not the generated files themselves (usually)
    • Exception: Small final PDFs for paper
  3. Reproducible from raw data
    • Single command to regenerate all materials
    • Document dependencies and versions
  4. Use consistent formatting
    • Same font sizes across figures
    • Consistent color schemes
    • Matching decimal places in tables
  5. Follow journal guidelines
    • File formats (PDF, EPS, TIFF)
    • Resolution requirements (300+ DPI)
    • Color vs. grayscale
    • Maximum file sizes

Table Best Practices

Do:

Don’t:

Example good table:

\begin{table}[h]
\centering
\caption{Model performance on test set (mean ± 95\% CI)}
\label{tab:results}
\begin{tabular}{lcc}
\toprule
Model & Accuracy (\%) & F1-Score \\
\midrule
Baseline & 85.4 ± 1.2 & 0.851 ± 0.015 \\
\textbf{Model A} & \textbf{89.2 ± 0.9} & \textbf{0.887 ± 0.012} \\
Model B & 87.6 ± 1.1 & 0.872 ± 0.013 \\
\bottomrule
\end{tabular}
\end{table}

Figure Best Practices

Do:

Quality Checklist

Before submission:

Replication Package

See more details here.

The contents of the repository should be organized to make it as easy as possible for the associate editor to perform the replication. This can mean providing scripts not only to replicate the experiments in the paper, but also to compile the raw data into tables and figures. Ideally, there should be one script (which may itself call other scripts) that performs all experiments, compiles the data, and produces the tables and figures. The scripts should correspond closely to the structure of the paper, so that it is clear where to find the part of the script that generates a particular table or figure. There should also be a description of the expected output (which files should be produced, etc.) and some indicator of progress; a script that runs for hours without printing anything to the screen is not ideal.
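
A master script along these lines might look like the following Python sketch. The step names and sub-script names are purely illustrative; the point is that each step is labeled to match the paper, progress is printed while the script runs, and the expected outputs are stated at the end.

    # run_all.py -- hypothetical top-level driver
    import subprocess
    import sys

    # Steps named to mirror the structure of the paper
    STEPS = [
        ("Section 5.1 experiments", ["python", "run_section_5_1.py"]),
        ("Section 5.2 experiments", ["python", "run_section_5_2.py"]),
        ("Compile tables",          ["python", "make_tables.py"]),
        ("Compile figures",         ["python", "make_figures.py"]),
    ]

    for i, (description, cmd) in enumerate(STEPS, start=1):
        print(f"[{i}/{len(STEPS)}] {description} ...", flush=True)
        if subprocess.run(cmd).returncode != 0:
            sys.exit(f"Step failed: {description}")

    print("Done. Expected outputs: results/, tables/*.tex, figures/*.pdf")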

You may wish to have an additional README.md in any of the subdirectories to provide additional information.

Documentation

Documentation should be contained in a file called README.md with the following contents.

  1. How to reproduce the environment
  2. How to run experiments
  3. Expected outputs
  4. System requirements and dependencies
    • Exact dependency versions (lock files)
    • Runtime version (Python 3.9.7, not just 3.9)
    • OS version (if using containers)
    • Hardware requirements (GPU, memory)
    • Random seeds
     # README.md
    
     ## System Requirements
     - Python 3.9+
     - GCC 9+ (for C extensions)
     - CUDA 11.8 (for GPU support)
     - libpq-dev (PostgreSQL client)
    
     ## Installation
     ...
    

Examples

Common Pitfalls