Contributing to VoxelOps
========================

First off, thank you for considering contributing to VoxelOps! We welcome contributions of all kinds, from bug fixes to new features. This guide focuses on how to add a new "runner" to the project.

What is a Runner?
-----------------

In VoxelOps, a "runner" is a Python function that wraps a command-line tool, typically a Docker-based neuroimaging tool. The goal is to provide a simple, consistent Python interface for these tools.

A runner is composed of three main parts:

1. **The Runner Function**: This is the main function that users will call. It takes an ``inputs`` object and an optional ``config`` object, builds and executes a command, and returns a dictionary of results. (e.g., ``src/voxelops/runners/qsiprep.py``)
2. **The Schemas**: These are dataclasses that define the inputs, default parameters, and expected outputs for the runner. They provide type hinting and validation. (e.g., ``src/voxelops/schemas/qsiprep.py``)
3. **Tests**: Each runner should have corresponding tests to ensure it works correctly. (e.g., ``tests/test_runners_qsiprep.py``)

Step-by-Step Guide to Adding a New Runner
-----------------------------------------

Let's say we want to add a new runner for a tool called ``mytool``.

1. Create the Schema File
~~~~~~~~~~~~~~~~~~~~~~~~~

First, create a new file in ``src/voxelops/schemas/`` named ``mytool.py``. In this file, you'll define three dataclasses:

- ``MyToolInputs``: Required inputs for your tool, like ``bids_dir`` or ``participant``.
- ``MyToolDefaults``: Default parameters for the tool, like the Docker image name or the number of processors.
- ``MyToolOutputs``: The expected outputs of the tool, like file paths.

Here's an example for ``src/voxelops/schemas/mytool.py``:

.. code-block:: python

    from dataclasses import dataclass
    from pathlib import Path


    @dataclass
    class MyToolInputs:
        bids_dir: Path
        participant: str
        output_dir: Path | None = None
        work_dir: Path | None = None


    @dataclass
    class MyToolDefaults:
        docker_image: str = "myorg/mytool:latest"
        nprocs: int = 2


    @dataclass
    class MyToolOutputs:
        output_file: Path

        @classmethod
        def from_inputs(cls, inputs: MyToolInputs, output_dir: Path) -> "MyToolOutputs":
            # Derive the expected output path from the participant label
            return cls(
                output_file=output_dir / f"sub-{inputs.participant}" / "output.txt"
            )

2. Create the Runner Function File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Next, create the main runner file in ``src/voxelops/runners/``, also named ``mytool.py``. This file will contain the ``run_mytool`` function.

This function should:

- Accept ``inputs`` (``MyToolInputs``) and optional ``config`` (``MyToolDefaults``).
- Use helpers from ``voxelops.runners._base`` to validate inputs.
- Construct the full ``docker run`` command as a list of strings.
- Call ``run_docker`` from the base module to execute the command.
- Return the resulting execution dictionary, augmented with inputs, config, and expected outputs.

Here is an example for ``src/voxelops/runners/mytool.py``:

.. code-block:: python

    import os
    from pathlib import Path
    from typing import Dict, Optional, Any

    from voxelops.runners._base import (
        run_docker,
        validate_input_dir,
        validate_participant,
    )
    from voxelops.schemas.mytool import (
        MyToolInputs,
        MyToolOutputs,
        MyToolDefaults,
    )


    def run_mytool(
        inputs: MyToolInputs, config: Optional[MyToolDefaults] = None, **overrides
    ) -> Dict[str, Any]:
        # Fall back to the defaults, then apply any keyword overrides
        config = config or MyToolDefaults()
        for key, value in overrides.items():
            if hasattr(config, key):
                setattr(config, key, value)

        # Validate inputs before doing any work
        validate_input_dir(inputs.bids_dir, "BIDS")
        validate_participant(inputs.bids_dir, inputs.participant)

        # Resolve output and work directories, creating them if needed
        output_dir = inputs.output_dir or (inputs.bids_dir.parent / "derivatives")
        work_dir = inputs.work_dir or (output_dir.parent / "work" / "mytool")
        output_dir.mkdir(parents=True, exist_ok=True)
        work_dir.mkdir(parents=True, exist_ok=True)

        expected_outputs = MyToolOutputs.from_inputs(inputs, output_dir)

        # Run the container as the current user so outputs are not owned by root
        uid = os.getuid()
        gid = os.getgid()

        cmd = [
            "docker", "run", "-ti", "--rm",
            "--user", f"{uid}:{gid}",
            "-v", f"{inputs.bids_dir}:/data:ro",
            "-v", f"{output_dir}:/out",
            "-v", f"{work_dir}:/work",
            config.docker_image,
            "/data", "/out", "participant",
            "--participant-label", inputs.participant,
            "--nprocs", str(config.nprocs),
        ]

        log_dir = output_dir.parent / "logs"
        result = run_docker(
            cmd=cmd,
            tool_name="mytool",
            participant=inputs.participant,
            log_dir=log_dir,
        )

        # Augment the execution record with inputs, config, and expected outputs
        result["inputs"] = inputs
        result["config"] = config
        result["expected_outputs"] = expected_outputs
        return result

3. Add the Runner to the ``__init__.py``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Make your new runner easily importable by adding it to ``src/voxelops/runners/__init__.py``:

.. code-block:: python

    # src/voxelops/runners/__init__.py
    ...
    from .mytool import run_mytool
    ...

And also to the main ``__init__.py`` in ``src/voxelops/__init__.py``:

.. code-block:: python

    # src/voxelops/__init__.py
    ...
    from .runners import (
        ...
        run_mytool,
    )
    ...
    __all__ = [
        ...
        "run_mytool",
    ]

4. Write Tests
~~~~~~~~~~~~~~

Next, add tests for your new runner.
Create a new file ``tests/test_runners_mytool.py``. You should at least test:

- That the runner function runs without errors (you can mock the ``subprocess.run`` call).
- That the Docker command is built correctly.
- That input validation works as expected.

Refer to existing tests like ``tests/test_runners_qsiprep.py`` for examples.

5. Add Validation
~~~~~~~~~~~~~~~~~

VoxelOps includes a validation framework to ensure data quality. You should create a validator for your new procedure.

Create ``src/voxelops/validation/validators/mytool.py``:

.. code-block:: python

    """MyTool validator with pre and post validation rules."""

    from voxelops.validation.rules.common import (
        DirectoryExistsRule,
        GlobFilesExistRule,
        OutputDirectoryExistsRule,
        ParticipantExistsRule,
    )
    from voxelops.validation.validators.base import Validator


    class MyToolValidator(Validator):
        """Validator for MyTool procedure."""

        procedure_name = "mytool"

        pre_rules = [
            # Validate inputs before execution
            DirectoryExistsRule("bids_dir", "BIDS directory"),
            ParticipantExistsRule(),
            GlobFilesExistRule(
                base_dir_attr="bids_dir",
                pattern="**/anat/*_T1w.nii.gz",
                min_count=1,
                file_type="T1w images",
                participant_level=True,  # Search in sub-{participant} subdirectory
            ),
        ]

        post_rules = [
            # Validate outputs after execution
            OutputDirectoryExistsRule("output_dir", "Output directory"),
            GlobFilesExistRule(
                base_dir_attr="output_dir",
                pattern="**/*.nii.gz",
                min_count=1,
                file_type="Output images",
                phase="post",
                participant_level=False,  # output_dir is already participant-specific
            ),
        ]

Then:

1. Export it in ``src/voxelops/validation/validators/__init__.py``:

   .. code-block:: python

       from .mytool import MyToolValidator

       __all__ = [
           ...,
           "MyToolValidator",
       ]

2. Register it in ``src/voxelops/procedures/orchestrator.py``:

   .. code-block:: python

       from voxelops.validation.validators import MyToolValidator

       VALIDATORS = {
           ...,
           "mytool": MyToolValidator(),
       }

3. Add tests in ``tests/validation/test_validators_mytool.py``.

See the **Validation Framework** documentation for detailed guidance.

6. Update Documentation
~~~~~~~~~~~~~~~~~~~~~~~

If you've added a new runner, add it to the list of available procedures in ``docs/index.rst`` and create a new ``.rst`` file for your runner in the ``docs/source/`` folder.

Final Words
-----------

Once you've followed these steps, open a pull request on GitHub. We'll review your contribution and work with you to get it merged.

For more information on validation, see :doc:`validation`.

Thank you for helping us make VoxelOps better!
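As a footnote to step 4, here is a rough, self-contained sketch of what a mocked runner test might look like. It deliberately does not import VoxelOps: ``build_cmd`` and ``run_tool`` are hypothetical stand-ins for a runner, and the real ``run_docker`` helper may invoke Docker differently, so check ``tests/test_runners_qsiprep.py`` for the project's actual patterns. The sketch only shows how ``unittest.mock.patch`` can replace ``subprocess.run`` so the command can be inspected without Docker installed.

```python
import subprocess
from unittest import mock


def build_cmd(image: str, participant: str, nprocs: int) -> list:
    """Hypothetical stand-in for the command-building part of a runner."""
    return [
        "docker", "run", "--rm",
        image,
        "/data", "/out", "participant",
        "--participant-label", participant,
        "--nprocs", str(nprocs),
    ]


def run_tool(image: str, participant: str, nprocs: int = 2) -> dict:
    """Hypothetical stand-in runner: builds the command and executes it."""
    cmd = build_cmd(image, participant, nprocs)
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return {"command": cmd, "exit_code": completed.returncode}


def test_command_is_built_and_run_once():
    # Pretend the container exited cleanly; Docker is never actually started.
    fake = subprocess.CompletedProcess(args=[], returncode=0, stdout="", stderr="")
    with mock.patch("subprocess.run", return_value=fake) as mocked_run:
        result = run_tool("myorg/mytool:latest", "01", nprocs=4)

    # The runner should call subprocess.run exactly once...
    mocked_run.assert_called_once()
    # ...with a command whose flags carry the values we passed in.
    cmd = mocked_run.call_args.args[0]
    assert cmd[:2] == ["docker", "run"]
    assert cmd[cmd.index("--participant-label") + 1] == "01"
    assert cmd[cmd.index("--nprocs") + 1] == "4"
    assert result["exit_code"] == 0
```

With pytest, a function named ``test_*`` in ``tests/test_runners_mytool.py`` is collected automatically. In a real test you would patch whichever call your runner actually goes through and assert on the volume mounts and image name as well.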