# Hyperparameter Search with Propulate
## Material

- Slides are here: [https://otaub.github.io/markdownslides/propulate_demo](https://otaub.github.io/markdownslides/propulate_demo)
- Demo code is here: [https://github.com/otaub/Propulate_Demo](https://github.com/otaub/Propulate_Demo)
- Software is here: [https://github.com/Helmholtz-AI-Energy/propulate](https://github.com/Helmholtz-AI-Energy/propulate)
# Motivation

- Hyperparameter Search
  - Explore the space by evaluating many candidates
    - Grid Search?
  - Evaluate as few candidates as possible
    - Random Search?
  - Capitalize on already evaluated candidates to guide the search and to propose future candidates
    - "Intelligent" search algorithms
    - Parallel search algorithms
## What is Propulate?

- A tool for parallel hyperparameter search
- Designed for HPC environments
  - One optimization job that performs many evaluations, not one job per evaluation
  - Not elastic
- Using MPI for scaling and communication
- Decentralized
  - Avoiding communication bottlenecks and load imbalances
- Lazily synchronized
  - Avoiding idle workers
## Population Based Optimization 
## Asynchronous Communication for Parallel Search 
## Sampling Algorithm Example: Genetic Crossover 
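As a generic illustration of how a genetic propagator can combine two parents (a sketch, not Propulate's actual implementation), uniform crossover over hyperparameter dictionaries might look like this:

```python
import random

def uniform_crossover(parent_a, parent_b, rng):
    # Each hyperparameter of the child is inherited from one of the two parents at random.
    return {key: parent_a[key] if rng.random() < 0.5 else parent_b[key] for key in parent_a}

rng = random.Random(42)
parent_a = {"lr": 1e-3, "hidden": 64}
parent_b = {"lr": 1e-2, "hidden": 128}
child = uniform_crossover(parent_a, parent_b, rng)  # e.g. {"lr": 1e-3, "hidden": 128}
```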
# Setup on LUMI

```bash
git clone https://github.com/Helmholtz-AI-Energy/propulate.git
module use /appl/local/csc/modulefiles/
module load pytorch/2.5
python -m venv pvenv --system-site-packages
source pvenv/bin/activate
cd propulate
pip install -e .
```
# Toy Example

```python
import random

from propulate import Propulator
from propulate.utils import get_default_propagator

# Minimize a simple sphere function over two hyperparameters.
def loss_fn(params):
    return params['x']**2 + params['y']**2

# Search space: one (min, max) interval per hyperparameter.
limits = {'x': (-3., 3.), 'y': (-3., 3.)}
[...]
propagator = get_default_propagator(pop_size, limits, rng=rng)
propulator = Propulator(loss_fn, propagator, rng=rng, generations=generations, checkpoint_path=checkpoint_path)
propulator.propulate()
```
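Propulate distributes the evaluations over MPI ranks, so the script is launched with multiple processes (e.g. via `mpirun` or `srun` inside a batch job).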
# Islands 
## Loss function for NAS

- extract parameters from the `params` argument
- initialize the neural network
- set up training
- run training
- evaluate the model
- return the loss to Propulate (can also be a different metric, but smaller = better)
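A minimal sketch of such a loss function, assuming a small PyTorch classifier whose hidden width and learning rate come from `params`; the data loading helper `get_dataloaders` is hypothetical and stands in for your own setup:

```python
import torch
from torch import nn

def loss_fn(params):
    # Extract the hyperparameters proposed by Propulate.
    hidden = int(params["hidden"])
    lr = params["lr"]

    # Initialize the neural network from the sampled parameters.
    model = nn.Sequential(nn.Linear(28 * 28, hidden), nn.ReLU(), nn.Linear(hidden, 10))

    # Set up training.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    train_loader, val_loader = get_dataloaders()  # hypothetical helper

    # Run training.
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x.view(x.size(0), -1)), y)
        loss.backward()
        optimizer.step()

    # Evaluate the model on validation data.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in val_loader:
            pred = model(x.view(x.size(0), -1)).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()

    # Return a scalar where smaller = better, here the validation error.
    return 1.0 - correct / total
```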
# Worker Parallelization

#### For expensive evaluations it may make sense to parallelize each evaluation

```python
islands = Islands(
    ...
    ranks_per_worker=2
)
```

The loss function then has to accept an MPI communicator:

```python
def parallel_loss_fn(params, comm):
    return comm.allreduce(list(params.values())[comm.rank] ** 2)
```
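In this toy loss function, each rank of the worker's sub-communicator squares one of the two parameters, and the `allreduce` (a sum by default) combines them, so with `ranks_per_worker=2` the worker jointly computes x² + y².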
# DDP

- wrap the dataloader sampler, model, etc. in their `distributed` counterparts
- initialize the process group
  - set the environment variables `MASTER_ADDR` and `MASTER_PORT`
  - call `torch.distributed.init_process_group`
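A minimal sketch of these steps, assuming the per-worker MPI communicator from the previous slide supplies rank and world size, a single node so `localhost` can act as the rendezvous address, and the CPU `gloo` backend; `setup_ddp` and the port are illustrative choices:

```python
import os

import torch
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def setup_ddp(comm, model, dataset):
    # Environment variables used by the default env:// rendezvous.
    os.environ["MASTER_ADDR"] = "localhost"  # single-node assumption
    os.environ["MASTER_PORT"] = "29500"      # illustrative port

    # Initialize the process group; rank and world size come from the worker's MPI communicator.
    torch.distributed.init_process_group(backend="gloo", rank=comm.rank, world_size=comm.size)

    # Wrap the sampler and the model in their distributed counterparts.
    sampler = DistributedSampler(dataset, num_replicas=comm.size, rank=comm.rank)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)
    ddp_model = DistributedDataParallel(model)
    return ddp_model, loader
```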