# Welcome

Thank you to HelmholtzAI and LUMI AI Factory!
# Practicalities

## Material

- Slides are here: [https://otaub.github.io/markdownslides/propulate_intro](https://otaub.github.io/markdownslides/propulate_intro)
- Exercises are here: [https://github.com/otaub/Propulate_Tutorial](https://github.com/otaub/Propulate_Tutorial)
- Reservation LUMI: HPO-tutorial
- Reservation HAICORE: haicon / haicon-gpu8
## Schedule

- 20 minutes presentation
- 30 minutes hands-on
- 10 minutes presentation
- 30 minutes hands-on
# Background 
## What is Propulate?

- A tool for hyperparameter search
- designed for HPC environments
- using MPI for scaling and communication
  - no Ray or Dask
  - no SQL
- decentralized
  - avoiding communication bottlenecks or load imbalances
- lazily synchronized
  - avoiding idle workers
## Optimization

- population based
- distributed over many workers
- evaluate individuals to inform future selection
- algorithms to propose new candidates (the general pattern is sketched below):
  - evolutionary search
  - CMA-ES
  - ...
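The following is a deliberately simplified sketch of this propose / evaluate / select loop. It is not Propulate's actual implementation; all functions and values in it are illustrative placeholders.

```python
# Simplified population-based optimization loop (illustrative only).
import random

rng = random.Random(0)


def propose(population):
    # Mutate the best individual so far, or sample randomly at the start.
    if not population:
        return {"x": rng.uniform(-5, 5)}
    best, _ = min(population, key=lambda ind: ind[1])
    return {"x": best["x"] + rng.gauss(0, 0.5)}


def evaluate(candidate):
    # Stand-in for an expensive loss function (e.g. training a network).
    return candidate["x"] ** 2


population = []  # list of (params, loss) pairs evaluated so far
for generation in range(50):
    candidate = propose(population)       # propose a new candidate
    loss = evaluate(candidate)            # evaluate it
    population.append((candidate, loss))  # record the result
    population = sorted(population, key=lambda ind: ind[1])[:10]  # keep the most promising

print(min(population, key=lambda ind: ind[1]))
```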
## Optimization 
## Communication 
# Outlook

- Next release: better checkpointing
- Bayesian search
- convenience
  - log sampling
# Toy Example

```python
def loss_fn(params):
    return params['x']**2 + params['y']**2
```

```python
propagator = get_default_propagator(pop_size, limits, rng)
propulator = Propulator(loss_fn, propagator, rng, generations, ...)
propulator.propulate()
```
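For reference, here is a minimal end-to-end sketch of the toy example, assuming Propulate's documented top-level API (`Propulator`, `get_default_propagator`). The bounds, population size, and generation count are illustrative choices, and keyword names may differ slightly between Propulate versions.

```python
# Minimal end-to-end sketch of the toy example above; bounds, population size,
# and generation count are illustrative, not values from the tutorial.
import random

from propulate import Propulator
from propulate.utils import get_default_propagator


def loss_fn(params):
    # Sphere function: minimum of 0 at x = y = 0.
    return params["x"] ** 2 + params["y"] ** 2


limits = {"x": (-5.0, 5.0), "y": (-5.0, 5.0)}  # continuous bounds per parameter
rng = random.Random(42)

propagator = get_default_propagator(pop_size=8, limits=limits, rng=rng)
propulator = Propulator(loss_fn, propagator, rng=rng, generations=100)
propulator.propulate()
propulator.summarize()  # report the best individuals found
```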
## Running on LUMI and HAICORE

Job scripts are given for each exercise:

```bash
sbatch run_<exercise>.sh
```
# Hands On
# Scale Up 
# NAS

```python
limits = {
    "num_layers": (2, 10),  # int range
    "activation": ("relu", "sigmoid", "tanh"),  # categorical
    "lr": (0.01, 0.0001),  # float range
    "d_hidden": (2, 128),  # int range
    "batch_size": ("1", "2", "4", "8", "16", "32", "64", "128"),  # categorical (as strings)
}
```
## Loss function

- extract parameters from the `params` argument
- initialize the neural network
- set up training
- run training
- evaluate the model (see the sketch below)
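A hedged sketch of such a loss function, assuming PyTorch and the `limits` dict from the previous slide; the synthetic data, input size, and class count are placeholders for the exercise's actual dataset.

```python
# Sketch of a NAS loss function; random data stands in for the real dataset.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

ACTIVATIONS = {"relu": nn.ReLU, "sigmoid": nn.Sigmoid, "tanh": nn.Tanh}


def loss_fn(params):
    # 1. Extract hyperparameters from the params argument.
    num_layers = params["num_layers"]
    activation = ACTIVATIONS[params["activation"]]
    lr = params["lr"]
    d_hidden = params["d_hidden"]
    batch_size = int(params["batch_size"])  # categorical values arrive as strings

    # 2. Initialize the neural network (simple MLP; 32 features, 10 classes assumed).
    layers = [nn.Linear(32, d_hidden), activation()]
    for _ in range(num_layers - 1):
        layers += [nn.Linear(d_hidden, d_hidden), activation()]
    layers.append(nn.Linear(d_hidden, 10))
    model = nn.Sequential(*layers)

    # 3. Set up training (placeholder random data instead of the real dataset).
    x = torch.randn(512, 32)
    y = torch.randint(0, 10, (512,))
    loader = DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    # 4. Run training for a few epochs.
    model.train()
    for _ in range(3):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()

    # 5. Evaluate the model and return a scalar for Propulate to minimize.
    model.eval()
    with torch.no_grad():
        return criterion(model(x), y).item()
```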
# Propagator

- try out the CMA-ES propagator

```python
from propulate.propagators import ActiveCMA, CMAPropagator

adapter = ActiveCMA()
propagator = CMAPropagator(adapter, limits, rng=rng)
```
# Worker Parallelization

- for expensive evaluations it may make sense to parallelize a single evaluation over several MPI ranks

```python
def parallel_loss_fn(params, comm):
    # Each rank contributes "its" squared parameter; allreduce sums the
    # contributions over the worker's sub-communicator.
    return comm.allreduce(list(params.values())[comm.rank] ** 2)
```

```python
from propulate import Islands

islands = Islands(
    ...,  # other arguments as usual
    ranks_per_worker=2,
)
```
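With `ranks_per_worker=2`, each worker spans two MPI ranks, and the worker's sub-communicator is passed to the loss function as its second argument (the `comm` used by `parallel_loss_fn` above).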
# DDP

- wrap the dataloader sampler in `distributed` (i.e. use `torch.utils.data.distributed.DistributedSampler`)
- initialize the process group:
  - set the environment variables `MASTER_ADDR` and `MASTER_PORT`
  - call `torch.distributed.init_process_group`

A sketch of this setup is shown below.
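A hedged sketch of these steps, assuming PyTorch DDP and the MPI sub-communicator `comm` that a worker-parallel loss function receives; the address, port, and backend choices are illustrative only.

```python
# DDP setup inside a worker-parallel loss function (illustrative values).
import os

import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def setup_ddp(comm, model, dataset, batch_size):
    # Set the rendezvous environment variables (all ranks must agree on them).
    os.environ.setdefault("MASTER_ADDR", "localhost")  # illustrative; use the worker's head node
    os.environ.setdefault("MASTER_PORT", "29500")      # illustrative free port

    # Initialize the process group from the worker's sub-communicator.
    dist.init_process_group(
        backend="gloo",  # or "nccl" on GPU nodes
        rank=comm.rank,
        world_size=comm.size,
    )

    # Use a DistributedSampler so each rank sees a distinct shard of the data.
    sampler = DistributedSampler(dataset, num_replicas=comm.size, rank=comm.rank)
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)

    # Wrap the model so gradients are averaged across ranks during training.
    ddp_model = DistributedDataParallel(model)
    return ddp_model, loader
```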
# Hands On