from d2l import torch as d2l
import logging
logging.basicConfig(level=logging.INFO)
from syne_tune.config_space import loguniform, randint
from syne_tune.backend.python_backend.python_backend import PythonBackend
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune import Tuner, StoppingCriterion
from syne_tune.experiments import load_experiment

19.3 Asynchronous Random Search
As we have seen in Section 19.2, we might have to wait hours or even days before random search returns a good hyperparameter configuration, because evaluating each configuration is expensive. In practice, we often have access to a pool of resources, such as multiple GPUs on the same machine or multiple machines each with a single GPU. This begs the question: how do we efficiently distribute random search?
In general, we distinguish between synchronous and asynchronous parallel hyperparameter optimization (see Figure 19.3.1). In the synchronous setting, we wait for all concurrently running trials to finish before we start the next batch. Consider configuration spaces that contain hyperparameters such as the number of filters or the number of layers of a deep neural network. Hyperparameter configurations with a larger number of layers or filters will naturally take more time to finish, and all other trials in the same batch will have to wait at synchronization points (grey area in Figure 19.3.1) before we can continue the optimization process.
In the asynchronous setting, we immediately schedule a new trial as soon as resources become available. This optimally exploits our resources, since we avoid any synchronization overhead. For random search, each new hyperparameter configuration is chosen independently of all others, and in particular without exploiting observations from any prior evaluation. This means we can trivially parallelize random search asynchronously. Doing so is not straightforward with more sophisticated methods that make decisions based on previous observations (see Section 19.5). While we need access to more resources than in the sequential setting, asynchronous random search exhibits a linear speed-up, in that a certain performance is reached \(K\) times faster if \(K\) trials can be run in parallel.
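The gap between the two schedules can be made concrete with a small simulation. The trial durations below are hypothetical, and the two helper functions are just a sketch of the scheduling logic, not part of Syne Tune: synchronous scheduling pays the cost of the slowest trial in every batch, while asynchronous scheduling hands the next trial to whichever worker frees up first.

```python
import heapq

def synchronous_makespan(durations, n_workers):
    # Trials run in batches of n_workers; each batch waits for its slowest trial
    total = 0.0
    for i in range(0, len(durations), n_workers):
        total += max(durations[i:i + n_workers])
    return total

def asynchronous_makespan(durations, n_workers):
    # Min-heap of per-worker finish times; the next trial always goes to the
    # worker that becomes free first
    workers = [0.0] * n_workers
    heapq.heapify(workers)
    for d in durations:
        t = heapq.heappop(workers)
        heapq.heappush(workers, t + d)
    return max(workers)

# Hypothetical trial durations (minutes): stragglers mixed with short trials
durations = [10, 2, 9, 3, 8, 4]
print(synchronous_makespan(durations, n_workers=2))  # 10 + 9 + 8 = 27
print(asynchronous_makespan(durations, n_workers=2))  # 19
```

With two workers, the asynchronous schedule finishes in 19 minutes instead of 27, because short trials are packed behind the stragglers instead of waiting at batch boundaries.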
In this notebook, we will look at asynchronous random search, where trials are executed in multiple Python processes on the same machine. Distributed job scheduling and execution is difficult to implement from scratch. We will use Syne Tune (Salinas et al., 2022), which provides a simple interface for asynchronous HPO. Syne Tune is designed to be run with different execution back-ends, and the interested reader is invited to study its simple APIs in order to learn more about distributed HPO.
INFO:root:AWS dependencies are not imported since dependencies are missing. You can install them with
pip install 'syne-tune[aws]'
or (for everything)
pip install 'syne-tune[extra]'
INFO:root:Ray Tune schedulers and searchers are not imported since dependencies are missing. You can install them with
pip install 'syne-tune[raytune]'
or (for everything)
pip install 'syne-tune[extra]'
19.3.1 Objective Function
First, we have to define a new objective function that reports performance back to Syne Tune via the report callback.
def hpo_objective_lenet_synetune(learning_rate, batch_size, max_epochs):
    from d2l import torch as d2l
    from syne_tune import Reporter

    model = d2l.LeNet(lr=learning_rate, num_classes=10)
    trainer = d2l.HPOTrainer(max_epochs=1, num_gpus=1)
    data = d2l.FashionMNIST(batch_size=batch_size)
    model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn)
    report = Reporter()
    for epoch in range(1, max_epochs + 1):
        if epoch == 1:
            # Initialize the state of Trainer
            trainer.fit(model=model, data=data)
        else:
            trainer.fit_epoch()
        validation_error = d2l.numpy(trainer.validation_error().cpu())
        report(epoch=epoch, validation_error=float(validation_error))

Note that the PythonBackend of Syne Tune requires dependencies to be imported inside the function definition.
19.3.2 Asynchronous Scheduler
First, we define the number of workers that evaluate trials concurrently. We also need to specify how long we want to run random search, by defining an upper limit on the total wall-clock time.
n_workers = 2  # Needs to be <= the number of available GPUs
max_wallclock_time = 5 * 60  # 5 minutes

Next, we state which metric we want to optimize and whether we want to minimize or maximize this metric. Namely, metric needs to correspond to the argument name passed to the report callback.
mode = "min"
metric = "validation_error"

We use the configuration space from our previous example. In Syne Tune, this dictionary can also be used to pass constant attributes to the training script. We make use of this feature in order to pass max_epochs. Moreover, we specify the first configuration to be evaluated in initial_config.
config_space = {
    "learning_rate": loguniform(1e-2, 1),
    "batch_size": randint(32, 256),
    "max_epochs": 10,
}
initial_config = {
    "learning_rate": 0.1,
    "batch_size": 128,
}

Next, we need to specify the back-end for job execution. Here we only consider distribution on a local machine, where parallel jobs are executed as subprocesses. However, for large-scale HPO, we could also run this on a cluster or cloud environment, where each trial consumes a full instance.
trial_backend = PythonBackend(
    tune_function=hpo_objective_lenet_synetune,
    config_space=config_space,
)

We can now create the scheduler for asynchronous random search, which is similar in behaviour to our BasicScheduler from Section 19.2.
scheduler = RandomSearch(
    config_space,
    metric=metric,
    mode=mode,
    points_to_evaluate=[initial_config],
)
INFO:syne_tune.optimizer.schedulers.fifo:max_resource_level = 10, as inferred from config_space
INFO:syne_tune.optimizer.schedulers.fifo:Master random_seed = 4170152748
Syne Tune also features a Tuner, where the main experiment loop and bookkeeping are centralized, and interactions between scheduler and back-end are mediated.
stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
tuner = Tuner(
    trial_backend=trial_backend,
    scheduler=scheduler,
    stop_criterion=stop_criterion,
    n_workers=n_workers,
    print_update_interval=int(max_wallclock_time * 0.6),
)

Let us run our distributed HPO experiment. According to our stopping criterion, it will run for about 5 minutes.
tuner.run()
INFO:syne_tune.tuner:results of trials will be saved on /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122
INFO:syne_tune.backend.local_backend:Detected 4 GPUs
INFO:syne_tune.backend.local_backend:running subprocess with command: /home/smola/d2l/d2l-neu/.venv-pytorch/bin/python3 /home/smola/d2l/d2l-neu/.venv-pytorch/lib/python3.11/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.1 --batch_size 128 --max_epochs 10 --tune_function_root /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/tune_function --tune_function_hash 1e72837c1c0e2d86cf407d494085c042 --st_checkpoint_dir /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/0/checkpoints
INFO:syne_tune.tuner:(trial 0) - scheduled config {'learning_rate': 0.1, 'batch_size': 128, 'max_epochs': 10}
INFO:syne_tune.backend.local_backend:running subprocess with command: /home/smola/d2l/d2l-neu/.venv-pytorch/bin/python3 /home/smola/d2l/d2l-neu/.venv-pytorch/lib/python3.11/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.03033640271278961 --batch_size 249 --max_epochs 10 --tune_function_root /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/tune_function --tune_function_hash 1e72837c1c0e2d86cf407d494085c042 --st_checkpoint_dir /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/1/checkpoints
INFO:syne_tune.tuner:(trial 1) - scheduled config {'learning_rate': 0.03033640271278961, 'batch_size': 249, 'max_epochs': 10}
INFO:syne_tune.tuner:Trial trial_id 1 completed.
INFO:syne_tune.backend.local_backend:running subprocess with command: /home/smola/d2l/d2l-neu/.venv-pytorch/bin/python3 /home/smola/d2l/d2l-neu/.venv-pytorch/lib/python3.11/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.042434389941759125 --batch_size 165 --max_epochs 10 --tune_function_root /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/tune_function --tune_function_hash 1e72837c1c0e2d86cf407d494085c042 --st_checkpoint_dir /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/2/checkpoints
INFO:syne_tune.tuner:(trial 2) - scheduled config {'learning_rate': 0.042434389941759125, 'batch_size': 165, 'max_epochs': 10}
INFO:syne_tune.tuner:Trial trial_id 0 completed.
INFO:syne_tune.backend.local_backend:running subprocess with command: /home/smola/d2l/d2l-neu/.venv-pytorch/bin/python3 /home/smola/d2l/d2l-neu/.venv-pytorch/lib/python3.11/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.013114173459011173 --batch_size 121 --max_epochs 10 --tune_function_root /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/tune_function --tune_function_hash 1e72837c1c0e2d86cf407d494085c042 --st_checkpoint_dir /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/3/checkpoints
INFO:syne_tune.tuner:(trial 3) - scheduled config {'learning_rate': 0.013114173459011173, 'batch_size': 121, 'max_epochs': 10}
INFO:syne_tune.tuner:Trial trial_id 2 completed.
INFO:syne_tune.backend.local_backend:running subprocess with command: /home/smola/d2l/d2l-neu/.venv-pytorch/bin/python3 /home/smola/d2l/d2l-neu/.venv-pytorch/lib/python3.11/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.03784794208219196 --batch_size 43 --max_epochs 10 --tune_function_root /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/tune_function --tune_function_hash 1e72837c1c0e2d86cf407d494085c042 --st_checkpoint_dir /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/4/checkpoints
INFO:syne_tune.tuner:(trial 4) - scheduled config {'learning_rate': 0.03784794208219196, 'batch_size': 43, 'max_epochs': 10}
INFO:syne_tune.tuner:Trial trial_id 3 completed.
INFO:syne_tune.backend.local_backend:running subprocess with command: /home/smola/d2l/d2l-neu/.venv-pytorch/bin/python3 /home/smola/d2l/d2l-neu/.venv-pytorch/lib/python3.11/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.6185299324920032 --batch_size 247 --max_epochs 10 --tune_function_root /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/tune_function --tune_function_hash 1e72837c1c0e2d86cf407d494085c042 --st_checkpoint_dir /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/5/checkpoints
INFO:syne_tune.tuner:(trial 5) - scheduled config {'learning_rate': 0.6185299324920032, 'batch_size': 247, 'max_epochs': 10}
INFO:syne_tune.tuner:Trial trial_id 5 completed.
INFO:syne_tune.backend.local_backend:running subprocess with command: /home/smola/d2l/d2l-neu/.venv-pytorch/bin/python3 /home/smola/d2l/d2l-neu/.venv-pytorch/lib/python3.11/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.02174475254575398 --batch_size 213 --max_epochs 10 --tune_function_root /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/tune_function --tune_function_hash 1e72837c1c0e2d86cf407d494085c042 --st_checkpoint_dir /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/6/checkpoints
INFO:syne_tune.tuner:(trial 6) - scheduled config {'learning_rate': 0.02174475254575398, 'batch_size': 213, 'max_epochs': 10}
INFO:syne_tune.tuner:Trial trial_id 4 completed.
INFO:syne_tune.backend.local_backend:running subprocess with command: /home/smola/d2l/d2l-neu/.venv-pytorch/bin/python3 /home/smola/d2l/d2l-neu/.venv-pytorch/lib/python3.11/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.1289829536483372 --batch_size 34 --max_epochs 10 --tune_function_root /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/tune_function --tune_function_hash 1e72837c1c0e2d86cf407d494085c042 --st_checkpoint_dir /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/7/checkpoints
INFO:syne_tune.tuner:(trial 7) - scheduled config {'learning_rate': 0.1289829536483372, 'batch_size': 34, 'max_epochs': 10}
INFO:syne_tune.tuner:tuning status (last metric is reported)
trial_id status iter learning_rate batch_size max_epochs epoch validation_error worker-time
0 Completed 10 0.100000 128 10 10.0 0.264339 49.573275
1 Completed 10 0.030336 249 10 10.0 0.898976 36.109143
2 Completed 10 0.042434 165 10 10.0 0.652981 42.715872
3 Completed 10 0.013114 121 10 10.0 0.900263 43.281537
4 Completed 10 0.037848 43 10 10.0 0.260921 76.531635
5 Completed 10 0.618530 247 10 10.0 0.191212 36.242573
6 InProgress 5 0.021745 213 10 5.0 0.899963 18.645404
7 InProgress 0 0.128983 34 10 - - -
2 trials running, 6 finished (6 until the end), 185.39s wallclock-time
INFO:syne_tune.tuner:Trial trial_id 6 completed.
INFO:syne_tune.backend.local_backend:running subprocess with command: /home/smola/d2l/d2l-neu/.venv-pytorch/bin/python3 /home/smola/d2l/d2l-neu/.venv-pytorch/lib/python3.11/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.01479449790006266 --batch_size 112 --max_epochs 10 --tune_function_root /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/tune_function --tune_function_hash 1e72837c1c0e2d86cf407d494085c042 --st_checkpoint_dir /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/8/checkpoints
INFO:syne_tune.tuner:(trial 8) - scheduled config {'learning_rate': 0.01479449790006266, 'batch_size': 112, 'max_epochs': 10}
INFO:syne_tune.tuner:Trial trial_id 8 completed.
INFO:syne_tune.backend.local_backend:running subprocess with command: /home/smola/d2l/d2l-neu/.venv-pytorch/bin/python3 /home/smola/d2l/d2l-neu/.venv-pytorch/lib/python3.11/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.6811365707724747 --batch_size 63 --max_epochs 10 --tune_function_root /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/tune_function --tune_function_hash 1e72837c1c0e2d86cf407d494085c042 --st_checkpoint_dir /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/9/checkpoints
INFO:syne_tune.tuner:(trial 9) - scheduled config {'learning_rate': 0.6811365707724747, 'batch_size': 63, 'max_epochs': 10}
INFO:syne_tune.tuner:Trial trial_id 7 completed.
INFO:syne_tune.backend.local_backend:running subprocess with command: /home/smola/d2l/d2l-neu/.venv-pytorch/bin/python3 /home/smola/d2l/d2l-neu/.venv-pytorch/lib/python3.11/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.5447622537715453 --batch_size 146 --max_epochs 10 --tune_function_root /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/tune_function --tune_function_hash 1e72837c1c0e2d86cf407d494085c042 --st_checkpoint_dir /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122/10/checkpoints
INFO:syne_tune.tuner:(trial 10) - scheduled config {'learning_rate': 0.5447622537715453, 'batch_size': 146, 'max_epochs': 10}
INFO:syne_tune.stopping_criterion:reaching max wallclock time (300), stopping there.
INFO:syne_tune.tuner:Stopping trials that may still be running.
INFO:syne_tune.tuner:Tuning finished, results of trials can be found on /home/smola/syne-tune/python-entrypoint-2026-04-24-01-52-50-122
--------------------
Resource summary (last result is reported):
trial_id status iter learning_rate batch_size max_epochs epoch validation_error worker-time
0 Completed 10 0.100000 128 10 10 0.264339 49.573275
1 Completed 10 0.030336 249 10 10 0.898976 36.109143
2 Completed 10 0.042434 165 10 10 0.652981 42.715872
3 Completed 10 0.013114 121 10 10 0.900263 43.281537
4 Completed 10 0.037848 43 10 10 0.260921 76.531635
5 Completed 10 0.618530 247 10 10 0.191212 36.242573
6 Completed 10 0.021745 213 10 10 0.899469 39.490966
7 Completed 10 0.128983 34 10 10 0.168894 83.170967
8 Completed 10 0.014794 112 10 10 0.899802 40.273930
9 InProgress 7 0.681137 63 10 7 0.214180 34.280768
10 InProgress 3 0.544762 146 10 3 0.285046 18.356503
2 trials running, 9 finished (9 until the end), 300.56s wallclock-time
validation_error: best 0.16889351606369019 for trial-id 7
--------------------
The logs of all evaluated hyperparameter configurations are stored for further analysis. At any time during the tuning job, we can easily get the results obtained so far and plot the incumbent trajectory.
d2l.set_figsize()
tuning_experiment = load_experiment(tuner.name)
tuning_experiment.plot()

19.3.3 Visualize the Asynchronous Optimization Process
Below we visualize how the learning curves of every trial (each color in the plot represents a trial) evolve during the asynchronous optimization process. At any point in time, there are as many trials running concurrently as we have workers. Once a trial finishes, we immediately start the next trial, without waiting for the other trials to finish. Idle time of workers is reduced to a minimum with asynchronous scheduling.
d2l.set_figsize([6, 2.5])
results = tuning_experiment.results
for trial_id in results.trial_id.unique():
    df = results[results["trial_id"] == trial_id]
    d2l.plt.plot(
        df["st_tuner_time"],
        df["validation_error"],
        marker="o"
    )
d2l.plt.xlabel("wall-clock time")
d2l.plt.ylabel("objective function")
Text(0, 0.5, 'objective function')
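Besides plotting raw learning curves, we can also recover the incumbent trajectory, i.e., the best validation error observed so far as a function of wall-clock time. The sketch below uses synthetic (time, error) records rather than the actual experiment; in practice, the values would come from the "st_tuner_time" and "validation_error" columns of tuning_experiment.results.

```python
# Synthetic (wall-clock time, validation_error) records, sorted by time;
# in the experiment above these would be read from tuning_experiment.results
records = [
    (10, 0.90), (20, 0.65), (30, 0.70), (40, 0.19), (50, 0.26),
]

# The incumbent at time t is the best (smallest) error reported up to t
incumbent = []
best = float("inf")
for t, err in sorted(records):
    best = min(best, err)
    incumbent.append((t, best))
print(incumbent)  # [(10, 0.9), (20, 0.65), (30, 0.65), (40, 0.19), (50, 0.19)]
```

Note that the incumbent curve is monotonically decreasing by construction, which is why the trajectories reported by HPO tools look like step functions.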
19.3.4 Summary
We can reduce the waiting time for random search substantially by distributing trials across parallel resources. In general, we distinguish between synchronous scheduling and asynchronous scheduling. Synchronous scheduling means that we sample a new batch of hyperparameter configurations once the previous batch has finished. If we have stragglers, i.e., trials that take more time to finish than others, our workers need to wait at synchronization points. Asynchronous scheduling evaluates new hyperparameter configurations as soon as resources become available, and hence ensures that all workers are busy at any point in time. While random search is easy to distribute asynchronously and does not require any change to the actual algorithm, other methods require some additional modifications.
19.3.5 Exercises
- Consider the DropoutMLP model implemented in Section 5.6, and used in Exercise 1 of Section 19.2.
  - Implement an objective function hpo_objective_dropoutmlp_synetune to be used with Syne Tune. Make sure that your function reports the validation error after every epoch.
  - Using the setup of Exercise 1 in Section 19.2, compare random search to Bayesian optimization. If you use SageMaker, feel free to use Syne Tune's benchmarking facilities in order to run experiments in parallel. Hint: Bayesian optimization is provided as syne_tune.optimizer.baselines.BayesianOptimization.
  - For this exercise, you need to run on an instance with at least 4 CPU cores. For one of the methods used above (random search, Bayesian optimization), run experiments with n_workers=1, n_workers=2, n_workers=4, and compare results (incumbent trajectories). At least for random search, you should observe linear scaling with respect to the number of workers. Hint: For robust results, you may have to average over several repetitions each.
- Advanced. The goal of this exercise is to implement a new scheduler in Syne Tune.
  - Create a virtual environment containing both the d2lbook and syne-tune sources.
  - Implement the LocalSearcher from Exercise 2 in Section 19.2 as a new searcher in Syne Tune. Hint: Read this tutorial. Alternatively, you may follow this example.
  - Compare your new LocalSearcher with RandomSearch on the DropoutMLP benchmark.