Move from interactive to batch
Analyses that cannot be considered interactive or semi-interactive, i.e. those that run for many hours or even days, are better executed as batch jobs. It is part of our future work to give users clear guidance on when to move to batch, as well as to ease the transition from interactive execution. For the latter, ideally the same code that runs interactively (and in a distributed manner) should also be able to execute as a batch job. Analysis frameworks can help here, since they offer high-level programming models that exploit data parallelism under the hood and can run on multiple backends thanks to Dask. Thus, the same RDataFrame / coffea analysis code that runs interactively on SWAN could be wrapped as a batch job. Such a batch job would run not only the Dask workers (as in the interactive case) but also the main program of the analysis and the Dask scheduler, i.e. the whole execution would happen on the batch side.
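As a rough illustration (a sketch, not an official SWAN or ROOT recipe), the snippet below shows how a distributed RDataFrame analysis could be written so that the same script serves both modes: interactively it attaches to an already running Dask scheduler, while as a batch job it starts its own scheduler and workers in-process, so the whole execution happens on the batch side. The SCHEDULER_ADDRESS variable, the file/tree/branch names, and the exact keyword for passing the Dask client to RDataFrame (which has changed across ROOT versions) are assumptions.

    import os
    import ROOT
    from dask.distributed import Client, LocalCluster

    def make_dask_client():
        """Attach to an external scheduler if one is advertised (interactive case),
        otherwise start the scheduler and workers inside this process (batch case)."""
        address = os.environ.get("SCHEDULER_ADDRESS")  # hypothetical variable name
        if address:
            return Client(address)  # interactive: connect to the cluster provided by the service
        cluster = LocalCluster(n_workers=4, threads_per_worker=1)
        return Client(cluster)      # batch: scheduler + workers live inside the job

    if __name__ == "__main__":
        client = make_dask_client()
        # Experimental distributed RDataFrame with the Dask backend (ROOT >= 6.24);
        # the keyword used to pass the client may differ in newer ROOT releases.
        RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame
        df = RDataFrame("Events", "data.root", daskclient=client)  # assumed tree and file names
        h = df.Histo1D(("pt", "Muon pT", 100, 0.0, 200.0), "Muon_pt")  # assumed branch name
        h.GetValue()  # triggers the distributed computation

With this structure, the batch submission only needs to run the script itself: no separate Dask deployment is required, since the job creates and tears down its own cluster.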