Custom software environments
In the ATS SWAN prototype, the software available in a user session can come from one of two sources:
- LCG release: a software stack maintained by the EP department at CERN, containing hundreds of packages distributed via CVMFS.
- Custom software environment (CSE): a virtual environment defined by the user, containing a specific set of Python packages.
While LCG releases are the traditional way of providing software in SWAN, CSEs are a new feature implemented in the ATS SWAN prototype. The next subsections describe how to create and use CSEs.
Creating an environment
A CSE is created by specifying the following parameters in the SWAN form:
- A Git repository URL, which must contain the package requirements for the environment. This repository may also contain notebooks or other files intended for use within the environment. Note that multiple users can base their CSEs on the same Git repository and potentially contribute new content.
- An Acc-Py release, which serves as the base environment. The user-specified packages are installed on top of this base layer.
Specifying requirements
Package requirements for a CSE must be defined using requirement files placed at the root of the Git repository. Two types of requirements file are supported:
requirements.in
: a high-level specification listing only the direct dependencies of the user's code. It follows the pip requirements file format. Example:

    matplotlib
    ipympl
    widgetsnbextension

requirements.txt
: a fully resolved list of all dependencies and their versions. After creating a CSE from a requirements.in file, the user can generate a requirements.txt from within the environment by running:

    pip freeze --local > requirements.txt

  Example:

    contourpy==1.3.1
    cycler==0.12.1
    fonttools==4.57.0
    ipympl==0.9.7
    ipywidgets==8.1.6
    jupyterlab_widgets==3.0.14
    kiwisolver==1.4.8
    matplotlib==3.10.1
    pillow==11.2.1
    pyparsing==3.2.3
    widgetsnbextension==4.0.14
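To check that a running environment still matches a pinned requirements.txt, the installed versions can be compared against the file. The following is a minimal sketch and not part of SWAN: the helper names `parse_pins` and `find_mismatches` are hypothetical, and the parser only understands plain `name==version` lines (no environment markers or extras).

```python
from __future__ import annotations

from importlib import metadata


def parse_pins(text: str) -> dict[str, str]:
    """Parse 'name==version' lines; skip blanks, comments and unpinned entries."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Map each package whose installed version differs to (pinned, installed)."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Inside a session, passing the contents of requirements.txt through `parse_pins` and then `find_mismatches` returns an empty dictionary when every pinned package is installed at the pinned version.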
Note
If both requirements.in and requirements.txt are present, the latter is used. Using requirements.txt usually speeds up environment creation.
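The precedence rule in the note above can be expressed as a short lookup. This is only an illustrative sketch of the documented behaviour, not SWAN's actual implementation; the function name `select_requirements_file` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Order encodes the documented precedence: requirements.txt wins over requirements.in.
PRECEDENCE = ("requirements.txt", "requirements.in")


def select_requirements_file(repo_root: Path) -> Path | None:
    """Return the requirements file that would be used, or None if neither exists."""
    for name in PRECEDENCE:
        candidate = repo_root / name
        if candidate.is_file():
            return candidate
    return None
```

Running this against the root of a local clone shows which of the two files takes effect before the repository is selected in the SWAN form.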
Managing repositories
During session startup, the selected Git repository is used to build the CSE. On first use, SWAN clones the repository into the user's CERNBox directory under:
$HOME/SWAN_projects/name_of_the_repository
SWAN then uses the repository's requirements to construct the environment. Once ready, the user is directed to the JupyterLab interface, where notebooks and terminals have been configured to use the packages of the environment.
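To locate the clone from a notebook, the directory can be derived from the repository URL. The sketch below assumes that the directory name is the last path segment of the URL with any `.git` suffix removed (the default behaviour of git clone); whether SWAN applies exactly this rule is an assumption. The function name `expected_clone_dir` and the URL in the usage note are hypothetical.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def expected_clone_dir(repo_url: str, home: Path) -> Path:
    """Guess where a repository is cloned: <home>/SWAN_projects/<repository name>."""
    url_path = urlparse(repo_url).path.rstrip("/")
    name = PurePosixPath(url_path).name
    if name.endswith(".git"):
        name = name[: -len(".git")]  # strip the conventional suffix
    return home / "SWAN_projects" / name
```

For example, `expected_clone_dir("https://gitlab.cern.ch/some-group/my-analysis.git", Path.home())` points at the `my-analysis` directory under `SWAN_projects` in the user's home.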
SWAN includes the JupyterLab Git extension, enabling users to manage the repository through the web interface — pulling updates, committing, and pushing changes — without needing to use the command line.
Note
After a repository is cloned into the user's CERNBox space, it is the user's responsibility to keep it synchronized with upstream changes if desired. Re-selecting the same repository will reuse the existing clone without automatically pulling updates.
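Because an existing clone is not updated automatically, it can be useful to check how far it lags behind its upstream branch. The following sketch uses standard Git commands (`git fetch` and `git rev-list --count HEAD..@{upstream}`) through `subprocess`; the helper names are hypothetical, and it assumes the current branch tracks an upstream branch, as is the case for a fresh clone.

```python
import subprocess
from pathlib import Path


def run_git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its trimmed standard output."""
    completed = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout.strip()


def commits_behind_upstream(repo: Path, git=run_git) -> int:
    """Fetch from the remote, then count upstream commits that HEAD does not have."""
    git(repo, "fetch")
    return int(git(repo, "rev-list", "--count", "HEAD..@{upstream}"))
```

A result greater than zero means the clone is missing upstream commits, which can then be pulled from a terminal or through the JupyterLab Git extension. The `git` parameter exists only so the command runner can be replaced, for example in tests.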
Private / internal repositories
In addition to public Git repositories, CSEs can also be created from private or internal ones. To do this, users must first configure Git access credentials within their SWAN session so that the git clone command can authenticate successfully. Here we describe two alternatives that can be applied to repositories hosted on gitlab.cern.ch:
- Personal access tokens: create a token via GitLab's web interface (instructions here), then configure Git in the SWAN terminal to use that token, replacing ${TOKEN_VALUE_HERE} with the token value:

      echo "https://$USER:${TOKEN_VALUE_HERE}@gitlab.cern.ch" >> $HOME/.git-credentials
      git config --global credential.helper store
- SSH keys: configure Git to use SSH for authentication. See these instructions for details.
Once configured, Git credentials will be reused automatically in future sessions.
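The entry stored in .git-credentials is a URL of the form https://user:token@host. If a token contains characters that are special in URLs, such as `@` or `:`, it should be percent-encoded before being written. The sketch below builds such an entry with the standard library; the function name `credential_line` and the sample values in the usage note are hypothetical.

```python
from urllib.parse import quote


def credential_line(username: str, token: str, host: str = "gitlab.cern.ch") -> str:
    """Build one .git-credentials entry, percent-encoding the user name and token."""
    # safe="" also encodes '/', so no reserved character can leak into the URL.
    return f"https://{quote(username, safe='')}:{quote(token, safe='')}@{host}"
```

For example, `credential_line("jdoe", "a@b:c")` yields `https://jdoe:a%40b%3Ac@gitlab.cern.ch`, which can be appended to `$HOME/.git-credentials`.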
NXCALS integration
The ATS SWAN prototype supports NXCALS users submitting computations to the NXCALS Hadoop cluster from within a CSE. To enable this:
- Include the nxcals Python package in the requirements file of the CSE.
- Select the NXCALS cluster under "External Computing Resources" → "Spark clusters", in the SWAN form.
In such sessions, SWAN automatically installs the SparkConnector and SparkMonitor JupyterLab extensions, thus allowing users to connect to the NXCALS cluster and monitor Spark jobs directly from their notebook.
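Before starting an NXCALS session, it can help to confirm that the requirements file actually lists the nxcals package. The following is a rough sketch with hypothetical helper names; it extracts only the leading package name from each line and ignores pip options such as `-r` or `--index-url`.

```python
from __future__ import annotations

import re

# A requirement line starts with the package name; anything after it
# (version specifiers, extras, markers) is ignored here.
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def listed_packages(requirements_text: str) -> set[str]:
    """Return the normalised package names found in a requirements file."""
    names = set()
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        match = _NAME.match(line)
        if match:
            # Normalise as pip does: lower-case, runs of '-', '_' and '.' become '-'.
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def lists_nxcals(requirements_text: str) -> bool:
    """True if the requirements text includes the nxcals package."""
    return "nxcals" in listed_packages(requirements_text)
```

Calling `lists_nxcals` on the contents of requirements.in or requirements.txt returns False when the package is missing and should be added to the file.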