Running Spark Locally
If you want to run some quick tests on a small dataset (without creating a connection to the CERN clusters), you can use an instance of Spark that is local to your SWAN session. To do so, Python notebooks have a pre-defined SparkConf variable (swan_spark_conf) that enables features such as the SparkMonitor integration. You can use that variable to create, for example, a SparkContext:
from pyspark import SparkContext

# Create (or reuse) a local SparkContext using the pre-defined SWAN configuration
sc = SparkContext.getOrCreate(conf=swan_spark_conf)
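As a minimal sketch of the kind of quick local test this enables (the small dataset below is purely illustrative), you can parallelize a collection and run a simple computation with the SparkContext created above:

# Illustrative example: distribute a small dataset across local cores
rdd = sc.parallelize(range(100))

# Run a simple transformation and action locally
result = rdd.map(lambda x: x * x).sum()
print(result)  # 328350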