PyJob

Python-controlled job execution across multiple platforms


Installation

Latest official release

$ pip install pyjob

Source code

$ git clone https://github.com/fsimkovic/pyjob.git
$ cd pyjob
$ python setup.py install

Quickstart

Script creation

A Script is created by providing a handful of optional arguments. Its content can be managed just like any other Python list.

>>> from pyjob import Script
>>> script = Script(directory='.', prefix='example', stem='', suffix='.sh')
>>> script.append('sleep 5')
>>> print(script)
#!/bin/bash
sleep 5

The path to the Script is available via its path attribute.

>>> print(script.path)
./example.sh
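
The filename appears to be assembled from the prefix, stem and suffix; this is an assumption inferred from the examples in this guide rather than a documented guarantee:

>>> # assumption: filename = prefix + stem + suffix
>>> numbered = Script(directory='.', prefix='example', stem='_1', suffix='.sh')
>>> print(numbered.path)
./example_1.sh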

We could also write() the Script to disk; but don't worry if you forget, the Task will do this for you before execution.

>>> script.write()

If we are instead given a script already written to disk, i.e. the reverse of the previous steps, we can use the read_script function to obtain a Script instance. This also lets us conveniently edit an existing Script if necessary.

>>> from pyjob import read_script
>>> script = read_script('./example.sh')
>>> print(script)
#!/bin/bash
sleep 5
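
Since a Script behaves like a Python list, editing one read from disk is simply list manipulation followed by another write(); a minimal sketch using only the methods shown above:

>>> script.append('echo "finished"')  # add another command to the script
>>> script.write()                    # rewrite the edited script to disk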

To create multiple scripts in parallel, use the LocalScriptCreator, given a function that generates a single Script, an iterable containing the options for each script, and the number of processors to use. Its collector attribute returns the ScriptCollector, which can be passed directly into TaskFactory for execution (detailed below).

>>> from pyjob.script import LocalScriptCreator
>>> script_creator = LocalScriptCreator(func=example_function, iterable=example_iterable, processes=2)
>>> collector = script_creator.collector
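
In the snippet above, example_function and example_iterable are placeholders. For illustration, they might look like the following (hypothetical definitions, not part of PyJob):

>>> # hypothetical: build one Script per option value
>>> def example_function(option):
...     s = Script(directory='.', prefix='job', stem=str(option), suffix='.sh')
...     s.append('sleep %s' % option)
...     return s
>>> example_iterable = [1, 2, 3]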

Execution of single script on a local machine

The Script created in the previous step can be easily executed across all supported platforms, i.e. operating systems and HPC queueing systems. To do so, we simply select a platform (local in the example below), provide one or more Script instances or paths to scripts, and then execute with the run() method. To simplify the selection of the correct platform, a TaskFactory is provided.

>>> from pyjob import TaskFactory
>>> with TaskFactory('local', script) as task:
...     task.run()

In the example, the Task is handled with a Python context manager, which is the recommended way to manage all Task instances.

Execution of multiple scripts on a local machine

First, let's create two copies of our Script, each with a unique stem so their paths do not collide:

>>> def dup_script(s, i=0):
...     s1 = s[:]
...     s1.stem = str(i)
...     return s1
>>> script1 = dup_script(script, i=0)
>>> script2 = dup_script(script, i=1)

This process is identical to the previous example, except that this time we provide the Script instances as a list.

>>> with TaskFactory('local', [script1, script2]) as task:
...     task.run()

If we would like to use multiple processes, simply pass the processes keyword argument with the desired count.

>>> with TaskFactory('local', [script1, script2], processes=2) as task:
...     task.run()

If a plain list of Script instances is inconvenient to maintain, you can also use a ScriptCollector and provide that instead.

>>> from pyjob.script import ScriptCollector
>>> collector = ScriptCollector(script)
>>> for i in range(5):
...     script = dup_script(script, i=i)
...     collector.add(script)
>>> with TaskFactory('local', collector, processes=2) as task:
...     task.run()

Execution of multiple scripts on non-local platforms

>>> with TaskFactory('sge', [script1, script2]) as task:
...     task.run()

The first argument to TaskFactory, sge in this example, defines the platform on which the Task will be executed. Other platforms are available; to try them, install PyJob on the relevant machine and substitute in any of the options below.

Platform                  Argument  Task class
Local Machine             local     LocalTask
Sun Grid Engine           sge       SunGridEngineTask
Slurm                     slurm     SlurmTask
Load Sharing Facility     lsf       LoadSharingFacilityTask
Portable Batch System     pbs       PortableBatchSystemTask
TORQUE Resource Manager   torque    TorqueTask
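
For example, the same pair of scripts can be submitted to a Slurm queue simply by swapping the platform argument:

>>> with TaskFactory('slurm', [script1, script2]) as task:
...     task.run()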

Execution of Python functions

This little nugget is a thin extension of multiprocessing.Pool that simplifies and tidies imports in your own code. It also provides a backwards-compatible context manager for multiprocessing.Pool, matching the behaviour that is standard in Python 3.

>>> import time
>>> def sleep(t):
...     time.sleep(t)
>>> from pyjob import Pool
>>> with Pool(processes=4) as pool:
...     pool.map(sleep, [10] * 8)
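
Because Pool behaves like multiprocessing.Pool, return values are collected by map() as usual; a quick sketch:

>>> def square(x):
...     return x * x
>>> with Pool(processes=4) as pool:
...     results = pool.map(square, range(8))
>>> results
[0, 1, 4, 9, 16, 25, 36, 49]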

Default configuration

If you use PyJob frequently, you may find repeatedly defining the same parameters irritating. You can pre-define defaults for your system in a YAML configuration file. To simplify setting these defaults, use:

$ pyjob conf platform:local processes:4

This sets the default platform to local and the default number of processors to 4. You then no longer need to pass these to the constructors, unless you want to override them for a particular task.

To change a parameter, call the same command with a different value. Alternatively, to delete an option, simply set its value, for example local or 4 in the example above, to None.
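
For example, to raise the default process count or to remove the default platform entirely:

$ pyjob conf processes:8
$ pyjob conf platform:None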
