[[_TOC_]]

# Stanford WWW Scheduling Service (`stanford-www-schedule`)

## Overview

(Note: This project moved over from [stanford-server-tools][2].)


The WWW Scheduling Service allows users to define and schedule jobs that
run on the WWW servers. The software package in this repository
contains the code that runs the Job Manager and the Job Scheduler (indicated
by the box around those two components in the figure below).

```mermaid
graph TD
    U[Users] --> |create jobs| A
    D[Database]
    W[Job Servers]
    subgraph www-scheduler
    A[Job Manager]
    B[Job Scheduler]
    end
    A --> |save jobs| D[Database]
    B --> |read jobs| D
    B --> |run job x| W
    W --> |get job x info| D
```

For client-facing documentation see ["Scheduling Commands and
Scripts"][1].

### The Job Manager

The Job Manager is a simple web application based on the Perl script
[`scheduler.cgi`](usr/share/www-scheduler/cgi-bin/scheduler.cgi).
The web application allows users to create, update, and delete
cron jobs. These cron jobs are saved to the Database (see also
section ["Database Schema"](#database-schema)).

When a user creates or edits a scheduler job, one of the job's attributes
is a choice among the Kerberos principals that the user administers.
The scheduler job will be run on behalf of the chosen principal.
The list is generated by a remctl call to the `afsdir` service. In most
cases, the principal is `<sunetid>/cgi` or `<sunetid>/cron`.

### The Job Scheduler

The Job Scheduler is the Perl script
[`cron-distribute`](usr/sbin/cron-distribute). The `cron-distribute`
script uses the third-party Perl library [`Schedule::Cron`][4] to
decide when to run jobs. It reads the cron jobs from the Database and
starts a process that acts as its own cron service running each job
according to its schedule.

Periodically (see [`sleeptime-minutes`](#sleeptime-minutes)) the Job
Scheduler rereads all the jobs from the Database and updates the running
`Schedule::Cron` object with any changes (e.g., removing jobs that have
been deleted from the Database).
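
The reload step can be pictured as a comparison between what is currently
scheduled and what a fresh Database read returns. The following is only a
sketch in Python (the real scheduler is the Perl script `cron-distribute`);
the function name `diff_jobs` and the "job id to cron expression" mapping
are assumptions made for illustration, not taken from the code.

```python
def diff_jobs(scheduled, database):
    """Compare the in-memory schedule with a fresh Database read.

    Both arguments map a job id to its cron expression (hypothetical shape).
    Returns (to_add, to_update, to_delete) as sorted lists of job ids.
    """
    to_add = sorted(jid for jid in database if jid not in scheduled)
    to_delete = sorted(jid for jid in scheduled if jid not in database)
    to_update = sorted(
        jid for jid in database
        if jid in scheduled and scheduled[jid] != database[jid]
    )
    return to_add, to_update, to_delete


scheduled = {1: "0 * * * *", 2: "5 1 * * *", 3: "15 * * * *"}
database = {2: "5 2 * * *", 3: "15 * * * *", 4: "0 0 * * 0"}
to_add, to_update, to_delete = diff_jobs(scheduled, database)
```

In this example job 4 is new, job 2 has a changed schedule, and job 1 has
been removed from the Database.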

The Job Scheduler does not run the jobs itself; rather, it tells a
Job Server to run the job.

(Is the following still true?)
According to the legacy documentation, when `cron-distribute` reloads from
the database it leaves zombie processes. In light of this, it is a good
idea to restart `cron-distribute` periodically, for example at three
minutes past 1:00 AM: job minutes are restricted to multiples of five (see
["Database Schema"](#database-schema)), so no cron job is scheduled to run
at three minutes past an hour.

### The Job Servers

The Job Servers are the WWW servers. They run a remctl service that the Job
Scheduler uses to tell a Job Server what command to run. The software
used by the remctl servers on the job servers is in the
[stanford-server-www Debian package][5].

Question: When there are multiple job servers how is the job server chosen?

Answer: First, all Job Servers are queried using [lbcd][6] to get their current load.
Any server with a load larger than `maxload` is skipped as being too busy
to accept new scheduler jobs. The server with the lightest load is chosen to be
the one to accept the job. Note that it can happen that _no_ server is
available (e.g., all the servers are too loaded). In that case the job is
not scheduled.
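
The selection rule described above can be sketched as follows. This is an
illustration in Python, not the Perl code in `cron-distribute`; the function
name `choose_server`, the shape of the `loads` mapping, and the treatment of
a server that does not answer the load query (skipped) are assumptions.

```python
def choose_server(loads, maxload):
    """Return the least-loaded server whose load does not exceed maxload.

    `loads` maps a server name to its load, or to None when the server did
    not answer the load query (assumed here to mean "skip this server").
    Returns None when no server qualifies, i.e. the job is not scheduled.
    """
    candidates = {
        host: load
        for host, load in loads.items()
        if load is not None and load <= maxload
    }
    if not candidates:
        return None
    # Pick the server with the smallest load.
    return min(candidates, key=candidates.get)


picked = choose_server({"web11": 5.2, "web12": 1.7, "web13": 0.9}, maxload=4)
nobody = choose_server({"web11": 5.2, "web12": 6.0}, maxload=4)
```

Here `picked` is `"web13"` and `nobody` is `None`, which corresponds to the
case where the job is not scheduled.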

## Web Server Authentication and Attributes

For the Job Manager to work properly the authenticated user's SUNetID
*must* be set in the environment variable `uid`. This is typically
accomplished via web authentication (e.g., SAML). No other user attributes
need to be supplied by the web server.

## Configuration

Configuration is defined in the `www-scheduler-config.yaml` YAML file
stored in the `/etc/www-scheduler/` directory. See also the example
configuration file
[`/etc/www-scheduler/www-scheduler-config.yaml.example`](etc/www-scheduler/www-scheduler-config.yaml.example).

### Database connection settings

The database connection settings are stored under the `db` element and are
self-explanatory. Here is an example:
```
db:
  host:   127.0.0.1
  port:   3306
  dbname: s_www_cron_dev
  username: s_www_cron_dev
  password-file-path: /etc/www-scheduler/db-password
```

The file containing the database password should consist of a single line:
the password itself. Example:
```
$ cat /etc/www-scheduler/db-password
3dzaO9vD39n6hnIx1kEf
$
```

Note that the file the scheduler reads the database password from can be
changed by changing the value of `db:password-file-path`.
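
As a rough illustration of the single-line password file convention, a
reader for such a file might look like the sketch below. It is written in
Python for brevity and is not the scheduler's own (Perl) code; the function
name `read_password` and the choice to raise an error on an empty file are
assumptions.

```python
from pathlib import Path


def read_password(path):
    """Return the first line of the password file, without the newline."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if not lines or not lines[0]:
        # Assumed behaviour: an empty file is treated as a configuration error.
        raise ValueError(f"password file {path} is empty")
    return lines[0]
```

With the example configuration above, the path passed in would be the value
of `db:password-file-path`.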

### Kerberos settings

The Kerberos settings are straightforward:


```
kerberos:
  keytab-file-path: /etc/www-scheduler/keytab
  principal:        service/www-scheduler-dev@stanford.edu
  cache-file-path:  /tmp/www-scheduler-cc
```

### Remctl settings

```
remctl:
  afsdir:
    host: example1.com
    service: host/example.com@stanford.edu
  afstools:
    host: example2.com
    service: service/afs-tools@stanford.edu
```

The setting `remctl:afsdir:host` should be the fully-qualified name for the
host with the remctl service `afsdir`. See ["The `afsdir` remctl
service"](#the-afsdir-remctl-service) below for details on how this
service is used. The setting `remctl:afsdir:service` should be the
Kerberos principal that the `remctl:afsdir:host` runs under.

The setting `remctl:afstools:host` should be the fully-qualified name for
the host with the remctl services `pts-examine` and `pts-membership`. The
setting `remctl:afstools:service` should be the Kerberos principal that the
`remctl:afstools:host` runs under. See ["The `pts-examine` and
`pts-membership` remctl
services"](#the-pts-examine-and-pts-membership-remctl-services) below for
details on how these services are used.


### Mail send settings

```
send_email:
  smtp_host: localhost
  smtp_port: 465
  from: noreply@stanford.edu
  to_override: bogus@stanford.edu
  auth_enabled: yes
  auth_username: service/something
  auth_password_file: /etc/www-scheduler/send-email-password
```

The www-scheduler web application does not need to send mail. However,
the `cron-maint` utility _does_ need to send mail when it expires a
job. This section of configuration settings concerns the mails that
`cron-maint` sends.

The `send_email:to_override` setting should be set only when you want
_all_ mail to be sent to a single recipient rather than to the job
requester. This is useful when testing the job expiration function of
`cron-maint` without sending mail to actual users.

Set `send_email:auth_enabled` to the string `yes` if you need to
authenticate when sending mail through the SMTP server. In this case set
`send_email:auth_username` to the username of the credentials used to
authenticate. This is usually a Kerberos principal name. Put the password
in the file pointed to by `send_email:auth_password_file`. If
`send_email:auth_enabled` is not set to `yes` the configuration directives
`auth_username` and `auth_password_file` are ignored.
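
The authentication rule above can be summarised in a few lines. The sketch
below is Python, not the Perl used by `cron-maint`; the function name
`smtp_credentials` and the injected `read_file` callable are hypothetical
and exist only to make the example self-contained.

```python
def smtp_credentials(send_email, read_file):
    """Return (username, password) when authentication is enabled, else None.

    `send_email` is the parsed `send_email` mapping. `read_file` is a
    callable returning a file's contents (injected to keep the sketch
    testable without touching the filesystem).
    """
    enabled = send_email.get("auth_enabled")
    # Some YAML parsers load an unquoted `yes` as a boolean true, so both
    # forms are accepted here; this is an assumption of the sketch.
    if enabled is not True and enabled != "yes":
        return None  # auth_username and auth_password_file are ignored
    password = read_file(send_email["auth_password_file"]).strip()
    return send_email["auth_username"], password


settings = {
    "auth_enabled": "yes",
    "auth_username": "service/something",
    "auth_password_file": "/etc/www-scheduler/send-email-password",
}
creds = smtp_credentials(settings, read_file=lambda path: "example-password\n")
```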

### The Job Scheduler settings

The Job Scheduler settings are under the `cron-distribute` element. Example:
```
cron-distribute:
  maxload: 4
  sleeptime-minutes: 5
  cronservers:
    - web11
    - web12
    - web13
    - web14
    - web15
    - web16
    - web17
```

#### `cronservers`

The Job Servers are listed under `cron-distribute:cronservers`. Jobs are
scheduled on one of these servers based on server load. For example, to
set the three servers `www01.stanford.edu`, `www02.stanford.edu`, and
`www03.stanford.edu` as the job servers:
```
cron-distribute:
  cronservers:
    - www01.stanford.edu
    - www02.stanford.edu
    - www03.stanford.edu
```

#### `sleeptime-minutes`

How long in minutes to wait between Database reads when updating the
`Schedule::Cron` object.

#### `maxload`

When scheduling a job on a job server, if the job server's load is greater
than `maxload` that server is skipped. If every server has a load greater
than `maxload` then the job will not be scheduled.

The server load used is the one-minute load average on that job server.
Thus, if `maxload` is set to `4`, no server with a one-minute load
average greater than 4 will be sent a job.

### Protected Jobs

There are some jobs that we never want deactivated, for example,
jobs used for monitoring the scheduler service itself. Jobs listed
under the `protected_ids` configuration directive will be skipped by the
`cron-maint expire` process. Example:

```
protected_ids:
  - 1379
  - 1381
  - 1383
```

### Test mode

Test mode is useful when testing the application as it gives you control
over which jobs will be sent to a job server.

To put the Job Scheduler in "test mode" set `cron-distribute:test-mode` to
"true". In test mode when it is time for the Job Scheduler to send a job
to a job server the scheduler checks to see if the job's id is in the list
`cron-distribute:test-cron-ids`. If so then the job is sent to a job
server as normal; if not then the job is _not_ sent and instead a message
is logged indicating the job has been skipped. If
`cron-distribute:test-cron-ids` is not defined or is the empty array then
no jobs will be sent to a job server.
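
The test-mode decision can be sketched as a small predicate. This is a
Python illustration of the rule as described here, not the Perl code in
`cron-distribute`; the function name `should_send` and the dictionary form
of the `cron-distribute` settings are assumptions.

```python
def should_send(job_id, settings):
    """Decide whether a due job is sent to a Job Server.

    `settings` is the parsed `cron-distribute` mapping (hypothetical shape).
    """
    # Accept both the string "true" and a boolean true from the YAML parser.
    test_mode = str(settings.get("test-mode", "")).lower() == "true"
    if not test_mode:
        return True  # normal operation: every due job is sent
    # In test mode only explicitly listed job ids are sent; a missing or
    # empty list means that nothing is sent.
    allowed = settings.get("test-cron-ids") or []
    return job_id in allowed


normal = should_send(1367, {"maxload": 4})
listed = should_send(1367, {"test-mode": "true", "test-cron-ids": [1367, 1370]})
skipped = should_send(1368, {"test-mode": "true", "test-cron-ids": [1367, 1370]})
```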

### Impersonation

Normally, the only jobs shown in the Job Manager are jobs the current
authenticated user has access to, that is, jobs that user has created or
jobs using a service principal that user has access to. For
troubleshooting it is useful for an administrator to pretend to be another
user to see what that user sees. The www-scheduler has such an
impersonation feature. To impersonate a user in the Job Manager follow
these two steps:

1. Set the environment variable `IMPERSONATE_USER` to the string `yes`
(case is ignored), and

2. put the SUNetID you want to impersonate as the first (and only)
line in the file `/etc/www-scheduler/impersonate.txt`.

After restarting the web service, go to the path
`/impersonate/scheduler.cgi` to manage the jobs of the user named in
`/etc/www-scheduler/impersonate.txt`. Note that these settings _only_
affect the application's `/impersonate/scheduler.cgi` path, so anyone
using the normal path of `/scheduler.cgi` will _not_ be affected.

## Necessary external services

The Scheduling Service uses several external services in order to properly
run the Job Manager and to schedule jobs. IEDO does not run these
services, but for the Scheduling Service to work it must be able to reach
these external services.

### The `lbcd` service on Job Servers

The Job Scheduler runs `lbcdclient` against each Job Server to establish
server load in order to decide where to send a Job. Thus, each Job Server
(i.e., the WWW servers) must be running [`lbcd`][6] and have made the
`lbcd` port (4330/UDP) accessible by the Job Scheduler.


### The `afsdir` remctl service

The `afsdir` remctl service is (currently) hosted on the
`lsdb2.stanford.edu` server. The script
[`cgi-bin/scheduler.cgi`](usr/share/www-scheduler/cgi-bin/scheduler.cgi)
uses this service to get the list of group and department AFS directories
for which the current user is an admin.
See also ServiceNow ticket TASK00328971.

### The `pts-examine` and `pts-membership` remctl services

The `pts-examine` and `pts-membership` remctl services are (currently)
hosted on the `afs-tools.stanford.edu` server. These services return [AFS
PTS][7] information from Stanford's AFS instance. This is used, in part,
to determine if the current user has a `cgi` or `cron` Kerberos principal.

Their use is most easily explained with examples:
```
$ remctl afs-tools pts-examine
No arguments supplied. Provide a valid PTS entry name.

$ remctl afs-tools pts-examine adamhl
Name: adamhl, id: 52777, owner: system:administrators, creator: 11, membership: 25, flags: S----, group quota: 19.

$ remctl afs-tools pts-examine service.www
Name: service.www, id: 70781, owner: system:administrators, creator: 11, membership: 3, flags: S----, group quota: 20.

$ remctl afs-tools pts-membership
No arguments supplied. Provide a valid PTS entry name.

$ remctl afs-tools pts-membership adamhl
Groups adamhl (id: 52777) is a member of: workgroup:uit-et-eit_iedo-staff iedo-admins workgroup:acs_ldap_backup_read workgroup:itservices_authz-admins workgroup:acs_unix-team workgroup:acs_team networking:dnshacks mysqltest-admins ...more...

$ remctl afs-tools pts-membership networking:lnas
Members of networking:lnas (id: -25387) are: reuling randy lorinda sukoua wilhelmi watsonj ...many more...
```
See also ServiceNow ticket TASK00328971.
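
For scripts that consume this output, a line of `pts-examine` output in the
format shown above can be split into fields. The following Python sketch
assumes the output always has the "key: value, key: value." shape seen in
the examples; the function name `parse_pts_examine` is hypothetical.

```python
def parse_pts_examine(line):
    """Parse one line of `pts-examine` output into a dict of strings."""
    fields = {}
    # Drop the trailing period, then split on the ", " between fields.
    for part in line.strip().rstrip(".").split(", "):
        # Split on the first ": " only, so values such as
        # "system:administrators" stay intact.
        key, _, value = part.partition(": ")
        fields[key.lower()] = value
    return fields


entry = parse_pts_examine(
    "Name: adamhl, id: 52777, owner: system:administrators, "
    "creator: 11, membership: 25, flags: S----, group quota: 19."
)
```

All values are kept as strings; converting `id` or `membership` to integers
is left to the caller.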

## Job expiration and activation

A job will continue to run if it is active, that is, if its `cr_active`
field is set to `Yes`. A job can be deactivated manually by the job's
requester through the web interface.

Jobs are also deactivated by running the `cron-maint expire` command. This
command looks for all active jobs whose last-modified date is more than
365 days in the past. It deactivates each such job by setting its
`cr_active` field to `No` and sends an email to the requester explaining
that the job has been deactivated and how to re-activate it. The
`cron-maint expire` command does _not_ change the job's last-modified
date. Exception: jobs listed in the
[`protected_ids`](#protected-jobs) configuration directive will never be
expired.
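
The expiration rule can be sketched as a filter over job rows. The Python
below illustrates the rule as described in this section and is not the
code in `cron-maint`; the function name `jobs_to_expire` and the
representation of rows as dictionaries are assumptions.

```python
from datetime import datetime, timedelta

EXPIRY_AGE = timedelta(days=365)


def jobs_to_expire(jobs, protected_ids, now):
    """Return the ids of jobs that an expire run would deactivate.

    `jobs` is an iterable of dicts with the keys cr_id, cr_active and
    cr_modified (a datetime), mirroring the Database columns.
    """
    cutoff = now - EXPIRY_AGE
    return [
        job["cr_id"]
        for job in jobs
        if job["cr_active"] == "Yes"
        and job["cr_modified"] < cutoff
        and job["cr_id"] not in protected_ids
    ]


now = datetime(2024, 6, 1)
jobs = [
    {"cr_id": 1, "cr_active": "Yes", "cr_modified": datetime(2022, 1, 1)},
    {"cr_id": 2, "cr_active": "No", "cr_modified": datetime(2022, 1, 1)},
    {"cr_id": 3, "cr_active": "Yes", "cr_modified": datetime(2024, 1, 1)},
    {"cr_id": 1379, "cr_active": "Yes", "cr_modified": datetime(2020, 1, 1)},
]
expired = jobs_to_expire(jobs, protected_ids={1379, 1381, 1383}, now=now)
```

Only job 1 is selected: job 2 is already inactive, job 3 was modified
recently, and job 1379 is protected.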

It is recommended that `cron-maint expire` be run periodically, say once
per day.


## The `cron-maint` utility

The script [`/usr/sbin/cron-maint`](usr/sbin/cron-maint) is used for
database maintenance and to expire jobs. See the [`cron-maint` man
page](docs/cron-maint) for more information on how to use `cron-maint`.


## Database Schema

```
+-----------------+--------------------------------------+------+-----+-------------------+-----------------------------+
| Field           | Type                                 | Null | Key | Default           | Extra                       |
+-----------------+--------------------------------------+------+-----+-------------------+-----------------------------+
| cr_id           | int(7)                               | NO   | PRI | NULL              | auto_increment              |
| cr_requester    | varchar(30)                          | NO   |     | NULL              |                             |
| cr_pts_group    | varchar(40)                          | NO   |     | NULL              |                             |
| cr_command      | varchar(200)                         | NO   |     |                   |                             |
| cr_principal    | varchar(30)                          | NO   |     | NULL              |                             |
| cr_active       | set('Yes','No')                      | YES  |     | NULL              |                             |
| cr_email        | varchar(60)                          | YES  |     | NULL              |                             |
| cr_email_output | int(1)                               | YES  |     | 0                 |                             |
| cr_description  | text                                 | YES  |     | NULL              |                             |
| cr_type         | enum(SEE NOTE BELOW)                 | YES  |     | NULL              |                             |
| cr_months       | set('*','1','2', ...,'11','12')      | YES  |     | *                 |                             |
| cr_days_month   | set('*','1','2', ... ,'30','31')     | YES  |     | *                 |                             |
| cr_days_week    | set('*','0','1', ... ,'5','6')       | YES  |     | *                 |                             |
| cr_hours        | set('*','0','1', ... ,'22','23')     | YES  |     | *                 |                             |
| cr_minutes      | set('*','0','5','10', ... ,'55')     | YES  |     | *                 |                             |
| cr_modified     | timestamp                            | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-----------------+--------------------------------------+------+-----+-------------------+-----------------------------+
```

NOTE: The `enum` for the `cr_type` field is `('QuarterHourly',
'HalfHourly', 'Hourly', 'Daily', 'Weekly', 'Monthly', 'Yearly', 'Custom')`.
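
To show how the five schedule columns relate to a crontab-style time
specification, the sketch below joins them in the usual order (minute,
hour, day of month, month, day of week). It is a Python illustration only:
it assumes that each `set` column is read from the database as a
comma-separated string and that a NULL column should fall back to `*`.
How `cron-distribute` actually builds its `Schedule::Cron` entries is not
documented here.

```python
CRON_COLUMNS = (
    "cr_minutes",
    "cr_hours",
    "cr_days_month",
    "cr_months",
    "cr_days_week",
)


def cron_expression(row):
    """Build a five-field crontab time specification from a job row."""
    # A NULL column (None) is treated like the column default "*";
    # this fallback is an assumption of the sketch.
    return " ".join(row.get(col) or "*" for col in CRON_COLUMNS)


weekday_half_hourly = cron_expression({
    "cr_minutes": "0,30",
    "cr_hours": "*",
    "cr_days_month": "*",
    "cr_months": "*",
    "cr_days_week": "1,2,3,4,5",
})
```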

## Testing

### Running a Job on a Job Server

Use the included [`/usr/sbin/run-job`](usr/sbin/run-job) to manually run a
Job on a specific Job Server.

1. Find the Job id of the job you want to run, e.g., 1367.

1. Pick a Job Server, e.g., `www-dev.stanford.edu`.

1. Run the job:

        $ /usr/sbin/run-job 1367 www-dev.stanford.edu


[1]: https://uit.stanford.edu/service/sharedcomputing/scheduling

[2]: https://code.stanford.edu/debian-packages/stanford-server-tools

[3]: https://ikiwiki.stanford.edu/service/www/cron/

[4]: https://metacpan.org/pod/Schedule::Cron

[5]: https://code.stanford.edu/et-iedo/debian-packages/stanford-server-www

[6]: https://code.stanford.edu/legacy-git/system/lbcd

[7]: https://docs.openafs.org/Reference/1/pts.html
