Package metadata
================

Package metadata contains mostly static information about a package, imported
straight from the archive. This includes the package name, maintainer email,
uploader emails (if any) as well as the version control information
(vcs type, URL, subpath) and optionally popularity.

The "schedule" job regularly imports package metadata. On Debian, this information
comes from the Ultimate Debian Database (UDD). On other Debian-like distributions,
it is imported from the apt sources file.

The import has two components:

 * A script that writes ``Package`` protobufs (see ``janitor/package_metadata.proto``) to standard output
 * An importer (``janitor.package_metadata``) that reads these protobufs from standard input and updates the database

Candidates
==========

Once the janitor knows about a package, candidates can be created. A candidate
is a record stating that a particular suite (TODO: better name), e.g.
lintian-fixes, can be run on a particular package and that there is some chance
the run will yield changes.

Candidates include information like:

 * value: a relative number indicating how useful this change would be
 * success_chance: an estimate of how likely this change is to succeed and result in a build
 * context: an indicator of the current state of the world, used to avoid retrying
   builds if nothing has really changed. For new upstream releases, for example, this
   is the upstream version number of the latest release

Like package metadata, candidates are generated by a script that writes
protobufs to standard output. The ``janitor.candidates`` module
then reads that output and updates the database. Candidate generation scripts
can be complicated, which allows for better scheduling, or very simple, in
which case they just output a candidate for each package in a suite with fixed
settings for value and success_chance.
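
As an illustration of the simple case, the sketch below emits one candidate per
package with fixed settings. It is not the janitor's actual code: the
``Candidate`` class and both function names are hypothetical, the default
numbers are arbitrary, and JSON lines stand in for the protobuf messages so
that the example runs without the generated protobuf classes.

```python
import json
import sys
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator, Optional, TextIO


@dataclass
class Candidate:
    # Field names follow the list above; the class itself is hypothetical.
    package: str
    suite: str
    value: int
    success_chance: float
    context: Optional[str] = None


def fixed_candidates(
    packages: Iterable[str],
    suite: str,
    value: int = 50,
    success_chance: float = 0.5,
) -> Iterator[Candidate]:
    """Yield one candidate per package, all with the same fixed settings."""
    for package in packages:
        yield Candidate(package, suite, value, success_chance)


def write_candidates(candidates: Iterable[Candidate], out: TextIO = sys.stdout) -> None:
    """Write one record per line; JSON is an assumed stand-in for protobuf."""
    for candidate in candidates:
        out.write(json.dumps(asdict(candidate)) + "\n")


if __name__ == "__main__":
    write_candidates(fixed_candidates(["hello", "dulwich"], "lintian-fixes"))
```

A more elaborate generator would compute value, success_chance and context
per package instead of using constants.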

Scheduling
==========

Once candidates have been created, the schedule job (``janitor.schedule``)
inserts new entries into the queue, taking into account a variety of factors:

 * success chance
 * value
 * popularity of the package if known (from popcon)
 * previous success rate (for the suite/package combination and the package itself)
 * previous run duration
 * whether the context has changed since the last run
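
One way these factors could be combined is sketched below. The formula is an
assumption made for illustration only, not what ``janitor.schedule`` actually
computes: it skips candidates whose context is unchanged, averages the
estimated and observed success rates, applies a logarithmic popularity bonus,
and divides the expected gain by the expected run time. ``CandidateStats``,
``schedule_score`` and the default duration are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateStats:
    value: int
    success_chance: float
    popularity: Optional[int] = None               # e.g. popcon count, if known
    previous_success_rate: Optional[float] = None  # 0.0 to 1.0, if there were runs
    previous_duration: Optional[float] = None      # seconds, if there were runs
    context: Optional[str] = None
    last_context: Optional[str] = None             # context seen by the last run


def schedule_score(c: CandidateStats, default_duration: float = 900.0) -> Optional[float]:
    """Return a score (higher means run sooner), or None to skip the candidate."""
    # Assumption: an unchanged context means a rerun would not yield anything new.
    if c.context is not None and c.context == c.last_context:
        return None

    # Blend the estimated chance with what previous runs actually showed.
    chance = c.success_chance
    if c.previous_success_rate is not None:
        chance = (chance + c.previous_success_rate) / 2

    # Popular packages get a modest, logarithmic bonus.
    bonus = 1.0
    if c.popularity is not None:
        bonus += math.log10(1 + c.popularity)

    # A missing or zero duration falls back to the default, which also
    # avoids dividing by zero.
    duration = c.previous_duration or default_duration
    return c.value * chance * bonus / duration
```

Dividing by the expected duration favours cheap runs, so that a long-running
build has to promise more value before it is picked ahead of several short ones.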

The queue consists of prioritized buckets. Manually requested runs, runs triggered
by the publisher (e.g. to resolve merge conflicts) and retried runs are always
executed before runs that were scheduled by the scheduler.
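
The bucket ordering can be illustrated with a small in-memory priority queue.
This is only a sketch: the real queue lives in the database, and the ``Bucket``
names, their order relative to one another, and the ``RunQueue`` class are
assumptions. The only property taken from the description above is that
scheduler-created runs come after the other three kinds.

```python
import heapq
import itertools
from enum import IntEnum
from typing import List, Tuple


class Bucket(IntEnum):
    # Lower numbers are served first. The order among the first three
    # buckets is an assumption; only SCHEDULED being last is documented.
    MANUAL = 0
    PUBLISHER = 1
    RETRY = 2
    SCHEDULED = 3


class RunQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, float, int, str, str]] = []
        self._counter = itertools.count()  # tie-breaker, keeps insertion order

    def add(self, package: str, suite: str, bucket: Bucket, priority: float = 0.0) -> None:
        # Entries sort by bucket first, then by priority within the bucket
        # (a lower number runs earlier), then by insertion order.
        entry = (int(bucket), priority, next(self._counter), package, suite)
        heapq.heappush(self._heap, entry)

    def pop(self) -> Tuple[str, str, Bucket]:
        bucket, _priority, _seq, package, suite = heapq.heappop(self._heap)
        return package, suite, Bucket(bucket)
```

Because the bucket is the first element of the sort key, a manually requested
run is popped before any scheduler-created run, however favourable the
latter's priority is.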
