Previously, jobs were executed in FIFO order on MongoDB and LIFO order on
Postgres, with no way to configure this behavior.
This PR makes FIFO the default on both MongoDB and Postgres and
introduces the following new options to configure the processing order
globally or on a queue-by-queue basis:
- a `processingOrder` property to the jobs config
- a `processingOrder` argument to `payload.jobs.run()` to override
what's set in the jobs config
It also adds a new `sequential` option to `payload.jobs.run()`, which
can be useful for debugging.
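A minimal sketch of how these options might be used, assuming the shapes described above; the sort values, queue names, and the per-queue override shape are illustrative assumptions:

```ts
import { buildConfig } from 'payload'

export default buildConfig({
  // ...rest of the config
  jobs: {
    tasks: [
      /* ... */
    ],
    // Assumed shape: a global default plus optional per-queue overrides.
    processingOrder: {
      default: 'createdAt', // FIFO
      queues: {
        emails: '-createdAt', // LIFO for this queue only
      },
    },
  },
})
```

And overriding the order at run time (assuming an initialized `payload` instance):

```ts
import { getPayload } from 'payload'
import config from './payload.config'

const payload = await getPayload({ config })

// Override the configured order for this run and process jobs one at a time;
// `sequential` is mainly useful for debugging.
await payload.jobs.run({
  queue: 'emails',
  processingOrder: 'createdAt',
  sequential: true,
})
```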
This adds support for running multiple job queue tasks in parallel
within the same workflow while preventing conflicts. Previously, doing
so caused the following issues:
- Job log entries were lost: the final job log was incomplete, despite
all tasks having been executed
- Write conflicts in Postgres, leading to unique constraint violation
errors
The solution involves handling job log data updates in a way that avoids
overwriting and ensuring that the final update reflects the latest job
log data. Each job log entry now initializes its own ID, so a given job
log entry’s ID remains the same across multiple parallel task executions.
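For illustration, this is the kind of workflow the change is meant to support. The workflow slug, task slugs, and handler signature below are assumptions based on Payload's workflow API, not code from this PR:

```ts
import type { WorkflowConfig } from 'payload'

export const parallelWorkflow: WorkflowConfig<'parallelWorkflow'> = {
  slug: 'parallelWorkflow',
  handler: async ({ tasks }) => {
    // Both tasks complete concurrently and update the same job's log.
    // With stable log entry IDs and non-overwriting updates, neither
    // log entry is lost.
    await Promise.all([
      tasks.sendEmail('1', { input: { to: 'a@example.com' } }),
      tasks.sendEmail('2', { input: { to: 'b@example.com' } }),
    ])
  },
}
```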
## Postgres
In Postgres, we need to enable transactions for the
`payload.db.updateJobs` operation; otherwise, two tasks updating the
same job in parallel can conflict. This happens because Postgres handles
array rows by deleting them all and then re-inserting them (rather than
upserting). The rows are stored in a separate table, and the following
scenario can occur:
1. Op 1 deletes all job log rows
2. Op 2 deletes all job log rows
3. Op 1 inserts 200 job log rows
4. Op 2 inserts the same 200 job log rows again => `error: duplicate key
value violates unique constraint "payload_jobs_log_pkey"`
Because transactions were not used, the rows inserted by Op 1
immediately became visible to Op 2, causing the conflict. Enabling
transactions fixes this. In theory, it can still happen if Op 1 commits
before Op 2 starts inserting (due to the read committed isolation
level), but it should occur far less frequently.
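A rough sketch of what wrapping the update in a transaction looks like at the adapter level; `updateJobs` is the operation named above, but its argument shape and the surrounding transaction helpers shown here are assumptions:

```ts
// Without a transaction, the rows inserted by Op 1 become visible to Op 2
// mid-operation, triggering the duplicate key error described above.
const transactionID = await payload.db.beginTransaction()

try {
  await payload.db.updateJobs({
    where: { id: { equals: jobID } },
    data: { log: updatedLog },
    req: { transactionID }, // ties the update to the transaction
  })
  await payload.db.commitTransaction(transactionID)
} catch (err) {
  await payload.db.rollbackTransaction(transactionID)
  throw err
}
```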
Alongside this change, we should consider inserting the rows using an
upsert (insert with update on conflict), which would eliminate this
error completely. That way, if the insertion from Op 1 is visible to
Op 2, Op 2 will simply overwrite it rather than erroring. Individual job
log entries are immutable and cannot be deleted, so this shouldn't
corrupt any data.
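A sketch of what that upsert could look like in Drizzle (which the Postgres adapter uses); `db`, `payloadJobsLog`, and `logRows` are placeholders, not the adapter's actual schema or internals:

```ts
import { sql } from 'drizzle-orm'

await db
  .insert(payloadJobsLog)
  .values(logRows)
  .onConflictDoUpdate({
    target: payloadJobsLog.id,
    set: {
      // Overwrite the existing row with the newly inserted ("excluded") values.
      state: sql`excluded.state`,
      completedAt: sql`excluded.completed_at`,
    },
  })
```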
## Mongo
In Mongo, the issue is addressed by ensuring that log entry deletions
caused by differing log states across concurrent operations are not
merged back into the client-side job log, and by making sure the final
update includes all job log entries.
There is no duplicate key error in Mongo because the array log resides
in the same document and duplicates are simply upserted. We cannot use
transactions in Mongo, as it appears to lock the document in a way that
prevents reliable parallel updates, leading to:
`MongoServerError: WriteConflict error: this operation conflicted with
another operation. Please retry your operation or multi-document
transaction`
Previously, our job queue system relied on `payload.*` operations, which
ran very frequently:
- whenever job execution starts, as all jobs need to be set to
`processing: true`
- every single time a task completes or fails, as the job log needs to
be updated
- whenever job execution stops, to mark it as completed and to delete it
(if `deleteJobOnComplete` is set)
This PR replaces these with direct `payload.db.*` calls, which are
significantly faster than payload operations. Given how often the job
queue system communicates with the database, this should yield a
substantial performance improvement.
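To illustrate the difference (a conceptual sketch; the PR's actual call sites and argument shapes may differ):

```ts
// Before: a full Payload operation, which runs access control, validation,
// and every collection hook on each call.
await payload.update({
  collection: 'payload-jobs',
  id: job.id,
  data: { processing: true },
})

// After: a direct database adapter call, which skips that lifecycle entirely.
await payload.db.updateOne({
  collection: 'payload-jobs',
  id: job.id,
  data: { processing: true },
})
```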
## How it affects running hooks
To generate the task status, we previously used an `afterRead` hook.
Since direct db adapter calls do not execute hooks, this PR
introduces new `updateJob` and `updateJobs` helpers that handle task
status generation outside the normal payload hook lifecycle.
Additionally, a new `runHooks` property has been added to the global job
configuration. Setting it to `true` can be useful if custom hooks were
added to the `payload-jobs` collection config, but it reverts the job
system to normal payload operations.
This should be avoided, as it degrades performance. In most cases, the
`onSuccess` or `onFail` properties in the job config are sufficient and
much faster.
Furthermore, if the `depth` property is set in the global job
configuration, the job queue system will also fall back to the slower,
normal payload operations.
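For reference, a minimal config sketch; the property placement follows the description above and is otherwise an assumption:

```ts
import { buildConfig } from 'payload'

export default buildConfig({
  // ...rest of the config
  jobs: {
    tasks: [
      /* ... */
    ],
    // Forces the job system back onto full payload.* operations so that custom
    // hooks on the `payload-jobs` collection run. Avoid unless you need it.
    runHooks: true,
    // Setting depth also falls back to the slower payload operations.
    // depth: 2,
  },
})
```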
---------
Co-authored-by: Dan Ribbens <DanRibbens@users.noreply.github.com>
This adds new `payload.jobs.cancel` and `payload.jobs.cancelByID` methods that allow you to cancel already-running jobs, or prevent queued jobs from running.
While it's not possible to cancel a function mid-execution, this will stop job execution the next time the job makes a request to the db, which happens after every task.
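Hypothetical usage, assuming the argument shapes follow the rest of the `payload.jobs` API (not confirmed by this description):

```ts
// Cancel a single job by its ID.
await payload.jobs.cancelByID({ id: job.id })

// Cancel all matching jobs, whether queued or currently running.
await payload.jobs.cancel({
  where: {
    taskSlug: { equals: 'sendEmail' },
  },
})
```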