This adds support for running multiple job queue tasks in parallel within the same workflow while preventing conflicts. Previously, this would have caused the following issues:

- Job log entries get lost: the final job log is incomplete, despite all tasks having been executed
- Write conflicts in Postgres, leading to unique constraint violation errors

The solution handles job log data updates in a way that avoids overwriting, and ensures the final update reflects the latest job log data. Each job log entry now initializes its own ID, so a given job log entry's ID remains the same across multiple, parallel task executions.

## Postgres

In Postgres, we need to enable transactions for the `payload.db.updateJobs` operation; otherwise, two tasks updating the same job in parallel can conflict. This happens because Postgres handles array rows by deleting them all and then re-inserting them (rather than upserting). The rows are stored in a separate table, so the following scenario can occur:

1. Op 1: deletes all job log rows
2. Op 2: deletes all job log rows
3. Op 1: inserts 200 job log rows
4. Op 2: inserts the same 200 job log rows again

=> `error: duplicate key value violates unique constraint "payload_jobs_log_pkey"`

Because transactions were not used, the rows inserted by Op 1 immediately became visible to Op 2, causing the conflict. Enabling transactions fixes this. In theory, the conflict can still happen if Op 1 commits before Op 2 starts inserting (due to the read committed isolation level), but it should occur far less frequently.

Alongside this change, we should consider inserting the rows using an upsert (update on conflict), which would get rid of this error completely: if the insertion of Op 1 is visible to Op 2, Op 2 simply overwrites it instead of erroring. Individual job log entries are immutable and cannot be deleted, so this should not corrupt any data. A rough sketch of the upsert idea follows the Mongo section below.

## Mongo

In Mongo, the issue is addressed by ensuring that log row deletions, caused by differing log states across concurrent operations, are not merged back into the client's job log, and by making sure the final update includes all job log entries (a merge sketch is included below). There is no duplicate key error in Mongo because the log array resides in the same document and duplicates are simply upserted.

We cannot use transactions in Mongo, as they appear to lock the document in a way that prevents reliable parallel updates, leading to:

`MongoServerError: WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction`
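Returning to the Postgres upsert idea mentioned above, here is a minimal sketch of what the re-insert could look like, assuming the Drizzle-based postgres adapter. The table definition, column names, and `upsertJobLogRows` helper are illustrative stand-ins, not the adapter's real generated schema or API:

```ts
import { sql } from 'drizzle-orm'
import { drizzle } from 'drizzle-orm/node-postgres'
import { pgTable, timestamp, varchar } from 'drizzle-orm/pg-core'
import { Pool } from 'pg'

// Illustrative stand-in for the generated `payload_jobs_log` table; the real
// schema is generated by the adapter and carries more columns.
const payloadJobsLog = pgTable('payload_jobs_log', {
  id: varchar('id').primaryKey(),
  state: varchar('state'),
  completedAt: timestamp('completed_at'),
})

const db = drizzle(new Pool({ connectionString: process.env.DATABASE_URI }))

// Re-insert job log rows with an upsert instead of a plain insert: if a
// concurrent operation has already committed a row with the same ID, that row
// is overwritten rather than raising a `payload_jobs_log_pkey` violation.
// Log entries are immutable, so overwriting them cannot lose data.
export const upsertJobLogRows = (logRows: (typeof payloadJobsLog.$inferInsert)[]) =>
  db
    .insert(payloadJobsLog)
    .values(logRows)
    .onConflictDoUpdate({
      target: payloadJobsLog.id,
      set: {
        state: sql`excluded.state`,
        completedAt: sql`excluded.completed_at`,
      },
    })
```

With an upsert targeting the primary key, it no longer matters whether Op 1's commit is visible to Op 2 under read committed isolation: Op 2's insert succeeds either way.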
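On the application side, keeping the final update complete can be pictured as merging the latest persisted log with the entries a task is writing, keyed by each entry's stable ID. This is a sketch under assumptions, not the actual implementation; the `JobLogEntry` shape and `mergeJobLogs` helper are simplified for illustration:

```ts
// Simplified, assumed shape of a job log entry; the real entry carries more fields.
type JobLogEntry = {
  completedAt: string
  id: string
  state: 'failed' | 'succeeded'
  taskSlug: string
}

// Merge the entries a task is about to write with the latest log read from the
// database. Keying by the stable per-entry ID means entries written by a
// concurrently running task are preserved instead of being overwritten or lost.
export const mergeJobLogs = (
  latestFromDB: JobLogEntry[],
  localEntries: JobLogEntry[],
): JobLogEntry[] => {
  const byID = new Map<string, JobLogEntry>()

  for (const entry of latestFromDB) {
    byID.set(entry.id, entry)
  }

  // Local entries win for their own IDs; since each entry is only ever written
  // by the task that created it, this cannot clobber another task's data.
  for (const entry of localEntries) {
    byID.set(entry.id, entry)
  }

  return Array.from(byID.values())
}
```

Because each entry's ID is initialized once and stays the same across parallel task executions, merging by ID is deterministic regardless of the order in which concurrent updates land.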
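For reference, the following workflow reproduces the scenario by running 500 inline tasks in parallel within a single job: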
```ts
import type { WorkflowConfig } from 'payload'

export const parallelTaskWorkflow: WorkflowConfig<'parallelTask'> = {
  slug: 'parallelTask',
  inputSchema: [],
  handler: async ({ job, inlineTask }) => {
    const taskIDs = Array.from({ length: 500 }, (_, i) => i + 1).map((i) => i.toString())

    // Kick off all inline tasks in parallel within the same job.
    await Promise.all(
      taskIDs.map(async (taskID) => {
        return await inlineTask(`parallel task ${taskID}`, {
          task: async ({ req }) => {
            // Each task creates a document in the `simple` collection and
            // returns its ID as the task output.
            const newSimple = await req.payload.db.create({
              collection: 'simple',
              data: {
                title: 'parallel task ' + taskID,
              },
            })

            return {
              output: {
                simpleID: newSimple.id,
              },
            }
          },
        })
      }),
    )
  },
}
```