System Information
- Strapi Version: 4.7.1
- Operating System: macos
- Database: postgres
- Node Version: 18.13.0
- NPM Version:
- Yarn Version:
I have a system that processes data about companies.
It starts by importing the public data about the company and then enriching the data about the company. Different jobs add information by scraping their web page and looking up information about the company in various registers.
The number of companies that are to be processed is about a million (all organizations in Norway).
All these lookup jobs are done as cron tasks, and there are many of them. Each job is responsible for looking up a API and adding data about the company.
The company record has a field, jobName, that defines what job should process the company record next.
When a job starts, it picks the 100 companies that have the jobName that the job is processing and then processes them.
Then the jobName is set to the name of the next job, and this goes on until all the jobs have been processed and the company record has data from all kinds of registers.
To prevent cron from starting more than one instance of a job, I have a table, mergeJob, that contains a list all jobs that can be started.
For each job in mergeJob, there is the status of the job. The field mergeJob.jobStatus can be “Running”, “Ready”, “Finished,” and so on.
This means that if cron should start a job that is already running, it will not start a new instance of the job.
This worked fine during my testing and development. So I know that the code for looking up and storing all kinds of data works.
When testing with all the data, I get problems related to database connections. It runs for a while and then it runs out of database connections. I have set DATABASE_POOL_MAX=80 so that I have 80 connections to the database. I can monitor the postgres database and I se that when all connections are used I get the error:
“KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a “transacting” (trx) call?”
It seems that the jobs are not releasing the database connections. One of my jobs is accessing a PostgreSQL database directly using pg. Here I call the function pg.end() to release the database connection.
But how is this done in strapi?
So my question is: where and how do I release the database connections in Strapi?
Below is an extract of my code. I use the job maestro as the example. maestro is the job that moves companies from one job to the next.
The system is implemented as a strapi plugin.
// from the run_maestro() function I call strapi.entityService.findMany and strapi.entityService.update etc.
// do I need to release the database connection after each call to strapi.entityService.findMany and strapi.entityService.update etc.?
try {
strapiResponse = await strapi.entityService.findMany('api::merge-job.merge-job', {
fields: ['id', 'jobName', 'jobStatus'],
filters: { jobName: jobName },
});
if (strapiResponse.length == 1) {
mergeJobRecord = strapiResponse[0];
}
catch (e) {
console.log("Error in findMany mergeJobRecord e=", e);
debugger;
}
if (strapiResponse.jobStatus != "Running") {
// Update the mergeJobRecord to Running
// get all companies that has jobName=="maestro" and process them
// ...
// Update the mergeJobRecord to Finished
}
This is how I start the cron tasks:
// config/server.ts has loads my cron-tasks.ts file that has the following code:
export default {
maestro: {
task: async ({ strapi }) => {
let returnString = "";
returnString = await strapi
.plugin('urbalurba')
.service('urbalurbaMergeService')
.maestro();
},
options: {
rule: '*/2 * * * *', // Every 2 minutes
tz: 'Europe/Oslo',
},
}
};
// src/plugins/urbalurba/services/urbalurbaMergeService.ts
export default ({ strapi }: { strapi: Strapi }) => ({
async maestro() {
let returnText = await run_maestro();
return returnText;
}
});