Optimizing and scaling Strapi

System Information
  • Strapi Version: v.3.4.6
  • Operating System: Ubuntu 18.04
  • Database: MariaDB 10.4
  • Node Version: 12.20.1
  • NPM Version: 6.14.10

Hey!

So I’ve been using Strapi to build a service to which some “systems” send data every few seconds. The data comes in as JSON and it’s different every time: it may or may not contain certain fields. So I iterate through the JSON and make changes to the database accordingly.
So far I had been working with 2 systems plus whatever FE requests are made. It has been mostly OK; CPU usage has been around 25%. Then I added another system that sends data to my server.
Running
$ top | grep node

 1613 strapi    20   0  828376  45064  14528 S  0.3  2.2 134:35.84 node                                  
 4672 strapi    20   0  633276  43088  30200 S  0.3  2.1   0:00.46 node                                  
24182 strapi    20   0  596928  57344  30308 S  0.3  2.8   0:24.86 node                                  
 3282 strapi    20   0  996652 212216  41224 S 45.5 10.4  22:25.87 node                                  
 3282 strapi    20   0  991788 207628  41224 S 37.9 10.2  22:27.01 node                                  
 4672 strapi    20   0  633276  42328  30200 S  0.3  2.1   0:00.47 node                                  
 3282 strapi    20   0  982316 198232  41224 S 71.8  9.7  22:29.17 node                                  
 4672 strapi    20   0  633276  42592  30200 S  0.3  2.1   0:00.48 node                                  
 3282 strapi    20   0  978220 193656  41224 S 38.7  9.5  22:30.33 node                                  
 3282 strapi    20   0  977452 193420  41224 S 42.5  9.5  22:31.61 node                                  
 1613 strapi    20   0  828376  44304  14528 S  0.3  2.2 134:35.85 node                                  
 3282 strapi    20   0  997164 212460  41224 R 57.5 10.4  22:33.34 node                                  
 3282 strapi    20   0  989740 205120  41224 S 53.8 10.0  22:34.96 node                                  
 3282 strapi    20   0  989996 205120  41224 S  0.3 10.0  22:34.97 node                                  
 3282 strapi    20   0 1011244 226120  41224 R 50.5 11.1  22:36.49 node                                  
 3282 strapi    20   0  977712 193100  41224 R 54.8  9.5  22:38.14 node                                  
 1613 strapi    20   0  828376  44568  14528 S  0.3  2.2 134:35.86 node                                  
24182 strapi    20   0  596928  57344  30308 S  0.3  2.8   0:24.87 node                                  
 3282 strapi    20   0  996908 212464  41224 S 33.6 10.4  22:39.15 node                                  
 3282 strapi    20   0 1001004 216168  41224 R  9.3 10.6  22:39.43 node                                  
 3282 strapi    20   0 1012012 227068  41224 R 57.1 11.1  22:41.15 node                                  
 3282 strapi    20   0 1013804 229012  41224 S 35.1 11.2  22:42.21 node                                  
 3282 strapi    20   0  997932 213376  41224 R 15.0 10.5  22:42.66 node                                  
 3282 strapi    20   0 1015852 230624  41224 S 57.8 11.3  22:44.40 node                                  
 1613 strapi    20   0  828376  44568  14528 S  0.3  2.2 134:35.87 node   

CPU usage is much higher (around 50%), which is insane knowing that this server will have to support 100x the number of systems and many more FE users. I also started to get errors which don’t really tell me where the problem is or when it occurs, which worries me:

(node:3282) UnhandledPromiseRejectionWarning: Error: ValidationError
    at Object.validateEntityUpdate (/home/strapi/next-cam/node_modules/strapi/lib/services/entity-validator/index.js:179:25)
(node:3282) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 187)
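
For context on the warning above: Node prints it when a promise rejects and nothing handles the rejection, which is what happens when a service call is started without await or .catch(). Below is a minimal sketch of attaching a handler to such fire-and-forget calls. createLog and createLogSafely are hypothetical stand-ins, not Strapi APIs, so that the snippet can run on its own.

```javascript
// Hypothetical stand-in for a service call that fails validation on bad input.
function createLog(value) {
  return value === null
    ? Promise.reject(new Error('ValidationError'))
    : Promise.resolve({ id: 1, value });
}

// Fire-and-forget with a handler attached: the rejection is recorded instead of
// surfacing as an UnhandledPromiseRejectionWarning.
function createLogSafely(value, errors) {
  return createLog(value).catch((err) => {
    errors.push(err.message);
    return null;
  });
}

// Runs one good and one bad call and collects what happened.
async function demo() {
  const errors = [];
  const results = await Promise.all([
    createLogSafely(12.5, errors),
    createLogSafely(null, errors),
  ]);
  return { results, errors };
}
```

In a real controller the handler would more likely log the error together with the offending payload, which also answers the "where and when" question.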

My guess would be that the problem lies within the controller I’ve written, specifically the number of times I call strapi.services. The other thing I fear is that maybe the CPU is carrying too big of a load and can’t resolve promises in time or something. In any case, here is the controller I wrote. There is definitely a lot to improve, but I did not expect it to perform that badly:

'use strict';
const { parseMultipartData, sanitizeEntity } = require('strapi-utils');
const haversine = require('haversine');


/**
 * Read the documentation (https://strapi.io/documentation/developer-docs/latest/concepts/controllers.html#core-controllers)
 * to customize this controller
 */

module.exports = {

  /**
   * Create a record.
   *
   * @return {Object}
   */

  async create(ctx) {

    const data = ctx.request.body.data
    console.log("~~~ cam-data ~~~", data)
    if ( !data.mac ) {

      return ctx.unauthorized('No mac address - no party.')
    }

    const mac = data.mac

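    let cam = await strapi.services.cameras.findOne({ mac })
    //console.log(cam)

    // Keep the newly created record; otherwise cam stays null and the
    // property reads below throw for an unknown mac.
    if ( cam === null) {
      cam = await strapi.services.cameras.create({ "mac": mac })
    }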

    let cameraToUpdate = {}

    if ( cam['board_STATUS'] === 'OFFLINE') {

      cameraToUpdate["offline_sent"] = 0
      cameraToUpdate["board_STATUS"] = 'ONLINE'
    }

    for ( const key in data ) {

		//BATTERY
	if ( key == 'box_V') {
    cameraToUpdate["board_V"] = data[key]

	} else if ( key == 'box_I') {
    cameraToUpdate["board_I"] = data[key]
	  strapi.services['board-log-i'].create({ "camera": cam.id, "value": data[key] })

	} else if ( key == 'box_TTF' ) {
    cameraToUpdate["board_TTF"] = data[key]
	  strapi.services['board-log-ttf'].create({ "camera": cam.id, "value": data[key] })

	} else if ( key == 'box_TTE' ) {
    cameraToUpdate["board_TTE"] = data[key]
    strapi.services['board-log-tte'].create({ "camera": cam.id, "value": data[key] })

  } else if ( key == 'box_BMS') {
    strapi.services['board-log-bms'].create({ "camera": cam.id, "value": data[key] })
	  cameraToUpdate["board_BMS"] = data[key]

  } else if ( key == 'box_T') {
    strapi.services['board-log-t'].create({ "camera": cam.id, "value": data[key] })
	  cameraToUpdate["board_T"] = data[key]

	} else if ( key == 'box_C') {
    cameraToUpdate["board_C"] = data[key]
    strapi.services['board-log-c'].create({ "camera": cam.id, "value": data[key] })

  } else if ( key == 'box_SOC') {

	  let notified = cam['board_SOC_notified']

	  if ( parseInt(data[key]) < parseInt(cam['custom_charge_percentages']['orange_from']) ) {
	    if ( !cam['board_SOC_notified'] ) {

              // filter() instead of splice() inside for...in, which skips entries while iterating
              const usersToNotify = cam.users.filter((user) => user.notify_about_system)

	      strapi.services['notification'].create({ "camera": cam.id, "event": "Low battery.", "event_type": 1, "users": usersToNotify })
	      notified = 1
	    }
	  } else {
	    notified = 0
    }

    cameraToUpdate["board_SOC"] = data[key]
    // Store the computed flag, not the SOC reading itself
    cameraToUpdate["board_SOC_notified"] = notified

    strapi.services['board-log-soc'].create({ "camera": cam.id, "value": data[key] })

	} else if ( key == 'box_STATUS' ) {
    cameraToUpdate["board_STATUS"] = data[key]

    strapi.services['board-log-status'].create({ "camera": cam.id, "value": data[key] })

	  if (data[key].toString() === 'UNPLUGGED') {
	    // filter() instead of splice() inside for...in, which skips entries while iterating
	    const usersToNotify = cam.users.filter((user) => user.notify_about_system)
	    strapi.services['notification'].create({ "camera": cam.id, "event": "Camera has been unplugged.", "event_type": 1, "users": usersToNotify })
	  }

	} else if ( key == 'cv' && !cam.config_version) {
    cameraToUpdate["config_version"] = data[key]

  } else if ( key == 'ncell' ) {
    cameraToUpdate["ncell"] = data[key]

	} else if ( key == 'cell' ) {
	  cameraToUpdate["cell"] = data[key]

  } else if ( key == 'lat') {

	  if ( haversine({ latitude: data['lat'], longitude: data['lon']}, { latitude: cam.gps_lat, longitude: cam.gps_lon }) > (cam.loc_diff + data['hdop']) ) {

	    // filter() instead of splice() inside for...in, which skips entries while iterating
	    const usersToNotify = cam.users.filter((user) => user.notify_about_system)

            strapi.services['notification'].create({ "camera": cam.id, "event": "System Moving", "event_type": 1, "users": usersToNotify })

	  }

	  let lon = data['lon']
	  let lat = data['lat']

	  if (!lon) {
	    lon = '24.117770203344044'
	    lat = '56.98139173494777'
	  }
    cameraToUpdate["loc_lon"] = lon
    cameraToUpdate["loc_lat"] = lat
    cameraToUpdate["fix"] = data['fix']
    cameraToUpdate["loc"] = data['loc']
    cameraToUpdate["gps_hdop"] = data['hdop']

  } else if ( key == 'mob' ) {
    cameraToUpdate["mobile_state"] = data[key]

	} else if ( key == 'co' ) {
    cameraToUpdate["camera_online"] = data[key]

  } else if ( key == 'wo' ) {
    cameraToUpdate["wifi_only"] = data[key]

  } else if ( key == 'tun0' ) {
    cameraToUpdate["clients_vpn"] = data['tun0']

  } else if ( key == 'wlan0' ) {
    cameraToUpdate["wlan_ip"] = data['wlan0']

  } else if ( key == 'vpn_cam' ) {
    cameraToUpdate["camera_ip"] = data[key]

  } else if ( key == 'vpn_ssh' ) {
    cameraToUpdate["ssh_ip"] = data[key]

  } else if ( key == 'script' ) {
    cameraToUpdate["script_version"] = data[key]

  } else if ( key == 'linux' ) {
    cameraToUpdate["sw_vers"] = data[key]

  } else if ( key == 'apn' ) {
    cameraToUpdate["apn"] = data[key]

  } else if ( key == 'wifi' ) {
    cameraToUpdate["wifi_params"] = data[key]

  } else if ( key == 'mobotix_text' ) {
    cameraToUpdate["mobotix_text"] = data[key]

  } else if ( key == 'imei' ) {
    cameraToUpdate["imei"] = data[key]

  } else if ( key == 'update_status' ) {
    cameraToUpdate["update_status"] = data[key]

  } else if ( key == 'rtsp' ) {
    cameraToUpdate["rtsp"] = data[key]

  } else if ( key == 'force_wifi' ) {
    cameraToUpdate["force_wifi"] = data[key]
  }
}

  ctx.response.status = 200

Maybe someone has some ideas what I should check out/ look out for.
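
One way to shrink the if/else chain in the controller above is a lookup table from payload keys to camera fields. The following is a rough sketch under assumptions, not a drop-in replacement: planUpdate is a hypothetical helper, only a subset of the keys is listed, and special cases such as box_SOC and lat would still need their own handling.

```javascript
// Payload key -> camera field. A subset of the names used in the controller above.
const FIELD_MAP = new Map([
  ['box_V', 'board_V'],
  ['box_I', 'board_I'],
  ['box_TTF', 'board_TTF'],
  ['mob', 'mobile_state'],
  ['co', 'camera_online'],
  ['wlan0', 'wlan_ip'],
]);

// Payload key -> log service, for values that also get a history row.
const LOG_SERVICES = new Map([
  ['box_I', 'board-log-i'],
  ['box_TTF', 'board-log-ttf'],
]);

// Hypothetical helper: decides what to write without touching the database,
// so it can be tested in isolation.
function planUpdate(data, cameraId) {
  const update = {};
  const logs = [];
  for (const [key, value] of Object.entries(data)) {
    if (FIELD_MAP.has(key)) update[FIELD_MAP.get(key)] = value;
    if (LOG_SERVICES.has(key)) {
      logs.push({ service: LOG_SERVICES.get(key), entry: { camera: cameraId, value } });
    }
  }
  return { update, logs };
}
```

The controller would then create the planned log rows and issue a single camera update, which keeps the database calls in one place.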

Question for you, is this a single Strapi backend instance or are you scaling it horizontally/vertically?

Galera cluster (MariaDB 10.4) of 3 servers, each with a Strapi instance. All currently running on the $10 droplet / server on DigitalOcean, Linode and Vultr (2 GB RAM, 1 shared CPU; I found out yesterday that it’s shared, maybe that’s part of the problem). The data I’ve provided is from one Strapi instance; there is currently no load balancing or anything of the sort. Everything is being tested on this server only.

Plan is to keep the current number of servers (3), and scale vertically. Not sure if just upgrading the server specs will help - I would still be using only one core, would it handle 100-300 requests every 2-3 seconds to the controller I’ve posted above? I am starting to get a bit sceptical.

Is there an implementation with worker threads that someone has done with Strapi? I was thinking maybe I could delegate one thread for database requests and one for logical/math operations (like the haversine function). But frankly, I’ve never done workers on Node and I wouldn’t want to screw something up beyond repair, although that might help a lot.

P.S. I actually think something might be more than off, these are the Strapi logs (pm2) after editing controller (see below):

0|strapi  | [2021-04-15T06:55:45.389Z] debug POST /cam-data (1023 ms) 200
0|strapi  | [2021-04-15T06:55:52.958Z] debug POST /cam-data (771 ms) 200
0|strapi  | [2021-04-15T06:55:57.023Z] debug POST /cam-data (2249 ms) 200
0|strapi  | [2021-04-15T06:55:57.043Z] debug POST /cam-data (2273 ms) 200
0|strapi  | [2021-04-15T06:55:59.547Z] debug POST /cam-data (753 ms) 200
0|strapi  | [2021-04-15T06:56:03.860Z] debug POST /cam-data (1310 ms) 200
0|strapi  | [2021-04-15T06:56:06.632Z] debug POST /cam-data (955 ms) 200

I edited the controller to see what’s taking up so much processing power:

async create(ctx) {

  const data = ctx.request.body.data

  if ( !data.mac ) {

    return ctx.unauthorized('No mac address - no party.')
  }

  const mac = data.mac

  const cam = await strapi.services.cameras.findOne({ mac }) // another test below without "const cam = await "
  return {} // made a new return earlier, to isolate the problem-maker

Second test, without await and without saving the returned data to a variable:

0|strapi  | [2021-04-15T07:03:03.676Z] debug POST /cam-data (13 ms) 200
0|strapi  | [2021-04-15T07:03:05.475Z] debug POST /cam-data (10 ms) 200
0|strapi  | [2021-04-15T07:03:12.324Z] debug POST /cam-data (12 ms) 200
0|strapi  | [2021-04-15T07:03:14.158Z] debug POST /cam-data (10 ms) 200
0|strapi  | [2021-04-15T07:03:16.043Z] debug POST /cam-data (18 ms) 200
0|strapi  | [2021-04-15T07:03:17.316Z] debug POST /cam-data (13 ms) 200
0|strapi  | [2021-04-15T07:03:18.371Z] debug POST /cam-data (26 ms) 200

I mean, one to two seconds to find a record: I must be doing something seriously wrong, right? It was like 80 ms a few days ago when running the full code of the controller.

Found the problem:

The trouble begins when I call
const cam = await strapi.services.cameras.findOne({ mac })
It turns out that the cam had accumulated ~3500 notifications, which come in the form of a relation. So whenever I asked for this system’s data, I also got the data of all 3500 notifications.

That is why it was so slow. But this makes me think:

Even if I had 3500 notifications associated with the system, would I still experience slower speeds if I specified only the relations I need using the populate argument? (Will report back here when done experimenting.)

 strapi.query('restaurant').find(params, populate);

I’ve noticed that using await can be quite blocking; maybe there is a way to use a callback function instead of await. Would it really matter, though, and make requests faster to process?
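
A note on the await question: await pauses only the async function it appears in, not the Node event loop, so swapping it for a callback would not make a slow query any faster. What can help is starting independent calls together instead of awaiting them one by one. A minimal sketch follows, where fakeQuery is a hypothetical stand-in for a database call.

```javascript
// Hypothetical stand-in for a database call that takes `ms` milliseconds.
const fakeQuery = (ms) => new Promise((resolve) => setTimeout(() => resolve(ms), ms));

// Awaiting one call after another: total time is roughly the sum of all three.
async function sequential() {
  const start = Date.now();
  await fakeQuery(50);
  await fakeQuery(50);
  await fakeQuery(50);
  return Date.now() - start;
}

// Starting all calls first and awaiting them together: roughly the slowest one.
async function concurrent() {
  const start = Date.now();
  await Promise.all([fakeQuery(50), fakeQuery(50), fakeQuery(50)]);
  return Date.now() - start;
}
```

This only applies to calls that do not depend on each other's results, such as the separate log inserts in the controller.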

On scaling vertically: there are diminishing returns on that. In testing (I have a Threadripper 1950X, 16c/32t) with PM2 clusters, I didn’t see much benefit from a Strapi PM2 cluster larger than about 12 instances, due to the overhead in PM2 of distributing the load across the cluster.

On being limited to one core: only if you aren’t using PM2 clusters.

On worker threads with Strapi: not currently, no, but again, PM2 clusters.
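
For readers unfamiliar with PM2 cluster mode, a configuration along these lines is one way to start several Strapi processes behind PM2's built-in load balancing. This is a sketch only: the app name, the server.js entry file and the instance count are assumptions, not details from this thread.

```javascript
// ecosystem.config.js (sketch; name, script and instances are assumptions)
module.exports = {
  apps: [
    {
      name: 'strapi',
      // Assumed entry file that starts Strapi programmatically,
      // e.g. require('strapi')().start()
      script: 'server.js',
      exec_mode: 'cluster', // let PM2 spread requests across the instances
      instances: 2,         // usually no more than the number of available cores
      env: { NODE_ENV: 'production' },
    },
  ],
};
```

It would be started with `pm2 start ecosystem.config.js`. On a server with a single shared CPU, extra instances compete for the same core, so the gain there is likely to be small.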

If you don’t need to populate those relations you can modify that to be:
const cam = await strapi.services.cameras.findOne({ mac }, []) which won’t populate the notifications (or any relation).
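
Since the controller in this thread also reads cam.users, a variant that loads only that relation may be closer to what is needed. The sketch below assumes the relation is named users, as in the controller; findCameraForIngest is a hypothetical helper, and strapi is passed in as a parameter so the snippet can run outside a Strapi app.

```javascript
// Hypothetical helper; `strapi` is passed in so the sketch runs outside Strapi.
async function findCameraForIngest(strapi, mac) {
  // The second argument is the populate list: [] loads no relations at all,
  // ['users'] loads only the users relation and skips notifications.
  return strapi.services.cameras.findOne({ mac }, ['users']);
}
```

Other populated attributes the controller depends on would need to be added to the list in the same way.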

I’m not sure I understood correctly: are you saying that in this case (empty array brackets as the populate argument) I will get all cam data except the relations? That’s really cool!

P.S. Sorry for the late reply, but thank you so much @DMehaffy, you’ve been a huge help!