Strapi backend application keeps getting `ECONNRESET`

System Information
  • Strapi Version: “^3.6.6”
  • Operating System: Amazon Linux (deployed in ECS fargate). Dockerfile is using strapi/base
  • Database: mysql
  • Node Version: >=10.16.0 <=14.x.x
  • NPM Version: ^6.0.0
  • Yarn Version:

We are using Strapi for our service, but we keep encountering errors that we cannot reliably reproduce and that most often occur during low traffic. Any tips are appreciated. Thank you :bowing_man:

Whole Flow Setup

  • Frontend (FE)
    • FE AWS ALB → Apache (=mod_rewrite + mod_proxy) → Nuxtjs (FE)
  • Backend (BE) = Strapi
    • Nuxt.js (FE) → BE AWS ALB → Strapi (BE)

Errors

Message: [2022-04-12T02:56:04.741] [ERROR] default - FetchError: request to https://URL_HERE failed, reason: read ECONNRESET
   at ClientRequest.<anonymous> (/PATH_TO_FILE/index.js:1461:11)
   at ClientRequest.emit (events.js:400:28)
   at TLSSocket.socketErrorListener (_http_client.js:475:9)
   at TLSSocket.emit (events.js:400:28)
   at emitErrorNT (internal/streams/destroy.js:106:8)
   at emitErrorCloseNT (internal/streams/destroy.js:74:3)
   at processTicksAndRejections (internal/process/task_queues.js:82:21) {
 type: 'system',
 errno: 'ECONNRESET',
 code: 'ECONNRESET'
}

Solutions that were implemented but did not fix the issue

  1. Strapi’s keepAliveTimeout and headersTimeout set to the same value
    • strapi service’s load balancer idle timeout: 300 seconds
    • strapi service’s keepAliveTimeout: 301 seconds
    • strapi service’s headersTimeout: 301 seconds
  2. Strapi’s headersTimeout set greater than keepAliveTimeout
    • strapi service’s load balancer idle timeout: 300 seconds
    • strapi service’s keepAliveTimeout: 301 seconds
    • strapi service’s headersTimeout: 302 seconds
  3. Both headersTimeout and keepAliveTimeout disabled (set to 0)
    • strapi service’s load balancer idle timeout: 300 seconds
    • strapi service’s keepAliveTimeout: 0 seconds
    • strapi service’s headersTimeout: 0 seconds

502s are gateway failures, typically in a proxy layer where the proxy can’t talk to the upstream service. You say that you have Apache with mod_proxy for your frontend, but I see no proxy layer except the ALB there; is that the only proxy layer?

Yes. We have set this up via Apache.

  • Frontend (FE)
    • FE AWS ALB → Apache (=mod_rewrite + mod_proxy) → Nuxtjs (FE)

The frontend application processes requests, and requests to someEndpoint are handled by the Nuxt.js application. This is configured in Apache as shown below.

        SetEnv proxy-nokeepalive 1
        SetEnv proxy-initial-not-pooled 1
        ProxyPass /someEndpoint http://localhost:3000/someEndpoint
        ProxyPassReverse /someEndpoint http://localhost:3000/someEndpoint

and the Nuxt.js application then communicates with our Strapi backend through the ALB, as described in the flow below:

  • Nuxt.js (FE) → BE AWS ALB → Strapi (BE)

However, our Strapi service sometimes responds with the ECONNRESET error. This happens on requests that only touch the Strapi service, and during low traffic.

ECONNRESET is generally a network-layer issue where the connection was closed/reset. It could be happening at the ALB or between Strapi and its database, but I would need more information to know for sure.
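One mitigation worth noting (an assumption on my part, not something tried in this thread): when resets come from a race on idle keep-alive connections, idempotent calls can simply be retried once on ECONNRESET. A minimal sketch, where `retryOnReset` and the retry count are hypothetical names I'm introducing for illustration:

```javascript
// Hypothetical helper: retry a request when the connection was reset,
// which is generally safe for idempotent requests such as GETs.
async function retryOnReset(fn, attempts = 2) {
  for (let i = 0; ; i += 1) {
    try {
      return await fn();
    } catch (err) {
      // Only swallow transient connection resets, and only up to `attempts`.
      if (err.code !== 'ECONNRESET' || i >= attempts - 1) throw err;
    }
  }
}
```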

I’m trying to resolve what appears to be the same thing. I’m getting intermittent 502 “failed to fetch” errors in my Sentry logs, only 1 or 2 a day. I’ve tried different RAM and CPU configurations, since ECS is hosting the Strapi v3 app.

How did you configure Strapi’s keep-alive timeout? I can’t find any info on where to toggle that.

To troubleshoot the problem, you can first put Nuxt in front of the ELB and temporarily remove Apache.
You should be able to tune Nuxt.js, via a hook in the Nuxt config, with a keepAliveTimeout and headersTimeout larger than the ELB’s idle timeout.

In Strapi, do the same in its bootstrap.

You can try setting it in /src/index.js; you can bootstrap it with the following code.
I think turning on keepAlive is necessary. Hope this helps; for any detailed settings, feel free to leave me a message.

    bootstrap({ strapi }) {
      strapi.server.httpServer.keepAliveTimeout = 35000;
      strapi.server.httpServer.headersTimeout = 36000;
      strapi.server.httpServer.keepAlive = true;
    },


@Gid Did this solution help you or what was the problem in the end?

Sorry, I’m no longer associated with my previous company where I encountered this issue, and wasn’t able to check on this for the longest time due to difference of domain/tech stack in my current work.

IIRC we did adjust Strapi’s keepAliveTimeout, headersTimeout, and keepAlive settings, but it did not totally fix the issue. We tried different combinations of timeout configurations, but we still couldn’t fix and/or understand the issue. It did lessen (though not significantly) the occurrence of the error, but the issue still occurred.

Thank you for your detailed answer.