Database Migration / Deployment Questions

dirtybirdnj · October 13, 2020, 10:24pm

Hello,

I am working with a small team to use Strapi as our API for development on a project. One of the things I’m used to with other frameworks is the ability to control changes to the database.

It seems to me a completely missing feature of Strapi to not have any control over generating schema files, creating a fresh database from a given codebase, and then filling this database with seed data. This is all well and good for local development, but this becomes problematic when you deploy new code attached to a database schema created by old code.

The main thing I’ve been struggling to find is a tool that will let me dump the schema of the database that Strapi generates. This would allow me to check something into the repo that could be run during local setup / build processes.

Can anybody shine a light on how these things are handled in active development or production?

Right now we are in development so it’s ok to plan to backup a local dev db, pull down latest changes to our strapi codebase on the server, and then restore the local DB the server to ensure correct structure. I would prefer to just run a command that told strapi to update it’s database after I pulled the latest code down.

An additional fold to this is usually seeding comes after migration… any ideas on how teams are handling seeding and migration would be greatly appreciated.

My dream scenario would be some sort of CLI tool that allowed you to create knex migrations from existing database schema. That would totally solve my problem… but it doesn’t exist. If I want to use knex migration tools with strapi, unless I’m missing something I’d have to manually create schema files and also manually keep track of any DB changes strapi makes and know to make new migration files any time a model has changed.

Working with strapi so far has been AMAZING but this is the first real issue I haven’t been able to find a solution to in the docs or on stackoverflow

chocksy · October 14, 2020, 2:54pm

From the discussions i saw on strapi github and the forum there is no way currently for this to work.

After contacting the community on slack i received a message that we should continue the discussion on the forum. So instead of making a new thread i think we should follow up here.

What i want to propose is that we team up and build set plugin to handle this.

Some of the good ideas:

use a strict setting that enables this so strapi does not make new columns or anything automatically so we can control the db changes
create the CLI commands for handling migrations like: strapi generate migration, strapi run migration, strapi schema load
see how we can handle these generators and commands by calling the existing DB layer that currently is using knex
knexjs already has a migration process with cli methods and everything: http://knexjs.org/#Migrations-CLI we should try and get a layer on top of this or suggest a change in the existing strapi DB layer to call these.

I would love some feedback on this so we know what would be the next steps here.

dirtybirdnj · October 14, 2020, 7:45pm

What you are proposing is very cool. I’d be interested in helping out with adding some of this functionality to the strapi codebase.

The list you provided offers a perfect roadmap of what’s missing:

strict mode / automigrate on or off (admin setting? .env config?)
cli commands for:

Generating migration files from existing database schema
Executing the migration against the database
Executing knex seeds against the now fresh migrated but empty database
I don’t know if “strapi schema load” is necessary because the admin CMS is what’s creating those files to begin with. I am hoping to focus effort on creating the knex artifacts required to reproduce schema and data independently.

I agree wholeheartedly with using knex as a base for this functionality, it’s the node way of handling this stuff and seems like a natural fit.

I’m available on the Strapi Slack as “Mat Gilbert” please reach out to me there. I’d love to get connected with whoever can point me in the right direction on creating tickets / proposals for this functionality that would lead to pull requests on the strapi repo.

bykof · October 14, 2020, 8:23pm

So the whole topic is really complex if you start to work on it… here are some problematic points that I see:

If we implement a migration plan for Strapi, what do we do with the existing models? For example all the models of strapi-admin or strapi-upload… Surely we can create a “starting” migration, but this will lead also to pretty hard problems
Strapi Models and Strapi Migrations should go hand in hand, there is no opt in… If you create a model over the interface there should be always a migration created otherwise this will lead to inconsistency. For example if I create a model with the UI strapi creates me a table in the database. Now add some field X to the model. Now I want to rename X in Y. At this point Strapi would create just the field Y and leave X as it is. Now I would write a migration to copy all data from X to Y. After that I delete the field X with another migration. And here is the problem: If I setup a new server with this strapi models and migrations this would create an error. Strapi would create all fields automatically, so the model with field Y is created. The field X would never have been created and therefore the migrations would become useless.
The best example for migrations are Django Migrations. You write a new model. After that you run: ./manage.py createmigrations then django will compare the model with the database and will create an automatic generated migration for you unless there are questions. And here comes the nice stuff: If you want to have a new field which is not nullable you surely have to define a default value. You can define it over the CLI or programmatically, but Django tries really to automate all the stuff as far as it is able to do so.
First we have to define what we really need. I think for now it’s really hard to fulfill all interests. We should talk about following points:

- what is the purpose of migrations in strapi?
- what is realistic?
- what fits into the strapi ecosystem?

Cheers guys, looking forward to read from you

dirtybirdnj · October 14, 2020, 9:07pm

At least for my use case, I’m less interested in any of the strapi base models and only concerned with things that are created through the content builder. The only tables / models that should show up in the migration artifacts should be ones that are not created by strapi itself / base plugins. Let strapi do it’s thing to manage internals, give developers control over how the new_field gets added to the developer created model.

The main purpose behind my request is a config file to dictate how the database gets built, while still allowing users / developers to leverage the benefits of the content builder UI. There are lots of forum posts about extra fields / tables being left behind and having explicit control over what fields exist in the migration file… wouldn’t it be nice to be able to delete those fields from the migration, reload the DB and have them be cleanly gone?

One thing I’d like to clarify on migrations is the concept of an initial migration vs incremental development migrations. As a starting point, the first goal would be to have a single migration file that gets added to / updated. This would allow avoiding the overhead of writing code that looked at strapi models, looked at migrations and decided what commands went into the second, third etc knex migration file created.

As far as what’s needed the main “thing” to make at this point is something that looks at a relational DB (mysql, postgres) and generates a knex migration .js file. There is nothing particularly strapi about this… in fact I bet other frameworks would benefit from this functionality.

To bridge the gap between migrations and (somewhat) unrelated seeding… if you have explicit control over what the structure of the DB is at build time, you have a much better time creating seed files that will work with the underlying schema.

Maybe I’m looking at this wrong and I just need to hear some guidance on how people have managed these DB issues in real world scenarios? I’m part of a small team using strapi for the first time so there are definitely things I’m not familiar with / aware of yet. We are working on an early product so the data model changes frequently and this is where strapi shines. The way strapi auto-migrates stuff makes me nervous and looking forward to supporting something we build in strapi makes me want to know more about how to control what it does to the DB in a predictable, granular way. This could be a new fancy CLI tool, or it could also just be as simple as a readme that covers a few workflows like seeding a new DB or deploying changes that modify existing models to a “production” dataset.

To provide some context, here’s our current git / db changes workflow:

ssh into dev server
running strapi on our dev server with pm2 via npm run start. This apparently runs strapi in a way where the auto-reload feature doesn’t function… and conveniently seems to prevent DB changes from occurring. This functionally provides me that “automigrate on/off” I described in a previous post.
When we want to update the app, pm2 stop strapi to halt the process
git pull from the project directory.
Then when I run npm run develop, the app makes the changes to the database and the new changes are reflected in the UI.
Once that’s confirmed I kill the manually started process and pm2 start strapi to put the admin UI back in “don’t make changes” mode.

Thank you guys so much for the constructive discussion on this!

chocksy · October 16, 2020, 3:55pm

In the meantime here is what I did to make the missing pieces work. I created this knexfile.js into the root of my project:

  const strapi = require("strapi")({ serveAdminPanel: false });

  module.exports = {
   client: "pg",
   connection: strapi.config.database.connections[strapi.config.database.defaultConnection].settings,
   pool: {
     min: 2,
     max: 10,
   },
   migrations: {
     tableName: "knex_migrations",
   },
 };

This allows me to run the knex migration CLI commands so I can leave strapi with creating the columns as it usually does but also be able to generate migrations to delete columns or update them if needed.

So the above means that I can run: yarn knex migrate:make test to generate a new migration.

And I can later to this yarn knex migrate:latest to run set migration.

Read more here: http://knexjs.org/#Installation-migrations
Tutorial: Database Migrations with Knex

This should temporarily solve the problem for most people but I believe we still should try and address the plugin solution so we can have an option to stop strapi from generating new columns on it’s own or if it does that to generate a migration file instead of just adding the column and there is no actual history of it’s actions.

For me at least that would be enough. I don’t think we should try and solve the problem of renaming a column that is too complex and it requires knowing the past column name. If a rename happens we can have strapi generate a new column and then we can make a new migration as I did above that would copy the data from the old column and later drop that.

dirtybirdnj · October 16, 2020, 4:43pm

chocksy,

Unless I’m unaware about a feature of knex… the only thing missing from your workflow is something that will generate the knex migration files for you. Are you manually looking at each model.settings.json file and creating the initial migration file from that?

so we can have an option to stop strapi from generating new columns on it’s own or if it does that to generate a migration file instead of just adding the column and there is no actual history of it’s actions

100% agree, the end goal right now should be focused on reproducing the latest state of the data model and not figuring out the difference since the last migration

DMehaffy · October 16, 2020, 4:58pm

I agree we need to maintain a history of the model state, we do store some of current state within the core_store table (though that is probably just legacy storage, I honestly couldn’t tell you where it’s used) but I entirely agree here.

Store a snapshot of the state
Make file changes
Update some state history of the change
Generate the Knex migrations

Just wanted to say thank you guys for digging into this and I look forward to seeing what you guys discover.

bykof · October 19, 2020, 6:44am

In the meantime I also discussed this topic with my team and we came to following solution:

We will also use Knex.JS (with a knexfile.js like @chocksy already mentioned) to write, migrate and rollback migrations.

Before Strapi starts we will start the migrations (we use Docker and will write a small startup.sh, which will run yarn knex migrate:latest before yarn start.

Since Strapi always tries to create tables and fields to ‘the current state’ we just have to migrate the database to the ‘current state’ and then we are able to use migrations as optin solution, which is brilliant.

I would suggest that we/I write a blog or documentation post about how to integrate migrations into strapi. Then we don’t have to change the core or touch anything.

Regarding the plugin solution, I am not really sure, if a plugin would be so useful here. I think the major reason for a plugin is something visual with models, but in this case we want to go really deep into the application which should be “static” and not “dynamic” or smth. which is shown to the user.

Cheers guys, love the discussion here

DMehaffy · October 19, 2020, 5:48pm

I would love to take a look at anything you write, depending on the structure and content we can either include it in our documentation or would happily publish a blog post.

chocksy · October 20, 2020, 3:29pm

Thanks a lot @bykof & @DMehaffy for the details. I followed your messages in the thread and yes i also believe we should add these details to the documentation. At least there is some kind of solution even though not ideal.

I wanted to provide the final database.js and knexfile.js file formats i ended up with. I have my application on heroku and i had to adjust to use the DATABASE_URL variable so it’s easier to set up.

The database.js file:

const url = require("url");

let settings = {
  client: "postgres",
  host: `${process.env.DATABASE_HOST || "127.0.0.1"}`,
  port: `${process.env.DATABASE_PORT || 5432}`,
  database: `${process.env.DATABASE_NAME || "obituare_staging"}`,
  username: `${process.env.DATABASE_USERNAME || "postgres"}`,
  password: `${process.env.DATABASE_PASSWORD || "postgres"}`,
};

if (process.env.DATABASE_URL) {
  const parsed = url.parse(process.env.DATABASE_URL, true);
  const [username, password] = parsed.auth.split(":");

  settings.host = parsed.hostname;
  settings.port = Number(parsed.port);
  settings.database = parsed.pathname.substr(1);
  settings.username = username;
  settings.password = password;
  settings.ssl =
    process.env.DATABASE_SSL === undefined
      ? { rejectUnauthorized: false }
      : JSON.parse(process.env.DATABASE_SSL);
}

module.exports = {
  defaultConnection: "default",
  connections: {
    default: {
      connector: "bookshelf",
      settings,
      options: {
        useNullAsDefault: true,
        pool: {
          acquireTimeoutMillis: 10000,
          createTimeoutMillis: 10000,
        },
      },
    },
  },
};

And here is the knexfile.js:

const strapi = require("strapi")({ serveAdminPanel: false });

let connection = strapi.config.database.connections[strapi.config.database.defaultConnection].settings;
let db_ssl = false;

if (process.env.DATABASE_URL) {
  connection = process.env.DATABASE_URL;
  db_ssl = true;
}

module.exports = {
  client: 'pg',
  connection: connection,
  ssl: db_ssl,
  pool: {
    min: 2,
    max: 10,
  },
  migrations: {
    tableName: "knex_migrations",
  },
};

As you can see both look for DATABASE_URL and use that if it exists. There have been some issues with ssl on heroku so that’s how i set those also.

Also in order to make sure developers on our team run migrations first we changed package.json develop script to this: "develop": "knex migrate:latest; strapi develop", though not required.

Thanks.
PS: i’ll try and still look at a way to maybe have strapi generate migrations that would make things perfect.

DMehaffy · October 20, 2020, 4:39pm

Yeah I’m wondering what is the best way to handle that myself as I’ve never used the knex migrations. Are these files generally something we would commit into source control?

chocksy · October 20, 2020, 8:23pm

Yes they need to be commited into the source control. These are a way for the database to be in sync for the whole development team of the project. So if i a developer make a change to the database everybody gets that easily by running one command.

That’s how changes to the database are handled with relational databases. I usually develop using rails and that has also a schema.rb file that has a view of the whole database. You can run one command and you will get the whole DB structure loaded.

jorisw · October 21, 2020, 7:53am

Precisely what I’m looking for. A schema file of the content types, ready to be deployed along with the code, meaning Strapi sets up the database structure and api on first run, post-deploy.

Custom controllers/models could still live in version controlled code, perhaps separately from the code Strapi generates.

bykof · October 21, 2020, 7:55am

Still I am really concered about manual migration and the automigration of strapi.
I see here following problem:
I want to write a migration to inject some data into my application.
Let’s say I have following model:

Notes
- description

Now I write a migration with knex:

exports.up = function(knex) {
    return knex('notes').insert([
      {
        description: 'my first note',
      },
      {
        description: 'my second note',
      },
    ]);
}

exports.down = function(knex) {
  return knex('notes').del();
}

This migration should run after strapi started, because then the field exists in the database, thanks to strapi automigration.

Now I want to rename the field from description to text.
I create the field in the notes.settings.json or add it via the content-type manager and delete the created column in the database.

Then i write following migration:

exports.up = function(knex) {
  return knex.schema.table('notes', (table) => {
      table.renameColumn('description', 'text');
  })
}

exports.down = function(knex) {
  return knex.schema.table('notes', (table) => {
      table.renameColumn('text', 'description');
  })
}

This migration should run before strapi starts otherwise it would create the field before hand and the migration would fail, because there is already an existing migration.

By writing this problem I found the solution.
One could use seed files with knex:
knex seed:run

The seed file from the 1st. migration would look like following:

exports.seed = function (knex) {
  return knex("area").insert([
    {
      areaId: "MyNewAreaId",
      name: "My first area",
    },
    {
      areaId: "MySecondAreaId",
      name: "My second area",
    },
  ]);
};

Edit: Ok problem here is that the run of seed files isn’t stored in the database, so that knex doesn’t know if a seed was already executed or not.
So seeds are used only for first time data inserting.

Maybe a first “rule” could be derived.
Migrations should always run before strapi starts up, seeds should always run once and after strapi starts up.

baptisteArno · December 23, 2020, 9:53am

I came to the same conclusion. We need to deal with Strapi automigration.

The best way not to think about whether you should run the migrations before or after strapi starts is to always run the migrations after and follow this procedure:

Make changes on the Strapi Content Type Builder
Check the database state
Create a migration that cleans up what’s not necessary anymore

Then if you take your first example. If you make the corresponding changes in the Content Builder first, you’ll see that Strapi will duplicate description column to a new text column. Then the only migration you need to do is to drop the description column:

exports.up = function (knex) {
  return knex.schema.table('notes', (table) => {
      table.dropColumn('description');
  })
};

exports.down = function (knex) {
  return knex.schema.table("notes", (table) => {
    table.renameColumn("text", "description");
    // Will this remove the "text" column? I'm not sure
  });
};

TasosBak · March 11, 2021, 2:59pm

Hello everyone! What a great input for everybody! I learnt so much from this thread! Has anybody come up with anything new on this issue?

DMehaffy · March 11, 2021, 9:06pm

Just as a remind, in Q3 of 2021 we will be working on native database migration support for Strapi (among other massive database changes)

Maxime_Dufresne · March 23, 2021, 5:50pm

Until the end of the year (my guess) when we could have the migrations built into Strapi, I would be very interested on a blog post or documentation page about a way to handle content-type changes and content migration after first deployment.

I’m relatively new to web development and I must admit I was a bit confused reading the different possible solutions provided in this thread.

What I’m getting with my current understanding is that I should not change my content types after deployment.

bykof · March 24, 2021, 8:25am

Changing your content-types after deployment is only possible, if you start Strapi in “develop” mode.
The question is, if you want to allow users to change fields while running Strapi in production. I would in 99% say no. It’s better to have a repository, where you store your current strapi application. Developers define the content-types in develop mode, commit, push and deploy that to a server. On the server you start strapi in production mode with “yarn start”. Then you won’t be able to modify the content-types anymore, what gives you the safety that no one of the editors will break your instance.

And here comes the problem of database migrations.
Let’s say you have already deployed, for example, a Product model with fields: Title, Description, Price.
Now you want to add the field Category to the Product, which is mandatory and can’t be null, you will expect a problem, because Strapi could create the field in the database, but will never know how to assign categories to your products properly, which will result in a database error, because you cannot alter a non-nullable field with null (unknown) values.
So what you have to write is a migration before Strapi creates the fields in the database.

A migration for that problem would look like following:
First create the field as nullable field, set the values for all products, make the field non-nullable.

I hope I could explain you a bit the problematic, which we get here
Cheers!