Altering tables in postgres

System Information
  • Strapi Version: 3.2.5
  • Operating System: multiple - dev: mac/win, docker; prod: heroku
  • Database: postgres
  • Node Version: 12
  • NPM Version: multiple
  • Yarn Version: ?


Hi peeps!
I wanted to ask if anyone can shed some light on the intricacies and caveats of manually altering PG tables.

Motivation

  1. Managing data integrity is always a good thing, and postgres (as well as any db) is very well equipped to do so. Strapi does not support composite primary keys or composite unique constraints, both of which are integral to enforcing data integrity.
  2. Supporting a shared-tables multi-tenancy approach for those of us building a SaaS product. To explain it quickly, this approach entails adding a tenantID column to almost every table; to mitigate security risks and potential human error from careless devs, it would be prudent to leverage Postgres’ row level security policies and validate that each data request’s origin matches the tenant for each row (see the sketch after this list).
  3. Scalability - given the multi-tenancy, it would be prudent to force the clustered index to include the tenantID; this way each tenant’s data is very likely to end up in a single shard, avoiding network I/O in joins, transactions, and foreign key constraints.
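
To make points 2 and 3 concrete, here is a minimal Postgres sketch of what I have in mind; the booking table, the integer tenant IDs, and the app.tenant_id session setting are hypothetical names I picked for illustration, not anything Strapi provides:

    -- Hypothetical table; tenant_id is the column added to almost every table.
    -- (On an existing table you would backfill the column before SET NOT NULL.)
    ALTER TABLE booking ADD COLUMN tenant_id integer;

    -- Row level security: rows are only visible/writable when they belong to
    -- the tenant named by the app.tenant_id session setting.
    ALTER TABLE booking ENABLE ROW LEVEL SECURITY;
    ALTER TABLE booking FORCE ROW LEVEL SECURITY;  -- apply to the table owner as well

    CREATE POLICY tenant_isolation ON booking
      USING (tenant_id = current_setting('app.tenant_id')::integer)
      WITH CHECK (tenant_id = current_setting('app.tenant_id')::integer);

    -- Point 3: an index led by tenant_id keeps each tenant's rows together;
    -- Postgres has no maintained clustered index, but CLUSTER can reorder on it.
    CREATE INDEX booking_tenant_idx ON booking (tenant_id, id);

Note that the policy only applies to roles without the BYPASSRLS attribute (superusers bypass it), so which role the application connects as matters.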

Question

As someone who is not familiar with Strapi internals, I’m asking the community for advice on this matter. Could anyone give me a heads-up on pitfalls, gotchas, what to look out for, and what to avoid?

I think the information you are able to provide to us (and likewise us to you) would be a good topic for an RFC: GitHub - strapi/rfcs: RFCs for Strapi future changes

First off, please forgive my ignorance when it comes to complex PostgreSQL DBA topics; I am more of a MySQL/MariaDB guy myself, so you are probably far more knowledgeable there :slight_smile:

  1. I absolutely agree that data integrity is a great thing, almost all of our supported databases have at least some method to do so (Yes even MongoDB :laughing:)
    1a. Regarding primary keys, we currently don’t allow customization of this due to some hard-coded variables, although we do plan to add it back in (it was removed when we went from beta => stable). This is more of a refactor of the database layer, which we have planned for this quarter: Database layer (v4) - Roadmap | Product Roadmap. However, we are still determining the scope of that.
    2a. I’m not sure I fully understand the composite unique constraints; could you give me a brief summary? We do already add in some unique constraints.
  2. Shared tables might be a bit more complex of a topic once we plan to introduce the multi-tenancy feature; for the moment we don’t really have any suggestions on this :thinking: I do agree that row level security policies would be awesome to have. The only caveat is that when we add features, we generally require that all of our supported databases are also covered in at least some capacity; if something can’t be done at the database level for every database, then it’s implemented at the application layer. (Where possible, we try to keep the split between SQL and NoSQL as minimal as possible.)
  3. This one I’m not entirely sure I agree with, but I’m largely unfamiliar with PG clusters and have more experience with MariaDB Galera clusters, where DB host <=> DB host I/O isn’t really all that difficult since all read and write nodes are kept in sync and a single query won’t leave the DB host it originally connected to.

Hi Derrick,

Thanks for your response!

rfcs

I’m not sure what you mean by this, as I’m not requesting a code change but rather, as you mentioned, just some information/insights. I’m trying to figure out what would happen, from Strapi’s perspective, if I made those changes in the PG DB.

composite unique constraints

As Strapi works with surrogate primary keys, the way to avoid duplicates in place of a composite natural PK would be a composite unique constraint. For example, take a system where we have multiple tenants, and each tenant has classes:

class:

Key  Column
PK   class_id
     tenant_id
     class_name

For each tenant, class_name should be unique. Therefore, a natural candidate key here is the pair (tenant_id, class_name); since we use a surrogate PK instead, the matching composite unique constraint covers the same pair. To elaborate further:

In the following table, you can have:

class_id  tenant_id  class_name
1         t1         “morning yoga”
2         t2         “morning yoga”

but if you then try to add (3, t1, “morning yoga”), it should throw an error, despite having a unique PK.
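
In SQL terms (a sketch, reusing the table and column names from the example above), the constraint would be:

    -- The surrogate PK stays; the natural candidate key becomes a composite
    -- unique constraint so per-tenant duplicates are rejected by the database.
    ALTER TABLE class
      ADD CONSTRAINT class_tenant_id_class_name_unique UNIQUE (tenant_id, class_name);

    -- With the rows above already present, this now fails with a unique_violation:
    -- INSERT INTO class (class_id, tenant_id, class_name) VALUES (3, 't1', 'morning yoga');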

postgres specifics

I understand that you want to keep the same functionality across DBs, but Postgres does have some features that make it very advanced - row level security being one of them.
How would you recommend that we expand upon the Postgres-specific database layer? If not to contribute back to the community, then at least for our own use.
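
For context, the runtime half of a row level security setup is small; a sketch, assuming a policy that reads an app.tenant_id session setting (as above) and an application role without BYPASSRLS:

    -- Per request, the application (or a middleware) tells Postgres which tenant
    -- it is acting for; SET LOCAL scopes the setting to the current transaction.
    BEGIN;
    SET LOCAL app.tenant_id = '42';
    SELECT * FROM class;  -- the policy filters this to tenant 42's rows only
    COMMIT;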

sharding

My comment about scalability was not about read replicas, but rather about partitioning of tables, or sharding. Having tenant_id as part of the PK would cause the sharding/partitioning mechanism to keep all of a tenant’s data in a single shard. I believe MariaDB has a similar mechanism, but I’m not sure.

Should a tenant’s data be split across shards, each transaction, join, or FK constraint update/cascade would require network I/O between the partitions/shards that contain the data.
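
As a sketch of what I mean in Postgres terms (declarative partitioning, available since PG 11; hypothetical DDL, and true multi-host sharding would need something like Citus on top):

    -- Hash-partitioning on tenant_id keeps all of one tenant's rows in a single
    -- partition; note that any PK/unique constraint must include the partition key.
    CREATE TABLE class (
        class_id   bigserial,
        tenant_id  integer NOT NULL,
        class_name text    NOT NULL,
        PRIMARY KEY (tenant_id, class_id),
        UNIQUE (tenant_id, class_name)
    ) PARTITION BY HASH (tenant_id);

    CREATE TABLE class_p0 PARTITION OF class FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE class_p1 PARTITION OF class FOR VALUES WITH (MODULUS 2, REMAINDER 1);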

Hi,

Is this issue resolved? I need something similar for Postgres to what @Joey_Baruch mentioned in the last reply.