Config logic overhaul
The config logic has become a bit of a monster in complexity so much that it's hard to make any change without worrying about a cascade of side effects, some of which can result in severe disruption.
The `config.json` file is also affected by filesystem level race conditions where two processes could potentially write/read concurrently without any form of locking (e.g. a CLI command running while the server is reading).
As a result the config could get in a bad state, or even completely reset to defaults. (See linked issue)
The goal of this ticket is to do a complete review of the config logic and possibly improve it by either eliminating race conditions or at least removing unnecessary (and possibly erroneous) writes.
QA Test Steps
Btw, I found another nice nugget, which I guess was already noted by another customer somewhere before.
In a clustered environment, with config in DB, if a config update happens, the config is persisted in the DB, and broadcasted to all nodes. But when it reaches to the other nodes, each node again persists the config in DB. So essentially, each config update becomes N times the number of nodes in a cluster.
Essentially, the config update isn’t aware whether the backing store is shared or not. Something that should be done in the overhaul.
I just did the following:
edit config in vim
run sudo systemctl restart mattermost
and the config got wiped. the service file is the one from our documentation.
this happened on 5.32.1
- You have just proved my theory correct I am talking about the fact that you actually don’t even need to restart Mattermost after you edit config.json. Mattermost picks up the changes automatically for you. Unfortunately, this automatic logic brings with it a host of race conditions and bugs.
But it’s clear that customers are anyways restarting Mattermost, thereby not using this feature at all. So we should be good to remove this feature.
Happy to explain more if you have any more questions. Just DM me
- I might be misunderstanding you here but from a support perspective a majority of our customers are editing the config.json file directly and restarting Mattermost. A very small % that open tickets use config in the db and an even smaller % know how to mmctl/cli config changes.
I could be misunderstanding your comment, but just wanted to share that perspective.
Reminder: as part of this ticket it would be good to verify all possible redundant events triggered by config changes.
From a customer:
“on an actual change, config_changed still happens once per server node and plugin_statuses_changed still happen twice per server node for some reason”