Idempotence / self-healing: the system should be built in such a way that it tries to reach the correct end state, even if the current state is wrong. For instance, every time our system gets an update, it will re-evaluate the calculation from first principles, instead of doing a diff based on what was there before. This prevents bad data from snowballing and becoming a catastrophe.
Giving yourself knobs to twiddle in production: at work we have ways of triggering functionality in the system on request. Basically calling a method directly on the running process. This is so, so useful in prod issues, especially when combined with the above. We can basically tell the system “reprocess this action/command/message” at any time and it will do it again from first principles.
Debugging: I always first try and find a way to replicate it quickly. Then, I try and simplify it one tiny step at a time until it’s small enough I can understand in one go. I never combine multiple steps per re-run and always verify whether the bug is there or not at every single stage. This can be quite a slow approach but it also means I am always making progress towards finding the answer, instead of coming up with theories which are often wrong, and getting lost in the process.
Would you be willing to give an example of the second? I feel like my boss would throw a shitfit if I told him I wrote anything that even remotely alter prod
Idempotence / self-healing: the system should be built in such a way that it tries to reach the correct end state, even if the current state is wrong. For instance, every time our system gets an update, it will re-evaluate the calculation from first principles, instead of doing a diff based on what was there before. This prevents bad data from snowballing and becoming a catastrophe.
Giving yourself knobs to twiddle in production: at work we have ways of triggering functionality in the system on request. Basically calling a method directly on the running process. This is so, so useful in prod issues, especially when combined with the above. We can basically tell the system “reprocess this action/command/message” at any time and it will do it again from first principles.
Debugging: I always first try and find a way to replicate it quickly. Then, I try and simplify it one tiny step at a time until it’s small enough I can understand in one go. I never combine multiple steps per re-run and always verify whether the bug is there or not at every single stage. This can be quite a slow approach but it also means I am always making progress towards finding the answer, instead of coming up with theories which are often wrong, and getting lost in the process.
Would you be willing to give an example of the second? I feel like my boss would throw a shitfit if I told him I wrote anything that even remotely alter prod