• ambitiousslab@lemmy.ml
    link
    fedilink
    English
    arrow-up
    7
    ·
    11 hours ago

    Idempotence / self-healing: the system should be built in such a way that it tries to reach the correct end state, even if the current state is wrong. For instance, every time our system gets an update, it will re-evaluate the calculation from first principles, instead of doing a diff based on what was there before. This prevents bad data from snowballing and becoming a catastrophe.

    Giving yourself knobs to twiddle in production: at work we have ways of triggering functionality in the system on request. Basically calling a method directly on the running process. This is so, so useful in prod issues, especially when combined with the above. We can basically tell the system “reprocess this action/command/message” at any time and it will do it again from first principles.

    Debugging: I always first try and find a way to replicate it quickly. Then, I try and simplify it one tiny step at a time until it’s small enough I can understand in one go. I never combine multiple steps per re-run and always verify whether the bug is there or not at every single stage. This can be quite a slow approach but it also means I am always making progress towards finding the answer, instead of coming up with theories which are often wrong, and getting lost in the process.

    • Garbagio@lemmy.zip
      link
      fedilink
      arrow-up
      1
      ·
      8 hours ago

      Would you be willing to give an example of the second? I feel like my boss would throw a shitfit if I told him I wrote anything that even remotely alter prod