I find this only is the direction sought by half baked devs who aren’t bothering to actually proof read the stuff their agents churn out.
They “trust” it without any true proof of trust.
Agents are INCREDIBLY prone to fudging and faking “success” metrics, especially when put under context pressure.
Ive seen everything from commenting out tests to fake passes, or changing the asserts on tests to fake a success, to just slapping “to be implemented later” and then calling that done.
You fundamentally cannot automate away proving that an agent actually did its job right, full stop. You can make it write tests, but now how do you know the tests were written right?
At some point you HAVE to actually sit and read the code, read the diffs, and check the work. If you don’t, you are opening yourself up to all manner of problems, especially if whatever you are working on is remotely sensitive. If the tool/app/whatever has any kind of auth or handles any kind of sensitive data, you MUST still be auditing every change.
And thus, the IDE continues to still be the tool I prefer to sit and sanity check the code as it gets produced.
Doesnt matter which one I use, I need the ability to live read and diff code and steer the agent away from disaster.
If you blindly trust agents without constantly auditing their code, you are just setting yourself up for failure.
I find this only is the direction sought by half baked devs who aren’t bothering to actually proof read the stuff their agents churn out.
They “trust” it without any true proof of trust.
Agents are INCREDIBLY prone to fudging and faking “success” metrics, especially when put under context pressure.
Ive seen everything from commenting out tests to fake passes, or changing the asserts on tests to fake a success, to just slapping “to be implemented later” and then calling that done.
You fundamentally cannot automate away proving that an agent actually did its job right, full stop. You can make it write tests, but now how do you know the tests were written right?
At some point you HAVE to actually sit and read the code, read the diffs, and check the work. If you don’t, you are opening yourself up to all manner of problems, especially if whatever you are working on is remotely sensitive. If the tool/app/whatever has any kind of auth or handles any kind of sensitive data, you MUST still be auditing every change.
And thus, the IDE continues to still be the tool I prefer to sit and sanity check the code as it gets produced.
Doesnt matter which one I use, I need the ability to live read and diff code and steer the agent away from disaster.
If you blindly trust agents without constantly auditing their code, you are just setting yourself up for failure.