Lemmy: Bestiverse
  • Communities
  • Create Post
  • Create Community
  • heart
    Support Lemmy
  • search
    Search
  • Login
  • Sign Up
RSS BotMB to Hacker NewsEnglish · 2 months ago

Sir-Bench – benchmark for security incident response agents

arxiv.org

external-link
message-square
0
link
fedilink
3
external-link

Sir-Bench – benchmark for security incident response agents

arxiv.org

RSS BotMB to Hacker NewsEnglish · 2 months ago
message-square
0
link
fedilink
SIR-Bench: Evaluating Investigation Depth in Security Incident Response Agents
arxiv.org
external-link
We present SIR-Bench, a benchmark of 794 test cases for evaluating autonomous security incident response agents that distinguishes genuine forensic investigation from alert parroting. Derived from 129 anonymized incident patterns with expert-validated ground truth, SIR-Bench measures not only whether agents reach correct triage decisions, but whether they discover novel evidence through active investigation. To construct SIR-Bench, we develop Once Upon A Threat (OUAT), a framework that replays real incident patterns in controlled cloud environments, producing authentic telemetry with measurable investigation outcomes. Our evaluation methodology introduces three complementary metrics: triage accuracy (M1), novel finding discovery (M2), and tool usage appropriateness (M3), assessed through an adversarial LLM-as-Judge that inverts the burden of proof -- requiring concrete forensic evidence to credit investigations. Evaluating our SIR agent on the benchmark demonstrates 97.1% true positive (TP) detection, 73.4% false positive (FP) rejection, and 5.67 novel key findings per case, establishing a baseline against which future investigation agents can be measured.

Comments

alert-triangle
You must log in or # to comment.

Hacker News

hackernews

Subscribe from Remote Instance

You are not logged in. However you can subscribe from another Fediverse account, for example Lemmy or Mastodon. To do this, paste the following into the search field of your instance: !hackernews@lemmy.bestiver.se
lock
Community locked: only moderators can create posts. You can still comment on posts.

Posts from the RSS Feed of HackerNews.

The feed sometimes contains ads and posts that have been removed by the mod team at HN.

Source of the RSS Bot

Visibility: Public
globe

This community can be federated to other instances and be posted/commented in by their users.

  • 383 users / day
  • 1.7K users / week
  • 4.14K users / month
  • 9.94K users / 6 months
  • 2 local subscribers
  • 4.96K subscribers
  • 52.6K Posts
  • 28.2K Comments
  • Modlog
  • mods:
  • patrick
  • RSS Bot
  • BE: 0.19.15
  • Modlog
  • Instances
  • Docs
  • Code
  • join-lemmy.org