Built an offline AI assistant for security work in air-gapped environments (SCIFs, classified networks, etc.). It runs entirely locally: no API calls, no telemetry. Technical approach:

  • RAG with 360k embedded chunks (sentence-transformers: all-MiniLM-L6-v2)
  • FAISS for vector similarity search
  • Local LLM inference via Ollama (Llama 3.1 8B quantized)
  • Three-tier retrieval: dictionary → SQLite FTS5 → FAISS semantic search
  • Parses security tool output (Nmap XML, Volatility, Metasploit, etc.)
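The embed-and-retrieve flow behind the stack above can be sketched without the heavy dependencies. In the real system the embedder is all-MiniLM-L6-v2 and the index is FAISS; here a toy trigram-hashing "embedder" and brute-force cosine search stand in, purely to show the top-k retrieval shape:

```python
import hashlib
import math

def toy_embed(text, dim=64):
    """Stand-in for all-MiniLM-L6-v2: hash character trigrams into a
    fixed-size, L2-normalized vector. Illustrative only -- no semantics."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].lower().encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def top_k(query, chunks, k=3):
    """Brute-force cosine search -- the same contract a FAISS flat
    inner-product index fulfils at 360k-chunk scale."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, toy_embed(c))), c) for c in chunks]
    return [c for _, c in sorted(scored, reverse=True)[:k]]

chunks = ["nmap port scanning basics", "volatility memory forensics",
          "metasploit payload generation", "bloodhound AD enumeration"]
print(top_k("scan ports with nmap", chunks, k=2))
```

Swapping the toy pieces for `SentenceTransformer.encode` and `faiss.IndexFlatIP` keeps the same structure; only the vector quality and search speed change.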

Architecture:

  • Embed user query (384-dim vector)
  • FAISS search across 360k chunks, retrieve top 8
  • Build prompt: context + query
  • Local LLM generation (no external calls)
  • Response with tool-specific recommendations
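The "build prompt: context + query" step above is the glue between retrieval and generation. A minimal sketch (the template wording and character budget are my assumptions, not the project's actual prompt):

```python
def build_prompt(query, chunks, max_chars=4000):
    """Assemble retrieved chunks into a grounded prompt for the local LLM.
    Chunks are numbered and truncated to a character budget so the prompt
    stays inside the model's context window."""
    context, used = [], 0
    for i, chunk in enumerate(chunks, 1):
        entry = f"[{i}] {chunk}"
        if used + len(entry) > max_chars:
            break
        context.append(entry)
        used += len(entry)
    return (
        "Answer using ONLY the context below.\n\n"
        "Context:\n" + "\n".join(context) +
        f"\n\nQuestion: {query}\nAnswer:"
    )

prompt = build_prompt(
    "How do I detect process hollowing?",
    ["Volatility's malfind plugin flags injected code regions...",
     "Process hollowing replaces a legitimate process image in memory..."])
print(prompt)
```

The resulting string would then go to the local model, e.g. via the Ollama Python client's `ollama.generate(model=..., prompt=prompt)`, so no bytes leave the machine.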

Knowledge sources indexed:

  • CVE database (2014-2025, SQLite + FAISS)
  • ExploitDB (~50k exploits)
  • Security tool documentation (Volatility, Metasploit, BloodHound)
  • HackTricks, GTFOBins, LOLBAS, PayloadsAllTheThings
  • Custom tool integration guides

Interesting challenges solved:

  • Preventing RAG noise from high-frequency findings (tiered indexing)
  • Fast CVE lookup (dict → FTS5 → vector search cascade)
  • Tool output parsing without rigid schemas (regex + context awareness)
  • Keeping vector DB under 2GB while indexing 360k chunks
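The dict → FTS5 → vector cascade can be sketched as a fall-through of progressively more expensive tiers. The tier internals below are mocked with lambdas; in the real system they are an in-memory dict, SQLite FTS5, and FAISS respectively:

```python
def cascade_lookup(query, exact_index, fts_search, semantic_search):
    """Try the cheapest tier first; fall through only on a miss.
    exact_index:     dict for O(1) ID hits (e.g. 'CVE-2021-44228')
    fts_search:      callable -> list, keyword match (SQLite FTS5 in reality)
    semantic_search: callable -> list, embedding search (FAISS in reality)"""
    key = query.strip().upper()
    if key in exact_index:                 # tier 1: exact identifier
        return [exact_index[key]]
    hits = fts_search(query)               # tier 2: full-text keywords
    if hits:
        return hits
    return semantic_search(query)          # tier 3: semantic fallback

# Mock tiers for illustration only.
exact = {"CVE-2021-44228": "Log4Shell: Log4j2 JNDI RCE"}
fts = lambda q: ["EternalBlue: SMBv1 RCE"] if "smb" in q.lower() else []
semantic = lambda q: ["closest match by embedding similarity"]

print(cascade_lookup("CVE-2021-44228", exact, fts, semantic))
print(cascade_lookup("smb exploit", exact, fts, semantic))
print(cascade_lookup("lateral movement techniques", exact, fts, semantic))
```

Most CVE queries resolve in tier 1 or 2, so the 360k-chunk vector search only runs when cheaper lookups genuinely miss; that is also what keeps high-frequency findings from drowning the semantic tier in noise.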

Current limitations:

  • Windows-focused (Linux experimental)
  • ~8GB RAM requirement
  • Tool parsers are brittle (working on this)
  • Alpha quality - learning project by self-taught dev

Code: <a href="https://gitlab.com/sydsec1/Syd" rel="ugc">https://gitlab.com/sydsec1/Syd</a> (MIT)
Docs: <a href="https://www.sydsec.co.uk" rel="ugc">https://www.sydsec.co.uk</a>

Interested in feedback on:

  • RAG architecture choices (FAISS vs alternatives for this use case)
  • Noise reduction strategies for continuously-indexed findings
  • Tool output parsing approaches (current method: regex, considering AST/structured)
  • Offline model selection (currently Llama 3.1 8B Q4, open to alternatives)

Happy to discuss implementation details.