Built an offline AI assistant for security work in air-gapped environments (SCIFs, classified networks, etc.). It runs entirely locally: no API calls, no telemetry. Technical approach:
- RAG with 360k embedded chunks (sentence-transformers: all-MiniLM-L6-v2)
- FAISS for vector similarity search
- Local LLM inference via Ollama (Llama 3.1 8B quantized)
- Three-tier retrieval: dictionary → SQLite FTS5 → FAISS semantic search
- Parses security tool output (Nmap XML, Volatility, Metasploit, etc.)
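As an example of the tool-output parsing, Nmap's XML can be walked with the stdlib `ElementTree` instead of a rigid schema. A minimal sketch (the XML fragment and field choices here are illustrative, not Syd's actual parser):

```python
import xml.etree.ElementTree as ET

# Illustrative Nmap XML fragment (hypothetical scan output)
NMAP_XML = """<nmaprun>
  <host>
    <address addr="10.0.0.5" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="445">
        <state state="open"/>
        <service name="microsoft-ds" product="Windows SMB"/>
      </port>
      <port protocol="tcp" portid="22">
        <state state="closed"/>
      </port>
    </ports>
  </host>
</nmaprun>"""

def parse_nmap(xml_text):
    """Extract open ports per host from Nmap XML output."""
    root = ET.fromstring(xml_text)
    findings = []
    for host in root.iter("host"):
        addr_el = host.find("address")
        addr = addr_el.get("addr") if addr_el is not None else "unknown"
        for port in host.iter("port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue  # only surface open ports to the LLM context
            service = port.find("service")
            findings.append({
                "host": addr,
                "port": int(port.get("portid")),
                "service": service.get("name") if service is not None else None,
            })
    return findings

print(parse_nmap(NMAP_XML))
# [{'host': '10.0.0.5', 'port': 445, 'service': 'microsoft-ds'}]
```

Tolerating missing elements (`if ... is not None`) is what keeps this style of parser from breaking on partial scans.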
Architecture:
- Embed user query (384-dim vector)
- FAISS search across 360k chunks, retrieve top 8
- Build prompt: context + query
- Local LLM generation (no external calls)
- Response with tool-specific recommendations
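Stripped of the libraries, the retrieval step above reduces to an inner-product top-k search over normalized 384-dim vectors. A numpy sketch of what the embed → FAISS top-8 → prompt flow computes (random vectors stand in for all-MiniLM-L6-v2 embeddings; corpus size shrunk from 360k for illustration):

```python
import numpy as np

DIM, TOP_K = 384, 8
rng = np.random.default_rng(0)

# Stand-in corpus: in the real system these are 360k embedded chunks
chunks = [f"chunk {i}" for i in range(1000)]
index = rng.standard_normal((len(chunks), DIM)).astype("float32")
index /= np.linalg.norm(index, axis=1, keepdims=True)  # normalize -> cosine

def retrieve(query_vec, k=TOP_K):
    """Inner-product top-k over normalized vectors (what an IndexFlatIP does)."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q
    top = np.argsort(scores)[::-1][:k]
    return [(chunks[i], float(scores[i])) for i in top]

def build_prompt(query, hits):
    """Context + query, ready for local generation via Ollama."""
    context = "\n".join(text for text, _ in hits)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

query_vec = rng.standard_normal(DIM).astype("float32")
hits = retrieve(query_vec)
prompt = build_prompt("How do I pivot via SMB?", hits)
```

With vectors normalized up front, inner product and cosine similarity are the same thing, which is the usual trick for cosine search on flat FAISS indexes.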
Knowledge sources indexed:
- CVE database (2014-2025, SQLite + FAISS)
- ExploitDB (~50k exploits)
- Security tool documentation (Volatility, Metasploit, BloodHound)
- HackTricks, GTFOBins, LOLBAS, PayloadsAllTheThings
- Custom tool integration guides
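For the SQLite side of the CVE index, an FTS5 virtual table gives cheap keyword lookup before anything touches the vector index. A minimal sketch with made-up rows (assumes an SQLite build with FTS5 enabled, which standard CPython ships):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE cves USING fts5(cve_id, summary)")

# Hypothetical rows for illustration only
conn.executemany(
    "INSERT INTO cves VALUES (?, ?)",
    [
        ("CVE-2017-0144", "SMBv1 remote code execution (EternalBlue)"),
        ("CVE-2021-44228", "Log4j JNDI lookup remote code execution"),
    ],
)

def fts_lookup(terms, limit=5):
    """Full-text keyword search, ranked by FTS5's built-in bm25 score."""
    return conn.execute(
        "SELECT cve_id, summary FROM cves WHERE cves MATCH ? "
        "ORDER BY rank LIMIT ?",
        (terms, limit),
    ).fetchall()

print(fts_lookup("log4j"))
# [('CVE-2021-44228', 'Log4j JNDI lookup remote code execution')]
```

FTS5 also supports prefix queries (`MATCH 'smb*'`), which helps when tool output contains version-suffixed tokens like `SMBv1`.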
Interesting challenges solved:
- Preventing RAG noise from high-frequency findings (tiered indexing)
- Fast CVE lookup (dict → FTS5 → vector search cascade)
- Tool output parsing without rigid schemas (regex + context awareness)
- Keeping vector DB under 2GB while indexing 360k chunks
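The dict → FTS5 → vector cascade mentioned above can be sketched as follows. The three backends are stubbed here; in the real system they would be the exact-match dictionary, the SQLite FTS5 table, and the FAISS index:

```python
# Tier 1: exact-match dictionary for the hottest lookups, O(1)
EXACT = {"cve-2017-0144": "EternalBlue SMBv1 RCE"}

def fts_search(query):
    """Stub for the SQLite FTS5 keyword tier (returns None on miss)."""
    return "Log4j RCE" if "log4j" in query.lower() else None

def semantic_search(query):
    """Stub for the FAISS embedding tier; the catch-all, always answers."""
    return f"nearest chunks for: {query}"

def lookup(query):
    """Cascade: try the cheapest tier first, fall through on a miss."""
    hit = EXACT.get(query.lower())
    if hit is not None:
        return ("dict", hit)
    hit = fts_search(query)
    if hit is not None:
        return ("fts5", hit)
    return ("faiss", semantic_search(query))

print(lookup("CVE-2017-0144"))  # ('dict', 'EternalBlue SMBv1 RCE')
print(lookup("log4j exploit"))  # ('fts5', 'Log4j RCE')
```

The point of the ordering is that well-formed identifiers (CVE IDs, tool names) never pay the embedding-plus-ANN cost; only fuzzy natural-language queries fall through to the vector tier.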
Current limitations:
- Windows-focused (Linux experimental)
- ~8GB RAM requirement
- Tool parsers are brittle (working on this)
- Alpha quality - learning project by self-taught dev
Code: https://gitlab.com/sydsec1/Syd (MIT)
Docs: https://www.sydsec.co.uk
Interested in feedback on:
- RAG architecture choices (FAISS vs alternatives for this use case)
- Noise reduction strategies for continuously-indexed findings
- Tool output parsing approaches (current method: regex, considering AST/structured)
- Offline model selection (currently Llama 3.1 8B Q4, open to alternatives)
Happy to discuss implementation details.

