NLP-SIEM Pipeline¶
The NLP-SIEM Pipeline translates raw network telemetry strings into vectorized observation spaces or natural language prompts specifically for Blue agents.
Mechanics¶
- Event Generation: The
ConflictResolutionEngineprocesses successful and failed state modifications, emitting raw text signatures matching Sysmon/Windows Event Logs (e.g., Event ID 4624 forLogon). - Log Buffer: These raw strings are appended to a rolling global event queue (
siem_log_buffer), which simulates the real-time logging pipeline. - Vectorization (RL Agents): For standard neural architectures, the most recent $N$ logs in the buffer are mapped through a static TF-IDF encoder or dense embedding layer, transforming the strings into a fixed-size
BlueAgentObservationVector. - Text Translation (LLM Agents): For foundation models, the
state_to_textfunction directly pulls raw text from the buffer and injects it into a structured natural language prompt, bypassing the vectorization phase entirely.