
SIEM Architecture — Reference.

Reference architecture for a working SIEM: ingestion, normalization, detection layer, response loop — with the cost and quality trade-offs at each junction.

Ingestion layer

Source-tier prioritization.
- Tier 1, full retention (1 year+): identity (AD/Okta auth), endpoint EDR telemetry, cloud control-plane (CloudTrail, AzureActivityLog, GCP AuditLog), DNS, firewall flow.
- Tier 2, hot 30 days + cold archive: web-proxy, network IDS, application audit.
- Tier 3, sample or summarize: debug-level app logs, NetFlow at volume, syslog from systems without security relevance.
Schema-on-write vs schema-on-read.
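- Schema-on-write (Splunk index-time field extraction, ES with explicit mappings): faster query, but schema changes are expensive (reindexing) and the schema is hard to evolve.
- Schema-on-read (Splunk search-time extraction via SPL, Elasticsearch runtime fields, Snowflake on raw): cheap ingest, slower query, easier to evolve.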
- Hybrid (most modern): land raw + extract critical fields at ingest; full schema available at query.
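The hybrid approach can be sketched in a few lines. This is a minimal illustration, not a vendor implementation: the `key=value` log format, the `CRITICAL` pattern, and the function names `ingest` and `query_time_field` are all assumptions made for the example.

```python
import re

# Hypothetical log format: space-separated key=value pairs.
# Only the fields that detections depend on are extracted at ingest.
CRITICAL = re.compile(r"user=(?P<user>\S+)\s+src=(?P<src>\S+)")


def ingest(raw_line: str) -> dict:
    """Land the raw line and extract critical fields (schema-on-write part)."""
    record = {"raw": raw_line}
    match = CRITICAL.search(raw_line)
    if match:
        record["user.name"] = match.group("user")
        record["source.ip"] = match.group("src")
    return record


def query_time_field(record: dict, key: str):
    """Pull any other key from the raw text at query time (schema-on-read part)."""
    match = re.search(rf"\b{re.escape(key)}=(\S+)", record["raw"])
    return match.group(1) if match else None


record = ingest("user=alice src=10.0.0.5 bytes=512 action=allow")
print(record["user.name"], query_time_field(record, "bytes"))
```

The critical fields are cheap to query because they are stored once; everything else stays recoverable from `raw` at the cost of a slower scan.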
Cost levers. Drop verbose-but-useless fields at ingest (chatty Windows event payloads). Filter per source (don't ingest 4624 from low-value workstations at full volume). Tier storage hot/warm/cold: the faster the tier, the more it costs.
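The first two levers can be sketched as a small ingest-time filter. The field names in `DROP_FIELDS`, the `ws-` hostname prefix, and the 1-in-10 sampling rate are hypothetical; sampling is only one possible way to avoid ingesting 4624 at full volume.

```python
from typing import Optional

# Hypothetical policy values, chosen for illustration only.
DROP_FIELDS = {"Message", "Keywords", "Opcode"}  # verbose payload fields
LOW_VALUE_HOST_PREFIXES = ("ws-",)               # assumed workstation naming scheme
SAMPLED_EVENT_IDS = {4624}                       # successful logon
KEEP_ONE_IN = 10                                 # keep 1 in N sampled events


def reduce_event(event: dict, counter: int) -> Optional[dict]:
    """Return a slimmed copy of the event, or None if it should be dropped."""
    host = event.get("host", "")
    low_value = host.startswith(LOW_VALUE_HOST_PREFIXES)
    if low_value and event.get("event_id") in SAMPLED_EVENT_IDS:
        if counter % KEEP_ONE_IN != 0:
            return None  # per-source filter: sample instead of full volume
    # Field-level drop: remove verbose fields from whatever is kept.
    return {k: v for k, v in event.items() if k not in DROP_FIELDS}
```

Here `counter` is assumed to be a running per-source event count maintained by the caller.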

Normalization — where SIEMs succeed or fail

Canonical schema. ECS (Elastic Common Schema), OCSF (Open Cybersecurity Schema Framework), or a vendor's CIM (Splunk Common Information Model). Pick one and enforce.
Key normalized fields.
- user.name + user.id + user.email — consistent across sources.
- source.ip / destination.ip — direction matters, set both sides correctly.
- event.action + event.outcome — what happened, did it succeed.
- host.name — same host means same hostname across endpoints, network logs, EDR.
- process.executable + process.command_line + process.parent.executable — process-tree pivot.
Anti-pattern. Each source has its own field names downstream; every rule has to write conditions per source. Result: rules don't get written, or get written for one source and decay for the rest.
Enrichment at normalization time. User → role/group (from IAM). IP → asset (from CMDB). Hostname → owner team. Without these, every rule has to do its own lookup, and most don't.
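A minimal sketch of normalization plus enrichment, assuming two hypothetical sources with their own raw field names. The mapping tables, the lookup dictionaries standing in for IAM and CMDB, and the `owner.team` field are illustrative assumptions; the canonical names follow ECS style.

```python
# Hypothetical per-source mappings: raw field name -> canonical field name.
FIELD_MAPS = {
    "firewall": {"src": "source.ip", "dst": "destination.ip", "act": "event.action"},
    "idp": {"login": "user.name", "client_ip": "source.ip", "result": "event.outcome"},
}

# Hypothetical enrichment tables standing in for IAM and CMDB lookups.
USER_ROLES = {"alice": ["domain-admin"]}
IP_ASSETS = {"10.0.0.5": {"host.name": "hr-laptop-17", "owner.team": "hr"}}


def normalize(source: str, raw: dict) -> dict:
    """Rename raw fields to canonical names, then enrich once for all rules."""
    mapping = FIELD_MAPS[source]
    event = {canon: raw[field] for field, canon in mapping.items() if field in raw}
    event["event.dataset"] = source
    roles = USER_ROLES.get(event.get("user.name"))
    if roles:
        event["user.roles"] = roles
    event.update(IP_ASSETS.get(event.get("source.ip"), {}))
    return event


def admin_login_failed(event: dict) -> bool:
    """One rule, written once against canonical fields, valid for any source."""
    return (
        "domain-admin" in event.get("user.roles", [])
        and event.get("event.outcome") == "failure"
    )
```

Because the rule only reads canonical and enriched fields, adding a third source means adding one mapping entry, not touching the rule.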

Detection layer

Rule-based (Sigma, native SPL/KQL). Highest precision when written from a real attack pattern (MITRE ATT&CK technique). Sigma converted to SIEM-native query enables portability.
Statistical baselines. Volume per user per hour, byte count per host per day. Catches commodity attacks; high FP on legitimate change.
Behavioral / UEBA. Per-entity baseline, surface deviation as score. Hunt input, not blocking decision.
Sequence / correlation. Event A then event B within window. Most valuable, hardest to maintain (state explosion at scale). Use sparingly for known kill chains.
Threat-intel match. IP/domain/hash watch list. High FP on stale lists; tier by confidence and recency.
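The sequence pattern (event A, then event B within a window) can be sketched as follows. This is an in-memory illustration under stated assumptions: events arrive time-ordered, each carries `ts` in epoch seconds plus normalized `user.name` and `event.action`, and the window and per-entity cap are arbitrary values.

```python
from collections import deque

WINDOW_SECONDS = 300          # assumed correlation window
MAX_PENDING_PER_ENTITY = 100  # cap on stored first-stage events per entity


def correlate(events, first_action, second_action, window=WINDOW_SECONDS):
    """Return (user, t_first, t_second) for each A-then-B match within window."""
    pending = {}  # user -> deque of timestamps of unmatched first-stage events
    matches = []
    for ev in events:
        user, ts, action = ev["user.name"], ev["ts"], ev["event.action"]
        if action == first_action:
            queue = pending.setdefault(user, deque(maxlen=MAX_PENDING_PER_ENTITY))
            queue.append(ts)
        elif action == second_action and user in pending:
            queue = pending.pop(user)
            # Discard first-stage events that aged out of the window.
            while queue and ts - queue[0] > window:
                queue.popleft()
            if queue:
                matches.append((user, queue[0], ts))
    return matches
```

State is kept only for entities with an open first-stage event and is bounded per entity, which is one way to limit the state growth noted above; a production version would also need to expire entities that never reach the second stage.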

Response loop

Alert. Routed to SOAR / ticket system with normalized fields + recommended playbook.
Triage. Analyst confirms TP / FP / benign-but-unusual.
Playbook automation. Enrichment (user context, asset criticality, related alerts), containment options (disable user, isolate endpoint), evidence collection automated.
Decision. Analyst (or automated rule for high-confidence cases) executes containment / escalation.
Feedback. Disposition fed back to detection authors. FP rate per rule tracked. Noisy rules retired or tuned, not allowed to rot.
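Per-rule FP tracking from triage dispositions can be sketched like this. The verdict labels, the 0.9 threshold, and the minimum sample of 20 alerts are assumptions for illustration, not recommended values.

```python
from collections import Counter

FP_RATE_THRESHOLD = 0.9  # assumed: flag rules at or above this FP rate
MIN_ALERTS = 20          # assumed: ignore rules with too few dispositions


def noisy_rules(dispositions, threshold=FP_RATE_THRESHOLD, min_alerts=MIN_ALERTS):
    """Map rule_id -> FP rate for rules that need tuning or retirement.

    dispositions: iterable of (rule_id, verdict), verdict in {"TP", "FP", "benign"}.
    """
    totals, false_positives = Counter(), Counter()
    for rule_id, verdict in dispositions:
        totals[rule_id] += 1
        if verdict == "FP":
            false_positives[rule_id] += 1
    return {
        rule_id: false_positives[rule_id] / count
        for rule_id, count in totals.items()
        if count >= min_alerts and false_positives[rule_id] / count >= threshold
    }
```

The output is a review queue for detection authors, not an automatic retirement list.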

Cost analysis

Per-GB ingest vs per-GB stored. Vendor model determines optimization. Splunk → cut ingest. Sumo Logic / Datadog → cut stored. Sentinel → table-tier choice.
Hot vs cold. Hot data costs N× cold. Searchable cold tier acceptable for after-the-fact investigation, not for live detection.
Long-tail rule maintenance. Each detection has carrying cost (FP-triage hours + rule-update hours per quarter). Retire detections whose carrying cost exceeds their value.
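The carrying-cost comparison can be sketched as below. The source does not define how a detection's value is measured, so the value proxy used here (true positives per quarter times an assumed number of analyst-hours each is worth) is purely a placeholder, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class DetectionStats:
    rule_id: str
    fp_triage_hours: float    # per quarter
    rule_update_hours: float  # per quarter
    true_positives: int       # per quarter

    @property
    def carrying_cost(self) -> float:
        """Carrying cost as defined above: FP triage plus rule upkeep."""
        return self.fp_triage_hours + self.rule_update_hours


def retirement_candidates(stats, hours_value_per_tp: float = 8.0):
    """Return rule ids whose carrying cost exceeds the assumed value proxy."""
    return [
        s.rule_id
        for s in stats
        if s.carrying_cost > s.true_positives * hours_value_per_tp
    ]
```

A rule covering a rare but severe technique may justify its cost with zero true positives in a quarter, so the result is better treated as input to a review than as a decision.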

Rule of thumb. If you can't write one detection rule that runs unchanged across firewall, DNS, EDR, and cloud-audit logs because the field names differ, your normalization layer isn't done. Fix that before adding sources or rules.
