SIEM Architecture — Reference.

Reference architecture for a working SIEM: ingestion, normalization, detection layer, response loop — with the cost and quality trade-offs at each junction.

Ingestion layer

  • Source-tier prioritization.
    • Tier 1, full retention (1 year+): identity (AD/Okta auth), endpoint EDR telemetry, cloud control-plane (AWS CloudTrail, Azure Activity Log, GCP Cloud Audit Logs), DNS, firewall flow.
    • Tier 2, hot 30 days + cold archive: web-proxy, network IDS, application audit.
    • Tier 3, sample or summarize: debug-level app logs, NetFlow at volume, syslog from systems without security relevance.
  • Schema-on-write vs schema-on-read.
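    • Schema-on-write (Elasticsearch with explicit mappings, Splunk index-time field extraction): faster queries, but schema changes are expensive and the schema is hard to evolve.
    • Schema-on-read (Splunk search-time field extraction, Elasticsearch runtime fields, Snowflake over raw data): cheap ingest, slower queries, easier to evolve.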
    • Hybrid (most modern): land raw + extract critical fields at ingest; full schema available at query.
  • Cost levers. Drop verbose-but-useless fields at ingest (chatty Windows event payloads). Filter per source (for example, don't ingest Windows Event ID 4624 logons from low-value workstations at full volume). Use hot/warm/cold tiering: cheaper tiers cost less per GB but answer queries more slowly.
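
The tiering and cost levers above can be sketched as a small ingest-time filter. This is a minimal illustration, not a product configuration: the source names, tier assignments, dropped field names, host list, and the 10% sample rate are all hypothetical.

```python
import hashlib
from typing import Optional

# Hypothetical source-to-tier map following the prioritization above.
SOURCE_TIER = {
    "okta_auth": 1,
    "edr": 1,
    "cloudtrail": 1,
    "web_proxy": 2,
    "app_debug": 3,
}

# Hypothetical verbose fields that are stripped before indexing.
DROP_FIELDS = {"raw_xml", "message_template"}

# Hypothetical low-value workstations whose 4624 logons are only sampled.
LOW_VALUE_HOSTS = {"kiosk-01", "kiosk-02"}

SAMPLE_PERCENT = 10  # assumed sample rate for sampled traffic


def _bucket(event_id: str) -> int:
    """Map an event id to a stable bucket 0-99 so sampling is deterministic."""
    digest = hashlib.sha256(event_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def filter_event(event: dict) -> Optional[dict]:
    """Return the trimmed event to ingest, or None if it is dropped."""
    # Unknown sources default to Tier 3 (sample or summarize).
    tier = SOURCE_TIER.get(event.get("source"), 3)
    noisy_logon = (
        event.get("event_code") == 4624 and event.get("host") in LOW_VALUE_HOSTS
    )

    # Tier 3 and noisy logons are sampled instead of ingested at full volume.
    if (tier == 3 or noisy_logon) and _bucket(str(event.get("id", ""))) >= SAMPLE_PERCENT:
        return None

    trimmed = {k: v for k, v in event.items() if k not in DROP_FIELDS}
    trimmed["tier"] = tier  # lets the storage layer route to hot or cold
    return trimmed
```

Hash-based sampling is used here instead of random sampling so that replaying the same log produces the same kept set; that is a design choice of this sketch, not a requirement of the architecture.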

Normalization — where SIEMs succeed or fail

  • Canonical schema. ECS (Elastic Common Schema), OCSF (Open Cybersecurity Schema Framework), or a vendor's CIM (Splunk Common Information Model). Pick one and enforce.
  • Key normalized fields.
    • user.name + user.id + user.email — consistent across sources.
    • source.ip / destination.ip — direction matters, set both sides correctly.
    • event.action + event.outcome — what happened, did it succeed.
    • host.name — same host means same hostname across endpoints, network logs, EDR.
    • process.executable + process.command_line + process.parent.executable — process-tree pivot.
  • Anti-pattern. Each source has its own field names downstream; every rule has to write conditions per source. Result: rules don't get written, or get written for one source and decay for the rest.
  • Enrichment at normalization time. User → role/group (from IAM). IP → asset (from CMDB). Hostname → owner team. Without these, every rule has to do its own lookups, and most rules don't.
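
A minimal sketch of normalization plus enrichment, assuming ECS-style target names for the key fields listed above. The native field names per source, the lookup tables standing in for IAM and the CMDB, and the enrichment keys (`user.role`, `asset.name`, `owner.team`) are hypothetical and chosen only for illustration.

```python
# Hypothetical native-to-canonical field maps, one per source.
FIELD_MAPS = {
    "firewall": {
        "src": "source.ip",
        "dst": "destination.ip",
        "act": "event.action",
        "hostname": "host.name",
    },
    "edr": {
        "UserName": "user.name",
        "ComputerName": "host.name",
        "ImagePath": "process.executable",
        "CommandLine": "process.command_line",
        "ParentImagePath": "process.parent.executable",
    },
}

# Hypothetical lookup tables standing in for IAM and the CMDB.
USER_ROLES = {"alice": "domain-admin"}
IP_ASSETS = {"10.0.0.5": "payroll-db"}
HOST_OWNERS = {"ws-042": "finance-it"}


def normalize(source: str, raw: dict) -> dict:
    """Rename native fields to canonical names and canonicalize values."""
    mapping = FIELD_MAPS[source]
    event = {canon: raw[native] for native, canon in mapping.items() if native in raw}
    if "host.name" in event:
        # Same host must yield the same name: short, lowercase hostname.
        event["host.name"] = event["host.name"].split(".")[0].lower()
    if "user.name" in event:
        # Strip a DOMAIN\ prefix and lowercase.
        event["user.name"] = event["user.name"].split("\\")[-1].lower()
    return event


def enrich(event: dict) -> dict:
    """Attach role, asset, and owner context once, so rules need no lookups."""
    out = dict(event)
    if out.get("user.name") in USER_ROLES:
        out["user.role"] = USER_ROLES[out["user.name"]]
    if out.get("source.ip") in IP_ASSETS:
        out["asset.name"] = IP_ASSETS[out["source.ip"]]
    if out.get("host.name") in HOST_OWNERS:
        out["owner.team"] = HOST_OWNERS[out["host.name"]]
    return out
```

With this in place, a firewall record carrying `hostname: "WS-042.corp.example"` and an EDR record carrying `ComputerName: "ws-042"` both end up with `host.name` set to `ws-042`, which is what makes cross-source pivots possible.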

Detection layer

  • Rule-based (Sigma, native SPL/KQL). Highest precision when written from a real attack pattern (a MITRE ATT&CK technique). Writing rules in Sigma and converting them to SIEM-native queries keeps them portable across platforms.
  • Statistical baselines. Volume per user per hour, byte count per host per day. Catches commodity attacks; high false-positive rate whenever legitimate behavior changes.
  • Behavioral / UEBA. Per-entity baseline, surface deviation as score. Hunt input, not blocking decision.
  • Sequence / correlation. Event A then event B within window. Most valuable, hardest to maintain (state explosion at scale). Use sparingly for known kill chains.
  • Threat-intel match. IP/domain/hash watch list. High FP on stale lists; tier by confidence and recency.
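
The sequence / correlation item above ("event A then event B within window") can be sketched as a per-entity state machine with a hard cap on tracked entities, which is one simple way to bound the state explosion. The class name, the `ts` field (epoch seconds), and the example action names are hypothetical; the sketch also assumes events arrive in timestamp order.

```python
from collections import OrderedDict


class SequenceRule:
    """Fire when first_action is followed by second_action for the same entity."""

    def __init__(self, first_action, second_action, window_seconds,
                 key_field="user.name", max_entities=10000):
        self.first_action = first_action
        self.second_action = second_action
        self.window = window_seconds
        self.key_field = key_field
        self.max_entities = max_entities
        self._pending = OrderedDict()  # entity -> timestamp of first_action

    def observe(self, event: dict) -> bool:
        """Feed one normalized event; return True if the sequence completed."""
        key = event.get(self.key_field)
        if key is None:
            return False
        ts = event["ts"]
        action = event.get("event.action")

        if action == self.first_action:
            # Re-insert so the most recently seen entity is last.
            self._pending.pop(key, None)
            self._pending[key] = ts
            # Bound memory: evict the oldest pending entities.
            while len(self._pending) > self.max_entities:
                self._pending.popitem(last=False)
            return False

        if action == self.second_action and key in self._pending:
            started = self._pending.pop(key)  # consume or expire the state
            return 0 <= ts - started <= self.window

        return False
```

A usage sketch: `SequenceRule("mfa_disabled", "login_new_country", window_seconds=600)` returns `True` from `observe` only when the second action arrives for the same user within ten minutes of the first. Eviction trades missed detections for bounded memory, which is why the text recommends keeping such rules to known kill chains.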

Response loop

  1. Alert. Routed to SOAR / ticket system with normalized fields + recommended playbook.
  2. Triage. Analyst confirms TP / FP / benign-but-unusual.
  3. Playbook automation. Automated enrichment (user context, asset criticality, related alerts), prepared containment options (disable user, isolate endpoint), and automated evidence collection.
  4. Decision. Analyst (or automated rule for high-confidence cases) executes containment / escalation.
  5. Feedback. Disposition fed back to detection authors. FP rate per rule tracked. Noisy rules retired or tuned, not allowed to rot.
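
The feedback step can be sketched as a per-rule false-positive tally over analyst dispositions. The verdict labels (`"TP"`, `"FP"`, `"benign"`), the function names, and the default thresholds are assumptions for illustration; real thresholds depend on alert volume and analyst capacity.

```python
from collections import Counter, defaultdict


def verdict_counts(dispositions):
    """Count verdicts per rule from (rule_id, verdict) pairs."""
    counts = defaultdict(Counter)
    for rule_id, verdict in dispositions:
        counts[rule_id][verdict] += 1
    return counts


def fp_rate_by_rule(dispositions):
    """Return {rule_id: fraction of triaged alerts marked FP}."""
    return {
        rule_id: c["FP"] / sum(c.values())
        for rule_id, c in verdict_counts(dispositions).items()
    }


def rules_to_review(dispositions, max_fp_rate=0.8, min_alerts=20):
    """List rules with enough triaged alerts and an FP rate above the limit."""
    flagged = []
    for rule_id, c in verdict_counts(dispositions).items():
        total = sum(c.values())
        if total >= min_alerts and c["FP"] / total > max_fp_rate:
            flagged.append(rule_id)
    return sorted(flagged)
```

The `min_alerts` floor keeps a rule with two alerts and two false positives from being flagged on too little evidence.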

Cost analysis

  • Per-GB ingest vs per-GB stored. The vendor's pricing model determines where to optimize. Splunk → cut ingest. Sumo Logic / Datadog → cut stored. Sentinel → table-tier choice.
  • Hot vs cold. Hot data costs N× cold. Searchable cold tier acceptable for after-the-fact investigation, not for live detection.
  • Long-tail rule maintenance. Each detection has carrying cost (FP-triage hours + rule-update hours per quarter). Retire detections whose carrying cost exceeds their value.

Rule of thumb. If you can't write one detection rule that runs unchanged across firewall, DNS, EDR, and cloud-audit logs because the field names differ, your normalization layer isn't done. Fix that before adding sources or rules.
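
One way to test this rule of thumb is to check, per source, whether every field a rule depends on is actually populated in a sample of normalized events. The sketch below assumes events are flat dicts keyed by canonical field names; the function name and sample data are hypothetical.

```python
def missing_fields_by_source(samples: dict, required: set) -> dict:
    """Return {source: [required fields not populated in every sample event]}.

    samples maps a source name to a list of normalized event dicts.
    A source with no sample events is reported as missing every field.
    """
    gaps = {}
    for source, events in samples.items():
        missing = sorted(
            field
            for field in required
            if not events
            or any(e.get(field) in (None, "") for e in events)
        )
        if missing:
            gaps[source] = missing
    return gaps


# Fields a single cross-source rule would rely on.
REQUIRED = {"source.ip", "destination.ip", "event.action", "event.outcome"}

# Hypothetical samples: the DNS source still emits its native field name.
SAMPLES = {
    "firewall": [{"source.ip": "10.0.0.5", "destination.ip": "8.8.8.8",
                  "event.action": "allow", "event.outcome": "success"}],
    "dns": [{"src_ip": "10.0.0.5", "destination.ip": "10.0.0.53",
             "event.action": "query", "event.outcome": "success"}],
}

print(missing_fields_by_source(SAMPLES, REQUIRED))
```

An empty result means the rule can run unchanged across the sampled sources; any entry names the source and fields that still need mapping.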
