
ML & Cyber Analytics.

Vendor-neutral landscape map: model families, training pipelines, deployment patterns — plus which statistical/ML models fit which security-analytics problems and where they reliably fail.

Model families

Linear (linear/logistic regression, Lasso, Ridge). Cheap, interpretable, baseline for any tabular task. Coefficients show which features drive the score, provided the features are on a comparable scale.
Tree-based (Random Forest, XGBoost, LightGBM, CatBoost). Dominant for tabular security data (alerts, log events). Handles missing values, mixed types, non-linear interactions. SHAP for per-prediction explanation.
Deep neural (MLP, CNN, RNN/LSTM). Good for sequences (network flows, process trees), images (icon-similarity for malware family), and raw bytes (deep-learning malware classifiers).
Transformer. State of the art for log understanding, code analysis, text-heavy security tasks. Expensive; often distilled or used for offline batch enrichment rather than real-time scoring.
Graph neural networks. Authentication graphs, lateral-movement graphs, malware-similarity graphs. Niche but growing.
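
A minimal sketch of why linear models count as interpretable: each feature's contribution to the score is its coefficient times its value, so a per-event explanation falls out directly. The feature names, weights, and bias below are hypothetical and chosen for illustration only; they do not come from any real model.

```python
import math

# Hypothetical coefficients for a logistic login-risk model (illustrative values).
WEIGHTS = {
    "failed_logins": 1.2,
    "new_device": 0.8,
    "off_hours": 0.3,
    "known_ip": -1.5,
}
BIAS = -2.0


def score(features):
    """Logistic score in (0, 1): sigmoid of bias plus weighted feature sum."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def top_contributions(features, k=2):
    """Return the k features with the largest absolute contribution (weight * value)."""
    contribs = {name: WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS}
    return sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


# Hypothetical event: three failed logins from a new device, outside business hours.
event = {"failed_logins": 3.0, "new_device": 1.0, "off_hours": 1.0, "known_ip": 0.0}

print(round(score(event), 3))
print([name for name, _ in top_contributions(event)])
```

For tree-based and deep models the same question ("which features drove this score?") needs an attribution method such as SHAP, because no single coefficient per feature exists.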

Training pipelines

Offline batch. Daily/weekly retrain on accumulated labeled data. Lowest operational complexity. Standard for most security ML.
Online learning. Update on each new labeled example. Useful where labels arrive faster than batch cycle (real-time fraud). Beware label-quality drift poisoning the model.
Federated. Train across customer tenants without centralizing data. Compelling story for security vendors; significant engineering cost. Honest claim only if model improves measurably from federation versus per-tenant baseline.
Active learning. Model queries analyst for labels on uncertain examples. Maximizes labeling ROI when SOC time is the bottleneck.
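
The active-learning loop can be sketched as uncertainty sampling: send the analyst the events whose scores sit closest to the decision boundary. This is one common selection strategy, not the only one; the function name, the 0.5 boundary, and the alert records are assumptions made for illustration.

```python
from typing import Callable, Sequence


def select_for_labeling(
    events: Sequence[dict],
    score_fn: Callable[[dict], float],
    budget: int,
) -> list:
    """Return the `budget` events whose score is closest to 0.5."""
    # Distance from 0.5 is used as a proxy for model uncertainty:
    # the smaller the distance, the less certain the model.
    ranked = sorted(events, key=lambda e: abs(score_fn(e) - 0.5))
    return ranked[:budget]


# Hypothetical alerts with scores already attached by some model.
alerts = [
    {"id": "a1", "score": 0.97},
    {"id": "a2", "score": 0.52},
    {"id": "a3", "score": 0.03},
    {"id": "a4", "score": 0.41},
    {"id": "a5", "score": 0.88},
]

# Analyst has time for two labels this cycle.
queue = select_for_labeling(alerts, lambda e: e["score"], budget=2)
print([a["id"] for a in queue])  # ['a2', 'a4']
```

The confident alerts (a1, a3) are skipped: labeling them would mostly confirm what the model already predicts, which is the labeling-ROI argument in concrete form.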

Deployment patterns

In-product real-time. Score every event inline. Latency budget tight (<10 ms typical). Model must be small, with features pre-computed and cached.
Sidecar / async. Event published to queue, scored async. Higher latency budget (seconds). Larger models possible.
Batch scoring. Periodic enrichment job over historical data. Largest models. Used for hunt and ranking, not for blocking.
Edge / on-device. Endpoint-side ML. Constrained model size. Examples: on-device URL classifier, on-device process-behavior classifier.

Analytics fit — what works and what fails

Anomaly detection. Works on stable baselines (user login patterns, network egress volume by host). Fails on adversarial drift: attacker observes baseline, stays inside it. Also fails on concept drift: baseline itself moves due to legitimate change (new application, holiday traffic) producing false positives.
Supervised classification. Works when labeled data is large + threat distribution stable + features extractable. Examples: domain-classification (DGA vs legit), URL phishing detection, malware family identification. Fails when novelty rate exceeds retraining cadence.
Clustering. Useful for triage (group similar alerts, surface representative example) and for exploration (cluster newly seen samples to find emerging family). Weak as primary decision surface: no ground truth = no measurable accuracy.
Ranking. Strong fit for SOC triage. Train on analyst dispositions (true positive / false positive) → rank new alerts by predicted disposition. Measurable in alert-to-resolution time reduction.
UEBA-style risk scoring. Combine multiple weak signals into composite risk. Useful as a hunt input. Often over-claimed as a detection.
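
The anomaly-detection trade-offs above can be illustrated with a trailing-window z-score, one of the simplest baseline detectors. This is a sketch under assumptions: the window size, the 3-sigma threshold, and the daily egress figures (MB per host) are invented for the example.

```python
from statistics import mean, stdev


def zscore_alerts(values, window=7, threshold=3.0):
    """Return indices whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily egress volume for one host, stable around 100 MB.
stable = [100, 102, 98, 101, 99, 103, 97, 100]

spike = stable + [400]                   # bulk exfiltration on one day
low_and_slow = stable + [104, 104, 104]  # attacker stays inside the baseline
level_shift = stable + [200, 201, 199]   # legitimate new application

print(zscore_alerts(spike))         # [8]  detected
print(zscore_alerts(low_and_slow))  # []   missed
print(zscore_alerts(level_shift))   # [8]  false positive
```

The detector cannot tell the exfiltration spike from the legitimate level shift: both fire on the same day. The low-and-slow series never fires at all, which is the adversarial-drift failure in miniature.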

Failure modes per technique

Data-quality dependency. Garbage labels → garbage model. Vendor "AI" trained on biased label set fails on your data.
FP/FN bias cost. Threshold choice is a business question, not a model question. Authentication: false-positive cost = user friction; false-negative cost = breach. Threshold drives behavior.
Model maintenance cost. 3-year deployment requires retraining cadence, feature-pipeline maintenance, drift monitoring, label-quality auditing. Most vendor demos ignore this.
Adversarial drift. Attackers test against deployed models. Detection-as-code rulesets (Sigma) and ML models both decay; neither is "set and forget."
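
To make the threshold point concrete, here is a sketch that picks the threshold minimizing total cost on a labeled validation window. The scores, labels, candidate thresholds, and cost figures are all hypothetical; the point is only that the same model yields different thresholds under different cost assumptions.

```python
def total_cost(scores, labels, threshold, fp_cost, fn_cost):
    """Cost of alerting on every event with score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp * fp_cost + fn * fn_cost


def best_threshold(scores, labels, fp_cost, fn_cost, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, fp_cost, fn_cost),
    )


# Hypothetical validation window: model scores and true labels (1 = malicious).
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.90, 0.95]
labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
candidates = [0.15, 0.3, 0.5, 0.7, 0.85]

# Missed detections dominate (breach cost far exceeds user friction).
print(best_threshold(scores, labels, fp_cost=1, fn_cost=50, candidates=candidates))  # 0.3

# False positives dominate (each one blocks a legitimate user).
print(best_threshold(scores, labels, fp_cost=10, fn_cost=5, candidates=candidates))  # 0.85
```

Nothing about the model changed between the two calls. Only the cost assumptions did, and those come from the business, not from the data science team.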

Evaluating vendor ML claims

"What features?" Vendor unwilling to disclose = often the model is shallow.
"What's the FP rate on your data, on my data?" Customer-side measurement against a known representative window.
"How often retrained, on what data?" Federated claims = verify or discount.
"What's the explainability surface for an analyst?" SHAP, top-k feature contributions, or pure black box?
"What's the operational cost of a wrong decision and how is the loop closed?" Labeling feedback path or fire-and-forget?

Rule of thumb. ML in security earns its keep when it ranks and prioritizes analyst attention. It struggles when promoted to autonomous decision-maker on novel threats. The honest deployment shape is "model surfaces candidates, analyst decides, decisions feed back into training."
