# What is a SOC?
A Security Operations Center (SOC) is where organizations stop pretending security is someone else's problem and start treating it like the continuous, high-stakes job it really is. In plain terms: SOC = people + processes + tools that together detect, investigate, and respond to cyber threats 24/7 (or during business hours, if you like living dangerously).
This post is a thorough, pragmatic breakdown of the SOC — not marketing fluff, and not academic vapor. If you want to know what a SOC does, how it's staffed, which tools actually matter, and what mistakes companies keep repeating, read on.
## Core purpose (TL;DR)
- Detect threats: turn noisy telemetry into actionable alerts.
- Triage & investigate: separate false alarms from actual intrusions.
- Respond & contain: stop breaches fast and limit blast radius.
- Hunt proactively: find attackers before they trigger alerts.
- Learn & improve: harden systems and update playbooks after every incident.
If your org treats security like a checkbox, a SOC will make it uncomfortable — on purpose.
## Brief history and evolution
- Pre-SOC era: ad-hoc defense — logging existed but was rarely used.
- Early SOCs: centralized NOC-like teams focused on logs and alerts (SIEM-heavy).
- Modern SOCs: layered detection (SIEM + EDR/XDR + network telemetry), automation (SOAR), threat intelligence, and dedicated hunters. Cloud and SaaS have forced SOCs to evolve quickly — now they must operate across ephemeral infra, APIs, and hybrid estates.
## Who works in a SOC (roles & responsibilities)
- SOC Manager / Head of SOC — strategy, budgeting, SLAs, reporting to the CISO.
- Tier 1 Analyst (Alert Triage) — reads alerts, enriches data, and escalates. Entry-level but driven by caffeine and stubbornness.
- Tier 2 Analyst (Investigation) — deep dives, forensic evidence collection, and persistence analysis. Knows how to pivot across logs, endpoints, and network flows.
- Tier 3 / Threat Hunter / Incident Responder — hunts unusual behaviors, performs containment, and leads IR for live incidents. Often ex-DFIR folks.
- Threat Intelligence Analyst — curates indicators, maps attacker TTPs (MITRE ATT&CK), and feeds IOCs and TTPs into detection.
- SOAR Engineer / Automation Dev — builds playbooks to automate repetitive tasks safely.
- Forensics Specialist — steps in when evidence matters for legal or regulatory work.
- IAM/Cloud/Network SMEs — domain experts pulled in for complicated incidents.
Staffing model note: you can outsource part or all of this to an MSSP, but you still need internal ownership for communications, asset context, and remediation decisions.
## Day-to-day SOC workflow
- Alert ingestion — logs and telemetry flow into the detection layer (SIEM, EDR, network sensors).
- Triage — Tier 1 filters obvious false positives and enriches alerts (user, asset, geolocation, threat intel).
- Investigation — Tier 2 assembles a timeline, hunts lateral movement, checks persistence.
- Contain & eradicate — isolate endpoints, block C2, revoke creds, patch or replace compromised assets.
- Recovery & validation — restore services, validate no persistence remains.
- Post-incident — lessons learned, playbook updates, exec reporting.
Good SOCs make triage frictionless and provide analysts with context (asset owners, criticality, recent changes). Bad SOCs drown analysts in raw alerts.
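To make the triage step concrete, here is a minimal sketch in Python. The alert schema, the inventory, and the lookup tables are all illustrative assumptions, not any vendor's API; in a real SOC the enrichment would query your CMDB, IdP, and threat intel platform.

```python
from dataclasses import dataclass, field

@dataclass
class Alert:
    rule: str
    host: str
    user: str
    src_ip: str
    context: dict = field(default_factory=dict)

# Hypothetical lookups: stand-ins for your CMDB and threat intel platform.
ASSET_INVENTORY = {"web-01": {"owner": "infra-team", "criticality": "high"}}
KNOWN_BAD_IPS = {"203.0.113.7"}

def enrich(alert: Alert) -> Alert:
    """Attach the context a Tier 1 analyst needs before making a call."""
    asset = ASSET_INVENTORY.get(alert.host, {"owner": "unknown", "criticality": "unknown"})
    alert.context["asset_owner"] = asset["owner"]
    alert.context["asset_criticality"] = asset["criticality"]
    alert.context["src_ip_known_bad"] = alert.src_ip in KNOWN_BAD_IPS
    return alert

def triage(alert: Alert) -> str:
    """Crude routing: escalate anything touching critical assets or known-bad IPs."""
    if alert.context["src_ip_known_bad"] or alert.context["asset_criticality"] == "high":
        return "escalate_to_tier2"
    return "close_as_benign"

alert = enrich(Alert(rule="remote_admin_unusual_hours", host="web-01",
                     user="jdoe", src_ip="203.0.113.7"))
print(triage(alert))  # -> escalate_to_tier2
```

The point is not the toy logic; it's that the routing decision happens against enriched context, never against the raw alert.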
## Essential technology stack
- SIEM (Security Information and Event Management): centralizes logs, correlation rules, and long-term retention. Still the plumbing (a sketch of what a correlation rule does follows this list).
- EDR/XDR (Endpoint Detection & Response / Extended Detection and Response): endpoint visibility and response actions (isolate, kill process).
- Network detection: NDR, NetFlow, packet capture for lateral movement and C2 detection.
- SOAR (Security Orchestration, Automation, and Response): playbooks, case management, automated enrichment. Use it to reduce toil — not to replace thought.
- Threat Intelligence Platform (TIP): ingests feeds, manages IOCs, and scores relevance.
- Identity & access telemetry: logs from the IdP (SAML/OAuth), MFA, and IAM changes.
- Cloud logging / CSPM: cloud-native telemetry, flagging misconfigurations and IAM risk.
- Vulnerability & patch management: integrates with the SOC to prioritize exposed high-value assets.
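To ground the term "correlation rule", here is a minimal sketch of one: flag a burst of failed logins followed by a success from the same source IP. The event shape and thresholds are assumptions for the example, not any SIEM's query language.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed event shape: (timestamp, src_ip, outcome). Real SIEM events carry
# far more fields; the thresholds here are illustrative, not tuned.
events = [(datetime(2024, 5, 1, 3, 0, s), "198.51.100.9", "failure")
          for s in range(0, 50, 10)]
events.append((datetime(2024, 5, 1, 3, 1, 0), "198.51.100.9", "success"))

FAIL_THRESHOLD = 5
WINDOW = timedelta(minutes=5)

def brute_force_then_success(events):
    """Alert when an IP racks up FAIL_THRESHOLD failures inside WINDOW
    and then logs in successfully."""
    recent_failures = defaultdict(list)
    alerts = []
    for ts, ip, outcome in sorted(events):
        if outcome == "failure":
            recent_failures[ip].append(ts)
            # drop failures that have aged out of the sliding window
            recent_failures[ip] = [t for t in recent_failures[ip] if ts - t <= WINDOW]
        elif outcome == "success" and len(recent_failures[ip]) >= FAIL_THRESHOLD:
            alerts.append({"rule": "brute_force_then_success", "src_ip": ip, "at": ts})
    return alerts

print(brute_force_then_success(events))
```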
Tool selection rule: pick tools that reduce mean time to detect/mean time to respond (MTTD/MTTR) — not tools that improve your slide deck.
## Detection engineering & playbooks
- Detection engineering builds reliable detections with low false-positive rates. Relying only on vendor rules is lazy and expensive.
- Playbooks codify repeatable IR steps. A good playbook includes:
  - Trigger conditions and severity levels.
  - Enrichment sources and automated enrichment steps.
  - Triage checklist (evidence to collect).
  - Containment options and a rollback plan.
  - Communications: who to notify (legal, PR, exec).
  - Post-incident actions and remediation owners.
Example minimal playbook snippet (conceptual):
```yaml
playbook: suspicious_remote_admin
trigger: alert.source == "EDR" && alert.rule == "remote_admin_unusual_hours"
steps:
  - enrich: [user_lookup, asset_owner, last_30_logins]
  - if: enrichment.user == "service_account"
    then: escalate to Tier2
  - containment_options: ["isolate_endpoint", "disable_account", "block_ip"]
  - notify: ["incident_response@corp", "infra_oncall"]
```
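Here is roughly how a SOAR engineer might wire that playbook up. The connector functions are hypothetical stand-ins, since every platform exposes its own SDK; what matters is the shape: check the trigger, enrich, branch, contain, notify.

```python
# Hypothetical connectors: every SOAR platform ships its own SDK, so these
# stand-ins exist only to show the control flow of the playbook above.
def user_lookup(user: str) -> dict:
    return {"type": "service_account" if user.startswith("svc-") else "human"}

def isolate_endpoint(host: str) -> None:
    print(f"[EDR] isolating {host}")

def notify(channel: str, message: str) -> None:
    print(f"[notify:{channel}] {message}")

def run_playbook(alert: dict) -> None:
    """suspicious_remote_admin as straight-line code."""
    # Trigger condition
    if not (alert["source"] == "EDR" and alert["rule"] == "remote_admin_unusual_hours"):
        return

    # Enrichment step
    enrichment = user_lookup(alert["user"])
    if enrichment["type"] == "service_account":
        notify("incident_response@corp", f"Tier 2 escalation: {alert}")
        return

    # Containment: pick the most reversible option first
    isolate_endpoint(alert["host"])
    notify("infra_oncall", f"Endpoint {alert['host']} isolated pending review")

run_playbook({"source": "EDR", "rule": "remote_admin_unusual_hours",
              "user": "jdoe", "host": "web-01"})
```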
## KPIs that matter (and the ones that don't)
Useful:
- MTTD (Mean Time To Detect) — how fast you notice something.
- MTTR (Mean Time To Respond) — how fast you stop it.
- Noise ratio / false-positive rate — how many alerts each analyst handles per shift, and how many of them are real.
- Containment time — time from detection to isolation.
- Coverage % — percent of critical assets monitored.
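Measuring the first two is mostly bookkeeping. A minimal sketch, assuming each incident records when the attacker got in, when you noticed, and when you contained it (all timestamps illustrative):

```python
from datetime import datetime
from statistics import mean

# Assumed incident records. MTTR here is measured from detection to containment.
incidents = [
    {"compromised": datetime(2024, 5, 1, 2, 0),
     "detected":    datetime(2024, 5, 1, 2, 45),
     "contained":   datetime(2024, 5, 1, 3, 30)},
    {"compromised": datetime(2024, 5, 3, 11, 0),
     "detected":    datetime(2024, 5, 3, 11, 25),
     "contained":   datetime(2024, 5, 3, 12, 0)},
]

def minutes(delta) -> float:
    return delta.total_seconds() / 60

mttd = mean(minutes(i["detected"] - i["compromised"]) for i in incidents)
mttr = mean(minutes(i["contained"] - i["detected"]) for i in incidents)

print(f"MTTD: {mttd:.0f} min, MTTR: {mttr:.0f} min")  # MTTD: 35 min, MTTR: 40 min
```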
Useless vanity metrics:
- Total number of alerts (without context).
- Number of blocked IPs — sounds good but means nothing if you didn't reduce risk.
## SOC maturity model (practical)
- Level 0: Ad-hoc — reactive, no central logs.
- Level 1: Basic — SIEM + basic correlation, on-call IR, lots of alerts.
- Level 2: Managed — EDR, triage process, documented playbooks.
- Level 3: Proactive — threat hunting, tuned detections, a metrics-driven SOC.
- Level 4: Adaptive — automated containment, continuous improvement, telemetry everywhere.
Aim for Level 2 as baseline. Level 3+ is where you actually reduce risk materially.
## Building vs buying (in-house SOC vs MSSP)
- In-house pros: full control, domain knowledge, faster internal coordination.
- In-house cons: expensive — hiring, 24/7 coverage, tooling costs.
- MSSP pros: scale, shift coverage, lower immediate cost.
- MSSP cons: less contextual knowledge, ticketing friction, data residency and trust issues.
Hybrid approach: keep core IR and asset context in-house, outsource monitoring for non-critical assets or when you need scale.
## Common failure modes (aka what to avoid)
- Too many false positives — analysts burn out and ignore alerts.
- Poor asset inventory — you can't protect what you can't see.
- No playbook ownership — playbooks rot if not exercised and updated.
- Tool overload — handing analysts 10 consoles to check is cruel.
- Lack of executive reporting — the SOC becomes an island without funding or influence.
- No tabletop exercises — don't assume your playbooks work; practice them.
## Legal, compliance & privacy considerations
- Log retention policies must align with GDPR, PCI DSS, and HIPAA, as applicable.
- Forensics may require preserving chain of custody for legal actions.
- Data minimization: ingest what you need and protect logs (they contain PII).
- Cross-border log storage: check regulations before sending everything to a US-based SIEM.
## The future: trends reshaping SOCs
- AI-assisted detection and triage — reduces analyst toil but requires careful evaluation (adversarial robustness matters).
- Cloud-native SOC — native telemetry, serverless IR playbooks, infrastructure-as-code for detections.
- XDR consolidation — unified telemetry across endpoints, network, and cloud.
- Automation-first with human oversight — automate the routine, keep humans for judgment calls.
- Zero Trust integration — SOCs feed into automated access decisions for containment.
## Quick checklist to evaluate your SOC (do this now)
- Do you have an up-to-date asset inventory? (If not, stop reading and fix this.)
- Can you detect and isolate a compromised endpoint within 30 minutes? If not, start by measuring your current MTTR.
- Are your playbooks tested via tabletop exercises every six months?
- Do analysts have a consolidated console with enrichment data and runbooks?
- Is your threat intel contextualized (more true positives than false positives)?
If you answered “no” to more than one, you have work to do.
## Final thoughts
A SOC is not a single product or a team you can check off. It’s a discipline — organized, measured, and relentlessly practical. The difference between a SOC that reduces risk and one that just increases spending is leadership, telemetry coverage, and the discipline to say “no” to noisy detections and “yes” to testing playbooks.