Whether it is to support compliance efforts for regulatory mandated logging, to feed daily security operations center (SOC) work, to equip threat hunters, or to bolster incident response capabilities, security telemetry data is the lifeblood of a healthy cybersecurity program. But the more security relies on data and analysis to carry out its core missions, the more data it must manage, curate, and protect, all while keeping data-related costs tightly under control. As such, security data management and security data architecture are quickly becoming key competencies that CISOs must build out over time, and doing so will take careful consideration and action at both the tactical and strategic levels. The following are some best practices that security leaders should keep in mind as they seek to improve security data management and get the most out of their security data for the least amount of investment.
Normalization and Correlation Can Be a Heavy Lift
With so many sources of data, including log data from varying systems, telemetry from security monitoring, and threat intelligence from numerous internal and external sources, one of the hardest parts of security data management is simply normalizing this data so it can be mashed up and queried consistently across the lot of it.
“The biggest mistakes security operations teams make today involve underestimating the complexity of integrating diverse security data sources and not prioritizing the effective normalization and correlation of data, leading to inefficiencies and potential security gaps,” says John Pirc, Vice President at Netenrich, a San Jose, Calif.-based security and operations analytics SaaS company.
Before SOCs pick out and start using shiny new data-driven tools, they’ve got to think carefully about whether those tools will play nicely with existing systems and data streams. Data ingestion and mobility costs can quickly spiral, and much of that expense traces back to barriers to integration and correlation that stem from normalization and data quality issues.
“For SOCs evaluating or deploying data-focused tools, the most important best practices are ensuring the tool’s scalability and compatibility with existing systems and verifying that it provides actionable insights rather than just data collection,” Pirc says.
Standard Field Scheme for Log Data
One way that a security team can extend its ability to use more tooling and get the most out of the data sources available for security analysis is to be proactive about normalization.
“Security operations teams should establish a clear and standardized default field scheme for all log data within the organization,” recommends Or Saya, cybersecurity architect at CardinalOps, a detection posture management company. “This involves defining the standard set of fields that should be present in every log entry, such as timestamp, source IP, destination IP, user, and action taken. Ensure consistency across different log sources to facilitate correlation and analysis.”
As Saya explains, this standardization helps analysts map even the most obscure log sources to an understandable model, which makes it easier to build detection and correlation content around new sources. But it takes investment: someone will need to own the process and continuously validate that incoming data is normalized against the scheme. If it isn’t, the organization is likely to suffer from blind spots that are tough to pick up on.
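To make this concrete, the following is a minimal sketch of what mapping raw entries onto a standard field scheme could look like; the source names, raw field names, and the scheme itself are hypothetical and would need to reflect an organization’s actual environment.

```python
# Minimal sketch of normalizing heterogeneous log entries into a standard
# field scheme. Source names, raw field names, and the scheme are illustrative.
STANDARD_FIELDS = ["timestamp", "src_ip", "dst_ip", "user", "action"]

# Per-source mapping from raw field names to the standard scheme (hypothetical).
FIELD_MAPS = {
    "firewall":  {"ts": "timestamp", "source": "src_ip", "dest": "dst_ip",
                  "username": "user", "verdict": "action"},
    "web_proxy": {"time": "timestamp", "client": "src_ip", "server": "dst_ip",
                  "account": "user", "decision": "action"},
}

def normalize(source: str, raw: dict) -> dict:
    """Map a raw log entry onto the standard scheme, flagging missing fields."""
    mapping = FIELD_MAPS.get(source, {})
    entry = {std: raw.get(orig) for orig, std in mapping.items()}
    # Validate against the scheme so gaps surface instead of becoming blind spots.
    entry["_missing"] = [f for f in STANDARD_FIELDS if entry.get(f) is None]
    return entry

print(normalize("firewall", {"ts": "2024-05-01T12:00:00Z", "source": "10.0.0.5",
                             "dest": "203.0.113.7", "username": "alice",
                             "verdict": "deny"}))
```

The validation step is the part worth running continuously: any entry with fields left in `_missing` points to a source that has drifted from the scheme.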
Training Data Lineage to Assure Trustworthy AI-Backed Correlation
Security data correlation and detection capabilities have come a long way through the use of data science, and that progress is only bound to accelerate through the intelligent use of artificial intelligence (AI) and large language models (LLMs).
“The area of security operations most ripe for automation is the extraction of security-relevant signals from what looks like a pile of noise,” says Brian Neuhaus, CTO of Americas for Vectra AI. However, the reliability of AI and LLMs in crunching security data for meaningful signals will hinge on a lot of data lineage and data management issues.
“Companies that don’t have any experience with language models are beginning to integrate them into their products to analyze and reason about security incidents without understanding how those models operate, what data they were trained on, or why LLMs can hallucinate answers to the questions they shouldn’t be able to answer, as well as hallucinating answers to questions they should be able to answer,” Neuhaus says. “Poorly integrated AI and LLM capabilities will result in people having an ersatz sense of security, without actually being secured.”
Security leadership will need to vet AI-driven security correlation tooling carefully with an eye toward the data lineage of the training data that went into developing the models.
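One lightweight way to structure that vetting is to track lineage answers per model as data the team can review and revisit. The sketch below is an assumption about what such a checklist might capture; it is not a vendor questionnaire standard.

```python
# Illustrative sketch of recording data-lineage answers for an AI/LLM-backed
# detection tool under evaluation; the fields are assumptions, not a standard.
from dataclasses import dataclass, field

@dataclass
class ModelLineage:
    vendor: str
    model_name: str
    training_data_sources: list = field(default_factory=list)   # e.g. "public CTI feeds"
    training_data_cutoff: str = "unknown"
    known_gaps: list = field(default_factory=list)               # data the model never saw
    hallucination_mitigations: list = field(default_factory=list)

    def unanswered(self) -> list:
        """Return the lineage questions the vendor has not yet answered."""
        gaps = []
        if not self.training_data_sources:
            gaps.append("What data was the model trained on?")
        if self.training_data_cutoff == "unknown":
            gaps.append("How current is the training data?")
        if not self.hallucination_mitigations:
            gaps.append("How are hallucinated answers detected or suppressed?")
        return gaps

candidate = ModelLineage(vendor="ExampleVendor", model_name="triage-llm")
print(candidate.unanswered())
```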
Evaluate Data Sources With an Eye Toward Costs
Ingesting poor-quality data into the SIEM or other security tooling can be expensive and can distract security analysts from surfacing meaningful insights. Security operations should think carefully about the sources they lean on for analysis, evaluating and choosing them with a sense of purpose and an eye toward costs.
“Defining clear objectives and requirements, and how exactly more or better quality data will drive better decision making, will greatly benefit organizations,” says Balazs Greksza, threat response lead at Ontinue, a managed detection and response (MDR) provider. “Data integrations should serve a purpose and have a perceived value beforehand to help prioritize the meaningful ones. Balancing lower TCO with security value and time-to-value, while integrating with all important internal data sources and tools, is a difficult equation that needs to be solved.”
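As a rough illustration of putting numbers behind that prioritization, a team could score candidate integrations on security value against cost and time to value. The sources, figures, and weighting below are purely illustrative assumptions, not a standard methodology.

```python
# Hypothetical scoring sketch for prioritizing data-source integrations by
# security value relative to TCO and time to value; numbers are illustrative.
sources = [
    {"name": "EDR telemetry",         "security_value": 9, "annual_tco_usd": 40_000, "weeks_to_value": 2},
    {"name": "DNS query logs",        "security_value": 6, "annual_tco_usd": 25_000, "weeks_to_value": 4},
    {"name": "Untuned endpoint logs", "security_value": 2, "annual_tco_usd": 60_000, "weeks_to_value": 8},
]

def score(src: dict) -> float:
    """Favor sources that deliver more security value per dollar, sooner."""
    value_per_dollar = src["security_value"] / src["annual_tco_usd"]
    return value_per_dollar / src["weeks_to_value"]

for src in sorted(sources, key=score, reverse=True):
    print(f"{src['name']:<22} score={score(src):.2e}")
```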
Beware Garbage Data
As organizations evaluate the data sources that feed their detection and correlation engines, they should be on the hunt for noise that can be excised from the data streams.
“We really try to suppress garbage data from getting even near our environment,” explains Greg Notch, CISO of Expel, a managed detection and response (MDR) firm, and a longtime security veteran who served as CISO for the National Hockey League prior to this job. Garbage data, he explains, is data that isn’t high fidelity or doesn’t point toward meaningful outcomes.
Some examples of garbage data include network detections that don’t come from highly restricted environments and untuned Windows logs, aside from authentication, he says.
“These alerts are not high fidelity. They’re not going to help us deliver a security outcome for you, so we’re going to ignore it,” he says, explaining the process his team takes to eliminate garbage data. “We’ve got very smart folks who are thinking about that data ingestion, what to take, what to leave behind, what things matter, how they fit together, so how an alert from your EDR would fit together with an alert from your network connectivity, and only taking the pieces of that that matter to make that correlation and give you the package data.”
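A stripped-down sketch of that kind of suppression logic might look like the following; the event shapes, the choice of Windows event IDs, and the rules are illustrative assumptions rather than Expel’s actual pipeline.

```python
# Illustrative sketch of suppressing low-fidelity ("garbage") events before
# ingestion; the event shapes and rules are assumptions, not a vendor pipeline.
AUTH_EVENT_IDS = {4624, 4625, 4648, 4672}  # example Windows authentication events

def keep_event(event: dict) -> bool:
    """Keep only events likely to support a detection outcome."""
    source = event.get("source", "")
    if source == "windows" and event.get("event_id") not in AUTH_EVENT_IDS:
        return False  # untuned Windows logs, aside from authentication
    if source == "network" and not event.get("restricted_environment", False):
        return False  # network detections outside highly restricted environments
    return True

events = [
    {"source": "windows", "event_id": 4624, "user": "alice"},
    {"source": "windows", "event_id": 7036, "service": "Spooler"},
    {"source": "network", "restricted_environment": True, "alert": "lateral movement"},
]
print([e for e in events if keep_event(e)])
```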
Cross-Pollinate SecOps Teams with Data Science Expertise
Picking the right data sources for effective analysis, and then coming up with the detection content to use those sources effectively, will require a blend of both security and data science know-how. Whether it is by hiring security analysts with strong data science knowledge, training existing analysts in these concepts, hiring data science pros to work side by side with the security experts, or some combination of the three, security operations teams will increasingly need to cross-pollinate their skill sets with data science expertise.
In a robust organization such as an MSP or large enterprise, adding data scientists to the mix is increasingly a best practice.
“There’s a yin and yang to the data science part of it and the people who are doing the security part of it,” Notch says, explaining that the right combination will feed more cost-effective design of security data architecture and execution of security data management. “The people who are building the detections that are both for a specific tool and span multiple tools, they understand what data they need to build those detections. They look for it in the data sets, and they communicate with the data science people who are very much about the cost optimization of the data pipelines. They’re saying, ‘Well, all right, we can get you just the pieces of that you need without having to bring along all of the other logging and all of the other telemetry information that comes along with it, or you can go query this other system where we don’t have to pull it in.’”
Decouple Data for Flexibility
For decades now, many security strategists have been reaching for the elusive brass ring of security data consolidation. That was for so long the promise of SIEM: to provide a ‘single pane of glass’ view into security-related data and offer a unified platform for data correlation and detection. But data ingestion and egress costs across enterprise architecture, along with issues of normalization and parsing, have all clouded these waters. Some experts say that security teams need to rethink the consolidation narrative, at least for the short and medium term.
“What you want to be able to do is decouple your analytics, your data and your detection components, and even the incident response so that you can start mixing and matching them and basically removing them and adding them as you need to,” says Oliver Rochford, a longtime security industry analyst and security futurist.
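In code terms, that decoupling amounts to putting narrow interfaces between the pieces so any one of them can be swapped without disturbing the rest. The sketch below is illustrative only; the interface names and shapes are assumptions, not any particular product’s API.

```python
# Sketch of decoupled data, detection, and response components behind narrow
# interfaces, so each can be mixed, matched, added, or removed independently.
from typing import Iterable, Protocol

class DataSource(Protocol):
    def events(self) -> Iterable[dict]: ...

class Detection(Protocol):
    def evaluate(self, event: dict) -> bool: ...

class Responder(Protocol):
    def act(self, event: dict) -> None: ...

def run(source: DataSource, detections: list[Detection], responder: Responder) -> None:
    """Wire any source to any set of detections and any responder."""
    for event in source.events():
        if any(d.evaluate(event) for d in detections):
            responder.act(event)
```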
A Data Lake for More Cost-Effective Observability
As a part of that decoupling, an increasing number of security organizations are layering security data lakes into their analytics architectures. These unstructured pools of security data provide a flexible place to quickly and cheaply ingest new data sources that remain directly queryable, and on top of which new security analytics capabilities can be built or integrated.
“Security data lakes provide security teams more flexibility and faster time to value, as they are not having to monkey with the back-end data architectures. A lot of legacy SIEMs require full-time employees just to manage the data infrastructure, and it requires a lot of care and feeding, particularly as you add new data sources,” explains Ken Westin, field CISO of Panther Labs, warning at the same time to be careful not to get caught in the weeds with implementation. “One mistake I have seen organizations make is to try and roll their own security data lake, which becomes a science project, taking their security team’s attention off of finding threats and turning them into system administrators.”
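For a sense of why time to value can be faster, the sketch below lands raw events in a date-partitioned lake layout with no up-front schema work; local newline-delimited JSON stands in for object storage here, and the paths and event shapes are assumptions.

```python
# Minimal sketch of landing raw events in a date-partitioned data lake layout;
# local NDJSON files stand in for object storage, and paths are illustrative.
import json
from datetime import datetime, timezone
from pathlib import Path

LAKE_ROOT = Path("security-lake")  # would typically be an object storage bucket

def land_event(source: str, event: dict) -> Path:
    """Append a raw event under source/date partitions; no schema work up front."""
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    partition = LAKE_ROOT / source / f"dt={day}"
    partition.mkdir(parents=True, exist_ok=True)
    out = partition / "events.ndjson"
    with out.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return out

print(land_event("okta", {"actor": "alice", "event_type": "user.session.start"}))
```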
Capabilities for Creating Content on Top of Data Streams
Telemetry and log data all play a role in the security data ecosystem, but it’s the detection content on top of that data that SOC analysts prize. As Pirc recommends, teams should seek data-driven security tools that provide detection rules and security analysis content right out of the box. But pre-built rules probably will not fully cover an organization’s need to sift through the data and find the risks unique to it. No matter the architecture, organizations also need to pair their security data management capabilities with the ability to create good content on top of the data pipeline.
“It is important to acknowledge that while AI detection rules in security products are valuable, they may not cover all scenarios. SOC teams should implement a strategy for creating custom detection rules tailored to the organization’s environment, industry, and specific risks,” says Saya. “These custom rules can enhance the precision of threat detection and response by addressing context-specific threats that may not be covered by generic AI rules.”
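A minimal sketch of what such a custom rule could look like on top of normalized events follows; the crown-jewel hosts, approved accounts, and condition are hypothetical examples of the organization-specific context a generic, pre-built rule would not know.

```python
# Illustrative custom detection on top of normalized events; hosts, accounts,
# and the condition are hypothetical, organization-specific assumptions.
FINANCE_HOSTS = {"fin-app-01", "fin-db-01"}     # assumed crown-jewel systems
APPROVED_ADMINS = {"svc_backup", "it_admin"}    # assumed approved admin accounts

def finance_admin_anomaly(event: dict) -> bool:
    """Flag administrative logins to finance systems by unapproved accounts."""
    return (
        event.get("host") in FINANCE_HOSTS
        and event.get("action") == "admin_login"
        and event.get("user") not in APPROVED_ADMINS
    )

print(finance_admin_anomaly({"host": "fin-db-01", "action": "admin_login", "user": "contractor7"}))
```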
Future Proof for New Data Sources
With the security market moving so quickly, and the pace of development of new digital systems that must be monitored and logged rapidly advancing, security teams are going to need to future-proof their security analytics capabilities. This is why security leaders should be examining their analytics and data management tooling based not just on today’s needs but also on the flexibility to handle unknown future needs without ripping and replacing.
“We don’t know what the key data sources we will need five years from now will be,” says Olivier Spielmann, global lead of managed detection and response services at Kudelski Security. “So it is important that we have a platform and services able to ingest those new, unknown security controls that will be put in place, without having to change everything every two years.”
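One way to build in that flexibility is a pluggable parser registry, so onboarding a new, currently unknown source means adding a parser rather than replacing the platform. The sketch below illustrates the pattern; the source type and parser are placeholders.

```python
# Sketch of a pluggable parser registry: new data sources are onboarded by
# registering a parser, not by changing the pipeline. Names are placeholders.
import json
from typing import Callable

PARSERS: dict[str, Callable[[bytes], dict]] = {}

def register(source_type: str):
    """Decorator that plugs a parser in for a given source type."""
    def wrap(fn: Callable[[bytes], dict]):
        PARSERS[source_type] = fn
        return fn
    return wrap

@register("saas_audit_v1")  # placeholder for a future, as-yet-unknown source
def parse_saas_audit(raw: bytes) -> dict:
    return json.loads(raw)

def ingest(source_type: str, raw: bytes) -> dict:
    parser = PARSERS.get(source_type)
    if parser is None:
        raise ValueError(f"no parser registered for {source_type}")
    return parser(raw)

print(ingest("saas_audit_v1", b'{"actor": "alice"}'))
```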