The Conundrum of Security Remediation
Over the years, product security teams have tapped into various methods of security backlog enumeration. Security issues come from a myriad of sources: the security team identifies them in-house, tools find issues at scale, external sources such as bug bounty programs and vendor pentests complement the security team’s efforts, and customers themselves may report security issues against the product they use. Not only do security issues come from a variety of sources, they also come with varying priorities and severities that compete for the precious time of the security team and of the engineering team that needs to fix them.
In this post, we explore how the remediation strategy can be pulled in so many directions and how to find a balanced approach to manage the security backlog within an organization.
A security issue is prioritized and then remediated if it is considered urgent. Security is a discipline of constant prioritization, and the highest priority item is driven first so that the organization can respond swiftly to security gaps. The challenge is that the definition of urgency differs for different people and teams. A technical security practitioner sees things differently than a compliance auditor, who may see things differently than a customer using the service. This leads to an interesting situation where the same issue holds different significance for different teams.
For a team or organization dealing with a security backlog, the following remediation strategies usually come into play due to varying perspectives:
Risk Reduction Strategy: In this approach the objective is to take risk off the table. The primary focus of this strategy is to work on the highest risk items after they have been evaluated in the context of the product. The idea is not to be driven by CVSS scores alone, but to contextualize the risk and evaluate its applicability to the organization. If an issue is truly risky, the security team pushes the engineering teams to remediate it.
For example, a potential remote code execution issue may matter more than anything else in the backlog. Prioritizing it first, in order to reduce risk, is a more logical step than trying to meet SLAs alone.
Pros: The advantage of this strategy is that it proactively addresses the risk that matters most, and work that would only create a perception of security is set aside.
Cons: A risk-centric strategy at times can lack alignment with the business and the customer may feel their voice is not heard if their issues are not getting addressed per their expectations.
SLA Driven Strategy: A security team also subscribes to the need for time-based remediation where the time is decided as per the criticality of the issue. These timelines are usually documented in a security standard. For example, teams may require fixing critical issues in 7 days, high priority issues in 30, mediums in 90 and lows in 180 days, as per the organization’s security standard.
Pros: The advantages of this strategy are that it is very fast to implement, metrics and measurements are easy to put together, the need to go deep is greatly reduced, and it aligns with the security standards.
Cons: Fixing risk in the wrong order due to a ticking clock and ignoring customers’ priority issues are clear disadvantages of this strategy. Also, some issues may require a combination of a mitigating fix and a deeper architectural change. By focusing only on the short-term, time-bound mitigating fix, the engineering team may not be incentivized to work on the architecture effort in the long run once the issue has been fixed from an SLA perspective.
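To make the time-based approach concrete, here is a minimal sketch that turns the example timelines from the SLA Driven Strategy above (7, 30, 90 and 180 days) into due dates. The function names and the dictionary layout are assumptions made for illustration, not part of any particular ticketing tool.

```python
from datetime import date, timedelta

# Remediation windows in days, taken from the example standard above.
SLA_DAYS = {"critical": 7, "high": 30, "medium": 90, "low": 180}


def due_date(severity: str, found_on: date) -> date:
    """Return the date by which an issue of this severity should be fixed."""
    return found_on + timedelta(days=SLA_DAYS[severity.lower()])


def days_remaining(severity: str, found_on: date, today: date) -> int:
    """Days left before the SLA is breached; a negative value means overdue."""
    return (due_date(severity, found_on) - today).days
```

A metric such as "issues overdue by severity" falls out of this almost for free, which is part of why the strategy is quick to adopt. The clock says nothing about whether the issue is the riskiest one in the backlog.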
Customer Centric Strategy: It goes without saying that “the customer is king” and like all aspects of product satisfaction, if they have a security concern, it is important to address it. At times, these may be low priority issues that don’t pose real risk to the service, but not fixing them could lead to a perception of lax security.
Pros: Fixing issues using this strategy keeps customers happy and eliminates reasons to purchase a service from a competitor. This is good for the business and the growth of a company.
Cons: The disadvantage is that the bandwidth spent on these fixes comes at the cost of not reducing risk that would eventually have mattered.
Which Strategy Wins?
This is like the tricky multiple choice quiz question where the correct response is “all of the above”. However much we may desire a singular answer, there is no wrong or right strategy here. It can be risky to over-subscribe to only one style of remediation while ignoring the others. You can’t afford to leave a known risk unfixed, leave a customer unhappy, or miss fixing an issue in a timely manner. This is where the security team plays a key role.
The security team should not only specialize in risk enumeration through various techniques and sources, but they also hold the responsibility to push for risk remediation such that the delicate balance between all strategies is maintained at all times. The security team, in this case, is the orchestrator of risk that utilizes engineering bandwidth optimally, with underlying goals of customer satisfaction, risk reduction and timeliness of security fixes.
Finding That Balance
Security teams hold a precarious responsibility to reduce risk that is “urgent” without creating unnecessary noise and ticketing fatigue. Not doing this right leads to friction between engineering and security teams, and sometimes not doing enough leads to poor security outcomes. At times it may be OK for a security team to fine-tune the noise, and in other cases it may be very important to make plenty of noise if the risk deserves to be amplified.
It is not scalable to play the game of perpetual prioritization, and sometimes security teams need to think more holistically, rather than only work on the outstanding backlog that has been created through various sources. Below are a few approaches to tackling the backlog:
Take a breather from Whack-a-mole
The constant balance between the various remediation strategies may seem like a journey that never ends. Sometimes it is important to take a pause and invest in strategic remediations rather than tactical ones.
For example, eradicating a class of vulnerabilities through a deeper change (such as moving from deny listing to allow listing) may fix a large chunk of the backlog in a single sweep. The security team needs to analyze issues for patterns that are best resolved through changes that eliminate all related issues and not just one. Changing focus from individual tickets to investing in architectural changes is important in order to move away from the feeling of playing whack-a-mole in the security space.
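As an illustration of the deny listing versus allow listing point, the sketch below uses redirect targets as the vulnerability class. That choice, the host names, and the function names are all assumptions for the example; the same contrast applies to file types, input characters, and so on.

```python
from urllib.parse import urlsplit

# Hypothetical hosts the application is permitted to redirect to.
ALLOWED_REDIRECT_HOSTS = {"app.example.com", "help.example.com"}

# A deny list has to anticipate every bad value in advance.
DENIED_REDIRECT_HOSTS = {"evil.example.net"}


def is_safe_redirect_denylist(url: str) -> bool:
    """Tactical fix: block only the hosts that have been reported so far."""
    return urlsplit(url).hostname not in DENIED_REDIRECT_HOSTS


def is_safe_redirect_allowlist(url: str) -> bool:
    """Strategic fix: accept only HTTPS URLs on hosts known to be good."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_REDIRECT_HOSTS
```

With the deny list, each newly reported host becomes another ticket. With the allow list in one shared helper, an unknown host is rejected by default, so the related tickets can be closed together.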
Find the problem sooner
A problem reported as soon as it is introduced is much simpler to fix than one that is reported months later. A security team should make capability investments in order to report security issues sooner, which increases the likelihood of the risk being fixed, timeframes being met and the issue not even reaching the customer in the first place.
This can be done by having a well defined Secure Development Lifecycle (SDLC). As a part of the SDLC, the security team can scan for vulnerabilities and secrets in code, run DAST scans, find cloud security and host gaps, etc. at a regular cadence. Performing consistent and regular checks at scale will catch issues a lot sooner than security activities that are undertaken infrequently and inconsistently.
By introducing checks for the most crucial security mistakes in the build pipeline itself or as a part of a security sign off, the time to find the issue and get it remediated will be reduced even further.
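A minimal sketch of such a pipeline check, assuming the "crucial mistake" being guarded against is a secret committed to code. The two patterns are illustrative only and the function names are hypothetical; a real pipeline would normally rely on a dedicated secret scanner with far broader coverage.

```python
import re
from pathlib import Path

# Illustrative patterns only; not a complete list of secret formats.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspicious line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


def scan_paths(paths: list[str]) -> int:
    """Scan files and return an exit code: 1 if anything was found, else 0."""
    failed = False
    for path in paths:
        text = Path(path).read_text(errors="ignore")
        for number, name in find_secrets(text):
            print(f"{path}:{number}: possible {name}")
            failed = True
    return 1 if failed else 0
```

A build step could pass the changed files to scan_paths and fail the build on a non-zero result, so the author hears about the problem minutes after introducing it rather than months later.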
Create a curated, engineering-facing backlog
There are security issues and then there are issues that really matter. It is important for the security team to be aware of the entire backlog. Some engineering teams may like to see such a view as well if they plan security work as a part of their regular sprint. However, if you are working with a team that only wants to see stuff that matters, then it is perfectly fine to create an engineering-facing dashboard of prioritized security issues only, thereby killing the noise that can overwhelm teams. These would be a combination of risk, SLA and customer centric issues that have been curated by the security team.
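One way to picture the curation step is as a filter over the full backlog that keeps an issue if any of the three strategies flags it. The sketch below is only that; the Issue fields, the 1 to 5 risk scale, and the thresholds are assumptions chosen for illustration, and a real team would tune them to its own standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Issue:
    key: str
    contextual_risk: int     # 1 (low) to 5 (critical), assigned after triage
    due: date                # SLA deadline from the security standard
    customer_reported: bool


def curate(backlog: list[Issue], today: date, risk_floor: int = 4,
           sla_window_days: int = 14) -> list[Issue]:
    """Keep issues that are high risk, near an SLA breach, or customer reported."""
    def matters(issue: Issue) -> bool:
        near_sla = (issue.due - today).days <= sla_window_days
        return (issue.contextual_risk >= risk_floor
                or near_sla
                or issue.customer_reported)

    selected = [issue for issue in backlog if matters(issue)]
    # Highest risk first, then the earliest deadline.
    return sorted(selected, key=lambda i: (-i.contextual_risk, i.due))
```

The security team still sees everything in the backlog; only the output of curate would feed the engineering-facing dashboard.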
Security teams should help prioritize the backlog and engineering teams should readily agree to fix the issues. Once an agreement has been negotiated, it is important that the remediation plan be executed as planned. It helps to document and sign off on any delays using formal exceptions to drive accountability. Deviations from the plan should be an exception rather than the norm, and non-execution against the plan should have leadership’s acceptance of risk as well.
A security issue’s life cycle starts when it is found, then it is duly prioritized to meet different stakeholders’ expectations and lastly, it is fixed for the customer to benefit from. A security team should not be overly focused on the task of enumeration alone. Understanding expectations around remediation and orchestrating issues is an important function of the security organization. Further, to relieve the bandwidth pressures on both the engineering and security teams, it is important to subscribe to and execute plans that bring structure and long-term thinking to remediation. Playing whack-a-mole is fun, but it gets tiring very quickly for teams that have features to fix and products to ship.
As a VP of Product Security at Sprinklr, Mohit Kalra leads a team that is responsible for securing the product from application all the way to infrastructure. His team manages the end-to-end Secure Product Lifecycle, bringing in capabilities and expertise to Sprinklr teams that ensure security is proactively baked into the products. The team manages security activities such as architecture reviews, threat modeling, security tooling, cloud security architecture assessment, vulnerability disclosure program, pentesting, SecChamp program and more to ensure Sprinklr's security program has technical depth and provides a risk-first, scalable approach to product security.