
Integrating IT and OT Security - A 6 Step Cycle

Rob Labbé : Feb 16, 2023 12:00:00 AM

For those of you who missed our first webinar of the year, I discussed a process for integrating IT and OT security, specifically extending your IT security program into OT. While this approach is designed around mining, and I'll use mining as the example industry, it should generally hold up across critical infrastructure and other industrial sectors.

Why Integrate IT and OT Security?

Traditionally, even in companies that derive most of their revenue and carry most of their risk in their operational technology environments, cybersecurity teams are scoped down to IT networks only, leaving OT out of scope. Once upon a time, this approach made sense. In the early days of OT, OT networks relied on arcane, proprietary, and specialized networks and technologies. Today, that has evolved.

Current OT networks rely on standard IT technology – Windows and Linux, IP networks, virtualization technology and, increasingly, the cloud as foundational technology. With the commoditization of OT, it should come as no surprise that OT impacts resulting from IT cyber attacks are becoming increasingly common across critical infrastructure and other OT domains.

With this change come significant strategic and tactical benefits to extending your IT security capabilities to your OT networks, which requires breaking down any silos that may exist. These benefits include:

Streamline detection and response – The vast majority (well over 90%) of attacks that impact OT start in IT land, whether through a phishing email or other means. The integration of IT and OT security enables attacks to be tracked across both domains. Unifying detection and response across those domains will only shorten the time to detect, respond to, and remediate an attack, reducing the overall impact on the business.

Make the best use of limited human resources – Integrating IT and OT security will help you optimize your security team. It is challenging enough to recruit and retain skilled security teams in major cities; this problem compounds significantly in remote rural areas where mining operations are often located. In addition, by extending your IT teams into OT, you provide an avenue for growth, continued learning and the ability to focus on the key risks the company faces. This will help you retain the security talent already in your organization.

Optimize investment in security tools – Integration between IT and OT security enables you to extend your tool investment across both environments. Not only will this lead to significant efficiencies over using separate toolsets for security, but with good planning, you may also find security toolsets can be used within the operation to help solve operational challenges.

Simplify and centralize risk management – Risk management in OT is always challenging; however, the vast majority of mines and plants have very mature risk management programs in place. By integrating security into OT, you can begin to identify and incorporate OT security risks into the existing plant and enterprise risk management systems. This centralization of risk (or at least the communication of risk in the same terms) can only lead to a better understanding of cyber risk across the company, thereby improving capital allocation decisions.

The Path to Integrating IT and OT Security – Security at the Speed of Trust

Integrating IT and OT security is not a simple task. One of the things that makes it more challenging is the tendency for most security practitioners to approach the issue as a technical challenge. However, OT security is, in my opinion, one of the most human of all cyber security domains. Even with strong senior executive sponsorship, many OT security integration projects have failed due to this misaligned approach.

Building trust is essential to reducing safety, environmental and other acute risks inherent in most industrial environments. If you approach IT/OT integration otherwise, you are setting yourself up for failure.

Step 1 – Building Relationships and Personal Trust

Before you can begin to influence, recommend or discuss security at site – let alone dictate security – you need to earn your right to be present at the table. That starts with building strong personal relationships which are critical to your ability to build professional trust. This may take considerable time, particularly if the site is in another region, with language and cultural differences to bridge.

OT security is a ministry of presence. The first step is showing up and not just for an hour or a day, but being present for extended periods – a week or more, multiple times. Your goals during this time have nothing to do (directly) with security, but rather:

Learning the process: Understand the process from beginning to end. For example: Where do the required inputs (raw materials, fuel, parts) come from? How does the finished product get to market? Are there any seasonal pressures on production? Where are the bottlenecks? Where are the largest safety and environmental risks?
Getting to know the people: Get to know all the key people at site, not just the general manager and senior leaders. Get to know who the key influencers are, who the site process control savant is. What are people most proud of? What keeps them up at night? Buy coffee, lunches and drinks – and listen.

I can't overstate the importance of this step. It will take as long as it takes. In dangerous and high-risk environments, people rely on people, especially those they know and trust.

Step 2 – Establish Trust in your and your Team's Abilities and Approach

Once you have built the necessary relationships and you have earned the right to have a seat at the table, you can now start to talk about cyber security. This is the time when you can identify some "quick win" security projects. You will no doubt see all sorts of things that are not "right"; security professionals often approach a problem knowing the "right" way to do something. However, in OT, the "right" way is often not the correct approach. Some examples of where you might want to initially tread lightly include:

Patch management: In IT, we are used to applying patches very frequently, often monthly or faster. In OT, because of the focus on reliability, stability, and safety, your patch windows will be much less frequent, perhaps quarterly or even annually, and the bar for testing will be much higher. Assume this to be the case, and work out your compensating controls.
Legacy systems: We have largely rooted out legacy operating systems and platforms in the IT world. However, it is not unusual to find a 20-year lifespan on equipment in OT. Because of this, you will find Windows XP (embedded and full OS) or even older operating systems in OT. Before you speak about the dangers of a legacy OS and push management to replace it, take the time to learn why that OS is still there and why it has not been replaced before now. Once you learn that, you will probably discover that a more pragmatic approach is needed.

The best candidates for initial security projects are not security projects at all; rather, they are projects championed by the site to improve stability, reliability, availability and safety that also have happy security side effects. If you have done a good job learning and building relationships in the first phase, you should already have a list of them.

Step 3 – Cover your Assets

Once you have had some initial success and demonstrated your understanding of, and focus on, the site's operational needs, you are ready to build and deploy your security program. The next four steps focus on that program.

Effective asset management is important to any cyber security program, and it is even more critical in OT environments. Take the time to build an effective asset inventory. In IT environments, you will often find this in a Configuration Management Database (CMDB) or similar system. In OT, you will find it much more varied. Often you will find asset lists in a variety of places – maintenance systems, Excel files, SharePoint lists, somebody's whiteboard, etc. Don't expect your source to be the master; rather, work with the site on an effective process to get the information from the system of record into a place you can work from.
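To make the consolidation idea concrete, here is a minimal Python sketch of merging asset lists from several places into one working inventory. The source names, field names and sample records are all hypothetical, and it assumes every list carries a MAC address to match on, which will not always be true at a real site.

```python
from dataclasses import dataclass, field


@dataclass
class AssetRecord:
    """One consolidated asset, keyed by its normalised MAC address."""
    mac: str
    names: set = field(default_factory=set)
    sources: set = field(default_factory=set)


def normalise_mac(raw: str) -> str:
    """Turn '00-1A-2B-3C-4D-5E' or '001a.2b3c.4d5e' into '00:1a:2b:3c:4d:5e'."""
    digits = "".join(ch for ch in raw.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {raw!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def consolidate(sources: dict) -> dict:
    """Merge the rows from every source into one record per MAC address."""
    merged = {}
    for source_name, rows in sources.items():
        for row in rows:
            mac = normalise_mac(row["mac"])
            record = merged.setdefault(mac, AssetRecord(mac=mac))
            record.names.add(row["name"])
            record.sources.add(source_name)
    return merged


# Hypothetical extracts from two of the places asset lists tend to live.
SOURCES = {
    "maintenance_system": [
        {"name": "Crusher PLC", "mac": "00-1A-2B-3C-4D-5E"},
    ],
    "spreadsheet": [
        {"name": "PLC-CRUSH-01", "mac": "001a.2b3c.4d5e"},
        {"name": "Mill HMI", "mac": "00:1A:2B:AA:BB:CC"},
    ],
}

inventory = consolidate(SOURCES)
for record in inventory.values():
    print(record.mac, sorted(record.names), sorted(record.sources))
```

Assets that appear in only one source, or under several names, are good candidates to walk down on your next site visit.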

You will also find that these lists will be incomplete. Take a few site visits and look for assets. Some methods of looking for assets:

War walking/war driving: Look for unexpected or unusual wireless networks set up at a departmental level or by vendors and service providers at site.
Use passive network monitoring: Look for new IP addresses/MAC addresses popping up on the network and chase those down to their asset. There are a number of tools designed to do this passively in OT networks off a SPAN port; however, with good logs you can do this yourself as well. (Note: Never use an active scanner in OT environments. Many older PLCs and other OT devices at levels 0 and 1 cannot handle the unexpected network input and may crash.)
Don't forget Shodan: It is not uncommon for vendors to supply equipment with SIM card slots, connected to the LTE network for management and monitoring. Sometimes these connections are in addition to your industrial wireless network; sometimes they replace it. Shodan remains one of the best places to find these devices.
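The "do this yourself with good logs" option for passive monitoring can be sketched in a few lines of Python. This is only an illustration: the observation format (timestamp, MAC, IP), the addresses and the function name are assumptions, and in practice the observations would come from whatever passive source you already have, such as DHCP or switch logs.

```python
def find_unknown_devices(observations, known_macs):
    """Return first-seen time and IPs for every MAC that is not in the inventory.

    observations: iterable of (iso_timestamp, mac, ip) tuples taken from
    passively collected logs. Nothing is ever sent onto the OT network.
    """
    known = {mac.lower() for mac in known_macs}
    unknown = {}
    # ISO 8601 timestamps sort chronologically as plain strings.
    for timestamp, mac, ip in sorted(observations):
        mac = mac.lower()
        if mac in known:
            continue
        entry = unknown.setdefault(mac, {"first_seen": timestamp, "ips": set()})
        entry["ips"].add(ip)
    return unknown


# Hypothetical inventory and log extract.
KNOWN_MACS = {"00:1a:2b:3c:4d:5e"}
OBSERVATIONS = [
    ("2023-02-01T08:15:00", "00:1A:2B:3C:4D:5E", "10.10.1.20"),
    ("2023-02-01T09:02:00", "00:50:c2:11:22:33", "10.10.1.77"),
    ("2023-02-01T08:40:00", "00:50:C2:11:22:33", "10.10.1.76"),
]

unknown = find_unknown_devices(OBSERVATIONS, KNOWN_MACS)
for mac, details in unknown.items():
    print(mac, details["first_seen"], sorted(details["ips"]))
```

Each MAC address it reports is a lead to chase down on site, not a verdict; many will turn out to be legitimate equipment that never made it onto a list.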

Step 4 – Quantify Risk

Before you can begin any security remediation projects, or really even start deploying a detection and response capability, you must understand and communicate risk.

This step is critical to continuing to build and/or maintain the trust of the site’s leadership. It is important to remember that one of the central superpowers of a site general manager is the complete immunity to fear, uncertainty and doubt (FUD). These people get told every day about how the sky is falling and how the plant will come to a stop if they don't invest in x, how the union will strike if they don't do y. Communicating vague "high" and "critical" risks will accomplish nothing for your program or your reputation.

Select a model like FAIR (Factor Analysis of Information Risk) to help you quantify risk as a dollar figure that any business can relate to. Make sure you involve the business in all the inputs to your model, so that when the general manager asks about the conclusions during a site leadership meeting, all of their senior staff can say they provided the input numbers.
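As a rough illustration of what "quantified" can look like, the sketch below runs a small Monte Carlo simulation over the two top-level FAIR factors, loss event frequency and loss magnitude. It is a deliberate simplification: it uses triangular distributions from Python's standard library rather than the PERT distributions FAIR tooling commonly uses, it multiplies frequency by magnitude instead of simulating individual events, and every input number is a made-up placeholder for the estimates your site leadership would supply.

```python
import random
import statistics


def simulate_annual_loss(frequency, magnitude, runs=10_000, seed=1):
    """Estimate annual loss in dollars from two three-point estimates.

    frequency: (minimum, most_likely, maximum) loss events per year
    magnitude: (minimum, most_likely, maximum) dollars lost per event
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    f_min, f_mode, f_max = frequency
    m_min, m_mode, m_max = magnitude
    losses = []
    for _ in range(runs):
        # random.triangular takes (low, high, mode), in that order.
        events_per_year = rng.triangular(f_min, f_max, f_mode)
        loss_per_event = rng.triangular(m_min, m_max, m_mode)
        losses.append(events_per_year * loss_per_event)
    return {
        "mean": statistics.fmean(losses),
        # quantiles(n=10) returns 9 cut points; index 8 is the 90th percentile.
        "p90": statistics.quantiles(losses, n=10)[8],
    }


# Placeholder inputs: an outage scenario estimated with the site's own staff.
result = simulate_annual_loss(
    frequency=(0.1, 0.5, 2.0),
    magnitude=(50_000, 250_000, 2_000_000),
)
print(f"Expected annual loss: ${result['mean']:,.0f}")
print(f"90th percentile:      ${result['p90']:,.0f}")
```

The value of the exercise is less in the arithmetic than in the fact that the frequency and magnitude estimates came from the people sitting at the table.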

Use that risk framework to justify all your cyber security projects, be they remediation/improvement projects or even the extension of your team's detection and response function into the OT environment (our next step).

Step 5 – Extend your Prevent/Detect/Respond TTPs into OT

It is only once you have built the necessary relationships and trust, gained sufficient understanding of the process, technical environment and risk profile that you can really start to extend the day-to-day security function into the site.

Runbooks/Playbooks

Start with adapting your IR runbooks and playbooks for OT. Some of the things to consider include:

Who are the process experts at each site? How will you contact them in case of an incident? Who are their back-ups?
What is the process to contain a machine? Most playbooks for infected machines involve some sort of network containment. In IT networks, this is often fairly low risk; in OT environments, it can be very high risk. Containing a system could take a critical production or safety system offline.
What are the optimal forensics processes? Most IT systems run with sufficient headroom to allow for active live forensics activities. This cannot be assumed to be the case in OT. Running forensics on live, running systems may cause instability and unpredictable behavioural impacts – both undesired in OT environments.
How do you respond to the worst case? In the worst case, if safe production cannot be assured, who has the authority to approve a site shut down? What information will they need to make that decision? What situations and circumstances would lead to that action?
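One way to capture the containment question in a playbook is as an explicit decision gate that runs before any isolation step. The Python sketch below is hypothetical throughout: the asset fields, the three outcomes and the rules that choose between them are placeholders for whatever you and the site agree on.

```python
from dataclasses import dataclass
from enum import Enum


class Containment(Enum):
    ISOLATE_NOW = "isolate from the network immediately"
    ISOLATE_AFTER_APPROVAL = "isolate only after the process expert approves"
    DO_NOT_ISOLATE = "do not isolate; escalate to the shutdown authority"


@dataclass(frozen=True)
class Asset:
    name: str
    environment: str            # "IT" or "OT"
    critical_to_process: bool   # production or safety critical
    process_expert: str         # who to call, from the runbook contact list


def containment_decision(asset: Asset) -> Containment:
    """Pick a containment path based on where the asset sits and what it does."""
    if asset.environment == "IT":
        return Containment.ISOLATE_NOW
    if asset.critical_to_process:
        # Pulling this system off the network could itself stop production
        # or remove a safety function, so a human with authority decides.
        return Containment.DO_NOT_ISOLATE
    return Containment.ISOLATE_AFTER_APPROVAL


# Hypothetical assets.
laptop = Asset("finance-laptop-17", "IT", False, "service desk")
historian = Asset("site-historian", "OT", False, "control systems lead")
safety_plc = Asset("mill-safety-plc", "OT", True, "process control engineer")

for asset in (laptop, historian, safety_plc):
    decision = containment_decision(asset)
    print(f"{asset.name}: {decision.value} (contact: {asset.process_expert})")
```

Writing the gate down this plainly forces the conversation about who the process expert and the shutdown authority actually are before an incident, not during one.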

Log Data

Critical to detection and response processes are logs. The collection of logs in OT can be much more challenging than in IT environments, as many of the system components do not generate log files. However, working to your advantage is the clear text, full trust nature of most OT network traffic. Good network logs can give great visibility into what is happening on an OT network, particularly at levels 0 and 1.
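As a small example of the visibility network logs can give, the sketch below compares the conversations seen today against a baseline learned earlier and reports anything new. It assumes traffic at the control levels is repetitive enough for a baseline to be meaningful, and the flow format (source, destination, destination port) and all addresses are invented for illustration.

```python
def new_conversations(baseline_flows, current_flows):
    """Return flows present in current_flows but absent from the baseline.

    Each flow is a (source_ip, destination_ip, destination_port) tuple taken
    from passively collected network logs. Order of first appearance is kept.
    """
    baseline = set(baseline_flows)
    already_reported = set()
    fresh = []
    for flow in current_flows:
        if flow not in baseline and flow not in already_reported:
            already_reported.add(flow)
            fresh.append(flow)
    return fresh


# Hypothetical baseline: an HMI polling two controllers.
BASELINE = [
    ("10.10.1.50", "10.10.1.20", 502),    # Modbus/TCP
    ("10.10.1.50", "10.10.1.21", 44818),  # EtherNet/IP
]

# Hypothetical traffic from today's logs.
TODAY = [
    ("10.10.1.50", "10.10.1.20", 502),
    ("10.10.1.77", "10.10.1.20", 502),    # a new talker reaching a controller
    ("10.10.1.77", "10.10.1.20", 502),
    ("10.10.1.50", "10.10.1.99", 3389),   # the HMI opening a remote desktop session
]

for source, destination, port in new_conversations(BASELINE, TODAY):
    print(f"new conversation: {source} -> {destination}:{port}")
```

A new conversation is not necessarily an attack. It may just as easily be a vendor laptop or a misconfigured device, which is exactly the kind of finding the control systems team cares about too.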

Getting those logs from OT may be tricky. There may not be switch capacity for SPAN ports, there may not be dark fibre available to run that traffic to your systems, or there may be a general resistance to "touching" the OT network equipment. Again, this is where your relationships can help.

Many control systems engineering teams struggle with solving intermittent issues in the control system and lack the tools to allow them to look at large volumes of network data to find anomalies. The good news is, this is where a lot of OT specific and even general IT security tools shine. Offer the industrial networking and control systems teams access to your anomaly detection or other toolsets, give them dashboards that help them find the operational anomalies in the network traffic. Give them that ability and they will move heaven and earth to get that network traffic data to you.

Tools

Once you have sorted out log data and your processes, you can select tools that will support those processes and consume that data. Why did we leave tools to last? There is no point in buying expensive tools when you don't have the processes you want to enable figured out or you don't have quality data to feed them. When looking at OT environments there are two major categories of tools you'll be looking to purchase initially: Endpoint and Network.

Endpoint

When looking for endpoint solutions, there are a few workable options to choose from. This is an area where there are huge benefits to having a consistent platform across IT and OT, and the natural desire will be to pull the EDR solution you have in place in IT into OT. This is a good approach if your solution meets the key requirements for an OT EDR solution:

On-premise controller options: All modern EDR solutions require access to a central controller. Most of the time, this is designed to be a cloud controller, which is a perfect solution for most in IT, particularly in this era of hybrid and remote work. To be effective in OT, the solution will need to offer either an on-premise controller option or some sort of proxy that can be deployed at the appropriate place, often the OT DMZ(s), to allow for communication with the OT endpoints.
Feature control: As you deploy in OT there will be some features, particularly those in the prevent and respond categories, that will be high risk. You are going to want to be able to shut off features that do highly intensive forensics or process manipulation, like automatic containment or process blocking, to enable a smoother initial deployment in a "detect only" mode. Then come in behind and (after extensive testing) start slipping in those features one at a time, in a highly tuned manner.
Transparency: As you deploy the endpoint solution, you are going to need to deploy to highly safety-sensitive areas. There is a good chance deployment into these areas will require engineering sign off. You will struggle to get your engineer of record to sign off on a black box. You need a vendor willing to disclose enough of how it works, how the solution is tuned, etc. to get that engineering sign off.
Release control: EDR agents and solutions need to get updated – often. However, particularly in areas that need engineering sign off, you may need to have an alternate release schedule to allow for additional testing of the new agent in the OT environments. Therefore, you need an EDR solution that will allow for agents to be in assorted release states, from current in IT environments to as much as a year old in some OT environments. In OT, tools like SCCM may not be available which means flexible ways to update in different environments will be important.
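The feature control and release control requirements can be pictured as a per-environment policy. The Python sketch below is purely illustrative and does not reflect any vendor's API: the ring names, the allowed agent ages and the "detect only" flag are assumptions you would replace with values agreed with your engineer of record.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RingPolicy:
    """Deployment policy for one group of endpoints."""
    max_agent_age: timedelta   # how far behind the current release is acceptable
    prevention_enabled: bool   # False means "detect only"


# Hypothetical rings, from least to most safety-sensitive.
POLICIES = {
    "it": RingPolicy(max_agent_age=timedelta(days=30), prevention_enabled=True),
    "ot_general": RingPolicy(max_agent_age=timedelta(days=180), prevention_enabled=False),
    "ot_safety": RingPolicy(max_agent_age=timedelta(days=365), prevention_enabled=False),
}


def agent_is_compliant(ring: str, agent_released: date, today: date) -> bool:
    """True if the installed agent is recent enough for its ring."""
    return today - agent_released <= POLICIES[ring].max_agent_age


# The same agent build is out of date in IT but acceptable in OT.
released = date(2023, 1, 10)
today = date(2023, 6, 1)
for ring in POLICIES:
    status = "ok" if agent_is_compliant(ring, released, today) else "update due"
    mode = "prevent" if POLICIES[ring].prevention_enabled else "detect only"
    print(f"{ring}: {status}, {mode}")
```

Whatever solution you choose needs to support this kind of split natively; if every endpoint must run the same agent version with the same features, the policy above cannot be enforced.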

Network

Network security systems have come a long way in OT. You will probably want to select an OT specific solution to monitor the OT network traffic, with bonus points if the solution also can be helpful in IT. Again, there are multiple options in the space. Some key features to look for on your shopping list include:

OT protocol aware: The solution should be aware of all the OT protocols used at your sites. The ability to deconstruct that OT traffic and identify anomalous behavioural actions is critical.
Useful for operations and security: Not all OT anomalies have security impact. Many are the root cause of operational challenges. A solution that provides the ability to construct non-security use cases, dashboards and alerts for the site process control teams will have much more support at site than a solution that is only for security.
Asset discovery and management: Given the challenges of asset management in OT environments, where the traditional IT discovery processes based upon port scanning are ill-advised at best, passive discovery capabilities in OT network security products can help you identify the assets, the asset type, and even in some cases firmware information.

In all cases, regardless of the solution you choose, deploy slowly, conservatively and incrementally. Start with the deployment of solutions to the lowest risk environments, in as passive a configuration as possible, and increase coverage first followed by capability. Remember that a solution deployed to 100% of the environment with 20% of the capabilities enabled is still better than no solution at all. And, it gives you a solid basis for expanding capability as you mature.

Step 6 – Testing and Validation

Once you get a site deployed, with your defensive TTPs and tooling in place, the final (and often skipped) step is validation. In the validation phase, you bring it all together with site leadership, integrating your processes into the site's emergency processes and testing them through a tabletop exercise.

All mining sites will have a site-based emergency response plan (ERP), covering a wide range of site issues including fires, geotechnical issues, environmental issues, protests, etc. It is important that your incident response plan integrates into that site ERP. That may include a small section for a cyber incident that delegates to the cyber incident response plan, or it might be more complex. Either way, it does need to be in there.

Once the plan is in place, it is important to test the plan, at site, in a physical tabletop exercise working through a severe but plausible incident. This tabletop exercise should involve the general manager, senior site leadership, as well as key OT technical leads.

The goals of the exercise include:

Validating your incident response plan at site
Providing an opportunity for site leadership to experience a cyber incident and your cyber incident response plan
Ensuring effective integration into the site emergency response plan, in particular for any cyber-physical impacts
Testing and validating chain of command and decision-making processes
Validating communications to corporate decision-makers as well as internal and external stakeholders

Be sure to incorporate the lessons learned from that exercise into both the site emergency response plan and the cyber incident response plan. Schedule recurring tabletops (at least annually, or after major updates to your security or site landscape) to refresh the plans and capture key lessons before an incident.

Finally, these steps are presented as a cycle. You never stop building relationships, building trust and optimizing tools and processes. As those relationships get deeper, you will be able to deploy more and more robust and capable solutions.

Summary

Integration of IT and OT is a critical step for holistic security and risk management for all mining and metals companies, as well as any enterprise that derives the majority of its revenue and/or holds significant risk in the OT environment.

However, willing it to be done will not get it there. This is largely not a technical issue; it is a people and trust issue.

Take the time to build trust and relationships before attempting major technical and process changes – be a student of the process and site. You can only advance your technical and security objectives when you have a license to do so. That license cannot be given by senior leadership; it must be earned through trust.
