February 2016

Process Control/Instrumentation

Enhance PSM design with metrics-driven best practices

This study shows the attributes and benefits of a data- and metrics-driven system focused on the process safety design integrity, reliability and control of process plant flares and pressure relief systems.

Marshall, M., Michael Marshall LLC

This study identifies the attributes and benefits of a data- and metrics-driven management system focused on the process safety design integrity, reliability and control of process plant flares and pressure relief systems. This process safety management (PSM) system approach focuses on the four key business drivers of risk, regulatory, operations and profits, and involves several distinct business methods involving people, processes and tools/technology. At the center of the management system is the unique design and implementation of metrics and key performance indicators (KPIs) created from data that is lifted and aggregated from an enterprise asset management platform.

The current regulatory climate

The highly publicized incidents at BP Texas City, Texas in 2005; Tesoro Anacortes, Washington in 2010; and Chevron Richmond, California in 2012, occurred not because of a singular failure of equipment, instrumentation, facility siting, operator, procedures, communication, supervision or training, but rather a failure of a combination of all those things—i.e., a management system failure.

The BP, Tesoro and Chevron incidents are now driving the reexamination of the PSM rule by the US regulatory community. The US Chemical Safety Board (CSB) has taken notice that US oil and gas industry losses are the highest of any industrial sector, and that the US refining industry accident rate is three to four times higher than in Europe.

The PSM rule and its allegedly “less-rigorous regulatory framework” are quickly falling out of favor with regulators. As such, the attributes of the “safety case” and as low as reasonably practicable (ALARP) regulatory regime currently in use throughout the UK, Australia and Norway are now being advocated by the CSB. More notable is California’s proposed regulation for inherently safer design (ISD), an initiative that was endorsed by then-CSB chairman Dr. Rafael Moure-Eraso, who suggested that other states do the same.

ISD has been hotly debated for years and would require that risk be reduced to the greatest extent possible with the selection and implementation of changes in chemistry and/or a change to process variables—e.g., the reduction in pressure, temperature, flows, etc. Unmistakably, this would take the petrochemical industry and its PSM approach from performance-based to prescriptive.

Before opting to prescriptively rewrite the PSM rule, it is suggested that a focused metrics-driven management system approach is more sensible, productive and achievable in the short term. Such an approach also embodies the core principles of the PSM rule and is consistent with the findings and recommendations of the 2007 Baker Panel Report (Table 1).


It would seem that the Baker Report is prompting a revisit to the PSM rule for intent and direction, as well as for the proper administration of PSM—the effective application of a management systems approach to continuously improving our process safety environment and culture. The authors of the PSM rule took great pains to make it a performance-based standard for a reason (prescriptive is inherently inferior), so it should not be abandoned now.

Revisiting PSM, management systems and continuous improvement

With the promulgation of the PSM Standard 29 CFR 1910.119, the US Occupational Safety and Health Administration (OSHA) mandated that a management system comprising several well-defined elements be established “for preventing or minimizing the consequences of catastrophic releases of toxic, reactive, flammable or explosive chemicals.” The process safety information (PSI) element of the PSM rule states, “The employer shall document that equipment complies with recognized and generally accepted good engineering practices (RAGAGEP),” with specific reference given to “relief system design and design basis.”

Although OSHA does not explicitly use the term “continuously improving” in its regulatory standards, it uses equivalent terms such as “accurate, complete, clear and ongoing.” The Appendix C compliance guidelines of 29 CFR 1910.119, for example, use the term “complete and accurate” in lieu of “continually improving.” Likewise, for the mechanical integrity element of 29 CFR 1910.119, OSHA uses the term “ongoing” to describe the expectation to continually improve.

Recent incidents and enforcement actions demonstrate OSHA’s expectation for operating plants to maintain a continually improving PSM system. In 2007, OSHA initiated its National Emphasis Program (NEP), a special enforcement initiative specific to refineries and chemical plants. Of the citations issued, many involved missing, inaccurate and incomplete process safety information, as well as outdated relief system studies.

A management system for flare and relief systems

Flare and relief system design compliances are also assessed annually in ever-increasing detail for participants of OSHA’s Voluntary Protection Program (VPP). The environmental enforcement aspect to flare operation and systems management continues to be relevant. Notwithstanding social responsibility, it is just good business to develop a management system that not only enhances safety and environmental protection, but also augments asset protection. Safety and environmental stewardship are of paramount importance, but asset protection, business continuity and public image also have vital significance in any business environment.

Knowing what data to capture and display is essential to proper metrics development and analysis, and the ensuing derivation of KPIs. Fig. 1 illustrates the who, what, when, where and how of doing just that with a focused, metrics-driven flare and overpressure management system (FOMS).

  Fig. 1. Capturing and displaying the correct data are essential to proper metrics development and analysis.

Associated FOMS benefits

It goes without saying that flare and relief system design and operation have come under intense scrutiny by OSHA. With the US Environmental Protection Agency (EPA) now getting into the act, it would seem that regulators are looking for a new “3% inlet pressure drop (IPD)” soft spot, and have found it in flare system operation and management.

The vast majority of gas flaring is associated with plant upset, poor operation or imbalance, and, as such, is unplanned and subject to regulatory penalty. The EPA is now aggressively mandating and enforcing flare management plans (FMPs) and flare gas recovery systems, just as OSHA enforced relief-system pressure relief analyses (PRAs) and the 3% IPD rule.
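As an illustration of how a rule such as the 3% IPD criterion can feed a metric, the following minimal sketch screens a list of relief devices. It assumes the common reading of the rule, namely that non-recoverable inlet pressure loss at rated flow should not exceed 3% of the valve set pressure. The `ReliefDevice` class, the equipment tags and all numbers are hypothetical and are not taken from any real plant or from the FOMS itself.

```python
from dataclasses import dataclass

IPD_LIMIT_FRACTION = 0.03  # assumed reading of the "3% rule"


@dataclass
class ReliefDevice:
    tag: str                  # hypothetical equipment tag
    set_pressure_psig: float  # valve set pressure
    inlet_loss_psi: float     # non-recoverable inlet loss at rated flow

    @property
    def ipd_fraction(self) -> float:
        """Inlet pressure drop as a fraction of set pressure."""
        return self.inlet_loss_psi / self.set_pressure_psig

    @property
    def exceeds_ipd_limit(self) -> bool:
        return self.ipd_fraction > IPD_LIMIT_FRACTION


def flag_ipd_exceedances(devices):
    """Return the tags of devices whose inlet loss exceeds the limit."""
    return [d.tag for d in devices if d.exceeds_ipd_limit]


# Illustrative data only.
devices = [
    ReliefDevice("PSV-101", set_pressure_psig=150.0, inlet_loss_psi=3.0),  # 2.0%
    ReliefDevice("PSV-102", set_pressure_psig=150.0, inlet_loss_psi=6.0),  # 4.0%
    ReliefDevice("PSV-103", set_pressure_psig=300.0, inlet_loss_psi=7.5),  # 2.5%
]
flagged = flag_ipd_exceedances(devices)
print(flagged)
```

The count of flagged devices, tracked per unit and per month, is the kind of focused metric that could roll up into a regulatory-driver KPI.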

It has become clear that this double-barrel surge via OSHA and the EPA has arrived and is progressing rapidly. It may seem that the industry is under assault, but the ultimate truth is that process safety just makes good business sense.

When considering a performance improvement program in this highly regulated process safety environment, four key business drivers should be considered: risk, regulatory, operations and profits. Building a focused flare and relief systems management process around those four drivers involves a unique management system structure of people, processes and tools/technology.

Central to the growth and continuous improvement of those three elements will be the proper design and implementation of metrics and KPIs. The institutionalization of KPIs and the subsequent reporting and action planning process will drive the continuity and sustainability of the plan-do-check-act rudiments of this management system approach.

Sustainability through KPIs

The PSM standard is exceptional in its vision, design and implementation, but it could have been made better by the inclusion of metrics and KPIs. It is often said, “If it can’t be measured, it can’t be managed,” and this is likely a reason why so many PSM programs have failed to grow and measure up to industry best practices and OSHA expectations.

KPIs are critical to a properly designed management system in that they institutionalize processes and drive accountability, providing continuity and sustainability. An effective KPI system and data mining process takes into consideration business drivers, success factors, targets, improvement actions and performance measures. However, knowing which metrics should be funneled into KPIs is the challenge.
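The elements named above (business drivers, success factors, targets, improvement actions and performance measures) can be pictured as one record per KPI. The sketch below is only one possible arrangement. The class names, field names and sample KPIs are assumptions made for illustration and do not come from the FOMS or from any EAM product.

```python
from dataclasses import dataclass, field
from enum import Enum


class Driver(Enum):
    """The four business drivers named in the article."""
    RISK = "risk"
    REGULATORY = "regulatory"
    OPERATIONS = "operations"
    PROFITS = "profits"


@dataclass
class KPI:
    name: str
    driver: Driver
    success_factor: str
    target: float
    measured: float
    lower_is_better: bool = True
    improvement_actions: list = field(default_factory=list)

    def on_target(self) -> bool:
        """Compare the performance measure with its target."""
        if self.lower_is_better:
            return self.measured <= self.target
        return self.measured >= self.target


def open_actions(kpis):
    """Collect improvement actions for every KPI that misses its target."""
    return {k.name: k.improvement_actions for k in kpis if not k.on_target()}


# Hypothetical KPIs and values, for illustration only.
kpis = [
    KPI("Overdue relief device inspections (%)", Driver.REGULATORY,
        "Inspections completed on schedule", target=2.0, measured=5.5,
        improvement_actions=["Re-plan inspection backlog"]),
    KPI("Unplanned flaring events per month", Driver.OPERATIONS,
        "Stable unit operation", target=1.0, measured=0.0),
]
actions = open_actions(kpis)
print(actions)
```

Tying each KPI to a driver and to named actions is what lets a report lead directly to action planning instead of stopping at presentation.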

It would now seem that API 754 was written only to gauge the “high-level” effectiveness of PSM programs. The opportunity still remains for further development of focused metrics that further drive performance improvement in areas like flare and relief system design and operation, among others. Correspondingly, the CSB has characterized the shortcomings of API 754 as follows:

  • Tier 1 and 2 numbers are lagging indicators and thus of limited usefulness as performance indicators.
  • The statistical power of small numbers of Tier 1 and 2 events is insufficient to detect effect.
  • Tier 3 and 4 events are leading indicators that are reflective of process failures, yet they are not publicly reported and utilized for industry trend analysis and benchmarking comparisons.
  • Employee participation was insufficient in the development process and thereby lacking in a broad-based consensus.
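The point about small numbers in the second bullet above can be illustrated with a short calculation. This sketch assumes, purely for illustration, that yearly Tier 1 event counts at a site follow a Poisson distribution. The long-run average of 4 events per year and the observed count of 2 are hypothetical figures, not data from the CSB or from API 754 reporting.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """Probability of observing k or fewer events when the true mean is `mean`."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i)
               for i in range(k + 1))


# Hypothetical site: long-run average of 4 events per year, 2 observed this year.
p_two_or_fewer = poisson_cdf(2, 4.0)
print(f"P(count <= 2 | mean = 4) = {p_two_or_fewer:.3f}")
```

Under these assumptions there is roughly a 0.24 probability of seeing two or fewer events in a year even if nothing has changed, so an apparent halving of the count cannot be distinguished from chance.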

Industry can be even more critical and innovative by utilizing historical operations, reliability and maintenance data in analytical tools and performance metrics to create a competitive environment for improving plant reliability and profitability. The word “competitive” should be stressed so this plan-do-check-act process will drive itself and grow by fostering a healthy and productive incentive among stakeholders for continuous improvement in reliability, profitability and, most importantly, process safety.

Refining the development of KPIs

The idea is to tap into the data-rich potential of an enterprise asset management (EAM) system. From this data and informational structure, the 20% of data that 80% of operators, engineers, managers and execs want to see is extracted, with the challenge being identifying that 20% of key information. Beyond that, further consideration is necessary for the more refined development of KPIs, which then provide the need-to-know requirements of stakeholders at a “dashboard” level of awareness.
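One simple way to look for that 20% is a Pareto ranking of how often each data item is requested. The sketch below assumes that request counts per data item are available, for example from report or dashboard usage logs. The item names and counts are invented for illustration.

```python
def pareto_selection(request_counts, coverage=0.80):
    """Return the most-requested items that together reach `coverage` of all requests."""
    total = sum(request_counts.values())
    ranked = sorted(request_counts.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0
    for name, count in ranked:
        if running >= coverage * total:
            break
        selected.append(name)
        running += count
    return selected


# Hypothetical request counts per data item.
request_counts = {
    "relief_device_inspection_status": 50,
    "unplanned_flaring_events": 32,
    "open_moc_items": 4,
    "pra_revision_dates": 4,
    "flare_header_pressure_trend": 3,
    "psv_test_results": 2,
    "flare_gas_recovery_uptime": 2,
    "audit_findings": 1,
    "training_records": 1,
    "vendor_datasheets": 1,
}
key_items = pareto_selection(request_counts)
print(key_items)
```

In this invented data set, 2 of the 10 items account for 82% of requests, which is the pattern the 80/20 framing anticipates. Real usage data may be far less concentrated.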

What so many KPIs fail to do is drill down deeply enough to facilitate the identification of basic and root causal factors associated with problem solving for optimal performance. There can be too many of these focused metrics, and the pitfalls are similar to usability problems associated with multiple alarms sounding during a process unit upset, commonly referred to as “alarm flood.” Just as with too many alarms, poorly designed alarms and improperly set alarm points, metrics flood and confusion can set in and negatively impact the problem-solving process.

Proper development, implementation and management of metrics and KPIs should involve many of the same concepts utilized in alarm rationalization and management—it is more of an art form than many realize, requiring critical thinking and strategic design aptitude that draw on a frontline-to-exec level of appreciation for what “good” looks like. This is what the FOMS developers had in mind for a management process focused on flare and relief systems.

Too often, the process of data gathering and metrics reporting is more about presentation than substance and lacks real problem solving and process optimization potential. The metrics and KPIs of an FOMS are specifically designed for problem-solving performance improvement issues at the basic and root cause levels, and they are built around the business drivers of risk, regulatory, operations and profits (Table 2).


Management system design and implementation

Again, PSM was conceived out of a management system mentality of a plan-do-check-act cycle with continuous improvement at its core. The focused, metrics-driven management system of FOMS follows this same model and function, illustrated in Fig. 2. In application, it begins with a four-phase development process: Where are we now? Where do we want to go? How are we going to get there? How do we improve, grow and keep going?

  Fig. 2. The FOMS follows the plan-do-check-act cycle.

Phase 1: Where are we now?

  • Identify and engage process owners and stakeholders
    • A changing PSM and PRA landscape
    • BP, Tesoro and Chevron incidents are driving reexamination of PSM rule
    • US refining accident rate is three to four times that of Europe
    • Safety case and inherently safer design/technology (ISD) are gaining favor with regulators
  • Compile available documents and information
  • Flowchart current processes, tasks and procedures
    • PRA methods and processes are now mature
    • PRAs giving way to enhanced auditing, mini-PRA tune-ups and management of change (MOC) processes
    • Are more processes needed to ensure PRA integrity?
    • Intense regulatory scrutiny remains: risk, regulatory, operations and profit drivers
  • Identify current tools and technology
    • PRA science and technology are still evolving
    • There is little in the way of PRA-specific management systems tools/information technology (IT)
    • There is a lot of IT structure in need of management system content and integration
  • Understand strengths, weaknesses, opportunities and threats (SWOT) in existing processes.

Phase 2: Where do we want to go?

  • Engage process owners and stakeholders for vision, objectives and value drivers
    • Business focused without putting safety second
    • Team environment, but competitive
    • Problem solving
    • Communities of practice and pride
    • Knowledge managers, not tribal
    • Bottom-up, top-down, inverted pyramid with “closest to the work” mentality
    • Measurements, accountability and rewards
  • Baseline processes and perform gap analysis
  • Evaluate gaps and tradeoffs (costs)
  • Redesign processes and functionalities
    • Think like an operator, manager, regulator
    • Metrics and reporting, KPIs
    • Ongoing gap analyses, data centric
    • Expert systems to automate
    • Integrate with existing systems, customizable
    • Process optimization and profits
    • Better manage and control change
    • Enable regulatory compliance; safety case and ISD
    • Cross-organization integration and collaboration
    • Focus on operations workforce
    • Standardization and consistency
  • Specify tool and technology needs
    • Workflows
    • Protocols and practices
    • Portals and links to data and systems
    • Data repositories
    • Search engines and links
    • Dashboards, scorecards, forums
    • Executive dashboards
    • Document management
    • Training and more training, e.g., computer-based
    • Enterprise discoverability and sharing
    • Design to drive sustainability
  • Develop project plan and prioritize.

Phase 3: How are we going to get there?

  • Identify needs and objectives
    • Define “good” or “where we want to go,” and plan a path forward
    • Critical focus on management systems design and implementation, people, processes, tools/technology
    • Strategic focus on gaps, soft spots and critical systems within operations, maintenance and engineering organizations, corporate
    • Content, the 20% of data that 80% of stakeholders want to see (strategic and customizable KPIs, data maps, scorecards, dashboards, reports, data portals, alerts, analyses and trends)
    • Design to involve only new processes and tools, not new labor
  • Develop strategic purpose
    • Maintain a business perspective on everything, including process safety
    • Tightly integrate strategy and tactics with business processes to be self-sustaining
    • Ensure that organization and systems are designed to enable execution of business processes
    • Showcase new philosophy to inspire personnel at all levels
    • Design for employee involvement and buy-in at all levels, and make it competitive
    • Integrate with existing assets, programs and systems
    • Get KPIs in the hands of those closest to the work, i.e., those most able to effect change
    • Connect enterprise performance measurement with budgets, reviews and bonuses
    • Make everyone an ambassador, especially regulators
    • Design metrics to be what operators, managers and executives want
    • Adapt and make compatible with outsourcing applications of IT and engineering services
  • Establish team leadership and governance
    • Understanding and leveraging nuances of culture is vital
    • Plant environment (operations, maintenance, engineering, corporate)
    • Cost/safety prioritization, RAGAGEP benchmarking
    • Address internal competitiveness and silos
    • Integrate with RBI and PSM offerings, leveraging IT and maximizing synergies
    • Get regulators on board, such as local and state OSHA, EPA
    • Communicate to and involve everyone at all levels, and invert the hierarchical pyramid (Fig. 3)
  • Perform root cause analysis
  • Design metrics, KPIs, reports, automation tools
    • Design to drive sustainability (training, auditing, certification, profits)
    • Integrate with existing IT structure and software, synergies
    • Provide for enterprise discoverability and sharing
      • Leverage EAM platform and integrate with PRAs
      • Asset integrity management systems
      • Continuous emissions monitoring systems (CEMSs)
      • PSM suite of software
      • Digital control systems
      • Process instrumentation
    • Remember that IT and software prowess need content in cohesive processes, an FOMS
    • New/improved software solutions, business methods and Internet innovations
  • Initiate training programs
  • Implement transition plan, pilot and then rollout
    • Compatible with third-party applications, software and systems
    • Leverage synergies/overlaps with PSM, equipment inspection and reliability programs, e.g., RBI API 580/581, especially damage mechanisms (API 572) and mechanisms contributing to the loss of primary containment (LOPC)
    • Flare/relief system specific programs for mechanical integrity, MOC, incident investigation, procedures, PSI and other PSM elements.


  Fig. 3. Inverting the hierarchical pyramid and involving personnel at all levels are keys to understanding and leveraging the nuances of company culture.

Phase 4: How do we improve, grow and keep going?

  • Implement and validate redesigned process
  • Initiate ongoing metrics and management systems
  • Monitor, evaluate and report on new processes
  • Review targets and performance
  • Audit and adjust for continuity, sustainability and growth.

A close second

A close second to an FOMS would be a focused, metrics-driven management system approach addressing mechanisms contributing to LOPC. LOPC is preventable, and equipment reliability relative to process safety is by far the leading risk opportunity and ongoing business concern facing the oil and gas industry today. The same personnel, processes and tools/technology (software and EAM) structure and methods employed in an FOMS can be easily adapted for an LOPC-focused initiative. This strategic initiative also involves the same business drivers of risk, regulatory, operations and profits.

Industry can also be much more critical and innovative in responding to LOPC incidents, data and metrics with enhancements to mechanical integrity proficiencies relative to inspection, maintenance, design and overall systems management. Historical operations, reliability and maintenance data can be better utilized and managed with analytical tools and performance metrics to determine needs and risk exposure, provide direction and address opportunistic reliability issues. A refining-specific incident and loss database, as well as an optimization methodology (utilizing root cause failure analysis, or RCFA) that quantifies the economic impact (dollars in lost profit opportunity) of equipment anomalies, LOPC incidents and upset/malfunction operating conditions, has been developed and put into practice. This approach includes a much more critical focus on inherently challenging API 754 process safety event (PSE) LOPC metrics relative to damage mechanisms; operating envelopes; and consequences of deviation, procedures, design and training.
