November 2024
Special Focus—Process Controls, Instrumentation and Automation
The use of AI in process plant control: A proof of concept
The artificial intelligence (AI) revolution has impacted numerous industries, including chemical plant operations. AI technologies—such as machine-learning (ML), natural language processing, computer vision and robotics—are being used to improve efficiency, automate tasks, enhance decision-making, personalize customer experiences and drive innovations across various sectors.
AI is being used in chemical industries for process optimization, predictive maintenance, quality control, supply chain management, safety monitoring, product development and environmental monitoring, among others.
Objective. The goal of this article is to highlight how:
- Supervised ML and the Internet of Things (IoT) can be used to develop a soft sensor that can enable the plant operator to continuously monitor product analysis, as well as gain insights into the root cause of off-spec product—in this work, the off-spec product is distillate.
- Reinforcement learning (RL) can be used to train an AI agent to operate the plant.
CASE STUDY
A simple case study was conducted to illustrate the concept of AI usage. The purity of distillate from a distillation column depends on several factors, including the design of the column, the composition of the feed, operating conditions (e.g., temperature, pressure) and the efficiency of separation achieved by the column. The composition of distillate is typically analyzed through techniques such as chromatography, spectroscopy, mass spectrometry and titration methods.
A diagram of the refinery’s distillation column is shown in FIG. 1. Traditionally, analyzers would be installed to sample and check the quality for Outlet Streams 302, 307, 313 and 314—these are highlighted in blue in FIG. 1.
FIG. 1. A diagram of the refinery’s distillation column.
Concerns. Analysis methods for assessing distillate purity can be time consuming and require specialized equipment and expertise. Additionally, these analytical techniques may be costly if they require expensive instrumentation. Using standard methods, certain impurities may be challenging to detect or quantify accurately, potentially leading to an underestimation of impurity levels. Even sample preparation and handling can introduce errors or variations.
Proposed solution. One proposed solution for measuring the purity of a distillate is to develop a soft sensor that collectively senses the relevant parameters—i.e., a relationship between the distillate and bottoms product purity and the column temperature profile. A mathematical equation can be developed and fed to the distributed control system (DCS), where the plant operator can continuously review the analysis and gain insights on a dashboard. This could have the following advantages:
- The analysis can be seen by the operator on a real-time basis.
- Analyzers are not required in many scenarios, which saves on cost.
- The hassles of laboratory analysis and periodic recordings can be avoided, saving cost. This may require a revision in the standard operational procedure, which requires frequent sample analysis.
- Human operational errors in analyzing purity levels are minimized.
- An operator’s corrective action can help improve the quality of the distillate.
- The process will achieve energy savings in terms of optimum reflux adjustment and bottom sump steam optimization, which is a step toward sustainability.
The other proposed solution utilizes deep RL to automate the plant’s operation.
The ML approach. Assuming that the design of the column and the composition of the feed are constant, it is understood that the quality of the outlet depends on the temperature profile of the distillation column, collected at the temperature elements positioned as close to the streams as possible. Therefore:
Quality of Stream 302, Q302 = f(T1, T2, T3, T4)
Quality of Stream 307, Q307 = f(T1, T2, T3, T4)
Quality of Stream 313, Q313 = f(T1, T2, T3, T4)
Quality of Stream 314, Q314 = f(T1, T2, T3, T4)
In vectorized notation, Q = f(T), where Q = [Q302, Q307, Q313, Q314]ᵀ and T = [T1, T2, T3, T4]ᵀ.
For the sake of simplicity, the following will concentrate on stream 302 from FIG. 1. This approach can simply be replicated for all other streams.
ML is a subset of AI that focuses on the development of algorithms and models that allow computers to learn and make predictions or decisions based on data without being explicitly programmed for each task. Therefore, classic supervised ML can be used to learn the mapping from many example temperature profiles to the corresponding values of Q302.
To demonstrate a simple linear regression, temperature data that is normally distributed about the operating temperature of the connected stream, with a standard deviation of 10°C, was generated. The impurity was generated as some complex function of T1, T2, T3 and T4. Sample data is shown in TABLE 1.
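As an illustration, the Python sketch below generates synthetic data along these lines. The nominal operating temperatures and the "complex" impurity function are assumptions made for this demonstration; the article does not disclose the actual function used.

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1_000   # number of synthetic samples

# Assumed nominal operating temperatures (°C) for the four temperature elements
T_NOMINAL = {"T1": 350.0, "T2": 300.0, "T3": 250.0, "T4": 200.0}

df = pd.DataFrame({
    name: rng.normal(loc=mean, scale=10.0, size=n)   # standard deviation of 10°C, per the article
    for name, mean in T_NOMINAL.items()
})

# Hypothetical "complex" impurity function of T1-T4; T3 and T4 dominate, which
# reproduces the pair-plot behavior described below
df["Impurity302"] = (
    0.002 * df["T3"]
    - 0.0018 * df["T4"]
    + 1e-5 * np.sin(df["T1"] / 50.0)
    + 1e-6 * (df["T2"] - 300.0) ** 2
    + rng.normal(scale=1e-3, size=n)                 # measurement noise
)

print(df.head())   # sample data, analogous to TABLE 1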
Impurity 302 was plotted with respect to T1, T2, T3 and T4 in pair plots (FIG. 2). It must be noted that T3 and T4 are correlated with Impurity 302. In contrast, T1 and T2 do not appear to have much predictive power. Hence, T1 and T2 can be dropped from the final model.
FIG. 2. Pair plots.
For visualization purposes, a scatter plot of Impurity 302 against T3 and T4 is shown in FIG. 3.
FIG. 3. Scatter plot of T3 vs. T4 vs. Impurity 302.
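A minimal plotting sketch for these figures, assuming the DataFrame df from the previous sketch and using seaborn and matplotlib as one possible choice, is shown below.

import matplotlib.pyplot as plt
import seaborn as sns

# Pair plots of Impurity 302 against each temperature (analogous to FIG. 2)
sns.pairplot(df, x_vars=["T1", "T2", "T3", "T4"], y_vars=["Impurity302"])
plt.show()

# 3D scatter of T3 vs. T4 vs. Impurity 302 (analogous to FIG. 3)
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(df["T3"], df["T4"], df["Impurity302"])
ax.set_xlabel("T3 (°C)")
ax.set_ylabel("T4 (°C)")
ax.set_zlabel("Impurity 302")
plt.show()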
Finally, the regression model is shown in FIG. 4. It takes the form of Eq. 1:

Impurity 302 = α0 + α1·T3 + α2·T4          (1)
FIG. 4. Visualization of the regression plane.
The regression plane has the following parameters (a fitting sketch follows Eq. 2 below):
- Accuracy: 98.55%
- Intercept: α0 = –0.06917101278493731
- Coefficients: α1 = 0.0021485 and α2 = –0.00179481
Therefore, the predicted impurity level (quality), Q̂302, is provided by Eq. 2:

Q̂302 = –0.0692 + 0.0021485·T3 – 0.00179481·T4          (2)
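A scikit-learn sketch reproducing this fit, assuming the synthetic df from the earlier sketches, is shown below; the exact intercept, coefficients and accuracy depend on the generated data.

from sklearn.linear_model import LinearRegression

X = df[["T3", "T4"]]       # T1 and T2 dropped, as discussed above
y = df["Impurity302"]

model = LinearRegression().fit(X, y)

print("Intercept (alpha0):", model.intercept_)
print("Coefficients (alpha1, alpha2):", model.coef_)
print("R^2 on the training data:", model.score(X, y))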
It must be noted that this is a miniature example, presented for demonstration only. In the real world, when data is available, the following steps should be followed to create a complete ML pipeline (a minimal pipeline sketch follows this list):
- Clean the data: This involves missing value treatment, outlier removal and dropping irrelevant variables.
- Apply dimensionality reduction: This involves reducing features via principal component analysis and factor analysis to improve computational efficiency.
- Feature engineering: Scaling the numerical data, creating dummy variables for categorical data and developing calculated features, if needed.
- Develop a prediction model as demonstrated above, but not limited to linear regression.
- Test the model with the data.
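A minimal scikit-learn pipeline covering these steps is sketched below; the choice of imputer, scaler and number of principal components is illustrative only, and df is the synthetic data from the earlier sketches.

from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = df[["T1", "T2", "T3", "T4"]]
y = df["Impurity302"]

# Hold out data for the final "test the model" step
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

pipeline = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),    # missing-value treatment
    ("scale", StandardScaler()),                     # feature scaling
    ("pca", PCA(n_components=2)),                    # dimensionality reduction
    ("regress", LinearRegression()),                 # prediction model (not limited to linear regression)
])

pipeline.fit(X_train, y_train)
print("Test R^2:", pipeline.score(X_test, y_test))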
Eq. 2 can be fed to the DCS. If Q̂302 indicates that the product is off-specification, appropriate action(s) should be taken by the operator. An appropriate action can be suggested to the operator by using the IoT and cloud analytics. For example, if the model indicates that temperature T3 is the main contributing factor for off-spec Q̂302, this can be shown on the dashboard, and the operator must act on this information (a simple sketch of this logic follows FIG. 5). FIG. 5 provides the architecture for the IoT.
FIG. 5. A distillation column with IoT architecture.
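The sketch below illustrates how Eq. 2 might be evaluated alongside the DCS and how the dominant contributing temperature could be flagged for the dashboard. The specification limit, nominal operating temperatures and the simple coefficient-times-deviation contribution measure are assumptions for demonstration only.

# Assumed values for illustration only
ALPHA0, ALPHA1, ALPHA2 = -0.0692, 0.0021485, -0.00179481   # fitted parameters from Eq. 2
T3_NOMINAL, T4_NOMINAL = 250.0, 200.0                       # assumed operating points (°C)
IMPURITY_SPEC = 0.12                                        # assumed off-spec threshold

def soft_sensor(t3: float, t4: float) -> dict:
    # Evaluate Eq. 2 and, when off-spec, flag the dominant contributing temperature
    q_hat = ALPHA0 + ALPHA1 * t3 + ALPHA2 * t4
    contributions = {
        "T3": ALPHA1 * (t3 - T3_NOMINAL),
        "T4": ALPHA2 * (t4 - T4_NOMINAL),
    }
    result = {"Q302_hat": q_hat, "off_spec": q_hat > IMPURITY_SPEC}
    if result["off_spec"]:
        result["main_contributor"] = max(contributions, key=lambda k: abs(contributions[k]))
    return result

print(soft_sensor(t3=268.0, t4=201.5))   # example: off-spec, with T3 flagged as the main contributor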
The long-range (LoRa) communication protocol can be used to collect field data and bring it onto a standard IoT messaging protocol, while message queuing telemetry transport (MQTT) can be used to push the data to the cloud for computing and analytics.
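A minimal sketch of the cloud-side publishing step using the paho-mqtt Python client is shown below. The broker address, topic, publish rate and the read_temperatures() stub are hypothetical, and soft_sensor() is the function from the previous sketch.

import json
import time

import paho.mqtt.client as mqtt

BROKER_HOST = "broker.example.com"                 # hypothetical cloud MQTT broker
BROKER_PORT = 1883
TOPIC = "plant/column101/stream302/quality"        # hypothetical topic

def read_temperatures():
    # Hypothetical gateway read (e.g., values received over LoRa); stubbed for illustration
    return 252.0, 199.5

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)   # paho-mqtt 2.x; use mqtt.Client() on 1.x
client.connect(BROKER_HOST, BROKER_PORT)
client.loop_start()

while True:
    t3, t4 = read_temperatures()
    payload = soft_sensor(t3=t3, t4=t4)            # soft sensor from the previous sketch
    client.publish(TOPIC, json.dumps(payload), qos=1)
    time.sleep(10)                                 # publish every 10 sec (assumed rate)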
The RL approach. As opposed to supervised ML, which works on knowledge (i.e., historical data), RL works on experience and learns by doing tasks. RL is a powerful paradigm in the field of ML that enables an AI agent to learn "optimal" behavior by interacting with an environment.
Unlike supervised ML, where the model is trained on labeled data, RL involves learning through trial and error, guided by rewards or penalties received from the environment.
RL can be used to optimize a chemical process in the distillation column. The task for the RL algorithm will be to learn the optimal control strategy (also called “policy”) for the distillation column based on the current state and the desired state.
The algorithm’s performance can be evaluated by using metrics such as total cost, the average cost per unit time and the stability of the control strategy. The formulation of the problem as a Markov decision process (MDP) is the starting point (FIG. 6). The following section reflects on the components of the MDP.
FIG. 6. The MDP.
Basic components of RL (MDP). The following are the basic components of RL:
- Agent: The learner or decision-maker interacts with the environment. The agent takes actions based on the current state of the environment. In this example, the agent is the AI system that is learning to operate the plant.
- Environment: The external system with which the agent interacts. It could represent the physical world, a game environment or any other system with defined states and rules. In this case, the environment is the distillation column. Here, T3 and T4 are the parameters that define the environment.
- State (s): A specific configuration or situation within the environment. It represents the current observable features of the environment that are relevant for decision-making. The purity of Stream 302 (i.e., Q302) is the state that could change due to action taken by the agent.
- Action (a): The set of choices available to the agent in each state. Actions can lead the environment to transition to new states. These may pertain to a set of control valves, whose actuation could lead to a change in the temperature profile.
- Reward (r): A scalar feedback signal received by the agent after taking an action in a particular state. It indicates the immediate benefit or penalty associated with the action. This can be decided as per a certain philosophy.
- Policy (π): A strategy or rule that the agent uses to select actions in different states. It maps states to actions and can be deterministic or stochastic. This policy is to be learned by the agent. Optimal policy is represented by π*.
- Value function [V(s)]: The expected cumulative reward that an agent can obtain from a given state under a specific policy. It helps the agent evaluate the desirability of different states. The optimal value function is provided by v*.
- Q-function [Q(s, a)]: This is similar to the value function, but it considers the expected cumulative reward for taking a particular action in a specific state. It helps in selecting the best action in each state. The optimal Q-function is provided by q*.
The objective is to learn a policy that drives V(s) → v*(s) and Q(s, a) → q*(s, a), balancing exploration (trying new actions to learn more about the environment) against exploitation (leveraging known information to maximize rewards).
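This trade-off is often handled with a simple rule such as epsilon-greedy action selection; the sketch below assumes a dictionary-based Q-table and a generic list of actions, both hypothetical.

import random

EPSILON = 0.1   # exploration rate (assumed)

def choose_action(state, q_table, actions):
    # Explore with probability EPSILON, otherwise exploit the best-known action
    if random.random() < EPSILON:
        return random.choice(actions)                                   # exploration
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))     # exploitation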
In the RL algorithm, all temperatures (T1, T2, T3 and T4) can be considered; however, to maintain consistency with the ML section, only T3 and T4 are considered here.
Pseudo code: The distillation process control environment and episode structure. The following is the structure of the code:
initialization // configure environment variables; create double-ended queues
T3 = queue()                  // queue of T3 readings
T3.append(initial_T3)
T4 = queue()                  // queue of T4 readings
T4.append(initial_T4)
Q302 = queue()                // queue of Stream 302 quality readings
Q302.append(initial_Q302)
done = False
while not done do
    CurrentT3 = T3[0]
    CurrentT4 = T4[0]
    CurrentQ302Stream = Q302[0]
    observe CurrentQ302Stream.state, CurrentT3.state and CurrentT4.state
    choose whether to RegulateValve to improve CurrentQ302Stream.state
    if chose to RegulateValve then
        specify direction              // CW or CCW
        simulate Q302Stream.purity     // based on data from the digital twin
        reward = REWARD                // REWARD to be chosen to make learning easy
        for Stream ∈ {Q302Stream} do
            if Stream.purity >= PuritySpecification then
                reward += Stream.purity
            else   // if the stream does not meet the product specification, it may be regulated further
                Q302.append(Stream)
            end
        end
        // CurrentQ302Stream can now be removed from Q302, as it has been handled
        Q302.popleft()
    else
        // if the choice is not to improve CurrentQ302Stream
        Q302.popleft()
    end
    if Q302 is empty or max steps have been reached then
        done = True   // episode is over
    end
end
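As one possible way to express the episode structure above in code, the sketch below defines a Gym-style environment using gymnasium (the maintained successor of the gym library mentioned later). The valve response, disturbance model, specification limit and episode length are illustrative stand-ins for the digital twin rather than the authors' implementation.

import numpy as np
import gymnasium as gym
from gymnasium import spaces

class DistillationEnv(gym.Env):
    # Gym-style sketch of the episode structure above; the temperature response to a
    # valve move is a purely illustrative stand-in for the digital twin.

    IMPURITY_SPEC = 0.12    # assumed impurity specification for Stream 302
    MAX_STEPS = 60          # 60-sec episode with 1-sec control steps (assumed)

    def __init__(self):
        # Actions: 0 = hold, 1 = regulate valve CW, 2 = regulate valve CCW
        self.action_space = spaces.Discrete(3)
        # Observation: current T3 and T4 (°C)
        self.observation_space = spaces.Box(low=0.0, high=500.0, shape=(2,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t3, self.t4 = 250.0, 200.0      # assumed nominal operating point
        self.steps = 0
        return np.array([self.t3, self.t4], dtype=np.float32), {}

    def _impurity(self):
        # Soft-sensor relation (Eq. 2) used in place of an analyzer
        return -0.0692 + 0.0021485 * self.t3 - 0.00179481 * self.t4

    def step(self, action):
        # Illustrative valve response plus process disturbance (digital-twin stand-in)
        self.t3 += {0: 0.0, 1: -1.0, 2: 1.0}[int(action)] + self.np_random.normal(0.0, 0.5)
        self.t4 += self.np_random.normal(0.0, 0.5)
        self.steps += 1

        # Rewards follow the scheme described later: +1 within specification, -3 off-spec
        reward = 1.0 if self._impurity() <= self.IMPURITY_SPEC else -3.0
        terminated = False
        truncated = self.steps >= self.MAX_STEPS
        return np.array([self.t3, self.t4], dtype=np.float32), reward, terminated, truncated, {}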
The total episode reward can be calculated by summing the rewards over all time steps. However, the value of a state is given not only by the immediate reward but also by the discounted value of all future rewards obtained—under an optimal policy π*—from the states downstream in the decision process. This is expressed by the recursively defined optimal value functions v* and q*, which satisfy the Bellman optimality equations:

v*(s) = max_a E[R' + γ·v*(S') | S = s, A = a]

q*(s, a) = E[R' + γ·max_a' q*(S', a') | S = s, A = a]

Here, E[·] represents the mathematical expectation, 0 ≤ γ < 1 is the discount factor for future rewards, R' is the next reward, S' is the next state and a' is the next action.
Furthermore, an entropy term (H) for the policy at a given timestep, along with a temperature hyperparameter (α) that controls the trade-off between maximizing the reward and maximizing the policy's entropy, can be introduced into the above equations to create a more robust model.
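One common way of introducing these terms is the maximum-entropy formulation (used, e.g., in soft actor-critic), in which the policy's entropy is added to the reward, weighted by α:

J(π) = E[ Σt γ^t ( Rt+1 + α·H(π(·|St)) ) ]

Setting α = 0 recovers the standard discounted objective above.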
In Python, the "gym" library can be used to create an environment that defines the AI agent's set of possible actions and the dynamics of its states. Finally, a deep RL agent, such as one employing deep Q-networks (DQNs), may be trained to control the quality in real time. The agent is trained on a real-time stream of data in online mode to learn an optimal control policy.
Here, an AI agent—more specifically, a proximal policy optimization (PPO) model1—was trained for 100,000 timesteps, considering only T3 and T4 as environment parameters, and the state of Stream 302 was monitored. Rewards were decided arbitrarily as 1 when the stream was within specification and as –3 when it went off-specification.
The episode length was 60 sec, and the mean reward in the episode is given by ep_rew_mean for all training episodes. The model converges after 100,000 timesteps.
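A training sketch along these lines, assuming the DistillationEnv defined after the pseudocode and using the stable-baselines3 implementation of PPO (the article does not name a specific library), is shown below; ep_rew_mean appears automatically in the training log.

from stable_baselines3 import PPO

env = DistillationEnv()          # environment sketched after the pseudocode

# PPO agent trained for 100,000 timesteps; ep_rew_mean is reported during training
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("ppo_distillation_302")

# Quick rollout with the learned policy
obs, _ = env.reset()
for _ in range(60):              # one 60-sec episode
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        break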
Takeaways. The example in this article was for a distillation column, but this concept can be effectively scaled for various process metrics and control across a range of reactors, exchangers and other equipment. Considerations should be given to the components of RL, specifically in the context of the desired process control or optimization. The states and actions that an AI agent must map through within its learned policy must be carefully defined.
Importantly, it should be acknowledged that the agent cannot be trained directly on an actual plant. Instead, a more logical approach involves training the agent on a digital twin and then transferring the trained agent to the actual plant. Here, the digital twin is not to be confused with the soft sensor developed in the ML section. A digital twin is a virtual representation of a real-world object or system—in this case, the distillation column—that accurately reflects the physical counterpart and spans the entire lifecycle of the object. All of this data is hosted on a cloud server, which can be used for training the virtual agent.
Another viable solution is to implement imitation learning2 [also known as learning from demonstration (LfD)], which is a method of ML where the learning agent aims to mimic the behavior of a human plant agent and/or a digital twin agent. Unlike traditional ML approaches, where an agent learns through trial and error, guided by a reward function, imitation learning leverages a dataset of demonstrations by an expert. The goal is to replicate the expert’s behavior in similar, if not identical, situations. In this work’s case, the digital twin agent serves as the teacher, and the actual plant agent acts as the imitator/learner.
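As a minimal illustration, the behavioral-cloning flavor of imitation learning (simpler than the generative adversarial approach of reference 2) can be sketched as a supervised fit to the expert's state-action pairs. The demonstration data below is a toy stand-in; in practice, it would be logged by rolling out the digital twin agent.

import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy stand-in for expert demonstrations: states visited by the digital twin (expert)
# agent and the actions it took (1 = regulate valve CW, 2 = regulate valve CCW)
rng = np.random.default_rng(0)
expert_states = rng.normal(loc=[250.0, 200.0], scale=10.0, size=(500, 2))   # columns: [T3, T4]
expert_actions = np.where(expert_states[:, 0] > 250.0, 1, 2)                # hypothetical expert rule

# Behavioral cloning: supervised learning of the expert's state -> action mapping
imitator = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
imitator.fit(expert_states, expert_actions)

# The learner now proposes actions that mimic the expert in similar states
print(imitator.predict([[258.0, 201.0]])[0])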
REFERENCES
1 Schulman, J., F. Wolski, P. Dhariwal, A. Radford and O. Klimov, “Proximal policy optimization algorithms,” Cornell University, July 20, 2017, online: https://arxiv.org/abs/1707.06347
2 Ho, J. and S. Ermon, “Generative adversarial imitation learning,” Cornell University, June 10, 2016, online: https://arxiv.org/abs/1606.03476