Researchers from Siberian Federal University in Krasnoyarsk have developed a digital twin for the production of petroleum coke, a material critical to the metallurgical and energy industries. The system predicts the quality of the finished product with high accuracy, which could help refineries save significant resources.
Monitoring petroleum coke production has historically been one of the most challenging tasks in refining. Coke is produced from heavy petroleum residues via a delayed coking process, and its quality (sulfur content, volatile matter, porosity and strength) directly determines whether the product can be used as fuel or for other purposes, such as manufacturing electrodes for aluminum production. The problem is that laboratory analysis of these parameters takes up to two days. During this period, process conditions and feed characteristics can change, resulting in substandard product and direct economic losses. Existing calculation models are either too simplified to predict accurately, or are complex neural-network black boxes that are difficult to interpret and adapt to new conditions.
The Russian researchers have proposed a different approach: a combinatorial digital twin based on physical principles. Instead of seeking a universal formula describing the entire process, they created a library of 32 mathematical models. Each model is designed to predict one of the eight key coke quality parameters, including sulfur content, porosity, mechanical strength and thermal conductivity. These models vary in complexity, from linear dependencies to more advanced equations that take into account feed composition, temperature, pressure and process duration. Since all of them are based on well-known physical and chemical laws, their operation remains transparent and readable for process engineers.
A key element of the system is a model builder that selects not just one model but an optimal ensemble of models from the entire library for each production situation. To do this, it uses a two-level optimization scheme. At the first level, a genetic algorithm, which imitates natural selection, sifts through millions of possible combinations. At the second level, the parameters of the selected combinations are fine-tuned to match the actual data. The selection weighs not only prediction accuracy but also the computational complexity and interpretability of the result. The system also accounts for relationships between parameters: for instance, the predicted porosity feeds into the calculation of mechanical strength, which keeps the final prediction physically consistent.
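The two-level scheme described above can be illustrated with a minimal sketch. This is not the researchers' implementation: the six candidate models, the feature columns, the complexity costs, the penalty weight and the use of least squares for the fine-tuning step are all assumptions made for illustration. The outer level is a simple genetic algorithm over binary masks that mark which models enter the ensemble; the inner level fits the weights of the selected models to the data.

```python
import numpy as np

# Hypothetical candidate models for one quality parameter.
# Assumed feature columns of X: 0 = feed sulfur, 1 = temperature, 2 = pressure
# (all in normalized units).
LIBRARY = [
    lambda X: X[:, 0],              # linear in feed sulfur
    lambda X: X[:, 1],              # linear in temperature
    lambda X: X[:, 2],              # linear in pressure
    lambda X: X[:, 0] * X[:, 1],    # sulfur-temperature interaction
    lambda X: np.exp(-X[:, 1]),     # exponential temperature term
    lambda X: X[:, 2] ** 2,         # quadratic pressure term
]
COMPLEXITY = np.array([1, 1, 1, 2, 3, 2])  # assumed cost of each model


def fit_ensemble(mask, X, y):
    """Level 2: least-squares weights for the selected models -> (weights, rmse)."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return np.array([]), np.inf  # an empty ensemble cannot predict
    A = np.column_stack([LIBRARY[i](X) for i in idx])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    rmse = float(np.sqrt(np.mean((A @ w - y) ** 2)))
    return w, rmse


def fitness(mask, X, y, penalty=0.01):
    """Lower is better: fitting error plus an assumed complexity penalty."""
    _, rmse = fit_ensemble(mask, X, y)
    return rmse + penalty * COMPLEXITY[np.asarray(mask, dtype=bool)].sum()


def genetic_search(X, y, rng, pop_size=30, generations=40, mutation_rate=0.1):
    """Level 1: evolve binary masks that select models from the library."""
    n = len(LIBRARY)
    pop = rng.integers(0, 2, size=(pop_size, n))
    for _ in range(generations):
        scores = np.array([fitness(m, X, y) for m in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]  # keep the best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                      # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n) < mutation_rate          # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(m, X, y) for m in pop])
    return pop[np.argmin(scores)]


# Synthetic, noise-free data generated from models 0 and 3 only.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 1.5, size=(200, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 0] * X[:, 1]

best_mask = genetic_search(X, y, rng)
weights, rmse = fit_ensemble(best_mask, X, y)
print("selected models:", np.flatnonzero(best_mask), "rmse:", rmse)
```

Because the fitness adds a complexity penalty to the fitting error, the search tends to prefer small ensembles over the full library when both fit the data equally well, which mirrors the trade-off between accuracy, computational cost and interpretability mentioned in the text.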
To train and validate the digital twin, the researchers assembled an extensive database that included both real industrial data from refineries and synthetic data sets simulating various operating modes and feedstock compositions. All data were grouped into five clusters reflecting typical production scenarios, from high-quality electrode coke to fuel-grade coke made from heavy high-sulfur feedstock.
The results showed that the ensembles selected by the system consistently outperformed each individual model in terms of accuracy. The average forecast error for key parameters ranged from 7.5% to 13%, depending on the operating mode. For real-world production, this is considered an excellent result, given that the forecast is available immediately rather than after laboratory analysis.
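The article does not say which error metric was used. As an illustration only, the sketch below assumes the mean absolute percentage error, a common choice for figures quoted in percent; the function name and the sample values are hypothetical and are not data from the study.

```python
import numpy as np


def mean_absolute_percentage_error(y_true, y_pred):
    """Average of |prediction - measurement| / |measurement|, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0)


# Hypothetical lab-measured vs. predicted sulfur content, in wt%.
lab = [2.0, 4.0, 5.0]
predicted = [2.2, 3.6, 5.5]

error_pct = mean_absolute_percentage_error(lab, predicted)
print(f"average forecast error: {error_pct:.1f}%")
```

In this made-up example every prediction is off by one tenth of the measured value, so the average error is about 10%, which would fall inside the 7.5% to 13% range reported above.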
Transparency remains the biggest advantage of this solution: the engineer can always understand why the system produced a particular forecast. In the future, this system could open the way towards real-time intelligent coking management, where adjustments can be made even before the cycle is completed. This means fewer laboratory tests, more consistent product quality, lower energy costs and a smaller environmental footprint from heavy oil refining.



