Physics-enhanced machine learning (PEML) is an approach that combines knowledge of physical systems with data-driven algorithms, aiming to overcome the limitations of using either approach in isolation.[1] PEML incorporates known physical laws and domain-specific knowledge into the learning process, enabling models to match observed data while remaining consistent with underlying physical principles.
The term PEML is relatively recent and describes a broad class of methods that combine physics and machine learning, including earlier techniques such as physics-informed neural networks (PINNs), which were introduced in 2019.[2] The concept is closely related to other terms seen in the literature, such as scientific machine learning (SciML),[3] physics-informed machine learning (PIML),[4] physics-enhanced artificial intelligence (PEAI)[5] and physics-guided machine learning,[6] and while usage often overlaps, PEML has been positioned as an umbrella term for methods that improve predictive capability through the integration of physics-based components within machine learning architectures.[7]
PEML has emerged in response to a range of challenges commonly encountered in scientific and data-driven modelling. These include: limited volumes of high-quality data, predictions that may be statistically accurate but physically implausible, difficulty in quantifying uncertainty, and limited interpretability of machine learning models.[7] These limitations have motivated the development of methods that integrate physical knowledge into machine learning, and, by doing so, PEML aims to improve generalisation to unseen conditions, ensure that predictions remain consistent with established physical laws, and enhance the transparency and interpretability of learned models.[1] This concept has gained traction since the early 2020s, when researchers began systematically exploring strategies for embedding domain knowledge into learning algorithms, allowing them to leverage domain-specific structure during training.
Early interest in PEML was driven by challenges observed in fields such as structural mechanics[1][8] and environmental science,[9] where purely data-driven methods struggled with limited data or lacked reliability. For instance, in structural engineering, physics-based simulations can be very accurate but often require costly modelling and still face uncertainty in loads or material properties. On the other hand, data-driven models may fit experimental data but fail to generalise outside the training domain. PEML approaches were developed to bridge this gap, effectively creating a "spectrum" between the extremes of purely physics-based (white-box modelling) and purely data-driven (black-box modelling), known as grey-box or hybrid modelling.[10] In practice, this means a PEML model can leverage governing equations or simulation data to inform the learning process, thus requiring less training data and yielding outputs that obey physical laws.
PEML encompasses a range of methods that integrate domain-specific physical knowledge into the machine learning process.[7] These techniques differ in how and where physics is incorporated, whether through loss functions, model structures, feature design, or data generation, and can be grouped into three categories: physics-informed, physics-guided, and physics-encoded machine learning.
Physics-informed learning techniques integrate physical laws, often expressed as partial differential equations (PDEs), directly into the machine learning process by embedding physical constraints into the model itself. At an abstract level, a physics-informed model can be described as a data-driven model $\hat{y} = f(x; \theta)$ with parameters $\theta$, trained subject to constraints derived from the governing physics.[7] This is typically achieved by embedding physical constraints into the training process, such as by adding PDE residuals to the loss function.[4] An example of physics-informed learning is the use of physics-informed neural networks (PINNs), which implement composite loss functions that balance data-misfit errors with PDE residuals, effectively blending sparse observations with physical constraints.[11] PINNs are particularly suited to problems involving irregular geometries or sparse measurements, owing to their ability to operate in a meshless paradigm by sampling random collocation points, and are often employed when a comparatively large volume of observational data is available. Physics-informed learning has been successfully applied in multi-physics systems such as electroconvection,[12] molecular dynamics,[13] and real-time 4D flow reconstruction from MRI observations.[11]
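The composite loss underlying PINNs can be illustrated with a minimal sketch (a hypothetical one-dimensional boundary-value problem; the network architecture, equation, and sampling choices here are illustrative assumptions, not details from the cited works):

```python
# Minimal physics-informed loss sketch: fit sparse observations while
# penalising the residual of the assumed ODE u''(x) = -sin(x).
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x_data = torch.rand(10, 1)              # sparse observation locations
u_data = torch.sin(x_data)              # synthetic measurements

for step in range(2000):
    # Data-misfit term on the sparse observations
    loss_data = ((net(x_data) - u_data) ** 2).mean()

    # PDE residual term at random collocation points (meshless sampling)
    x_c = torch.rand(64, 1, requires_grad=True)
    u = net(x_c)
    du = torch.autograd.grad(u, x_c, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x_c, torch.ones_like(du), create_graph=True)[0]
    loss_pde = ((d2u + torch.sin(x_c)) ** 2).mean()

    opt.zero_grad()
    (loss_data + loss_pde).backward()   # composite loss
    opt.step()
```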
In physics-guided learning, physical knowledge is introduced into the learning process not by altering the model itself, but through data preprocessing and feature engineering. This strategy enables conventional learning algorithms to work with inputs that already encode important physical structure, enhancing both accuracy and interpretability. Common techniques include deriving physically meaningful inputs from raw measurements, such as dimensionless numbers or fatigue damage-equivalent loads, and augmenting training data with physics-based simulation outputs.
At an abstract level, physics-guided learning can be represented with a physics-based model $g$ and latent physics-based model parameters $\phi$ as $\hat{y} = f(g(x; \phi); \theta)$.[7] These preprocessing methods are especially useful when physical insight is available but the system is too complex for fully mechanistic modelling. By encoding physics into the data, standard machine learning architectures such as multi-layer perceptrons (MLPs) can be trained without architecture-specific changes. As a result, the learned function is implicitly constrained by the structured inputs, reducing the need to learn fundamental physical relationships from scratch.[18]
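As an illustration of this preprocessing strategy, the following sketch derives a dimensionless feature from raw measurements before fitting a standard regressor (the Reynolds-number-style grouping and the synthetic data are assumptions made for the example, not drawn from the cited sources):

```python
# Physics-guided learning sketch: encode raw inputs as a dimensionless
# group, then train an off-the-shelf MLP on the structured feature.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
velocity = rng.uniform(0.1, 10.0, 200)      # m/s
length = rng.uniform(0.01, 1.0, 200)        # m
viscosity = rng.uniform(1e-6, 1e-4, 200)    # m^2/s

# Physics-based preprocessing: a Reynolds-number-like dimensionless input
reynolds = velocity * length / viscosity
X = np.log(reynolds).reshape(-1, 1)         # structured, physics-aware input
y = np.tanh(np.log(reynolds) - 10.0)        # synthetic target for the demo

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X, y)
```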
Physics-encoded learning, sometimes referred to as hybrid modelling, combines physics-based components with data-driven components in a single unified framework. This approach is useful when the underlying physical laws are partially understood but insufficient to describe the full system behaviour, or are computationally expensive to simulate. In such methods, the final model integrates a physics-based model $g(x; \phi)$ with a data-driven correction term $f(x; \theta)$, along with additional biases that narrow the solution space to physically plausible outputs, such that the system takes the form $\hat{y} = g(x; \phi) + f(x; \theta)$.[7] This hybrid setup allows the machine learning component to compensate for missing or poorly understood physics, while ensuring the model respects key physical constraints embedded in $g$. Common examples of physics-encoded learning include Gaussian process (GP) latent force models[19][20] and physics-informed sparse identification of nonlinear dynamics (PhI-SINDy),[21] which has been used to model multiple degree-of-freedom (MDOF) oscillators with multiple Coulomb friction contacts under harmonic loading, using both synthetic and noisy experimental data with multiple sources of discontinuous nonlinearities.[22]
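A minimal sketch of the additive hybrid form above, assuming an invented linear physics model and a synthetic cubic term that it fails to capture:

```python
# Physics-encoded (hybrid) sketch: known-but-incomplete physics g(x) plus
# a data-driven correction f(x) fitted to the residual the physics misses.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def g_physics(x):
    return 2.0 * x                          # assumed (linear) physics model

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, (100, 1))
y = 2.0 * x + 0.5 * x**3 + 0.01 * rng.standard_normal((100, 1))  # true system

residual = y - g_physics(x)                 # what the physics fails to explain
f_ml = GaussianProcessRegressor().fit(x, residual.ravel())

def y_hat(x_new):
    # Final prediction: physics term plus learned correction
    return g_physics(x_new).ravel() + f_ml.predict(x_new)
```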
PEML methods have moved beyond theoretical development and are now actively deployed in real-world systems across engineering,[1][20][23][24] biology,[3][9] chemistry,[13][16][25] physics,[12] scientific discovery,[21] and computer science.[26] These applications are especially valuable in high-stakes or data-scarce environments where traditional machine learning or purely physics-based models may fall short.
PEML has been applied to predict fatigue loads in wind turbine blades under wake steering control (WSC), a strategy that improves wind farm efficiency by intentionally misaligning turbine yaw angles to reduce wake interference.[23] While WSC can enhance power output, it also introduces additional fatigue loads on downstream turbines, complicating structural health monitoring. Traditional methods, such as look-up tables (LUTs), approximate turbine loads from precomputed simulations, but may struggle to capture complex wake-induced loading effects under high turbulence or non-standard conditions.[27] A recent approach addressed this by using Gaussian process (GP) models trained on physics-informed features, including damage-equivalent loads (DELs) derived from rainflow counting and the Palmgren-Miner rule. These GPs provided probabilistic fatigue predictions with improved accuracy. Compared to LUTs, the PEML model reduced the root mean square error (RMSE) by 13.99% for edgewise moments and by 51.87% for flapwise moments, highlighting the value of incorporating fatigue physics into machine learning-based predictive maintenance.
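A hedged sketch of this feature pipeline, using the open-source Python rainflow package and synthetic load signals (the Wöhler exponent, reference cycle count, and stand-in target values are illustrative assumptions, not parameters from the cited study):

```python
# Physics-informed fatigue features: compute a damage-equivalent load (DEL)
# via rainflow counting and the Palmgren-Miner rule, then feed it to a GP.
import numpy as np
import rainflow
from sklearn.gaussian_process import GaussianProcessRegressor

def damage_equivalent_load(series, m=10.0, n_ref=1e6):
    cycles = rainflow.count_cycles(series)           # [(load range, count)]
    damage = sum(count * rng_ ** m for rng_, count in cycles)
    return (damage / n_ref) ** (1.0 / m)

rng = np.random.default_rng(2)
signals = [np.cumsum(rng.standard_normal(1000)) for _ in range(50)]
X = np.array([[damage_equivalent_load(s)] for s in signals])   # DEL feature
y = X.ravel() * 1.1 + 0.05 * rng.standard_normal(50)           # stand-in target

gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
mean, std = gp.predict(X, return_std=True)           # probabilistic prediction
```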
Tuned mass dampers (TMDs) are widely used to mitigate structural vibrations in tall buildings during seismic events.[28] Traditional physics-based design methods, such as the Den Hartog approach, assume linear structural behaviour and do not fully capture the effects of nonlinear dynamics or variable seismic loads.[29] Conversely, purely data-driven optimisation techniques may lack physical constraints, resulting in unrealistic or inefficient damping configurations. To address this, researchers developed a PEML framework based on a generative adversarial network (GAN) architecture.[24] The system incorporates a physical evaluation network into the GAN loop to guide the generation of TMD parameters (the natural frequency and damping ratio) under realistic seismic excitations. This approach was tested on both linear shear-type structures and nonlinear moment-resisting frames. Compared to traditional particle swarm optimisation (PSO), the physics-enhanced GAN achieved a 24.14% reduction in displacement under seismic loading while reducing computational cost by 80%, demonstrating the effectiveness of hybrid machine learning approaches in structural vibration control.
In spacecraft missions involving dynamic payload changes, such as active debris removal, traditional attitude control systems (ACS) that rely on fixed mass and inertia properties may struggle to maintain stability. A study published in Frontiers in Robotics and AI proposed a PEML approach using deep reinforcement learning (DRL) to address this challenge.[30] The method integrated physics-based simulation with DRL algorithms such as proximal policy optimisation (PPO) and soft actor-critic (SAC),[31] and was trained using the Basilisk high-fidelity spacecraft simulator, which models Newtonian rotational dynamics and reaction wheel behaviour.[32] The approach incorporated "stacked observations," feeding sequences of sensor readings (e.g. angular velocities and torques) into the learning model to enable inference of unknown mass properties over time. Compared to conventional proportional-integral-derivative (PID) controllers, DRL controllers with stacked observations achieved improved control performance, particularly in scenarios involving unknown or varying mass distributions. In simulations, the SAC controller with stacking reduced attitude error by up to 78° and settled the spacecraft 26 seconds faster than the PID controller. These results highlight the potential of PEML methods for improving control robustness under uncertain spacecraft dynamics.
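The observation-stacking idea can be sketched as follows (a generic illustration of the concept, not the Basilisk or DRL implementation used in the study):

```python
# Stacked observations: concatenate the k most recent sensor vectors so a
# policy can infer unobserved, slowly varying quantities (e.g. mass
# properties) from the temporal pattern of its inputs.
from collections import deque
import numpy as np

class ObservationStacker:
    def __init__(self, obs_dim, k=4):
        self.frames = deque([np.zeros(obs_dim)] * k, maxlen=k)

    def add(self, obs):
        self.frames.append(np.asarray(obs, dtype=float))
        return np.concatenate(list(self.frames))   # input to the policy

stacker = ObservationStacker(obs_dim=6, k=4)
policy_input = stacker.add(np.random.randn(6))  # e.g. angular rates + torques
```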
Accurate upper-air wind field prediction is essential for optimising aircraft trajectories to reduce fuel consumption and flight time. Traditional numerical weather prediction (NWP) methods, while physically rigorous, are computationally expensive and limited in short-term forecasting, requiring hours of supercomputing time for multi-day forecasts.[33] A recent study proposed a method that integrates a predictive recurrent neural network (PredRNN) with an improved A* pathfinding algorithm to generate efficient flight routes in dynamic wind conditions.[34] PredRNN was trained on ERA5 wind data[35] at cruising altitudes of 5,500 m along major Chinese airline routes, using a loss function informed by the Navier-Stokes equations. The resulting wind field forecasts enable the A* algorithm to avoid zones of high turbulence and optimise routes in real time up to 10 hours in advance. Compared to standard neural network and physics-based approaches, this framework improved forecasting accuracy and produced safer, more fuel-efficient trajectories.
Accurate river discharge forecasting is critical for flood mitigation, waterway management and infrastructure planning. Physics-based hydrological models such as RAPID (Routing Application for Parallel Computation of Discharge) simulate river flow using the Muskingum algorithm, but rely on simplifying assumptions such as linear process modelling and dependence on adjacent inflows.[36][37] These limitations can lead to deviations from observed discharge values, especially in complex or ungauged river networks. To address this, a PEML approach was proposed that integrates RAPID with data-driven models using delta learning and data augmentation techniques.[38] These hybrid models combine physical runoff simulations with machine learning algorithms, including Gaussian process nonlinear autoregressive models with exogenous inputs (GP-NARX),[39] neural networks, and bidirectional long short-term memory (LSTM) networks. The goal is to compensate for uncertainties in the RAPID model by learning the discrepancies between predicted and gauged discharge values and by using additional basin-wide runoff data to inform forecasts. The study demonstrated that the hybrid PEML models significantly outperformed RAPID alone, improving discharge prediction accuracy by a factor of four to seven across various river systems in the United States. By leveraging both physical principles and basin-wide hydrological data, the approach enables robust, long-range forecasting in data-limited conditions and enhances the reliability of streamflow predictions for gauged rivers.
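The delta-learning step can be illustrated with a short sketch on synthetic data (the sinusoidal discharge signals and lag structure are assumptions for the example; the cited study's setup is considerably more elaborate):

```python
# Delta learning sketch: learn the discrepancy between a physics-based
# discharge prediction and gauged observations, then correct the forecast.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(3)
t = np.arange(500)
q_physics = 100 + 20 * np.sin(2 * np.pi * t / 50)         # physics-model output
q_gauged = q_physics + 10 * np.sin(2 * np.pi * t / 80) + rng.normal(0, 1, 500)

lags = 3
X = np.column_stack([q_physics[i:len(t) - lags + i] for i in range(lags)])
delta = (q_gauged - q_physics)[lags:]                     # model-data discrepancy

gp = GaussianProcessRegressor(normalize_y=True).fit(X, delta)
q_hybrid = q_physics[lags:] + gp.predict(X)               # corrected forecast
```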
PEML has been applied to predict oil recovery during immiscible CO2 flooding in sandstone reservoirs, a widely used enhanced oil recovery (EOR) method.[40] Traditional core-flooding experiments and physics-based models, while informative, often rely on simplifying assumptions regarding flow dynamics which can limit their predictive accuracy.[41] To improve generalisation, researchers developed a PEML framework combining experimental data with physically informed features expressed through dimensionless numbers, which include the capillary number, relative radius (based on porosity and permeability), injection pressure ratio, and oil composition number.[40] The model was trained on core-flooding datasets spanning a wide range of reservoir conditions: porosity (10.8-37.2%), permeability (1-18,000 mD), injection pressures (2.73-11.44 MPa), flow rates, and various crude oil types. Rather than relying on individual parameters, the PEML model used a grouped dimensionless formulation to represent the combined physical forces governing displacement efficiency. A logarithmic correlation was found between these grouped parameters and the oil recovery factor, achieving strong agreement with experimental results (81% confidence). This approach demonstrated improved accuracy over traditional methods and highlighted the benefits of embedding domain knowledge into machine learning for more robust EOR performance prediction.
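A simplified sketch of the grouped dimensionless-number approach, fitting a logarithmic correlation to synthetic data (the single capillary-number grouping used here is an assumption for illustration; the study's grouped formulation combines several dimensionless terms):

```python
# Dimensionless-feature sketch: form a capillary number from flow
# properties and fit a logarithmic correlation to recovery-factor data.
import numpy as np

rng = np.random.default_rng(4)
velocity = rng.uniform(1e-6, 1e-4, 100)      # Darcy velocity, m/s
viscosity = rng.uniform(1e-4, 1e-2, 100)     # Pa*s
ift = rng.uniform(1e-3, 3e-2, 100)           # interfacial tension, N/m

capillary = velocity * viscosity / ift       # dimensionless group
recovery = 0.05 * np.log(capillary) + 0.9 + rng.normal(0, 0.01, 100)

a, b = np.polyfit(np.log(capillary), recovery, 1)   # RF ≈ a*ln(Ca) + b
```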
In image processing and computer vision, PEML has been used to improve illumination harmonisation and editing tasks. Traditional graphics models are often computationally expensive and may struggle to generalise to diverse real-world lighting conditions.[42] Conversely, standard diffusion-based models are powerful for generative tasks, but can alter intrinsic image properties such as albedo or reflectance, leading to unrealistic visual artifacts. To address these limitations, researchers proposed a PEML-based training strategy known as Imposing Consistent Light (IC-Light) transport.[26] This method incorporates physical light transport theory into the training of diffusion-based illumination models by enforcing a consistency principle: the linear blending of different lighting conditions should reflect physically plausible results. By embedding this constraint during training, the model learns to modify illumination without distorting other visual features of the image. IC-Light was applied to a large-scale training regime involving over 10 million samples, including real photographs, rendered data, and in-the-wild synthetic augmentations. The model was benchmarked against several baselines (e.g. SwitchLight[43] and DiLightNet[44]), achieving state-of-the-art results in perceptual quality (LPIPS[45] = 0.1025), while maintaining balanced performance in PSNR (23.72) and SSIM (0.8513). This PEML approach enables more stable and scalable illumination editing while being physically consistent, supporting applications in content creation and digital design.
Despite its advantages, PEML faces several challenges that limit its scalability and general adoption. One major issue is the lack of standardised benchmarks for evaluating PEML models.[7][46] Direct comparisons are often difficult, as the models integrate domain knowledge in different ways. Studies have noted that models with similar statistical accuracy may generalise differently when applied to new conditions. Recent literature has called for evaluation metrics that account for physical consistency and domain-specific performance, beyond traditional error metrics such as RMSE or MAE.[47][48]
Another key challenge is balancing the roles of physics and data in PEML models. If the data-driven component is too flexible, it may overfit the training data and produce predictions that violate known physical principles; conversely, if the physical constraints are applied too rigidly, the model may underfit and fail to capture important patterns in the data, creating a Pareto frontier between the two objectives.[49] For physics-encoded and hybrid modelling, there is also a risk that the machine learning component may override the physics-based component unless appropriately constrained. To manage this trade-off, researchers have investigated strategies such as incorporating physics-based regularisation terms[50] and applying adaptive weighting schemes during training.[51]
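A toy sketch of an adaptive weighting scheme (a simplification illustrating the general idea, not any specific published algorithm):

```python
# Adaptive weighting sketch: nudge the physics weight lam so that neither
# the data term nor the physics term dominates the composite loss.
import torch

net = torch.nn.Linear(1, 1)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
x = torch.linspace(0, 1, 32).reshape(-1, 1)
y = 3.0 * x                                   # synthetic observations

lam = 1.0
for step in range(500):
    loss_data = ((net(x) - y) ** 2).mean()
    # Toy "physics" constraint: the learned slope should be non-negative
    loss_phys = torch.relu(-net.weight).sum()
    with torch.no_grad():
        ratio = float(loss_data) / (float(loss_phys) + 1e-8)
        lam = 0.9 * lam + 0.1 * min(ratio, 100.0)   # capped rebalancing
    opt.zero_grad()
    (loss_data + lam * loss_phys).backward()
    opt.step()
```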
Error sources in both the data and the physical models present significant challenges as well. These include incorrect modelling assumptions (for example, wrong constitutive laws), noisy or non-informative data, and model architecture choices that allow the data-driven and physics-based components to become imbalanced.[7] To mitigate these risks, recent studies have proposed methods for the automatic detection and correction of such errors, along with uncertainty quantification techniques to flag unreliable or extrapolated predictions.[11]
Scalability is another limitation. Many PEML techniques have been demonstrated only on idealised or low-dimensional problems.[1][46] Applying them to large-scale systems, such as multi-physics simulations or real-time control scenarios, remains computationally demanding.[7][11] Techniques such as domain decomposition, surrogate modelling, and reduced-order physics are used to mitigate this, though they often introduce additional approximation errors.[52]
Finally, interpretability and uncertainty quantification are still under active development. While PEML models are often more transparent than purely black-box approaches, interpreting the learned components (e.g. correction terms) is not always straightforward. Similarly, quantifying uncertainty in both the data and model parameters is critical for high-stakes applications, but current methods are still evolving.[7]