Numerical Optimization

General Framework

The infrastructure of MrHyDE was designed to reduce the burden for users to be able to access some of the large-scale numerical optimization tools developed within the Rapid Optimization Library (ROL) at Sandia. By leveraging automatic differentiation tools and an abstract framework for solving multiscale/multiphysics problems, MrHyDE automates the construction of discrete adjoints which can then be used to provide gradient information (or Hessian vector products) which is required for higher-order optimization methods. A full derivation of the adjoint-based gradient is beyond the intended scope of this wiki, but we will mention a few key aspects that may impact a user:

For transient problems, the adjoint problems runs backwards in time and needs to be linearized around the state approximation. Thus, the state solutions needs to be available as the adjoint solver progresses. If an adjoint solve is required, MrHyDE automatically stores the full forward solution at each time node. Checkpointing is a common approach to reduce the memory requirements in storing the forward solution, but this is not implemented in MrHyDE.
The adjoint solve requires the derivative of the forward residual with respect to the state (the Jacobian) and the derivative of the objective function with respect to the state. The gradient calculation also requires the derivative of the forward residual with respect to the active and/or discretized parameters.
For these discrete adjoints, the Jacobian for the adjoint system is the transpose of the forward Jacobian. MrHyDE assembles the adjoint Jacobian directly by transposing the row and column indices in the scatter step. No transposes are required at the linear algebra level.
MrHyDE does not support Hessians or Hessian-vector products at this time (future work).
The adjoint gradients are supported for multi-step time integrators, but not for multi-stage integrators. This is work in progress.
For sensor (pointwise) objectives, one must be careful to measure quantities that behave nicely as the mesh is refined. Not everything that can be included in an objective function lives in $L^{\infty}$.
For all objectives except for sensors, MrHyDE will measure the objective at each time step and weight this by $\Delta t$ (see below). This is to maintain consistency as the number of time steps are increased. Sensors are only measured at certain times, so they do not require this re-weighting.

MrHyDE support several different types of objective functions for optimization and inversion. The general form of the objective function is given by:

$$F(u,z) = \sum_{i=1}^{N_{\text{obj}}} \left(\omega_i f_i(u,z) + \sum_{j=1}^{N_{i,r}} \eta_{i,j} R_{i,j}(z) \right),$$

where $u$ denotes the state variable, $z$ denotes the parameters or controls, $f_i(u,z)$ is a misfit function, $w_i$ is the weight on this misfit function, $R_{i,r}(z)$ is a regularization term, and $\eta_{i,r}$ is the weight on this regularization term. Multiple misfit functions are allowed, but only the weighted average will be optimized. Multi-objective optimization is not supported at this time.

Defining/Using Parameters and Controls

MrHyDE has a customized parameter manager whose sole responsibility is to deal with the various types of parameters available in MrHyDE. These types are:

Active These are scalar or vector parameters that are eligible for sensitivities, i.e., adjoint-based gradients. Active parameters can be static, i.e., constant over time, or dynamic which may take different values on each time step. The default is always static.
Inactive These are scalar or vector parameters that are not eligible for sensitivities. These are never updated from their initial values.
Stochastic These are scalar or vector parameters that are not eligible for sensitivities, but are randomly distributed according to a user-defined distribution. They are assigned new values for each realization of a UQ study.
Discretized These are parameters/controls that represent discretized fields. Other codes often call these distributed parameters. In MrHyDE, these parameters can use any of the available discretizations. The parameter manager sets up a Panzer DOF manager for these discretized parameters, the linear algebra interface is able to set up up linear algebra objects, i.e., vectors and matrices, associated with them, and sensitivities are available through the postprocess manager. Discretized parameters can also be either static or dynamic. The user can define any of these parameters from within the Parameters sublist in the input file. To define an inactive parameter, we set: Note that MrHyDe does not (yet) have a discretized stochastic parameters, i.e., it cannot handle random fields directly. It can handle random fields in a Karhunen-Loeve expansion, which only requires a basis and a vector of uncertain parameters.

The user can define any of these parameters from within the Parameters sublist in the input file. These can be used within physics modules directly or within strings defined elsewhere in the input file. For vector parameters, you will need to provide which scalar component to use, e.g., rho(2).

To define an inactive parameter, we set:

Parameters:
    magma:
      type: scalar
      value: 1.0
      usage: inactive

Changing this to an active parameter only requires modifying the usage flag to active. To define two stochastic parameters, one uniform and one Gaussian, we set:

Parameters:
    a:
      type: scalar
      value: 1.0 # not actually used
      usage: stochastic
      distribution: uniform
      min: 1.0
      max: 2.0
    b:
      type: scalar
      value: 0.0 # not actually used
      usage: stochastic
      distribution: Gaussian
      mean: 0.0
      variance: 1.0

To define a discretized parameter, we set:

Parameters:
    magma:
      type: HGRAD
      order: 1
      usage: discretized
      initial value: 1.0

To make this parameter dynamic, we add dynamic: true.

Examples of Objective Functions

Integrated Response

An integrated response is meant for cases where the simulation produces a scalar QoI (as an integrated quantity) and the user wants to find the model input parameters that force the QoI to match a given target value. The misfit for an integrated response takes the following form:

$$ f_i(u,z) = \left( \overline{r}-\overline{q}\right)^2, \quad \overline{r} = \int_{[t_o,t_f]} \int_{\Omega} r(u,z,x,t) \ dx \ dt $$

where $r(u,z,x)$ is the user-defined response function, $\overline{q}$ is the user-defined target value. In MrHyDE, we approximate this objective function by discretizing over time:

$$ \overline{r}\approx \sum_{n=1}^{N_t} \int_{\Omega} r(u,z,x,t_{n+1}) \ dx $$

where $N_t$ is the number of time steps, $\Delta t_n$ is the corresponding step size and $t_{n+1}$ is the right-end point of the subinterval. The integral over space is approximated using the quadrature associated with the chosen discretization. If the problem is steady-state, then MrHyDE will set $N_t=1$ and $\Delta t_n = 1.0$.

An example of an integrated response objective is

Postprocess: 
    compute objective: true
    compute sensitivities: true
    Objective functions:
      obj0:
        type: integrated response
        response: 'e'
        target: 1.0
        weight: 0.5

Integrated Control

An integrated control attempts to force the state, or some function of the state, to match a given target function over all space and time. This involves matching (or minimizing the misfit) significantly more data than the integrated response. The misfit function for an integrated control is given by:

$$ f_i(u,z) = \int_{[t_o,t_f]} \int_{\Omega} g(u,z,x,t) \ dx \ dt, $$

where $g(u,z,x,t)$ is the user-defined function. The most common use case will define this function as

$$ g(u,z,x,t) = \left( r(u,z,x,t) - q(x,t)\right)^2,$$

where $r(u,z,x,t)$ is a user-defined response function and $q(x,t)$ is a user-defined target function. It is not required for the function to have this form. The integrals for the integrated control are approximated in the same way as the integrals for the integrated response. However, these two objective functions are quite different. An integrated response is modifying the parameters to match a scalar quantity of interest, while the integrated control is typically trying to force the response function to match the target function over all space and time.

An example input file snippet is given below:

Postprocess: 
    compute objective: true
    compute sensitivities: false
    Objective functions:
      obj0:
        type: integrated control
        function: '1.0*(e-targ)^2'
        weight: 0.0625

Discrete Control

A discrete control is similar to an integrated control in the sense that it seeks parameter values that cause the state to match a given target over all space/time. A key difference is in the data that is being matched. The objective function for the discrete control is given by:

$$ f_i(u,z) = \int_{[t_0,t_f]} \lVert \mathbf{U}_h(t) - \mathbf{T}_h(t) \rVert_2^2 \ dt, $$

where $\mathbf{U}$ is the Tpetra vector of degrees of freedom for the state, $\mathbf{T}$ is the Tpetra vector of degrees of freedom associated with the target solution, and $| \cdot |_2$ is the discrete ${\cal l}_2$ norm. The discretization in time is the same as the integrated control and integrated response. This particular objective function is typically used for demonstration purposes since one requires the Tpetra vector associated with the target. This is usually obtained by first solving a forward problem using a data-generating set of parameters, and then trying to recover or approximate these parameters.

An example input file snippet is given below:

Postprocess: 
    compute objective: true
    compute sensitivities: true
    Objective functions:
      obj1:
        type: discrete control
        weight: 0.5

Sensors

Sensors are a bit different from the previously described objective functions. These are pointwise (in space and time) measurements of functions of the state (and other variables). Settings this up requires defining both the sensor locations and the data that needs to be matched. One should be careful to only measure quantities that are well-behaved in a pointwise sense, but MrHyDE will allow almost any function to be evaluated pointwise. This capability is the main motivation for the point forest in the function manager and some functionality in the workset. In some ways, sensors are treated as another quadrature rule, but the sensors are rarely distributed evenly amongst the elements - some have several sensors and some have none. The objective function for sensors is given by:

$$ f_i(u,z) = \sum_{i=1}^{M_t} \sum_{j=1}^{N_{sens}} \left( r_{i,j} - q_{i,j} \right)^2, \quad r_{i,j} = r(u,z,x_j,t_i), $$

where $q_{i,j}$ is the measured quantity at sensor $j$ at time $t_i$. It is currently assumed that all sensors collect data at each measurement time, $t_i$. MrHyDE does not adjust the time steps so that they align with the measurement times - this is up to the user. The response function can be an arbitrary function of the state, parameters, etc. An example input file is given below:

Postprocess: 
    compute objective: true
    Objective functions:
      obj0:
        type: sensors
        sensor points file: sensor_points.dat
        sensor data file: sensor_data.dat
        save sensor data: false
        response: 'e'
        weight: 1.0

We see that it is necessary to provide separate files for the sensor locations (physical locations in space) and the data. In the data file, the first row is the measurement times, and subsequent rows give data at these times. Regarding the sensor locations, it is best to avoid placing sensors on element boundaries or at the nodes of the mesh. Many quantities are continuous across elements, but some are not, so there can be variance in the objective function depending on which element is identified as the owner of a sensor.

Examples of Regularization Functions

The regularization functions can be any function of the parameter, but should not depend on the state. $L^2$, $H^1$, and total variation regularization can be implemented in this way from the input file. Note that the regularization functions are tied to an objective function and an objective function can be associated with multiple regularization functions. For example, the following code adds two regularization terms: one a volumetric $L^2$ term and one a boundary $H^1$ (gradient) term.

Postprocess: 
    compute objective: true
    Objective functions:
      obj_dx:
        type: sensors
        sensor points file: sensor_points.dat
        sensor data file: sensor_dx_data.dat
        response: 'dx'
        weight: 0.5
        Regularization functions:
          reg0:
            type: integrated
            location: volume
            function: mufield^2
            weight: 1.0e-5
          breg0:
            type: integrated
            location: boundary
            boundary name: top
            function: '1.0*(grad(disc_trac)[x])^2'
            weight: 0.5e-5

Home
Getting Started
Performance Tuning
FAQs

Features

   Physics Modules
   User-Defined Functions
   Postprocessing Capabilities
   Incorporating Data
   Heterogeneous Parallelism
   Multiscale and Multiresolution
   Numerical Optimization
   Uncertainty Quantification
   Data-Consistent Inversion
   In-Situ Data Compression

Examples (Coming Soon)

   Maxwell’s Equations
   Shallow Water Equations
   Porous Media
   Heat Equation
   Convection-Diffusion-Reaction
   Ordinary Differential Equation
   Stokes
   Navier Stokes
   Burgers Equation
   Linear Elasticity
   Crystal Elasticity
   Phasefield
   Multiphysics
   Multiscale Hybridized DG

Numerical Optimization

General Framework

Defining/Using Parameters and Controls

Examples of Objective Functions

Integrated Response

Integrated Control

Discrete Control

Sensors

Examples of Regularization Functions

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally