Commit b9a080c

Merge pull request #64 from awslabs/sjg/libceed-pa-dev
Partial assembly support using libCEED
2 parents cdc8e20 + 8f6c834 commit b9a080c

37 files changed: +682 -630 lines changed

CHANGELOG.md

+5-1
Original file line number | Diff line number | Diff line change
@@ -24,10 +24,14 @@ The format of this changelog is based on
2424
- Changed implementation of numeric wave ports to use MFEM's `SubMesh` functionality. As
2525
of [#3379](https://github.com/mfem/mfem/pull/3379) in MFEM, this has full ND and RT
2626
basis support. For now, support for nonconforming mesh boundaries is limited.
27-
- Added Apptainer/Singularity container build definition for Palace.
27+
- Added support for operator partial assembly for high-order finite element spaces based
28+
on libCEED for non-tensor product element meshes. This option is disabled by default,
29+
but can be activated using `config["Solver"]["PartialAssemblyOrder"]` set to some number
30+
less than `"Order"` and `config["Solver"]["Device"]: "ceed-cpu"`.
2831
- Added build dependencies on [libCEED](https://github.com/CEED/libCEED) and
2932
[LIBXSMM](https://github.com/libxsmm/libxsmm) to support operator partial assembly (CPU-
3033
based for now).
34+
- Added Apptainer/Singularity container build definition for Palace.
3135

3236
## [0.11.2] - 2023-07-14
3337

docs/src/config/solver.md

+38-24
@@ -9,6 +9,8 @@
99
"Solver":
1010
{
1111
"Order": <int>,
12+
"PartialAssemblyOrder": <int>,
13+
"Device": <string>,
1214
"Eigenmode":
1315
{
1416
...
@@ -40,6 +42,16 @@ with
4042

4143
`"Order" [1]` : Finite element order (degree). Arbitrary high-order spaces are supported.
4244

45+
`"PartialAssemblyOrder" [100]` : Order at which to switch from full assembly of finite
46+
element operators to [partial assembly](https://mfem.org/howto/assembly_levels/). Setting
47+
this parameter equal to 1 will fully activate operator partial assembly on all levels.
48+
49+
`"Device" ["cpu"]` : The device configuration passed to [MFEM]
50+
(https://mfem.org/howto/assembly_levels/) in order to activate different backends at
51+
runtime. CPU-based partial assembly is supported by the `"cpu"` backend for tensor-product
52+
meshes using the native MFEM kernels and `"ceed-cpu"` backend for all mesh types using
53+
libCEED.
54+
4355
`"Eigenmode"` : Top-level object for configuring the eigenvalue solver for the eigenmode
4456
simulation type. Thus, this object is only relevant for
4557
[`config["Problem"]["Type"]: "Eigenmode"`](problem.md#config%5B%22Problem%22%5D).
@@ -299,13 +311,13 @@ directory specified by [`config["Problem"]["Output"]`]
299311
"Tol": <float>,
300312
"MaxIts": <int>,
301313
"MaxSize": <int>,
302-
"UsePCMatShifted": <bool>,
303-
"PCSide": <string>,
304-
"UseMultigrid": <bool>,
305-
"MGAuxiliarySmoother": <bool>,
314+
"MGMaxLevels": <int>,
315+
"MGCoarsenType": <string>,
306316
"MGCycleIts": <int>,
307317
"MGSmoothIts": <int>,
308318
"MGSmoothOrder": <int>,
319+
"PCMatShifted": <bool>,
320+
"PCSide": <string>,
309321
"DivFreeTol": <float>,
310322
"DivFreeMaxIts": <float>,
311323
"GSOrthogonalization": <string>
@@ -365,26 +377,15 @@ equations arising for each simulation type. The available options are:
365377
`"MaxSize" [0]` : Maximum Krylov space size for the GMRES and FGMRES solvers. A value less
366378
than 1 defaults to the value specified by `"MaxIts"`.
367379

368-
`"UsePCMatShifted" [false]` : When set to `true`, constructs the preconditioner for frequency
369-
domain problems using a real SPD approximation of the system matrix, which can help
370-
performance at high frequencies (relative to the lowest nonzero eigenfrequencies of the
371-
model).
372-
373-
`"PCSide" ["Default"]` : Side for preconditioning. Not all options are available for all
374-
iterative solver choices, and the default choice depends on the iterative solver used.
375-
376-
- `"Left"`
377-
- `"Right"`
378-
- `"Default"`
379-
380-
`"UseMultigrid" [true]` : Chose whether to enable [geometric multigrid preconditioning]
380+
`"MGMaxLevels" [100]` : Chose whether to enable [geometric multigrid preconditioning]
381381
(https://en.wikipedia.org/wiki/Multigrid_method) which uses p- and h-multigrid coarsening as
382382
available to construct the multigrid hierarchy. The solver specified by `"Type"` is used on
383383
the coarsest level. Relaxation on the fine levels is performed with Chebyshev smoothing.
384384

385-
`"MGAuxiliarySmoother"` : Activate hybrid smoothing from Hiptmair for multigrid levels when
386-
`"UseMultigrid"` is `true`. For non-singular problems involving curl-curl operators, this
387-
option is `true` by default.
385+
`"MGCoarsenType" ["Logarithmic"]` : Coarsening to create p-multigrid levels.
386+
387+
- `"Logarithmic"`
388+
- `"Linear"`
388389

389390
`"MGCycleIts" [1]` : Number of V-cycle iterations per preconditioner application for
390391
multigrid preconditioners (when `"UseMultigrid"` is `true` or `"Type"` is `"AMS"` or
@@ -396,6 +397,18 @@ preconditioners (when `"UseMultigrid"` is `true` or `"Type"` is `"AMS"` or `"Boo
396397
`"MGSmoothOrder" [3]` : Order of polynomial smoothing for geometric multigrid
397398
preconditioning (when `"UseMultigrid"` is `true`).
398399

400+
`"PCMatShifted" [false]` : When set to `true`, constructs the preconditioner for frequency
401+
domain problems using a real SPD approximation of the system matrix, which can help
402+
performance at high frequencies (relative to the lowest nonzero eigenfrequencies of the
403+
model).
404+
405+
`"PCSide" ["Default"]` : Side for preconditioning. Not all options are available for all
406+
iterative solver choices, and the default choice depends on the iterative solver used.
407+
408+
- `"Left"`
409+
- `"Right"`
410+
- `"Default"`
411+
399412
`"DivFreeTol" [1.0e-12]` : Relative tolerance for divergence-free cleaning used in the
400413
eigenmode simulation type.
401414

@@ -411,10 +424,11 @@ vectors in Krylov subspace methods or other parts of the code.
411424

412425
### Advanced linear solver options
413426

414-
- `"UseInitialGuess" [true]`
415-
- `"UsePartialAssembly" [false]`
416-
- `"UseLowOrderRefined" [false]`
417-
- `"Reordering" ["Default"]` : `"METIS"`, `"ParMETIS"`,`"Scotch"`, `"PTScotch"`,
427+
- `"InitialGuess" [true]`
428+
- `"MGLegacyTransfer" [false]`
429+
- `"MGAuxiliarySmoother" [true]`
430+
- `"PCLowOrderRefined" [false]`
431+
- `"ColumnOrdering" ["Default"]` : `"METIS"`, `"ParMETIS"`,`"Scotch"`, `"PTScotch"`,
418432
`"Default"`
419433
- `"STRUMPACKCompressionType" ["None"]` : `"None"`, `"BLR"`, `"HSS"`, `"HODLR"`, `"ZFP"`,
420434
`"BLR-HODLR"`, `"ZFP-BLR-HODLR"`

palace/drivers/eigensolver.cpp

+2-1
Original file line numberDiff line numberDiff line change
@@ -163,7 +163,8 @@ void EigenSolver::Solve(std::vector<std::unique_ptr<mfem::ParMesh>> &mesh,
163163
divfree = std::make_unique<DivFreeSolver>(
164164
spaceop.GetMaterialOp(), spaceop.GetNDSpace(), spaceop.GetH1Spaces(),
165165
spaceop.GetAuxBdrTDofLists(), iodata.solver.linear.divfree_tol,
166-
iodata.solver.linear.divfree_max_it, divfree_verbose);
166+
iodata.solver.linear.divfree_max_it, divfree_verbose,
167+
iodata.solver.pa_order_threshold);
167168
eigen->SetDivFreeProjector(*divfree);
168169
}
169170

palace/fem/multigrid.hpp

+106-51
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@
99
#include <mfem.hpp>
1010
#include "linalg/operator.hpp"
1111
#include "linalg/rap.hpp"
12+
#include "utils/iodata.hpp"
1213

1314
namespace palace::utils
1415
{
@@ -17,61 +18,118 @@ namespace palace::utils
1718
// Methods for constructing hierarchies of finite element spaces for geometric multigrid.
1819
//
1920

21+
// Helper function for getting the order of the finite element space underlying a bilinear
22+
// form.
23+
inline auto GetMaxElementOrder(mfem::BilinearForm &a)
24+
{
25+
return a.FESpace()->GetMaxElementOrder();
26+
}
27+
28+
// Helper function for getting the order of the finite element space underlying a mixed
29+
// bilinear form.
30+
inline auto GetMaxElementOrder(mfem::MixedBilinearForm &a)
31+
{
32+
return std::max(a.TestFESpace()->GetMaxElementOrder(),
33+
a.TrialFESpace()->GetMaxElementOrder());
34+
}
35+
36+
// Assemble a bilinear or mixed bilinear form. If the order is lower than the specified
37+
// threshold, the operator is assembled as a sparse matrix.
38+
template <typename BilinearForm>
39+
inline std::unique_ptr<Operator>
40+
AssembleOperator(std::unique_ptr<BilinearForm> &&a, bool mfem_pa_support,
41+
int pa_order_threshold, int skip_zeros = 1)
42+
{
43+
mfem::AssemblyLevel assembly_level =
44+
(mfem::DeviceCanUseCeed() ||
45+
(mfem_pa_support && GetMaxElementOrder(*a) >= pa_order_threshold))
46+
? mfem::AssemblyLevel::PARTIAL
47+
: mfem::AssemblyLevel::LEGACY;
48+
a->SetAssemblyLevel(assembly_level);
49+
a->Assemble(skip_zeros);
50+
a->Finalize(skip_zeros);
51+
if (assembly_level == mfem::AssemblyLevel::LEGACY ||
52+
(assembly_level == mfem::AssemblyLevel::PARTIAL &&
53+
GetMaxElementOrder(*a) < pa_order_threshold &&
54+
std::is_base_of<mfem::BilinearForm, BilinearForm>::value))
55+
{
56+
// libCEED full assembly does not support mixed forms.
57+
#ifdef MFEM_USE_CEED
58+
mfem::SparseMatrix *spm =
59+
a->HasExt() ? mfem::ceed::CeedOperatorFullAssemble(*a) : a->LoseMat();
60+
#else
61+
mfem::SparseMatrix *spm = a->LoseMat();
62+
#endif
63+
MFEM_VERIFY(spm, "Missing assembled sparse matrix!");
64+
return std::unique_ptr<Operator>(spm);
65+
}
66+
else
67+
{
68+
return std::move(a);
69+
}
70+
}
71+
2072
// Construct sequence of FECollection objects.
2173
template <typename FECollection>
22-
std::vector<std::unique_ptr<FECollection>> ConstructFECollections(bool pc_pmg, bool pc_lor,
23-
int p, int dim)
74+
std::vector<std::unique_ptr<FECollection>> inline ConstructFECollections(
75+
int p, int dim, int mg_max_levels,
76+
config::LinearSolverData::MultigridCoarsenType mg_coarsen_type, bool mat_lor)
2477
{
2578
// If the solver will use a LOR preconditioner, we need to construct with a specific basis
2679
// type.
27-
constexpr int pmin = (std::is_same<FECollection, mfem::H1_FECollection>::value ||
28-
std::is_same<FECollection, mfem::ND_FECollection>::value)
80+
constexpr int pmin = (std::is_base_of<mfem::H1_FECollection, FECollection>::value ||
81+
std::is_base_of<mfem::ND_FECollection, FECollection>::value)
2982
? 1
3083
: 0;
3184
MFEM_VERIFY(p >= pmin, "FE space order must not be less than " << pmin << "!");
3285
int b1 = mfem::BasisType::GaussLobatto, b2 = mfem::BasisType::GaussLegendre;
33-
if (pc_lor)
86+
if (mat_lor)
3487
{
3588
b2 = mfem::BasisType::IntegratedGLL;
3689
}
90+
91+
// Construct the p-multigrid hierarchy, first finest to coarsest and then reverse the
92+
// order.
3793
std::vector<std::unique_ptr<FECollection>> fecs;
38-
if (pc_pmg)
39-
{
40-
fecs.reserve(p);
41-
for (int o = pmin; o <= p; o++)
42-
{
43-
if constexpr (std::is_same<FECollection, mfem::ND_FECollection>::value ||
44-
std::is_same<FECollection, mfem::RT_FECollection>::value)
45-
{
46-
fecs.push_back(std::make_unique<FECollection>(o, dim, b1, b2));
47-
}
48-
else
49-
{
50-
fecs.push_back(std::make_unique<FECollection>(o, dim, b1));
51-
}
52-
}
53-
}
54-
else
94+
for (int l = 0; l < std::max(1, mg_max_levels); l++)
5595
{
56-
fecs.reserve(1);
57-
if constexpr (std::is_same<FECollection, mfem::ND_FECollection>::value ||
58-
std::is_same<FECollection, mfem::RT_FECollection>::value)
96+
if constexpr (std::is_base_of<mfem::ND_FECollection, FECollection>::value ||
97+
std::is_base_of<mfem::RT_FECollection, FECollection>::value)
5998
{
6099
fecs.push_back(std::make_unique<FECollection>(p, dim, b1, b2));
61100
}
62101
else
63102
{
64103
fecs.push_back(std::make_unique<FECollection>(p, dim, b1));
104+
MFEM_CONTRACT_VAR(b2);
105+
}
106+
if (p == pmin)
107+
{
108+
break;
109+
}
110+
switch (mg_coarsen_type)
111+
{
112+
case config::LinearSolverData::MultigridCoarsenType::LINEAR:
113+
p--;
114+
break;
115+
case config::LinearSolverData::MultigridCoarsenType::LOGARITHMIC:
116+
p = (p + pmin) / 2;
117+
break;
118+
case config::LinearSolverData::MultigridCoarsenType::INVALID:
119+
MFEM_ABORT("Invalid coarsening type for p-multigrid levels!");
120+
break;
65121
}
66122
}
123+
std::reverse(fecs.begin(), fecs.end());
67124
return fecs;
68125
}
69126
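Note on `ConstructFECollections`: the new loop walks from the finest order down to `pmin`, either one order at a time or by repeated halving toward `pmin`, and then reverses the result so the coarsest level comes first. The sketch below reproduces only the order sequence, without any MFEM types. `CoarsenType`, `OrderSequence`, and `CoarsenOrders` are hypothetical names for illustration, and the sketch assumes `p >= pmin` (the original enforces this with `MFEM_VERIFY`).

```cpp
// Hypothetical stand-in for config::LinearSolverData::MultigridCoarsenType.
enum class CoarsenType
{
  LINEAR,
  LOGARITHMIC
};

// Fixed-capacity result so the function can be evaluated at compile time.
struct OrderSequence
{
  int count = 0;
  int orders[32] = {};
};

// Returns the polynomial orders of the p-multigrid levels, coarsest first.
constexpr OrderSequence CoarsenOrders(int p, int pmin, int mg_max_levels, CoarsenType type)
{
  OrderSequence seq{};
  const int max_levels = mg_max_levels < 1 ? 1 : mg_max_levels;  // At least one level
  for (int l = 0; l < max_levels && seq.count < 32; l++)
  {
    seq.orders[seq.count++] = p;
    if (p == pmin)
    {
      break;
    }
    p = (type == CoarsenType::LINEAR) ? p - 1 : (p + pmin) / 2;
  }
  // Levels were generated finest to coarsest, so reverse them in place.
  for (int i = 0, j = seq.count - 1; i < j; i++, j--)
  {
    const int tmp = seq.orders[i];
    seq.orders[i] = seq.orders[j];
    seq.orders[j] = tmp;
  }
  return seq;
}

// Order 4 with pmin = 1: logarithmic coarsening gives 1, 2, 4; linear gives 1, 2, 3, 4.
static_assert(CoarsenOrders(4, 1, 100, CoarsenType::LOGARITHMIC).count == 3, "1, 2, 4");
static_assert(CoarsenOrders(4, 1, 100, CoarsenType::LOGARITHMIC).orders[1] == 2, "1, 2, 4");
static_assert(CoarsenOrders(4, 1, 100, CoarsenType::LINEAR).count == 4, "1, 2, 3, 4");
// Capping the level count keeps the finest orders and drops the coarsest ones.
static_assert(CoarsenOrders(4, 1, 2, CoarsenType::LOGARITHMIC).orders[0] == 2, "2, 4");
```

Because the cap is applied before the reversal, a small `"MGMaxLevels"` trims levels from the coarse end of the hierarchy rather than the fine end.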

70127
// Construct a hierarchy of finite element spaces given a sequence of meshes and
71128
// finite element collections. Dirichlet boundary conditions are additionally
72129
// marked.
73130
template <typename FECollection>
74-
mfem::ParFiniteElementSpaceHierarchy ConstructFiniteElementSpaceHierarchy(
131+
inline mfem::ParFiniteElementSpaceHierarchy ConstructFiniteElementSpaceHierarchy(
132+
int mg_max_levels, bool mg_legacy_transfer, int pa_order_threshold,
75133
const std::vector<std::unique_ptr<mfem::ParMesh>> &mesh,
76134
const std::vector<std::unique_ptr<FECollection>> &fecs,
77135
const mfem::Array<int> *dbc_marker = nullptr,
@@ -80,17 +138,18 @@ mfem::ParFiniteElementSpaceHierarchy ConstructFiniteElementSpaceHierarchy(
80138
MFEM_VERIFY(!mesh.empty() && !fecs.empty() &&
81139
(!dbc_tdof_lists || dbc_tdof_lists->empty()),
82140
"Empty mesh or FE collection for FE space construction!");
83-
auto *fespace = new mfem::ParFiniteElementSpace(mesh[0].get(), fecs[0].get());
141+
int coarse_mesh_l =
142+
std::max(0, static_cast<int>(mesh.size() + fecs.size()) - 1 - mg_max_levels);
143+
auto *fespace = new mfem::ParFiniteElementSpace(mesh[coarse_mesh_l].get(), fecs[0].get());
84144
if (dbc_marker && dbc_tdof_lists)
85145
{
86146
fespace->GetEssentialTrueDofs(*dbc_marker, dbc_tdof_lists->emplace_back());
87147
}
88-
mfem::ParFiniteElementSpaceHierarchy fespaces(mesh[0].get(), fespace, false, true);
89-
90-
// XX TODO: LibCEED transfer operators!
148+
mfem::ParFiniteElementSpaceHierarchy fespaces(mesh[coarse_mesh_l].get(), fespace, false,
149+
true);
91150

92151
// h-refinement
93-
for (std::size_t l = 1; l < mesh.size(); l++)
152+
for (std::size_t l = coarse_mesh_l + 1; l < mesh.size(); l++)
94153
{
95154
fespace = new mfem::ParFiniteElementSpace(mesh[l].get(), fecs[0].get());
96155
if (dbc_marker && dbc_tdof_lists)
@@ -111,31 +170,27 @@ mfem::ParFiniteElementSpaceHierarchy ConstructFiniteElementSpaceHierarchy(
111170
{
112171
fespace->GetEssentialTrueDofs(*dbc_marker, dbc_tdof_lists->emplace_back());
113172
}
114-
auto *P = new ParOperator(
115-
std::make_unique<mfem::TransferOperator>(fespaces.GetFinestFESpace(), *fespace),
116-
fespaces.GetFinestFESpace(), *fespace, true);
173+
ParOperator *P;
174+
if (!mg_legacy_transfer && mfem::DeviceCanUseCeed())
175+
{
176+
// Partial and full assembly for this operator is only available with libCEED backend.
177+
auto p = std::make_unique<mfem::DiscreteLinearOperator>(&fespaces.GetFinestFESpace(),
178+
fespace);
179+
p->AddDomainInterpolator(new mfem::IdentityInterpolator);
180+
P = new ParOperator(AssembleOperator(std::move(p), false, pa_order_threshold),
181+
fespaces.GetFinestFESpace(), *fespace, true);
182+
}
183+
else
184+
{
185+
P = new ParOperator(
186+
std::make_unique<mfem::TransferOperator>(fespaces.GetFinestFESpace(), *fespace),
187+
fespaces.GetFinestFESpace(), *fespace, true);
188+
}
117189
fespaces.AddLevel(mesh.back().get(), fespace, P, false, true, true);
118190
}
119191
return fespaces;
120192
}
121193
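Note on `coarse_mesh_l` in `ConstructFiniteElementSpaceHierarchy`: a full hierarchy has `mesh.size() + fecs.size() - 1` levels, since the finest mesh is shared between the h- and p-refinement stages. When `mg_max_levels` is smaller than that, the coarsest meshes are skipped. The sketch below isolates the index arithmetic; `CoarseMeshIndex` is a hypothetical name, and the sketch assumes `mg_max_levels >= 1` and `num_fecs <= mg_max_levels`, which is what `ConstructFECollections` appears to produce in this diff.

```cpp
// Hypothetical helper mirroring the coarse_mesh_l expression: the index of the first mesh
// that takes part in the multigrid hierarchy.
constexpr int CoarseMeshIndex(int num_meshes, int num_fecs, int mg_max_levels)
{
  const int l = num_meshes + num_fecs - 1 - mg_max_levels;
  return l < 0 ? 0 : l;
}

// 3 meshes and 2 FE collections give 4 levels in total. With a generous cap, all meshes
// are used; with a cap of 3, the coarsest mesh is dropped.
static_assert(CoarseMeshIndex(3, 2, 100) == 0, "no levels dropped");
static_assert(CoarseMeshIndex(3, 2, 3) == 1, "coarsest mesh skipped");
// A single-level hierarchy uses only the finest mesh.
static_assert(CoarseMeshIndex(3, 1, 1) == 2, "finest mesh only");
```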

122-
// Construct a single-level finite element space hierarchy from a single mesh and
123-
// finite element collection. Unnecessary to pass the dirichlet boundary
124-
// conditions as they need not be incorporated in any inter-space projectors.
125-
template <typename FECollection>
126-
mfem::ParFiniteElementSpaceHierarchy
127-
ConstructFiniteElementSpaceHierarchy(mfem::ParMesh &mesh, const FECollection &fec,
128-
const mfem::Array<int> *dbc_marker = nullptr,
129-
mfem::Array<int> *dbc_tdof_list = nullptr)
130-
{
131-
auto *fespace = new mfem::ParFiniteElementSpace(&mesh, &fec);
132-
if (dbc_marker && dbc_tdof_list)
133-
{
134-
fespace->GetEssentialTrueDofs(*dbc_marker, *dbc_tdof_list);
135-
}
136-
return mfem::ParFiniteElementSpaceHierarchy(&mesh, fespace, false, true);
137-
}
138-
139194
} // namespace palace::utils
140195

141196
#endif // PALACE_FEM_MULTIGRID_HPP

palace/linalg/CMakeLists.txt

+1-1
Original file line numberDiff line numberDiff line change
@@ -11,10 +11,10 @@ target_sources(${TARGET_NAME}
1111
${CMAKE_CURRENT_SOURCE_DIR}/ams.cpp
1212
${CMAKE_CURRENT_SOURCE_DIR}/arpack.cpp
1313
${CMAKE_CURRENT_SOURCE_DIR}/chebyshev.cpp
14-
${CMAKE_CURRENT_SOURCE_DIR}/curlcurl.cpp
1514
${CMAKE_CURRENT_SOURCE_DIR}/distrelaxation.cpp
1615
${CMAKE_CURRENT_SOURCE_DIR}/divfree.cpp
1716
${CMAKE_CURRENT_SOURCE_DIR}/gmg.cpp
17+
${CMAKE_CURRENT_SOURCE_DIR}/hcurl.cpp
1818
${CMAKE_CURRENT_SOURCE_DIR}/jacobi.cpp
1919
${CMAKE_CURRENT_SOURCE_DIR}/ksp.cpp
2020
${CMAKE_CURRENT_SOURCE_DIR}/iterative.cpp

palace/linalg/amg.hpp

+2-2
Original file line numberDiff line numberDiff line change
@@ -18,8 +18,8 @@ class BoomerAmgSolver : public mfem::HypreBoomerAMG
1818
{
1919
public:
2020
BoomerAmgSolver(int cycle_it = 1, int smooth_it = 1, int print = 0);
21-
BoomerAmgSolver(const IoData &iodata, int print)
22-
: BoomerAmgSolver(iodata.solver.linear.pc_mg ? 1 : iodata.solver.linear.mg_cycle_it,
21+
BoomerAmgSolver(const IoData &iodata, bool coarse_solver, int print)
22+
: BoomerAmgSolver(coarse_solver ? 1 : iodata.solver.linear.mg_cycle_it,
2323
iodata.solver.linear.mg_smooth_it, print)
2424
{
2525
}
