Discrepancy between structure factors #1306

joema10 · 2025-02-04T22:28:17Z

I am using freud to calculate the structure factor from a trajectory computed with LAMMPS. I am trying to reproduce the results for the structure factor in figure 4 of this article: https://pubs.rsc.org/en/content/articlehtml/2021/sm/d1sm00445j

This is schematically my code using freud:

# Load x,y,z (positions)  and box limits (infLx, supLx)*(infLy, supLy)*(infLz, supLz)
# Transform coordinates to get the box size is (0,Lx)*(0,Ly)*(0,Lz)

points = np.column_stack((x, y, z))

deltaKbin = 2*np.pi/frame.Lx  
kmax = 8.16 
kmin = deltaKbin  
nbin = int((kmax-kmin)/deltaKbin)+1 

sf = freud.diffraction.StaticStructureFactorDirect(bins=nbin, k_max=kmax, k_min=kmin)
sf.compute((freud.box.Box(frame.Lx, frame.Ly, frame.Lz), points),reset=False)

Everything works fine for most of the cases (e.g. rho=0.155, 0.252 and T=0.3), but for rho=0.407 and T=0.3 I don't get the same result. In this case the g(r) is similar, but S(k) is totally different using freud (see below)

I have made my own code (very inefficient) and calculating S(k) sampling randomly values of \vec{k} gives similar results compare to the reported S(k). Also, using ovito I have calculated S(k) and it seems to come out similar as well. I don't know if this particular case is difficult because of some of the assumptions freud makes internally and I just haven't found the right set of parameters or if something is wrong

Please find attached screenshots of ovito, the comparison between my code and freud, as well as one of the frames of the simulation I am analysing

Any help is welcome

I am using jupyter-notebook in Ubuntu 18.04.6 LTS with:

python v3.9.21
freud v3.1.0

Frame_last.txt

The text was updated successfully, but these errors were encountered:

DomFijan · 2025-02-12T20:21:02Z

Have you tried using StaticStructureFactorDebye?
How do the results compare between Debye vs. Direct vs. your code?
It looks like the number of bins between what you use in your code and in Freud is also different. This is probably not the cause of the issue, but could you set the number of bins such that they are equivalent for easier comparison?
What happens if you set num_sampled_k_points in Direct to say 10 000? Do you get the same results? ( this also shouldn't matter)

joema10 · 2025-02-12T21:58:33Z

This is the result for all three methods. As you can see between 1 and 2 there is a maximum in both my code and Debye. That maximum is also observed in the given reference, but Direct does not show that maximum. For this figure I have used num_sampled_k_points=10000, but there doesn't seem to be any change from the default value.

DomFijan · 2025-02-15T07:17:30Z

Can you confirm that all the particles are inside the box? You can always wrap them for good measure.

Could you also share the full code you use to run the example, including extraction of box parameters, and coordinates from the file? With that I could try and take a closer look as well.

joema10 · 2025-02-15T08:09:11Z

Thanks for the suggestion. It seems that the particles are wrapped correctly

Attached is the python script I am using for this comparison. The framework I am using to calculate S(q) here is attached above. If you need anything else from me, please let me know

COMPUTE_Sq_from_LAMMPSdump.txt

janbridley · 2025-02-20T16:30:03Z

When opening Frame_last.py in OVITO, the particles form layers in an odd way. Is this intended/your experience as well?

joema10 · 2025-02-20T17:39:52Z

Yes, that the system that I am studing

You can find this behaviour for particles interacting via SALR potential, like in this reference: https://pubs.rsc.org/en/content/articlehtml/2021/sm/d1sm00445j

joaander · 2025-02-26T14:02:39Z

I don't have time to try recreating your plots. But I would like to point out that a 1D averaged structure factor throws away a LOT of information from the full 3D S($\vec{k}$). With your lamellar type system, the peak would only show up at the one specific $\vec{k}$ that is commensurate with the direction the lamella repeat. All other S($\vec{k}$) at the same magnitude $k$ will likely be 0. Since StaticStructureFactorDirect samples randomly selected $\vec{k}$ values, it might miss the specific ones that match your system's direction, thus leading to the 0's where you expect a "peak" (which is the average of many 0's and a few non-zero values).

Nominally, StaticStructureFactorDirect states that when num_sampled_k_points==0 (the default) it will use all k points. However, this is not clear in the implementation. The code contains a complex process to select k points. Setting num_sampled_k_points==0 appears to only change the selection criteria for the random sampling:

freud/freud/diffraction/StaticStructureFactorDirect.cc

Lines 201 to 299 in 6be399c

    
           std::vector<vec3<float>> StaticStructureFactorDirect::reciprocal_isotropic(const box::Box& box, float k_max, 
        
                                                                                      float k_min, 
        
                                                                                      unsigned int num_sampled_k_points) 
        
           { 
        
               const auto box_matrix = box_to_matrix(box); 
        
               // B holds "crystallographic" reciprocal box vectors that lack the factor of 2 pi. 
        
               const auto B = box_matrix.transpose().inverse(); 
        
               const auto q_max = k_max / freud::constants::TWO_PI; 
        
               const auto q_max_sq = q_max * q_max; 
        
               const auto q_min = k_min / freud::constants::TWO_PI; 
        
               const auto q_min_sq = q_min * q_min; 
        
               const auto dq_x = B.row(0).norm(); 
        
               const auto dq_y = B.row(1).norm(); 
        
               const auto dq_z = B.row(2).norm(); 
        
               const auto q_volume = dq_x * dq_y * dq_z; 
        
               // Above the pruning distance, the grid of k points is sampled isotropically 
        
               // at a lower density. 
        
               const auto q_prune_distance = get_prune_distance(num_sampled_k_points, q_max, q_volume); 
        
               const auto q_prune_distance_sq = q_prune_distance * q_prune_distance; 
        
               const auto bx = freud::constants::TWO_PI * vec3<float>(B(0, 0), B(0, 1), B(0, 2)); 
        
               const auto by = freud::constants::TWO_PI * vec3<float>(B(1, 0), B(1, 1), B(1, 2)); 
        
               const auto bz = freud::constants::TWO_PI * vec3<float>(B(2, 0), B(2, 1), B(2, 2)); 
        
               const auto N_kx = static_cast<unsigned int>(std::ceil(q_max / dq_x)); 
        
               const auto N_ky = static_cast<unsigned int>(std::ceil(q_max / dq_y)); 
        
               const auto N_kz = static_cast<unsigned int>(std::ceil(q_max / dq_z)); 
        
               // The maximum number of k points is a guideline. The true number of sampled 
        
               // k points can be less or greater than num_sampled_k_points, depending on the 
        
               // result of the random pruning procedure. Therefore, we cannot allocate a 
        
               // fixed size for the data. Also, reserving capacity for the concurrent 
        
               // vector had no measureable effect on performance. 
        
               // NOLINTNEXTLINE(misc-include-cleaner) 
        
               tbb::concurrent_vector<vec3<float>> k_points; 
        
               // This is a 3D loop but we parallelize in 1D because we only need to seed 
        
               // the random number generators once per block and there is no benefit of 
        
               // locality if we parallelize in 2D or 3D. 
        
               util::forLoopWrapper(0, N_kx, [&](size_t begin, size_t end) { 
        
                   // Set up thread-local random number generator for k point pruning. 
        
                   const auto thread_start = static_cast<unsigned int>(begin); 
        
                   std::random_device rd; 
        
                   std::seed_seq seed {thread_start, rd(), rd(), rd()}; 
        
                   std::mt19937 rng(seed); 
        
                   std::uniform_real_distribution<float> base_dist(0, 1); 
        
                   auto random_prune = [&]() { return base_dist(rng); }; 
        
                   const auto add_all_k_points = std::isinf(q_prune_distance); 
        
                   for (unsigned int kx = begin; kx < end; ++kx) 
        
                   { 
        
                       const auto k_vec_x = static_cast<float>(kx) * bx; 
        
                       for (unsigned int ky = 0; ky < N_ky; ++ky) 
        
                       { 
        
                           const auto k_vec_xy = k_vec_x + static_cast<float>(ky) * by; 
        
                           // Solve the quadratic equation for kz to limit which kz values we must sample: 
        
                           // k_min^2 <= |k_vec_xy|^2 + kz^2 |bz|^2 - 2 kz (k_vec_xy \cdot bz) <= k_max^2 
        
                           // 0 <= kz^2 (|bz|^2) + kz (-2 (k_vec_xy \cdot bz)) + (|k_vec_xy|^2 - k_min^2) 
        
                           // 0 >= kz^2 (|bz|^2) + kz (-2 (k_vec_xy \cdot bz)) + (|k_vec_xy|^2 - k_max^2) 
        
                           // This step improves performance significantly when k_min > 0 
        
                           // by eliminating a large portion of the search space. Likewise, 
        
                           // it eliminates the portion of search space outside a sphere 
        
                           // with radius k_max. We round kz_min down and kz_max up to 
        
                           // ensure that we don't accidentally throw out valid k points in 
        
                           // the range (k_min, k_max) due to rounding error. 
        
                           const auto coef_a = dot(bz, bz); 
        
                           const auto coef_b = -2 * dot(k_vec_xy, bz); 
        
                           const auto coef_c_min = dot(k_vec_xy, k_vec_xy) - k_min * k_min; 
        
                           const auto coef_c_max = dot(k_vec_xy, k_vec_xy) - k_max * k_max; 
        
                           const auto b_over_2a = coef_b / (2 * coef_a); 
        
                           const auto kz_min = static_cast<unsigned int>( 
        
                               std::floor(-b_over_2a + std::sqrt(b_over_2a * b_over_2a - coef_c_min / coef_a))); 
        
                           const auto kz_max = static_cast<unsigned int>( 
        
                               std::ceil(-b_over_2a + std::sqrt(b_over_2a * b_over_2a - coef_c_max / coef_a))); 
        
                           for (unsigned int kz = kz_min; kz < std::min(kz_max, N_kz); ++kz) 
        
                           { 
        
                               const auto k_vec = k_vec_xy + static_cast<float>(kz) * bz; 
        
                               const auto q_distance_sq 
        
                                   = dot(k_vec, k_vec) / freud::constants::TWO_PI / freud::constants::TWO_PI; 
        
                               // The k vector is kept with probability min(1, (q_prune_distance / q_distance)^2). 
        
                               // This sampling scheme aims to have a constant density of k vectors with respect to 
        
                               // radial distance. 
        
                               if (q_distance_sq <= q_max_sq && q_distance_sq >= q_min_sq) 
        
                               { 
        
                                   const auto prune_probability = q_prune_distance_sq / q_distance_sq; 
        
                                   if (add_all_k_points || prune_probability > random_prune()) 
        
                                   { 
        
                                       k_points.emplace_back(k_vec); 
        
                                   } 
        
                               } 
        
                           } 
        
                       } 
        
                   } 
        
               }); 
        
               return std::vector<vec3<float>>(k_points.cbegin(), k_points.cend()); 
        
           }

It is unfortunate that StaticStructureFactorDirect does not provide the full 3D S($\vec{k}$). The 1D version of it throws away so much information, and is really only valid when you have a massive sample with randomly oriented grains (powder diffraction).

Note that:

OVITO computes its structure factor with FFTs and will therefore not miss any peaks.
StaticStructureFactorDebye uses an approximation that is not valid at low values of $k$ as stated in the documentation. You should not expect it to give accurate results at low $k$.
Your first brute force plot appears to show arbitrarily selected $\vec{k}$ values. $\vec{k}$ that are not commensurate with the periodic boundary conditions will introduce aliasing artifacts.

Could you please provide a complete minimum working example that loads your data and plots the result so that we do not need to spend our time recreating it. @DomFijan - the S(K) computation code and binning is so simple it is hard to imagine something going wrong there. If I had to guess, I would say the bug might be in the complex reciprocal_isotropic method. We should check that all k vectors are properly generated. We should also consider writing a simple and clearly correct code path when num_sampled_k_points==0. All we should need is a triple for loop without the optimizations to limit the search space. Although, if the existing code is missing k-points, then the bug will still be present when num_sampled_k_points > 0.

DomFijan · 2025-02-27T00:20:32Z

Re num_sampled_k_points==0 : this sets q_prune_distance_sq = std::numeric_limits<float>::infinity(); which in turn makes the if condition for sampling points always true. So setting num_sampled_k_points==0 always emplaces back all the k_vec values, however it could still be true that these are not generated correctly - and thus making the whole num_sampled_k_points==0 path incorrect as you mentioned @joaander.

I used @joema10 script to test if turning off parallelization would show different results, but that doesn't seem to change results at all. I also played with number of bins and it seems that the brute force approach is very susceptible to bin width in this range as well (23 bins originally vs 48 bins plotted here).

One last thing I want to try is to plot the diffraction pattern. My thinking was that diffraction pattern is based on discrete FT and shouldn't suffer from any of the sampling shenanigans. I used a view orientation similar to what @janbridley posted and rotated the camera accordingly. One thing that I'm not 100% certain about is what correct orientation to pass to freud's compute - it seems that viewport API in ovito was changed recently and now does not provide viewMatrix, but rather only view orientation so I'm not 100% sure if did the rotation right. I'm not 100% sure if there is a peak at around k=1.25 or not, might have to play with this more.

I will try to implement the triple loop approach for num_sampled_k_points==0 as @joaander at some point this week to try and see if there are any issues with the sampling.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discrepancy between structure factors #1306

Discrepancy between structure factors #1306

joema10 commented Feb 4, 2025

DomFijan commented Feb 12, 2025

joema10 commented Feb 12, 2025

DomFijan commented Feb 15, 2025

joema10 commented Feb 15, 2025

janbridley commented Feb 20, 2025

joema10 commented Feb 20, 2025

joaander commented Feb 26, 2025

DomFijan commented Feb 27, 2025

Discrepancy between structure factors #1306

Discrepancy between structure factors #1306

Comments

joema10 commented Feb 4, 2025

DomFijan commented Feb 12, 2025

joema10 commented Feb 12, 2025

DomFijan commented Feb 15, 2025

joema10 commented Feb 15, 2025

janbridley commented Feb 20, 2025

joema10 commented Feb 20, 2025

joaander commented Feb 26, 2025

DomFijan commented Feb 27, 2025