MULTIPOLE | All ERIs are recalculated at each SCF iteration. For long-range ERIs the double asymptotic expansion is used. This is the default.
CONVENTIONAL | All ERIs are calculated at the beginning of the SCF procedure and stored.
DIRECT | All ERIs are recalculated at each SCF iteration.
MIXED | Short-range ERIs are calculated at the beginning of the SCF procedure and stored in RAM. Long-range ERIs are recalculated in each SCF iteration employing the double asymptotic expansion.
RAM=Real | Specifies the RAM per core [in MB] usable in the calculation.
TOL=Real | Threshold for ERI screening.
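As an illustration of how the options above combine, the following minimal sketch assembles an ERIS keyword line. The helper `eris_line` is hypothetical and not part of deMon2k. The layout of the line (keyword followed by blank-separated options) and the number formatting are assumptions and should be checked against the keyword syntax description of this guide.

```python
# Hypothetical helper (not part of deMon2k): builds an ERIS keyword line
# from the options listed in the table above.
VALID_MODES = ("MULTIPOLE", "CONVENTIONAL", "DIRECT", "MIXED")


def eris_line(mode="MULTIPOLE", ram_mb=None, tol=None):
    """Return an ERIS line; ram_mb is RAM per core in MB, tol the screening threshold."""
    mode = mode.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown ERIS option: {mode}")
    parts = ["ERIS", mode]
    if ram_mb is not None:
        if ram_mb <= 0:
            raise ValueError("RAM per core must be positive")
        parts.append(f"RAM={ram_mb:g}")  # assumed spelling: RAM=<real>
    if tol is not None:
        if tol <= 0:
            raise ValueError("TOL must be positive")
        parts.append(f"TOL={tol:g}")  # assumed spelling: TOL=<real>
    return " ".join(parts)


if __name__ == "__main__":
    print(eris_line("MIXED", ram_mb=4096))
```

Calling `eris_line()` without arguments yields the default MULTIPOLE setting.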
Whereas the MULTIPOLE option is the method of choice for calculations that are memory bound (more than 25000 basis functions), the MIXED option can be computationally beneficial for smaller systems, particularly in parallel runs with many SCF iterations. With this option the short-range ERIs are calculated only once, before the SCF procedure, and stored in RAM. The long-range ERIs are recalculated in each SCF iteration employing the double asymptotic expansion. In parallel runs the ERI storage is distributed over all cores that have free RAM space, i.e. that are not involved in the allocation of SCF matrices. Table 10 shows average timings per SCF cycle for PBE/DZVP/GEN-A2 calculations of n-alkane chains with the DIRECT, MULTIPOLE and MIXED options of the ERIS keyword. Also listed are the number of basis functions, the number of auxiliary functions, and the number of SCF cycles until convergence is reached. Note that the number of SCF cycles was the same for all options and that the converged energies were identical to a.u. or better. All calculations were performed in parallel on a single compute node with 2 octo-core Intel Xeon E5-2650v2 CPUs @ 2.60 GHz and a total of 64 GB RAM.
Table 10 shows that the ERIS option choice makes a noticeable difference for smaller systems where the linear algebra tasks are still not dominant. For such systems the MIXED option is advisable, particularly for Born-Oppenheimer molecular dynamics simulations (see Section 4.7).
To monitor the RAM space for the ERIS MIXED option a RAM allocation table can be printed with PRINT RAM (see also 4.12.2). The following example shows such a table for a CH calculation with the aug-cc-pVQZ basis (5270 basis functions) and GEN-A2 auxiliary function set employing 32 cores with 4 GB RAM each.
    *** ERI STATISTIC ***

    Est. Integrated ERIs: 8486773340
    Est. Asymptotic ERIS:   96798170

    *** RAM Allocation ***

    Program part    Size in MBytes
    SCF Kernel             905.121
    ERI Kernel               4.416
    DAE Kernel               1.152
    FIT Kernel               0.613
    LAG Kernel             225.340
    Max RAM               4096.000
    Max SHM              65536.000

    Integrated ERIs: Incore on 31 CPUs
    Asymptotic ERIs: Direct SCF method

    *** Incore ERI Storage ***

                         Sizes [MBytes]
    #CPU      #ERIs   ERI Vector   Max. Size
       0  SCF kernel allocation
       1  277791345     2119.380    3614.660
       2  273321035     2085.274    3614.660
       3  270188345     2061.373    3614.660
       4  270513580     2063.855    3614.660
       5  270640015     2064.819    3614.660
       6  271920380     2074.588    3614.660
       7  273630545     2087.635    3614.660
       8  278321900     2123.428    3614.660
       9  277542740     2117.483    3614.660
      10  279195440     2130.092    3614.660
      11  278907830     2127.898    3614.660
      12  274740355     2096.103    3614.660
      13  276984955     2113.228    3614.660
      14  276904470     2112.613    3614.660
      15  274954730     2097.738    3614.660
      16  273873285     2089.487    3614.660
      17  274457520     2093.945    3614.660
      18  272614335     2079.882    3614.660
      19  270426255     2063.189    3614.660
      20  270650360     2064.898    3614.660
      21  270687890     2065.185    3614.660
      22  270468465     2063.511    3614.660
      23  271087330     2068.232    3614.660
      24  273499175     2086.633    3614.660
      25  274764620     2096.288    3614.660
      26  273490215     2086.565    3614.660
      27  271100105     2068.330    3614.660
      28  274784945     2096.443    3614.660
      29  272456980     2078.682    3614.660
      30  272598035     2079.758    3614.660
      31  274256160     2092.408    3614.660
The ERI STATISTIC lists the estimated number of ERIs calculated by recurrence relations (Est. Integrated ERIs) and by the double asymptotic expansion (Est. Asymptotic ERIS). These numbers are estimates because screening due to density matrix elements or fitting coefficients is not included.

The RAM Allocation table that follows lists the RAM sizes required for the individual calculation tasks: self-consistent field (SCF) iteration, near-field ERI recurrence relations (ERI), double asymptotic expansion (DAE), density fitting (FIT) and linear algebra (LAG) operations. The next two lines, Max RAM and Max SHM, print the maximum RAM per CPU (in this case set by the MAXRAM parameter; see Table 1) and the maximum shared memory size available to the SCF matrices. Note that the maximum shared memory size for the SCF matrices is 16 times the maximum RAM per CPU because there are 16 CPUs on each board of the cluster used here.

The remaining output is specific to ERIS MIXED. It states that the near-field ERIs (Integrated ERIs) are held in-core on 31 CPUs and that the double-asymptotically-expanded ERIs (Asymptotic ERIs) are calculated according to the direct SCF method, i.e. they are calculated twice in each SCF iteration (once for the Kohn-Sham matrix and once for the Coulomb vector). As the Incore ERI Storage table shows, no ERIs are stored on CPU 0 because its RAM is used for the storage of the SCF matrices. On each of the other 31 CPUs slightly more than 2 GB are used for ERI storage.

With the ERIS MIXED option the computational time for ERI calculation can be reduced to below 10% of the total computational time [188]. It is therefore advisable to explore whether the ERIS MIXED option can be used for the application at hand. Note that in the case of a serial run with only near-field ERIs the ERIS MIXED option is equivalent to an in-core SCF.
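The numbers in the RAM allocation output can be cross-checked with a few lines of arithmetic. The sketch below assumes that each ERI is stored as an 8-byte double precision number and that 1 MByte equals 2^20 bytes. Neither is stated in the output itself, but the listed ERI vector sizes are consistent with both assumptions. The function name `eri_vector_mb` is chosen for illustration only.

```python
# Cross-check of the RAM allocation table (illustrative, not deMon2k code).
BYTES_PER_ERI = 8      # assumption: double precision storage
BYTES_PER_MB = 2**20   # assumption: 1 MByte = 2**20 bytes


def eri_vector_mb(n_eris):
    """Size in MBytes of a vector holding n_eris stored ERIs."""
    return n_eris * BYTES_PER_ERI / BYTES_PER_MB


# (CPU, #ERIs, listed ERI vector size) copied from the first table rows
rows = [
    (1, 277791345, 2119.380),
    (2, 273321035, 2085.274),
    (3, 270188345, 2061.373),
]

# Max SHM = Max RAM per CPU times the 16 CPUs per board
max_ram_mb = 4096.0
cpus_per_board = 16
max_shm_mb = max_ram_mb * cpus_per_board

if __name__ == "__main__":
    for cpu, n_eris, listed in rows:
        print(f"CPU {cpu}: computed {eri_vector_mb(n_eris):.3f} MB, listed {listed:.3f} MB")
    print(f"Max SHM: {max_shm_mb:.3f} MB")
```

The same function gives a quick estimate of whether a given number of near-field ERIs fits into the RAM available per core.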
The CONVENTIONAL option of the ERIS keyword calculates the three-center ERIs before the SCF procedure and, if possible, stores them in RAM. This so-called in-core method is fast as long as all integrals fit into the RAM. The available RAM size (as distinct from system RAM size) is set by deMon2k with the MAXRAM parameter (see Table 1) or the RAM option of the ERIS keyword. If the RAM space is not sufficient, deMon2k will write all ERIs to the scratch file ioeri.scr. As a result, the ERIs must be read from disk at each SCF step. For larger systems this disk I/O becomes the bottleneck of the calculation. Note that the printing of the ERIs in the deMon.out file enabled by PRINT ERIS (see 4.12.2) requires the ERIS CONVENTIONAL option. The same holds for PRINT DEBUG which includes PRINT ERIS.
With the RAM option of the ERIS keyword the allocatable RAM size per core can be defined in the deMon input file. This overrides the MAXRAM definition in the parameter.h file. Note that a RAM size definition larger than the available physical memory will result in large paging overhead during program execution.
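Before setting the RAM option it can help to check the requested size against the physical memory of the node. The following sketch uses hypothetical helper names. The optional `reserve_mb` argument (memory left for the operating system and other processes) is an assumption, not a deMon2k parameter.

```python
# Illustrative RAM budget check (helper names are hypothetical).
def max_ram_per_core(physical_mb, n_cores, reserve_mb=0.0):
    """Largest RAM per core in MB that fits into the node's physical memory."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return (physical_mb - reserve_mb) / n_cores


def fits_in_memory(ram_mb, physical_mb, n_cores, reserve_mb=0.0):
    """True if n_cores processes with ram_mb each fit without paging."""
    return ram_mb * n_cores <= physical_mb - reserve_mb


if __name__ == "__main__":
    # Node with 16 cores and 64 GB RAM, taken here as 64 * 1024 MB
    physical_mb = 64 * 1024
    print(max_ram_per_core(physical_mb, 16))      # upper bound per core
    print(fits_in_memory(4096, physical_mb, 16))  # exactly fills the node
    print(fits_in_memory(8192, physical_mb, 16))  # would cause paging
```

In practice a value somewhat below the upper bound leaves room for the operating system.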
Screening of the ERIs can be controlled with the TOL option. The threshold is calculated as:
(18)