A beginner’s guide to Big O notation

Cited from https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-notation/

Got confused about this when setting MaxDisk in Gaussian 09.

 

A beginner’s guide to Big O notation

Big O notation is used in Computer Science to describe the performance or complexity of an algorithm. Big O specifically describes the worst-case scenario, and can be used to describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm.

Anyone who’s read Programming Pearls or any other Computer Science books and doesn’t have a grounding in Mathematics will have hit a wall when they reached chapters that mention O(N log N) or other seemingly crazy syntax. Hopefully this article will help you gain an understanding of the basics of Big O and Logarithms.

As a programmer first and a mathematician second (or maybe third or fourth) I found the best way to understand Big O thoroughly was to produce some examples in code. So, below are some common orders of growth along with descriptions and examples where possible.

O(1)

O(1) describes an algorithm that will always execute in the same time (or space) regardless of the size of the input data set.

bool IsFirstElementNull(IList<string> elements)
{
    return elements[0] == null;
}

O(N)

O(N) describes an algorithm whose performance will grow linearly and in direct proportion to the size of the input data set. The example below also demonstrates how Big O favours the worst-case performance scenario; a matching string could be found during any iteration of the for loop and the function would return early, but Big O notation will always assume the upper limit where the algorithm will perform the maximum number of iterations.

bool ContainsValue(IList<string> elements, string value)
{
    foreach (var element in elements)
    {
        if (element == value) return true;
    }

    return false;
}

O(N2)

O(N2) represents an algorithm whose performance is directly proportional to the square of the size of the input data set. This is common with algorithms that involve nested iterations over the data set. Deeper nested iterations will result in O(N3), O(N4) etc.

bool ContainsDuplicates(IList<string> elements)
{
    for (var outer = 0; outer < elements.Count; outer++)
    {
        for (var inner = 0; inner < elements.Count; inner++)
        {
            // Don't compare with self
            if (outer == inner) continue;

            if (elements[outer] == elements[inner]) return true;
        }
    }

    return false;
}

O(2N)

O(2N) denotes an algorithm whose growth doubles with each additon to the input data set. The growth curve of an O(2N) function is exponential – starting off very shallow, then rising meteorically. An example of an O(2N) function is the recursive calculation of Fibonacci numbers:

int Fibonacci(int number)
{
    if (number <= 1) return number;

    return Fibonacci(number - 2) + Fibonacci(number - 1);
}

Logarithms

Logarithms are slightly trickier to explain so I’ll use a common example:

Binary search is a technique used to search sorted data sets. It works by selecting the middle element of the data set, essentially the median, and compares it against a target value. If the values match it will return success. If the target value is higher than the value of the probe element it will take the upper half of the data set and perform the same operation against it. Likewise, if the target value is lower than the value of the probe element it will perform the operation against the lower half. It will continue to halve the data set with each iteration until the value has been found or until it can no longer split the data set.

This type of algorithm is described as O(log N). The iterative halving of data sets described in the binary search example produces a growth curve that peaks at the beginning and slowly flattens out as the size of the data sets increase e.g. an input data set containing 10 items takes one second to complete, a data set containing 100 items takes two seconds, and a data set containing 1000 items will take three seconds. Doubling the size of the input data set has little effect on its growth as after a single iteration of the algorithm the data set will be halved and therefore on a par with an input data set half the size. This makes algorithms like binary search extremely efficient when dealing with large data sets.

This article only covers the very basics or Big O and logarithms. For a more in-depth explanation take a look at their respective Wikipedia entries: Big O Notation, Logarithms.

Advertisements

Dummy atom in Gaussian

Recently, I am working on some Gaussian jobs. The error “angle Alpha is outside the valid range of 0 to 180” always makes me upset.

Learning by oneself is really low-efficient and time-consuming. : (

I got some help from internet. Dummy atom could solve this problem. Please read the article first to know dummy atom at http://www.cup.uni-muenchen.de/ch/compchem/geom/internal.html. Please pay attention to the example at the end of the paper.

The origin coordinate is

C1
C2  1  r2
H3  1  r3  2  a3
H4  1  r3  2  a3     3  120.
H5  1  r3  2  a3     3  -120.
N6  2  r6  1  180.0  3  180.0

r2=1.45
r3=1.1
r6=1.2
a3=110.

The coordinate with dummy atom is

C1
C2  1  r2
H3  1  r3  2  a3
H4  1  r3  2  a3  3  120.0
H5  1  r3  2  a3  3  -120.0
X6  2  1.0  1  90.0  3  0.0
N7  2  r7  6  90.0  1  180.0

r2=1.45
r3=1.1
r7=1.2
a3=110.

So the key to set dummy atom in your system is:

  1. Add it before the atom you want to modify. Here is N7;
  2. Divide the problem angle to half. Here is N7-C2-C1. X should have a bond with C2 and an angle with C1. Set the bond length to 1.0 and the angle to 90.0;
  3. Dihedral analysis. The original N7 has dihedral N6-C2-C1-H3. Put X in the plane of C2-C1-H3, so that you do not need to change the dihedral for the new N7. You may tell the old dihedral N6-C2-C1-H3 is the same with N7-C2-X6-C1. So X6-C2-C1-H3 is 0.
  4. Modify the N7 data. N7 still has a bond with C2. N7 form a new angle with X6, which is 90.0. And find a nearby atom, here is C1. Since the old dihedral N6-C2-C1-H3 is the same with N7-C2-X6-C1, you may set N7-C2-X6-C1 as original  N6-C2-C1-H3 with 180.0

Now, you may use it to optimize or scan operations.

 

Basic Concept

Cited from https://faculty.uscupstate.edu/llever/Computational%20Chemistry/ab%20initio.html.

 

Ab Initio Molecular Orbital Theory

To make a quantum mechanical model of the electronic structure of a molecule, we must solve the Schrödinger equation.


Solving this equation is a very difficult problem and cannot be done without making approximations. We have covered some of these approximations in the Semiempirical MO Theory handout. In this handout we focus on ab initio methods of solving the equation, in which no integrals are neglected in the course of the calculation.


The Born-Oppenheimer Approximation

The first approximation is known as the Born-Oppenheimer approximation, in which we take the positions of the nuclei to be fixed so that the internuclear distances are constant. Because nuclei are very heavy in comparison with electrons, to a good approximation we can think of the electrons moving in the field of fixed nuclei. We first choose a geometry (with fixed internuclear distances) for a molecule and solve the Schrödinger equation for that geometry. We then change the geometry slightly and solve the equation again. This continues until we find an optimum geometry with the lowest energy.


The Independent Electron Approximation

 

 

When more than one electron is present, the Schrödinger equation is impossible to solve because of the interelectron terms in the Hamiltonian. Consider, for instance, the Hamiltonian for the hydrogen molecule in the Born-Oppenheimer approximation.

 

 

 


The first two terms are due to the kinetic energy of the electrons. The last six terms express the potential energy of the system of four particles. The potential energy term due to the repulsion of the electrons makes the Schrödinger equation impossible to solve.

To produce a solvable Schrödinger equation we assume that the Hamiltonian is a sum of one-electron functions, fi, with an approximate potential energy that takes the average interaction of the electrons into account. This leads to a set of one-electron equations, called the Hartree-Fock equations, where is a one-electron wavefunction.

The total wavefunction that is a solution to the total Schrödinger equation, , is approximated as the product of the solutions to the one-electron equations.

This product must be adjusted to satisfy the Pauli Exclusion principle, but we won’t get into that here. If you are familiar with determinants, it involves writing the wavefunction as a determinant.


The Hartree-Fock Self-Consistent Field (SCF) Approximation

The question remains about the approximate potential energy in the one-electron functions that take the average interaction of the electrons into account. What is the form of the functions fi in the Hartree-Fock equations? The most common way of handling this is to define

where vi is an average potential energy due to the interaction of one electron with all the other electrons and nuclei in the molecule. The average potential depends on the orbitals, , of the other electrons, which means we must solve the Hartree-Fock equations iteratively.

The iterative solution of the Hartree-Fock equation is as follows.

1. Guess reasonable one-electron orbitals (wavefunctions), , and calculate the average potential energies, vi.

2. Using the variation principle, solve the Hartree-Fock equations,

to give new one-electron orbitals, . Use these new orbitals to calculate new and improved average potential energies, vi. Because the solution of the Hartree-Fock equations depends on the variation principle, the Hartree-Fock energy should be higher than the true energy.

3. Repeat the second step until the one-electron orbitals and potential energies don’t change (are self-consistent).


Restricted Hartree-Fock Calculations

 

To take the Pauli Principle into account, we must include electron spin in our wavefunctions. The orbitals that are calculated by the Hartree-Fock method actually are spin orbitals that are a product of a spatial wavefunction and a spin function.

In a spin orbital, is the spatial wavefunction describing the probability of finding the electron in space and or are spin wavefunctions.

For a closed shell system, in which all of the electrons are paired, during the solution of the self-consistent field equations, we can restrict the solution so that the spatial wavefunctions for paired electrons are the same. This is called a restricted Hartree-Fock (RHF) calculation and generally is used for molecules in which all the electrons are paired. When the spin functions are removed, we are left with a set of spatial orbitals, each occupied by two electrons.

 

An example would be the restricted Hartree-Fock solution to the Schrödinger equation for the hydrogen molecule, H2. This would lead to two spatial orbitals, one occupied by the pair of electrons and one unoccupied. The orbitals holding electrons are called occupied orbitals and the unoccupied orbitals are called virtual orbitals.

 

 

 

 

 

 

 

 

 


Unrestricted Hartree-Fock Calculations

 

For open shell systems that contain unpaired electrons, the assumption made in the restricted Hartree-Fock method obviously won’t work. There is more than one way of handling this type of problem. One way is tonot constrain pairs of electrons to occupy the same spatial orbital – the unrestricted Hartree-Fock (UHF) method. In this method there are two sets of spatial orbitals – those with spin up () electrons and those with spin down

() electrons. This leads to two sets of orbitals as pictured at the right and to a lower energy than if the restricted method were used.

 

 

 

 

 

 


Basis Sets

For molecular calculations, the Hartree-Fock SCF equations

still cannot be solved without one further approximation. To solve the equations, each SCF orbital,, is written as a linear combination of atomic orbitals. For instance, for the H2 molecule, the simplest approximation is to write each spatial SCF orbital as a combination of 1s atomic orbitals, each centered on one of the protons.

 


This reduces the problem to solving for the coefficients, c1 and c2, since the atomic orbitals do not change.

The set of atomic orbitals that is chosen to represent the SCF orbitals is called a basis set. The {1sA, 1sB} basis set shown above is a minimal basis set – the smallest set of orbitals possible that describe an SCF orbital. Usually, the quality of a basis set depends on its size. For instance, a larger basis set, such as {1sA, 1sB, 2sA, 2sB}would do a better job approximating the SCF orbital than {1sA, 1sB}.
For many-electron atoms, we don’t know the actual mathematical functions for the atomic orbitals, so substitutes are used – usually either Slater-type orbitals (STO) or Gaussian-type orbitals (GTO). We won’t concern ourselves with the exact form of STO and GTO. Suffice it to say that they are chosen to behave mathematically like the actual atomic orbitals: s-type, p-type, d-type, and f-type, for instance. A few commonly used basis sets are listed below. The symbol of the basis set is given in the left column and the characteristics of the basis set in the center. At the right is the basis set that would be used to represent methane. For instance, the STO-3G basis set for methane would be {1sH, 1sH, 1sH, 1sH, 1sC, 2sC, 2pxC, 2pyC, 2pzC}.

Basis Sets1

Characteristics

Basis Set Example (CH4)
STO-3G A minimal basis set (although not the smallest possible) using three GTOs to approximate each STO. This basis set should only be used for qualitative results on very large systems Each H: 1s

C: 1s, 2s, 2px, 2py, 2pz

3-21G Inner shell basis functions made of three GTOs. Valence s- and p-orbitals each represented by two basis functions (one made of two GTOs, the other of a single GTO). Use for very large molecules for which 6-31G is too expensive. Each H: 1s, 1s’

C: 1s, 2s, 2px, 2py, 2pz, 2s’, 2px‘, 2py‘, 2pz

6-31G(d)
(6-31G*)
Inner shell basis functions made of six GTOs. Valence s- and p- orbitals each represented by two basis functions (one made of three GTOs, the other of a single GTO). Adds six d-type basis functions to non-hydrogen atoms. This is a popular basis set that often is used for medium and large systems. Each H: 1s, 2s

C: 1s, 2s, 2px, 2py, 2pz, 2s’, 2px‘, 2py‘, 2pz‘, 3dx2, 3dy2, 3dz2, 3dxy, 3dxz, 3dyz

6-31G(d,p)
(6-31G**)
Like 6-31G(d) except p-type functions also are added for hydrogen atoms. Use when hydrogens are of interest and for final, accurate energy calculations. Each H: 1s, 2s, 2px, 2py, 2pz

C: 1s, 2s, 2px, 2py, 2pz, 2s’, 2px‘, 2py‘, 2pz‘, 3dx2, 3dy2, 3dz2, 3dxy, 3dxz, 3dyz

Generally, the larger the basis set the more accurate the calculation (within limits) and the more computer time that is required. As an example, consider the calculation of the bond length of H-F using different basis sets, as shown below. 1

 

Basis Set Bond Length (Å) | Error (Å) |
6-31G(d)

0.93497

0.017

6-31G(d,p)

0.92099

0.003

6-31+G(d,p)

0.94208

0.025

6-31++G(d,p)

0.92643

0.009

6-311G(d,p)

0.91312

0.004

6-311++G(d,p)

0.91720

0.000

Experimental

0.917

You might notice that although the large basis set, 6-311++G(d,p), predicts the correct answer to within 0.001 Å, several others are correct to within 0.01 Å (well within the criteria of chemical accuracy). Although a larger basis set usually gives better results, you often have diminishing returns as you choose larger sets. A point may be reached beyond which the additional computer time is not worth it.


Post-SCF Calculations

Even with a very large basis set calculation, Hartree-Fock results are not exact because they rely on the independent electron approximation. Hartree-Fock SCF Theory is a good base-level theory that is reasonably good at computing the structures and vibrational frequencies of stable molecules and some transition states2. Electrons are not independent, though. We say that they are correlated with each other and that the Hartree-Fock method neglects electron correlation. This means that Hartree-Fock calculations do not do a good job modeling the energetics of reactions or bond dissociation. There are several ways of correcting SCF results to take electron correlation into account.

One method of taking electron correlation into account is Møller-Plesset many-body perturbation theory, which is used after a RHF or UHF calculation has been made. It is assumed that the relationship between the exact and Hartree-Fock Hamiltonians is expressed by an additional term, H(1), so that H = fi + H(1). Calculations based on this assumption lead to corrections that can improve SCF results. Various levels of perturbation theory can be applied to the problem. They are calledMP2, MP3, MP4, etc. MP2 calculations are not time-consuming and usually give quite accurate geometries and about one-half of the correlation energy. Because perturbation theory is not based on the variation principle, the energy predicted by MP calculations can fall below the actual energy.

Another important method of correcting for the correlation energy is configuration interaction (CI). Conceptually we can think of CI calculations as using the variation principle to combine various SCF excited states with the SCF ground state, which lowers its energy. We won’t use CI calculations in our exercises at this level.


SCF Molecular Orbitals

When calculating molecular orbitals, you should remember that molecular orbitals are not real physical quantities. Orbitals are a mathematical convenience that help us think about bonding and reactivity, but they are not physical observables. In fact, several different sets of molecular orbitals can lead to the same energy. Nevertheless, they are quite useful. We will use ethylene as an example to illustrate MO concepts.


The basis functions in SCF molecular orbitals are like atomic orbitals. A RHF/6-31G(d) calculation on ethylene uses 38 basis functions (15 for each carbon and 2 for each hydrogen). Since the molecular orbital wavefunction is expanded in terms of the all the basis functions,

it might seem that constructing a picture of the orbital would be difficult. Luckily, most of the coefficients are zero, so the molecular orbitals are easy to picture. Consider, for instance, the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) of ethylene.

HOMO:


The HOMO is a bonding-orbital.

 

 


LUMO:

The LUMO is an antibonding -orbital.

 

 


Scaling Vibrational Frequencies

In the last part of the job output from a frequency calculation you will find the predicted vibrational frequencies (cm-1) of the normal modes of the molecule. Also supplied are the predicted intensities of the IR and Raman bands corresponding to these normal modes.

 

                  1             2             3   
                  B1            B2            A1 
Frequencies -- 1335.5948     1383.4094    1679.4157 
 
 
                  4             5             6 
                  A1            A1            B2 
Frequencies -- 2027.8231     3160.8817    3232.9970

 

Computational results usually have systematic errors. In the case of Hartree-Fock level calculations, for instance, it is known that calculated frequency values are almost always too high by 10% – 12%. To compensate for this systematic error, it is usual to multiply frequencies predicted at the HF/6-31G(d) level by an empirical factor of 0.893. Similarly, frequencies calculated at the MP2/6-31G(d) level are scaled by 0.943. 1

 

The predicted frequencies after applying the 0.893 scale factor are listed below.

 

                       1             2             3 
                       B1            B2            A1 
Scaled Frequencies -- 1193          1235          1450 
 
 
                       4             5             6 
                       A1            A1            B2 
Scaled Frequencies -- 1811          2822          2887

 

 

 


References

1J. B. Foresman and Æ. Frisch, Exploring Chemistry with Electronic Structure Methods, Gaussian, Pittsburgh, 1995-96, p 102.

2J. B. Foresman and Æ. Frisch, Exploring Chemistry with Electronic Structure Methods, Gaussian, Pittsburgh, 1996, p 115.


Converse/USCS/Wofford
Physical Chemistry Consortium

Using Gaussian Checkpoint Files

I am always confused about how to use chk file in Gaussian09. The following is a good instruction for it. Cited from http://www.ccl.net/cca/documents/dyoung/topics-orig/checkpoint.html

Using Gaussian Checkpoint Files

David Young
Cytoclonal Pharmaceutics Inc.The Gaussian computational chemistry program allows the results of a calculation to be saved in a machine readable file, called a checkpoint file. The primary use of a checkpoint file is to use the results of one calculation as the starting point for a second calculation.

When a calculation is started using information from a checkpoint file, the calculation results will be placed in the exact same checkpoint file, overwriting the original checkpoint file. Thus it is always a good idea to make a backup copy of the checkpoint file. Note that it is possible for a checkpoint file to become corrupted (i.e. if a calculation dies while writing to the checkpoint file).

Gaussian will use a checkpoint file if the command

	%Chk=file_name

appears before the route card in the input file. If the specified file does not exist, it will be created. If the specified file does exist, information to be used in the present calculation can be read from it.

Commands for reading from the checkpoint file

A calculation can be started using information from the checkpoint file by including one of the following commands in the route card.

ChkBasis
Read the basis set from the checkpoint file.
SCF=Restart
Restart an SCF calculation from the checkpoint file. This is normally used when an SCF calculation failed finish for some reason.
IRC=Restart
Restarts an IRC calculation that did not complete, or restarts an IRC calculation for which additional points along the reaction path are desired.
Scan=Restart
Restarts a potential energy surface scan which did not complete.
Freq=Restart
Restarts a numerical frequency calculation which did not complete. Analytic frequency calculations cannot be restarted.
Polar=Restart
Restarts a numerical polarizability calculation which did not complete.
CIS=Restart
Restarts a CIS (Configuration Interaction – Single excitation) calculation which did not complete.
Opt=Restart
Restarts an geometry optimization which did not complete.
Geom=Checkpoint
Reads the molecular geometry from the checkpoint file.
Geom=AllCheckpoint
Reads the molecular geometry, charge, multiplicity and title from the checkpoint file. This is often used to start a second calculation at a different level of theory.
Guess=Read
Reads the initial guess from the checkpoint file. If the basis set specified is different from the basis set used in the job which generated the checkpoint file, then the wave function will be projected from one basis to the other. This is an efficient way to switch from one basis to another.
Density=Checkpoint
Reads the density from the checkpoint file. This inplies Guess=Only so that no integrals or SCF are computed. This is used to compute the population analysis or create cube files from a wave function without rerunning the job.
Field=Checkpoint
Reads the 34 multipole components of a finite field from the checkpoint file.
Field=EChk
Reads the 3 electric dipole field components from the checkpoint file.
Charge=Check
Reads point charges from the checkpoint file.
ReArchive
This option is used to generate an archive entry from the information in the checkpoint file. No calculation is run.

Checkpoint file Utilities

The c8694 utility can be used to convert a checkpoint file from an old format (Gaussian 86 through Gaussian 92/DFT) to the Gaussian 94 format. It is given a single argument, which is the name of the checkpoint file. The file is overwritten by a new file with the same name.

chkchk is a utility provided with the Gaussian distribution. It extracts the title and route sections from a checkpoint file. The command usage is

	chkchk file_name

where file_name.chk is the name of a checkpoint file in the current directory.

chkmove is another utility provided with Gaussian. It is used to move checkpoint files between machines of different architecture. Using ftp in binary mode to transfer a checkpoint file between machines of different architecture will not result in having a readable file on the new machine. In order to transfer a file between machines of dissimilar architecture, the steps that must be followed are

  1. Convert the checkpoint file from an unformated file to a formated file with the command
    		chkmove f file_name.chk file_name.xfr
    
  2. Transfer the formatted file to the new machine using the unix rcp command or ftp in ASCII mode.
  3. On the new machine, convert the file back into an unformated checkpoint file with the command
    		chkmove u file_name.xfr file_name.chk
    

The cubegen utility can be used to generate cube files from a checkpoint file. This gives the same end result as using the cube keyword in the route card. cubegen is explained in the “Gaussian User’s Reference”.

The formchk utility can be used to generate a formated (ASCII) version of the checkpoint file. formchk is used with the command line

	formchk file_name.chk file_name.fchk

where the first file name is the binary checkpoint file and the second is the name of the file to create. Using the formchk utility after the completion of a job does not allow as much flexibility of output format as is available by using the FormCheck keyword when the job is run.

The freqchk utility can be used to extract vibrational frequency and thermochemistry information from a checkpoint file. It can also be used to create files which can be read by HyperChem and used to animate the vibrational modes.

Creating an ASCII checkpoint file

The FormCheck keyword can be used to request that an ASCII version of the checkpoint file be created. It is always given the name Test.FChk and placed in the current directory. This file is not readable by the Gaussian program. FChk and FCheck are synonyms for FormCheck. The FormCheck command has a number of options which are explained fully in the “Gaussian User’s Guide”.

Further Information

An introductory description of how to use Gaussian is in
J. B. Foresman, A. Frisch “Exploring Chemistry with Electronic Structure Methods: A Guide to Using Gaussian” Gaussian (1993)

Commands are described in detail in
M. J. Frisch, A. Frisch, J. B. Foresman “Gaussian 94 User’s Guide” Gaussian (1996)

Return to table of contents.

Improved BASH script to extract stationary structure from Gaussian09 scan log file.

I have published a blog before to do this at https://sunxiaoquan.wordpress.com/2015/02/06/bash-script-to-extract-structure-at-stationary-point-from-gaussian09-scan-log-file/. Recently, I did a lot scan for a ring structure. And for some scan, Gaussian tried thousand times to find the stationary points. The previous script is too slow to extract structure from huge log file (>100M). So I improved it, the speed can not compare with perl or python. But it’s enough for daily use.

 

 

read -p "Gaussian log file " log
((atom=`grep "NAtoms=" $log | head -1 |cut -f2 -d"N" | cut -f2- -d" "`))
head=`cat $log | head -3 | tail -1 | cut -f1 -d"." | cut -f2 -d"="`
((m=`grep "Stationary point found" $log | wc -l`)) #get the total number of statoinary point step
((n=`grep "Z-Matrix orientation" $log | wc -l`)) #get the total number of scan step

a=(`grep -n "Stationary point found" $log | cut -f1 -d":"`)
b=(`grep -n "Z-Matrix orientation" $log | cut -f1 -d":"`)
((p=0))
for ((i=0;i<${#a[*]};i++));do
#echo i $i p $p
        for ((j=$p;j<${#b[*]};j++));do
                if (( ${b[$j]} > ${a[$i]} )); then
                        ((jj=$j-1))
                        (( c= ${b[$jj]} + 4 + $atom ))
                        ((q=$i+1))
                        echo $atom > "$head"_stationary_point"$q".xyz
                        echo >> "$head"_stationary_point"$q".xyz
                        cat $log | head -"$c" | tail -"$atom" | cut -c 15-20,31- >> "$head"_stationary_point"$q".xyz
                        ((p=$j))
                        break;
                fi
        done
done

mkdir "$head"_stationary_point
mv "$head"_stationary_point*.xyz ./"$head"_stationary_point
cd "$head"_stationary_point
for ((i=1;i<=$m;i++));do
        if (( $i == 1 ));then
        echo mol new "$head"_stationary_point"$i".xyz > combine.tcl
        else
        echo mol addfile "$head"_stationary_point"$i".xyz >> combine.tcl
        fi
done
unset a b

Generate Z-matrix coordinate file.

When I use Gaussian to scan bond, angle or dihedral, I need to use Z-matrix coordinate. I have two ways to do this.

  1. Use AMBERTOOLS with command
    antechamber -i 1.pdb -fi pdb -o 1.gzmat -fo gzmat
  2. Use Avogadro to open the structure file. Choose Extensions–Gaussian. and choose “z-matrix” in format option.

I have checked the output of the two method. Only numbering of atoms are the same, the values of bond, angle or dihedral have a very very slight difference that I think could be ignored.

BASH script to extract structure at stationary point from Gaussian09 scan log file.

I am using Gaussian 09 to do dihedral, bond and angle scan now. My lab does not have Gaussian View, and sometimes I need to show the scan procedure or analyze them. QMTool in VMD shows all the structure rather than the stationary structures. So I write this script with BASH to do this job. The script does not have high efficiency but enough for daily demand.

When I paste the script here, it lost the format 😦     But it works well.

1.Source ./extract_structure.sh.

2.Type the name of log file with the extension, eg Scan.log.

3.The extracted structures (xyz format) will be stored in a folder with the same name of the log file, eg Scan.

4.A VMD Tcl/Tk script will generate in the folder. The name is combine.tcl.

5.Open VMD and its Tk console and type source combine.tcl. Or simply type vmd -e combine.tcl. VMD will load all the structure as a trajectory. And you may analyze them now.

########################################################

read -p "Gaussian log file " log
((atom=`grep "NAtoms=" $log | head -1 |cut -f2 -d"N" | cut -f2- -d" "`))
head=`cat $log | head -3 | tail -1 | cut -f1 -d"." | cut -f2 -d"="`
((m=`grep "Stationary point found" $log | wc -l`)) #get the total number of stationary point step
((n=`grep "Z-Matrix orientation" $log | wc -l`)) #get the total number of scan step

for ((i=1;i<=$m;i++));do
    ((a=`grep "Stationary point found" $log -n | cut -f1 -d":" | head -"$i" | tail -1`));#echo a=$a
    for ((j=$n;j>=1;j--));do
    ((b=`grep "Z-Matrix orientation" $log -n | cut -f1 -d":" | head -"$j" | tail -1`))
        if (( $b < $a ));then
            (( c = b + 4 + $atom ));#echo b=$b c=$c
            echo $atom > "$head"_stationary_point"$i".xyz
            echo >> "$head"_stationary_point"$i".xyz
            cat $log | head -"$c" | tail -"$atom" | cut -c 15-20,31- >> "$head"_stationary_point"$i".xyz
            break
        fi
    done
done
mkdir "$head"_stationary_point
mv "$head"_stationary_point*.xyz ./"$head"_stationary_point
cd "$head"_stationary_point
for ((i=1;i<=$m;i++));do
    if (( $i == 1 ));then
        echo mol new "$head"_stationary_point"$i".xyz > combine.tcl
    else
        echo mol addfile "$head"_stationary_point"$i".xyz >> combine.tcl
    fi
done

########################################################