Visualize connected voids in the unit cell

In the previous post, I have shown how to find the void in VMD. But if you want to visualize the connected voids, it will become a little complicated. At the beginning, I tried to use watershed algorithm to do that, during which I had an idea to do that in a much simpler way. I will introduce below.

  1. Use the following Tcl script to find all voids and put a Si atom in each void. And the end of the script, a XYZ file is generated to use Si to represent the voids in the unit cell.
  2. Use VMD to open the XYZ file.
  3. In the representation setting, you can select “fragment 0” to “fragment N” to see the connected voids. VMD can figure out the connected voids and treat them as different fragments. With Selection tool, you can even move the connected voids to anywhere you want for better representation.
  4. You can also use color method “fragment” to use different colors to highlight different fragments, but the default color scheme is not good.


#Change background color to white
color Display Background white

#set sphere radius
set probe 0.8

#set the box side-length, this is a cube
set length 39.91

#the side-length is divided into N grid, 50 is enough to visualize
set step [expr { $length/1.4 }]

#delete all graphics
draw delete all

#draw box, wrap the system
pbc box
pbc wrap
#draw color green
#recalculate the bond formation
mol bondsrecalc 0

set outfile [open "" w+]

set i 0
#find void
for {set x 0.5} {$x <= $length} { set x [expr {$x + $length/$step}]} {
        for {set y 0.5} {$y <= $length} { set y [expr {$y + $length/$step}]} {
                for {set z 0.5} {$z <= $length} { set z [expr {$z + $length/$step}]} {
                        set a [[atomselect top "sqrt( sqr(x-$x) +sqr(y-$y) + sqr(z-$z) )  < 2"] num]
                        if { $a == 0 } {
#                               draw sphere "$x $y $z" radius $probe
                                incr i
                                set X($i) $x; set Y($i) $y; set Z($i) $z


puts $outfile $i
puts $outfile "Connected Pores"
for {set m 1} {$m <= $i } {incr m} {
        puts $outfile "Si $X($m)  $Y($m)  $Z($m)"
close $outfile
mol new



Using VMD to show the VOID in a box

I am studying the porosity of polymers. The polymer was packed into a cubic box from an extended linear conformation. After that, we had a PDB file of the packed polymer. The following script in Tcl/Tk is used to show the void inside.

The length is the side-length of the box, the unit is angstrom. The step is how fine you want to do the ananlysis. After running this, all void will be filled by a sphere shape of radius 1 angstrom.

This script can give you a rough result. If you want more detailed results, you need to specify the radius of each element, which I chose 3 angstroms in the script.

set length 39.533
set step 50
draw delete all
pbc box
pbc wrap

for {set x 1} {$x <= $length} { set x [expr {$x + $length/$step}]} {
        for {set y 1} {$y <= $length} { set y [expr {$y + $length/$step}]} {
                for {set z 1} {$z <= $length} { set z [expr {$z + $length/$step}]} {
                        set a [[atomselect top "sqrt( sqr(x-$x) +sqr(y-$y) + sqr(z-$z) )  < 3"] num]
                        if { $a == 0 } { draw sphere "$x $y $z" radius 1}


Ten tips to give a great thesis defense

I am going to have my proposal defense on tomorrow morning. This is what I think is important guidance for the proposal defense.

The content is cited from


Planning Your Talk

1)      Know Your Audience
Everyone will tell you to know your audience, which couldn’t be truer when you’re planning the introduction to your talk. Sure, there is a big difference between talking to high school students and presenting at a conference, but try to think: who is coming to my talk? If they are all cellular biologists like you, then skip the central dogma slide. But if you have a mix of disciplines you need to be able to explain your work to a biologist, as well as an electrical engineer. Imagine you’re giving the talk to one person with each potential background. Would each person be able to follow it? Sometimes you need to sacrifice some specific details in order to explain the important stuff to everybody. (But you should be able to talk extemporaneously on the specifics if anyone asks!)

2)      Justify Yourself
An introduction is more than just a history of your field up until now. That is, it’s more than a literature review. You need to review the current literature, but more importantly put your research into context. What have you done (or what are you doing) that no one else has done? Keep in mind that just because no one else has done X doesn’t mean doing X is worthwhile- there might be a very good reason why no one else has done it! As you introduce your research you’ll likely explain why you’re doing it, but make sure you also explain why others in the field care. Even more important that justifying your work is justifying your conclusions. You MUST be able to back up any claims with solid references, or solid experimental results! In many cases this means statistical tests of quantitative data. When in doubt, err on the side of “inconclusive” or qualify/temper any of your statements rather than stretch your conclusions.

3)      Tell A Story
One of the most jarring moments in a bad presentation is the lack of transitions. Your presentation should flow from slide to slide and section to section. This will most likely mean that you aren’t going to present your experiments in the order that you did them. You’re NOT telling the story of you working in the lab! Think: what are the overall conclusions from your work and how can you explain and prove the things you’ve concluded? Walk your audience through the story, laying out the evidence convincing them you’re right about your conclusions. One last thing: you’ve (hopefully) done a lot of experiments, you’ve invested a lot of time, energy, and maybe even money into these experiments and you want to show off everything you’ve done. But if an experiment or data slide doesn’t fit in the “story” you might have to leave it out. If you can’t make it fit in the flow of your story and/or you don’t NEED it: leave it out.

4)      Sweat the Small Stuff
The little details are important. Even if you have some really great results to show, you’re going to anger, upset, or at least annoy your audience if you don’t pay attention to details. Some examples:

  • Label the axes of any graphs (with units), don’t use 10E3 mV (when V works) and don’t forget error bars!
  • Make sure any images have scale bars, and label items of interest. (You might know what’s a cell and what’s dust, but everyone else might not!) Use the same size, color, and font text.Try to use the same slide layout.
  • Make all your graphs, diagrams, molecular depictions, etc. with the same program throughout. It’s noticeable if you copied one molecule from a paper, made some in ChemDraw, and others with ChemSketch. The same holds true with graphs in Excel versus Origin.
  • Excel can be your friend but if you use the default graph settings it will be your downfall. Don’t leave on the gridlines or use the standard random colors. Oh, and look into Origin.

5)      Present in Bite Sized Slides
For each slide be sure to explain everything. Explain the x and y axes of your graph, explain what a large value indicates, and a low value indicates. Walk people through how you set up the experiment, how you collected the data, analyzed the results, and talk about the controls. Before moving to the next slide, restate the major finding or “take-away” from this slide. What did this experiment tell you, and what questions are still unanswered. This will help build in transitions as you tell your story. You probably know every piece of your presentation inside and out, but you need to remind your audience of salient points from earlier in the presentation.

Giving the Presentation

6)      Practice, Practice, Practice!
Even the most beautiful slides with the most logical flow and greatest data can trip you up if you don’t know what you’re going to say. It should go without saying that you can’t just read off of your slides, but seriously: practice, practice, practice! Run through it in your head, do in out loud and most importantly, do it in front of other people: schedule practice talks! In the days leading up to your presentation you should be able to run through the talk in your head without notes. As you’re walking the halls, driving, or cooking you should run through the talk over and over. The goal is that when you get up there on the big day, everything comes out naturally- almost second nature. For me, I need to write a script- I don’t memorize it word for word, but the act of writing what I want to say helps. Of course, if you’re a naturally gifted speaker and can give a talk on the fly you’re set- but you should still practice!

7)      Don’t wait until the last minute
The goal of practice talks is to get feedback from friends, lab mates, classmates in general, and hopefully your advisor. It does you little to no good if your practice talks are the day or two right before your talk. You need to give yourself time to integrate their changes into your presentation- both the slides and the talk. I like a formal practice talk the week, and two weeks before the talk. This gives you enough time to change slides, change what you might say, and change the written document (if applicable). If you give yourself enough time you might even be able to squeeze in an extra experiment before the big day to fill any “holes” in your story.

8)      Try out the room and equipment
Not all practice talks are created equal. Sure, you can run through the slides on your laptop in your advisor’s office but you really need to get up in front a group of people- preferably in the same room you’ll be giving your presentation. Not only do you get in the presentation mind set, but you get used to the space, you test the equipment and therefore minimize surprises on presentation day. For example, one talk I went to recently was marred by the screen flashing horizontal bars randomly- it was nearly seizure inducing. Finally, they borrowed someone else laptop but do you really want that stress on your big day? Dress rehearsals are your friend!

9)      Be comfortable with your knowledge
In many cases when you present your research you will be the most knowledgeable person in the room about your topic. Be comfortable with that, and confident that you know what you’re talking about. Professors and especially your thesis committee (whom probably know a decent amount about your topic) can smell fear like sharks find blood in the water. Don’t make it easier for them! Don’t let them know you’re nervous, or might not be sure about something. Confidence goes a long way, BUT don’t let it go too far. Don’t get cocky because nothing is more tantalizing that crushing an OVER confident student. Be confident, but not cocky.

10)   Be humble
You know your research, your techniques, your experiments, and your data. But you might get questions a little removed (or a lot removed) from your research. You might even get questions you don’t know the answer to, or aren’t sure about. The best advice I can give someone going into a defense- even last minute- is don’t be afraid to say “I don’t know.” Guessing, or even worse, making something up, is so much worse that admitting you don’t know the answer to a question. I’ve seen professors who will grill a student and not stop until they say “I don’t know” or they catch them answering wrong (guessing/making something up). You’ll never know everything about everything so don’t be afraid to say “I don’t know”. But it is inexcusable to guess or make up an answer- it will only get more painful from that point on. On the flip side, don’t answer every question with “I don’t know”- it’s not a get out of jail free card!

Appending a script after a running script.

This is a pretty cool function.

I use Gaussian all the time. Sometimes, I just run the command in the terminal at the frontground. But sometimes, I have another job that I want to run after the previous job. I don’t want to check the running condition all the time, and I don’t want to waste the computing resource. So the following is what we can do for this condition.

Got help from HPC manager David Chaffin and Pawel Wolinski, as well as the website at

If you have a script running in the terminal at the frontground.

Press ctrl+z

Type bg

Then the job will run at the background.

Use jobs to check the scripts running at the background. If is the first one, you may type the following command to append your will run right after the end of

wait %1; ./

You may kill jobs by kill %1 too.

The following is the content at I think it is very useful.


Bg, Fg, &, Ctrl-Z – 5 Examples to Manage Unix Background Jobs

When you execute a unix shell-script or command that takes a long time, you can run it as a background job.

In this article, let us review how to execute a job in the background, bring a job to the foreground, view all background jobs, and kill a background job.

1. Executing a background job

Appending an ampersand ( & ) to the command runs the job in the background.

For example, when you execute a find command that might take a lot time to execute, you can put it in the background as shown below. Following example finds all the files under root file system that changed within the last 24 hours.

# find / -ctime -1 > /tmp/changed-file-list.txt &

2. Sending the current foreground job to the background using CTRL-Z and bg command

You can send an already running foreground job to background as explained below:

  • Press ‘CTRL+Z’ which will suspend the current foreground job.
  • Execute bg to make that command to execute in background.

For example, if you’ve forgot to execute a job in a background, you don’t need to kill the current job and start a new background job. Instead, suspend the current job and put it in the background as shown below.

# find / -ctime -1 > /tmp/changed-file-list.txt

# [CTRL-Z]
[2]+  Stopped                 find / -ctime -1 > /tmp/changed-file-list.txt

# bg

3. View all the background jobs using jobs command

You can list out the background jobs with the command jobs. Sample output of jobs command is

# jobs
[1]   Running                 bash &
[2]-  Running                 evolution &
[3]+  Done                    nautilus .

4. Taking a job from the background to the foreground using fg command

You can bring a background job to the foreground using fg command. When executed without arguments, it will take the most recent background job to the foreground.

# fg

If you have multiple background ground jobs, and would want to bring a certain job to the foreground, execute jobs command which will show the job id and command.

In the following example, fg %1 will bring the job#1 (i.e to the foreground.

# jobs
[1]   Running                 bash &
[2]-  Running                 evolution &
[3]+  Done                    nautilus .

# fg %1

5. Kill a specific background job using kill %

If you want to kill a specific background job use, kill %job-number. For example, to kill the job 2 use

# kill %2

A beginner’s guide to Big O notation

Cited from

Got confused about this when setting MaxDisk in Gaussian 09.


A beginner’s guide to Big O notation

Big O notation is used in Computer Science to describe the performance or complexity of an algorithm. Big O specifically describes the worst-case scenario, and can be used to describe the execution time required or the space used (e.g. in memory or on disk) by an algorithm.

Anyone who’s read Programming Pearls or any other Computer Science books and doesn’t have a grounding in Mathematics will have hit a wall when they reached chapters that mention O(N log N) or other seemingly crazy syntax. Hopefully this article will help you gain an understanding of the basics of Big O and Logarithms.

As a programmer first and a mathematician second (or maybe third or fourth) I found the best way to understand Big O thoroughly was to produce some examples in code. So, below are some common orders of growth along with descriptions and examples where possible.


O(1) describes an algorithm that will always execute in the same time (or space) regardless of the size of the input data set.

bool IsFirstElementNull(IList<string> elements)
    return elements[0] == null;


O(N) describes an algorithm whose performance will grow linearly and in direct proportion to the size of the input data set. The example below also demonstrates how Big O favours the worst-case performance scenario; a matching string could be found during any iteration of the for loop and the function would return early, but Big O notation will always assume the upper limit where the algorithm will perform the maximum number of iterations.

bool ContainsValue(IList<string> elements, string value)
    foreach (var element in elements)
        if (element == value) return true;

    return false;


O(N2) represents an algorithm whose performance is directly proportional to the square of the size of the input data set. This is common with algorithms that involve nested iterations over the data set. Deeper nested iterations will result in O(N3), O(N4) etc.

bool ContainsDuplicates(IList<string> elements)
    for (var outer = 0; outer < elements.Count; outer++)
        for (var inner = 0; inner < elements.Count; inner++)
            // Don't compare with self
            if (outer == inner) continue;

            if (elements[outer] == elements[inner]) return true;

    return false;


O(2N) denotes an algorithm whose growth doubles with each additon to the input data set. The growth curve of an O(2N) function is exponential – starting off very shallow, then rising meteorically. An example of an O(2N) function is the recursive calculation of Fibonacci numbers:

int Fibonacci(int number)
    if (number <= 1) return number;

    return Fibonacci(number - 2) + Fibonacci(number - 1);


Logarithms are slightly trickier to explain so I’ll use a common example:

Binary search is a technique used to search sorted data sets. It works by selecting the middle element of the data set, essentially the median, and compares it against a target value. If the values match it will return success. If the target value is higher than the value of the probe element it will take the upper half of the data set and perform the same operation against it. Likewise, if the target value is lower than the value of the probe element it will perform the operation against the lower half. It will continue to halve the data set with each iteration until the value has been found or until it can no longer split the data set.

This type of algorithm is described as O(log N). The iterative halving of data sets described in the binary search example produces a growth curve that peaks at the beginning and slowly flattens out as the size of the data sets increase e.g. an input data set containing 10 items takes one second to complete, a data set containing 100 items takes two seconds, and a data set containing 1000 items will take three seconds. Doubling the size of the input data set has little effect on its growth as after a single iteration of the algorithm the data set will be halved and therefore on a par with an input data set half the size. This makes algorithms like binary search extremely efficient when dealing with large data sets.

This article only covers the very basics or Big O and logarithms. For a more in-depth explanation take a look at their respective Wikipedia entries: Big O Notation, Logarithms.

Steered MD simulation in AMBER using Torsion as the reaction coordination.

I need to generate a set of trajectory of the rotation of one dihedral in a polymer. AMBERTOOLS14 has a command “dihedralscan” to do that, but it only works for dihedrals of proteins. It’s not allowed to define your own dihedral. So I chose SMD to do this.

It’s not complicated to set up the system. The only confusing part is the value of coordinate of torsion is rad not degree.

The input file of NPT part is listed below in case you might need. The initial torsion was about -1rad. To rotate 360degree. I set the reaction coordinate change from the initial value to 5.3. The diehdral is on residue 10 ( atom  100, 196, 155, 175). The rest parts are restrained with 500kcal/mol/angstrom^2.

  imin = 0, irest = 1, ntx = 7,
  ntb = 2, pres0 = 1.0, ntp = 1,
  taup = 2.0,
  cut = 10.0, ntr = 1,
  ntc = 1, ntf = 1,
  tempi = 300.0, temp0 = 300.0,
  ntt = 3, gamma_ln = 1.0,
  nstlim = 100000, dt = 0.001,
  ntpr = 1000, ntwx = 1000, ntwr = 1000, ig = -1,
Weak Hold
RES 1 9
RES 11 20

 output_file = 'smd.txt'
 output_freq = 1000
  type = TORSION
  i = (100, 196, 155, 175)
  path = (X, 5.3) path_mode = LINES
  harm = (100.0)
 end variable
end ncsu_smd


DSSP refers to Dictionary of Secondary Structures of Proteins. It’s a rough method to analyze protein secondary structures along with time in MD simulations.

The following is an example. Cited from

There are many softwares or MD simulation package can do such an analysis. I prefer to use AMBERTOOLS.

The script is below.

parm test.prmtop
trajin test.mdcrd
secstruct :1-100 out dssp.gnu

The GNU file has some problem and you can not get a picture from it.

I did the following modification to generate PNG file or PDF file.

  1. At the end of GNU file change pause -1 to set output.
  2. Delete line 1 to 12. The error part is at line 5, which is the setting of cbtics.
  3. Change line 13 splot “-” with pm3d title “dssp.gnu” to splot “-” u ($1/10):2:3 w pm3d title “dssp.gnu”. You may change $1/10 according to your relationship between frame and time.

The head of GNU file can be changed to the following content. 

unset key
set grid
set terminal pngcairo
set size 0.96,1
set encoding iso_8859_1
set xtics nomirror
set ytics nomirror
set pm3d map corners2color c1
set cbrange [  -0.500:   7.500]
set cbtics    0.000,1,7
set palette maxcolors 8
set palette defined (0 "white",1 "#0000FF",4 "#00FF00", 5 "grey", 7 "#FF0000")
set cbtics("None"    0.000,"Para"    1.000,"Anti"    2.000,"3-10"    3.000,"Alpha"    4.000,"Pi"    5.000,"Turn"    6.000,"Bend"    7.000)
set xlabel "Time (ns)"
set ylabel "Residue Number"
set yrange [   0.000: 600]
set xrange [   0.000: 84]
set title "DSSP of this protein"
set output "dssp.png"