Find hydrogen bond pair residues at each time range

Recently, I am learning python. BASH is too slow to deal with big data. Python is amazing. Five hours BASH script can be down in two minitues by Python.

The following script is still BASH. Because the running time is only several seconds, I didn’t use Python. Previously, I introduced a way to find residues on receptor to demonstrate the averaged hydrogen bonds at each time range at https://sunxiaoquan.wordpress.com/2015/03/09/count-hydrogen-bond-number-of-all-resides-on-receptor-at-each-time-range/ and https://sunxiaoquan.wordpress.com/2015/03/15/find-residues-of-hydrogen-bonds-per-picosecond-at-different-time-period/.

Today, I am going to introduce how to find the two residue forming hydrogen bonds and see how the averaged hydrogen bonds changed over time. This would be helpful for analyzing the interaction motif.

The strategy is

1. Get the hydrogen bond occupancy during the whole simulation. VMD gives very tricky result, a residue could be both donor or acceptor, I rearrange the donor and acceptor. The ligand residue is put ahead and residue in receptor is after that. And then removed the repeated terms.

2. Used the pair got from step1 and the hydrogen bond occupancy at each time range to find the corresponding occupancy.

3. Find the high occupancy pair to show the distribution.

The BASH code is below.

a=(`sed -n 3,"\$"p wholesimulation_hbond_details.txt | cut -f1 -d"-"`)
b=(`sed -n 3,"\$"p wholesimulation_hbond_details.txt | cut -f2 -d"-" | awk '{print $2}'`)

for ((i=0;i<${#a[*]};i++));do
        ((temp=`echo ${a[$i]} | cut -c 4-`))
        if (( $temp < 432 ));then
                a1[$i]=${b[$i]}; b1[$i]=${a[$i]}
        else
                a1[$i]=${a[$i]}; b1[$i]=${b[$i]}
        fi
done
echo ${a1[*]} | tr ' ' '\n' > temp_a1
echo ${b1[*]} | tr ' ' '\n' > temp_b1
paste temp_a1 temp_b1 | sort | uniq -c | sort -nr -k1> pair_uniq.dat
paste temp_a1 temp_b1 | sort | uniq > temp_ab
a=(`cat temp_ab | awk '{print $1}'`)
b=(`cat temp_ab | awk '{print $2}'`)
unset a1 b1

rm -f pair_*ns.txt
paste -d + temp_a1 temp_b1 | sort | uniq > pair_0ns.txt
for i in `ls -lrtx1 *_*_hbond_details.txt`;do
        time=`echo $i | cut -f2 -d"_"`
        for ((j=0;j<${#a[*]};j++));do
                temp=`grep -E "${a[$j]}.*${b[$j]}" $i | awk '{print $3}' | cut -f1 -d"%"`
                if [ -z "$temp" ]; then temp=0; fi
                sum=0
                for k in $temp;do
                        sum=`echo "$sum + $k" | bc -l`
                done
                sum=`echo "$sum/100" | bc -l | awk '{printf "%f", $0}' | cut -c -5`
                echo $sum >> pair_${time}ns.txt
        done
done

i=`ls -lrtx1 pair_*ns.txt`
paste $i > pair_total.dat
rm -r pair_*ns.txt temp_*
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s