NetEase Flash Mail (网易闪电邮) + Hotmail

Open NetEase Flash Mail,

(If port 25 does not work, use 587 instead.)

Convert AVI to GIF.

A colleague asked me for this. Sometimes the computer you present on lacks the video codecs needed to play your file, and that makes for a miserable presentation. A GIF plays on practically any computer, so we convert the video to GIF. I tested this on my iMac.

ffmpeg -i video.avi video.gif

Convert Multipage PDF to TIFF.

I work with many figures, and I have found a good way to generate high-quality TIFF figures.

1. Generate PDF figures with gnuplot.
2. Convert the PDF figures to TIFF at 300 DPI.

For a one-page PDF, I use “convert -density 300 image.pdf image.tiff”.

For a multipage PDF, I use “convert -density 300 image.pdf %02d.tiff”, which writes one numbered TIFF file per page.

I do this on an iMac with ImageMagick.

PBC problem in Steered MD simulation

I am using the distance between two atoms as the collective variable (CV) in a steered MD (SMD) simulation that forces a linear molecule into a ring, in order to calculate the tension on the ring. Whether or not this method is sound, I ran into a problem.

Part of the SMD input is below.

```Hold one end fixed
100.0
ATOM 21
END
END

ncsu_smd
output_file = 'md2.smd'
output_freq = 1000
variable
type = DISTANCE
i = (21, 1458)
path = (X, 3.0) path_mode = LINES
harm = (100.0)
end variable
end ncsu_smd
```

Atom 21 was restrained and atom 1458 was free to move. The final distance was set to 3 Angstrom.

I measured the initial distance in VMD, and it was around 122 Angstrom. But the output reported 87 Angstrom as the initial distance.

I debugged for quite a while. Then, while I was having lunch (a very delicious Chinese braised pork belly ; ), I realized that this must be caused by periodic boundary conditions (PBC).

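My initial coordinates came from a well-equilibrated system, so I dropped the constant-pressure (NPT) settings.

In detail, I changed ntb = 2, pres0 = 1.0, ntp = 1 to ntb = 0, igb = 0. Strictly speaking this is not NVT: in AMBER, ntb = 0 switches the periodic box off altogether (periodic constant volume would be ntb = 1), so no periodic image can enter the distance.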

Then the output was normal.
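
To illustrate how PBC can shorten a reported distance, here is a minimal NumPy sketch of the minimum-image convention. It assumes a cubic box, and the 209 Angstrom edge is a made-up value chosen only so that the numbers resemble mine. Whether the SMD code computes the distance exactly this way is also an assumption.

```python
import numpy as np


def minimum_image_distance(r1, r2, box):
    """Distance between two points under the minimum-image convention
    for an orthorhombic periodic box with edge lengths `box`."""
    delta = np.asarray(r2, dtype=float) - np.asarray(r1, dtype=float)
    box = np.asarray(box, dtype=float)
    # Shift each component by a whole number of box lengths so that it
    # ends up within half a box length of zero.
    delta -= box * np.round(delta / box)
    return float(np.linalg.norm(delta))


# Hypothetical values: the 209 Angstrom box edge is assumed, not measured.
box = [209.0, 209.0, 209.0]
atom_a = [0.0, 0.0, 0.0]
atom_b = [122.0, 0.0, 0.0]

direct = float(np.linalg.norm(np.subtract(atom_b, atom_a)))
wrapped = minimum_image_distance(atom_a, atom_b, box)
print(direct, wrapped)  # 122.0 87.0
```

With these numbers the nearest periodic image of the second atom is only 87 Angstrom away, even though the two atoms are 122 Angstrom apart along the unwrapped molecule.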

Ten handy python libraries for (aspiring) data scientists

At the suggestion of the HPCC managers, I started learning Python last year for simple array operations. I am now a reasonably capable entry-level Python programmer, and with NumPy and SciPy I can handle most of my work. The post below is from http://bigdata-madesimple.com/ten-handy-python-libraries-for-aspiring-data-scientists/. It briefly introduces popular Python modules that make programming easier.

Data science has gathered a lot of steam in the past few years, and most companies now acknowledge the integral role data plays in driving business decisions.

Python, along with R, is one of the most handy tools in a data scientist’s arsenal. It’s also one of the simplest computer languages to learn and use, primarily because most concepts can be expressed in fewer lines of code in Python than in other languages.

Hence, beginners venturing out into the field of data science should definitely familiarise themselves with Python.

Python also offers a slew of active data science libraries and a vibrant community. Below are some of the most commonly used libraries and tools:

NumPy

NumPy is an open source extension module for Python. It provides fast precompiled functions for numerical routines. It’s very easy to work with large multidimensional arrays and matrices using NumPy.

Another advantage of NumPy is that you can apply standard mathematical operations on an entire data set without having to write loops. It is also very easy to export data to external libraries that are written in low-level languages (such as C or C++), and for data to then be imported from these external libraries as NumPy arrays.

Even though NumPy does not provide powerful data analysis functionalities, understanding NumPy arrays and array-oriented computing will help you use other Python data analysis tools more effectively.
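
A minimal sketch of the loop-free arithmetic described above. The temperature values are made up for illustration.

```python
import numpy as np

# Two rows of (hypothetical) temperature readings in degrees Celsius.
temps_c = np.array([[20.0, 22.5, 19.0],
                    [25.0, 27.5, 24.0]])

# The conversion is applied to every element at once, with no explicit loop.
temps_f = temps_c * 9.0 / 5.0 + 32.0

# Reductions work along a chosen axis: here, the mean of each column.
col_means = temps_c.mean(axis=0)
print(col_means)  # [22.5 25.  21.5]
```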

SciPy

SciPy is a Python module that builds on NumPy, which provides convenient and fast N-dimensional array manipulation. It provides many user-friendly and efficient numerical routines, with modules for optimization, linear algebra, integration and other common tasks in data science.
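
As a small illustration of the integration and optimization routines, the sketch below integrates sin(x) over [0, pi] and finds the minimum of a simple parabola. The function `parabola` is a made-up example, not something from the original post.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: the exact value of this integral is 2.
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)


def parabola(x):
    """Hypothetical objective with its minimum at x = 2."""
    return (x - 2.0) ** 2 + 1.0


# One-dimensional minimization; result.x is the location of the minimum
# and result.fun is the function value there.
result = optimize.minimize_scalar(parabola)
print(area, result.x, result.fun)
```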

Matplotlib

Matplotlib is a Python module for visualization. Matplotlib allows you to quickly make line graphs, pie charts, histograms and other professional grade figures. Using Matplotlib, you can customise every aspect of a figure. When used within IPython notebook, Matplotlib has interactive features like zooming and panning. It supports different GUI backends on all operating systems, and can also export graphics to common vector and graphic formats like PDF, SVG, JPG, PNG, BMP, GIF, etc.
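
A minimal sketch of making a line graph and exporting it to both a raster and a vector format. The file names and the temporary output directory are arbitrary choices for this example, and the `Agg` backend is selected so that the script also runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no GUI needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 200)

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Write the same figure as a 300 DPI PNG and as a vector PDF.
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "sine.png")
pdf_path = os.path.join(out_dir, "sine.pdf")
fig.savefig(png_path, dpi=300)
fig.savefig(pdf_path)
plt.close(fig)
```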

Scikit-Learn

Scikit-Learn is a Python module for machine learning built on top of SciPy. It provides a set of common machine learning algorithms to users through a consistent interface. Scikit-Learn helps to quickly implement popular algorithms on datasets. A look at the list of algorithms available in Scikit-Learn shows that it includes tools for many standard machine-learning tasks (such as clustering, classification and regression).
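
The consistent interface can be sketched with two unrelated estimators that are both trained through `fit`. The toy data below is invented for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Regression: points that lie exactly on y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X.ravel() + 1.0
reg = LinearRegression().fit(X, y)
pred = reg.predict(np.array([[4.0]]))  # close to 9

# Clustering: two well-separated groups of points.
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = km.labels_  # one cluster label per point
```

Both objects are created, fitted and queried in the same way, which is what makes it quick to swap one algorithm for another.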

Pandas

Pandas is a Python module that contains high-level data structures and tools designed for fast and easy data analysis operations. Pandas is built on NumPy, which makes it easy to use in NumPy-centric applications. Its data structures have labelled axes, and explicit data alignment prevents common errors that result from misaligned data coming in from different sources.

It is also easy to handle missing data using Pandas, which makes it a strong choice for data munging.
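
A minimal sketch of label alignment and missing-data handling. The two series and their labels are made up for illustration.

```python
import pandas as pd

# Two series whose labels overlap only partly and come in a different order.
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0, 30.0], index=["z", "x", "w"])

# Addition aligns on the labels, not on position; labels present in only
# one of the series give NaN.
total = a + b

# Missing entries can be counted, or treated as zero during the addition.
n_missing = int(total.isna().sum())
filled = a.add(b, fill_value=0.0)
print(total)
print(filled)
```

Here `total["x"]` is 1 + 20 = 21 because the values are matched by label, while `total["y"]` and `total["w"]` are NaN since each appears in only one series.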

Theano

Theano is a Python library for numerical computation, and is similar to NumPy. Some libraries such as Pylearn2 use Theano as their core component for mathematical computation. Theano allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

NLTK

NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength NLP libraries. NLTK has been used successfully as a platform for prototyping and building research systems.

Statsmodels

Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.

PyBrain

PyBrain is short for “Python-Based Reinforcement Learning, Artificial Intelligence, and Neural Network Library”. It is an open source library mainly used for neural networks, reinforcement learning and unsupervised learning.

Neural networks form the basis of this library, making it a powerful tool for real-time analytics.

Gensim

Gensim is a Python library for topic modeling. It is built on NumPy and SciPy.

The figure below summarizes the number of GitHub contributors to the most popular data science libraries.

These are some of the best libraries I’ve tried or come across. But there are others.

If I’ve missed out any Python data science libraries that you swear by, do let me know what they are by leaving a comment below this blog.

First Blog

I use this blog to record and organize what I study and learn. I hope it will be helpful to other people.