Summer School on Computational Materials Science: Basic Simulation Techniques
Lecture 1: Introduction
Why is computer simulation important?
Many-body problems are
ubiquitous in physics, chemistry, biology, and materials science. Simulations
provide a very powerful way of solving these problems, enabling one to go
directly from a microscopic Hamiltonian to macroscopic properties measured in
experiment, essentially without approximation. Some of the reasons why
simulations have become so important are:
- Other general methods for many-body problems involve approximations. In
the continuum, a two body problem can be solved analytically. There is no
exact solution for more than two bodies.
- Uncontrolled approximations always break down somewhere. One needs a method
to test, or benchmark, approximate methods to find their range of validity.
This implies that the exactness of simulation is very important, otherwise
we're wasting computer and human time.
- What is the role of theory today? Why not just measure properties in the
lab? Simulation can give reliable predictions when experiments are not
possible or very difficult. Some examples: What is the equation of state of
hydrogen at 10 Mbar? How do polymers move?
- It is often interesting to study artificial models to gain insight into
the limitations of theory. For example: how do phase transitions change in
going from 2 to 3 to 4 dimensions?
- Simulations are easy to do, even for very complex systems; their
complexity is no worse than the complexity of the physical description. In
contrast, purely theoretical approaches typically are applicable only to very
simplified models. Simulations are good for learning about a system and getting
a good idea of what is going on. Also they are a good black box for
non-experts since one does not have to master a particular theory to
understand the input and output of a simulation.
- Simulations scale up with the increase of computer power. The earliest
simulations involved 32 particles; now one can do millions of particles.
The above reasons have led to a "boom" in the development of
simulations. Originating at the dawn of the computer age (1950), direct
simulations have now spread to most areas of pure and applied science.
Computational physics has been said to constitute a third way of doing physics,
comparable to theory and experiment. We will see that simulations do have both
theoretical and experimental aspects. Simulations have become so important that
all scientists should learn their fundamentals.
There are many practical applications, for example, simulated annealing for
optimization and economics (stock markets). The literature on simulation is quickly
expanding, but there are few comprehensive textbooks. Many of the most interesting
techniques have not yet been invented or tried; there are plenty of developments
still to come.
Two approaches to simulation are often presented:
- Assume the Hamiltonian is given from experiment. Most of the basic
equations in condensed matter physics are well known. It is just a
question of working out the details. What properties of a many-body system can
we calculate without making any uncontrolled approximations? In this view a
simulation expert is essentially an applied mathematician. In fact, for problems
at the forefront, this is not the case. It takes a great deal of physics
knowledge and intuition to figure out which simulations to do, which
properties to measure, etc., just like an experimentalist needs to do more than
read the output of the apparatus.
- The other view is that of a modeler: we are allowed to invent new
equations and algorithms to describe some physical system. Let us invent a
fictitious model with some dynamics that is easy to do on a computer and do
systematic studies of the properties of the model. Later on we might
investigate whether some physical system is described by the model. Clearly
this approach is warranted in such fields as economics, ecology, biology ...
since the "correct" underlying description has not always been worked out.
We will concentrate on the first type of problem. The second problem
is quite different and success is not so well defined, but occasionally it is
extremely successful (chaos, cellular automata...)
Further reading. "Microscopic Simulations in Physics" by D. Ceperley, Rev.
Mod. Phys. 71, S438, 1999.
Some of the issues we are concerned with in simulations are:
- The theoretical (mathematical) underpinnings of simulation.
- We must understand all the approximations and errors since we want to
control all of the errors.
- The generic problems with simulations. The fundamental problem is that
computer time is limited, which implies that the number of particles is much
smaller and the total length of the simulation much shorter than in the real
world. Also, all simulation results have statistical errors just like
experiments do. We have to be able to control the effects of these
limitations.
- What is the optimal algorithm for a given problem? How do we measure the
performance of an algorithm? Important problems take all computer resources
so optimization is important. In the past, the efficiency of simulations has
increased more because of better algorithms than computer hardware.
- Analysis of results. All that simulations produce is numbers. Usual theory
produces equations. How do we go from numbers to "understanding", or at least a
characterization of the results?
- Debugging techniques. How do we gain confidence that the computer code is
correct, i.e., that it is doing what we think it should be doing?
- What are the outstanding challenges? Which properties can't we simulate?
Classical simulations of simple properties are in relatively good shape unless
you have a system which requires many particles or long time scales (for
example the propagation of a crack, the movements of proteins, etc.). But these
are precisely the systems we want to understand.
Statistical Mechanics Background
The physical foundation of simulation is statistical mechanics. Without
statistical mechanics, it would be difficult to make the connection between the
rather small systems simulated for short periods of time and measurements on
macroscopic samples. Statistical mechanics "coarse grains": it separates the
thermodynamic macroscopic variables like pressure and temperature from the
microscopic variables, the individual atomic positions and velocities.
Even though most of the simulations we will talk about are classical, it is
actually easier to understand the fundamental principles if we use the basic
concepts of stationary states in quantum mechanics. This is because the energy
states are a countable finite set. We shall indicate a few salient points after
defining some terminology.
Phase Space
A mechanical system is composed of many particles
with positions, qi, and momenta, pi, denoted q
and p for short; the space of these coordinates is the phase space. Any mechanical
system with such coordinates can be characterized by a function, H(q, p; t),
typically the Hamiltonian of the system, which determines the energy for any
coordinates.
Ensemble
The collection of all possible sets of allowed values of
{qi} and {pi} for the system, compatible with the
constraints (e.g. fixed particle number N, volume V, and energy E for the
microcanonical ensemble), is called the "ensemble" (in the original sense of the
word, a collection or group). If there are G accessible states, then there are G
systems in the ensemble. There are different types of ensembles depending on the
constraints: constant-NVE, constant-NVT, constant-μVT, and so on.
Suppose that we have a system with fixed N and V. Then:
- There are a finite number of energy levels in any energy range (E, E+dE).
We call this the density of states in phase space, g(E) = exp(S(E)) (in units
where kB = 1), where S(E) is the entropy.
- If we don't know which state the system is in, the natural assumption is
that it is equally likely for the system to be in any of the states in that
range. Among the states in this range, all are equally likely. (A priori
equal probabilities are what you assume for coin tosses, if the coin is
not weighted, or for a well-shuffled deck of cards in which each of the 52!
arrangements is equally likely.) This was called the "principle of
insufficient reason" by James Bernoulli (1713) at the start of probability
theory, which was furthered by Bayes and Laplace and their non-frequency
interpretation of probability. In classical statistical mechanics, the idea of
equal probabilities in phase space is harder to justify since one has continuous
variables; quantum mechanics solved that fundamental problem of which measure
on phase space is fundamental.
- Next consider the interaction between two weakly coupled systems (with
energies E1 and E2 and particle numbers N1 and
N2) that are allowed to exchange energy and nothing else. Then for
each pair of states there exists a combined state with energy
E1+E2, so the number of combined states is the product of the numbers of
states of the two parts. Hence the entropy of the combined system is the
sum of the entropies of the two parts:
S(E1+E2) = S1(E1) + S2(E2).
- If one of the two systems is much larger than the other (N1 >> N2 with
N1+N2=N), the density of states is enormous and energy will flow to maximize the
total entropy. This maximum will occur when the temperatures of the two systems
are the same. The temperature (corresponding to the energy E) is defined as
1/T = dS/dE. See notes on the Gibbs Distribution (PDF).
- The following is the important conclusion of Boltzmann: For a system in
contact with a heat bath, P(Ei) = exp(-β Ei)/Z.
- Z = Σi exp(-β Ei), the normalization constant, is
known as the partition function. Usually it is more convenient to work with
the (Helmholtz) free energy, F = -kT ln(Z), since the free energy is
extensive (proportional to the size of the system) rather than exponential in it.
All thermodynamic properties can be derived as various derivatives of the free
energy, as we will see.
- The measured value of some observable (operator A) will equal
Tr[exp(-βH) A] / Tr[exp(-βH)]. This is the quantum version of the
more familiar <A> = Σi Pi Ai. (A small numerical sketch of these formulas
follows.)
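To make these formulas concrete, here is a minimal numerical sketch (a toy example of our own, not from the course materials): for a handful of arbitrary discrete energy levels it evaluates the Boltzmann probabilities, the partition function Z, the free energy F = -kT ln Z, and the thermal average <E> = Σi Pi Ei, in units where kB = 1.

```python
import numpy as np

# Toy example: three arbitrary energy levels (illustrative values, kB = 1).
energies = np.array([0.0, 1.0, 2.5])
T = 1.0
beta = 1.0 / T

weights = np.exp(-beta * energies)   # Boltzmann factors exp(-beta*E_i)
Z = weights.sum()                    # partition function
P = weights / Z                      # probability of each state
F = -T * np.log(Z)                   # Helmholtz free energy, F = -kT ln Z
E_avg = np.dot(P, energies)          # <E> = sum_i P_i E_i

print("P =", P)
print("Z =", Z, " F =", F, " <E> =", E_avg)
```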
While we have discussed the constant
energy ensemble (micro-canonical) we will encounter other ensembles soon, for
example, constant pressure, constant temperature (canonical), grand canonical
(where the system can exchange particles with the heat bath), etc.
Classical Statistical Mechanics
The above discussion was formulated in terms of quantum states. Let us now
take the classical limit of the Boltzmann formula. Suppose H = K + U where K is
the kinetic operator and U is the potential operator. For high enough temperatures
we can write exp(-βH) ≈ exp(-βK) exp(-βU). (See the formula page.) One finds that
the probability of being in a volume dp dR of phase space is proportional
to exp(-βE), where E is the classical energy E = p^2/2m + V(R).
To summarize, if a system is allowed to exchange energy and momentum with outside
systems, its distribution will equal the canonical distribution. Quantum mechanics
does introduce a new feature, an N! coming from the fact that the particles are
often indistinguishable. Usually this factor drops out of classical averages.
The momentum part of the Maxwell-Boltzmann distribution (the Maxwell
distribution) is just a "normal" Gaussian distribution. It is completely
uncorrelated with the position part. If you know something about the positions
you have no knowledge about the momenta (and vice versa) in thermal
equilibrium. The average kinetic energy is exactly (3/2)kBT per particle
(equipartition).
After doing the momentum integrals, we are left with the configuration
distribution and partition function.
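As a small check of these statements, the sketch below (our own illustration; parameters are arbitrary) samples momenta from the Maxwell distribution, where each Cartesian component is an independent Gaussian with variance m kB T, and verifies that the average kinetic energy per particle comes out close to (3/2) kB T.

```python
import numpy as np

# Illustrative parameters, in units where kB = 1.
rng = np.random.default_rng(0)
N, m, T = 100000, 1.0, 2.0

# Each momentum component is an independent Gaussian with variance m*T.
p = rng.normal(0.0, np.sqrt(m * T), size=(N, 3))

# Average kinetic energy per particle, <p^2/(2m)>; equipartition gives (3/2) T.
ke = (p**2).sum(axis=1).mean() / (2.0 * m)
print(ke, "vs (3/2) T =", 1.5 * T)
```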
Time Average versus Ensemble Average
In experiments one typically considers a large number of successive times
(t1, ..., tM) at which to make measurements, each of which reveals
the current state of the system. Then <A> = Σi Ai Pi, where Pi = ni/M and
ni is the number of times the system is in state i out of M
total. This is a time average over a single system because the values of
ni were obtained by successive observations. Now consider a
classical simulation for a fixed number of particles and a given potential
energy function V(R), isolated from any heat bath. Newton tells us the system
will evolve according to F = -∇V = ma. We assign
initial positions and velocities (implying an initial total energy and
momentum). If we let it evolve what will happen? Will it go to the canonical
distribution?
First, a few constants of motion may be known: energy, momentum, number of
particles, etc. These will not change, but in a system of more than 2 particles
the number of degrees of freedom is greater than the number of constants of
motion. Gibbs introduced the artifice of a large number of replicas of the
system (i.e., the ensemble) and imagined them to trace out paths on the energy
surface. Basically, we could follow A(t) for one system for infinite time and
average, or just consider the value of A for each member of the ensemble and
average. They should be the same thing. So Gibbs replaced
<A>time with <A>ens. Gibbs adopted the view
from the outset that this was only a device to calculate the probable
behavior of a system.
In other words, instead of starting from a single initial condition, we start
from the canonical distribution of positions and momenta. By Liouville's theorem,
it is easy to show that this distribution is preserved. What we are doing in
Molecular Dynamics is assuming we can interchange the average over initial
conditions with the average over time. An MD simulation cannot run for infinite
time, but hopefully an average over a long finite run of M steps is satisfactory.
Thus, <A>time ≈ (1/M) Σt A(t).
Gibbs clearly recognized the relation of entropy to probability and observed that
the most probable situation (equilibrium) corresponds to maximum entropy
subject to given constraints.
The Ergodic Hypothesis
Is <A>time = <A>ens really true? This is
what Boltzmann wanted to prove.
It is true if <A>ens = < <A>ens >time =
< <A>time >ens = <A>time.
Equality one is true if the ensemble is stationary. For equality two, it is assumed
that interchanging the averages does not matter. The third equality is only
true if the system is ERGODIC.
ERGODIC: [adjective] from Greek, erg + hodos (energy path), leading to
the German construction "Ergodenhypothese" (hypothesis on the path of energy).
The Ergodic Hypothesis of Boltzmann (also called the continuity of
path by Maxwell) is simply that a phase point for any isolated system
passes in succession through every point compatible with the energy of the
system before finally returning to its original position in phase space.
This journey takes a Poincaré cycle.
Ergodicity of a system has been the fundamental assumption of classical
statistical mechanics, and it is usually assumed in simulations as well. If the
system has ergodic or chaotic behavior it will eventually sample all the phase
space it can subject to the conserved numbers. There are a number of related
concepts:
- The system, no matter how it is prepared, relaxes on a reasonable time
scale towards a statistical equilibrium. The definition of equilibrium is that
all macroscopic variables are constant in time.
- This equilibrium should be characterized by the micro-canonical
probability: the integral over the portion of phase space with the given
constants of motion. The difference between the micro-canonical distribution
and the canonical distribution is of order 1/N.
- This implies there are no hidden integrals of motion, and that time
correlations decay.
- A typical orbit wanders irregularly throughout the whole energy surface.
- Two orbits starting out close together are rapidly separated, so that
there is a sensitive dependence on initial conditions.
Actually physical systems and systems being simulated are often
not ergodic.
Non-Ergodic Behavior
If the system is non-ergodic, it gets stuck or very slowly decays to
equilibrium. This was discovered by Fermi, Pasta and Ulam (FPU) in 1954 on the
Los Alamos MANIAC I, in perhaps the most profound discovery made by computer
simulation in the early years of computers. These notes are from G. Benettin in MD
Simulations of Statistical Mechanical Systems. The FPU Hamiltonian consists of N
particles of equal mass connected by anharmonic springs:
V(x) = (k/2) x^2 + z x^3
For z = 0, one has a harmonic chain whose well-known normal modes are decoupled
oscillators. For non-zero z, FPU put all the energy into one normal mode and
watched how the energy spread into the other normal modes. Equipartition says
that each mode should get equal energy (at least if you wait long enough).
Instead they found:
Let us say here that the results of our computations were, from
the beginning, surprising us. Instead of a continuous flow of energy from the
first mode to the higher modes, all of the problems show an entirely different
behavior. ... Instead of a gradual increase of all the higher modes, the
energy is exchanged, essentially, among only a certain few. It is, therefore,
very hard to observe the rate of "thermalization" or mixing in our problem,
and this was the initial purpose of the calculation.
Collected
papers of E. Fermi, Vol II, U. of Chicago Press, p 978, 1966.
For higher values of z, the system does approach equipartition after a long
enough time. Thus, whether the system reaches equipartition depends upon its initial
conditions. In many respects, this was a precursor to the "chaos" ideas of
recent years.
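The following is a rough sketch of the FPU experiment in the spirit described above (our own illustration: the chain length, coupling z, time step, and initial amplitude are arbitrary choices, not FPU's original parameters). It integrates the chain with velocity Verlet, starting with all the energy in the lowest normal mode, and prints the harmonic energy in the first few modes as the run proceeds.

```python
import numpy as np

# N unit masses between fixed walls; each bond has V(d) = d^2/2 + z*d^3.
N, z, dt, nsteps = 32, 0.25, 0.05, 100000

def forces(q):
    d = np.diff(np.concatenate(([0.0], q, [0.0])))  # N+1 bond extensions, fixed ends
    vprime = d + 3.0 * z * d * d                    # V'(d) for V(d) = d^2/2 + z*d^3
    return np.diff(vprime)                          # F_i = V'(d_{i+1}) - V'(d_i)

def mode_energies(q, p, kmax=4):
    # harmonic energy in the lowest normal modes of the linear (z = 0) chain
    i = np.arange(1, N + 1)
    energies = []
    for k in range(1, kmax + 1):
        s = np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * k * i / (N + 1))
        Qk, Pk = np.dot(s, q), np.dot(s, p)
        wk = 2.0 * np.sin(np.pi * k / (2 * (N + 1)))
        energies.append(0.5 * (Pk**2 + (wk * Qk)**2))
    return energies

# put all the initial energy into the lowest normal mode, with zero velocities
i = np.arange(1, N + 1)
q = 4.0 * np.sin(np.pi * i / (N + 1))
p = np.zeros(N)

f = forces(q)
for step in range(nsteps):                 # velocity Verlet, unit masses
    p += 0.5 * dt * f
    q += dt * p
    f = forces(q)
    p += 0.5 * dt * f
    if step % 20000 == 0:
        print(step, ["%.3f" % e for e in mode_energies(q, p)])
```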
In the intervening years it has been established that non-ergodic behavior is
characteristic of most systems at low enough temperature. It is not a small
system or one-dimensional effect, but a low temperature effect. For this reason
it is not all that relevant to condensed matter physics, since quantum mechanics
takes over from classical mechanics at low temperatures. But it is something one
should be aware of, since the temperatures involved are not all that low. For
example, 64 argon atoms below 5K (a good harmonic solid) will never reach
equilibrium but stay with whatever phonons they start with. Also, celestial
mechanics is happy with ordered dynamics; it provides explanations for planetary
systems, ring systems, and spiral galaxies.
There are at least 3 different concepts related to ergodicity that
characterize a particular system and trajectory:
- Integrable system. In these systems the number of constants of motion
equal the number of degrees of freedom. They are essentially analytically
solvable. A good example is a collection of uncoupled harmonic oscillators.
While common in elementary textbooks, they are very rare in nature. Integrable
systems can be either periodic or quasiperiodic.
- Ergodic trajectory. A trajectory that goes everywhere that is accessible
given its constants of motion (such as energy). The KAM theorem states that if
an integrable system is subject to a small perturbation, most trajectories
stay close to the non-perturbed trajectories and are hence non-ergodic.
The FPU system is of this kind. Another good example is the solar system. What
is usually meant by ergodic is that the time average over a typical trajectory
equals the ensemble average.
- Mixing system. Mixing trajectories are sensitive to their initial
conditions. Errors (such as contact with a heat bath or numerical errors from
the computer precision) grow. In a K-flow system (after Kolmogorov) the
perturbed trajectory diverges exponentially in time. The rate is called the
Lyapunov exponent. This is the most common situation for many-body systems not
at very low temperatures.
On the other hand, in statistical mechanics
we expect our system to converge to equilibrium. Whether it does or not cannot
usually be proven mathematically but must be determined "experimentally". In
general we don't know if it is ergodic or if it is stuck for a long time (on our
scale). At low temperature very complicated things can happen. It is up to you
to decide if it is in fact ergodic. Non-ergodic behavior is both good and bad.
On the one hand we are never sure if we have reached true equilibrium (nature
has much longer time scales). On the other hand simulation can be used to study
metastable states that occur in nature such as the freezing of a super-cooled
liquid or a glass. Another example where metastable states may be physically
important is in the folding of a protein.
Contact with the outside world
Often it is good to introduce some contact with a thermal bath. (Langevin
equation) Normal laboratory systems are in contact with the outside world. How
else can we account for the flux of particles, energy, and momentum coming from the
outside? The boundary conditions depend critically on the physics. There is a whole
continuum of algorithms that can all sample the same canonical distribution,
distinguished from each other by how much randomness is allowed.
- Molecular Dynamics (no randomness)
- Langevin Dynamics (heat bath gives accelerations)
- Brownian Dynamics (heat bath determines velocities)
- Forced-bias or Smart Monte Carlo (random walk biased by force)
- Classic Metropolis Monte Carlo (unbiased random walk)
Once one
adds any randomness, ergodicity and convergence to the canonical distribution
usually follow. Whether any particular run (of finite length) has in fact
converged remains an "experimental" question.
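As one concrete point on this continuum, here is a minimal sketch of Langevin dynamics for a single particle in a one-dimensional double-well potential (an illustration of our own; the potential, friction, and the simple Euler-type discretization are arbitrary choices, and more accurate splitting schemes exist). The friction and the random kicks are related so that, in the small time-step limit, the stationary distribution is the canonical one at temperature T.

```python
import numpy as np

# Illustrative parameters; units with kB = 1 and V(x) = x^4/4 - x^2/2.
rng = np.random.default_rng(1)
gamma, T, m, dt, nsteps = 1.0, 0.5, 1.0, 0.01, 200000

def force(x):
    return -(x**3 - x)   # F = -dV/dx

x, v = 0.0, 0.0
samples = []
for step in range(nsteps):
    # m dv = (F - gamma*v) dt + sqrt(2*gamma*T*dt)*noise  (fluctuation-dissipation)
    v += (dt * (force(x) - gamma * v) + np.sqrt(2.0 * gamma * T * dt) * rng.normal()) / m
    x += dt * v
    samples.append(x)

# discard the first 10% as equilibration and look at a simple average
print("<x^2> =", np.mean(np.array(samples[nsteps // 10:]) ** 2))
```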
Statistical Errors.
Now I am going to review a basic idea that is
central to simulations: how to estimate the errors of a calculation. Simulations
without error bars are only suggestive. Suppose we are trying to estimate the
mean value of some quantity, say A(t) where t refers to the simulation 'time.' 'A' is some property we
measure such as the energy. We want to determine 'A' in thermal equilibrium. We
denote the 'exact' value < A >. (What does
exact mean? The integral over the exact distribution function.) The simulation
is started and there is an initial transient as the
system comes into equilibrium.
First let me define a simulation. A simulation has some "state" variables,
which we denote by S. In classical mechanics these
are the positions and
velocities of the particles. In an Ising model they are the spins of the
particles. We start the simulation at
some initial state S0. We have some
process which modifies the state to produce a new state so that we can write
Sn+1=T(Sn). We will call the iteration index "time". It is in quotes because
it may not refer to real time but a fictitious time or
pseudo-time,
sometimes called Monte Carlo time.
The iteration function T(S) can either be deterministic (like Newton's
equations of motion called Molecular Dynamics) or
have a random element to
it (called Monte Carlo). In the random case, we say that the new state is
sampled from the
distribution T(S). What we discuss in this lecture and in
much of the course, is the situation where the "dynamics" is ergodic,
which
we define roughly to mean that after a certain time one loses memory of the
initial state, S0, except possibly for certain
conserved quantities like
energy, momentum etc. This implies there is a correlation or renewal time. For
time differences
much greater than the correlation time, all properties of
the system not explicitly conserved are unpredictable, as if they were
randomly sampled from some distribution. Ergodicity is often easy to
prove for the random transition but usually difficult for
the deterministic
simulation. We will discuss ergodicity further in the next lecture.
Assuming the system is ergodic, one can discuss the state of the system as a
probability distribution fn(S) (it is the distribution of outcomes that
you get if you take a distribution of initial states). At large time this will
approach a unique distribution, the equilibrium distribution f*(S). In classical
statistical mechanics this is the Boltzmann distribution.
The main goal of most simulations is the calculation of a set of properties
in equilibrium. The most common example is the
internal energy E = <e(S)>, where the average is over f*(S). In a simulation we
sample e(Sn) for n < T, where T is the total length of the run. The curve of e
versus n we call the trace. Calculating an estimate of E is easy: just take an average
over the run. The only "trick" is to be sure to throw away the part of the
trace that is too much influenced by the initial conditions. This is called the
transient, equilibration portion, or warm-up. (Question: if you have a big spike
at the beginning of your trace and you neglect to throw it out of the average,
how long does it take before the influence of the spike is less than the
statistical error? That is, how does the estimated mean depend on the properties
of the warm-up portion?)
The next order of business is to figure out how accurate the estimate of the
exact value is: namely, the estimated error in the estimated mean, commonly
called the error bar. Simulation results without error bars are only suggestive;
without error bars one has no idea of their significance. All respectable papers
in the literature should contain error estimates for any properties that are
important to the conclusions of the paper. Many mistakes are made in estimating
errors. You should understand both the equations and get an intuitive feel for
how to estimate errors (the "eyeball" method). Although we have prepared a code
to automatically estimate errors, you should know the formulas and what they
mean. On the exam you will have to do it without DataSpork. Often one hears the
complaint that one cannot
estimate errors because the quantity being measured is a very complicated
function of the state. However, it is always possible to get a rough idea of the
error bar, which is often all that is needed, by simply rerunning the entire
code a few times with different initial conditions and looking at the spread of
results. This is called the "eyeball" method. While there are many ways for
errors to be underestimated, the eyeball method will not seriously underestimate
the errors.
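Here is the eyeball method in miniature (a toy example of our own; the "simulation" is just a correlated random sequence standing in for a real code): run the same calculation with several different random seeds and look at the spread of the results.

```python
import numpy as np

def run_simulation(seed, nsteps=10000):
    # stand-in for a real simulation: a correlated sequence whose true mean is 0
    rng = np.random.default_rng(seed)
    a, x, total = 0.95, 0.0, 0.0
    for _ in range(nsteps):
        x = a * x + rng.normal()
        total += x
    return total / nsteps

results = [run_simulation(seed) for seed in range(5)]
print("independent runs:", ["%.3f" % r for r in results])
print("spread (a rough error bar):", np.std(results, ddof=1))
```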
The fundamental theorem on which error estimates are based is the central
limit theorem. It says that the average of any statistical process will be
"normally" distributed, i.e., have a Gaussian distribution. Further, the error
bar will equal the square root of the variance of the distribution divided by
the number of uncorrelated steps, error = sqrt(v / Neff). Later we will discuss
in detail the conditions under which this result holds. The correlation time of
a sequence is the number of steps it takes for a fluctuation to disappear.
There is a terminology which we use to describe the process of estimating
<A> and estimating the error of our estimate. You need to learn what
all these terms mean so you can communicate with your fellow scientists about
statistical distributions. We have developed a JAVA Analysis Tool,
DataSpork, which does the computations discussed here for you. In the first
homework you have to learn how to use it and what the results mean.
The symbols are defined on the formula page.
- Trace of A(t): The plot of A(t) versus
t. You can learn a lot by looking at the trace. What are the trends you
observe? What is the correlation between nearby values of t? Are there any
spikes or outliers? These could represent a real physical effect or could be
the result of a bug and should be understood.
- Equilibration time. How long does it
take to settle to a plateau if it does at all? Usually the equilibration
period is thrown out so as to estimate the mean more accurately. It is very
important that the plateau region be much longer than the equilibration period
to be sure that A(t) has really settled down.
- Histogram of values of A ( P(A) ). The
distribution of values of A. Is it Gaussian? Is it symmetric about the mode?
Does it have tails? Is it bimodal? The first two moments are very important:
- Mean of A (a). The estimate of the
mean value is simply the average over the equilibrated data.
- Variance of A ( v ). The mean squared
deviation of A(t) about the estimated mean.
- An estimate is our best guess of a
value. Normally we do not know the exact value of a quantity, thus we must
estimate it from the finite sample. Thus we have an estimate of the mean, an
estimate of the variance, etc. Sometimes there are
different formulas for taking the same trajectory and estimating a property;
these are called different estimators. For the
mean, the equally weighted sum is the best (has the least error).
- Autocorrelation of A (C(t)). The
correlation of neighboring fluctuations. Note the following properties of C:
C(0)=1, C(t)=C(-t) and at large time C(t) approaches zero. C(t) is needed to
estimate the error of our estimate of the mean. If data points are highly
correlated our estimate of the mean is poor.
- Correlation time (κ). The integrated autocorrelation function. The area
under the autocorrelation curve measures how bad the correlation is. Danger:
you have to cut off the integral at large times since the noise never quits.
(A numerical sketch of this analysis appears after this list.)
- Effective number of points (Neffective). The total number of points (minus the
equilibration period) divided by the correlation time. Example: if you have 1000
data points but the correlation time is 5 steps, then you only have 200
independent data points. Neffective is what goes into the estimator for the error
of the mean.
- The (estimated) error of the (estimated)
mean value (σ).
This is what we want to quote as an error bar! The standard deviation is the
error of a Gaussian (normal) distribution. Why do we care about Gaussian
distributions? According to the central limit theorem, if the variance is
finite and the correlation time is finite, the estimated mean of any
simulation will be normally distributed. Hence we can use the rule that 68% of
the time the estimated mean will be within one standard deviation of the true
mean, 95% of the time within 2σ, and so forth.
- Blocking analysis This is another
equivalent procedure of correcting for autocorrelation. Sections of the run
are added together to make blocks; large enough blocks so that successive ones
are uncorrelated. Then the effective number of points equals the number of
blocks. (see AT pgs 192-193).
- Cumulative average. The running average
from the beginning of the simulation (or after the equilibration step). This is a
bad way of looking at the data, since the data have to be quite pathological for
problems to show up in the cumulative average.
- Efficiency. The statistical errors
always eventually decrease as 1/√(run time). However,
the coefficient out front is partially under our control. [efficiency = 1/(CPU
time × error^2)] This is what we call the efficiency of the
algorithm. We will discuss this later. The efficiency depends on how fast your
computer is, how well you program the algorithm, etc. Alternatively, one
measures the "stepwise efficiency"
[= 1/(number of steps × error^2)]; however, with this one cannot compare
the efficiencies of two different algorithms that use different amounts of
time per step.
See also the detailed formula page (PDF).
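The sketch below pulls the above definitions together (it is our own illustration of the standard formulas; DataSpork's implementation may differ in detail, e.g. in how the autocorrelation sum is cut off). Given an equilibrated trace A(t) it estimates the mean, the variance v, the correlation time κ = 1 + 2 Σ C(t), the effective number of points, the error of the mean sqrt(v/Neff), and an independent blocking estimate of the same error.

```python
import numpy as np

def error_analysis(A, block_size=100):
    """Estimate mean, correlation time, N_effective, and error bar for a trace A(t)."""
    A = np.asarray(A, dtype=float)
    N = len(A)
    mean = A.mean()
    var = A.var(ddof=1)                        # variance v of the data
    dA = A - mean

    # integrated autocorrelation time; cut off the sum at the first negative C(t)
    kappa = 1.0
    for t in range(1, N // 2):
        c = np.dot(dA[:-t], dA[t:]) / ((N - t) * var)
        if c <= 0.0:
            break
        kappa += 2.0 * c
    n_eff = N / kappa                          # effective number of independent points
    error = np.sqrt(var / n_eff)               # error of the mean

    # blocking analysis: block averages are (nearly) independent if blocks are long enough
    nblocks = N // block_size
    blocks = A[:nblocks * block_size].reshape(nblocks, block_size).mean(axis=1)
    block_error = blocks.std(ddof=1) / np.sqrt(nblocks)

    return mean, kappa, n_eff, error, block_error

# usage on a synthetic correlated trace (equilibration assumed already removed)
rng = np.random.default_rng(2)
x, trace = 0.0, []
for _ in range(50000):
    x = 0.9 * x + rng.normal()
    trace.append(x)
print(error_analysis(trace))
```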
Molecular Dynamics
By Molecular Dynamics (MD) is meant the evolution of a
classical system by Newton's equations of motion. See AT pgs 1-4 for a
brief history of the development of MD. In this lecture we discuss how to
numerically solve F = ma. While in nature time advances continuously, in
MD time is discrete. Normally there is a fixed increment of time, denoted h
throughout this lecture.
MD is the solution of m d^2R/dt^2 = F(R),
where F(R) = -∇V(R).
Potential Cutoff
Energy (E = T + V) conservation is very important. It is the basis for
thermodynamics. For a numerical method to exhibit energy conservation, the
potential energy must be differentiable. This is often a problem at the edge of
the box or when the potential is "cut off", a common beginner's mistake. The
cutoff must be done continuously, by a procedure such as
Vc(r) = V(r) - V(rc), where rc is the cutoff
radius.
Please look at the code fragment to see how
the simplest Lennard-Jones force calculation is programmed.
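The course's code fragment is not reproduced here, but the following is a hedged sketch of what the simplest Lennard-Jones energy and force loop typically looks like, in reduced units (epsilon = sigma = 1), with the shifted ("cut") potential Vc(r) = V(r) - V(rc) and the minimum-image convention for a periodic box. The function and variable names are ours, chosen only for illustration.

```python
import numpy as np

def lj_energy_forces(pos, box, rc=2.5):
    """Simplest O(N^2) Lennard-Jones loop: shifted potential, minimum-image convention."""
    N = len(pos)
    forces = np.zeros_like(pos)
    vshift = 4.0 * (rc**-12 - rc**-6)          # V(rc), subtracted so Vc(rc) = 0
    energy = 0.0
    for i in range(N - 1):
        for j in range(i + 1, N):
            dr = pos[i] - pos[j]
            dr -= box * np.round(dr / box)     # minimum-image convention
            r2 = np.dot(dr, dr)
            if r2 < rc * rc:
                inv6 = 1.0 / r2**3             # (1/r)^6
                energy += 4.0 * (inv6 * inv6 - inv6) - vshift
                fr = 24.0 * (2.0 * inv6 * inv6 - inv6) / r2   # |F|/r
                forces[i] += fr * dr
                forces[j] -= fr * dr
    return energy, forces
```

Note that shifting the potential in this way makes Vc continuous at rc, but the force still has a small discontinuity there; smoother truncation schemes exist.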
Integrators
A & T pgs. 71-84
Criteria for selecting an integrator:
- simplicity (How long does it take to write and debug?)
- efficiency (How fast to advance a given system?)
- stability (what is the long-term energy conservation?)
- reliability (Can it handle a variety of temperatures, densities,
potentials?)
- vectorization (What is the efficiency on special computer hardware?)
Some additional questions:
- Which algorithm will enable me to answer some physical question as quickly
as possible? (i.e. minimize the scientist's time assuming adequate
computational resources).
- Which algorithm is most efficient? That is which algorithm will calculate
some physical quantity to a given accuracy in the smallest amount of CPU time
or the least memory (on a given machine)? This is the situation for a mature
problem and for writing a proposal to get computer time. You must justify that
you are using the machine as efficiently as possible.
- Have we balanced statistical errors (going as h^(-1/2)) with time
step errors, which go as h^p where p = 2, 3, ...? It doesn't help to have
an extremely accurate trajectory if the algorithm is so slow that it cannot
sample enough of phase space.
Characteristics of simulations which
affect the choice of integrator:
- Potentials are highly non-linear with discontinuous higher derivatives
either at the origin or at the box edge. Harmonic oscillator is not a good
test problem!
- Small changes in accuracy lead to totally different trajectories. (the
mixing or ergodic property)
- We need low accuracy because the potentials are not very realistic.
Universality saves us: a badly integrated system is probably similar to our
original system. This is not true in the field of non-linear dynamics or, for
example, in studying the solar system (we know gravity is correct.)
- CPU time is totally dominated by the calculation of forces.
- Memory considerations are not too important these days (but beware cache
misses.)
- Energy conservation is important. Energy stability is roughly equivalent
to time-reversal invariance. A good test for your code is whether the
algorithm can retrace its steps. The energy surface becomes a little fuzzy if
energy is not conserved. Rule of thumb: allow 0.01kT fluctuation in the total
energy.
- Other symmetries may be important, momentum etc.
The nearly
universal choice for an integrator is the Verlet algorithm (reference: Verlet).
Derivation of the Verlet algorithm:
Let us make a Taylor expansion of the trajectory about the current position:
r(t+h) = r(t) + v(t) h + (1/2) a(t) h^2 + b(t) h^3 + O(h^4)
Substituting -h for h: r(t-h) = r(t) - v(t) h + (1/2) a(t) h^2 - b(t) h^3 + O(h^4)
Adding the two expansions: r(t+h) = 2 r(t) - r(t-h) + a(t) h^2 + O(h^4)
The velocities are eliminated. One can estimate them as:
v(t) = (r(t+h) - r(t-h))/(2h) + O(h^2)
Time reversal invariance and momentum conservation are built
in. This implies that the energy has no drift. Probably the dynamics
generated by the Verlet algorithm has a constant of motion which is nearly but
not exactly equal to the total energy.
Velocity Verlet is algebraically equivalent but avoids taking
differences between large numbers:
r(t+h) = r(t) + v(t) h + (1/2) a(t) h^2
v(t+h) = v(t) + (1/2) (a(t) + a(t+h)) h
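A minimal implementation of velocity Verlet might look like the sketch below (our own names and interface; it can be driven by any routine that returns the forces for a given set of positions, such as a wrapper around the Lennard-Jones example earlier).

```python
def velocity_verlet(pos, vel, force_fn, mass, h, nsteps):
    """Advance positions and velocities nsteps times with time step h."""
    f = force_fn(pos)
    for _ in range(nsteps):
        vel += 0.5 * h * f / mass      # half kick with the old forces
        pos += h * vel                 # drift
        f = force_fn(pos)              # recompute forces at the new positions
        vel += 0.5 * h * f / mass      # half kick with the new forces
    return pos, vel
```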
Higher order algorithms are
available (Gear etc.) These are useful only when high accuracy is needed and the
potential and several derivatives are smooth.
The graph below from Berendsen and van Gunsteren (in MD simulation of
Statistical-Mechanical Systems, 1986) shows a study of energy conservation
versus time step for the Verlet algorithm and higher order Gear algorithms on a
protein. It is seen that for large time steps the Verlet algorithm is
considerably more stable. Higher order algorithms offer only a marginal
increase in efficiency and actually get worse beyond 7th order.
The following quote is from their article:
How accurate a simulation has to be in order to provide reliable
statistical results is still a matter of debate. The purpose is never to
produce reliable trajectories. Whatever accuracy the forces and the algorithms
have, significant deviations from the exact solution will inevitably occur.
Also in the real physical world individual trajectories have no meaning: in a
quantum mechanical sense they do not exist and even classically they become
unpredictable in the long run due to the nonisolated character of any real
system. So what counts are statistical averages. It seems that very little
systematic evaluation of algorithms has been done with this in mind. We have
the impression that a noise as high as 10% of the kinetic energy fluctuation
is still acceptable, although the accuracy of fluctuations may not be
sufficient to obtain thermodynamic data from them. With such a level of
inaccuracy the Verlet or leap-frog algorithm is always to be
preferred.
David Ceperley, University of Illinois May
2001