The data sets here are synthetic and should not be considered a benchmark. Simulations are necessarily a simplification of reality, and we used ours only to show that the error we describe can seriously affect widely used alignment methods. The analysis also demonstrated that the non-phylogenetic corrections for insertions implemented in some alignment methods worsen the error, but the relative performance of these aligners should not be read as a benchmark result. Although the simulations were designed to be realistic (there is no obvious reason why insertions and deletions would not be time-dependent processes that happen independently in different evolutionary lineages), the simulated sequences are not real biological sequences, and it would be unwise to optimize an aligner for these data sets.
The simulated data sets are: 4X, 2X, close, intermediate and distant, together with the trees. The latest version of PRANK has an option "-e" that reads in an existing alignment and, together with the option "-writeanc", can output inferred insertion and deletion events.
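
As a rough illustration of how the two options might be combined, the Python sketch below assembles a PRANK command line for an existing alignment. Only "-e" and "-writeanc" come from the text above. The "-d=" (input file) and "-o=" (output prefix) arguments are assumptions based on PRANK's usual command-line conventions, and the file names and the helper `build_prank_command` are hypothetical, so check the help output of your PRANK version before relying on it.

```python
import shlex


def build_prank_command(alignment_path, output_prefix, prank_binary="prank"):
    """Return an argument list for running PRANK on an existing alignment.

    Hypothetical helper: only "-e" and "-writeanc" are taken from the text;
    "-d=" and "-o=" are assumed and may differ between PRANK versions.
    """
    return [
        prank_binary,
        f"-d={alignment_path}",  # assumption: "-d=" names the input file
        f"-o={output_prefix}",   # assumption: "-o=" sets the output prefix
        "-e",                    # from the text: read in an existing alignment
        "-writeanc",             # from the text: output inferred indel events
    ]


if __name__ == "__main__":
    # Hypothetical file names, used for illustration only.
    command = build_prank_command("existing_alignment.fas", "inferred_events")
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(command))
```

The command is built as a list rather than a single string so that, once the arguments are verified, it can be passed directly to `subprocess.run` without shell quoting issues.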