The PGN Comparer works asymmetrically

In everyday life we expect comparisons to be symmetrical: it does not matter whether you compare A with B or B with A. When sets of things are compared, this is no longer guaranteed.

When the PGN Comparer compares two files, it searches the second file for the most similar game to each game in the first file. So, if the second file is not empty, every game of the first file has exactly one most similar game. However, a game from the second file need not be the most similar game of any game in the first file, and if it is, it may be the most similar game of several first-file games at once.
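
The matching described above can be illustrated with a minimal Python sketch. It is not the PGN Comparer's actual code: the similarity measure (difflib's SequenceMatcher applied to move lists) and the names similarity and most_similar are assumptions chosen only for this illustration, and ties are simply resolved in favour of the earlier game.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Toy similarity of two move strings (a stand-in, not the tool's own measure)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def most_similar(first: list[str], second: list[str]) -> dict[int, int]:
    """For each game index in `first`, return the index of its most similar game in `second`."""
    return {
        i: max(range(len(second)), key=lambda j: similarity(game, second[j]))
        for i, game in enumerate(first)
    }


# Hypothetical mini "files": each string is the move list of one game.
first = ["e4 e5 Nf3 Nc6 Bb5", "e4 e5 Nf3 Nc6 Bb5 a6", "e4 e5 Nf3 Nc6 Bc4"]
second = ["e4 e5 Nf3 Nc6 Bb5", "e4 c5 Nf3 d6"]

forward = most_similar(first, second)   # first file compared against second
backward = most_similar(second, first)  # roles swapped

print(forward)
print(backward)
```

In the forward direction all three games of the first file pick game 0 of the second file, and game 1 of the second file is nobody's most similar game. Swapping the roles produces a different mapping with a different number of entries, which is the asymmetry in question.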

This asymmetry usually does not have major consequences, but it can make the statistics look inconsistent. Suppose we compare two files with 100 games each, and the PGN Comparer reports:

  • Number of games with identical versions: 70 (70%)
  • Number of games with differing versions: 20 (20%)
  • Number of remaining games per file: X, Y.

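What values can X and Y take? Well, X has to be 10, because 100-70-20=10. But Y does not have to be equal to 10! Several games of the first file may share the same most similar game in the second file, so fewer than 90 games of the second file may be used up. All we know is that Y is at least 10, and in fact it can be quite a bit larger. (Even Y=98 is possible!)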
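
The arithmetic can be checked with a small Python sketch. It assumes that Y counts the games of the second file that are not the most similar game of any matched game in the first file; the function remaining_counts and the two scenarios are hypothetical and serve only as an illustration.

```python
def remaining_counts(size_first: int, size_second: int, partners: list[int]) -> tuple[int, int]:
    """Return (X, Y) for a comparison.

    `partners` holds, for every matched game of the first file, the index of
    its most similar game in the second file (assumed bookkeeping).
    """
    x = size_first - len(partners)        # first-file games without a match
    y = size_second - len(set(partners))  # second-file games nobody matched
    return x, y


# Scenario 1: all 90 matched games have different partners.
one_to_one = list(range(90))
print(remaining_counts(100, 100, one_to_one))  # (10, 10)

# Scenario 2: the first file contains 70 copies of one game and 20 slightly
# differing versions of another, so only two second-file games are ever matched.
heavy_duplicates = [0] * 70 + [1] * 20
print(remaining_counts(100, 100, heavy_duplicates))  # (10, 98)
```

X is the same in both scenarios, while Y depends on how many distinct games of the second file were actually chosen.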

If your statistics look illogical, please check each file individually, i.e., run the PGN Comparer with "Check a single file" on every file. The number of equal or very similar games within each file will then explain the results.

If you use the PGN Comparer with "Check two files", it is up to you to determine which file is the first one and which is the second. If necessary, swap the order to see the comparison from both sides.

On the other hand, if you use the PGN Comparer with "Check any number of files", there is no guarantee as to which file takes the role of the first file in any given pairwise comparison.