The PGN Comparer of the Chess Suite has been mentioned on several occasions by chess historian Tim Harding, and some readers may be interested in what he wrote about it.
In New In Chess Magazine 7/2020 he published an article on his book Steinitz in London (McFarland, 2020), and on page 45 he wrote:
"My book indicates numerous discrepancies between versions of Steinitz games. Many of these were detected using a suite of chess software developed by Thomas
Niessen of Aachen (see https://thomastonk.jimdofree.com). Niessen has previously pointed out that the scores of about 14 per cent of the games played in the Hastings 1895
tournament differ between the official English language tournament book, edited by Horace Cheshire, and Emil Schallopp’s German-language book, which is often the more
reliable."
On his website he has written the article "New series: fix your database". There he reports:
"Since 2020 we have been using Thomas Niessen's Chess Suite software, which compares PGN databases and identifies previously unnoticed discrepancies in records of the same game. (The Chess
Suite has other features but this is the only one I have used.)
The software produces a report on the databases compared. Then we check these findings against the primary printed sources, although sometimes they disagree about how games really went, which
requires more research and sometimes the exercise of judgment to decide between alternatives."
He also started the thread "Game score errors and discrepancies" on the English Chess Forum.
Thanks to Tim Harding's reports, I received a call from a well-known chess journalist. During and especially after the conversation I feared that the PGN comparisons would be used to denounce errors and omissions of the database providers. As someone who has collected as many different databases as possible over the years and has perhaps done more database comparisons than anyone else, I would like to share part of my perspective here.
Today's PGN Comparer was definitely written to find mistakes in historical game-scores. Comparing two or more databases for a player or tournament quickly provides a list of inconsistencies. Except in rare cases, any inconsistency also means an error in at least one of the databases. But that doesn't imply that the digitization by the database producer is responsible. For example, if one database draws on a German- or English-language source and another on a Russian source, then the sources themselves can make the difference. And they usually do.
Remember how many different algebraic and descriptive chess notations were used in the past, and anyone who has ever recorded a game from an old source knows how quickly mistakes can occur.
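To make the idea of such a comparison concrete, here is a minimal Python sketch. It is not the Chess Suite's actual method, which is not described here. It assumes that each database has already been reduced to a mapping from a shared game key to a list of moves in one common notation; the names `first_divergence`, `compare_databases` and `ply_label` are invented for this illustration.

```python
from typing import Dict, List, Optional, Tuple

Moves = List[str]  # assumption: moves already normalized to one notation


def first_divergence(a: Moves, b: Moves) -> Optional[int]:
    """Return the 0-based ply at which two scores first differ, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One score is a prefix of the other: they diverge where the shorter one ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def compare_databases(db1: Dict[str, Moves], db2: Dict[str, Moves]) -> List[Tuple[str, int]]:
    """List (game key, ply) for every game present in both databases whose scores differ."""
    report = []
    for key in sorted(db1.keys() & db2.keys()):
        ply = first_divergence(db1[key], db2[key])
        if ply is not None:
            report.append((key, ply))
    return report


def ply_label(ply: int) -> str:
    """Turn a 0-based ply into a move label such as '2.' (White) or '2...' (Black)."""
    move_no = ply // 2 + 1
    return f"{move_no}." if ply % 2 == 0 else f"{move_no}..."
```

A literal comparison like this flags every textual difference, including harmless notation variants, and it says nothing about which version is correct. That decision still requires the printed sources.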
It is the number of different publications of a game that determines the probability of differing game-scores! This can be demonstrated statistically. If you compare the games of Lasker, Capablanca and Alekhine as given by the most reliable databases of ten years ago, you will find high percentages of differing game-scores, whereas for ordinary grandmasters, whose games were published less frequently, these percentages fall rapidly.
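A toy model may help to see why the number of publications matters so much. The following Python sketch assumes, purely for illustration, that every published version of a game deviates from the original score independently and with the same probability `p_error`. Real sources copy from one another, so this is a simplification and not the statistics behind the statement above; the function name is invented.

```python
def prob_any_deviation(n_versions: int, p_error: float) -> float:
    """Probability that at least one of n independent versions deviates from the original.

    Assumption (hypothetical model): each version is wrong with probability
    p_error, independently of all the others.
    """
    return 1.0 - (1.0 - p_error) ** n_versions


if __name__ == "__main__":
    # Compare a rarely published game with a frequently published one.
    for n in (2, 5, 10, 20):
        print(n, round(prob_any_deviation(n, 0.05), 3))
```

Even with a small error rate per publication, the chance that the versions do not all agree rises quickly with the number of versions, which matches the pattern observed for the most frequently published players.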
Is it the job of the database producers to determine the historically correct version of each game? After all, some of them are selling a product. Nevertheless I think it is unrealistic to expect this. It would already help a lot if they were more aware of the problem. The often criticized chessgames.com database is doing well here and has been giving users the opportunity to report errors for years. As a result, it achieves the best quality where inspections have taken place.
The PGN Comparer was actually written as a tool for eventually creating a new type of database. In this intended database, the games and the sources (i.e. chess magazines, columns, books, etc.) would be linked to one another, where necessary with different versions of the moves or other data. This would make such a database a meta-index for all historical chess sources: for any given game, it should yield a list of all sources in which that game was published. This project would need the support of several people, and I can only be one of them.
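How games and sources might be linked in such a database can be sketched in Python as follows. This is only one possible layout under my own assumptions, not an existing design: the class names `Source` and `GameRecord`, their fields, and the placeholder data in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Source:
    """A printed source: a magazine issue, a newspaper column, a book, etc."""
    title: str
    year: int
    kind: str  # e.g. "magazine", "column", "book"


@dataclass
class GameRecord:
    """One historical game together with every version of its score."""
    white: str
    black: str
    event: str
    # Each distinct version of the moves maps to the sources that print it.
    versions: Dict[Tuple[str, ...], List[Source]] = field(default_factory=dict)

    def add_publication(self, moves: Sequence[str], source: Source) -> None:
        """Register that `source` prints the game with exactly these moves."""
        self.versions.setdefault(tuple(moves), []).append(source)

    def sources(self) -> List[Source]:
        """All sources in which the game is published, whatever the version."""
        return [s for srcs in self.versions.values() for s in srcs]

    def has_discrepancy(self) -> bool:
        """True if the sources do not all agree on the score."""
        return len(self.versions) > 1
```

For example, after `rec = GameRecord("Player A", "Player B", "Example Tournament")` and two calls to `rec.add_publication(...)` with different move lists, `rec.sources()` lists both sources and `rec.has_discrepancy()` is true. Storing the versions per game, rather than one score per game, is what would distinguish this layout from an ordinary game database.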
---
Two more things, one limitation and one fun fact.
1. What about the games of Kasparov or Carlsen? Well, what I said above applies in part to Kasparov, but not to Carlsen.
Since the second half of the 1990s, the number of differing game-scores has declined drastically. Nowadays, I assume, this is due to the electronic recording of the games. That the effect set in much earlier than electronic recording probably has something to do with the founding and success of Mark Crowther's The Week in Chess.
2. Above I spoke about reliable databases twice. Are there really bad databases? Yes, there are. You may be familiar with Edward Winter's article "A Chess Database".
I was only able to get hold of this database in the 1998 version, and I can tell you that it is downright fun to let the PGN Comparer loose on it.
On a more serious note, I can add that the comparison results provide some evidence that this database was probably merged from exactly six others, with little post-processing, if any.
---
Addendum: Since this page contains time references such as "10 years ago," it would have been useful to date it when writing. Now a date can only be determined approximately: the text was written sometime in the second half of 2021.