I was happy to see that my earlier post on online cheating sparked a lively debate on L19. So far, several different measures for combatting online cheating have been proposed; personally I took a liking to the suggestion of proxy tournaments, described here in Marcel Grünauer’s post.
In unproctored tournament settings, it is always possible for a player to find a way to cheat.
When combatting cheating, it seems to me that the most important question is whether the prevention mechanism is appropriate to the scenario at hand.
Besides the chosen cheating prevention mechanism, there should of course be a team of experienced referees who judge and analyse alleged cheating cases.
To my surprise, the general opinion on L19 seemed to be that it is not possible to catch cheaters from a small dataset, such as just one or two games. While smart cheaters are indeed hard to catch, in my experience a majority of cheating cases are fairly obvious. Let me introduce you to three anonymous examples.
In the first case, I was approached on ogs by a player with a Japanese username that translated to ‘god of go’. The player put on a surprisingly humble front, asking for a teaching game with a professional player. When I checked their profile, I noticed that the username itself was actually one famously used by a Japanese top player on Asian servers.
I was getting suspicious, so I checked the player’s game list and noticed they were playing an opponent whom I knew to be a strong Japanese professional. I then checked that game and saw that the professional was losing badly; the ‘god of go’ had even gone so far as to say ‘your play is bad, there is no way you can win’. I mentally gave them a 99.99% prior probability of cheating, on the grounds that I found it inconceivable that a top-level human player would behave so badly.
Next I fired up my KataGo and checked the content of the game. On Lizzie, my current gui of choice, the ai’s #1 move candidate shows up on the board as a blue circle. Of the 50 moves our humble cheater played, 46 moves hit the blue circle, and two of the moves that didn’t were White’s first and second move.
Personally I think it would be more likely for me to win the jackpot in the Finnish national lottery than to hit 46 out of 50 blues in a game; but I cannot deny the ‘possibility’ that there could be a human stronger than me, more familiar with the ai than I am, who also has spare time to play around on ogs. Even then, my most generous estimate for the odds of this came to around 0.01% – and that’s before taking the prior 0.01% into account. Without thinking too much about it, I reported this ‘god of go’ to ogs admins.
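Just to give a rough sense of scale, here is a back-of-envelope sketch in Python. The per-move match probability of 0.7 for a very strong human, and the assumption that moves match independently, are illustrative guesses of mine rather than measured figures.

    # Back-of-envelope estimate: probability of matching the ai's #1 candidate
    # on at least 46 of 50 moves, if each move independently matches with
    # probability p. Both p = 0.7 and the independence assumption are
    # illustrative guesses only.
    from math import comb

    def prob_at_least(k, n, p):
        # P(X >= k) for X ~ Binomial(n, p)
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    print(prob_at_least(46, 50, 0.7))   # about 1.7e-4, i.e. roughly 0.02%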
The second case was even clearer, if possible. This time I was not approached by the god of go, but by Sai himself from Hikaru no Go (the exact username was not ‘Sai’, but directly referred to him). Sai, too, wanted a teaching game with me.
Sensing a pattern, I checked Sai’s list of games and saw they had a correspondence game going on with ‘god of go’. Analysing that game, I saw that Sai was actually winning even though the opponent was again obviously cheating – most of both players’ moves were in the blue. Let’s get this straight: here was a player who was beating an opponent probably three stones stronger than me, asking me for a teaching game.
This time, as an experiment, I first tried to get Sai to admit their cheating, but only got the all-too-common excuse along the lines of ‘I just studied with the ai really hard these past six months’. Finally, I just pressed the report button.
The first two cases were easy to figure out because of their context; but what if all we have is a game record or two, with no context? In the third case, I’m looking at the first of the four game records uploaded in this post in a related thread on L19.
I tried to replicate the method that Antti applied to analyze the last game of this post on one of my own games, one I recently played where I had a nice flow and I know I didn't cheat :-)
These are the statistics (perhaps skewed towards more common moves since they are my own):
46 moves:
26 blue
11 green
3 yellow
5 red
1 no color
14 obvious
28 common
3 conceivable
1 surprising
Of obvious: All blue
Of common: 12 blue, 11 green, 3 yellow, 2 red
Of conceivable: 2 red, 1 no color
Of surprising: 1 red
Now, I'm not 5 kyu, and this game had 40 minutes of main time. But the stats are not too different from what was listed in the post.
Would people suspect me of using AI if they saw this game?
If it's not that uncommon for me to get a lot of blues, and that is the case for a group of people, how unlikely is it that someone happens to play 4 games with high occurrence of blues?
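To put rough numbers on that question (purely illustrative: the per-game probability q is hypothetical, and the independence assumption is exactly what's doubtful, since a player's style carries over from game to game):

    # Purely illustrative: if an honest player produces a 'high-blue' game with
    # probability q, and games were independent (they aren't -- style persists
    # across games), four such games in a row would have probability q**4.
    for q in (0.05, 0.10, 0.20):
        print(f"q = {q:.2f}: four such games in a row = {q**4:.6f}")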
Be very careful with this kind of analysis, especially when you include opening moves and typical joseki patterns. You may be able to detect that a 5 kyu plays 'above their level', but who is to say that some 5 kyus are not well versed in opening and AI patterns? At what rank is it 'allowed' to play like that then?
And can you distinguish between a player who is sandbagging and one who is using AI?
I would guess that when you are sandbagging, it can become easier to play 'many blues' because your opponent does not challenge you at your level: you know the punishments for their typical mistakes.
Thanks for trying out the analysis! Could I ask you to also post the game so I can have a look at it? You can post game records on the Forum page.
I definitely agree that the decision shouldn’t rest on the ‘colour count’ only – at least, unless the ratio of blues is starting to be in the range of 90% or more.
As you say, there are types of blue moves that should basically be left out from the count: e.g., standard opening moves, joseki moves, and obvious moves such as saving stones from atari. Besides using these kinds of moves (including new AI-style middle-game joseki such as shoulder hits and attachments), I have found it difficult to ‘play like an AI’, as the AI constantly chooses its moves in a whole-board context. Often, the AI will switch elsewhere in what seems like the middle of a sequence, or totally twist a common sequence into a new direction. Unfortunately, when the non-qualifying moves are removed from the count, the amount of interesting data may run low.
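For illustration, here is a minimal sketch of what removing the non-qualifying moves before counting could look like. The move categories and the example data are hypothetical; a real tool would need its own way of tagging opening, joseki, and forced moves.

    # Sketch: share of 'blue' (ai-first-choice) moves after discarding moves
    # that shouldn't count as evidence. Categories and example data are hypothetical.
    EXCLUDED = {"opening", "joseki", "forced"}   # e.g. saving a stone from atari

    def filtered_blue_ratio(moves):
        # moves: list of dicts like {"colour": "blue", "category": "middlegame"}
        counted = [m for m in moves if m["category"] not in EXCLUDED]
        if not counted:
            return None   # nothing left to judge -- the data set ran out
        blues = sum(1 for m in counted if m["colour"] == "blue")
        return blues / len(counted)

    example = [
        {"colour": "blue", "category": "opening"},
        {"colour": "blue", "category": "joseki"},
        {"colour": "blue", "category": "middlegame"},
        {"colour": "green", "category": "middlegame"},
        {"colour": "red", "category": "endgame"},
    ]
    print(filtered_blue_ratio(example))   # 1 blue out of 3 counted moves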
One more important analysis method, which I actually didn’t use in this post, is to proportion the types of moves by the score of the game. Smart cheaters will only use the AI when it’s necessary, i.e., when the game is relatively close; once the cheater has a safe lead, they will start playing suboptimally to hide their tracks. The only method I have so far thought of for catching this is to search for moves such as 61 in this game, which clearly seem to have been played by a different player than the rest of the content – and since this data set is small, the proportion of ‘noise’ grows even bigger (who can prove, for instance, that 61 wasn’t a misclick?).
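A similar sketch for proportioning the count by the score of the game, assuming we have the winrate before each move; the ±15% ‘close game’ threshold is an arbitrary choice of mine.

    # Sketch: only count moves played while the winrate was still close to 50%,
    # since a smart cheater has little reason to consult the ai once the game
    # is decided. The threshold and the data layout are assumptions.
    CLOSE_GAME = 0.15   # count moves while the winrate is within 50% +/- 15%

    def blue_ratio_in_close_positions(moves):
        # moves: list of dicts like {"colour": "blue", "winrate_before": 0.48}
        counted = [m for m in moves if abs(m["winrate_before"] - 0.5) <= CLOSE_GAME]
        if not counted:
            return None   # the game was never close -- nothing to judge
        return sum(1 for m in counted if m["colour"] == "blue") / len(counted)

    print(blue_ratio_in_close_positions([
        {"colour": "blue", "winrate_before": 0.52},
        {"colour": "red", "winrate_before": 0.50},
        {"colour": "blue", "winrate_before": 0.85},   # game already decided, not counted
    ]))   # 1 blue out of 2 counted moves -> 0.5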
As I wrote in the post, it would be hard for me to convict the black player of cheating just from this one game. Luckily, there were three other games that were just as suspicious, which were enough to convince me. If the three other games didn’t exist and I really ‘had’ to make a judgement, I would try to arrange for a setting where the black player could prove their playing strength (e.g., have them play a game during a video call, video preferably taken from the side, with screen sharing added).
I uploaded the game - perhaps the move I labeled as surprising should be 'conceivable'. But it did surprise my opponent and the person who reviewed the game afterwards.
Also some moves were ambiguous between green/blue and green/yellow, since the evaluation changes depending on the number of playouts. I have a strong PC and didn't analyze based on a fixed number of playouts; i.e., a move could be blue with, say, 2,000 playouts and then become green at 10,000, or vice versa.
Thank you!
I tried to replicate the same kind of analysis on your game as I did on this ‘alleged cheater’. I am of course not saying your analysis was no good; rather, I made my own analysis as a comparison. Also, since we probably have different ideas on what constitutes ‘obvious’ or ‘surprising’ (and because our AIs are different), my analysis here is probably more in line with the one I made in the post.
Here is what I got for your game:
Blue moves: 31
Green moves: 4
Yellow moves: 1
Red moves: 10
Obvious moves: 10
Common moves: 32
Conceivable moves: 1 (move 82)
Surprising moves: 3 (moves 74, 78, and 84)
Of obvious: all blue (10)
Of common: 21 blue, 4 green, 1 yellow, 6 red
Of conceivable: all red (1)
Of surprising: all red (3)
While the blue counts are the same, they are all for obvious or common moves; most of the content up to move 92 was set sequences. Your red count, on the other hand, is somewhat bigger. Still, just from the data it is hard to tell the two games apart.
While writing this, I do have a general feeling that your game was more ‘human-like’ than that of black in the post, but this could very well just be me being biased. Also, I find it hard to articulate why I feel so. It may be that I am using some ‘clear human moves’ as an anchor; for instance 68, 74, 78, and 84. An AI would not pick any of these moves, and I also wouldn’t expect an AI-using cheater to pick 68 or 78. Also, by now the data set is growing increasingly small, so the trustworthiness of the whole analysis is starting to be at stake.
If anything, we can admire the difference in what we count as 'surprising', 'conceivable', etc.
I am surprised you categorize 84 as surprising. While you may have expected something else, replying to the hane with this pull-back nobi (I don't remember the term from your post) should be at least conceivable shape-wise. Guess it comes down to interpretation.
I was finding it difficult to choose between red and yellow at times.
'Valid move candidate' - is that valid by human standards, or valid because katago looked at it?
Also should we interpret uncoloured as what comes after red in the hierarchy?
Blue+green differs only by 1 in the two examples. If we think about the conclusion that the 'alleged cheater' gives rise to suspicion and my game does not, then we have to realize that it's not the statistics of similarity that decide whether or not we believe a player was cheating (at least not at this magnitude), but rather subjective nuances/gut feeling.
One thing I would look for when trying to identify a cheater, apart from overwhelming similarity, is how many 'surprising' or 'difficult' blue moves appeared in the game. When the opponent keeps hitting 'blue' spots that you wouldn't even think of, then something might be up.
I watched Blackie stream a bit this weekend, and he played some games late at night where most likely all his opponents were using AI. In those games, more often than not you saw reactions like "Oh here? He can do that?" or "Ah right! I'm in trouble", i.e. moments where he was surprised by the sharpness of his opponent.
You’re right in that 84 should probably be classified as a ‘common’ move. I was definitely surprised when I saw it, however: the previous attachment of 82 seemed to me to imply that White would cut if Black responded with a hane, but White instead drew back. I seem to be looking at White’s play as a plan or a sequence rather than as single moves; so, when the follow-up isn’t what seems logical to me, the play as a whole seems inconsistent (hence my being surprised).
‘Valid move candidate’ was my interpretation. It’s not the AI’s best move or even the next move, but yellow moves are usually moves that don’t lose too much in terms of winrate. If, for example, a student played a yellow move in a game I was commenting, I would usually not criticise it.
As you say, ‘surprising good moves’ are one signal of a possible cheater. I have tried to look into this further in my anti-cheating research, and I have found a few positions where the AI basically only considers one move which doesn’t seem obvious at all to human eyes. Unfortunately, such moves seem fairly rare, so it can be hard to dedicate a whole anti-cheating program to them.
Perhaps a subgroup of ‘surprising good moves’ are ‘surprising not-bad moves’ – meaning moves that a strong human player usually couldn’t think of playing, yet which don’t significantly affect the winrate negatively. So far I have seen several cases of such moves that seemed to suggest a cheater. The obvious issue is in judging what ‘a human player cannot think of playing’.
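As a very rough sketch of how one might search for such moves automatically – assuming the analysis engine reports a winrate loss and a policy prior for each move, as KataGo’s analysis output can, and with thresholds that are guesses rather than calibrated values:

    # Sketch: flag 'surprising not-bad moves' -- moves that lose little winrate
    # but get a very low policy prior, a crude proxy for 'a human would not
    # think of this'. Thresholds and the data format are assumptions.
    MAX_WINRATE_LOSS = 0.03   # at most 3 percentage points below the engine's best move
    MAX_PRIOR = 0.01          # policy prior below 1%

    def surprising_not_bad(move):
        # move: dict like {"winrate_loss": 0.01, "policy_prior": 0.004}
        return (move["winrate_loss"] <= MAX_WINRATE_LOSS
                and move["policy_prior"] <= MAX_PRIOR)

    game = [
        {"move": 34, "winrate_loss": 0.01, "policy_prior": 0.004},
        {"move": 35, "winrate_loss": 0.00, "policy_prior": 0.350},
        {"move": 61, "winrate_loss": 0.12, "policy_prior": 0.002},
    ]
    print([m["move"] for m in game if surprising_not_bad(m)])   # [34]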
Hello Antti, I am the original poster of the debate on L19. It's great that we are talking about the very real online cheating issue, which is likely to get worse. Communication is very helpful, but it's certainly not enough; action is needed too.

The Corona Cup rules said "The main organizer is experienced enough to detect the cheating!" and "These rules will be applied very strictly". And then you as an organizer took no measures against cheating because "we figured we have to be extremely sure of a case – even a 99% subjective probability might not be good enough". See the contradiction? You were first assuring honest players that they would be protected from cheaters and then let them be cheated. That is not ok.

I'm quite sure we both investigated one case related to the Corona Cup, and it was pretty obvious. I removed his KGS rank, and then his teacher, a very strong player, vouched for him, so I gave it back. It was a mistake; meanwhile I learned from other cases that teachers will often wrongly vouch for their students. Hopefully we, whether players, tournament organizers or server admins, will quickly learn from our mistakes of letting cheaters cheat and start to take action appropriate to our roles.
Thanks for your comment, Adin!
As you say, we ‘didn’t take measures against cheating’ in the sense that we did not finally convict any of our suspects. I hope, however, that you aren’t thinking that we did not do anything at all; Lukas and I were constantly following the tournament results (and individual games, whenever feasible) for anomalies. A few cases were suspicious enough that we approached the players and asked them about the fact that their level of play did not seem to match their announced rank. As our evidence was only in terms of ‘likelihood of the player’s having cheated’, we of course gave the players a chance to defend themselves, as that way we could get a more complete idea of the case at hand.
The suspect you mentioned originally looked like a fairly clear case, but the suspect had provably streamed on Twitch at the time they were playing, and their teacher also gave us contextual information regarding how the suspect could have been able to play certain more complicated sequences. These two were at least weak evidence against the suspect’s having cheated, so we decided to establish the suspect’s playing strength by having them play a game online during a video call with screen sharing. This showed that the player’s real-life rank was not accurate, again working as evidence against cheating (i.e., increasing the likelihood of the player’s having played strongly by themselves).
While the above anti-cheating process was the most extensive one we carried out during the tournament, there were a few other cases that took us almost as much effort. After the experience of having worked at the tournament, I am now having second thoughts about whether a ‘99% subjective probability’ is appropriate or not, but I do stand by our procedure and decisions during the Corona Cup. I still think that organisers have to be careful about convicting players of cheating, as in the current climate getting branded a cheater (especially unjustly) can significantly change a player’s whole go career – especially when a player is convicted under their real name, rather than an alias on a go server.
I was not aware of the extent of effort you took to investigate that case; it is really impressive. The problem is that we usually can't afford to do this. If online tournament organizers/admins need many hours, including video conferences, playing games with suspects, etc., just for one case, then it will never work. Even after all this effort one can't really know the truth for sure.
Instead we have to accept a small error rate. We are not "convicting" anyone; nobody is going to prison. We are simply saying to him that, in this one online tournament, his results have been so far removed from normal that we will not accept them. But he is free to play in any next live or online tournament. Or, in the case of Go servers, that his account will have its rank disabled, but he is free to create another account and play again immediately.
Now obviously this is not to be taken lightly: the player's reputation may suffer, and the organizer's reputation may also suffer if he is wrong. But this is something we must accept, just because the alternative is much worse! That alternative is losing a lot of honest players like yourself. If I'm not wrong, there was also at least one case of an honest Corona Cup player quitting the tournament because of this. So what is better: losing many honest players, or accepting that there's a 1%-5% error rate, and that from those rare cases another rare minority would quit playing because they are offended? Doing the math: with, let's say, a 5% wrong-conviction rate, out of 200 accused players 10 are honest (and 190 are cheaters); if 10% of those 10 quit, that is exactly 1 honest player who will not play online Go anymore. Keep in mind that probably there have not been 200 accused players in the whole world until now, so statistically we are yet to lose that one honest player. But how many honest players did we lose already because, like yourself, they got tired of being cheated?
Indeed, Lukas and I also quickly found out that this procedure of ours was not very sustainable in the long run – and definitely impossible to apply on go servers outside of tournaments.
As you say, similarly to the chess world, the go world probably also has to accept that anti-cheating mechanisms cannot be perfect and may cause false positives from time to time. Right now many people (actually me included) may dislike this idea, but it doesn’t seem like there is going to be a better option available in the near future. Hopefully, once it has become more common for players to get caught and penalised for cheating, people will feel less strongly about it.