I tried to improve the genetic programming approach.
Player One uses a set of weightings that give different importance to the various ways of evaluating a position. The program then generates ten random sets of weightings around Player One's current values for Player Two to use, and Player One plays twenty games against each of the ten mutated sets. If there is a clear winner, Player One adopts those weightings and the process continues with ten new mutated sets based around them. If there is no clear winner, the variability is increased slightly and the whole thing is tried again with fresh mutations.
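Roughly, the loop amounts to something like this in Python. This is just a sketch, not my actual code: play_game, the win margin, and the Gaussian mutation step are all placeholder assumptions, and I haven't really pinned down what should count as a "clear winner" yet.

```python
import random

GAMES_PER_CANDIDATE = 20
CANDIDATES = 10
WIN_MARGIN = 5  # net wins needed to count as a "clear winner" (arbitrary)


def mutate(weights, variability):
    """Return a new set of weightings jittered around the current ones."""
    return [w + random.gauss(0, variability) for w in weights]


def evolve(weights, play_game, variability=0.1, generations=100):
    # play_game(a, b) is assumed to exist elsewhere and return 1 if the
    # first set of weightings wins, -1 if it loses, 0 for a draw.
    for _ in range(generations):
        candidates = [mutate(weights, variability) for _ in range(CANDIDATES)]
        best, best_score = None, 0
        for candidate in candidates:
            # Net wins for this candidate over Player One's current weightings.
            score = sum(play_game(candidate, weights)
                        for _ in range(GAMES_PER_CANDIDATE))
            if score > best_score:
                best, best_score = candidate, score
        if best is not None and best_score >= WIN_MARGIN:
            weights = best          # clear winner: adopt its weightings
        else:
            variability *= 1.1      # no clear winner: widen the mutations
    return weights
```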
I left it running overnight to see if it would improve dramatically.
It seems to have varied the weightings a lot, but not really in any one direction. Looking at the logs, I can see one particular value going up and then back down again, repeatedly. This could mean there are several different approaches to playing, some of which have slight advantages over others, but nothing decisive.
Or it could be like stone-paper-scissors played as if there were a single best choice. Playing stone all the time leads your opponent to play paper, but if you then adopt paper as the winning solution they switch to scissors; when you play scissors they play stone, and together you go round and round.
Not sure what to do about this, but I've added "Read a book on genetic algorithms" to my ever-increasing to-do list, along with "find out about alpha-beta pruning next time" (as recommended by El Walrus).