Eivind Kjosbakken - freeCodeCamp.org

How to Fine-Tune EasyOCR with a Synthetic Dataset

Eivind Kjosbakken — Fri, 05 Jan 2024 17:48:17 +0000

OCR is a valuable tool that you can use to extract text from images. But the OCR you are using may not work as intended for your specific needs. In such situations, fine-tuning your OCR engine is the way to go.

In this tutorial, I will show you how to fine-tune EasyOCR, a free, open-source OCR engine that you can use with Python.

Prerequisites
How to Install Required Packages
How to Clone the Git Repository
How to Get a Dataset
How to Generate your Synthetic Dataset
How to Convert the Dataset to LMDB Format
How to Retrieve a Pre-trained OCR Model
How to Run the Fine-tuning
How to Run Inference with your Fine-tuned Model
A Qualitative Test of Performance
Quantitative Test of Performance
Conclusion

Prerequisites

Basic knowledge of Python.
Basic knowledge of how to use the terminal

How to Install Required Packages

First off, let's install the required pip packages. I recommend making a virtual environment for this, though it is not required.

Run the commands below, one line at a time:

pip install fire
pip install lmdb
pip install opencv-python
pip install natsort
pip install nltk

You also need to install PyTorch from this website (choose your specifications and copy the pip install command. The command below is for my specifications). You can choose either the GPU version or the CPU version. The difference is that running the fine-tuning process will be slower on the CPU.

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
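Before moving on, you may want to confirm that the packages above can actually be imported. The snippet below is a minimal sketch that only uses the standard library. The helper name find_missing_packages is hypothetical, and note that opencv-python is imported under the name cv2.

```python
import importlib.util

# Import names for the packages installed above
# (opencv-python is imported as cv2).
REQUIRED_MODULES = ["fire", "lmdb", "cv2", "natsort", "nltk", "torch"]


def find_missing_packages(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing_packages()
print("Missing packages:", missing if missing else "none")
```

If the list is not empty, install the missing packages in the same virtual environment you are running Python from.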

How to Clone the Git Repository

You'll need a Git repository that will help you run the fine-tuning. Clone this Git repo with the command below:

git clone https://github.com/clovaai/deep-text-recognition-benchmark

The deep-text-recognition-benchmark GitHub repo will give us some useful files for fine-tuning the EasyOCR model. Note that some of the terminal commands used in this article were taken from the repository and then adapted to my needs, so the repository is worth a read.

I would like to add a note here that Clova AI on GitHub has a lot of good repositories that have been of immense help to me, so feel free to check out other interesting repositories that they have.

Another interesting repo they have is the Donut model repo, and I have written an article on fine-tuning the Donut model that you should check out.

How to Get a Dataset

Before you can fine-tune your OCR, you'll need a dataset. You can either download a dataset or make one yourself.

Since I want my OCR to be particularly good at scanning supermarket receipts, I will make a dataset of items you can find in the supermarket, but feel free to make a dataset from whatever data you need your OCR to be good at. For this section, I made use of this GitHub page.

If you want to learn how to generate your own dataset, you can go to the next section right away, but if you want a simpler solution then you can use one of the options below:

Option 1 – Use my dummy dataset:

If you want to have this step as simple as possible (recommended if you are just testing), you can download a dummy dataset. I have made and uploaded one to this Google Drive (download the whole folder).

Option 2 – Download a dataset

If you want a larger dataset, you can download a dataset from this Dropbox page by downloading the data_lmdb_release.zip file (note that it is a bit over 18GB in size).

How to Generate your Synthetic Dataset

If you want a cooler approach to creating your own dataset, you can follow along with this section. I originally wrote about it in this Medium article.

For this section, you should use a separate Python file.

The great thing about a synthetic dataset is that you don't need any labor-intensive labeling, as you are creating the images based on provided textual descriptions. This means that you have both the input to the model (the image) and the label (the text of the images), the two components needed to fine-tune an AI model.

Make synthetic images like this by following this section

Clone the Synthetic Generation Repo

First, you have to clone this synthetic data generation repository to be able to create synthetic data. To clone it, open a new folder, and run this command:

git clone https://github.com/Belval/TextRecognitionDataGenerator.git

This repository allows you to create images from a given text description. You will then have the dataset you need: images, and a txt file stating the text on the images (the label).

Create a File to Generate the Synthetic Data

Now create a new file called generate_synth_data.py, and add the code below to import the useful packages:

from trdg.generators import (
    GeneratorFromStrings,
)
from tqdm.auto import tqdm
import os
import pandas as pd
import numpy as np
import random

To run them, you need these pip installations (run one line at a time in the terminal). Note that a specific Pillow version is needed (you will get an error if you have the newest Pillow version):

pip install trdg
pip install pandas
pip install Pillow==9.5.0

Next, define some hyperparameters (set them to whatever values you prefer):

NUM_IMAGES_TO_SAVE = 10
NUM_PRICES_TO_GENERATE = 10000

Now you need a large dataset with words you want to have on the images you create. Since I want my OCR to be good at reading supermarket receipts, I used Openfoodfacts, which is a website that contains a lot of supermarket items.

To make it as simple as possible, you can use the CSV file on this Google Drive page (just download it and place it in your folder).

Note that you can make use of any other data, instead of using mine. If you want to use your own data, all you need is a list of strings, which you can feed into the generator to create images.

Here's how you can read the CSV file containing supermarket items:

# helper funcs and data to generate images
df = pd.read_csv("openfoodfacts_export_csv.csv", on_bad_lines='skip', sep='\t', low_memory=True)
df[["product_name_nb", "generic_name_nb", "brands"]]
all_words = df[["product_name_nb", "generic_name_nb", "brands"]].to_numpy().flatten()

Here I am loading in my own data, but the code will look different if you are using your own data.

Here's how you can filter the data:

# ignore np nan 
num_before = len(all_words)
all_words = [x for x in all_words if str(x) != 'nan']
after_nan_filter = len(all_words)
print("removed: ", num_before - after_nan_filter, "words because of nan values")
all_words = list(set(all_words))
print("Removed", after_nan_filter - len(all_words), "duplicates")
print("Current number of words: ", len(all_words))

Note that I am always printing the number of words removed in the filtering process. This is good practice, as it gives you a better overview of the size and quality of your dataset.
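If your own data is a plain Python list rather than a pandas DataFrame, the same filtering idea can be sketched with the standard library only. This is an illustrative sketch, not the code used in this tutorial: filter_words is a hypothetical helper, and the sample words are made up.

```python
import math


def filter_words(words):
    """Drop NaN entries and duplicates, reporting how many of each were removed."""
    # Missing values read from a CSV typically show up as float('nan')
    not_nan = [w for w in words if not (isinstance(w, float) and math.isnan(w))]
    num_nan = len(words) - len(not_nan)

    # dict.fromkeys removes duplicates while keeping the first-seen order
    unique = list(dict.fromkeys(not_nan))
    num_duplicates = len(not_nan) - len(unique)

    print(f"Removed {num_nan} NaN values and {num_duplicates} duplicates")
    print(f"Current number of words: {len(unique)}")
    return unique, num_nan, num_duplicates


words, num_nan, num_duplicates = filter_words(
    ["Melk", float("nan"), "Brød", "Melk", "Ost"]
)
```

Unlike list(set(...)), this keeps the words in their original order, which makes runs easier to compare.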

I also want to have a price on the images, so I am randomly generating some prices with the code below:

#randomly generate prices below 100: two numbers between 1-99, one on each side of the comma
number_strings = []
for i in range(len(all_words)*9//10): #90 percent of all words
 digits = np.random.randint(1, 100, 2) #upper bound is exclusive
 before_comma = f"{str(digits[0])}" #before comma is just given as 1 digit if 1-9
 after_comma = f"{str(digits[1])}" if len(str(digits[1])) == 2 else f"0{str(digits[1])}"
 number_string = f"{before_comma},{after_comma}"
 number_strings.append(number_string)

#then create 10 percent of the words with price between 100-999
for i in range(len(all_words)*1//10): #10 percent of all words
 before_comma = np.random.randint(100, 1000, 1) #upper bound is exclusive
 after_comma = np.random.randint(1, 100, 1)
 after_comma = f"{str(after_comma[0])}" if len(str(after_comma[0])) == 2 else f"0{str(after_comma[0])}"
 number_string = f"{str(before_comma[0])},{after_comma}"
 number_strings.append(number_string)
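The zero-padding of the digits after the comma can also be expressed with a format specifier. The sketch below is an alternative written with Python's random module instead of NumPy. The function names format_price and generate_prices are hypothetical, and the sketch differs slightly from the code above in that it allows ,00 as the decimal part.

```python
import random


def format_price(whole, cents):
    """Format a price with a decimal comma and two-digit cents, e.g. 5,07."""
    return f"{whole},{cents:02d}"


def generate_prices(num_prices, large_fraction=0.1, seed=None):
    """Generate price strings; roughly `large_fraction` of them are 100-999."""
    rng = random.Random(seed)
    prices = []
    for _ in range(num_prices):
        if rng.random() < large_fraction:
            whole = rng.randint(100, 999)  # randint includes both endpoints
        else:
            whole = rng.randint(1, 99)
        prices.append(format_price(whole, rng.randint(0, 99)))
    return prices


print(generate_prices(5, seed=0))
```

Passing a seed makes the generated prices reproducible, which helps when you want to regenerate the same dataset later.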

The code below randomly combines the supermarket items with the prices:

#now given word list and number list, get all combinations
all_combinations = []
for word in tqdm(all_words):
 for number in random.sample(number_strings, 20): #only need 20 prices per product for example
  for num_tabs in [1]:
   combined_string = word + "    "*num_tabs + number
   all_combinations.append(combined_string)

Use the repository you cloned earlier to create the images from the list of strings we have created:

#generate the images
generator = GeneratorFromStrings(
    random.sample(all_combinations, 10000),

    # uncomment the lines below for some image augmentation options
    # blur=6,
    # random_blur=True,
    # random_skew=True,
    # skewing_angle=20,
    # background_type=1,
    # text_color="red",
)

There are a lot of options for generating the data, which you can read more about here. Some examples are: changing the background, adding blur, and adding skewing. You can try this out by uncommenting some of the lines in the code snippet above.

Then save the images from the generator to a specific format:

# save images from generator
# if output folder doesnt exist, create it
if not os.path.exists('output'):
    os.makedirs('output')
#if labels.txt doesnt exist, create it
if not os.path.exists('output/labels.txt'):
    f = open("output/labels.txt", "w")
    f.close()

#open txt file
current_index = len(os.listdir('output')) - 1 #all images minus the labels file
f = open("output/labels.txt", "a")

for counter, (img, lbl) in tqdm(enumerate(generator), total = NUM_IMAGES_TO_SAVE):
    if (counter >= NUM_IMAGES_TO_SAVE):
        break
    # img.show()
    #save pillow image
    img.save(f'output/image{current_index}.png')
    f.write(f'image{current_index}.png {lbl}\n')
    current_index += 1
    # Do something with the pillow images here.
f.close()

Generate the Synthetic Data

You can run the generate_synth_data.py file you created with this command in the terminal:

python generate_synth_data.py

You should see images similar to the one below in your output folder (your text may differ):

This image was synthetically generated

Your images will be organized in the order in the image below, where the .png files are your images, and the labels.txt file contains the text in each image. This allows you to use the dataset for fine-tuning.

The output folder structure from running the code above.

Congrats, you can now make your own synthetic dataset. Since you now have both an image and the text of that image in a labels.txt file, you can use this to fine-tune an OCR engine, which I will talk more about below.
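Before converting the dataset, it can be worth checking that labels.txt and the images agree. The sketch below assumes the layout produced above (one line per image in the form image name, a space, then the label). The helper check_dataset is hypothetical. It also reports the length of the longest label, which is useful later when choosing batch_max_length.

```python
import os


def check_dataset(folder, labels_name="labels.txt"):
    """Return (missing_images, longest_label_length) for a generated dataset."""
    missing = []
    longest = 0
    # Opened with the platform default encoding, matching how the
    # generation script wrote the file
    with open(os.path.join(folder, labels_name)) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Each line is "image_name label", so split at the first space
            image_name, _, label = line.partition(" ")
            if not os.path.exists(os.path.join(folder, image_name)):
                missing.append(image_name)
            longest = max(longest, len(label))
    return missing, longest


if os.path.exists(os.path.join("output", "labels.txt")):
    missing_images, longest_label = check_dataset("output")
    print("Missing images:", missing_images)
    print("Longest label length:", longest_label)
```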

How to Convert the Dataset to LMDB Format

LMDB stands for Lightning Memory-Mapped Database. It is a fast key-value store that you can use to hold your dataset when training AI models.

You can read more about it on the LMDB docs. After you have created your dataset, you should have a folder with your images, and the labels for all the images (the text in the images) in a labels.txt file.

Your folder should look similar to the image below, and should be inside the deep-text-recognition-benchmark folder:

How the folder for your dataset should look before converting to LMDB format

NOTE: Make sure you have at least 10 images in your folder. You may get an error when running the training script later in the tutorial if you have fewer images.

You have to make some changes in the create_lmdb_dataset.py file in the deep-text-recognition-benchmark folder:

Set the map_size variable to a lower value, since I was getting a disk memory error with the previous value. I set the new value for map_size to 1073741824, as can be seen below:

# OLD LINE
# ...
env = lmdb.open(outputPath, map_size=1099511627776)
# ...

# NEW LINE 
# ...
env = lmdb.open(outputPath, map_size=1073741824) 
# ...
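For reference, the old value is 1 TiB and the new value is 1 GiB, and map_size is the maximum size the database is allowed to grow to. If your dataset is larger than the one in this tutorial, you could estimate a value from the size of your image folder. This is only a rough sketch: estimate_map_size is a hypothetical helper and the headroom factor of 2 is an assumption, not a recommendation from the LMDB docs.

```python
import os

GIB = 1024 ** 3  # 1073741824 bytes, the new map_size used above
TIB = 1024 ** 4  # 1099511627776 bytes, the original map_size


def estimate_map_size(folder, headroom=2.0, minimum=GIB):
    """Estimate an LMDB map_size from the total size of the files in a folder."""
    total_bytes = sum(
        os.path.getsize(os.path.join(folder, name))
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
    # Leave headroom for keys, labels and LMDB's own bookkeeping
    return max(minimum, int(total_bytes * headroom))


if os.path.isdir("output"):
    print("Suggested map_size:", estimate_map_size("output"))
```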

I also got an error with the utf encoding, so I removed the utf-8 encoding when opening the gtFile. The new line then looks like this:

# OLD LINE
# ...
with open(gtFile, 'r', encoding='utf-8') as data:
# ...

# NEW LINE
# ...
with open(gtFile, 'r') as data:
# ...

Lastly, I changed the way imagePath was read:

# OLD LINE
# ...
imagePath, label = datalist[i].strip('\n').split('\t')
# ...

# NEW LINES
# ...
imagePath, label = datalist[i].strip('\n').split('.png')
imagePath += '.png'
# ...
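Note that splitting on '.png' leaves the separating space at the start of the label, and it fails if a label itself contains the text .png. If that matters for your data, one alternative is to split at the first space instead. This is a sketch that assumes the image file names contain no spaces, which is true for the names written by the generation script above. The function name parse_label_line is hypothetical.

```python
def parse_label_line(line):
    """Split 'image0.png some label text' into ('image0.png', 'some label text')."""
    image_path, _, label = line.rstrip("\n").partition(" ")
    return image_path, label


print(parse_label_line("image0.png Melk    12,50\n"))
```

Spaces inside the label are kept as they are, because only the first space is used as the separator.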

The create_lmdb_dataset.py file should look like this (code from this Git repo, with the changes above applied):

import fire
import os
import lmdb
import cv2

import numpy as np


def checkImageIsValid(imageBin):
    if imageBin is None:
        return False
    imageBuf = np.frombuffer(imageBin, dtype=np.uint8)
    img = cv2.imdecode(imageBuf, cv2.IMREAD_GRAYSCALE)
    imgH, imgW = img.shape[0], img.shape[1]
    if imgH * imgW == 0:
        return False
    return True


def writeCache(env, cache):
    with env.begin(write=True) as txn:
        for k, v in cache.items():
            txn.put(k, v)


def createDataset(inputPath, gtFile, outputPath, checkValid=True):
    """
    Create LMDB dataset for training and evaluation.
    ARGS:
        inputPath  : input folder path where starts imagePath
        outputPath : LMDB output path
        gtFile     : list of image path and label
        checkValid : if true, check the validity of every image
    """
    os.makedirs(outputPath, exist_ok=True)
    env = lmdb.open(outputPath, map_size=1073741824) #TODO Changed map size
    cache = {}
    cnt = 1

    with open(gtFile, 'r') as data: #TODO removed utf-8 encoding here since I have norwegian letters
        datalist = data.readlines()

    nSamples = len(datalist)
    print(nSamples)
    for i in range(nSamples):
        #TODO changed the way imagePath is found as well to match my usecase
        imagePath, label = datalist[i].strip('\n').split('.png')
        imagePath += '.png'

        # imagePath, label = datalist[i].strip('\n').split('\t')
        imagePath = os.path.join(inputPath, imagePath)

        # # only use alphanumeric data
        # if re.search('[^a-zA-Z0-9]', label):
        #     continue

        if not os.path.exists(imagePath):
            print('%s does not exist' % imagePath)
            continue
        with open(imagePath, 'rb') as f:
            imageBin = f.read()
        if checkValid:
            try:
                if not checkImageIsValid(imageBin):
                    print('%s is not a valid image' % imagePath)
                    continue
            except:
                print('error occured', i)
                with open(outputPath + '/error_image_log.txt', 'a') as log:
                    log.write('%s-th image data occured error\n' % str(i))
                continue

        imageKey = 'image-%09d'.encode() % cnt
        labelKey = 'label-%09d'.encode() % cnt
        cache[imageKey] = imageBin
        cache[labelKey] = label.encode()

        if cnt % 1000 == 0:
            writeCache(env, cache)
            cache = {}
            print('Written %d / %d' % (cnt, nSamples))
        cnt += 1
    nSamples = cnt-1
    cache['num-samples'.encode()] = str(nSamples).encode()
    writeCache(env, cache)
    print('Created dataset with %d samples' % nSamples)


if __name__ == '__main__':
    fire.Fire(createDataset)

Next, move the folder over to the deep-text-recognition-benchmark folder (the Git repo you cloned). Then run the following command in the terminal:

python .\create_lmdb_dataset.py [name of data folder] [path to labels.txt in data folder] [output folder name for your lmdb dataset]

Where:

[name of data folder] is the name of your folder with images and labels.txt (output in my case)
[path to labels.txt in data folder] is the data folder name plus labels.txt (so .\output\labels.txt in my case)
[output folder name for your lmdb dataset] is the name of a folder that will be created for your dataset converted to LMDB format (I called it .\lmdb_output)

For me, this was the command (make sure to run this command inside the deep-text-recognition-benchmark folder):

python .\create_lmdb_dataset.py .\output .\output\labels.txt .\lmdb_output

Now, you should have a new folder, like the image below, in your deep-text-recognition-benchmark folder.

How the folder for your lmdb converted data should look

NOTE: Running the command on an existing folder does not overwrite the existing folder. Make sure you either delete a folder or give the lmdb_output a new name (this was something I struggled with for a while, so hopefully, this will help you avoid that error).
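If you re-run the conversion often, you can remove the old folder from Python before each run. The sketch below uses only the standard library. The helper reset_output_folder is hypothetical, and it deletes the folder permanently, so double-check the path you pass in.

```python
import os
import shutil


def reset_output_folder(path):
    """Delete an existing LMDB output folder so the conversion starts fresh."""
    if os.path.isdir(path):
        shutil.rmtree(path)
        return True
    return False


# Example: call this right before running create_lmdb_dataset.py
# reset_output_folder("lmdb_output")
```

The function returns True if a folder was removed and False if there was nothing to delete.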

How to Retrieve a Pre-trained OCR Model

Next, you need a pre-trained OCR model that you can fine-tune with your dataset. For this, you can go to this Dropbox website and download the TPS-ResNet-BiLSTM-Attn.pth model.

Place the model in your deep-text-recognition-benchmark folder. (I know this looks a bit shady, but this is part of the instructions in the deep-text-recognition-benchmark repository. The Dropbox is not mine, and I am linking it here because it is linked in the deep-text-recognition-benchmark Git repo.)

How to Run the Fine-tuning

If you run on CPU (this can be ignored if you are using GPU), you'll likely get an error that says: "RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False".

This can be fixed by changing lines 85 and 87 in the train.py file:

# OLD LINES
# ...
if opt.FT:
    model.load_state_dict(torch.load(opt.saved_model), strict=False)
else:
    model.load_state_dict(torch.load(opt.saved_model))
# ...


# NEW LINES (change to this if you are using CPU)
#
if opt.FT:
    model.load_state_dict(torch.load(opt.saved_model,map_location='cpu'), strict=False)
else:
    model.load_state_dict(torch.load(opt.saved_model,map_location='cpu'))
# ...

Finally, you can then run the fine-tuning. To do that, you can use the command below in the terminal:

python train.py --train_data lmdb_output --valid_data lmdb_output --select_data "/" --batch_ratio 1.0 --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn --batch_size 2 --data_filtering_off --workers 0 --batch_max_length 80 --num_iter 10 --valInterval 5 --saved_model TPS-ResNet-BiLSTM-Attn.pth

Some notes on the command:

data_filtering_off is set to True (you only have to use the flag, not give it a value). I did not use data filtering because I would have had no samples to train on if filtering was enabled.
workers was set to 0 to avoid errors. I think this has something to do with multi-GPU settings, and this is also referred to in the train.py file in the deep-text-recognition-benchmark folder.
batch_max_length is the maximum length of any text in the training dataset. If you are using a different dataset, feel free to change this variable. Make sure this variable is at least as large as the longest string you are using in your dataset, or you'll get an error.
For this tutorial, I use train_data and valid_data to refer to the same folder. In practice, I would create one folder with a training dataset, and one for a validation dataset and refer to those instead.
I set num_iter to 10 so you can make sure it works. Naturally, this variable must be set much higher when running the actual fine-tuning of a model.
saved_model is an optional parameter. If you don’t set it, you will train a model from scratch. You probably don't want that (as this will require a lot of training), so set the saved_model flag to the existing model you downloaded from Dropbox.
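To get separate training and validation folders, you could split the lines of labels.txt before converting to LMDB. The sketch below shows one way to do the split. The helper split_labels and the 10 percent validation fraction are assumptions for illustration.

```python
import random


def split_labels(lines, valid_fraction=0.1, seed=0):
    """Shuffle label lines and split them into (train, valid) lists."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    num_valid = max(1, int(len(shuffled) * valid_fraction))
    return shuffled[num_valid:], shuffled[:num_valid]


example_lines = [f"image{i}.png word{i}" for i in range(20)]
train_lines, valid_lines = split_labels(example_lines, valid_fraction=0.25)
print(len(train_lines), "training lines,", len(valid_lines), "validation lines")
```

You would then write each list to its own labels file, run create_lmdb_dataset.py once per file, and pass the two resulting folders to train_data and valid_data.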

How to Run Inference with your Fine-tuned Model

After you have fine-tuned your model, you'd want to run inference with it. To do this, you can use the command below:

python demo.py --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn --image_folder [path to images to test on] --saved_model [path to saved model]

Where:

[path to images to test on] is a folder consisting of PNG images you want to test on. For me, this was output
[path to saved model] is the path to the saved model from your fine-tuning. For me, this was .\saved_models\TPS-ResNet-BiLSTM-Attn-Seed1111\best_accuracy.pth (the fine-tuning saves the fine-tuned model in a saved_models folder)

Here's the command that I used:

python demo.py --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn --image_folder output --saved_model .\saved_models\TPS-ResNet-BiLSTM-Attn-Seed1111\best_accuracy.pth

The command simply outputs the model's prediction and confidence score for each image in the folder, so you can check the performance of the model by looking at the images yourself to see if the model made the right prediction. This is a qualitative test of the performance of the model.

A Qualitative Test of Performance

To see if the fine-tuning worked, I will do a qualitative test of the performance by testing the original model against my fine-tuned model on 10 specific words and numbers.

The words I tested are shown below (merged vertically into one image). I had to make it a bit difficult for the model by adding skewed and blurred texts.

Self-made images merged with https://products.aspose.app/pdf/merger/png-to-png. The words from top to bottom are: “vanskeligheter”, “uvanligheter”, “skrekkeksempel”, “rosenborg”

Considering that I want my OCR to read Norwegian supermarket receipts, I added some Norwegian words (the words are taken from http://openfoodfacts.com/, you can read more about it in this article).

Hopefully, my fine-tuned model should perform better on these words, as the original OCR model is not used to seeing Norwegian words. My fine-tuned model has been trained on some Norwegian words.

The texts in each image are:

image0 -> vanskeligheter
image1 -> uvanligheter
image2 -> skrekkeksempel
image3 -> rosenborg

Results for the original model (not fine-tuned):

Results for the original model (not fine-tuned) on a qualitative test. You can see the model struggles quite a bit

Results for fine-tuned model:

Results for the fine-tuned model. You can see the model achieves perfect accuracy because of the fine-tuning.

As you can see, the fine-tuning has worked, and the fine-tuned model achieves perfect results in this qualitative example.

To interpret your results qualitatively, you should grab a sample of documents that are representative of the full dataset and manually compare the OCR output with the ground truth. This will give you a feel for how well the model is performing, as you can see how often it makes errors.

You should note that you often cannot expect perfect results from the fine-tuned OCR engine, and you can therefore use the qualitative analysis to determine specific errors the model is making.

This could, for example, be the model having difficulties recognizing certain characters. If this is the case, you can train the model on more examples of those characters to further increase the performance of your model.
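If you have many predictions to look through, you can let Python point out where they differ from the ground truth. The sketch below uses difflib from the standard library to count the differing segments. The helper count_character_errors is hypothetical, and the example prediction is made up for illustration (it is not output from the model).

```python
from collections import Counter
from difflib import SequenceMatcher


def count_character_errors(pairs):
    """Count (expected, predicted) differing segments over text pairs."""
    errors = Counter()
    for truth, prediction in pairs:
        matcher = SequenceMatcher(None, truth, prediction, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                # Empty strings mark insertions or deletions
                errors[(truth[i1:i2], prediction[j1:j2])] += 1
    return errors


errors = count_character_errors(
    [("rosenborg", "rosenb0rg"), ("skrekkeksempel", "skrekkeksempel")]
)
for (expected, predicted), count in errors.most_common():
    print(f"expected {expected!r}, got {predicted!r}: {count} time(s)")
```

The most common entries tell you which characters to include more often in your training data.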

Quantitative Test of Performance

If you want a more quantitative test, you can either look at the validation results that show up during fine-tuning, or you can use the command below:

python test.py --eval_data [path to test data set in lmdb format] --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn --saved_model [path to model to test] --batch_max_length 70 --workers 0 --batch_size 2 --data_filtering_off

Where:

[path to test data set in lmdb format] is the path to the folder containing the test data in LMDB format. For me, this was: lmdb_norwegian_data_test
[path to model to test] is the path to the model whose performance you want to test. For me, this was: saved_models/TPS-ResNet-BiLSTM-Attn-Seed1111/best_accuracy.pth.

The command I used was therefore:

python test.py --eval_data lmdb_norwegian_data_test --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn --saved_model saved_models/TPS-ResNet-BiLSTM-Attn-Seed1111/best_accuracy.pth --batch_max_length 70 --workers 0 --batch_size 2 --data_filtering_off

This will output accuracy in percentage, so a number between 0 and 100, which is the accuracy the OCR model achieves on your test dataset.
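To make the number concrete, here is a small sketch of how such an accuracy can be computed by hand. It assumes that a prediction only counts as correct when the whole string matches the ground truth. The function exact_match_accuracy is hypothetical and the example strings are made up.

```python
def exact_match_accuracy(predictions, ground_truths):
    """Percentage of predictions that match the ground truth exactly."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, ground_truths))
    return 100.0 * correct / len(ground_truths)


truths = ["vanskeligheter", "uvanligheter", "skrekkeksempel", "rosenborg"]
print(exact_match_accuracy(truths, truths))
```

With this definition, a single wrong character makes the whole word count as an error, so the score can look harsh on long strings.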

In my experience, the model you downloaded from Dropbox needs a bit of training. At first, the model will make inaccurate predictions, but if you let it train for 30 minutes or so you should start seeing some improvements.

I then ran test.py on the 4 images I showed above and got the results in the images below, with the old (not fine-tuned) model on top and the new fine-tuned model below.

Results from the old model:

Result for the old model, which achieves an accuracy of 50%.

Results from the fine-tuned model:

Result for the new fine-tuned model which achieves an accuracy of 100%, which indicates the fine-tuning worked

You can see that the new fine-tuned model performs better, with an accuracy of 100 percent.

Conclusion

Congrats, you can now fine-tune your OCR model. To make a significant impact on a larger model and generalize it, you probably have to make a larger dataset. You can learn about that in this tutorial, and then let the model train for a while.

In the end, the OCR model will hopefully perform better for your specific use case.

This tutorial was originally written part by part on my Medium.

Cover image: Use OCR to read documents. Image made with DALL-E. OpenAI. (2023). ChatGPT (Large language model) https://chat.openai.com.

Create a Self-Playing AI Chess Engine from Scratch with Imitation Learning

Eivind Kjosbakken — Thu, 21 Sep 2023 08:11:38 +0000

This is an article on how I created my very own AI chess engine, starting completely from scratch.

Because creating an AI chess engine from scratch is a relatively complex task, this will be a long article, but stay tuned, as the product you will end up with will be a cool project to showcase!

Prerequisites

This article will explain most concepts in detail. However, there are some recommended prerequisites to follow the tutorial. You should be familiar with the following:

Python
How to use the terminal
Jupyter Notebook
Fundamental AI concepts
Chess rules

I will also use the following tools:

Python
Different Python packages
Stockfish

Part 1: How to Generate a Dataset
Part 2: How to Encode Data
Part 3: How to Train the AI model
Conclusion

Part 1: How to Generate a Dataset

In this part, I will use Stockfish to generate a large dataset of moves from different positions. This data can then be used later on to train the chess AI.

How to download Stockfish

The most important component of my chess engine is Stockfish, so I will therefore show you how to install it.

Go to the Stockfish website download page and download the version for you. I am using Windows myself, so I chose the Windows (faster) version:

Press the download button marked in red if you have a Windows PC

After downloading, extract the zip file to whatever location on your PC you want your chess engine to be. Remember where you place it as you need the path for the next step.

How to incorporate Stockfish with Python

Now you also need to incorporate the engine into Python. You could manually do this, but I found it easier to use the Python Stockfish package as it has all the functions you need.

First install the package from pip (preferably in your virtual environment):

pip install stockfish

You can then import it using the following command:

from stockfish import Stockfish
stockfish = Stockfish(path=r"C:\Users\eivin\Documents\ownProgrammingProjects18062023\ChessEngine\stockfish\stockfish\stockfish-windows-2022-x86-64-avx2")

Note that you need to give your own path to the Stockfish executable file:

The stockfish executable file is the second file from the bottom

You can copy the file path from the folder structure, or if you are on Windows 11 you can press ctrl + shift + c to automatically copy the file path.

Great! Now you have Stockfish available in Python!

How to generate a dataset

Now you need a dataset so you can train the AI chess engine! You can do this by making Stockfish play games and remembering each position and the moves you could take from there.

Those moves will be among the best possible moves, considering Stockfish is a strong chess engine.

First, install a chess package and NumPy (there are plenty to choose from, but I will be using the one below).

Enter each line (individually) in the terminal:

pip install chess
pip install numpy

Then import packages (remember to also import Stockfish as shown earlier in this article):

import chess
import random
from pprint import pprint
import numpy as np
import os
import glob
import time

You also need some helper functions here:

#helper functions:
def checkEndCondition(board):
 if (board.is_checkmate() or board.is_stalemate() or board.is_insufficient_material() or board.can_claim_threefold_repetition() or board.can_claim_fifty_moves() or board.can_claim_draw()):
  return True
 return False

#save
def findNextIdx():
 files = (glob.glob(r"C:\Users\eivin\Documents\ownProgrammingProjects18062023\ChessEngine\data\*.npy"))
 if (len(files) == 0):
  return 1 #if no files, return 1
 highestIdx = 0
 for f in files:
  file = f
  currIdx = file.split("movesAndPositions")[-1].split(".npy")[0]
  highestIdx = max(highestIdx, int(currIdx))

 return int(highestIdx)+1

def saveData(moves, positions):
 moves = np.array(moves).reshape(-1, 1)
 positions = np.array(positions).reshape(-1,1)
 movesAndPositions = np.concatenate((moves, positions), axis = 1)
 nextIdx = findNextIdx()
 np.save(f"data/movesAndPositions{nextIdx}.npy", movesAndPositions)
 print("Saved successfully")

def runGame(numMoves, filename = "movesAndPositions1.npy"):
 """run a game you stored"""
 testing = np.load(f"data/{filename}")
 moves = testing[:, 0]
 if (numMoves > len(moves)):
  print("Must enter a lower number of moves than maximum game length. Game length here is: ", len(moves))
  return

 testBoard = chess.Board()

 for i in range(numMoves):
  move = moves[i]
  testBoard.push_san(move)
 return testBoard

Remember to change the file path in the findNextIdx function, as this is personal for your computer.

Create a data folder within the folder you are coding in, and copy the path (but still keep the *.npy at the end).

The checkEndCondition function uses functions from the Chess pip package to check if the game is to be ended.

The saveData function saves a game to npy files which is a highly optimized way of storing arrays.

The function uses the findNextIdx function to save to a new file (remember here to create a new folder called data to store all data in).

Finally, the runGame function makes it so you can run a game that you saved to check the positions after numMoves number of moves.
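The index lookup in findNextIdx can also be written with a regular expression, which avoids the chained split calls. This is a minimal sketch, not a replacement you have to use: next_index is a hypothetical helper that takes a list of file names, so you can pass it the result of glob.glob.

```python
import re


def next_index(filenames, prefix="movesAndPositions"):
    """Return one more than the highest index found in names like prefix12.npy."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.npy$")
    indices = []
    for name in filenames:
        match = pattern.search(name)
        if match:
            indices.append(int(match.group(1)))
    # With no matching files, start counting at 1
    return max(indices, default=0) + 1


print(next_index(["data/movesAndPositions1.npy", "data/movesAndPositions2.npy"]))
```

Files that do not follow the naming pattern are simply ignored instead of raising an error.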

Then you can finally get to the function that mines the chess games:

def mineGames(numGames : int):
 """mines numGames games of moves"""
 MAX_MOVES = 500 #don't continue games after this number

 for i in range(numGames):
  currentGameMoves = []
  currentGamePositions = []
  board = chess.Board()
  stockfish.set_position([])

  for i in range(MAX_MOVES):
   #randomly choose from those 3 moves
   moves = stockfish.get_top_moves(3)
   #if less than 3 moves available, choose first one, if none available, exit
   if (len(moves) == 0):
    print("game is over")
    break
   elif (len(moves) == 1):
    move = moves[0]["Move"]
   elif (len(moves) == 2):
    move = random.choices(moves, weights=(80, 20), k=1)[0]["Move"]
   else:
    move = random.choices(moves, weights=(80, 15, 5), k=1)[0]["Move"]

   currentGamePositions.append(stockfish.get_fen_position())
   board.push_san(move)
   currentGameMoves.append(move)
   stockfish.set_position(currentGameMoves)
   if (checkEndCondition(board)):
    print("game is over")
    break
  saveData(currentGameMoves, currentGamePositions)

Here you first set a max limit so a game does not last infinitely long.

Then, you run the number of games you want to run and make sure both Stockfish and the Chess pip package are reset to the starting position.

Next, you get the top 3 moves suggested by Stockfish and choose one of them to play (80% chance for the best move, 15% chance for the second-best move, 5% chance for the third-best move). The reason you are not always choosing the best move is for the move selection to be more stochastic.

Then, you choose a move (making sure no error occurs even if there are fewer than three possible moves), save the board position using FEN (a way of encoding a chess position), as well as the move done from that position.

If the game is done, you break the loop and store all positions and the moves made from those positions. If the game is not done, you continue making moves until the game is over.
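If you want to convince yourself that the weighted selection behaves as described, you can isolate it in a small experiment. The sketch below is not part of the original code: pickWeightedMove is a hypothetical helper, and the candidate moves are made up. It only assumes that the move list looks like the output of stockfish.get_top_moves(3), meaning a list of dictionaries with a "Move" key and the best move first.

```python
import random

# Hypothetical helper, not part of the original code: it isolates the
# move-selection step from mineGames so you can test it without Stockfish.
# The weights are the same ones used in mineGames above.
WEIGHTS_BY_COUNT = {1: (100,), 2: (80, 20), 3: (80, 15, 5)}

def pickWeightedMove(moves, rng=random):
    if len(moves) == 0:
        return None #no moves available, the game is over
    weights = WEIGHTS_BY_COUNT[len(moves)]
    return rng.choices(moves, weights=weights, k=1)[0]["Move"]

#made-up candidate moves, only used to count how often each one is picked
candidates = [{"Move": "e2e4"}, {"Move": "d2d4"}, {"Move": "g1f3"}]
rng = random.Random(0) #seeded so the experiment is repeatable
picks = [pickWeightedMove(candidates, rng) for _ in range(10_000)]
for entry in candidates:
    print(entry["Move"], picks.count(entry["Move"]) / len(picks))
```

The printed shares should land close to 0.80, 0.15, and 0.05, which is the mix of strong and slightly weaker moves that ends up in the dataset.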

You can then mine one game with:

mineGames(1)

Remember to create a data folder here, as this is where I store the games!

How to review a mined game

Run the mineGames function to mine one game using the following command:

mineGames(1)

You can access this game with a helper function shown earlier using the following command:

testBoard = runGame(12, "movesAndPositions1.npy")
testBoard

Assuming there have been 12 moves in the game, you will then see something like this:

Output from printing board position after 12 moves. (Note that the last line with just testBoard is printed, since in a Jupyter notebook, a variable is printed if it is written alone at the bottom of a cell).

And that’s it, you can now mine as many games as you would like.

Mining is going to take some time, and there is potential for optimizing the process, for example by parallelizing the game simulations (since each game is completely independent of the others).

You can check out the full code from part 1 on my GitHub.

Part 2: How to Encode Data

In this part, you will encode chess moves and positions in the same way DeepMind did with AlphaZero!

I will use the data you gathered in part 1 of this series.

As a reminder, you installed Stockfish and made sure you could access it on the computer. You then made it play games against itself, while you stored all moves and positions.

You now have a supervised learning problem, since the input is the current position, and the label (the correct move from that position) is the move that was played, which is one of the top moves suggested by Stockfish.

How to install and import packages

First, you need to install and import all required packages, some of which you may already have if you followed part 1 of this series.

The packages to install are below. Remember to only input one line at a time when installing via pip:

pip install numpy
pip install gym-chess
pip install chess

Additionally, you need to make a small change in one of the files in the gym-chess package, since it uses np.int, which was deprecated in NumPy 1.20 and removed in NumPy 1.24.

In the file with the relative path (from the virtual environment) venv\Lib\site-packages\gym_chess\alphazero\board_encoding.py, where venv is the name of my virtual environment, search for "np.int" and replace each occurrence with "int".

If you don't, you will see an error message stating that np.int is deprecated or, on NumPy 1.24 and later, that the numpy module has no attribute int.

I also had to restart VS Code after replacing "np.int" with "int", to make it work.
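If you would rather not edit the file by hand, a small script can do the replacement. This is only a sketch: replaceNpInt and patchFile are hypothetical helper names of my own, and the regular expression is an assumption about what needs replacing. It only touches a bare np.int and leaves names such as np.int32 alone, which a plain search and replace would break.

```python
import re
from pathlib import Path

# Matches "np.int" only when no other word character follows it,
# so names such as np.int32 or np.integer are left untouched.
NP_INT_PATTERN = re.compile(r"\bnp\.int\b")

def replaceNpInt(source: str) -> str:
    """Returns the source code with every bare np.int replaced by int."""
    return NP_INT_PATTERN.sub("int", source)

def patchFile(path: Path) -> int:
    """Rewrites the file in place and returns the number of replacements."""
    original = path.read_text(encoding="utf-8")
    patched, count = NP_INT_PATTERN.subn("int", original)
    if count > 0:
        path.write_text(patched, encoding="utf-8")
    return count

#made-up sample lines to show which occurrences are replaced
sample = "a = np.zeros(3, dtype=np.int)\nb = np.zeros(3, dtype=np.int32)"
print(replaceNpInt(sample))
```

To patch the real file, you would call patchFile with the path to board_encoding.py in your own virtual environment. Make a backup copy first, since the file is overwritten.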

All imports you need are below:

import numpy as np
import gym
import chess
import os
import gym.spaces
from gym_chess.alphazero.move_encoding import utils, queenmoves, knightmoves, underpromotions
from typing import List

And then you also need to create the gym environment to encode and decode moves:

env = gym.make('ChessAlphaZero-v0')

How to encode board positions and moves

Encoding is an important element within AI, as it allows us to represent problems in a readable way for the AI.

Instead of using an image of a chess board, or a string representing a chess move like "d2d4", you represent these using arrays (lists of numbers).

Finding out how to do this manually is quite challenging, but luckily the gym-chess Python package has already solved this problem for us.

I am not going to go into more details on how they encoded it, but you can see using the code below that a position is represented with an (8,8,119) shaped array, and all possible moves are given with a (4672) array (1 column with 4672 values).

If you want to read more about this, you can check out the AlphaZero paper, though this is quite a complicated paper to fully understand.

#code to print action and state space
env = gym.make('ChessAlphaZero-v0')
env.reset()
print(env.observation_space)
print(env.action_space)

Which outputs:

Output from printing state (first line) and action space (second line)

You can also check out the encoding of a move, from string notation to encoded notation. Make sure to reset the environment first, as you may get an error if you do not:

#first set the environment and make sure to reset the positions
env = gym.make('ChessAlphaZero-v0')
env.reset()

#encoding the move e2 to e4
move = chess.Move.from_uci('e2e4')
print(env.encode(move))
# -> outputs: 877

#decoding the encoded move 877
print(env.decode(877))
# -> outputs: Move.from_uci('e2e4')

With this, you can now have functions to encode the moves and positions you stored from part 1 where you generated a dataset.
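If you are curious where the number 877 comes from, you can reproduce it by hand. The sketch below is my own simplified illustration, not the package's code: encodeQueenStyleMove is a hypothetical helper that only handles straight and diagonal moves with white to move. It assumes the same index layout as the encodeQueen function shown later in this part, where the 8 x 8 x 73 action array is flattened so that the index equals (from_rank * 8 + from_file) * 73 + move_type.

```python
# Directions as (delta rank, delta file), in the same order as in encodeQueen
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def sign(value: int) -> int:
    return (value > 0) - (value < 0)

def encodeQueenStyleMove(uci: str) -> int:
    """Encodes a straight or diagonal move such as 'e2e4' (white to move)."""
    from_file, from_rank = ord(uci[0]) - ord("a"), int(uci[1]) - 1
    to_file, to_rank = ord(uci[2]) - ord("a"), int(uci[3]) - 1
    delta_rank, delta_file = to_rank - from_rank, to_file - from_file

    is_straight = delta_rank == 0 or delta_file == 0
    is_diagonal = abs(delta_rank) == abs(delta_file)
    if (delta_rank, delta_file) == (0, 0) or not (is_straight or is_diagonal):
        raise ValueError(f"{uci} is not a straight or diagonal move")

    direction_idx = DIRECTIONS.index((sign(delta_rank), sign(delta_file)))
    distance_idx = max(abs(delta_rank), abs(delta_file)) - 1
    move_type = direction_idx * 7 + distance_idx #8 directions * 7 distances
    return (from_rank * 8 + from_file) * 73 + move_type

print(8 * 8 * 73)                   #size of the action space
print(encodeQueenStyleMove("e2e4")) #index of the move e2 to e4
```

For e2e4, the starting square e2 is rank index 1 and file index 4, the direction is "one rank up" (index 0) and the distance is 2 squares (index 1). That gives (1 * 8 + 4) * 73 + 1 = 877, the same number the environment printed.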

How to create functions for encoding moves

These functions are copied from the gym-chess package, but with small tweaks so that they are not dependent on a class.

I changed these functions manually to make encoding easier. Do not worry too much about understanding them fully, as they are quite complicated.

Just know that they convert moves that humans understand into a form that computers can understand.

#encoding functions adapted from gym-chess

def encodeKnight(move: chess.Move):
    _NUM_TYPES: int = 8

    #: Starting point of knight moves in last dimension of 8 x 8 x 73 action array.
    _TYPE_OFFSET: int = 56

    #: Set of possible directions for a knight move, encoded as 
    #: (delta rank, delta file).
    _DIRECTIONS = utils.IndexedTuple(
        (+2, +1),
        (+1, +2),
        (-1, +2),
        (-2, +1),
        (-2, -1),
        (-1, -2),
        (+1, -2),
        (+2, -1),
    )

    from_rank, from_file, to_rank, to_file = utils.unpack(move)

    delta = (to_rank - from_rank, to_file - from_file)
    is_knight_move = delta in _DIRECTIONS

    if not is_knight_move:
        return None

    knight_move_type = _DIRECTIONS.index(delta)
    move_type = _TYPE_OFFSET + knight_move_type

    action = np.ravel_multi_index(
        multi_index=((from_rank, from_file, move_type)),
        dims=(8, 8, 73)
    )

    return action

def encodeQueen(move: chess.Move):
    _NUM_TYPES: int = 56 # = 8 directions * 7 squares max. distance
    _DIRECTIONS = utils.IndexedTuple(
        (+1,  0),
        (+1, +1),
        ( 0, +1),
        (-1, +1),
        (-1,  0),
        (-1, -1),
        ( 0, -1),
        (+1, -1),
    )

    from_rank, from_file, to_rank, to_file = utils.unpack(move)

    delta = (to_rank - from_rank, to_file - from_file)

    is_horizontal = delta[0] == 0
    is_vertical = delta[1] == 0
    is_diagonal = abs(delta[0]) == abs(delta[1])
    is_queen_move_promotion = move.promotion in (chess.QUEEN, None)

    is_queen_move = (
        (is_horizontal or is_vertical or is_diagonal) 
            and is_queen_move_promotion
    )

    if not is_queen_move:
        return None

    direction = tuple(np.sign(delta))
    distance = np.max(np.abs(delta))

    direction_idx = _DIRECTIONS.index(direction)
    distance_idx = distance - 1

    move_type = np.ravel_multi_index(
        multi_index=([direction_idx, distance_idx]),
        dims=(8,7)
    )

    action = np.ravel_multi_index(
        multi_index=((from_rank, from_file, move_type)),
        dims=(8, 8, 73)
    )

    return action

def encodeUnder(move):
    _NUM_TYPES: int = 9 # = 3 directions * 3 piece types (see below)
    _TYPE_OFFSET: int = 64
    _DIRECTIONS = utils.IndexedTuple(
        -1,
        0,
        +1,
    )
    _PROMOTIONS = utils.IndexedTuple(
        chess.KNIGHT,
        chess.BISHOP,
        chess.ROOK,
    )

    from_rank, from_file, to_rank, to_file = utils.unpack(move)

    is_underpromotion = (
        move.promotion in _PROMOTIONS 
        and from_rank == 6 
        and to_rank == 7
    )

    if not is_underpromotion:
        return None

    delta_file = to_file - from_file

    direction_idx = _DIRECTIONS.index(delta_file)
    promotion_idx = _PROMOTIONS.index(move.promotion)

    underpromotion_type = np.ravel_multi_index(
        multi_index=([direction_idx, promotion_idx]),
        dims=(3,3)
    )

    move_type = _TYPE_OFFSET + underpromotion_type

    action = np.ravel_multi_index(
        multi_index=((from_rank, from_file, move_type)),
        dims=(8, 8, 73)
    )

    return action

def encodeMove(move: str, board) -> int:
    move = chess.Move.from_uci(move)
    if board.turn == chess.BLACK:
        move = utils.rotate(move)

    action = encodeQueen(move)

    if action is None:
        action = encodeKnight(move)

    if action is None:
        action = encodeUnder(move)

    if action is None:
        raise ValueError(f"{move} is not a valid move")

    return action

So now you can pass in a move as a string (for example: "e2e4" for the move from e2 to e4), and the function outputs a number (the encoded version of the move).

How to create a function for encoding positions

Encoding the positions is a bit more difficult. I took a function from the gym-chess package ("encodeBoard") since I had some issues using the package directly. The function I copied is below:

def encodeBoard(board: chess.Board) -> np.ndarray:
 """Converts a board to numpy array representation."""

 array = np.zeros((8, 8, 14), dtype=int)

 for square, piece in board.piece_map().items():
  rank, file = chess.square_rank(square), chess.square_file(square)
  piece_type, color = piece.piece_type, piece.color

  # The first six planes encode the pieces of the active player, 
  # the following six those of the active player's opponent. Since
  # this class always stores boards oriented towards the white player,
  # White is considered to be the active player here.
  offset = 0 if color == chess.WHITE else 6

  # Chess enumerates piece types beginning with one, which you have
  # to account for
  idx = piece_type - 1

  array[rank, file, idx + offset] = 1

 # Repetition counters
 array[:, :, 12] = board.is_repetition(2)
 array[:, :, 13] = board.is_repetition(3)

 return array

def encodeBoardFromFen(fen: str) -> np.ndarray:
 board = chess.Board(fen)
 return encodeBoard(board)

I also added the encodeBoardFromFen function. The copied function requires a chess board represented with the Python Chess package, so I first convert from FEN notation to a board object from that package. FEN is a way of encoding a chess position as a string, which you cannot use directly, since you need the encoding to be in numbers.
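To make the link between a FEN string and the piece planes more concrete, here is a small sketch in plain Python. It is an illustration I wrote for this explanation, not code from the package: piecePlanesFromFen is a hypothetical helper that only looks at the piece placement field of the FEN string and reports which (rank, file, plane) entries encodeBoard would set to 1. It assumes the same plane order as above: pawn, knight, bishop, rook, queen, king for White, followed by the same six for Black.

```python
PIECE_ORDER = "pnbrqk" #pawn, knight, bishop, rook, queen, king -> planes 0 to 5

def piecePlanesFromFen(fen: str):
    """Returns (rank, file, plane) triples for every piece in a FEN string."""
    placement = fen.split()[0] #the first FEN field holds the piece placement
    triples = []
    #FEN lists the ranks from 8 down to 1, so the first row has rank index 7
    for row_idx, row in enumerate(placement.split("/")):
        rank = 7 - row_idx
        file = 0
        for char in row:
            if char.isdigit():
                file += int(char) #a digit means that many empty squares
                continue
            offset = 0 if char.isupper() else 6 #uppercase = White, lowercase = Black
            plane = PIECE_ORDER.index(char.lower()) + offset
            triples.append((rank, file, plane))
            file += 1
    return triples

START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
triples = piecePlanesFromFen(START_FEN)
print(len(triples))         #number of pieces on the board
print((0, 4, 5) in triples) #white king on e1 sits in plane 5
```

The repetition planes (12 and 13) are left out here, since they depend on the game history and not on a single FEN string.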

Then you have all you need to encode all your files.

How to automate encoding for all raw data files

Now that you can encode moves and positions, you will automate this process for all files in your folder that you generated from part 1 of this series. This involves finding all files in which you have to encode the data and saving these to new files.

Note that from part 1 I changed the folder structure slightly.

I now have a parent data folder, and within this folder, I have the rawData folder, which contains the moves in string format and the positions in FEN format (from part 1).

I also have the preparedData folder under the data folder, where the encoded moves and positions will be stored.

Note that the encoded moves and positions will be stored in separate files since the encodings have different dimensions.

Folder structure for the data. Make sure to have two folders called preparedData and rawData within the data folder. The data folder is on the same level as your notebook files.

#function to encode all moves and positions from rawData folder
def encodeAllMovesAndPositions():
    board = chess.Board() #this is used to change whose turn it is so that the encoding works

    #find all files in folder:
    files = os.listdir('data/rawData')
    for idx, f in enumerate(files):
        board.turn = False #every game starts with white to move; set to black here since the turn is swapped before each move
        movesAndPositions = np.load(f'data/rawData/{f}', allow_pickle=True)
        moves = movesAndPositions[:,0]
        positions = movesAndPositions[:,1]
        encodedMoves = []
        encodedPositions = []

        for i in range(len(moves)):
            board.turn = (not board.turn) #swap turns
            try:
                encodedMoves.append(encodeMove(moves[i], board)) 
                encodedPositions.append(encodeBoardFromFen(positions[i]))
            except Exception:
                try:
                    board.turn = (not board.turn) #change turn: since a move is sometimes skipped, you might need to change whose turn it is
                    encodedMoves.append(encodeMove(moves[i], board)) 
                    encodedPositions.append(encodeBoardFromFen(positions[i]))
                except Exception:
                    print(f'error in file: {f}')
                    print("Turn: ", board.turn)
                    print(moves[i])
                    print(positions[i])
                    print(i)
                    break

        np.save(f'data/preparedData/moves{idx}', np.array(encodedMoves))
        np.save(f'data/preparedData/positions{idx}', np.array(encodedPositions))

encodeAllMovesAndPositions()

#NOTE: shape of files:
#moves: (number of moves in game)
#positions: (number of moves in game, 8, 8, 14) (number of moves in game is including both black and white moves)

I first create a chess board, which is only used to keep track of whose turn it is.

Then, I open all raw data files made in part 1 and encode them. I do this in a try/except statement, as I sometimes see errors with move encodings.

The first except statement handles the case where a move was skipped (so the program thinks it is the wrong side's turn). If this happens, the encoding will not work, so the except statement changes the turn and tries again. This is not the most elegant code, but the encoding is a minor part of the total runtime of creating an AI chess engine, so it is acceptable.

Make sure you have the correct folder structure and have created all the different folders. If not, you will receive an error.
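If you prefer to create the folders from code instead of by hand, something like the sketch below should work. createDataFolders is a hypothetical helper of my own, and the folder names are taken from the paths used in encodeAllMovesAndPositions. The example runs against a temporary directory so that it does not touch your project.

```python
import tempfile
from pathlib import Path

def createDataFolders(root: str = ".") -> list:
    """Creates data/rawData and data/preparedData under root if they are missing."""
    created = []
    for name in ("rawData", "preparedData"):
        folder = Path(root) / "data" / name
        folder.mkdir(parents=True, exist_ok=True) #no error if it already exists
        created.append(folder)
    return created

#try it out in a temporary directory, which is deleted again afterwards
with tempfile.TemporaryDirectory() as tmp:
    folders = createDataFolders(tmp)
    allExist = all(folder.is_dir() for folder in folders)
print(allExist)
```

In your own project, you would call createDataFolders() without arguments from the folder where your notebook lives.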

You have now encoded your chess board and moves. If you want to, you can check out the full code from this part on my GitHub.

Part 3: How to Train the AI model

This is the third and last part of the series on creating your own AI chess engine!

In part 1 you learned how to create a dataset, and in part 2 you looked at encoding the dataset so that it could be used for an AI.

You will now use this encoded dataset to train your own AI using PyTorch!

How to import packages

As always, you have all the imports that will be used in the tutorial. Most are straightforward, but you need to install PyTorch, which I recommend installing using this website.

Here you can scroll down a bit, where you see some options for which build and operating system you are on.

After selecting the options that apply to you, you will get some code you can paste into the terminal to install PyTorch.

You can see the options I chose in the image below, but in general, I recommend using the stable build and choosing your own operating system.

Then, select the package manager you are most used to (Conda or pip is probably the easiest, as you can just paste the command into the terminal).

Select CUDA 11.7/11.8 (does not matter which one), and install using the given command at the bottom.

My selections when installing PyTorch.

You can then import all your packages with the code below:

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.tensorboard import SummaryWriter
from datetime import datetime
import gym
import gym_chess
import os
import chess
from tqdm import tqdm
from gym_chess.alphazero.move_encoding import utils
from pathlib import Path
from typing import Optional

How to Install CUDA

This is an optional step that allows you to use your GPU to train your model much faster. It is not required, but it will save you some time when training your AI.

The way you install CUDA varies depending on your operating system, but I am using Windows and followed this tutorial.

If you are on Linux, you can find a tutorial by googling “installing CUDA Linux”. Note that recent CUDA versions do not support macOS.

To check if you have CUDA available (your GPU is available), you can use this code:

#check if cuda available
torch.cuda.is_available()

This outputs True if your GPU is available. If you do not have a GPU available, do not worry: the only downside is that training the model takes longer, which is not a big deal for hobby projects like this one.

How to create encoding methods

I then define some helper methods for encoding and decoding from the Python Gym-Chess package.

I had to make some modifications to the package, to make it work. Most of the code is copied from the package, with just a few small tweaks making the code not dependent on a class and so forth.

Note that you do not have to understand all the code below, as the way DeepMind encodes all moves in chess is complicated.
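To get a feel for what the decoding functions below do, here is a stripped-down sketch in plain Python. It is my own illustration, not the package's code: decodeQueenStyleAction is a hypothetical helper that only handles queen-style moves (move types 0 to 55) with white to move. It assumes that the flattened 8 x 8 x 73 action index is laid out as (from_rank * 8 + from_file) * 73 + move_type.

```python
# Directions as (delta rank, delta file), in the same order as in _decodeQueen
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
FILES = "abcdefgh"

def decodeQueenStyleAction(action: int) -> str:
    """Turns an action index back into a UCI string such as 'e2e4'."""
    if not 0 <= action < 8 * 8 * 73:
        raise ValueError(f"{action} is outside the action space")

    #undo the flattening of the (8, 8, 73) array
    square_idx, move_type = divmod(action, 73)
    from_rank, from_file = divmod(square_idx, 8)
    if move_type >= 56:
        raise ValueError(f"{action} is not a queen-style move")

    #undo the flattening of the (8 directions, 7 distances) array
    direction_idx, distance_idx = divmod(move_type, 7)
    delta_rank, delta_file = DIRECTIONS[direction_idx]
    distance = distance_idx + 1

    to_rank = from_rank + delta_rank * distance
    to_file = from_file + delta_file * distance
    if not (0 <= to_rank < 8 and 0 <= to_file < 8):
        raise ValueError(f"{action} points to a square outside the board")

    return f"{FILES[from_file]}{from_rank + 1}{FILES[to_file]}{to_rank + 1}"

print(decodeQueenStyleAction(877))
```

This prints e2e4, which matches the example from part 2, where the move e2e4 was encoded as 877.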

#helper methods:

#decoding moves from idx to uci notation
def _decodeKnight(action: int) -> Optional[chess.Move]:
    _NUM_TYPES: int = 8

    #: Starting point of knight moves in last dimension of 8 x 8 x 73 action array.
    _TYPE_OFFSET: int = 56

    #: Set of possible directions for a knight move, encoded as 
    #: (delta rank, delta file).
    _DIRECTIONS = utils.IndexedTuple(
        (+2, +1),
        (+1, +2),
        (-1, +2),
        (-2, +1),
        (-2, -1),
        (-1, -2),
        (+1, -2),
        (+2, -1),
    )

    from_rank, from_file, move_type = np.unravel_index(action, (8, 8, 73))

    is_knight_move = (
        _TYPE_OFFSET <= move_type
        and move_type < _TYPE_OFFSET + _NUM_TYPES
    )

    if not is_knight_move:
        return None

    knight_move_type = move_type - _TYPE_OFFSET

    delta_rank, delta_file = _DIRECTIONS[knight_move_type]

    to_rank = from_rank + delta_rank
    to_file = from_file + delta_file

    move = utils.pack(from_rank, from_file, to_rank, to_file)
    return move

def _decodeQueen(action: int) -> Optional[chess.Move]:

    _NUM_TYPES: int = 56 # = 8 directions * 7 squares max. distance

    #: Set of possible directions for a queen move, encoded as 
    #: (delta rank, delta file).
    _DIRECTIONS = utils.IndexedTuple(
        (+1,  0),
        (+1, +1),
        ( 0, +1),
        (-1, +1),
        (-1,  0),
        (-1, -1),
        ( 0, -1),
        (+1, -1),
    )
    from_rank, from_file, move_type = np.unravel_index(action, (8, 8, 73))

    is_queen_move = move_type < _NUM_TYPES

    if not is_queen_move:
        return None

    direction_idx, distance_idx = np.unravel_index(
        indices=move_type,
        shape=(8,7)
    )

    direction = _DIRECTIONS[direction_idx]
    distance = distance_idx + 1

    delta_rank = direction[0] * distance
    delta_file = direction[1] * distance

    to_rank = from_rank + delta_rank
    to_file = from_file + delta_file

    move = utils.pack(from_rank, from_file, to_rank, to_file)
    return move

def _decodeUnderPromotion(action):
    _NUM_TYPES: int = 9 # = 3 directions * 3 piece types (see below)

    #: Starting point of underpromotions in last dimension of 8 x 8 x 73 action 
    #: array.
    _TYPE_OFFSET: int = 64

    #: Set of possible directions for an underpromotion, encoded as file delta.
    _DIRECTIONS = utils.IndexedTuple(
        -1,
        0,
        +1,
    )

    #: Set of possible piece types for an underpromotion (promoting to a queen
    #: is implicitly encoded by the corresponding queen move).
    _PROMOTIONS = utils.IndexedTuple(
        chess.KNIGHT,
        chess.BISHOP,
        chess.ROOK,
    )

    from_rank, from_file, move_type = np.unravel_index(action, (8, 8, 73))

    is_underpromotion = (
        _TYPE_OFFSET <= move_type
        and move_type < _TYPE_OFFSET + _NUM_TYPES
    )

    if not is_underpromotion:
        return None

    underpromotion_type = move_type - _TYPE_OFFSET

    direction_idx, promotion_idx = np.unravel_index(
        indices=underpromotion_type,
        shape=(3,3)
    )

    direction = _DIRECTIONS[direction_idx]
    promotion = _PROMOTIONS[promotion_idx]

    to_rank = from_rank + 1
    to_file = from_file + direction

    move = utils.pack(from_rank, from_file, to_rank, to_file)
    move.promotion = promotion

    return move

#primary decoding function, the ones above are just helper functions
def decodeMove(action: int, board) -> Optional[chess.Move]:
        move = _decodeQueen(action)
        is_queen_move = move is not None

        if not move:
            move = _decodeKnight(action)

        if not move:
            move = _decodeUnderPromotion(action)

        if not move:
            raise ValueError(f"{action} is not a valid action")

        # Actions encode moves from the perspective of the current player. If
        # this is the black player, the move must be reoriented.
        turn = board.turn

        if turn == False: #black to move
            move = utils.rotate(move)

        # Moving a pawn to the opponent's home rank with a queen move
        # is automatically assumed to be a queen promotion. However,
        # since queenmoves has no reference to the board and can thus not
        # determine whether the moved piece is a pawn, you have to add this
        # information manually here
        if is_queen_move:
            to_rank = chess.square_rank(move.to_square)
            is_promoting_move = (
                (to_rank == 7 and turn == True) or 
                (to_rank == 0 and turn == False)
            )

            piece = board.piece_at(move.from_square)
            if piece is None: #NOTE I added this, not entirely sure if it's correct
                return None
            is_pawn = piece.piece_type == chess.PAWN

            if is_pawn and is_promoting_move:
                move.promotion = chess.QUEEN

        return move

def encodeBoard(board: chess.Board) -> np.ndarray:
 """Converts a board to numpy array representation."""

 array = np.zeros((8, 8, 14), dtype=int)

 for square, piece in board.piece_map().items():
  rank, file = chess.square_rank(square), chess.square_file(square)
  piece_type, color = piece.piece_type, piece.color

  # The first six planes encode the pieces of the active player, 
  # the following six those of the active player's opponent. Since
  # this class always stores boards oriented towards the white player,
  # White is considered to be the active player here.
  offset = 0 if color == chess.WHITE else 6

  # Chess enumerates piece types beginning with one, which you have
  # to account for
  idx = piece_type - 1

  array[rank, file, idx + offset] = 1

 # Repetition counters
 array[:, :, 12] = board.is_repetition(2)
 array[:, :, 13] = board.is_repetition(3)

 return array

How to load the data

In part 1, you mined some chess games, and then in part 2, you encoded them so that they could be used to train a model.

You now load this data into PyTorch data loader objects, so it’s available for the model to train on. In case you have not done part 1 or 2 of this tutorial, you can find some ready-made training files in this Google Drive folder.

First, define some hyperparameters:

FRACTION_OF_DATA = 1
BATCH_SIZE = 4

The FRACTION_OF_DATA variable is there in case you want to train the model quickly and do not want to train it on the full dataset. Make sure this value is > 0 and ≤ 1.

The BATCH_SIZE variable decides the batch size the model trains on. In general, a higher batch size means the model can train faster, but your batch size is limited by the power of your GPU.

I recommend testing with a low batch size of 4 and then trying to increase it and see if training still works as it should. If you get a memory error of some sort, try decreasing the batch size again.
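To see how these two values affect the amount of work per epoch, you can do the arithmetic up front. The sketch below is only an illustration: datasetSizes is a hypothetical helper, the sample count is made up, and the 80/20 split is taken from the loading code in this section. It assumes the data loader keeps the last, possibly smaller, batch, which is the PyTorch default.

```python
import math

def datasetSizes(num_samples, fraction_of_data, batch_size, train_share=0.8):
    """Returns how many samples and batches you end up with."""
    if not 0 < fraction_of_data <= 1:
        raise ValueError("fraction_of_data must be > 0 and <= 1")
    used = int(num_samples * fraction_of_data) #samples kept after applying the fraction
    num_train = int(used * train_share)        #first 80% is used for training
    num_validation = used - num_train          #the rest is used for validation
    return {
        "used": used,
        "train": num_train,
        "validation": num_validation,
        "train_batches": math.ceil(num_train / batch_size),
    }

#made-up example: 1000 positions, full dataset, batch size 4
sizes = datasetSizes(1000, 1, 4)
print(sizes)
```

With these made-up numbers, you get 800 training samples, 200 validation samples, and 200 training batches per epoch. Doubling the batch size halves the number of batches.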

You then load the data with the code below. Make sure your folder structure and file naming are correct here. You should have an initial data folder in the same place where your code is.

Then inside this data folder, you should have a preparedData folder, that contains the files you want to train on. These files have to be named moves{i}.npy and positions{i}.npy, where i is the index of the file. If you encoded the files as I did earlier, everything should be correct.

The folder structure. Yellow are folders, and turquoise are files.

#dataset

#loading training data

allMoves = []
allBoards = []

files = os.listdir('data/preparedData')
numOfEach = len(files) // 2 # half are moves, other half are positions

for i in range(numOfEach):
    try:
        moves = np.load(f"data/preparedData/moves{i}.npy", allow_pickle=True)
        boards = np.load(f"data/preparedData/positions{i}.npy", allow_pickle=True)
        if (len(moves) != len(boards)):
            print("ERROR ON i = ", i, len(moves), len(boards))
        allMoves.extend(moves)
        allBoards.extend(boards)
    except Exception as e:
        print("error: could not load ", i, ", but is still going:", e)

allMoves = np.array(allMoves)[:(int(len(allMoves) * FRACTION_OF_DATA))]
allBoards = np.array(allBoards)[:(int(len(allBoards) * FRACTION_OF_DATA))]
assert len(allMoves) == len(allBoards), "MUST BE OF SAME LENGTH"

#flatten out boards
# allBoards = allBoards.reshape(allBoards.shape[0], -1)

trainDataIdx = int(len(allMoves) * 0.8)

#NOTE transfer all data to GPU if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
allBoards = torch.from_numpy(np.asarray(allBoards)).to(device)
allMoves = torch.from_numpy(np.asarray(allMoves)).to(device)

training_set = torch.utils.data.TensorDataset(allBoards[:trainDataIdx], allMoves[:trainDataIdx])
test_set = torch.utils.data.TensorDataset(allBoards[trainDataIdx:], allMoves[trainDataIdx:])
# Create data loaders for your datasets; shuffle for training, not for validation

training_loader = torch.utils.data.DataLoader(training_set, batch_size=BATCH_SIZE, shuffle=True)
validation_loader = torch.utils.data.DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False)

How to define the deep learning model

You can then define the model architecture:

class Model(torch.nn.Module):

    def __init__(self):
        super(Model, self).__init__()
        self.INPUT_SIZE = 896 
        # self.INPUT_SIZE = 7*7*13 #NOTE changing input size for using cnns
        self.OUTPUT_SIZE = 4672 # = number of unique moves (action space)

        #can try to add CNN and pooling here (calculations taking into account spatial features)

        #input shape for sample is (8,8,14), flattened to 1d array of size 896
        # self.cnn1 = nn.Conv3d(4,4,(2,2,4), padding=(0,0,1))
        self.activation = torch.nn.ReLU()
        self.linear1 = torch.nn.Linear(self.INPUT_SIZE, 1000)
        self.linear2 = torch.nn.Linear(1000, 1000)
        self.linear3 = torch.nn.Linear(1000, 1000)
        self.linear4 = torch.nn.Linear(1000, 200)
        self.linear5 = torch.nn.Linear(200, self.OUTPUT_SIZE)
        self.softmax = torch.nn.Softmax(1) #use softmax as prob for each move, dim 1 as dim 0 is the batch dimension

    def forward(self, x): #x.shape = (batch size, 8, 8, 14), flattened below to (batch size, 896)
        x = x.to(torch.float32)
        # x = self.cnn1(x) #for using cnns
        x = x.reshape(x.shape[0], -1)
        x = self.linear1(x)
        x = self.activation(x)
        x = self.linear2(x)
        x = self.activation(x)
        x = self.linear3(x)
        x = self.activation(x)
        x = self.linear4(x)
        x = self.activation(x)
        x = self.linear5(x)
        # x = self.softmax(x) #do not use softmax since you are using cross entropy loss
        return x

    def predict(self, board : chess.Board):
        """takes in a chess board and returns a chess.Move object, or None if no legal move could be decoded. NOTE: this function should definitely be written better, but it works for now"""
        with torch.no_grad():
            encodedBoard = encodeBoard(board)
            encodedBoard = encodedBoard.reshape(1, -1)
            #move the input to the same device as the model weights (CPU or GPU)
            modelDevice = next(self.parameters()).device
            encodedBoard = torch.from_numpy(encodedBoard).to(modelDevice)
            res = self.forward(encodedBoard)
            probs = self.softmax(res)

            probs = probs.cpu().numpy()[0] #do not want tensor anymore, 0 since it is a 2d array with 1 row

            #verify that move is legal and can be decoded before returning
            for _ in range(len(probs)): #try each move index at most once
                moveIdx = probs.argmax()
                try: #TODO should not have try here, but was a bug with idx 499 if it is black to move
                    uciMove = decodeMove(moveIdx, board)
                    if (uciMove is not None): #None means the move could not be decoded
                        move = chess.Move.from_uci(str(uciMove))
                        if (move in board.legal_moves): #if legal, return, else: loop continues
                            return move 
                except Exception:
                    pass
                #mark the move as tried so it is not chosen again next iteration
                #(setting it to -1 instead of deleting it keeps the indices aligned with the action space)
                probs[moveIdx] = -1

            #TODO can return random move here as well!
            return None #if no legal moves found, return None

You are free to change the architecture however you like.

Here, I have just chosen some simple parameters that worked decently, though there is room for improvement. Some examples of changes you can make are:

Add PyTorch CNN modules (remember to not flatten the array before adding these)
Change the activation functions in hidden layers. I am now using ReLU, but this could be changed to for example Sigmoid or Tanh, which you can read more about here.
Change the number of hidden layers. When changing this, you must remember to add an activation function between each layer in the forward() function.
Change the number of neurons in each hidden layer. If you change the number of neurons, remember that the number of output neurons of layer n must equal the number of input neurons of layer n+1. For example, if linear1 takes in 1000 neurons and outputs 2000 neurons, then linear2 must take in 2000 neurons. You can then freely choose the number of output neurons of linear2, but it must match the number of input neurons of linear3, and so on. The input to the first layer and the output of the last layer, however, are set by the INPUT_SIZE and OUTPUT_SIZE parameters.
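One way to keep the layer sizes consistent is to write them down as a single list and derive each layer's input and output size from it. The sketch below uses plain Python and the sizes from the model above. layerShapes and countParameters are hypothetical helper names of my own, and the parameter count assumes that every linear layer has one weight per input/output pair plus one bias per output neuron.

```python
# Sizes from the model above: input, four hidden sizes, output
LAYER_SIZES = [8 * 8 * 14, 1000, 1000, 1000, 200, 4672]

def layerShapes(sizes):
    """Pairs each layer's input size with its output size."""
    return list(zip(sizes[:-1], sizes[1:]))

def countParameters(sizes):
    #each linear layer has in*out weights plus one bias per output neuron
    return sum(n_in * n_out + n_out for n_in, n_out in layerShapes(sizes))

for idx, (n_in, n_out) in enumerate(layerShapes(LAYER_SIZES), start=1):
    print(f"linear{idx}: {n_in} -> {n_out}")
print(countParameters(LAYER_SIZES))
```

Because every output size is reused as the next input size, the sizes cannot get out of sync when you change a number in the list. The last line prints the total number of trainable parameters, which is a bit over 4 million for this architecture.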

In addition to the model architecture and the forward() function, which are obligatory when creating a deep learning model, I also defined a predict() function. It makes it easier to give a chess position to the model and get back the move it recommends.
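The predict() function tries the highest-scoring index first and moves on whenever that index cannot be decoded or is not a legal move. A different approach, which I am not using in this tutorial, is to only compare the scores of the legal moves. The sketch below shows the idea in plain Python. bestLegalAction is a hypothetical helper, the scores are made up, and it assumes you already have the encoded indices of all legal moves (for example by running each legal move through encodeMove from part 2).

```python
def bestLegalAction(scores, legal_actions):
    """Returns the legal action index with the highest score, or None."""
    best_action = None
    best_score = float("-inf")
    for action in legal_actions:
        if scores[action] > best_score:
            best_action, best_score = action, scores[action]
    return best_action

#made-up scores for an action space of 5 moves, where only 0, 2 and 4 are legal
scores = [0.10, 0.60, 0.25, 0.01, 0.04]
print(bestLegalAction(scores, [0, 2, 4]))
```

Here index 1 has the highest score overall, but since it is not legal, the function returns index 2, the best of the legal moves.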

How to train the model

When you have all the required data and the model is defined, you can begin training the model. First, you define a function to train one epoch, along with helper functions to keep track of the best model:

#helper functions for training
def train_one_epoch(model, optimizer, loss_fn, epoch_index, tb_writer):
    running_loss = 0.
    last_loss = 0.

    # Here, you use enumerate(training_loader) instead of
    # iter(training_loader) so that you can track the batch
    # index and do some intra-epoch reporting
    for i, data in enumerate(training_loader):

        # Every data instance is an input + label pair
        inputs, labels = data

        # Zero your gradients for every batch!
        optimizer.zero_grad()

        # Make predictions for this batch
        outputs = model(inputs)

        # Compute the loss and its gradients
        loss = loss_fn(outputs, labels)
        loss.backward()

        # Adjust learning weights
        optimizer.step()

        # Gather data and report
        running_loss += loss.item()
        if i % 1000 == 999:
            last_loss = running_loss / 1000 # loss per batch
            # print('  batch {} loss: {}'.format(i + 1, last_loss))
            tb_x = epoch_index * len(training_loader) + i + 1
            tb_writer.add_scalar('Loss/train', last_loss, tb_x)
            running_loss = 0.

    return last_loss

#the 3 functions below help store the best model you have created yet
def createBestModelFile():
    #first find best model if it exists:
    folderPath = Path('./savedModels')
    if (not folderPath.exists()):
        os.mkdir(folderPath)

    path = Path('./savedModels/bestModel.txt')

    if (not path.exists()):
        #create the files
        f = open(path, "w")
        f.write("10000000") #set to high number so it is overwritten with better loss
        f.write("\ntestPath")
        f.close()

def saveBestModel(vloss, pathToBestModel):
    #use a context manager so the file is flushed and closed after writing
    with open("./savedModels/bestModel.txt", "w") as f:
        f.write(str(vloss.item()))
        f.write("\n")
        f.write(pathToBestModel)
    print("NEW BEST MODEL FOUND WITH LOSS:", vloss)

def retrieveBestModelInfo():
    f = open('./savedModels/bestModel.txt', "r")
    bestLoss = float(f.readline())
    bestModelPath = f.readline()
    f.close()
    return bestLoss, bestModelPath

Note that train_one_epoch is essentially copied from the PyTorch docs, with one slight change: the model, optimizer, and loss function are passed in as function parameters.
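The bookkeeping file used by the helper functions has a simple two-line layout: the best loss on the first line and the model path on the second. Here is a minimal, self-contained sketch of that round trip. The function names and the example values are hypothetical, and it writes to a temporary folder so it does not touch ./savedModels.

```python
import tempfile
from pathlib import Path

def write_best_model_info(info_path, loss, model_path):
    """Store the best loss on line 1 and the model path on line 2."""
    Path(info_path).write_text(f"{loss}\n{model_path}")

def read_best_model_info(info_path):
    """Return (best_loss, model_path) read back from the info file."""
    loss_line, model_path = Path(info_path).read_text().splitlines()[:2]
    return float(loss_line), model_path.strip()

# Demo in a temporary folder so nothing in ./savedModels is touched
with tempfile.TemporaryDirectory() as tmp:
    info_file = Path(tmp) / "bestModel.txt"
    write_best_model_info(info_file, 1.25, "savedModels/model_example")
    print(read_best_model_info(info_file))
```

Stripping the path when reading guards against a stray newline if the file is ever edited by hand.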

You then define the hyperparameters like below. Note that this is something you can tune, to further improve your model.

#hyperparameters
EPOCHS = 60
LEARNING_RATE = 0.001
MOMENTUM = 0.9

Run the training with the code below:

#run training

createBestModelFile()

bestLoss, bestModelPath = retrieveBestModelInfo()

timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
writer = SummaryWriter('runs/fashion_trainer_{}'.format(timestamp))
epoch_number = 0

model = Model()
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE, momentum=MOMENTUM)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

best_vloss = 1_000_000.

for epoch in tqdm(range(EPOCHS)):
    if (epoch_number % 5 == 0):
        print('EPOCH {}:'.format(epoch_number + 1))

    # Make sure gradient tracking is on, and do a pass over the data
    model.train(True)
    avg_loss = train_one_epoch(model, optimizer, loss_fn, epoch_number, writer)

    running_vloss = 0.0
    # Set the model to evaluation mode, disabling dropout and using population
    # statistics for batch normalization.

    model.eval()

    # Disable gradient computation and reduce memory consumption.
    with torch.no_grad():
        for i, vdata in enumerate(validation_loader):
            vinputs, vlabels = vdata
            voutputs = model(vinputs)

            vloss = loss_fn(voutputs, vlabels)
            running_vloss += vloss

    avg_vloss = running_vloss / (i + 1)

    #only print every 5 epochs
    if epoch_number % 5 == 0:
        print('LOSS train {} valid {}'.format(avg_loss, avg_vloss))

    # Log the running loss averaged per batch
    # for both training and validation
    writer.add_scalars('Training vs. Validation Loss',
                    { 'Training' : avg_loss, 'Validation' : avg_vloss },
                    epoch_number + 1)
    writer.flush()

    # Track best performance, and save the model's state
    if avg_vloss < best_vloss:
        best_vloss = avg_vloss

        if (bestLoss > best_vloss): #if better than previous best loss from all models created, save it
            model_path = 'savedModels/model_{}_{}'.format(timestamp, epoch_number)
            torch.save(model.state_dict(), model_path)
            saveBestModel(best_vloss, model_path)
            bestLoss = best_vloss.item() #keep the all-time best up to date for the final print

    epoch_number += 1

print("\n\nBEST VALIDATION LOSS FOR ALL MODELS: ", bestLoss)

This code is also heavily inspired by the PyTorch docs.

Depending on the number of layers in your model, the number of neurons in each layer, the number of epochs, whether you are using a GPU, and several other factors, training can take anywhere from seconds to several hours.

As you can see below, the estimated time to train my model here was about 2 minutes.

Video of the model training. Recorded using [LICEcap](https://www.cockos.com/licecap/)

How to test your model

Testing your model is a vital part of checking if what you created works. I have implemented two ways of checking the model:

Yourself vs AI

The first way is to play yourself against the AI. Here you decide a move, then you let the AI decide the move, and so on. I recommend doing this in a notebook, so you can run different cells for different actions.

First, load a model that was saved during training. Here, I read the path from the file created during training, which stores the path to your best model. You can of course also manually change the path to the model you prefer to use.

saved_model = Model()

#load best model path from your file
f = open("./savedModels/bestModel.txt", "r")
bestLoss = float(f.readline())
model_path = f.readline()
f.close()

saved_model.load_state_dict(torch.load(model_path))

Then, define the chess board:

#play your own game
board = chess.Board()

Then you can make a move by changing the string in the first line of the cell below and running it. Make sure it is a legal move:

moveStr = "e2e4"
move = chess.Move.from_uci(moveStr)
board.push(move)
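Before pushing a move, it can help to catch typos in the move string. The sketch below only checks that a string has the shape of a standard UCI move (from-square, to-square, optional promotion piece). It does not check legality in the current position, and the helper name looks_like_uci is an assumption made for this example.

```python
import re

# Standard UCI move: from-square, to-square, optional promotion piece
UCI_PATTERN = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")

def looks_like_uci(move_str):
    """Return True if move_str has the shape of a UCI move such as 'e2e4'."""
    return UCI_PATTERN.fullmatch(move_str) is not None

for candidate in ["e2e4", "e7e8q", "e2-e4", "Nf3"]:
    print(candidate, looks_like_uci(candidate))
```

To check actual legality with python-chess, you can test whether the move is in board.legal_moves before calling board.push(move).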

Then you can let the AI decide the next move with the cell below:

#make ai move:
aiMove = saved_model.predict(board)
board.push(aiMove)
board

This will also print the board state so you can decide your own move more easily:

Printing the board state after the AI makes a move

Keep alternating between your own moves and the AI's moves, and see who wins!

If you want to undo a move, you can use:

#undo the last move:
board.pop()

Stockfish vs your AI

You can also automate the testing process, by setting Stockfish to a specific ELO, and letting your AI play against it:

First, load your model (make sure to change the model_path to your own model):

saved_model = Model()
model_path = "savedModels/model_20230702_150228_46" #TODO CHANGE THIS PATH
saved_model.load_state_dict(torch.load(model_path))

Then import Stockfish, and set it to a specific ELO. Remember to change the path to the Stockfish engine to the location of the Stockfish program on your own machine:

# test elo  against stockfish
ELO_RATING = 500
from stockfish import Stockfish
#TODO CHANGE PATH BELOW
stockfish = Stockfish(path=r"C:\Users\eivin\Documents\ownProgrammingProjects18062023\ChessEngine\stockfish\stockfish\stockfish-windows-2022-x86-64-avx2")
stockfish.set_elo_rating(ELO_RATING)

A 500 ELO rating is quite low, and something your engine will hopefully beat.
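To get a feel for what a rating difference means, here is a small sketch of the standard Elo expected-score formula. The opponent ratings are hypothetical, and Stockfish's limited-strength setting only approximates a real rating, so treat the numbers as a rough guide.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (win = 1, draw = 0.5, loss = 0)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

ELO_RATING = 500  # the Stockfish setting used in this section
for opponent in (500, 900, 1500):  # hypothetical opponent ratings
    print(opponent, round(expected_score(ELO_RATING, opponent), 3))
```

Two equally rated players each have an expected score of 0.5, and every 400 points of difference shifts the odds by a factor of 10.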

Then play the game with this script:

board = chess.Board()
allMoves = [] #list of move strings, used to set the position for Stockfish

MAX_NUMBER_OF_MOVES = 150
for i in range(MAX_NUMBER_OF_MOVES): #set a limit for the game

    #first, my AI's move
    try:
        move = saved_model.predict(board)
        board.push(move)
        allMoves.append(str(move)) #add so Stockfish can see it
    except Exception:
        #an error here is treated as the AI having lost
        print("game over. You lost")
        break

    #then get Stockfish's move
    stockfish.set_position(allMoves)
    stockfishMove = stockfish.get_best_move_time(3) #thinking time in milliseconds
    if stockfishMove is None: #Stockfish has no move, so the game is over
        break
    allMoves.append(stockfishMove)
    board.push(chess.Move.from_uci(stockfishMove))

stockfish.reset_engine_parameters() #reset elo rating

board

This prints the board position after the game is over.

Position after your chess engine lost a game to Stockfish

Reflection on the performance of the chess engine

I tried training the model on about 100k positions and moves, and discovered that the model is still not good enough to beat a low-level (500 ELO) chess bot.

There could be several reasons for this. Chess is a highly complicated game that probably requires a lot more moves and positions before a decent bot can be developed.

Furthermore, there are several elements of the bot you can change to potentially improve it. The architecture can be improved, for example by adding a CNN at the beginning of the forward function, so that the bot takes in spatial information.

You can also change the number of hidden layers in the fully connected part, or the number of neurons in each layer.

A safe way to further improve the model is to feed it more data, as the mining code in this article lets you generate as much data as you want.

Additionally, I think this shows that an imitation learning chess engine either needs a lot of data, or that training a chess engine solely with imitation learning might not be an optimal idea.

Still, imitation learning can be used as part of a chess engine, for example, if you also implement traditional searching methods, and add imitation learning on top of it.

Conclusion

Congrats! You have now made your own AI chess engine from scratch, and I hope you learned something along the way. You can constantly make this engine better if you want to improve it, and make sure it beats better and better competition.

If you want the full code, check out my GitHub.

This tutorial was originally written part by part on my Medium. You can check out each part here:

Part 1: Generating the dataset
Part 2: Encoding with the AlphaZero method
Part 3: Training the model

If you are interested and want to learn more about similar topics, you can find me on:

✅ Medium
✅ Twitter
✅ LinkedIn

How to Fine-Tune the Donut Model – With Example Use Case

Eivind Kjosbakken — Tue, 12 Sep 2023 17:59:17 +0000

Donut is a model you can use from Python to extract text from a given image. This can be useful in various scenarios, like scanning receipts, for example.

You can easily download the Donut model from GitHub. But as is common with AI models, you should fine-tune the model for your specific needs.

I wrote this tutorial because I did not find any resources showing me exactly how to fine-tune the Donut model with my dataset. So I had to learn this from other tutorials (which I'll share throughout this guide) and figure out issues myself.

These issues were especially prevalent as I did not have a GPU on my local computer. So to simplify the process for others, I made this tutorial.

Extract information from receipts. The picture was taken from [this Google Colab file](https://colab.research.google.com/drive/1NMSqoIZl39wyRD7yVjw2FIuU2aglzJi?usp=sharing#scrollTo=f7RoSOEXUa6i), using a photo taken by me.

Here's what we'll cover:

How to find a dataset to fine-tune with
Fine-tuning with Google Colab
How to change parameters
Fine-tuning locally

How to Find a Dataset to Fine-tune with

Finding a dataset online

To fine-tune the model, we need a dataset we will fine-tune with. If you want a simple solution, you can find a prepared dataset in this folder on Google Drive.

You should then copy this dataset over to your own Google Drive. Note that this was taken from this tutorial under the “Downloading and parsing SROIE” headline. The tutorial is a great read which inspired this article, as I wanted to create a more in-depth tutorial for fine-tuning the Donut model in Google Colab. So if you want a more in-depth look at generating the dataset, I recommend reading the tutorial above.

The dataset linked above may not necessarily be for your specific purpose. If you want to fine-tune a model to your specific needs, you either need to find a fitting dataset online, or create a dataset yourself.

Annotating your own dataset

Annotating your own data is the alternative if you can't or don't want to find a dataset online. If you already found a dataset, you can skip this subsection.

Annotating your own dataset is a surefire way to create a dataset that perfectly fits your needs.

There are many annotating tools online, but a free one I recommend is the Sparrow UI data annotation tool. Here you can upload your image, put bounding boxes on the image, and label each bounding box. You can then extract the labeled data in JSON format, and use it following the rest of the tutorial.

Make sure your dataset is in the same format as the dataset I provided earlier. For more details on annotating data with the Sparrow UI, you can check out my article on using the Donut model for self-annotated data. Note that this article assumes you are already able to fine-tune the Donut model (which you will learn in this article).

Annotating a receipt with the Sparrow UI data annotation tool

Fine-Tuning with Google Colab

To make the fine-tuning process as simple as possible, I provided a Google Colab file you can use here. (Some code is taken from this GitHub page).

Note that package versions need to be exactly as provided in the Drive, as wrong package versions were the root of a lot of the problems I faced fine-tuning the Donut model myself.

Before fine-tuning using the Google Colab file, there are 2 things you need to do:

Upload data to your Google Drive.

Upload the dataset I provided earlier to your Google Drive under a parent folder called preparedFinetuneData (see the file structure in the image below).

Make sure to add the parent folder in the root folder for your Google Drive. Also, download this config file and add it to the root folder of your Google Drive.

How your dataset should look in the root folder of Google Drive

Link your Google Drive to your Google Colab.

When you run the cell which mounts the Google Drive, you might get a prompt, in which case you can just accept it and ignore the rest of this paragraph.

If you do not get a prompt, press the files icon (red in the image below), and the Mount Drive Icon (blue in the image below). Then you will get a code snippet that you can run, and now your Google Drive is connected.

Note that if you have not connected Google Colab to Google Drive before, you have to log into your Google Drive after pressing the Drive icon, and give permission for Colab to access Drive. Prompts for this should appear automatically when you try to link the Drive.

Files icon (red). Mount Google Drive (blue)

Finally, restart your runtime. After altering files on Google Colab, you always have to restart your runtime to see the latest updates.

How to Change Parameters

Great! Now you can run the cells in the notebook, and you should receive a fine-tuned model. Remember you can also change the Config parameters to, for example, train for longer, use more workers, and so on.

Example of Config parameters you can change.

Note that I am working with the Donut model fine-tuned on the CORD dataset, as I want to be able to read receipts. You can also find other Donut models here, with the other options being document parsing, document classification, or document visual question answering (DocVQA).

Fine-tuning Locally

Fine-tuning can also be run locally, which will be mostly relevant for you if you have a GPU, as CPU training will take a long time.

To run locally you have to:

First, clone this GitHub repository
Add the prepared fine-tuning dataset to the root folder.
If you want to save the fine-tuned model, add the line below to train.py line 164, right below trainer.fit(…)

#...
trainer.save_checkpoint(f"{Path(config.result_path)}/{config.exp_name}/{config.exp_version}/model_checkpoint.ckpt")
#...

If you do not have a GPU, you then need to comment out the GPU settings in the PyTorch Lightning Trainer, and add the line accelerator="cpu":

#train.py file
#... 
trainer = pl.Trainer(
        #Comment out the lines below
        # num_nodes=config.get("num_nodes", 1),
        # devices=torch.cuda.device_count(),
        # strategy="dp",
        # accelerator="gpu",
        accelerator="cpu", #TODO add this line
        plugins=custom_ckpt,
        max_epochs=config.max_epochs,
        max_steps=config.max_steps,
        val_check_interval=config.val_check_interval,
        check_val_every_n_epoch=config.check_val_every_n_epoch,
        gradient_clip_val=config.gradient_clip_val,
        precision=16,
        num_sanity_val_steps=0,
        logger=logger,
        callbacks=[lr_callback, checkpoint_callback, bar],
    )
#...

Make sure the max_epochs parameter in your Config file is set to -1 (if not, you will get a division by 0 error). You can decide the training time by setting the max_steps parameter.
You can then run fine-tuning with the following command in the terminal:

python train.py --config config/train_cord.yaml

Here, train_cord.yaml is the Configuration file you want to use.

Running on CPU

If you are running on a CPU after all, you will encounter some problems unless you make some changes:

In donut/train.py, change the accelerator parameter to "cpu" (from "gpu"), and remove the parameters num_nodes, devices, and strategy.
Then in your Config file (for example train_cord.yaml), set max_epochs to -1, and specify the max_steps parameter. This is because you will encounter a division by 0 error if max_epochs is larger than 0.

After these changes, running on a CPU should work as well.
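As a quick sanity check before starting a CPU run, you could verify the two Config values described above. This is a minimal sketch: the helper name check_cpu_config and the example values are hypothetical, and the dictionary stands in for whatever you load from your Config file.

```python
def check_cpu_config(config):
    """Return a list of problems with a CPU fine-tuning config (empty if none)."""
    problems = []
    if config.get("max_epochs") != -1:
        problems.append("max_epochs should be -1 when training on CPU")
    max_steps = config.get("max_steps")
    if not isinstance(max_steps, int) or max_steps <= 0:
        problems.append("max_steps should be a positive integer")
    return problems

# Hypothetical values, standing in for what you would load from train_cord.yaml
config = {"max_epochs": -1, "max_steps": 300}
print(check_cpu_config(config))  # prints []
```

An empty list means both settings match the advice in this section. Any other result tells you which value to change before running train.py.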

Conclusion

In this article, I have shown you how to easily fine-tune the Donut model using your own data, something which will hopefully result in improved accuracy for your fine-tuned Donut model.

The Donut model has many applications, and this is just one way to use it, which I hope is useful.

If you are interested and want to learn more about similar topics, you can find me on:
