How I taught a neural network to play doodle jump (Attempt 1)

January 21, 2024

Recently I had an idea: what if I trained a neural network as a classifier to play Doodle Jump? The plan was to play the game myself, taking screenshots and naming each one after the key pressed at that moment. For simplicity, I used only Left, Right, and no key press. I generated a large set of such images with a browser driven by Playwright, fed them into a convolutional neural network to train a model, and then used the same Playwright setup to press whatever key the model predicts.
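The whole labeling scheme fits in one line (the filename below is made up for illustration):

name = "ArrowLeft_1c9d2e3f-aaaa-bbbb-cccc-123456789abc.jpg"  # hypothetical screenshot name
label = name.split("_")[0]   # -> "ArrowLeft"; "None" means no key was pressed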

Code

Below I’ll provide the code. Besides the data, I ended up with 2 files: main.py and train.ipynb

To make the code work, you need to install Playwright and fastai:

pip install playwright fastai
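Playwright also needs its browser binaries downloaded; since the scripts launch WebKit, running this once after installation should be enough:

playwright install webkit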

main.py

import uuid
from playwright.sync_api import sync_playwright
from fastai.vision.all import *

def train():
    with sync_playwright() as p:
        browser = p.webkit.launch(headless=False)
        page = browser.new_page(viewport=None)
        page.goto("https://doodlejumporiginal.com/")
        page.evaluate("""
document.addEventListener('keyup', e => {
    document.pressed = null
    console.log(e)
})

document.addEventListener('keyup', e => {
    document.pressed = e.key
})
                      """)
        loc = page.locator("canvas")
        while True:
            # document.pressed holds the key currently held down, or null -> "None"
            button = page.evaluate("document.pressed")
            loc.screenshot(
                path="./screenshots/" + str(button) + "_" + str(uuid.uuid4()) + ".jpg",
                caret="initial",
                scale="css",
                animations="allow",
                quality=3,     # very low JPEG quality keeps the dataset small
                type="jpeg",
            )

# load_learner below needs make_label in scope, since the exported model references it
def make_label(x): return x.split("_")[0]

def play():
    learn_inf = load_learner('model.pkl', cpu=True)
    with sync_playwright() as p:
        browser = p.webkit.launch(headless=False)
        page = browser.new_page(viewport=None)
        page.goto("https://doodlejumporiginal.com/")
        loc = page.locator("canvas")
        key = "Backquote"
        while True:
            scr = "./screenshotsVal/"+ str(uuid.uuid4()) +".png"
            loc.screenshot(
                path=scr,
                caret="initial",
                scale="css",
                animations="allow",
                quality=3,
                type="jpeg", 
            )
            key = learn_inf.predict(scr)
            key = key[0]
            print(key)
            if(key == "None"):
                key = "Backquote"
            key = key + "+"
            page.keyboard.press(key*200 + "Backquote")


play()
# or train() to generate screenshots for training

train() generates screenshots for subsequent model training; play() plays Doodle Jump. You need to click Play in the opened browser for the game to start.

The sequence is as follows:

  • train() - to generate screenshots
  • run train.ipynb to train the model
  • play() - for the neural network to play the game
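
Before training, it can be useful to check how balanced the collected classes are. A small sketch, assuming the screenshots folder produced by train():

from collections import Counter
from fastai.vision.all import get_image_files

# count screenshots per class, reusing the filename convention from make_label
counts = Counter(f.name.split("_")[0] for f in get_image_files("screenshots"))
print(counts)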

The training notebook looked like this:

train.ipynb

from fastai.vision.all import *

# prepare data: the label is everything before the first "_" in the filename
def make_label(x): return x.split("_")[0]

dls = ImageDataLoaders.from_name_func('.',
        get_image_files("screenshots"), valid_pct=0.2, seed=42,
        label_func=make_label,
        item_tfms=Resize(192)
    )

dls.train.show_batch(max_n=4, nrows=1, unique=True)

# fine tune model
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)

# export model
learn.export('model.pkl')

# see confusion matrix
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
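
After exporting, a quick sanity check outside the notebook confirms the model loads and predicts. A minimal sketch; the screenshot filename here is hypothetical:

from fastai.vision.all import load_learner

# make_label must be defined wherever the learner is loaded,
# because the exported model references it
def make_label(x): return x.split("_")[0]

learn_inf = load_learner('model.pkl', cpu=True)
pred, pred_idx, probs = learn_inf.predict('screenshots/ArrowLeft_example.jpg')  # hypothetical file
print(pred, float(probs[pred_idx]))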

After running the game, the best result achieved was just over three thousand points, which is quite poor.

What conclusions I drew

The neural network learned to classify the current move, but that is not the same as the best next move. At some point it figured out that if the Doodle's nose points right the frame belongs to the Right class, and if it points left, to the Left class, so it started jumping only right or only left. The next step is to think about a label or metric that actually represents the best next move. I also need to figure out how to decide how far to move left or right: a single press shifts the Doodle only a couple of pixels, and for now I simply hardcoded 200 presses.
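One option I have not tried yet (a hedged sketch, not what the code above does): instead of repeating keyboard.press, Playwright can hold a key with keyboard.down and keyboard.up, which would turn "how far to move" into a duration the model could eventually predict.

import time

# hypothetical helper: hold a key for `seconds` instead of pressing it N times;
# `page` is an open Playwright page, `key` is e.g. "ArrowLeft" or "ArrowRight"
def hold_key(page, key, seconds=0.3):
    page.keyboard.down(key)
    time.sleep(seconds)
    page.keyboard.up(key)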

Game Video

Tags:
ai game
