How I taught a neural network to play doodle jump (Attempt 1)

January 21, 2024

Recently I had an idea: what if I trained a neural network as a classifier to play Doodle Jump? The plan was to play the game myself, taking screenshots and naming each one after the key pressed at that moment. For simplicity, I used only Left, Right, and no key press. I generated a large set of such images with a browser driven by Playwright, fed them into a convolutional neural network to train a model, and then used the same Playwright setup to press whatever key the model predicts.
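The whole labeling scheme fits in one line (the filename below is made up for illustration):

name = "ArrowLeft_1c9d2e3f-aaaa-bbbb-cccc-123456789abc.jpg"  # hypothetical screenshot name
label = name.split("_")[0]   # -> "ArrowLeft"; "None" means no key was pressed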

Code

Below I’ll provide the code. Besides the data, I ended up with 2 files: main.py and train.ipynb

To make the code work, you need to install Playwright and fastai:

pip install playwright fastai
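Playwright also needs its browser binaries downloaded; since the scripts launch WebKit, running this once after installation should be enough:

playwright install webkit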

main.py

import uuid
from playwright.sync_api import sync_playwright
from fastai.vision.all import *

def train():
    with sync_playwright() as p:
        browser = p.webkit.launch(headless=False)
        page = browser.new_page(viewport=None)
        page.goto("https://doodlejumporiginal.com/")
        page.evaluate("""
document.addEventListener('keyup', e => {
    document.pressed = null
    console.log(e)
})

document.addEventListener('keyup', e => {
    document.pressed = e.key
})
                      """)
        loc = page.locator("canvas")
        while True:
            # document.pressed holds the key currently held down, or null -> "None"
            button = page.evaluate("document.pressed")
            loc.screenshot(
                path="./screenshots/" + str(button) + "_" + str(uuid.uuid4()) + ".jpg",
                caret="initial",
                scale="css",
                animations="allow",
                quality=3,     # very low JPEG quality keeps the dataset small
                type="jpeg",
            )

# load_learner below needs make_label in scope, since the exported model references it
def make_label(x): return x.split("_")[0]

def play():
    learn_inf = load_learner('model.pkl', cpu=True)
    with sync_playwright() as p:
        browser = p.webkit.launch(headless=False)
        page = browser.new_page(viewport=None)
        page.goto("https://doodlejumporiginal.com/")
        loc = page.locator("canvas")
        key = "Backquote"
        while True:
            scr = "./screenshotsVal/"+ str(uuid.uuid4()) +".png"
            loc.screenshot(
                path=scr,
                caret="initial",
                scale="css",
                animations="allow",
                quality=3,
                type="jpeg", 
            )
            key = learn_inf.predict(scr)
            key = key[0]
            print(key)
            if(key == "None"):
                key = "Backquote"
            key = key + "+"
            page.keyboard.press(key*200 + "Backquote")


play()
# or train() to generate screenshots for training

train() generates screenshots for subsequent model training; play() plays Doodle Jump. You need to click Play in the opened browser for the game to start.

The sequence is as follows:

  • train() - to generate screenshots
  • run train.ipynb to train the model
  • play() - for the neural network to play the game
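
Before training, it can be useful to check how balanced the collected classes are. A small sketch, assuming the screenshots folder produced by train():

from collections import Counter
from fastai.vision.all import get_image_files

# count screenshots per class, reusing the filename convention from make_label
counts = Counter(f.name.split("_")[0] for f in get_image_files("screenshots"))
print(counts)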

The training notebook looked like this:

train.ipynb

from fastai.vision.all import *

# prepare data: the label is everything before the first "_" in the filename
def make_label(x): return x.split("_")[0]

dls = ImageDataLoaders.from_name_func('.',
        get_image_files("screenshots"), valid_pct=0.2, seed=42,
        label_func=make_label,
        item_tfms=Resize(192)
    )

dls.train.show_batch(max_n=4, nrows=1, unique=True)

# fine tune model
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)

# export model
learn.export('model.pkl')

# see confusion matrix
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
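
After exporting, a quick sanity check outside the notebook confirms the model loads and predicts. A minimal sketch; the screenshot filename here is hypothetical:

from fastai.vision.all import load_learner

# make_label must be defined wherever the learner is loaded,
# because the exported model references it
def make_label(x): return x.split("_")[0]

learn_inf = load_learner('model.pkl', cpu=True)
pred, pred_idx, probs = learn_inf.predict('screenshots/ArrowLeft_example.jpg')  # hypothetical file
print(pred, float(probs[pred_idx]))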

After running the game, the best result achieved was just over three thousand points, which is quite poor.

What conclusions I drew

The neural network learned to classify the current move, but that is not the same as the best next move. At some point it figured out that if the Doodle's nose points right the frame belongs to the Right class, and if it points left, to the Left class, so it started jumping only right or only left. The next step is to think about a label or metric that actually represents the best next move. I also need to figure out how to decide how far to move left or right: a single press shifts the Doodle only a couple of pixels, and for now I simply hardcoded 200 presses.
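One option I have not tried yet (a hedged sketch, not what the code above does): instead of repeating keyboard.press, Playwright can hold a key with keyboard.down and keyboard.up, which would turn "how far to move" into a duration the model could eventually predict.

import time

# hypothetical helper: hold a key for `seconds` instead of pressing it N times;
# `page` is an open Playwright page, `key` is e.g. "ArrowLeft" or "ArrowRight"
def hold_key(page, key, seconds=0.3):
    page.keyboard.down(key)
    time.sleep(seconds)
    page.keyboard.up(key)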

Game Video

Tags:
ai game
