12/21/2020 | 11 min read

Automate the Web - Basics

A beginner tutorial that shows how you can start automating the web using JavaScript.

Automate the Web Series

This tutorial series is aimed at beginners and intermediate programmers who have yet to fully utilize JavaScript in their everyday lives for automating tasks. We’ll mostly be automating tasks that I’ve needed to automate, but these concepts apply anywhere.

What You’ll Learn

The 20% of code that covers 80% of automation use cases
How to leverage the power of delayed recursive loops
How to write quick automation scripts and be able to reuse them again later all from your browser

The Setup and Requirements

A web browser, preferably Chrome or Firefox
Basic JavaScript Knowledge—I assume you’re at least a beginner learning programming and have some knowledge about functions, variables, etc
An idea or goal for something you want to automate

In this example, I’ll be using Chrome, but you can use any browser; you’ll see why I use Chrome soon. The scripts will be for Udemy and Lichess, so if you want to follow along, you need an account on both sites. You can also find something else to automate with a focus on repeated clicking and any kind of confirmation modal/popup.

Script 1: Udemy Progress Reset

This first script is for Udemy and we’ll be using the free Web Design for Developers course as an example. The script will simply uncheck any completed lectures, effectively resetting your progress at the click of a button.

Why does this need to be automated?

I ran into the need for this script when I was visiting an old Udemy course and wanted to reset the progress. There used to be a reset button but at the time of writing this, it’s nowhere to be found. I can handle manually clicking maybe twenty lectures but not when there are hundreds of lectures.

Scoping the DOM

The manual way to reset progress is pretty self-explanatory: click the checkbox next to a completed lesson… then repeat for every single lecture. You can find the list of lectures under the “Course content” tab below the video player while watching lessons. (You need to click “go to course” from the course page to get to this list.)

Since we know we’re working with checkboxes and how to do the task manually, we can plan the steps for our tiny program.

Find all checkbox elements that are checked
While looping through the list of checkboxes, click each one

It sounds simple and it is simple. There’s just one gotcha… checking or unchecking the checkbox is an asynchronous process—it sends information to the server, the server updates the completed lectures on your account and if it’s successful, the server sends back an OK, and only then does the checkbox update. You can see this in action if you disconnect your internet and try to change the checkbox by clicking it.

All this means is that we need to wait before checking the next checkbox to avoid flooding the server with requests and running into errors. We’ll explore the options we have in the code when we get to this point.

Developer Tools

It’s time to open your browser’s developer tools and start inspecting the document object model (DOM). There are several shortcuts to open them:

CTRL + SHIFT + i (Windows/Linux)
CMD + OPTION + i (macOS)
F12

Or from the interface:

Chrome: 3 dots menu at top right > More Tools > Developer Tools
Firefox: Hamburger menu at top right > Web Developer > Toggle Tools
Chrome and Firefox: Right click -> Inspect Element

Find the Checkboxes

From your developer tools, head to the Elements tab to see the DOM tree, click the mouse icon in the top left, then hover over one of the checkboxes. It’ll then scroll to where that checkbox exists in the HTML. When scoping the DOM like this, we want to look for anything we can use to target only the things we want, e.g. unique classes, container elements, etc. Here, it’s very easy. We have a container element and our targets are inputs of type checkbox, which are easy to grab.

Writing the Code

We’ll grab all checkboxes that are nested in the <div data-purpose="curriculum-section-container"> (see screenshot above). This element doesn’t have any classes or an id, but that’s fine because we can use the data-purpose attribute on it. Alternatively, you could go up to the next parent that has a class or id.

Firefox Users: Head over to the console tab to begin writing the JavaScript.

Chrome Users:

Chrome has something better than just using the console: Snippets. Instead of going to the Console tab, go to “Sources” and from there click “Snippets” at the top left, and finally “New snippet”. The benefit of snippets is that you can save your code in the browser and it’s much easier to rerun code while developing it. Here’s a detailed guide to get there if you’re having trouble finding it.

Note that while writing code on your own you’ll likely need to go back and forth between the Elements and Console tabs.

Grab the Container Element, then the checkboxes

We don’t want every checkbox on the page so we want to first grab the container element then all checkboxes nested within. To do this we’re going to use the document.querySelector(...) method. It takes in a CSS Selector and returns the first element it finds that matches the selector.

Mini-Challenge: Which selector do you think we should use? (check the link of CSS Selectors above)
Reminder of what our container element looks like: <div data-purpose="curriculum-section-container">

…

Our container doesn’t have any unique class or id and targeting all <div> elements would select too many things here. Luckily, it has an attribute data-purpose which we can target using an attribute selector.

document.querySelector("[data-purpose=curriculum-section-container]");

This gives us the container element, but we have no use for it besides selecting the checkboxes, so before making it a variable we’ll chain another selector onto the end. This time we want to use querySelectorAll, which returns a NodeList of all matching elements instead of just one element.

// use 'var' instead of 'let' or 'const' to avoid "variable is already defined" issues
var checkboxes = document
  .querySelector("[data-purpose=curriculum-section-container]")
  .querySelectorAll("input[type=checkbox]:checked");

console.log(checkboxes); // NodeList of all visible checked checkboxes

The selector for the checkboxes is a little more advanced than the first selector. We use an element selector to target all inputs, then we narrow it further by saying inputs that also have the type attribute equal to checkbox, and even further by saying only checked checkboxes. The above code successfully gives us all visible checkboxes, but there’s a bug… or a feature?

When we run the code, it skips any checkboxes whose section is collapsed (i.e. checkboxes that are not on screen). One solution would be to call it a feature, since it allows you to choose which sections you want to reset by collapsing the others. Another solution would be to add more code to first expand all sections, then grab the checkboxes. I’ll be going with the first option and leaving the second as another challenge for you.
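If you want a head start on the second option, here is a rough sketch of what it might look like. The toggle selector "[aria-expanded=false]" and the function name expandCollapsedSections are assumptions for illustration only; I haven't confirmed them against Udemy's markup, so inspect a collapsed section in the Elements tab and adjust the selector. The expanded lectures may also take a moment to render, so you might need a short delay before grabbing the checkboxes.

```javascript
// Sketch only: the default toggle selector is an assumption, not confirmed Udemy markup.
function expandCollapsedSections(container, toggleSelector = "[aria-expanded=false]") {
  const toggles = container.querySelectorAll(toggleSelector);

  // click every collapsed section's toggle so its lectures are rendered
  toggles.forEach((toggle) => toggle.click());

  // report how many toggles were clicked
  return toggles.length;
}

// only runs in a browser, and only if the container exists on the page
if (typeof document !== "undefined") {
  var sectionContainer = document.querySelector(
    "[data-purpose=curriculum-section-container]"
  );
  if (sectionContainer) {
    console.log("sections expanded:", expandCollapsedSections(sectionContainer));
  }
}
```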

Loop over the checkboxes and click them

Remember how I mentioned we need to wait for the server?

…checking or unchecking the checkbox is an asynchronous process—it sends information to the server, the server updates the completed lectures on your account and if it’s successful, the server sends back an OK, and only then does the checkbox update…
All this means is that we need to wait before checking the next checkbox to avoid flooding the server with requests and running into errors. We’ll explore the options we have in the code when we get to this point.

Well, we’re at that point and I think it’s probably easiest if we do it the error-prone way first then the better way after. So let’s jump right into it.

There are plenty of ways to loop in JavaScript, but we’ll be using the .forEach(fn) method which uses a function that’s passed the currentValue and index. You can use a for loop or something else if you prefer.

var checkboxes = document
  .querySelector("[data-purpose=curriculum-section-container]")
  .querySelectorAll("input[type=checkbox]:checked");

// arrow function (faster than writing "function")
checkboxes.forEach((checkbox, i) => {
  console.log(checkbox, i);
});

// normal function (slower to write)
// checkboxes.forEach(function (checkbox, i) {
//     console.log(checkbox, i);
// });
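If you prefer a plain for loop, a minimal sketch of the same logging step might look like this. The logEach name is just for illustration; it should work on a NodeList or any array-like list because it only relies on length and index access.

```javascript
// Loops over a list with a classic for loop instead of .forEach
function logEach(list) {
  const seen = [];

  for (let i = 0; i < list.length; i++) {
    const item = list[i];

    console.log(item, i);
    seen.push([item, i]); // collected only so the result can be inspected
  }

  return seen;
}

logEach(["first", "second"]);
```

In the browser you would call logEach(checkboxes) with the NodeList from above.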

Next, clicking the checkboxes is very simple using the element.click() method.

checkbox.click();

Before clicking, we need to wait for the previous click to finish, so we’ll wrap it in a setTimeout(fn, delay). The delay has to be different for each checkbox, otherwise they’ll all trigger at the same time. We can use the i variable for this.

// grab all *checked* checkboxes in the curriculum section
var checkboxes = document
  .querySelector("[data-purpose=curriculum-section-container]")
  .querySelectorAll("input[type=checkbox]:checked");

// loop over the checkboxes, queuing them to be clicked after a delay
// based on their position in the NodeList (i)
checkboxes.forEach((checkbox, i) => {
  // calls function after delay in milliseconds
  setTimeout(() => checkbox.click(), i * 50);
});

// TIMES
// the first checkbox is at position 0 = (0 * 50 = 0) = starts immediately
// 2nd checkbox = 1 * 50 = 50 ms delay
// 3rd = 2 * 50 = 100 ms delay
// etc

// BAD - these would all trigger after 1 second at the same time
// checkboxes.forEach(checkbox => {
//     setTimeout(() => checkbox.click(), 1000);
// });

If we run it, it will work… assuming nothing goes wrong on the server and each request completes within our 50ms delay. You can increase the delay if your internet is slow, but the total wait time also increases. This method is error-prone, so use it sparingly for asynchronous tasks.

Try running this in your browser to see why we need to use the i variable in our setTimeout delay.
[1,2,3].forEach((n) => setTimeout(() => console.log(n), 1000))
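To make the timing concrete, here is a small sketch that only computes the delays instead of clicking anything. The key point is that forEach queues every timer right away, so each delay is measured from the same starting moment. The clickTimes helper is a made-up name for illustration.

```javascript
// Returns the delay (in ms) each position would get when all timers are queued at once
function clickTimes(count, delayFor) {
  return Array.from({ length: count }, (_, i) => delayFor(i));
}

console.log(clickTimes(3, (i) => i * 50)); // [0, 50, 100], spaced 50 ms apart
console.log(clickTimes(3, () => 1000)); // [1000, 1000, 1000], all fire together
```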

The better method will be covered in the next script below and left as a challenge for you to implement for this one. If you’re not up for the challenge, you can find the solution at the bottom of this article.

Script 2: Lichess Interactive Study Converter

This is a pretty niche script that few people would actually use, but the focus is going to be learning some new techniques using functional loops and dealing with modals. I’m assuming you already know the basics from the first script (querying the DOM, and a few methods like .click).

What the Script does

Lichess (a chess website) has community-made, interactive studies for studying openings and games. There are various types of studies. For example, in “normal” mode you can read annotations left by the creator of the study and hit a “next” button to step through it like a book. Another mode, “interactive lesson”, has you play the moves instead of just reading. I prefer the interactive lessons, and luckily you can convert a normal study to an interactive one, but it’s a tedious process. (I’ll show the process in a gif below.)

If you want to follow along, you must make a Lichess account and follow the steps below to clone a study.

After making an account and logging in, go to Lichess/study
Pick any study you want to use, one that’s not already an interactive lesson, e.g. Traps with 1.e4
Clone the study by navigating to the sharing tab at the bottom and clicking “CLONE”.

Now, in your cloned version of the study, you have access to more options. The ones we’ll be focusing on are accessed by clicking the gear icon next to each chapter in the left pane. (see gif below for the full process that we’ll be automating)

Converting to an Interactive Study

[GIF: lichess-convert-to-interactive-study]

The process in code is going to be something like the following:

Find all chapters > gear buttons and loop through each doing these steps

Click the gear button
Wait for the modal to appear (we can’t continue until the modal is available)
Change the value for the “Analysis mode” input to “Interactive lesson”
Click save
Wait for the modal to disappear—in this case, we can skip this step because it always closes even if there’s an error or no internet connection.
Repeat steps

Scoping the DOM

Next, we can make a list of all the selectors we’re going to use based on the steps above. I explained this process for the first script, so if you’re following along I encourage you to find some selectors on your own as a challenge. (skip ahead if not coding along) We need selectors for every element we plan to interact with or monitor which includes the gear buttons, chapter modal/popup, analysis mode input, and the save button.

…

!! Challenge Solution Ahead !!

…

Selectors

Gear Buttons: document.querySelectorAll('.study__chapters act')
Chapter Modal/Popup: document.querySelector('.study__modal')
Analysis Mode: document.getElementById('chapter-mode')
Modal Save Button: studyModal.querySelector('.form-actions > button:nth-child(1)')

Note: You may be tempted to select the <form> element and trigger a submit event on it, but this will not work as Lichess likely calls event.preventDefault() when you click the button and handles it differently than the default way i.e. it works like most single page apps.

Writing the Code

We’ll start the same as usual, grabbing the main elements we’ll be looping over: the gear buttons in this case. Again, we’re using var to avoid any “variable has already been defined” errors when running code in the console.

// all gear buttons for the chapters
var chapterOptionButtons = document.querySelectorAll(".study__chapters act");

Next, there are multiple options we can change in the chapter options modal and in the future we might want our code to change those too. With that in mind, I’m going to set up an options object that can be added to later. Notice that the key is the same as the id of the analysis mode input we found earlier.

// "elementId": "value"
var defaultOptions = {
  // available options: normal, practice, conceal, gamebook
  "chapter-mode": "gamebook",

  // can optionally add the orientation | white or black
  // "chapter-orientation": "white"
};
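As a small sketch of how this object can be "added to later", one option is to copy the defaults and override or extend them with spread syntax. The customOptions name is just for illustration.

```javascript
// "elementId": "value"
var defaultOptions = {
  "chapter-mode": "gamebook",
};

// later keys win, so anything listed after the spread replaces or extends the defaults
var customOptions = {
  ...defaultOptions,
  "chapter-orientation": "black",
};

console.log(customOptions);
```

The defaults stay untouched, so you can still fall back to them for other studies.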

The Loop

Because we know we’re going to be waiting for the modal to pop up and for asynchronous requests, a normal for loop won’t cut it. We’re going to use a flexible function loop that uses an index value i and has one exit case, so it looks very much like a for loop.

// function loop
function loopChapterOptions(i = 0) {
  // always start with an exit case to avoid infinite loops
  if (i > chapterOptionButtons.length - 1) return;
}

It starts at 0 like most loops by using a default parameter and exits by using return when we reach the end of the gear buttons. We can loop by saying loopChapterOptions(i + 1) but we’re going to do that elsewhere. This function will only be in charge of clicking chapterOptionButtons[i] and calling a function that handles the next step.
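For reference, here is a minimal, generic sketch of the same idea in its simplest form, where the function queues its own next step. The names delayedLoop, action, and schedule are made up for this example; schedule defaults to setTimeout and is only a parameter so the loop can be tried without real timers.

```javascript
// A delayed function loop: run the action for item i, then queue item i + 1
function delayedLoop(items, action, delay = 100, schedule = setTimeout, i = 0) {
  // exit case: no more items
  if (i > items.length - 1) return;

  action(items[i], i);

  // queue the next step instead of calling it right away
  schedule(() => delayedLoop(items, action, delay, schedule, i + 1), delay);
}

// logs one item roughly every 50 ms
delayedLoop(["a", "b", "c"], (item, i) => console.log(i, item), 50);
```

Our script splits these pieces across several functions so that the "queue the next step" part only happens once the modal has been handled.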

As a reminder, here are the steps we wrote above:

Click the gear button (the loop handles this)
Wait for the modal to appear (we can’t continue until the modal is available)
Change the value for the “Analysis mode” input to “Interactive lesson”
Click save
Wait for the modal to disappear
Repeat steps

// all gear buttons for the chapters
var chapterOptionButtons = document.querySelectorAll(".study__chapters act");

// "id": "value"
var defaultOptions = {
  // available options: normal, practice, conceal, gamebook
  "chapter-mode": "gamebook",
  // "chapter-orientation": "white"
};

// loop over the gear buttons, clicking each
function loopChapterOptions(i = 0) {
  // always start with an exit case to avoid infinite loops
  if (i > chapterOptionButtons.length - 1) return;

  const optionsBtn = chapterOptionButtons[i];

  optionsBtn.click();
  waitForModal(i);
}

That’s it for our loop function. It clicks the current options button and triggers the next step. We pass along the i variable because we’ll need it to continue the loop later with loopChapterOptions(i + 1). Let’s create the waitForModal function next.

Alternatively, you can make the i variable global or refactor it in a way you prefer.

function waitForModal(i) {
  const studyModal = document.querySelector(".study__modal");

  if (!studyModal) {
    return setTimeout(() => waitForModal(i), 250);
  }

  setOptionsAndContinue(studyModal, i);
}

We can check if the modal exists by using !studyModal because document.querySelector() will return null if it’s not found. If it’s null then we make use of the setTimeout function from earlier to call the waitForModal() function again until it exists. After we confirmed the modal exists, we move on to the next step, setting the options and continuing the loop. (again passing along the i value and this time the studyModal element too)

Pay close attention to where you put return statements as these are what exit the function. If you miss one, you’ll quickly run into errors.
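Here is a tiny sketch of what goes wrong when a return is missing. Both functions are simplified stand-ins (they push messages into a log array instead of touching the page), and their names are made up for this example.

```javascript
// BAD: no return inside the if, so execution falls through
function checkWithoutReturn(modal, log) {
  if (!modal) {
    log.push("retry later");
  }
  log.push("set options"); // runs even though modal is null
  return log;
}

// GOOD: return exits the function before the next step
function checkWithReturn(modal, log) {
  if (!modal) {
    log.push("retry later");
    return log;
  }
  log.push("set options");
  return log;
}

console.log(checkWithoutReturn(null, [])); // ["retry later", "set options"]
console.log(checkWithReturn(null, [])); // ["retry later"]
```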

There is an error that can occur here: if your internet disconnects while it’s waiting for the modal, it’ll just keep waiting forever, even if your internet reconnects. To fix this, we can keep track of the number of attempts (how many times waitForModal has been called) and either exit or retry clicking the options button.

// calls itself with a delay until the modal appears
// retry clicking after X number of attempts
function waitForModal(i, attempt = 1) {
  console.log("waiting for modal: ", i, attempt);

  const studyModal = document.querySelector(".study__modal");

  // retry in 250 milliseconds if the modal hasn't appeared
  if (!studyModal) {
    // exit and retry clicking (for connection issues)
    if (attempt > 20) return loopChapterOptions(i);

    return setTimeout(() => waitForModal(i, attempt + 1), 250);
  }

  // if the modal is visible then set options and continue the loop
  setOptionsAndContinue(studyModal, i);
}

In the above code, it will wait until twenty attempts have been made then retry clicking by calling loopChapterOptions() with the same i value. With a delay of 250ms, twenty attempts add up to five seconds, which seems reasonable.
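The code above takes the "retry clicking" route. For comparison, here is a generic sketch of the "exit" route, which gives up after a maximum number of attempts instead of retrying. The names pollUntil, onReady, onGiveUp, and the options it accepts are all made up for this example; schedule is only configurable so the function can be tried without real timers.

```javascript
// Polls check() until it returns something truthy, then calls onReady with it.
// After maxAttempts failed checks it calls onGiveUp instead of waiting forever.
function pollUntil(check, onReady, onGiveUp, options = {}) {
  const { interval = 250, maxAttempts = 20, schedule = setTimeout } = options;

  function attemptCheck(attempt) {
    const result = check();

    if (result) return onReady(result);
    if (attempt >= maxAttempts) return onGiveUp(attempt);

    schedule(() => attemptCheck(attempt + 1), interval);
  }

  attemptCheck(1);
}

// only runs in a browser
if (typeof document !== "undefined") {
  pollUntil(
    () => document.querySelector(".study__modal"),
    (modal) => console.log("modal found", modal),
    (attempts) => console.log("gave up after", attempts, "attempts")
  );
}
```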

Finally, we can create the setOptionsAndContinue() function and test if it all works. Remember to pass along the arguments studyModal, i, and options. You may consider making i a global variable or doing a similar refactor, since passing it along like this gets tedious. These scripts aren’t intended to be good code; they’re usually quick and dirty scripts we use once or twice.

// options can be passed in or just use the global defaultOptions
function setOptionsAndContinue(studyModal, i, options = defaultOptions) {
  const saveButton = studyModal.querySelector(
    ".form-actions > button:nth-child(1)"
  );

  // set options
  // TODO: set options

  // save and close the modal
  // the modal always disappears even if there is no internet connection, so we don't have to wait for it
  saveButton.click();

  // continue the loop with a slight delay for good measure
  setTimeout(() => loopChapterOptions(i + 1), 100);
}

loopChapterOptions();

So far in the above code, there’s nothing new… we select the saveButton, click it, and continue the loop. The last step is to actually set the options, and this is your final challenge. (skip ahead if not coding along) Use the options object from before to dynamically select each element and change its value.

// (example options object)
var options = {
  "chapter-mode": "gamebook",
  "chapter-orientation": "white",
};

…

!! Challenge Solution Ahead !!

…

function setOptionsAndContinue(studyModal, i, options = defaultOptions) {
  const saveButton = studyModal.querySelector(
    ".form-actions > button:nth-child(1)"
  );

  // set options
  for (const [elementId, desiredValue] of Object.entries(options)) {
    const input = document.getElementById(elementId);

    input.value = desiredValue;
  }

  // save and close the modal
  // the modal always disappears even if there is no internet connection, so we don't have to wait for it
  saveButton.click();

  // continue the loop
  setTimeout(() => loopChapterOptions(i + 1));
}

Okay, if you’re a beginner, I used something you probably haven’t seen before, or only rarely:

for (const [elementId, desiredValue] of Object.entries(options))

First, what does Object.entries(options) return? It converts an object to an array formatted as [ [key, value], [key, value], ... ]

Object.entries({
  "chapter-mode": "gamebook",
  "chapter-orientation": "white",
});
// becomes
// [["chapter-mode", "gamebook"], ["chapter-orientation", "white"]]

The benefit of this is that we can then use Array Destructuring in combination with a for..of loop to create variables for the key/value pair. Of course, this is not the only way to do it; it’s just how I prefer to do it because it’s short and quick to type while still giving us readable code. Alternatively, you could use a normal for loop and Object.keys.
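For comparison, here is a minimal sketch of that alternative using a normal for loop and Object.keys. To keep it runnable outside the study page, it collects the pairs into an array instead of setting input values; the keys and pairs variables are just for illustration.

```javascript
var options = {
  "chapter-mode": "gamebook",
  "chapter-orientation": "white",
};

var keys = Object.keys(options); // ["chapter-mode", "chapter-orientation"]
var pairs = [];

for (let i = 0; i < keys.length; i++) {
  const elementId = keys[i];
  const desiredValue = options[elementId];

  // in the real script this is where input.value would be set
  pairs.push(elementId + " -> " + desiredValue);
}

console.log(pairs);
```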

Our Final Code

// all gear buttons for the chapters
var chapterOptionButtons = document.querySelectorAll(".study__chapters act");

// "id": "value"
var defaultOptions = {
  // available options: normal, practice, conceal, gamebook
  "chapter-mode": "gamebook",
  // "chapter-orientation": "white"
};

// loop over the gear buttons, clicking each
function loopChapterOptions(i = 0) {
  // always start with an exit case to avoid infinite loops
  if (i > chapterOptionButtons.length - 1) return;

  const optionsBtn = chapterOptionButtons[i];

  optionsBtn.click();
  waitForModal(i);
}

// calls itself with a delay until the modal appears
// retry clicking after X number of attempts
function waitForModal(i, attempt = 1) {
  console.log("waiting for modal: ", i, attempt);

  const studyModal = document.querySelector(".study__modal");

  // retry in 250 milliseconds if the modal hasn't appeared
  if (!studyModal) {
    // exit and retry clicking (for connection issues)
    if (attempt > 20) return loopChapterOptions(i);

    return setTimeout(() => waitForModal(i, attempt + 1), 250);
  }

  // if the modal is visible then set options and continue the loop
  setOptionsAndContinue(studyModal, i);
}

function setOptionsAndContinue(studyModal, i, options = defaultOptions) {
  const saveButton = studyModal.querySelector(
    ".form-actions > button:nth-child(1)"
  );

  // set options
  for (const [elementId, desiredValue] of Object.entries(options)) {
    const input = document.getElementById(elementId);

    input.value = desiredValue;
  }

  // save and close the modal
  // the modal always disappears even if there is no internet connection, so we don't have to wait for it
  saveButton.click();

  // continue the loop
  setTimeout(() => loopChapterOptions(i + 1));
}

loopChapterOptions();

After running this in the console on our study page, we’ll see a quick flash of the options modal as it changes every chapter in a split second.

Exercises / Practice

Udemy Script: Add an option to complete parts of a course instead of resetting progress
Udemy Script: Refactor the code so it waits for the checkbox to change before moving on to the next (waiting for the server to finish instead of using a setTimeout)
Find another site to automate using the techniques you learned here
Experiment with refactoring the code in different ways (there are many ways to do the same thing)
Append a button to the page and run the script when it’s clicked

Challenge Solutions

Add a Complete All to the Udemy script

// using the quick and simple error-prone method
function checkAll() {
  var checkboxes = document
    .querySelector("[data-purpose=curriculum-section-container]")
    .querySelectorAll("input[type=checkbox]");

  checkboxes.forEach((checkbox, i) => {
    if (!checkbox.checked) {
      setTimeout(() => checkbox.click(), i * 50);
    }
  });
}

checkAll();

Refactor the Udemy script to wait for the checkboxes to change (waiting for the server to finish instead of using a `setTimeout` with a guess value)

// all *checked* checkbox elements within the "curriculum section"
var checkboxes = document
  .querySelector("[data-purpose=curriculum-section-container]")
  .querySelectorAll("input[type=checkbox]:checked");

// a functional loop
function loopCheckboxes(i = 0) {
  // no more checkboxes - our exit case
  if (i > checkboxes.length - 1) return;

  const checkbox = checkboxes[i];

  checkbox.click();
  setTimeout(() => waitForCheck(checkbox, i), 50);
}

// wait for the checkbox to be unchecked
// keep track of the attempts to retry clicking after a number of failures
function waitForCheck(checkbox, i, attempt = 1) {
  // retry clicking after 5 seconds (for network issues)
  if (checkbox.checked && attempt >= 20) {
    return loopCheckboxes(i);
  } else if (checkbox.checked) {
    // if the checkbox is still checked keep waiting, pausing the loop
    return setTimeout(() => waitForCheck(checkbox, i, attempt + 1), 250);
  } else {
    // continue loop on next checkbox (if we didn't exit in the above if-statement)
    return loopCheckboxes(i + 1);
  }
}

// start the loop
loopCheckboxes();

// ERROR PRONE WAY
// checkboxes.forEach((checkbox, i) => {
//     setTimeout(() => checkbox.click(), i * 50);
// });
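Append a button to the page and run the script when it’s clicked

This is one possible sketch for that exercise, not the only way to do it. The addRunButton name, the label, and the inline styles are all arbitrary choices for illustration. The document is passed in as doc so the function isn’t tied to the global one.

```javascript
// Creates a button, wires up the click handler, and appends it to the page
function addRunButton(doc, label, onClick) {
  const button = doc.createElement("button");

  button.textContent = label;
  // pin it to the corner so it stays visible (styling is arbitrary)
  button.style.cssText = "position:fixed;bottom:16px;right:16px;z-index:9999;";
  button.addEventListener("click", onClick);

  doc.body.appendChild(button);
  return button;
}

// only runs in a browser
if (typeof document !== "undefined") {
  addRunButton(document, "Run script", () => console.log("script would run here"));
}
```

Swap the console.log for a call such as loopCheckboxes() or checkAll() to run one of the scripts from this article.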

Closing Message

I hope you enjoyed this tutorial and learned something! The code here can be used for most automation tasks, and I encourage you to find something to automate on your own for practice. If you followed along and ran into any errors that you can’t figure out, share your code and let me know.
