Reload WebExtension when page is dynamically changed

guimarac · June 22, 2016, 12:53pm

Hello all. I am trying to write my first add-on using WebExtensions (it is quite complex for a first extension). I need to process the results of all Google searches within a session (session = queries using the same Google page). I have the following code:

manifest.json:

{
  "manifest_version": 2,
  "name": "MyAddon",
  "version": "0.0.1",

  "description": "My Add-on",
  "icons": {
      "48": "icons/icon-48.png"
  },

  "applications": {
      "gecko": {
          "id": "myaddon@local",
          "strict_min_version": "42.0"
      }
  },

  "content_scripts": [
    {
      "matches": ["*://www.google.com/*"],
      "js": ["index.js"]
    }
  ]
}

index.js:

var pageUrl = window.location.href;

var req = new XMLHttpRequest();
req.open("GET", pageUrl, true);
req.onreadystatechange = function() {
    // readyState 4 means the request has completed
    if (req.readyState === 4 && req.status === 200) {
        handleResponse(req.responseText);
    }
};
req.send();

function handleResponse(pageContent) {
    // Do some processing ...
}

It does exactly what I need for the first search (using the address bar). However, the add-on loads the first time I enter a Google page, does what it has to, then stops. It only reloads when I refresh the page or make another query from the address bar.

I need to find a way to reload the add-on when I make another search using the search field and do the processing again, but without losing the work from the previous searches. I suppose I’ll have to use cookies, but I’m not there yet.

I tried to use req.addEventListener("load", someFunction) instead of/besides req.onreadystatechange(), but it didn’t solve the problem. Any ideas?

Thank you!

gorhill · June 22, 2016, 1:18pm

I'm not sure what the purpose of your extension is, but from what I can see so far, it looks like you are over-complicating things.

Why load the page again using XMLHttpRequest? The page has already been loaded in the browser, so why not just parse the DOM to extract the information you seek? This way, it will be trivial to install a MutationObserver so that you are notified when the page content changes, and just reuse the same code to analyze the changed page.
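A minimal sketch of this idea, written as a replacement for index.js. The names extractLinks and analyzePage are hypothetical, and treating every link on the page as a "result" is only an assumption for illustration; the real selection logic depends on what the add-on needs.

```javascript
// Hypothetical helper: collect the href of every link under a root node.
// What counts as a search result is an assumption; adapt the selector.
function extractLinks(root) {
    var anchors = root.querySelectorAll("a[href]");
    var urls = [];
    for (var i = 0; i < anchors.length; i++) {
        urls.push(anchors[i].href);
    }
    return urls;
}

// Hypothetical entry point: analyze the DOM that is already loaded,
// instead of fetching the page again with XMLHttpRequest.
function analyzePage() {
    var urls = extractLinks(document);
    // Do some processing ...
    return urls;
}

// Only run where a page exists (i.e. inside a content script).
if (typeof document !== "undefined" && typeof MutationObserver !== "undefined") {
    analyzePage(); // the page as first loaded

    var observer = new MutationObserver(function () {
        analyzePage(); // page content changed: reuse the same code
    });
    observer.observe(document.documentElement, {
        childList: true, // node additions and removals
        subtree: true    // anywhere in the document
    });
}
```

Because the content script stays alive while the page is rewritten in place, anything it keeps in a variable survives from one search to the next.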

guimarac · June 22, 2016, 1:26pm

My question may have been unclear. I don’t want to make another request, but rather do exactly what you said. Since I am new to add-on development (or any kind of web-related development, for that matter) I didn’t know about MutationObserver. I’ll try that. Thank you.

Lithopsian · June 22, 2016, 2:36pm

Yes, MutationObserver is what you need to detect when Google rewrites its page. Depending on what changes you need to take action on, it can be hard to work out how to detect them. Plus Google keeps changing the way they rewrite their search results, so for example picking out something by its class might work one day and not a week later. Acting on every DOM change is not likely to be practical because there can be hundreds of them just from typing a single character. Potentially your MutationObserver will also fire when your own addon changes the page.

guimarac · June 23, 2016, 2:10pm

“Depending on what changes you need to take action on, it can be hard to work out how to detect them.”

I need to detect when the results to a new search are returned.

“Potentially your MutationObserver will also fire when your own addon changes the page.”

This is not going to be a problem, since my addon will not make any changes to the page. Indeed, there are hundreds of DOM changes, because the results are being retrieved as I type the query. This is going to be a problem.

Lithopsian · June 23, 2016, 2:31pm

Currently, Google Instant updates the entire search list with a single DOM insertion. A div is inserted into a node with an id of “search”. However, this has varied in the past, and I remember a time when each search result was inserted individually. There are also many other DOM changes. You’ll have to find a balance between excluding irrelevant changes and reliably detecting new search results even when Google tweaks their code.
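One way to exclude irrelevant changes is to look only at mutation records that add nodes somewhere inside the results container. The sketch below assumes the container id is still "search", as described above, which may stop being true whenever Google changes its markup. The function names isInsideId and hasNewResults are hypothetical.

```javascript
// Returns true if `node` is, or is inside, an element with the given id.
function isInsideId(node, id) {
    for (var n = node; n; n = n.parentNode) {
        if (n.id === id) {
            return true;
        }
    }
    return false;
}

// Returns true if any record in the batch added nodes inside the container.
// `mutations` is the array a MutationObserver passes to its callback.
function hasNewResults(mutations, containerId) {
    return mutations.some(function (m) {
        return m.type === "childList" &&
               m.addedNodes.length > 0 &&
               isInsideId(m.target, containerId);
    });
}

// Possible use inside an observer callback:
//   if (hasNewResults(mutations, "search")) { /* re-analyze the page */ }
```

Keeping the id in one place (the second argument) makes it easier to adjust when the page structure changes.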

gorhill · June 23, 2016, 3:06pm

There are always solutions for this. You could just use mutation events as a mere signal that the page needs to be re-analyzed by your code, and use a timer to coalesce all such signals into a single one whose purpose is to trigger a re-analysis of the page.
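A rough sketch of the timer approach. The name makeCoalescer and the 300 ms delay are assumptions for illustration, not values from this thread. Each signal restarts the timer, so the callback runs once, after the page has been quiet for the given delay.

```javascript
// Returns a function that can be called as often as needed; `callback`
// runs only once, `delayMs` after the most recent call.
function makeCoalescer(callback, delayMs) {
    var timer = null;
    return function signal() {
        if (timer !== null) {
            clearTimeout(timer); // a newer signal replaces the pending one
        }
        timer = setTimeout(function () {
            timer = null;
            callback();
        }, delayMs);
    };
}

// Possible use in a content script (analyzePage is hypothetical):
//   var signal = makeCoalescer(analyzePage, 300);
//   new MutationObserver(signal).observe(document.documentElement,
//       { childList: true, subtree: true });
```

A longer delay means fewer re-analyses while the user is typing, at the cost of reacting later to the final results.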

Lithopsian · June 24, 2016, 12:53pm

You should take advantage of the fact that the Mutation Observer batches up changes and sends you an array of them whenever it gets a break. So definitely don’t re-examine the whole DOM for each entry in the array because the DOM is already in the state from the last array entry. If you need to query the whole DOM for each mutation, then a first step to better performance is just to do it once per array instead of once per mutation. You might still get called several times quite quickly, but at least each time you are called the DOM parser is not 100% busy. It is also generally a good idea to background any non-trivial actions taken by an observer (if practical) since multiple observers are usually called synchronously and in order, potentially leading to a long busy loop.

There are also useful filters that you can apply when you add the observer. You can control where in the DOM you want to observe, whether you want to observe the whole tree beneath that point, and which types of mutations you are interested in.
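To make the last two points concrete, here is a sketch that handles each batch once and restricts what is observed. OBSERVER_OPTIONS and makeBatchHandler are hypothetical names. Observing the element with id "search" is an assumption about the current page structure, with the whole document as a fallback.

```javascript
// Filters passed to observe(): which mutation types, and how deep.
var OBSERVER_OPTIONS = {
    childList: true,      // node additions and removals
    subtree: true,        // the whole tree below the observed node
    attributes: false,    // ignore attribute changes
    characterData: false  // ignore text edits
};

// Wraps `analyze` so that it runs once per batch of mutation records,
// not once per record. The DOM already reflects the whole batch.
function makeBatchHandler(analyze) {
    return function (mutations) {
        if (mutations.length === 0) {
            return;
        }
        analyze();
    };
}

// Only run where a page exists (i.e. inside a content script).
if (typeof document !== "undefined" && typeof MutationObserver !== "undefined") {
    // Assumption: results live under #search; fall back to the document.
    var target = document.getElementById("search") || document.documentElement;
    var handler = makeBatchHandler(function () {
        // Re-analyze the page here ...
    });
    new MutationObserver(handler).observe(target, OBSERVER_OPTIONS);
}
```

Observing a narrow subtree cuts down on noise, but the observer is lost if the page replaces that node itself, so the choice of target is a trade-off.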

Lithopsian · June 26, 2016, 12:00pm

I’m going to rattle on a bit more about the “synchronous” nature of observers, just to get it down on paper. Mutation Observers and regular observers are similar but also slightly different.

The nsIObserverService that has been around forever is a means of sending notifications outside of the event mechanism. Before multiprocess Firefox, these notifications had the advantage of being a very simple way of communicating between all JavaScript scopes, without being tied to a particular DOM node or imposing the overhead of raising an event. The way they work is utterly trivial: a call to notifyObservers() simply calls, synchronously and in order, all the registered observer functions (actually objects implementing nsIObserver). You can see that if many observers each take just a few ms, this can end up being a very long delay for that piece of code, although that is less likely for custom notifications where you probably are the only observer (but still perhaps in lots of tabs or browsers) and you know what the notification context is. Unless it is important to you to execute synchronously, you should kick any significant work into a background task.

Mutation observers are newer. They are also designed as an alternative to DOM events, in this case a complete replacement for the deprecated DOM mutation events that have serious performance problems. They are often described as being “asynchronous”, in the sense that they are not called at the moment a DOM mutation takes place. Instead the observer is only called once the executing script completes. Then the observers are called synchronously, one by one, with an array of all the DOM mutations that have taken place. The observer notification call is actually a microtask, which means it happens immediately after the executing script, before any queued setTimeout callbacks or events, including redraws (although not necessarily before promise callbacks, which are also microtasks). This way, if you need to, you can react synchronously to a DOM mutation before a user sees it, but in most cases it is still better to kick it to the background (e.g. with setTimeout).
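The ordering can be illustrated without a DOM. In the sketch below a promise callback stands in for the observer notification, on the assumption that it behaves the same way for this purpose since both are microtasks. The work pushed to setTimeout is a separate task and only runs afterwards.

```javascript
var order = [];

order.push("script start");

// A task: runs only after the current script and all microtasks finish.
// This is where "kicked to the background" work ends up.
setTimeout(function () {
    order.push("timeout");
    console.log(order.join(" -> "));
    // prints "script start -> script end -> microtask -> timeout"
}, 0);

// A microtask: runs as soon as the current script completes,
// which is when a mutation observer callback would also run.
Promise.resolve().then(function () {
    order.push("microtask");
});

order.push("script end");
```

So an observer callback that does heavy work delays everything queued behind it, while the same work deferred with setTimeout lets the browser handle other pending tasks first.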

guimarac · June 27, 2016, 9:10am

Thanks a lot, Lithopsian.

jscher2000 · June 30, 2016, 4:21am

In case they are useful for reference, I have some userscripts that implement mutation observers for Google results updates. For example, this one checks new nodes for cite elements and attaches stuff to them:

https://greasyfork.org/en/scripts/1679-google-site-tool-site-results-exclude-sites

Note: since I am not a programmer, please do not use this as a reference model for good coding practices, just something that works well enough to help maintain my sanity.

guimarac · July 1, 2016, 9:09am

It is very useful. Thank you.