Wednesday, November 25, 2009

JavaScript module loading, the browser and CommonJS

JavaScript module syntax and loading seems like a hot topic at the moment, and here are some thoughts about how to construct module syntax and a loader, with the goal of trying to get to a more universal approach for it. This will be discussed in the context of CommonJS, but browser-based module loaders have existed for a while. All are constrained by the browser in some fashion as listed below. I have a preferred solution, also described in this post.

First, a look at module syntax.

CommonJS is an umbrella for a few different things, including a spec for a module syntax and a standard library of modules. This post is just interested in its module syntax. A simple example of CommonJS syntax, defining an "increment" module defined in increment.js:

var add = require('math').add;
exports.increment = function(val) {
return add(val, 1);
};
How could we build a module loader with this syntax? Here are a couple of options:

1) You parse the module before executing it, looking for require calls. You make sure to fetch those modules and work out the right dependency order in which to execute the modules.

2) You just run the module and when require() is hit, do an synchronous IO operation to load the required module.

For both approaches, use a sandbox or specific context to make sure things like "exports" are defined separately for each module.

Both of those options are easy to implement on the server side since you have more control over the IO layer, and can create separate contexts for each module. However, in the browser, things are different. Creating a separate context for each module is tricky. For IO, there are two realistic approaches, and each has its difficulties:
  • XMLHttpRequest (XHR)
  • script tags
XHR allows us to do either approach, #1 or #2. It can get the contents of the module and parse it into a structure that pulls out the dependencies and we can sort out the right order to execute things. We could use sync XHR calls to accomplish #2, block when each require call is seen. However, sync XHR calls in the browser really hurt performance.

This is actually what the default Dojo loader has done for a very long time, and I believe some pathways in the Google's Closure library do the same thing. It is always recommended you do a custom build to combine all the modules you need into one file to cut out those XHR calls when you want to go to production.

So path #1 would make more sense with an XHR-based loader. However, for an XHR loader to work, it has to use eval() to bring the module into being. Some environments, like Adobe AIR do not allow eval(), and it makes debugging hard to do across browsers. Firefox and WebKit have a convention to allow easier eval-based debugging, but it is still not what I consider to be in keeping with traditional script loading in a browser.

Instead of eval, after the XHR call finishes its parsing and module wrapping for context, you could try to create a script tag that has a body set to the modified module source, but this really hurts debugging: if there is an error, the error line number will be some weird line in a gigantic HTML file instead of the line number of the actual module.

Dojo has a djConfig.debugAtAllCosts option that will use sync XHR to pull down all the modules, parse the for dependencies, work out the right load order, then load each module via a dynamically added script src="" tag. However, since IE and WebKit will evaluate dynamically added script tags out of DOM order -- they evaluate them in network receive order (which is nice for long-polling comet apps, but does not help module loading). So, each script tag has to be added one at a time, then wait for it to finish then add the next one. Not so speedy.

XHR is also normally limited to just accessing the same host as the web page. This makes it hard to use CDNs to load content, and get performance benefits with that approach. There is now support for xdomain XHR in most recent browsers, but IE prefers to use a non-standard XDomainRequest object, making our module loader more complicated. And xdomain XHR just plain does not work in older browsers like IE6.

So, an XHR-based loader is not so great.

Script tags are nice because they keep with the known script pathway in browsers -- easy to debug, and we can get parallel loading. However, we cannot do approach #2 in the browser: our JavaScript in the browser cannot access the module contents before they are evaluated. And since dynamically added script src="" tags via head.appendChild() are not a synchronous operation, approach #2 will not work.

So, really we need to do a variant of #1, pull out the dependencies needed by the module, then after those dependencies are loaded, execute the module. The way to do this in script: put a function wrapper around the module contents, and call a module loader function with a list of dependencies and the module function wrapper. Something like this, for a module with the name of "c" that has dependencies of "a" and "b", Here is a syntax (call it Variant A) for defining a module "c" with this approach:

loader(
"c",
["a", "b"],
function(a, b) {
//The module definition of "c" in here.
//return an object to define what "c" is.
return {};
}
);

or, another variant, call it Variant B:

loader({
name: "c",
dependencies: ["a", "b"],
module: function(a, b) {
//The module definition of "c" in here.
//return an object to define what "c" is.
return {};
}
});

Ideally, we would not have to tell the loader that this structure defines module "c" (the first arg in Variant A and the name: property in Variant B) -- the loader could work this out. Unfortunately, since script tags can load asynchronously and at least IE can trigger script.onload events out of order when compared to when the script is actually evaluated, we need to keep the module name as part of the module definition. This also helps with custom builds, where you can combine a few of these module definition calls into one script.

This approach is actually what Dojo's xdomain loader has done for a very long time, but with more verbose syntax. However, it requires a custom build to convert modules into this structure. The other option is to use a server-side process to convert the modules on the fly, but I do not feel that is keeping with the simplicity of normal browser development: just open a text editor, write some script, save, reload, no extra server config/process needed, besides maybe a vanilla web server.

So, I believe that modules should be coded by the developer in this module wrapper format. YUI 3 has taken this approach, and it is the approach I have taken for RunJS too. However, YUI 3 is limited to needing some module dependency metadata files to help it out. It also uses module names that do not map to the actual module's defined name/functions.

OK, back to CommonJS.

As it stands now, I believe the CommonJS format is not suitable for modules in the browser. There have been attempts to get it to work, but the attempts either use a sync XHR loader, or a "transform-on-the-fly" server process to convert the code to a module wrapper similar to Variant B.

I would rather see a module wrapper format that works with browser natively, that can be hand-authored by developers and that will work with CommonJS modules. CommonJS started out as ServerJS. As ServerJS, the case could be made that supporting browsers may not be an aim of a ServerJS module format. However, with the name change to CommonJS, I believe supporting browsers as a first class citizen is important for CommonJS to get more traction.

So the trick is to come up with a module syntax that has a function wrapper, but is not too wordy with boilerplate. We need some boilerplate, since we need a function wrapper. I believe RunJS has the right right approach. The boilerplate is very terse, basically Variant A mentioned above:

run(
"c",
["a", "b"],
function(a, b) {
//The module definition of "c" in here.
//return an object to define what "c" is.
return {};
}
);

I can see where there is some bikeshedding on the name "run". I think script() instead of run() is a viable alternative, and I may switch to that in the near future (and rename RunJS to ScriptJS).

I have attempted to engage the CommonJS community by putting up a proposal for an Alternate Module Format.

Progress has been slow, but to be expected: the CommonJS group is trying to do lots of other things like define a standard library and build out implementations. However, I am hopeful we can get something that works for the browser front end developers.

The ideal scenario is that some variant of the above syntax is just adopted as the only CommonJS module format. That would save a lot of conversion work, and I believe it makes things much simpler for CommonJS compliant loader. Right now, for CommonJS loaders there is a concept of a require() and require.async() and having to expose Promises for the async stuff. The above format neatly avoids the issue of whether the modules are loaded async or sync and avoids any need for Promises in the module loader. I think it is fine though for modules themselves to use Promises as part of individual module APIs, but at least the loader and module syntax stays simple.

I also do not believe a "module" variable needs to be defined for each module and an exports variable is avoided by returning an object from the module function wrapper.

I can appreciate that the CommonJS folks with modules already written may not like moving to the above syntax. I think it helps in the long run if we can just have one syntax, but in the meantime, I plan on doing the following:
  • Continue to engage the CommonJS community.
  • build out RunJS, probably rename to ScriptJS in the near future, and use script() instead of run()
  • Write a converter that converts Dojo modules to the RunJS/ScriptJS module syntax. I have something basic working, here is an example of Dojo's themeTester.html using RunJS-formatted dojo/dijit/dojox modules. That example is not bulletproof yet (I used a built version of Dojo which removes some dependency info) and i18n modules have not been converted either. RunJS also has built-in support for i18n modules.
  • Convert Raindrop to use RunJS-formatted dojo and convert the Raindrop modules to that format.
  • Override run.load()/script.load() in server environments so it could be used in CommonJS server implementations.
  • Work on a converter for existing CommonJS modules.
  • Use RunJS/ScriptJS as the module syntax for Blade and/or Dojo 2.0 efforts.
If module syntax/loading is important to you, then please join the discussion list for CommonJS, so we can sort this out. It would be great to get consensus on JavaScript module syntax and loading, and I think CommonJS is the area to do that.

I am happy to adjust some of the syntax in RunJS/ScriptJS to match some consensus, but I strongly prefer a terse format. The existing ones I have seen for server-converted CommonJS modules is too verbose for me, particularly for the common cases of defining a module with some dependencies.

2 comments:

J Chris A said...

I'm working out what to implement on the back end of CouchDB. This post has been helpful. I think the best browser based loader is jQuery's plugin convention. I know it doesn't offer safety in the sense of the stuff you are looking at, but it does have adoption on it's side.

James Burke said...

J Chris A: jQuery's plugin convention allows you to add methods to jquery's object and prototype, and that is still possible with RunJS. What jQuery lacks right now is way to assure that if you have code that depends on other code to make sure any dependencies are loaded in the right order.

For the usual types of jQuery projects it is normally not a big problem, but once the project scales up to larger functionality, the delayed loading, dependency management, and integrated build system to optimize HTTP requests become more important.

Thanks for all the great work on CouchDB!