Tagneto: April 2010

Thursday, April 29, 2010

A require() for jQuery

I had a fun time at the Bay Area jQuery Conference. Great people, and I learned some neat things.

In the conference wrap-up, John Resig mentioned some requirements he has for a jQuery script loader:

1) script loading must be async

2) script loading should do as much in parallel as possible. In particular, it should be possible to avoid dynamic nested dependency loading.

3) it looks like a script wrapper is needed to allow #1 and #2 to work effectively, particularly for cross-domain loading. It is unfortunate, but a necessity for script loading in browsers.

I believe these requirements mesh very well with RequireJS. I will talk about how they mesh, and some other things that should be considered for any require() that might become part of jQuery.

Async Loading

As explained in the RequireJS Why page, I believe the best-performing, native browser option for async loading is dynamically created script tags. RequireJS only uses this type of script loading, no XHR.
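As a rough sketch of the technique (not RequireJS's actual source), a dynamically created script tag can be wired up as shown here. The function name loadScript and the optional doc parameter are assumptions added for illustration; doc exists only so the snippet can be exercised outside a browser.

```javascript
// Hypothetical helper: append an async <script> element for `url`
// and call `onLoaded(url)` once the browser reports it has loaded.
function loadScript(url, onLoaded, doc) {
  // `doc` defaults to the page's document; passing it in is only
  // a convenience for testing.
  doc = doc || document;

  var script = doc.createElement("script");
  script.type = "text/javascript";
  script.async = true;
  script.onload = function () {
    onLoaded(url);
  };
  script.src = url;

  doc.getElementsByTagName("head")[0].appendChild(script);
  return script;
}
```

Because the browser fetches the script without blocking the page, several calls to a helper like this can be in flight at the same time.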

The text plugin uses XHR in dev mode, but the optimization tool inlines the text content to avoid XHR for deployment. Also, the plugin capability in RequireJS is optional: it is possible to build RequireJS without it, which is what I do for the integrated jQuery+RequireJS build.

Parallel Loading

John mentioned that dynamic nested dependency resolution was slower and potentially a hazard for end users. Slow, because it means you need to fetch the module, wait for it to be received, then fetch its dependencies. So the module gets loaded serially relative to its dependencies. Potentially hazardous because a user may not know the loading pattern.
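To make the cost concrete, here is a small sketch that counts how many serial fetch rounds are needed when a module's dependencies are only discovered after the module itself arrives. The deps map and the name serialRounds are made up for illustration, and the sketch assumes the dependency graph has no cycles.

```javascript
// Hypothetical dependency map: module name -> names it depends on.
var deps = {
  "app/page1": ["app/common"],
  "app/common": ["app/common/helper"],
  "app/common/helper": []
};

// Number of serial network rounds needed to load `name` when each
// module's dependencies are only known once the module itself has
// been fetched. Assumes the graph has no cycles.
function serialRounds(name, graph) {
  var childRounds = (graph[name] || []).map(function (dep) {
    return serialRounds(dep, graph);
  });
  // One round for this module, plus the deepest chain below it.
  return 1 + Math.max.apply(null, [0].concat(childRounds));
}

console.log(serialRounds("app/page1", deps)); // 3
```

Requesting all three modules up front, or combining them into one file, would reduce that to a single round.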

The optimization tool in RequireJS avoids the serial loading of nested dependencies by inlining the modules together. The optimization tool can also build files into "layers" that could be loaded in parallel.

For each build layer, there is an exclude option, in which you can list a module or modules you want to exclude. exclude will also exclude their nested dependencies from the build layer.

There is an excludeShallow option if you just want specific modules to exclude, but still want their nested dependencies included in the build layer. This is a great option for making your development process fast: just excludeShallow the current module you are debugging/developing.
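The difference between the two options can be sketched in a few lines. This is only an illustration of the semantics described above, not the optimization tool's implementation; the graph and the function names withNested and layerContents are hypothetical.

```javascript
// Hypothetical dependency map: module name -> direct dependencies.
var graph = {
  "app/page1": ["app/common", "app/page1/view"],
  "app/page1/view": ["app/util"],
  "app/common": ["app/util"],
  "app/util": []
};

// Collect `name` and all of its nested dependencies into `seen`.
function withNested(name, graph, seen) {
  seen = seen || {};
  if (!seen[name]) {
    seen[name] = true;
    (graph[name] || []).forEach(function (dep) {
      withNested(dep, graph, seen);
    });
  }
  return seen;
}

// Sketch of which modules end up in the build layer for `name`.
function layerContents(name, graph, exclude, excludeShallow) {
  var removed = {};
  // exclude: drop the listed modules and their nested dependencies.
  (exclude || []).forEach(function (mod) {
    withNested(mod, graph, removed);
  });
  // excludeShallow: drop only the listed modules themselves.
  (excludeShallow || []).forEach(function (mod) {
    removed[mod] = true;
  });
  return Object.keys(withNested(name, graph)).filter(function (mod) {
    return !removed[mod];
  }).sort();
}

console.log(layerContents("app/page1", graph, ["app/common"]).join(", "));
// app/page1, app/page1/view
console.log(layerContents("app/page1", graph, [], ["app/page1/view"]).join(", "));
// app/common, app/page1, app/util
```

In the second call, app/page1/view is left out of the layer so it can be loaded and debugged as a separate file, while its dependency app/util stays in the build.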

While dynamically loading nested dependencies can be slower than a full parallel load, each module still needs to list its own dependencies. There needs to be a way to know what an individual file needs to function if the file is to be portable in any fashion. So the question is how to specify those dependencies for a given file/module.

There are schemes that list the dependencies as a separate companion file with the module, and schemes that list the dependencies in the module file. Using a separate file means the module is less portable -- more "things" need to follow the module, so copying and pasting, or distributing just one module, becomes more onerous.

So I prefer listing the dependencies in the file. Should the dependencies be listed in a comment or as some sort of script structure?

Comments can be nice since they can be stripped from the built/optimized layer. However, it means modules essentially need to communicate with each other through the global variable space. This ultimately does not scale -- at some point you will want to load two different versions of a module, or two modules that want to use the same global name, and you will be stuck. For that reason, I favor the way RequireJS does it:


require.def("my/module", ["dependency1"], function (dependency1) {
  //dependency1 is the module definition for "dependency1"

  //Return a value to define "my/module"
  return {
      limit: 500,
      action: function () {}
  };
});

With this model, dependency1 does not need to be global, and it allows a very terse way to reference the module. It also minifies nicely. By using string names to reference the modules and using a return value from the function, it is then possible to load two versions of a module in a page. See the Multiversion Support in RequireJS for more info, and the unit tests for a working example.
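To see why string names plus return values make two versions possible, here is a minimal synchronous stand-in. It is not how RequireJS implements contexts; createContext, def and get are hypothetical names, and dependencies are assumed to be defined before the modules that use them.

```javascript
// Hypothetical, synchronous stand-in for a module context.
function createContext() {
  var defined = {};
  return {
    // Same argument shape as require.def: name, dependency names, factory.
    def: function (name, depNames, factory) {
      var args = depNames.map(function (dep) {
        if (!Object.prototype.hasOwnProperty.call(defined, dep)) {
          throw new Error("Not defined: " + dep);
        }
        return defined[dep];
      });
      // The factory's return value is the module; nothing goes global.
      defined[name] = factory.apply(null, args);
    },
    get: function (name) {
      return defined[name];
    }
  };
}

var v1 = createContext();
var v2 = createContext();

v1.def("lib", [], function () { return { version: "1.0" }; });
v2.def("lib", [], function () { return { version: "2.0" }; });

// The same module source runs in both contexts and sees a different "lib".
function myModule(lib) { return { uses: lib.version }; }
v1.def("my/module", ["lib"], myModule);
v2.def("my/module", ["lib"], myModule);

console.log(v1.get("my/module").uses, v2.get("my/module").uses); // 1.0 2.0
```

Since "lib" is only a key inside each context, neither version has to claim a global name.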

This model also frees the jQuery object from namespace collisions by allowing a terse way to reference modules without needing them to hang off of the jQuery object. There are many utility functions that do not need to be on the jQuery object to be useful, and today the jQuery object itself is starting to become a global of sorts that can have name collisions.

Script Wrapper

Because async script tags are used to load modules, each script needs to be wrapped in a function wrapper, to prevent its execution before its dependencies are ready. CommonJS recognizes this concern (one of the reasons for their Transport proposals) and so does YUI3. xdomain builds for Dojo also use a script wrapper.

While it is unfortunate -- many people are not used to it -- it ends up being an advantage. Functions are JavaScript's natural module construct, and it encourages well scoped code that does not mess with the global space. For RequireJS, that wrapper is called require.def, as shown above.
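The role of the wrapper can be shown with a toy registry: because the module body is a function, it can be held until its dependencies have been defined, no matter which script arrives first. This is a simplified sketch under assumed names (createRegistry, def, get), not the RequireJS source.

```javascript
// Hypothetical registry that runs a module's function only when
// all of its dependencies have been defined.
function createRegistry() {
  var defined = {};  // name -> module value
  var waiting = [];  // wrappers whose dependencies are not ready yet

  function has(name) {
    return Object.prototype.hasOwnProperty.call(defined, name);
  }

  function get(name) {
    return defined[name];
  }

  function flush() {
    var i = 0;
    while (i < waiting.length) {
      var entry = waiting[i];
      if (entry.deps.every(has)) {
        waiting.splice(i, 1);
        defined[entry.name] = entry.factory.apply(null, entry.deps.map(get));
        i = 0; // a new definition may unblock earlier entries
      } else {
        i += 1;
      }
    }
  }

  return {
    def: function (name, deps, factory) {
      waiting.push({ name: name, deps: deps, factory: factory });
      flush();
    },
    get: get
  };
}

var registry = createRegistry();
var order = [];

// The script for my/module arrives first, but its body does not run yet.
registry.def("my/module", ["dependency1"], function (dependency1) {
  order.push("my/module");
  return { limit: dependency1.limit };
});

registry.def("dependency1", [], function () {
  order.push("dependency1");
  return { limit: 500 };
});

console.log(order.join(", ")); // dependency1, my/module
```

Without the wrapper, the body of my/module would execute as soon as its script tag loaded, before dependency1 existed.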

Here are some other things that should be considered for a require implementation:

require as a global

I believe it makes more sense to keep require as a global, not something that is a function hanging off of the jQuery object. require can be used to load jQuery itself, and as mentioned above, it would be possible to load more than one version of jQuery if it was constructed like this.

CommonJS awareness

The CommonJS module format was not constructed for the browser, but having an awareness of their design goals and a way to support their modules in the browser will allow more code reuse. RequireJS has an adapter for the CommonJS Transport/D proposal, and it has a conversion script to change CommonJS modules into RequireJS modules.

In addition, RequireJS was constructed with many of the same design goals as CommonJS: allow modules to be enclosed/do not pollute the global space, use the "path/to/module" module identifiers, have the ability to support the module and exports variables used in CommonJS.

Browsers need more than a require API

They also need an optimization/build tool that can combine modules together. RequireJS has such a system today. It is server-independent, a command line tool. It builds up the layers as static files which can be served from anywhere.

I am more than happy to look at a runtime system that uses the optimization tool on the server. RequireJS works in Node and in Rhino. The optimization tool is written in JavaScript and uses require.js itself to build the optimization layers.

I can see using either Node or Rhino to build a run-time server tool to allow combo-loading on the fly. Using Rhino via the Java VM has an advantage because Closure Compiler or YUI Compressor could be used to minify the response, but I am open to some other minification scheme that is implemented in plain JavaScript.

Loader plugins

I have found the text plugin for RequireJS to be very useful -- it allows you to reference HTML templates on disk and edit HTML in an HTML editor vs. dealing with HTML in a string. The optimization tool is smart enough to inline that HTML during a build, so the extra network cost goes away for deployment.

In addition, Sean Vaughan and I have been talking about support for JSONP-based services and scripts that need extra setup besides just being ready on the script onload event. I can see those as easy plugins to add that open up loading Google Ajax API services on the fly.

For these reasons I have found loader plugins to be useful. They are not needed in the basic case, but they can make overall dependency management better.

script.onload

Right now RequireJS has support for knowing when a script is loaded by waiting for the script.onload event. This could be avoided by mandating that anything loaded via require() register via require.def to indicate when it is loaded.

However, using script.onload allows some existing scripts to be loaded without modification today, which gives people time to migrate to the require.def pattern. I am open to doing a build without the script.onload support, but the savings in minified file size would not be that great.

Explicit .js suffix

RequireJS allows two different types of strings for dependencies. Here is an example:

require(["some/module", "http://some.site.com/path/to/script.js"]);

"some/module" is transformed to "some/base/path/some/module.js", while the other one is used as-is.

The transform rules for a dependency name are as follows: if the name contains a colon before a front slash (has a protocol), starts with a front slash, or ends in .js, do not transform the name. Otherwise, transform the name to "some/base/path/some/module.js".
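Written out as code, those rules might look like the sketch below. It follows the wording above rather than the actual RequireJS source; the function name nameToUrl and the baseUrl argument (assumed to end with a slash) are assumptions for illustration.

```javascript
// Hypothetical helper that applies the transform rules described above.
// `baseUrl` is assumed to end with a slash.
function nameToUrl(name, baseUrl) {
  var colon = name.indexOf(":");
  var slash = name.indexOf("/");

  // A colon before a front slash means the name has a protocol.
  var hasProtocol = colon !== -1 && slash !== -1 && colon < slash;
  var startsWithSlash = slash === 0;
  var endsInJs = /\.js$/.test(name);

  if (hasProtocol || startsWithSlash || endsInJs) {
    return name; // used as-is
  }
  return baseUrl + name + ".js";
}

console.log(nameToUrl("some/module", "some/base/path/"));
// some/base/path/some/module.js
console.log(nameToUrl("http://some.site.com/path/to/script.js", "some/base/path/"));
// http://some.site.com/path/to/script.js
```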

I believe that gives a decent compromise between short, remappable module names (remapped by changing the baseUrl or setting a specific path via a require config call) and loading scripts that do not participate in the require.def call pattern. There is also a regexp property on require that can be changed to allow more exceptions to the rules.

However, if this was found insufficient, I am open to other rules or a different way to list dependencies. The "some/module" format was chosen to be compatible with CommonJS module names, but probably some algorithm or approach could be used to satisfy both desires.

File Size/Implementation

Right now the stock RequireJS is around 3.7KB minified and gzipped. However, there are build options that get the size down to 2.6KB minified and gzipped by removing some features:

plugin support
require.modify
multiversion support (the "context" switching in RequireJS)
DOM Ready support

I am open to getting that file size smaller based on the feature set that needs to be supported.

3 layer loading

John mentioned a typical loading scenario that might involve three sections:

1) loading core libraries from a CDN (like jQuery and maybe a require implementation)
2) loading a layer of your common app scripts
3) loading a page-specific layer

RequireJS can support this scenario like so today:


<script src="http://some.cdn.com/jquery/1.5/require-jquery.js"></script>
<script>
require({
     baseUrl: "./scripts"
 },
 ["app/common", "app/page1"]
);
</script>

Then the optimization tool instructions would look like so:

{
 modules: [
     {
         //inside app/common.js there is a require call that
         //loads all the common modules.
         name: "app/common",
         exclude: ["jquery"]
     },
     {
         //app/page1 references jquery and app/common as dependencies,
         //as well as page-specific modules
         name: "app/page1",

         //jquery, app/common and all their dependencies will be excluded
         exclude: ["jquery", "app/common"]
     },
     //... other pages go here following the same pattern ...
 ]
}

This would result in app/common and app/page1 being loaded async in parallel. If require.js was a separate file from jquery.js, the following HTML could be used to load jQuery, app/common and app/page1 async and in parallel (the optimization instructions stay the same):


<script src="http://some.cdn.com/jquery/1.5/require.js"></script>
<script>
require({
     baseUrl: "./scripts",
     paths: {
         "jquery": "http://some.cdn.com/jquery/1.5/jquery"
     }
 },
 ["jquery", "app/common", "app/page1"]
);
</script>

Those configurations work today.

However, it is not quite flexible enough -- typically modules that are part of app/page1 will not want to refer to the complete "app/common" as the only dependency, but specify finer-grained dependencies, like "app/common/helper". So the above could result in a request for "app/common/helper" from the "app/page1" script, depending on how fast "app/common" is loaded.

So I would build in support for the following:


<script src="http://some.cdn.com/jquery/1.5/require.js"></script>
<script>
require({
     baseUrl: "./scripts",
     paths: {
         "jquery": "http://some.cdn.com/jquery/1.5/jquery"
     },
     layers: ["jquery", "app/common", "app/page1"]
 },
 ["app/page1"]
);
</script>

Notice the new "layers" config option, and that the only required module for the page is now "app/page1". The "layers" config option would tell RequireJS to load all of those layers first, and find out what is in them before trying to fetch any other dependencies.
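The intended behavior can be sketched as a simple lookup: once the layers have loaded and registered the modules they contain, only the dependencies that no layer provides need their own request. Since the feature does not exist yet, everything here is hypothetical, including the names layerContents and stillNeeded and the example module lists.

```javascript
// Hypothetical: the modules each built layer turned out to define
// once it was loaded.
var layerContents = {
  "jquery": ["jquery"],
  "app/common": ["app/common", "app/common/helper"],
  "app/page1": ["app/page1", "app/page1/view"]
};

// Return the dependency names that no loaded layer provides and
// that would therefore still need their own script request.
function stillNeeded(depNames, layers, contents) {
  var provided = {};
  layers.forEach(function (layer) {
    (contents[layer] || []).forEach(function (mod) {
      provided[mod] = true;
    });
  });
  return depNames.filter(function (dep) {
    return !provided[dep];
  });
}

var layers = ["jquery", "app/common", "app/page1"];
console.log(stillNeeded(["app/common/helper", "app/other"], layers, layerContents).join(", "));
// app/other
```

In this sketch "app/common/helper" never triggers a separate request, because the loader waits to see what the "app/common" layer contains.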

This would give the most flexibility in coding individual modules, but give a very clear optimization path to getting a configurable number of script layers to load async and in parallel. I will be working on this feature for RequireJS for the next release.

Summary

Hopefully I have demonstrated how RequireJS could be the require implementation for jQuery. I am very open to doing code changes to support jQuery's desires, and even if jQuery or John feel like they want to write their own implementation, hopefully we can at least agree on the same API, and maybe even still use the optimization tool in RequireJS. I am happy to help with an alternative implementation too.

I know John and the jQuery team are busy, focusing mostly on mobile and templating concerns, but hopefully they can take the above into consideration when they get to script loading.

In the meantime, I will work on the layers config option support, improving RequireJS, and keeping my jQuery fork up to date with the changes. You can try out RequireJS+jQuery today if you want to give it a spin yourself.

Sunday, April 25, 2010

RequireJS+jQuery Talk

I gave a talk about RequireJS with jQuery at the jQuery Conference today. Here are the slides:

PDF
HTML (Warning, the inline links do not appear to work, use PDF for working links)

Thanks to the folks that came to the talk! I had a great time at the conference.

If you went to the talk, please feel free to rate the talk so I can improve for the next time.

Friday, April 23, 2010

RequireJS 0.10.0 Released, Node integration

RequireJS 0.10.0 is now available.

The big feature in this release is integration with Node. Now you can use the same module format for both browser and server side modules. The RequireJS-Node adapter translates existing CommonJS modules on the fly, as they are loaded by the adapter, so you can continue to use server modules written in the CommonJS format for your Node projects.

The RequireJS-Node adapter is freshly baked, so there could be some rough edges with it, but it is exciting to see it work. See the docs for all the details.

0.10.0 also includes support for an excludeShallow option in the optimization tool. This will allow you to do an optimization build during development, but still excludeShallow the specific module you want to develop/debug in the browser. So you can get great debug support in the browser for just that one module, but still load the rest of your JS super-fast. No need for special server transforms.

I will be at the jQuery conference this weekend in Mountain View, CA. I will be speaking on Sunday about jQuery+RequireJS. Stop by and say hi if you are at the conference!

Tuesday, April 13, 2010

JavaScript object inheritance with parents

There are different ways to inherit functionality in JavaScript, including using mixins (mixing in all the properties of one object into another object) and the use of prototypes.

In Dojo, there is dojo.mixin for doing mixins, and dojo.delegate for inheriting properties via prototypes. dojo.delegate is like ECMAScript 5/Crockford's Object.create(), but with a dojo.mixin convenience call.
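For readers who do not use Dojo, the two ideas can be approximated in plain JavaScript. These are simplified stand-ins written for illustration, not Dojo's source: the names mixin and delegate are local functions here, and this mixin only copies own enumerable properties.

```javascript
// Copy every own enumerable property of `source` onto `target`.
function mixin(target, source) {
  Object.keys(source || {}).forEach(function (key) {
    target[key] = source[key];
  });
  return target;
}

// Create an object whose prototype is `proto`, then mix `props` into it.
function delegate(proto, props) {
  return mixin(Object.create(proto), props);
}

var base = {
  name: "base",
  greet: function () {
    return "hello from " + this.name;
  }
};

// child inherits greet() through the prototype chain and overrides name.
var child = delegate(base, { name: "child" });

console.log(child.greet());                         // hello from child
console.log(Object.getPrototypeOf(child) === base); // true
```

The mixin copies properties at call time, while the delegate keeps a live link to base: a method added to base later is visible on child, but not on an object that only had base mixed in.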

I really like dojo.delegate or an Object.create+dojo.mixin combination for inheriting, but it makes it hard to call methods you override from your parent. I see this problem show up frequently with widgets, which typically inherit from each other:


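//MyWidget is a constructor whose prototype inherits from BaseWidget.prototype.
function MyWidget() {}
MyWidget.prototype = Object.create(BaseWidget.prototype);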

//BaseWidget also defines a postCreate method,
//But we want our widget to do work too.

MyWidget.prototype.postCreate = function () {
 //Call BaseWidget's implementation
 BaseWidget.prototype.postCreate.apply(this, arguments);

 //Do MyWidget's postCreate work here.
};

Not too bad, but the BaseWidget.prototype.postCreate.apply junk is a bit much to type, and it gets a bit trickier when there are mixins that also contribute to the functionality.

In Dojo, there is dojo.declare() that helps with this by defining an "inherited" method that can be used to find the BaseWidget's postCreate:


var MyWidget = dojo.declare(BaseWidget, {
  postCreate: function () {
      //Call BaseWidget's implementation
      this.inherited("postCreate", arguments);

      //Do MyWidget's postCreate work here.
  }
});

This is an improvement as far as typing, but the implementation of dojo.declare has always scared me. My JavaScript Fu is not strong enough to follow it, and I am concerned it is actually a bit too complicated.

So here is an experiment on something simpler:


var MyWidget = object("BaseWidget", null, function (parent) {
 return {
     postCreate: function () {
         //Call BaseWidget's implementation
         parent(this, "postCreate", arguments);

         //Do MyWidget's postCreate work here.
     }
 };
});

Here is the implementation of that object function, and here are some tests. That implementation is wrapped in a RequireJS module, but it can be extracted as a standalone script.

The second argument to the object() function allows for specifying mixins.

With two mixins, mixin1 and mixin2, the parent for MyWidget would be an object that inherits from BaseWidget with mixin1 and mixin2's properties mixed in:


var MyWidget = object("BaseWidget", [mixin1, mixin2], function (parent) {
  return {
      postCreate: function () {
          //Call BaseWidget's postCreate, but if it
          //does not have a postCreate method, mixin1's
          //postCreate function will be used. If mixin1
          //does not have an implementation, then mixin2's
          //postCreate function will be used. If mixin2 does
          //not have an implementation an error is thrown.
          parent(this, "postCreate", arguments);

          //Do MyWidget's postCreate work here.
      }
  };
});

dojo.declare has the concept of calling a method called "constructor" if it is defined on the declared object, whenever a new object of the MyWidget type is created. I preserved that ability in object() but the property name for that function is "init" in the object() implementation.
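For readers without access to the linked implementation, here is one way a function with this shape could be written. It is a sketch of the idea only and not the code referenced above: it is named makeObject to avoid confusion with the real object(), and it takes the base constructor directly instead of a string name.

```javascript
// Hypothetical sketch; `makeObject` is an assumed name and `base` is
// assumed to be a constructor function (or null for no base).
function makeObject(base, mixins, factory) {
  // The parent prototype inherits from base, then picks up mixin
  // properties only where nothing earlier in the order provides them.
  var parentProto = Object.create(base ? base.prototype : Object.prototype);
  (mixins || []).forEach(function (mixin) {
    Object.keys(mixin).forEach(function (key) {
      if (!(key in parentProto)) {
        parentProto[key] = mixin[key];
      }
    });
  });

  // Calls the parent's implementation of `name` with `self` as this.
  function parent(self, name, args) {
    var fn = parentProto[name];
    if (typeof fn !== "function") {
      throw new Error("No parent implementation for " + name);
    }
    return fn.apply(self, args || []);
  }

  // Instances call init(), if present, when they are created.
  function Ctor() {
    if (typeof this.init === "function") {
      this.init.apply(this, arguments);
    }
  }
  Ctor.prototype = Object.create(parentProto);

  var props = factory(parent);
  Object.keys(props).forEach(function (key) {
    Ctor.prototype[key] = props[key];
  });
  return Ctor;
}

// Usage: BaseWidget supplies postCreate, mixin1 supplies destroy.
function BaseWidget() {}
BaseWidget.prototype.postCreate = function () { this.log.push("base"); };
var mixin1 = { destroy: function () { this.log.push("mixin1 destroy"); } };

var MyWidget = makeObject(BaseWidget, [mixin1], function (parent) {
  return {
    init: function () { this.log = []; },
    postCreate: function () {
      parent(this, "postCreate", arguments);
      this.log.push("mine");
    },
    destroy: function () { parent(this, "destroy", arguments); }
  };
});

var widget = new MyWidget();
widget.postCreate();
widget.destroy();
console.log(widget.log.join(", ")); // base, mine, mixin1 destroy
```

Each call to makeObject gets its own parent closure, so a chain of several levels keeps resolving to the right ancestor without any bookkeeping on the instance.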

The object() implementation is simpler than dojo.declare, but still gives easy access for calling a parent implementation of a function. It is not as powerful as dojo.declare -- dojo.declare has the concept of postscript and a preamble and even auto-chaining calls. However, I feel the simplified approach is better. It is clearer to follow the code, and to predict how it will behave. I also expect it to perform better.

I like the object() method because it uses closures and a function that accepts the parent function as an argument. Feels very JavaScripty. The prototype chain is a bit longer with the extra Object.create() calls creating some intermediate objects, but I expect prototype walking is fast in JavaScript, particularly when you go to measure it in comparison to any DOM operation.

Are there ways in which the object() function is broken or insufficient? Is there a better way to do this? Or even a different way, something that does not rely on a parent reference?

There is traits.js, for using traits. Alex Russell experimented with a trait implementation inside dojo.delegate. Kris Zyp pointed out that Alex's implementation does not have conflict detection or method require support.

I like the idea of mixing in just part of a mixin or remapping a method to fit some other API's expectations, so I can see adding support for the remapping features, similar to what Alex does in the dojo.delegate experiment. However, I am not sure how valuable conflict detection or method require support is.

I can see in large systems it would help with detecting errors sooner, but then maybe the bigger problem is the complexity of the large system. And there is a balance to forcing strictness up front over ease of use. The traits.js syntax looks fairly wordy to me, and the extra benefit of the strictness may not be realized for most web apps.

Also, I do not see an easy way to get the parent reference. It seems like you need to remap each overridden parent function you want to call to a new property name. It seems wordy, with more properties hanging off an object. And do you need to make sure you do not pick a name that is already in use by an ancestor? Seems like it could lead to a bunch of goofy names on an object.

Reusing code effectively is an interesting topic. The traits approach is newer to me, and I keep wondering if there is a better way to do it. It has been fun to experiment with alternatives.