10 Vital Aspects of Building a Node.js Application

Purpose

It sounds antagonisingly obvious, but the same goes for everything you decide to build: your app needs to have a purpose. A job to do. A problem to solve. Solid reasoning here will lay durable foundations for the application itself. It will help you visualise a path towards the solution, and help you maintain focus on the ultimate goal when you get stuck in and iterate.

Structure

Structure concerns source code layout, file arrangement and library/module usage, and on the whole describes the way the application has been woven together. Its form will vary greatly depending on the nature of the app you’re building. It could be a web application server, which might be an Express app handling static assets, routing, and application logic all in the same app. Or it may be more like a scheduler/worker pipeline, queueing items and then processing them. Regardless of the purpose, there are several common patterns that are useful to follow.

Modularity. Try to keep your code as DRY (Don’t Repeat Yourself) as reasonably possible. If you realise you’re going to need similar code in many distinct locations or scripts, it’s common to drop the function (or ‘helper’) in a separate file (or module), which may export a collection of helper functions. The module can then be included via Node’s require() in all dependent scripts. The aim is not only to avoid rewriting similar functionality multiple times, but also to provide an easy way to update the functionality in a single place.
Following node conventions, it’s common to keep the files for 3rd party node modules in the node_modules/ folder. It’s also common to put node_modules in your .gitignore, so that you’re not committing irrelevant dependency files.
Separate concerns. Assets for frontend (static CSS, javascript, HTML, templates, images, files) should be isolated from backend application logic (routes, server, middleware). Likewise keep deployment scripts, config files, data fixtures and tests separate.
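As a minimal sketch of the modularity point above, a helper module might look something like this. The file name lib/helpers.js and both functions are hypothetical, picked purely for illustration.

```javascript
// lib/helpers.js (hypothetical file name)
// A small collection of helpers kept in one place so that every
// script shares, and updates, a single implementation.

// Clamp a number into the inclusive range [min, max].
function clamp(value, min, max) {
  return Math.min(Math.max(value, min), max);
}

// Format a duration in milliseconds as a short string such as "1.5s".
function formatDuration(ms) {
  return ms < 1000 ? ms + 'ms' : (ms / 1000).toFixed(1) + 's';
}

module.exports = { clamp: clamp, formatDuration: formatDuration };
```

A dependent script would then load it with require('./lib/helpers') and call the functions from the returned object, so a fix to either helper lands everywhere at once.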

Deployment

Your method of shipping applications to production can vary greatly depending on the nature of your stack. Here’s what we’ve tried in the past:

Manually SSH’ing into servers and cloning the git repository. Pros: full manual control, zero deployment tooling setup. Cons: Completely unfeasible for a large number of servers. Everything must be set up manually, so you don’t get any benefits such as upstart / initrc supervision or logging without extra work.
Capistrano. Pros: Standard procedure for developers on your team. Simple to run: cap deploy. Cons: Trickier to set up. Introduces Ruby dependency.
Chef scripts. Pros: Scripted procedures for installing apps. Cons: Need to cook your servers each time you want to deploy. Chef is best used for server installation / config, not app deploys.
Deliver. A deploy tool born at GoSquared when we got sick of battling with the other options. It was inspired by Heroku’s git-push based deploy system. All you need to do is configure a system user for the application (which you should do anyway – we use Chef to automate this), set up a basic deliver config, and then run the deliver command in your project (after adding it to your $PATH – see deliver setup instructions). It pushes the application to your server(s) using git via SSH, and can use foreman or equivalent to install upstart supervision of application booting, respawning and recovery.

This is by no means an exhaustive list of deploy methods, and you may need to be a bit creative to come up with a solution to best fit your needs. Whatever strategy you employ, it’s a good idea to include deploy configurations in your application’s source control, and to document deploy processes in your README.

Configuration

Virtually every application has constants and settings that will need to be changed at convenience. The common ones are hostnames, port numbers, timeouts, module options and errors. It’s helpful to keep these values in one place, either in a file or in multiple files if there are enough of them. Doing so makes them quicker to change, without having to spend time combing through the code to track them down.

I used to just dump configuration settings into a file that exported an object of configuration properties. This worked well for a very specific target environment, such as the production environment at the time, but over time it started becoming a maintenance bottleneck that lacked clarity and sprawled into a scribble of conditional statements and multiple files as the infrastructure around the app changed.

We’ve since had great results from environment-aware configuration. The idea is that you can change configuration values based on the environment in which your application is running. The way this works is simple. You export a shell environment variable called $NODE_ENV containing an identifier for the environment mode you’ll be running the app under. When it starts up, your application then tailors its configuration using the settings you’ve defined specifically for that environment.
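The idea can be sketched without any library at all. The following is a hand-rolled illustration, not how any particular module does it, and every key, hostname and port in it is made up.

```javascript
// Hand-rolled sketch of environment-aware configuration.
// All keys, hosts and ports below are made up for illustration.
const defaults = {
  port: 3000,
  redis: { host: 'localhost', port: 6379 },
  requestTimeoutMs: 5000
};

// Per-environment overrides: only the values that differ from the defaults.
const overrides = {
  development: {},
  staging: { redis: { host: 'redis.staging.internal' } },
  production: { port: 80, redis: { host: 'redis.prod.internal' }, requestTimeoutMs: 2000 }
};

// Recursively merge `extra` on top of `base` without mutating either.
function merge(base, extra) {
  const result = Object.assign({}, base);
  Object.keys(extra).forEach(function (key) {
    const value = extra[key];
    const isPlainObject = value !== null && typeof value === 'object' && !Array.isArray(value);
    result[key] = isPlainObject ? merge(base[key] || {}, value) : value;
  });
  return result;
}

// Build the configuration for a given environment name.
// Unknown environments simply get the defaults.
function loadConfig(env) {
  return merge(defaults, overrides[env] || {});
}

// Fall back to "development" when NODE_ENV is not set.
const config = loadConfig(process.env.NODE_ENV || 'development');
console.log(config);
```

Running the script with NODE_ENV=production set in the shell would pick up the production overrides, while leaving it unset keeps everything pointed at local services.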

Environment-aware configuration offers you more leeway throughout the whole application lifecycle. You should be able to develop and run your applications locally, without the need for internet connectivity (you want to be able to hack on trains right? In fact I’m writing this on a train right now ;)). That will require pointing hosts and ports to local services. Then, you’ll want to be able to test-drive on a staging server before deploying to production. Each of these will likely require different configurations.

We commonly use node-config, a module that’s been designed precisely for the job. All you need to do is define your configuration values in config/default.js, and then make a file for each of your various environments, each containing directives that extend the defaults in default.js. You set the $NODE_ENV variable with the environment name, and the module will override the values from default.js with the properties defined in [$NODE_ENV].js. To import the merged configuration object into your app, you simply require() it.

Logs, Metrics and Monitoring

Logs

You’ll want to give yourself enough evidence to work with should your application misbehave, so you can shoot through from ‘b0rked’ to ‘fixed all of the things’ status in as little time as possible. One of the best (and old skool) ways to do this is trusty old logging. General premise is, if you get an error, log the bugger. You should be following node’s error handling convention, where the first argument of a callback is reserved for error information should one occur:


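makeRyanDahlProud(function(err, result){
  // By convention, err is null (or undefined) when the call succeeded.
  if(err){
    // Log the error so there is a record to refer back to later.
    console.log(err);
  }
});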

How you react to that error, however, is up to you. You may want to log it and carry on. Or you may want to halt execution of that callback. Regardless of your needs, you should have a way of referencing that error in the future, and logging is a simple way to achieve that.
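To show both sides of the convention, here is a small sketch of a function that reports through an error-first callback, along with a caller that logs the error and halts. The parseJson helper is hypothetical and exists only for this example.

```javascript
// Hypothetical helper that follows the error-first callback convention:
// callback(err) on failure, callback(null, result) on success.
function parseJson(text, callback) {
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    // Defer the call so the callback is always invoked asynchronously.
    return process.nextTick(callback, err);
  }
  process.nextTick(callback, null, parsed);
}

// The caller decides how to react: here we log and stop on error.
parseJson('{"ok": true}', function (err, result) {
  if (err) {
    console.error('parse failed:', err.message);
    return; // halt this callback; `result` is not usable
  }
  console.log('parsed:', result);
});
```

Returning early after logging is one reasonable policy. Logging and carrying on is just as valid if the rest of the callback doesn’t depend on the result.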

Although logging errors is good practice, it can potentially lead to a lot of messages being sent to the logs / terminal. The mighty TJ Holowaychuk has developed a module called debug that allows you to namespace log messages so you can later filter the signal from the noise by glob-matching these log message namespaces. TJ has plenty of other handy modules in his repertoire.
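To make the namespacing idea concrete, here is a hand-rolled sketch of glob-filtered loggers. It is not the debug module’s implementation, and the function names and namespaces are my own inventions for the example.

```javascript
// Hand-rolled sketch of namespaced logging; this is not the debug module's
// implementation, only an illustration of the idea.

// Turn a comma-separated list of globs such as "app:*,worker:queue"
// into an array of regular expressions.
function compilePatterns(spec) {
  return spec.split(',')
    .map(function (glob) { return glob.trim(); })
    .filter(Boolean)
    .map(function (glob) {
      // Escape regex metacharacters, then let "*" match any run of characters.
      const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
      return new RegExp('^' + escaped + '$');
    });
}

// Report whether a namespace matches any of the enabled patterns.
function isEnabled(namespace, spec) {
  return compilePatterns(spec).some(function (re) { return re.test(namespace); });
}

// Create a logger that stays silent unless its namespace is enabled.
// When no spec is passed, the DEBUG environment variable is used.
function createLogger(namespace, spec) {
  const enabled = isEnabled(namespace, spec === undefined ? (process.env.DEBUG || '') : spec);
  return function log() {
    if (!enabled) return false;
    const args = Array.prototype.slice.call(arguments);
    console.error.apply(console, [namespace].concat(args));
    return true;
  };
}

const logDb = createLogger('app:db', 'app:*');
const logQueue = createLogger('worker:queue', 'app:*');
logDb('connected');        // written to stderr, prefixed with the namespace
logQueue('job received');  // filtered out, nothing is written
```

With the real module the shape is similar: you create a logger with require('debug')('app:db') and choose which namespaces to see by setting the DEBUG environment variable, for example DEBUG=app:*.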

Metrics

Application metrics offer valuable insight into what your application is doing and how often. They serve as a great way to detect unexpected activity and spot bottlenecks, and as a point of reference for scaling plans. I put together a simple module called abacus which helps you maintain a collection of counters and optionally flush them to Graphite via StatsD for plotting visualisations. This has proved exceptionally handy for ensuring the application is behaving within intended operational parameters.
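The counter-and-flush pattern is simple enough to sketch in a few lines. This is a hypothetical stand-in rather than the abacus API, and the counter names are invented. The only borrowed detail is the StatsD counter line format, name:value|c.

```javascript
// Hypothetical in-memory counter collection; not the abacus API.
function Counters() {
  this.counts = {};
}

// Add `amount` (default 1) to the named counter.
Counters.prototype.increment = function (name, amount) {
  this.counts[name] = (this.counts[name] || 0) + (amount === undefined ? 1 : amount);
};

// Return the current counts as StatsD counter lines ("name:value|c")
// and reset every counter to start a new interval.
Counters.prototype.flush = function () {
  const counts = this.counts;
  this.counts = {};
  return Object.keys(counts).map(function (name) {
    return name + ':' + counts[name] + '|c';
  });
};

const counters = new Counters();
counters.increment('requests');
counters.increment('requests');
counters.increment('bytes_sent', 512);
console.log(counters.flush());
```

In a real app you would probably call flush() on a timer and send each line to a StatsD daemon over UDP. That part is left out here to keep the sketch self-contained.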

Monitoring

Not always necessary from the get-go, but it’s usually a good idea to keep an eye on resource utilisation information from the server hosting your application. Another early-warning system is useful to have, and it’ll help avoid silly reasons for your application to go down. There’s nothing more embarrassing than your application breaking because your server ran out of disk space, or you were maxing out the CPU so much it melted.
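Dedicated tools do this far better, but as a rough sketch of the kind of early warning I mean, Node’s built-in os module can report load and memory. The threshold values are arbitrary examples, the function names are my own, and disk space is not covered here.

```javascript
const os = require('os');

// Take a rough snapshot of host resource usage using Node's built-in os module.
function resourceSnapshot() {
  const cpuCount = Math.max(1, os.cpus().length);
  return {
    // 1-minute load average divided by CPU count (always 0 on Windows).
    loadPerCpu: os.loadavg()[0] / cpuCount,
    // Fraction of physical memory currently in use.
    memoryUsed: 1 - os.freemem() / os.totalmem()
  };
}

// Compare a snapshot against limits and list the ones exceeded.
function findWarnings(snapshot, limits) {
  const warnings = [];
  if (snapshot.loadPerCpu > limits.loadPerCpu) warnings.push('high CPU load');
  if (snapshot.memoryUsed > limits.memoryUsed) warnings.push('low free memory');
  return warnings;
}

// The limits here are arbitrary example values.
console.log(findWarnings(resourceSnapshot(), { loadPerCpu: 0.9, memoryUsed: 0.9 }));
```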

There’s a huge variety of monitoring tools and services out there: Ganglia, Monit, Sensu to name a few open source ones, and ServerDensity, NodeTime and NewRelic as SaaS services.

Fault tolerance

Tying in closely with deployment considerations, you should think about what’ll happen if your application crashes. It’s best to have the application under the control of a system supervisor, such as upstart on Ubuntu. Configuring upstart is for the most part trivial, and can handle starting, stopping, and restarting the application if it explodes, so it’s worth doing. Foreman has an export facility that generates upstart configs for your foreman-backed application.

Even if your application will be rebooted if it crashes, what are the implications of it doing so? Will it take your service down for some time? Will it lose data? Will it leave partially-complete work? These are consequences you must architect around, and redundancy is a good way to achieve that. For example, if you’re round-robining traffic across a number of servers, consider adding a reverse-proxy intermediary (a load balancer like HAProxy or web server such as nginx) to remove the faulty instance of your app from load balancing until its health checks start passing again.
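For the load balancer to know when an instance is unwell, the app needs to expose something to check. Here is one possible sketch of a health endpoint using Node’s built-in http module. The /health path, the function names and the shape of the checks are all assumptions made for the example.

```javascript
const http = require('http');

// Run every named check and summarise the outcome.
// `checks` maps a name to a function returning true when healthy;
// the checks themselves are hypothetical and application-specific.
function healthStatus(checks) {
  const failed = Object.keys(checks).filter(function (name) {
    try {
      return checks[name]() !== true;
    } catch (err) {
      return true; // a throwing check counts as a failure
    }
  });
  return {
    statusCode: failed.length === 0 ? 200 : 503,
    body: { healthy: failed.length === 0, failed: failed }
  };
}

// Build (but do not start) an HTTP server exposing /health.
function createHealthServer(checks) {
  return http.createServer(function (req, res) {
    if (req.url !== '/health') {
      res.writeHead(404);
      return res.end();
    }
    const status = healthStatus(checks);
    res.writeHead(status.statusCode, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(status.body));
  });
}
```

Calling createHealthServer(checks).listen(port) starts it. Pointing the load balancer’s health check at /health then means a 503 response can take the instance out of rotation until the checks pass again.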

Efficiency & Scaling

Rarely an early-stage concern, but as your application matures and handles a high workload, you may need to think about making the app more efficient, or even scaling it. The risk here is premature optimisation. You shouldn’t worry about making your app super-scalable or super fast early on, because let’s face it, before it actually does get high load, why would you bother? Your precious time is much better spent on building the essential feature set, or ‘minimally viable’ product as the lean startups like to call it, at least to get it to the stage where you might need to scale.

When you do hit that stage where a single node with a single instance of your app is not enough, you have several options open to you, all at the mercy of trade-offs which can make it feel like a bit of a minefield. The main priority is to seek out the bottleneck. Why is the application not fast enough? Is it maxing out CPU? Is disk I/O throughput high enough and consistent enough?

Sometimes the easiest answer is to stick the app on a bigger server with more resources. If this’ll work for you, then it’ll get the job done quicker than re-architecting the app to work in a multi-node setup, but if you’re growing really fast then it’s not the most sustainable solution. This method might buy you enough time while you are designing your retaliation, however.

Scaling the app horizontally across multiple servers is tricky and introduces lots of scope for failure, but carries long-term viability and a fascinating technical challenge.

Docs & Team Collaboration

An application without documentation is like flat-pack without instructions. You can kind of figure out what it’s supposed to do, but doing so is clumsy, time-consuming and imprecise. It’s much better to provide clear documentation that succinctly complements your code. You’re not only going to help others get up and developing quickly, but also rescue your forgetful self when you return to the app in six months’ time to fix an obscure bug.

I’m not advocating writing essays or tautological assertions that can otherwise be discerned from the code. There’s a lot that code can explain by itself when it’s written clearly and simply. Instead, your documentation should colour in the grey areas that the code cannot clearly convey. Comments should help explain trickier portions of functionality, as well as inform about design decisions, trade-offs, dependencies, pitfalls, edge-cases and other considerations made. As an application matures, the documentation:code ratio should increase, to reflect its journey from transience towards stability. There’s no point going overboard on the documentation when the app is in its infancy. It will change rapidly in the early stages, and you’ll sink a load of time into documentation that becomes obsolete in a short time.

Every application should include a README[.md] which contains all the need-to-know essentials of working with the app. Commonly this comprises:

A brief description of the app and its purpose
Setup instructions
Booting instructions
Testing instructions
Deployment considerations
Any other need-to-know pointers

We built a little module that can extract source code comments and generate clean, attractive documentation which reads parallel to the code. It’s called docker and we use it in most of our main apps.

Testing

When I was starting out, I never realised the importance of testing and never really bothered. Perhaps it’s because I’d never been in the situation where, years down the line, your application starts failing and you’ve no idea how to guarantee which components were working properly and which weren’t. Yeah, well, now I’ve had that experience, it’s not pretty. You need tests.

A rigorous attitude to testing encourages good application design paradigms. You are required to think laterally to compartmentalise components of your app and make it possible to test them individually. This goes beyond basic unit tests, which tend to be unnecessarily pedantic, to more informative component and integration tests where you can write probes to ensure your application is consistently fused together properly.

A good practice is to write tests as you develop your application (once you’re more confident that the functionality you’re testing is not so much in flux) so that as you progress with the app, you have an ever-increasing battery of tests on hand to run continuously. This helps you guarantee you don’t break functionality with new developments (regressions). Tests also serve as a great usage example for other developers to understand exactly how various parts of the app are supposed to behave and what the results should be.

There’s a variety of testing frameworks to choose from, employing a range of different techniques (BDD, TDD). Amongst others, we’ve tried vows and tap to date, but my favourite so far is mocha in conjunction with should.js, which I feel hits the right balance of structure and tooling vs pure JavaScript, allowing you to use all the same source libraries, script files and boot servers from your tests as you would when running the app.

Dependencies

Node includes a powerful module system and a package manager called “npm” which is designed to help you seamlessly integrate modules into your app. npm endows you with a wealth of open source modules archived in its index where you’re likely to find what you’re looking for.

Once you figure out what components your app might need (such as a client for communicating with a Redis database), it’s wise to first check userland node modules in the index for existing implementations. Node.js is one of the hacker’s favourites for experimentation, so there’s a huge range of modules available for you to try out. Naturally, the quality fluctuates dramatically, and you’ll need to build up a sense for what makes a good module and what doesn’t (shortlist: README, tests, examples, reputable author), but generally there’s almost always something available for what you require.
