11. File system

This chapter covers the file system module.

The file system functions consist of file I/O and directory I/O functions. All of the file system functions offer both synchronous (blocking) and asynchronous (non-blocking) versions. The difference between these two is that the synchronous functions (which have “Sync” in their name) return the value directly and prevent Node from executing any code while the I/O operation is being performed:

var fs = require('fs');
var data = fs.readFileSync('./index.html', 'utf8');
// wait for the result, then use it
console.log(data);

Asynchronous functions return the value as a parameter to a callback given to them:

var fs = require('fs');
fs.readFile('./index.html', 'utf8', function(err, data) {
  // the data is passed to the callback in the second argument
  console.log(data);
});

The table below lists all the asynchronous functions in the FS API. These functions have synchronous versions as well, but I left them out to make the listing more readable.

Read & write a file (fully buffered) Read & write a file (in parts) Directories: read, create & delete
Files: info Readable streams Writable streams
Files: rename, watch changes & change timestamps Files: Owner and permissions Files: symlinks

You should use the asynchronous version in most cases, but in rare cases (e.g. reading configuration files when starting a server) the synchronous version is more appropriate. Note that the asynchronous versions require a bit more thought, since the operations are started immediately and may finish in any order:

fs.readFile('./file.html', function (err, data) {
  // ...
});
fs.readFile('./other.html', function (err, data) {
  // ..
});

These file reads might complete in any order depending on how long it takes to read each file. The simplest solution is to chain the callbacks:

fs.readFile('./file.html', function (err, data) {
   // ...
   fs.readFile('./other.html', function (err, data) {
      // ...
   });
});

However, we can do better by using the control flow patterns discussed in the chapter on control flow.

11.1 Files: reading and writing

Fully buffered reads and writes are fairly straightforward: call the function and pass in a String or a Buffer to write, and then check the return value.

Recipe: Reading a file (fully buffered)

fs.readFile('./index.html', 'utf8', function(err, data) {
  // the data is passed to the callback in the second argument
  console.log(data);
});

Recipe: Writing a file (fully buffered)

fs.writeFile('./results.txt', 'Hello World', function(err) {
  if(err) throw err;
  console.log('File write completed');
});

When we want to work with files in smaller parts, we need to open(), get a file descriptor and then work with that file descriptor.

fs.open(path, flags, [mode], [callback]) supports the following flags:

  • 'r' - Open file for reading. An exception occurs if the file does not exist.
  • 'r+' - Open file for reading and writing. An exception occurs if the file does not exist.
  • 'w' - Open file for writing. The file is created (if it does not exist) or truncated (if it exists).
  • 'w+' - Open file for reading and writing. The file is created (if it does not exist) or truncated (if it exists).
  • 'a' - Open file for appending. The file is created if it does not exist.
  • 'a+' - Open file for reading and appending. The file is created if it does not exist.

mode refers to the permissions to use in case a new file is created. The default is 0666.

Recipe: Opening, writing to a file and closing it (in parts)

fs.open('./data/index.html', 'w', function(err, fd) {
  if(err) throw err;
  var buf = new Buffer('bbbbb\n');
  fs.write(fd, buf, 0, buf.length, null, function(err, written, buffer) {
    if(err) throw err;
    console.log(err, written, buffer);
    fs.close(fd, function() {
      console.log('Done');
    });
  });
});

The read() and write() functions operate on Buffers, so in the example above we create a new Buffer() from a string. Note that built-in readable streams (e.g. HTTP, Net) generally return Buffers.

Recipe: Opening, seeking to a position, reading from a file and closing it (in parts)

  fs.open('./data/index.html', 'r', function(err, fd) {
    if(err) throw err;
    var str = new Buffer(3);
    fs.read(fd, buf, 0, buf.length, null, function(err, bytesRead, buffer) {
      if(err) throw err;
      console.log(err, bytesRead, buffer);
      fs.close(fd, function() {
        console.log('Done');
      });
    });
  });

11.2 Directories: read, create & delete

Recipe: Reading a directory

Reading a directory returns the names of the items (files, directories and others) in it.

var path = './data/';
fs.readdir(path, function (err, files) {
  if(err) throw err;
  files.forEach(function(file) {
    console.log(path+file);
    fs.stat(path+file, function(err, stats) {
      console.log(stats);
    });
  });
});

fs.stat() gives us more information about each item. The object returned from fs.stat looks like this:

{ dev: 2114,
  ino: 48064969,
  mode: 33188,
  nlink: 1,
  uid: 85,
  gid: 100,
  rdev: 0,
  size: 527,
  blksize: 4096,
  blocks: 8,
  atime: Mon, 10 Oct 2011 23:24:11 GMT,
  mtime: Mon, 10 Oct 2011 23:24:11 GMT,
  ctime: Mon, 10 Oct 2011 23:24:11 GMT }

atime, mtime and ctime are Date instances. The stat object also has the following functions:

  • stats.isFile()
  • stats.isDirectory()
  • stats.isBlockDevice()
  • stats.isCharacterDevice()
  • stats.isSymbolicLink() (only valid with fs.lstat())
  • stats.isFIFO()
  • stats.isSocket()

The Path module has a set of additional functions for working with paths, such as:

path.normalize(p)Normalize a string path, taking care of '..' and '.' parts.
path.join([path1], [path2], [...])Join all arguments together and normalize the resulting path.
path.resolve([from ...], to)Resolves to to an absolute path.

If to isn't already absolute from arguments are prepended in right to left order, until an absolute path is found. If after using all from paths still no absolute path is found, the current working directory is used as well. The resulting path is normalized, and trailing slashes are removed unless the path gets resolved to the root directory.

fs.realpath(path, [callback])

fs.realpathSync(path)

Resolves both absolute (‘/path/file’) and relative paths (‘../../file’) and returns the absolute path to the file.
path.dirname(p)Return the directory name of a path. Similar to the Unix dirname command.
path.basename(p, [ext])Return the last portion of a path. Similar to the Unix basename command.
path.extname(p)Return the extension of the path. Everything after the last '.' in the last portion of the path. If there is no '.' in the last portion of the path or the only '.' is the first character, then it returns an empty string.
path.exists(p, [callback])

path.existsSync(p)

Test whether or not the given path exists. Then, call the callback argument with either true or false.

Recipe: Creating and deleting a directory

fs.mkdir('./newdir', 0666, function(err) {
  if(err) throw err;
  console.log('Created newdir');
  fs.rmdir('./newdir', function(err) {
    if(err) throw err;
    console.log('Removed newdir');
  });
});

Using Readable and Writable streams

Readable and Writable streams are covered in Chapter 9.

Recipe: Reading a file and writing to another file

var file = fs.createReadStream('./data/results.txt', {flags: 'r'} );
var out = fs.createWriteStream('./data/results2.txt', {flags: 'w'});
file.on('data', function(data) {
  console.log('data', data);
  out.write(data);
});
file.on('end', function() {
  console.log('end');
  out.end(function() {
    console.log('Finished writing to file');
    test.done();
  });
});

You can also use pipe():

var file = fs.createReadStream('./data/results.txt', {flags: 'r'} );
var out = fs.createWriteStream('./data/results2.txt', {flags: 'w'});
file.pipe(out);

Recipe: Appending to a file

var file = fs.createWriteStream('./data/results.txt', {flags: 'a'} );
file.write('HELLO!\n');
file.end(function() {
  test.done();
});

Practical example

Since Node makes it very easy to launch multiple asynchronous file accesses, you have to be careful when performing large amounts of I/O operations: you might exhaust the available number of file handles (a limited operating system resource used to access files). Furthermore, since the results are returned in a asynchronous manner which does not guarantee completion order, you will most likely want to coordinate the order of execution using the control flow patterns discussed in the previous chapter. Let’s look at an example.

Example: searching for a file in a directory, traversing recursively

In this example, we will search for a file recursively starting from a given path. The function takes three arguments: a path to search, the name of the file we are looking for, and a callback which is called when the file is found.

Here is the naive version: a bunch of nested callbacks, no thought needed:

var fs = require('fs');

function findFile(path, searchFile, callback) {
  fs.readdir(path, function (err, files) {
    if(err) { return callback(err); }
    files.forEach(function(file) {
      fs.stat(path+'/'+file, function() {
        if(err) { return callback(err); }
        if(stats.isFile() && file == searchFile) {
          callback(undefined, path+'/'+file);
          }
        } else if(stats.isDirectory()) {
          findFile(path+'/'+file, searchFile, callback);
        }
      });
    });
  });
}

findFile('./test', 'needle.txt', function(err, path) {
  if(err) { throw err; }
  console.log('Found file at: '+path);
});

Splitting the function into smaller functions makes it somewhat easier to understand:

var fs = require('fs');

function findFile(path, searchFile, callback) {
  // check for a match, given a stat
  function isMatch(err, stats) {
    if(err) { return callback(err); }
    if(stats.isFile() && file == searchFile) {
      callback(undefined, path+'/'+file);
    } else if(stats.isDirectory()) {
      statDirectory(path+'/'+file);
    }
  }
  // launch the search
  statDirectory(path, isMatch);
}

// Read and stat a directory
function statDirectory(path, callback) {
  fs.readdir(path, function (err, files) {
    if(err) { return callback(err); }
    files.forEach(function(file) {
      fs.stat(path+'/'+file, callback);
    });
  });
}

findFile('./test', 'needle.txt', function(err, path) {
  if(err) { throw err; }
  console.log('Found file at: '+path);
});

The function is split into smaller parts:

  • findFile: This code starts the whole process, taking the main input arguments as well as the callback to call with the results.
  • isMatch: This hidden helper function takes the results from stat() and applies the "is a match" logic necessary to implement findFile().
  • statDirectory: This function simply reads a path, and calls the callback for each file.

I admit this is fairly verbose.

PathIterator: Improving reuse by using an EventEmitter

However, we can accomplish the same goal in a more reusable manner by creating our own module (pathiterator.js), which treats directory traversal as a stream of events by using EventEmitter:

var fs = require('fs'),
    EventEmitter = require('events').EventEmitter,
    util = require('util');

var PathIterator = function() { };

// augment with EventEmitter
util.inherits(PathIterator, EventEmitter);

// Iterate a path, emitting 'file' and 'directory' events.
PathIterator.prototype.iterate = function(path) {
  var self = this;
  this.statDirectory(function(fpath, stats) {
    if(stats.isFile()) {
      self.emit('file', fpath, stats);
    } else if(stats.isDirectory()) {
      self.emit('directory', fpath, stats);
      self.iterate(path+'/'+file);
    }
  });
};

// Read and stat a directory
PathIterator.prototype.statDirectory = function(path, callback) {
  fs.readdir(path, function (err, files) {
    if(err) { self.emit('error', err); }
    files.forEach(function(file) {
      var fpath = path+'/'+file;
      fs.stat(fpath, function (err, stats) {
        if(err) { self.emit('error', err); }
        callback(fpath, stats);
      });
    });
  });
};

module.exports = PathIterator;

As you can see, we create a new class which extends EventEmitter, and emits the following events:

  • “error” - function(error): emitted on errors.
  • “file” - function(filepath, stats): the full path to the file and the result from fs.stat
  • “directory” - function(dirpath, stats): the full path to the directory and the result from fs.stat

We can then use this utility class to implement the same directory traversal:

var PathIterator = require('./pathiterator.js');
function findFile(path, searchFile, callback) {
  var pi = new PathIterator();
  pi.on('file', function(file, stats) {
    if(file == searchFile) {
      callback(undefined, file);
    }
  });
  pi.on('error', callback);
  pi.iterate(path);
}

While this approach takes a few lines more than the pure-callback approach, the result is a somewhat nicer and extensible (for example - you could look for multiple files in the “file” callback easily).

If you end up writing a lot of code that iterates paths, having a PathIterator EventEmitter will simplify your code. The callbacks are still there - after all, this is non-blocking I/O via the event loop - but the interface becomes a lot easier to understand. You are probably running findFile() as part of some larger process - and instead of having all that file travelsal logic in the same module, you have a fixed interface which you can write your path traversing operations against.

Using a specialized module: async.js

Encapsulating the I/O behind a EventEmitter helps a bit, but we can do one better by using fjacob's async.js. This is a FS-specific library that encapsulates file system operations behind a chainable interface. The findFile() function can be written using async.js like this:

var async = require('asyncjs');
function findFile(path, searchFile, callback) {
  async.readdir(path)
    .stat()
    .filter(function(file) {
      return file.stat == searchFile;
    })
    .toString(callback);
}

The techniques behind async.js consist essentially of an object that acts like a series (from the chapter on Control Flow), but allows you to specify the operations using a chainable interface.

My point - if there is any - is that coordinating asynchronous I/O control flow in Node is somewhat more complicated than in scripting environments/languages where you can rely on I/O blocking the execution of code. This is most obvious when you are dealing with the file system, because file system I/O generally requires you to perform long sequences of asynchronous calls in order to get what you want.

However, even with file I/O, it is possible to come up with solutions that abstract away the details from your main code. In some cases you can find domain specific libraries that provide very conscise ways of expressing your logic (e.g. async.js) - and in other cases you can at least take parts of the process, and move those into a separate module (e.g. PathIterator).

blog comments powered by Disqus