Monkey Patching http.request for Fun and Profit

Monkey patching is a time-honored tradition in the system instrumentation space. Many a time a developer will need to change a default behavior, add an annotation or capture an event for a system that doesn’t natively support it or provide any hooks, and will have to resort to trickery to achieve their goals.

Happily, in today’s world of high-level, interpreted, dynamic programming languages like Python and JavaScript, monkey patching is so much easier!

Let’s explore how we can monkey patch the Node.js http library (and, by extrapolation, https as well) to annotate every request made from the environment.

Here at Stateful, we use this to add OpenTelemetry tracing information to outbound requests, allowing us to correlate events extending across multiple parts of our infrastructure for each customer integration.

Naive approach

Let’s try the simplest approach. First, let’s establish a simple test case:

const assert = require('assert');
const http = require('http');

// Let’s create a server that’s emulating our teapot
const server = http.createServer((req, res) => {
  res.writeHead(418);
  res.end();
});
server.listen();

http
  .request({ method: 'GET', hostname: 'example.com' }, (res) => {
    assert.strictEqual(res.statusCode, 418);
  })
  .end();

Obviously this will fail, because http.request is asking for example.com instead of the local HTTP-serving teapot. So we’re just going to have to change it on the fly!

Note: while in these examples we are changing the hostname, it’s left as an exercise to the reader to modify the header field, or any other, as they prefer.

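// One possible makeRequest: the request from the test case above, wrapped in a promise.
const makeRequest = () =>
  new Promise((resolve, reject) => {
    http
      .request({ method: 'GET', hostname: 'example.com' }, (res) => {
        res.resume();
        resolve(res.statusCode);
      })
      .once('error', reject)
      .end();
  });

const simplePatch = (httpPort) => {
  const oldRequest = http.request;
  http.request = (options, callback) => {
    options.hostname = 'localhost';
    options.port = httpPort;

    return oldRequest(options, callback);
  };
};

(async () => {
  // Wait until the server is listening so that its port is known.
  if (!server.listening) {
    await new Promise((resolve) => server.once('listening', resolve));
  }
  simplePatch(server.address().port);

  const statusCode = await makeRequest();
  assert.strictEqual(statusCode, 418);
  server.close();
})();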

Here we see the simplest form of a monkey patch: a simple module modification where we wrap the call to the http.request method, saving the “stock” version in oldRequest, and changing the hostname and port in the new one. Then we call oldRequest at the end, but with the new parameters.

Note: It’s important to always apply your monkey patch before any code that sends a request has a chance to run, and before any library has a chance to grab its own reference to the “old” .request or .get methods, as some do.

Now our test will work, because the request will hit the local server!

However, our work is not remotely done.

What about http.get?

An optimistic reading of http.get, which is shorthand for http.request({ method: 'GET' }), is that it simply calls http.request, so our patch would cover it for free. And it does call request! Unfortunately, now we’re running into a JavaScript module scope issue.

When we assigned to http.request, we replaced a property on the object that require() returned at the top of the test file. That object is cached, so other modules in our executable that require http will see the patched property as well. But http.get doesn’t look up request through that object: inside the http module (lib/http.js in the Node.js source), get() calls the module-local request() function directly, and our assignment can’t reach that binding.
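The effect can be reproduced without the http module at all. The following is a minimal sketch, assuming a made-up stand-in object called fakeHttp whose get calls request through its own scope, the same way the real module does:

```javascript
// A stand-in for a module such as http: `get` calls `request` through the
// module's own scope, not through the exported object.
const fakeHttp = (() => {
  const request = (url) => `original request to ${url}`;
  const get = (url) => request(url); // lexical reference, not exports.request
  return { request, get };
})();

// Patch the exported property, as in the naive approach.
const oldRequest = fakeHttp.request;
fakeHttp.request = (url) => `patched: ${oldRequest(url)}`;

const viaRequest = fakeHttp.request('http://example.com');
const viaGet = fakeHttp.get('http://example.com');

console.log(viaRequest); // goes through the patch
console.log(viaGet); // bypasses the patch
```

Only the call made through the patched property picks up the wrapper; the call made through get still reaches the original function.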

That means that the monkey patch we did for the http.request value won’t work! So now we need to add another test case and monkey patch for http.get:

test('example http.get', (done) => {
  // http.get calls .end() on the request automatically.
  http.get('http://example.com', (res) => {
    expect(res.statusCode).toBe(418);
    done();
  });
});

… but wait, that has a completely different call signature? Sure, we can apply the same procedure that we did for http.request, but how many of these are we going to have to implement?

Different call signatures

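It turns out that the http library accepts all of the following calls, which boil down to three distinct signatures (the last two differ only in what the options object contains):

http.request({...}, cb)
http.request('url', cb)
http.request('url', { method: 'GET' }, cb)
http.request('url', { }, cb)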

That means that we need to normalize these down to something a little more generic so we can consistently apply our updating logic to each one.
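One way to picture that normalization is a small helper that reduces every call shape to the same record. This is only a sketch: normalizeArgs and the { url, options, callback } shape are hypothetical names chosen for illustration, not part of the http API.

```javascript
// Hypothetical helper: reduce any supported call shape to one
// { url, options, callback } record.
const normalizeArgs = (...args) => {
  let url;
  let options = {};
  let callback;

  // A leading string (or URL object) is the target URL.
  if (typeof args[0] === 'string' || args[0] instanceof URL) {
    url = new URL(args[0]);
    args = args.slice(1);
  }
  // An object in the next position is the options bag; copy it so that the
  // caller's object is left alone.
  if (typeof args[0] === 'object' && args[0] !== null) {
    options = { ...args[0] };
    args = args.slice(1);
  }
  // Whatever function remains is the response callback.
  if (typeof args[0] === 'function') {
    callback = args[0];
  }
  return { url, options, callback };
};

const cb = () => {};
const fromOptions = normalizeArgs({ hostname: 'example.com' }, cb);
const fromUrl = normalizeArgs('http://example.com/tea', cb);
const fromBoth = normalizeArgs('http://example.com/tea', { method: 'GET' }, cb);
```

With every call reduced to one shape, the patching logic only has to be written once.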

Respecting the HTTP "options"

You’ll also note that on several of the calls, there’s no options parameter - the values are implied via the string URL that is supplied. But on the ones that do have an options parameter, we have to make sure that when we do make changes, we don’t accidentally modify the object that’s supplied. The caller is not expecting the object to be changed, after all, and to violate that convention would be rude (and possibly introduce bugs!).

So for the HTTP options object, we follow one simple rule: don’t modify the original. That means we need to make a safe copy (which can be hard, depending on what the caller supplies!), preserving the parameters and only creating anew the ones that we need for our specific monkey patch.
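As a rough illustration of that rule, the sketch below copies the top level of an options object and gives the copy its own headers object before changing anything. The header name x-example-trace-id and its value are placeholders invented for this example.

```javascript
const callerOptions = {
  hostname: 'example.com',
  headers: { accept: 'text/plain' },
};

// Shallow copy of the top level, plus a fresh copy of the nested `headers`
// object, since that is the only nested value this patch writes to.
const patchedOptions = {
  ...callerOptions,
  headers: { ...callerOptions.headers, 'x-example-trace-id': 'abc123' },
};
patchedOptions.hostname = 'localhost';

console.log(callerOptions.hostname); // still the caller's value
console.log(Object.keys(callerOptions.headers)); // no trace header added
```

Other nested values, such as an agent, are deliberately shared by reference here: they are not touched by the patch, and copying them could change how the request behaves.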

What about results and errors?

A big part of adding tracing and instrumentation to http is capturing the results of operations. It’s not enough to wedge a new hostname or port onto the request object; we also have to capture the outcome of the operation. What was the status code? Did the request succeed? Did it fail due to DNS or network issues, or for a protocol reason?

Here’s an example of how to capture all of the various conditions:

const wrapRequest = (oldFunction) => (...args) => {
  let span = createSpan(args);

  return oldFunction
    .apply(null, args)
    // Capture the "normal" response code.
    .on('response', (res) => {
      enrichSpan(span, res);
    })
    // Capture failures such as DNS, connection, or protocol errors.
    .once('error', (error) => {
      // closeSpan returns undefined; clearing the span here keeps the 'close'
      // event that follows from closing it a second time.
      span = closeSpan(span, error);
    })
    // 'close' fires once the request is finished, with or without an error.
    .once('close', () => {
      // A no-op if the 'error' handler above already closed the span.
      span = closeSpan(span);
    });
};
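The reason the handlers reassign span to the return value of closeSpan is easier to see in isolation. Below is a minimal sketch that uses a plain EventEmitter as a stand-in for a failing request; fakeRequest, the closed array, and this simplified closeSpan are assumptions made for the example.

```javascript
const { EventEmitter } = require('events');

const closed = [];

// Simplified closeSpan: record the span once, then return undefined so that
// the caller's reference is cleared.
const closeSpan = (span, error) => {
  if (!span) {
    return undefined;
  }
  closed.push({ ...span, error: error && error.message });
  return undefined;
};

// Stand-in for a request that fails: 'error' is emitted, then 'close'.
const fakeRequest = new EventEmitter();
let span = { url: 'http://example.com/' };

fakeRequest
  .once('error', (error) => {
    span = closeSpan(span, error);
  })
  .once('close', () => {
    span = closeSpan(span);
  });

fakeRequest.emit('error', new Error('getaddrinfo ENOTFOUND'));
fakeRequest.emit('close');

console.log(closed.length); // the span was recorded a single time
```

Because the 'error' handler clears span, the 'close' handler that runs afterwards finds nothing left to close, and the error recorded first is the one that is kept.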

When do you run this code?

Run your monkey patches as early as possible, so that there are no potential race conditions between the events that you want to capture and your patch. But keep in mind that this is a global operation: it will impact every usage of that module in the system, barring shenanigans with the module cache.

You’re not just racing your code that makes the calls, but you’re racing any other imports that may grab references to the .request or .get methods themselves. Any reference to those old methods won’t be patched by your code.
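The stale-reference problem can be shown with the real http module and no network traffic at all. This is a small sketch; the variable names are illustrative, and the "patch" here is just a pass-through wrapper.

```javascript
const http = require('http');

// A module that destructures at import time keeps the original function.
const { request: grabbedEarly } = http;

// Apply a pass-through patch, as a later-loaded instrumentation module might.
const originalRequest = http.request;
http.request = (...args) => originalRequest(...args);

// A lookup made after the patch sees the wrapper.
const grabbedLate = http.request;

console.log(grabbedEarly === originalRequest); // true: bypasses the patch
console.log(grabbedLate === originalRequest); // false: goes through the patch

// Restore the original so that the example leaves no trace behind.
http.request = originalRequest;
```

Any library that captured its reference before the patch ran behaves like grabbedEarly, which is why the patch has to be loaded first.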

The whole bit, all together now

In the spirit of showing you how it all works, here’s the code we use as part of tracing integrations through our system, here at Stateful.

const Http = require('http');
const Https = require('https');

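const cloneHttpOptionsObjects = ['headers'];

// The identifiers below are used further down but are not defined in this listing;
// the values here are placeholders so that the code runs on its own.
const traceIdHeader = 'x-example-trace-id'; // placeholder header name
const traceId = 'example-trace-id'; // placeholder; supply the active trace id
const spans = []; // closed spans are collected here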

const cloneHttpOptions = (options) => {
  const result = {};

  // Make a simple copy of all of the entries
  Object.keys(options).forEach((opt) => (result[opt] = options[opt]));

  // Duplicate the existing entries that are objects that we know about and touch, to
  // avoid contamination back to the caller.
  cloneHttpOptionsObjects.forEach(
    (opt) => (result[opt] = result[opt]
      ? JSON.parse(JSON.stringify(result[opt]))
      : result[opt])
  );

  return result;
};

// Convert an error into a standard object.
const errorToObj = (error) => ({
  code: 500,
  status: 500,
  statusCode: 500,
  message: error.message,
  properties: {
    errorMessage: error.message,
    errorType: error.name,
    stackTrace: error.stack.split('\n'),
  },
});

// Monkey patch both http and https, modifying both the `get` and `request` methods
// in each to add instrumentation and tracking to each outbound request.
[
  [Http, 'http:'],
  [Https, 'https:'],
].forEach((entry) => {
  const [h, hstr] = entry;

  // Return a standardized options object that always looks the same.
  const normalizeOptions = (args) => {
    let options;

    if (typeof args[0] === 'string' || args[0] instanceof URL) {
      if (typeof args[1] === 'object') {
        options = cloneHttpOptions(args[1]);
      } else {
        options = {};
      }
      const url = new URL(args[0]);
      options.hostname = url.hostname;
      options.port = url.port;
      // Keep the query string so that the span records the full request path.
      options.path = url.pathname + url.search;
    } else if (typeof args[0] === 'object') {
      options = cloneHttpOptions(args[0]);
    } else {
      return {};
    }
    options.protocol = options.protocol || hstr;
    options.method = (options.method || 'get').toUpperCase();

    return options;
  };

  // Create a new OpenTelemetry span to track this action.
  const createSpan = (args) => {
    const options = normalizeOptions(args);
    const { protocol, host, hostname, port, path, method } = options;
    return {
      startTime: Date.now(),
      url: `${protocol || hstr}//${host || hostname}${port ? `:${port}` : ''}${path || '/'}`,
      method,
    };
  };

  // Figure out which of the various supported calling conventions are at play, clone
  // (or create) the options object, and return a new args array.
  const addTraceToArgs = (args) => {
    let options;

    // There are three different call signatures to deal with here for http.get and
    // http.request. In each case the caller's options object is cloned, never mutated.
    if (typeof args[0] === 'object' && !(args[0] instanceof URL)) {
      //   1. http.get({ ...options... }, (response) => {});
      options = cloneHttpOptions(args[0]);
      args = [options, ...args.slice(1)];
    } else if (typeof args[1] === 'object') {
      //   2. http.get('http://stateful.com', { ...options... }, (response) => {});
      options = cloneHttpOptions(args[1]);
      args = [args[0], options, ...args.slice(2)];
    } else {
      //   3. http.get('http://stateful.com', (response) => {});
      options = {};
      args = [args[0], options, ...args.slice(1)];
    }

    // Add the traceIdHeader and the traceId itself
    options.headers = options.headers || {};
    if (traceId) {
      options.headers[traceIdHeader] = traceId;
    }

    return args;
  };

  // Add the result of the operation to the OpenTelemetry span
  const enrichSpan = (span, res) => {
    if (!res) {
      return;
    }
    span.statusCode = res.statusCode;
  };

  // Note that the span has been closed, add a normalized error object if any, and
  // return undefined to prevent closeSpan from being called again.
  const closeSpan = (span, error) => {
    if (!span) {
      return undefined;
    }
    span.endTime = Date.now();
    span.error = error && errorToObj(error);
    spans.push(span);
    return undefined;
  };

  // Perform the actual wrap operation, adding instrumentation and tracing, and then
  // calling the previous function to perform the actual work.
  const wrapRequest = (oldFunction) => (...args) => {
    args = addTraceToArgs(args);
    let span = createSpan(args);

    return oldFunction
      .apply(null, args)
      .on('response', (res) => {
        enrichSpan(span, res);
      })
      .once('error', (error) => {
        span = closeSpan(span, error);
      })
      .once('close', (res) => {
        span = closeSpan(span);
      });
  };

  // Wrap both 'request' and 'get', which is a specialization of 'request'.
  h.request = wrapRequest(h.request);
  h.get = wrapRequest(h.get);
});

To wrap up…

Hopefully, you’ll find the above code and implementation details helpful! Don’t hesitate to reach out on Discord if you have any questions.


Let us know what you think. Bye for now! 👋