Mojolicious and HTTP streaming

By Abhijit Menon-Sen <ams@toroid.org>

2011-07-15

In response to a number of recent requests for examples of how to implement "tail -f" functionality in a Mojolicious application, this post dissects a log-file viewer I wrote. It's not difficult, but it's not obvious how to put the many moving parts together.

The nicest way to do this would be to stream lines from the logfile to the client over a websocket, or as DOM updates using server-sent events. Sadly, widespread and usable browser support for either method is still months away, and in their absence we must pick from a collection of unsatisfying hacks to implement "server push".

I wanted something easy to implement that would work on Firefox, Chrome, and IE8. I also wanted to minimise wastage of resources, more because of my parsimonious nature than based on any measured need (so forcing the page to reload every n seconds was right out). The Wikipedia entry for Comet describes the available options. It's not easy to find complete, non-trivial examples of their use, but I eventually settled on the "forever iframe" as the best fit for my needs. (But my code could be adapted to fit long-polling, another common strategy.)

The idea is to serve a page with a hidden iframe pointing to a URL that keeps a connection open forever and sends HTTP-chunked <script> tags to the client to display each new line in the log. The logfile doesn't need to keep being reopened and re-read, and multiple clients reading the logfile concurrently are supported cleanly (each one will have a dedicated connection, and will never need to tell the server which line in the file they last saw).

I decided to serve the container page from /logs, with the iframe pointing to /logs/tail. I set up the routes as follows:

$r->waypoint('/logs')->to('logs#index')
    ->route('/tail')->to('logs#tail');

The container

logs#index just serves the following content from a static template.

% layout 'default', title => "Log file";

<ul id=lines style="list-style-type: none">
</ul>

<%= javascript '/jquery.js' %>

<%= javascript begin %>
function r() {
    $('#lines').find('li').remove();
}

function p(s) {
    $('<li>').html(s).appendTo('#lines');
}

$(document).ready(function () {
    $('<iframe>')
        .attr('src', document.location + '/tail')
        .css({display: 'none'})
        .appendTo('body');
});
<% end %>

This is just one of many ways to write the container. You don't need to display lines in an <ul>, or to use jQuery. The important bits are to provide a function to print one line, to remove all lines, and to create the iframe (we do that using Javascript so that the iframe is loaded only after jQuery is available).

The twisted tail

logs#tail is where all the real work happens. The code below may look complicated, but all it's really doing is converting the logfile to a stream of p() calls.

sub tail {
    my $self = shift;

    # We'll open the log file and keep the filehandle.
    my $log;
    my $file = "/path/to/file.log";
    unless (open($log, $file)) {
        $self->render_not_found;
        return;
    }

    # We won't be generating a response right away, and we set a large
    # timeout so that we can take our own sweet time about it.

    $self->render_later;
    Mojo::IOLoop->singleton->connection_timeout(
        $self->tx->connection => 30
    );

    # We can send the beginning of the response right away: a bit of
    # Javascript to let us use p() to print lines and r() to remove
    # them.

    $self->res->code(200);
    $self->res->headers->content_type("text/html");
    $self->write_chunk(<<"    PROLOGUE");
@{[(" ")x200]}
<script type="text/javascript">
function r(){parent.r()}
function p(s){parent.p(s)}
</script>
    PROLOGUE

    # Next, we send the last 10KB of data from the logfile (so that
    # the client sees some data immediately, rather than after 2s).

    my $ino = (stat $file)[1];

    my $size = -s $file;
    if ($size > 10*1024 && seek $log, -10*1024, 2) {
        # Discard one (probably) partial line
        my $dummy = <$log>;
    }
    while (defined(my $l = <$log>)) {
        chomp($l);
        html_escape($l);
        $self->write_chunk(
            qq{<script type="text/javascript">}.
            qq{;p("$l");</script>\n}
        );
    }
    seek $log, 0, 1;

    # Now we set up a recurring timer to check for new lines from the
    # logfile and send them to the client, plus a utility function to
    # close the connection if anything goes wrong.

    my $id;
    my $close = sub {
        Mojo::IOLoop->drop($id);
        $self->finish;
        close $log;
        return;
    };

    $id = Mojo::IOLoop->recurring(2 => sub {
        my @st = stat $file;

        # Zero tolerance for any shenanigans with the logfile, such as
        # truncation, rotation, etc.

        unless (@st && $st[1] == $ino &&
                $st[3] > 0 && $st[7] >= $size)
        {
            $self->write_chunk(
                qq{<script type="text/javascript">}.
                qq{p("<b>Logfile ended. Please reload.</b>");}.
                qq{</script>\n}
            );
            return $close->();
        }

        # If there's new data, read it all and send it out. Then
        # seek to the current position to reset EOF.

        if ($size < $st[7]) {
            $size = $st[7];
            while (defined(my $l = <$log>)) {
                chomp($l);
                html_escape($l);
                $self->write_chunk(
                    qq{<script type="text/javascript">}.
                    qq{p("$l");</script>\n}
                );
            }
            seek $log, 0, 1;
        }
    });

    # If the client closes the connection, we can stop monitoring the
    # logfile.

    $self->on_finish(sub { Mojo::IOLoop->drop($id) });
}

The complications

Like every other Comet technique, the hidden iframe has some drawbacks.

First, some browsers start interpreting the streamed script tags only after a certain amount of data is received, and this data must add to the DOM (a long comment at the beginning of the response won't do). I settled on sending a bunch of <span> elements containing just &nbsp; in the PROLOGUE above.

Next, an extended silence may cause the browser to close the connection, so we must send something even if we don't have any log data to send. A comment works fine for this purpose (see below).

Finally, the infinite stream of new elements will cause browsers to leak memory. I close the connection after a fixed time and force the client to reload the log. (But rather than reloading the whole page, I reload only /logs/tail, as shown below.)

    $id = Mojo::IOLoop->recurring(2 => sub {
        …

        if ($size < $st[7]) {
            …
        }
        else {
            $self->write_chunk(qq{<!-- Nothing happens. -->\n});
        }

        # Close the connection after a while.

        if ($calls > 900) {
            $self->write_chunk(
                qq{<script type="text/javascript">r();}.
                qq{window.location.reload(true);</script>\n}
            );
            return $close->();
        }

        $calls++;
    });

The spinning hourglass

One regrettable side-effect of this method is that the browser will (correctly) never consider the page loaded, and keep the hourglass spinning. There are ghastly browser-specific hacks to avoid this, but I decided that I didn't care enough to fix it.

Update (2011-07-26): I forgot to mention that my Mojolicious app runs behind nginx, and I needed to turn "proxy_buffering off;" to make the above work.