Sunday 9 June 2013

Two-Dimensional Knapsacks .. or heatmaps

I've had an interest in heatmaps type of displays for a long time...
anything that "looks like magic, and is cool" is of interest.

Heatmaps are a nice way to visual complex data sets, and are
used, for example, as a neat way to visualise disk usage.

I kind of put it on a back burner...I have no explicit need for it,
other than a desire to implement one.

Brendan Gregg recently posted an article on graphical heatmaps for
dtrace output...which reminded me that I was curious how easy/difficult
it is to implement them. [http://externaltable.blogspot.co.uk/2013/05/latency-heat-map-in-sqlplus-with.html]

Of course, something so simple is not so simple to implement. Whilst
"thinking" about heatmaps, I thought I would have a go - something
simple, but a "trial". Its not obvious how to implement them and
some research, e.g. on Wikipedia [http://en.wikipedia.org/wiki/Heat_map]
explain what they are and some digging leads to either some
complex (mathematically) articles, or snippets of implementation.

A long time ago, I implemented a knap-sack algorithm, used to
pack Z80 assembler code into overlayed ROMs. I didnt know what I was
doing but the results were pretty good - no longer needing
to manually lay out the code, and although the algorithm wasnt optimal,
it was sufficient. (Given a set of object files, figure out how
to map them to the ROMs, to minimise the few ROMs needed). Something
along the lines of start with the biggest and iteratively add
new object files to the ROM until its full.

The algorithm I wrote worked well, because there was enough slack
to not need perfect packing (which becomes algorithmically more complex,
e.g. trying all permutations).

A typical heatmap is a 2D visualisation, e.g. a tiled surface area
representing the size of files against the total size. The algorithm
to do this starts off quite straightforward:

Take the size of a file. Divide by the total size of all files.
Take the sqrt() of the size to create a rectangle of this size.
One can add the rectangle to the display surface, top-to-bottom,
left-to-right, but doing this would lead to large empty areas (gaps).

The result - although interesting - isnt a heatmap per se. Its a graphical
representation of the file sizes. To avoid gaps requires that, for
each square being layed out, to find a gap where the rectangle
can be placed - effectively an O(n^2) operation, but also a 2D
version of the knapsack.

I started experimenting with adding this to CRiSP's "du" macro, but
I am not sure I will like the results. The macro language isnt
fast enough to do millions of operations, and even if it is viable,
the DBOX_GENERIC object - used to allow custom macro painting, doesnt
model the kind of thing I want. (DBOX_GENERIC is really just a blank
X window, so painting and repainting is expensive). What is really needed
is a canvas/pixel arrangement so that CRiSP can render the visualisation
without requiring recomputation of the heatmap.

With todays machines being fast and having lots of memory, this is more
like the Wayland approach to graphics - dealing with bitmap buffers,
rather than drawing primitives.

I need to implement the code in CRiSP as a "first pass" to look at the
efficiency, and decide how to allow CRiSP to do this at several orders
of magnitude greater efficiency, inside itself.



Post created by CRiSP v11.0.16a-b6560


No comments:

Post a Comment