Saturday 8 November 2014

CRiSP/crtags Optimisation

The tagging facility in CRiSP has always worked reasonably well - it
is based on a series of custom parsers for each programming language.
Over the years, the list of languages supported has grown and is
now a huge repository of nearly every common language and file
format out there. (See: "crtags -help" for details).

The initial implementation is getting on for nearly 20y old.
The goal originally was to provide the "ctags" facility of vi,
but better. Machines of the day were looking like 4-16MB of RAM rather
than 4-16GB of RAM which is common today, so effort was made to
optimise space usage. The crtags file format is a series of sections
of files and items which are tagged. It has been optimised - right
from the beginning, to avoid optimise space usage, and avoid bad
paging behavior. (Now, a distant artifact ! Systems rarely page or
are memory constrained). This attempt to optimise memory goes back to the
1MB machines that CRiSP was originally built on. These optimisations
are no longer necessary - but removing them would only offer a small
change in performance.

crtags is designed to work with reasonably sized projects and directories.
It takes a few seconds to scan the nearly 8000 files in the CRiSP source
tree, and I regularly use it on the Linux kernel. The 3.16.1 kernel
has 47426 files in it. Scanning that takes a little while. I have benchmarked
this over the years.

Recently I did some more work to look at the performance and optimisation
facilities in crtags. I collected a series of Linux kernels - so
I would have a good/large test case - about 500MB of source files,
67356 files in all. On my i7 laptop (crtags is single threaded), it was
taking about 2m20s to scan the files. On investigating the performance,
I could see that we had an O(n^2) algorithm on filename matching. This
is silly, given the complexity of the language parsers - that the mere
filenames were using a lot of the CPU.

I modified the code to put in a hash table for filename matching,
and this gave a huge win - down to about 23s for scanning the same files -
about a 7x improvement in speed.

In looking at crtags, most of the processing is a constant per file -
and each file is handled, one after the other. This opens up a huge
win by multithreading the code. Potentially an Nx speedup, on an N cpu system.
The code is difficult to convert to multithreading - it would require
a lot of edits and refactoring, to ensure each thread is
avoiding global state - a common reason why converting a non-threaded
application to multithreaded is so difficult.

Its depressing how little attention the C standards bodies and
compiler writers have, for converting non-threaded code to threaded.
Really, there are a set of transformations (refactoring) and one would
think that tools could help identify the major areas at issue
(use of global variables, and use of "static" variables). I
may create a refactoring macro in CRiSP to handle this.

On Unix, using a fork/join model of operation, one can create
the equivalent of a multithreaded code, by use of fork() and wait()
system calls. For example, divide all the files to be processed
up, into separate groups, processed by individual CPUs. Then the
issue of global state and locking disappears - at the expensive of
more work on the "join" or merge at the end of processing.

I have modified crtags to use this fork/join model (it is a command
line option, and not enabled by default), and reran my test. The above
test went from 23s to 4s by using "crtags -j 8" - using the 4 real and
4 hyperthread cpu's on my i7. About a 6x performance increase.
(The final code will be slower due to the lack of a merge).

So we went from 2m20s to 4s - a 35x speed up, with just a handful of lines
of code.

The depressing thing about Windows (and this is late 2014) is that
it still does not support fork(). It does support threads. So the above
code will have no effect on Windows, and real threading will have to
be implemented or an alternate way of achieving the same result.

I do fail to understand why Windows cant implement fork() - from
the user space point of view, there is almost nothing to implement.
From the kernel point of view, its not a huge amount of code.
Granted, Windows processes may carry more state and forking may
be more expensive if Windows could do it, but that would be such
a big benefit when writing portable code or porting code to Windows.
Oh well. (Cygwin supports fork(), and it is hugely expensive
in operation, since it relies on software to copy huge blocks
of memory around, rather than relying on the CPU's MMU to do
copy-on-write (COW) operation - the key to why fork() is so efficient
on Unixes).

Having said that, fork() on Linux is not brillliantly fast. Despite
processors being so much faster than years of old - fork() seems
to be getting slower, possibly related to the need for all
CPUs to synchronise MMU and other state, rather than fork() itself
getting more complicated.

To end users of CRiSP, they may see the initial performance optimisation,
but unless they are working on extremely huge projects, they may
hardly ever notice the change put in place. Also note, that
in CRiSP, one rarely tags an entire project - CRiSP does incremental
updates to the tag database, as you are editing/saving files.

Post created by CRiSP v12.0.3a-b6801


No comments:

Post a Comment