Joshua Morey | 2012-10-25

Using LASzip from C#

Compiling LASzip is simple, but what does performance look like when using LASzip in a managed environment? The first thing to realize is that accessing points individually is very expensive across a managed boundary. Making a P/Invoke-style call for every point adds a substantial amount of overhead in a C# context. To reduce the number of interop thunks that need to occur, the most important step is to write an intermediate class in native C++ that retrieves individual points and returns them in blocks, transforming calls from this:

bool LASunzipper::read(unsigned char * const * point);

...to something like this:

int LAZBlockReader::Read(unsigned char* buffer, int offset, int count);
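
As a rough sketch, the native side of such a block reader might look like the following. The class layout, member names, and buffer wiring here are hypothetical; only LASunzipper::read() comes from the LASzip API, and the unzipper and item pointers are assumed to have been set up elsewhere, as in the libLAS sample code.

#include <cstring>

#include "lasunzipper.hpp"

// Hypothetical native block reader; only the Read() signature matters here.
class LAZBlockReader
{
public:
    // Decompresses whole points into 'buffer' starting at 'offset' and
    // returns the number of bytes written (count is assumed to be a
    // multiple of the point record length).
    int Read(unsigned char* buffer, int offset, int count)
    {
        unsigned char* dst = buffer + offset;
        int bytesRead = 0;

        while (bytesRead + m_pointSize <= count)
        {
            // m_lzPoint is the array of item pointers that
            // LASunzipper::read() expects; each entry points into the
            // contiguous record at m_pointData.
            if (!m_unzipper->read(m_lzPoint))
                break;

            std::memcpy(dst, m_pointData, m_pointSize);
            dst += m_pointSize;
            bytesRead += m_pointSize;
        }

        return bytesRead;
    }

private:
    LASunzipper*    m_unzipper;   // opened elsewhere against a FILE*/stream
    unsigned char** m_lzPoint;    // item pointers into m_pointData
    unsigned char*  m_pointData;  // one contiguous point record
    int             m_pointSize;  // point record length in bytes
};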

The next, less important, performance consideration is to create a C++/CLI interop layer to interface with the block reader/writer. This allows us to hide details like marshaling and pinning, and to use C++ interop, which performs better than explicit P/Invoke.
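
A minimal sketch of that layer, assuming the hypothetical LAZBlockReader above, just pins the managed buffer and passes it straight through to the native reader; construction, disposal, and error handling are omitted.

// Hypothetical C++/CLI wrapper around the native reader above; the class
// and member names are illustrative, not part of LASzip.
public ref class LAZBlockReaderWrapper
{
public:
    // Reads a block of raw point bytes into a managed buffer.
    int Read(array<System::Byte>^ buffer, int offset, int count)
    {
        if (buffer == nullptr || buffer->Length == 0)
            return 0;

        // Pin the managed array so the native reader can write into it
        // directly; the pin is released when 'pinned' goes out of scope.
        pin_ptr<System::Byte> pinned = &buffer[0];
        return m_reader->Read(pinned, offset, count);
    }

private:
    LAZBlockReader* m_reader; // construction/disposal omitted for brevity
};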

For my situation, this is exactly what I want, since CloudAE is built around chunk processing anyway. For other situations, both the block transformation and the interop layer can be an annoying sort of overhead, so the approach should definitely be benchmarked to determine whether the reduction in thunks is worth that cost.

The final factor determining the performance of LASzip is file I/O. In LAStools, Martin Isenburg uses a default io_buffer_size parameter that is currently 64KB. Using a similarly appropriate buffer size is the easiest way to get reasonable performance. Choosing an ideal buffer size is a complex topic with no single answer, but anything from 64KB to 1MB is generally acceptable. For those not familiar with the LASzip API, LASunzipper can use either a FILE handle or an iostream instance, and either type can be given a custom buffer size.

One caveat that I mentioned in my last post is that when compiling a C++/CLI project in VS 2010, the behavior of customizing iostream buffer sizes is buggy. As a result, I ended up using a FILE handle and setvbuf(). The downside of this approach is that LAZ support in my application cannot currently use all my optimized I/O options, such as using FILE_FLAG_NO_BUFFERING when appropriate.
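
For reference, the FILE handle approach with setvbuf() amounts to something like the sketch below. The open() overload and header names are taken from the LASzip sources; the helper itself is hypothetical, and it assumes the LASzip object has already been initialized from the file's laszip VLR and that the caller positions the stream at the compressed point data before reading points.

#include <cstdio>

#include "laszip.hpp"
#include "lasunzipper.hpp"

// Hypothetical helper: opens an LAZ file with an explicit 64KB stdio buffer.
// The caller owns the FILE and closes it after calling unzipper.close().
bool OpenCompressed(const char* path, const LASzip& laszip, LASunzipper& unzipper)
{
    const size_t bufferSize = 64 * 1024; // matches the LAStools default

    FILE* file = std::fopen(path, "rb");
    if (!file)
        return false;

    // Replace the default stdio buffer before any reads occur.
    std::setvbuf(file, NULL, _IOFBF, bufferSize);

    return unzipper.open(file, &laszip);
}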

For an example of using the LASzip API from C++, check out the libLAS source.

Joshua Morey | 2012-09-19

LAZ Support

I have now added LASzip support to CloudAE. LASzip is a compression library developed by Martin Isenburg [1] for compressing LAS points into an LAZ stream. Using the LASzip library, an LAZ file can be decompressed transparently, as if it were an LAS source. This differs from the approach taken by LizardTech for the MG4 release of LiDAR Compressor, which does not necessarily maintain conformance to the LAS point types. Due to its compression efficiency and compatibility, the LAZ format has become popular for storing archive tiles in open data services such as OpenTopography and NLSF.

I link to the LASzip library in much the same fashion as libLAS, while providing a C++/CLI wrapper for operating on blocks of bytes. As a result, I am able to pretend that the LAZ file is actually an LAS file at the byte level rather than the point level, which allows me to support the format easily within my Source/Segment/Composite/Enumerator framework. I merely needed to add a simple LAZ Source and StreamReader, and the magic doth happen. There is minimal overhead with this approach, since the single extra memcpy per point is negligible compared to decompression time.

LAZ writer support is similarly straightforward, but I am sticking with LAS output for now, until I have more time to determine performance impacts.

  [1] Thanks to Martin for his suggestions regarding implementation performance. It turns out there is a bug in ifstream/streambuf when compiling with CLR support; I had to extract the stream operations into a fully native class in order to achieve the desired performance.