Joshua Morey | 2014-05-15

Spatialization Development

When I started building the CloudView LAS viewer, I took the opportunity to update my CloudAE spatialization algorithm to optimize for fast 3D display without slowing down the processing pipeline. The SATR2 algorithm is now live, so I decided to write up some of the milestones in the development of the CloudAE spatialization. My first algorithm was mostly a proof-of-concept, similar to something like Isenburg's lastile, but reorganizing the tiles into a single file rather than splitting them into separate files. At present, my spatialization is extremely fast and built around a robust indexing framework.

The next set of improvements will involve better dynamic code generation for filtering and converting points. I am also considering further changes related to 3D visualization, such as additional sorting in the SATR2 algorithm, or reworking the spatialization to handle general point clouds instead of focusing solely on airborne LiDAR.

This list also leaves out the numerous unrelated improvements that were developed in parallel with these iterations, such as LAS 1.4 output support, composite readers, unbuffered I/O, and fast allocation. Figuring out which development phase each individual feature belongs to would require a tedious inspection of my source control history, and that's just not worth it.

SOOT

Compute a tile grid based on the assumption that the points are evenly distributed across the extent. Traverse the file to get the tile counts. Allocate a number of fixed-size buffers and traverse the file again to do the tiling. For each point whose tile doesn't have an active buffer, acquire a free buffer and associate it with that tile. If the new point fills the buffer, flush it to the correct location in the output file. If there are no free buffers, flush one (last used or most full). When the traversal is complete, flush the remaining buffers (a minimal sketch of this buffering scheme follows the list below). As I said, this first attempt is similar to tiling to separate files, except that the separate-file approach does not require the initial tile counts because it pushes the buffering problem off to the OS and filesystem, resulting in file fragmentation and varying performance.

  • Simple algorithm.
  • No temporary files.
  • Random write performance is dreadful (especially on magnetic media).
  • Sparse points will result in more points per tile than intended.
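
A minimal sketch of the SOOT buffering scheme (the names and buffer management here are illustrative, not the CloudAE implementation; the tile offsets come from the first-pass counts):

using System.Collections.Generic;
using System.IO;

// Illustrative sketch of the SOOT fixed-buffer tiling pass (not the CloudAE API).
class SootTiler
{
    const int BufferSizeBytes = 1 << 20; // fixed-size tile buffer
    const int MaxActiveBuffers = 64;     // size of the buffer pool

    // tileOffsets[t] is the byte offset of tile t in the output file, computed from the first-pass counts.
    public static void Tile(IEnumerable<(int tileIndex, byte[] point)> points, long[] tileOffsets, Stream output)
    {
        var active = new Dictionary<int, MemoryStream>(); // tileIndex -> active buffer
        var written = new long[tileOffsets.Length];       // bytes already flushed per tile

        foreach (var (tileIndex, point) in points)
        {
            if (!active.TryGetValue(tileIndex, out var buffer))
            {
                // No active buffer for this tile: free up a slot if necessary, then acquire one.
                if (active.Count >= MaxActiveBuffers)
                    FlushFullest(active, tileOffsets, written, output);

                buffer = new MemoryStream(BufferSizeBytes);
                active.Add(tileIndex, buffer);
            }

            buffer.Write(point, 0, point.Length);
            if (buffer.Length >= BufferSizeBytes)
                Flush(tileIndex, active, tileOffsets, written, output);
        }

        // Flush whatever remains when the traversal is complete.
        foreach (var tileIndex in new List<int>(active.Keys))
            Flush(tileIndex, active, tileOffsets, written, output);
    }

    static void FlushFullest(Dictionary<int, MemoryStream> active, long[] offsets, long[] written, Stream output)
    {
        int fullest = -1;
        foreach (var pair in active)
            if (fullest < 0 || pair.Value.Length > active[fullest].Length)
                fullest = pair.Key;

        Flush(fullest, active, offsets, written, output);
    }

    static void Flush(int tileIndex, Dictionary<int, MemoryStream> active, long[] offsets, long[] written, Stream output)
    {
        var buffer = active[tileIndex];

        // Random write: seek to this tile's reserved region in the single output file.
        output.Seek(offsets[tileIndex] + written[tileIndex], SeekOrigin.Begin);
        buffer.WriteTo(output);

        written[tileIndex] += buffer.Length;
        active.Remove(tileIndex);
    }
}

This also makes it clear why the random write behavior hurts so much: every flush seeks to a different region of the output file.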

STAR

Traverse the file to get estimated tile counts. Use estimated counts to create new tile boundaries (taking into account empty space). Break point data into logical segments which can each fit in a large buffer. Tile each segment individually, but use the same tile boundaries for each (so each segment fills a subset of the counts). For each segment, read the points into memory, count and sort them, and write out the tiled segment. Merge the tiled segments together.

  • Faster than SOOT.
  • Resulting points per tile are more accurate on sparse points.
  • Requires simultaneous open read handles on all intermediate files.
  • Uses 1x the output size for temporary files.
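
The boundary calculation can be sketched roughly as follows (a 1D illustration with made-up names; the real calculation works on the estimated count grid in both axes): accumulate the estimated counts and cut a new boundary whenever the running total reaches the target points per tile, which naturally skips empty space.

using System.Collections.Generic;

// Rough 1D sketch of deriving tile boundaries from estimated counts (illustrative only).
static List<double> ComputeTileBoundaries(long[] estimatedColumnCounts, double minX, double columnWidth, long targetPointsPerTile)
{
    var boundaries = new List<double> { minX };
    long running = 0;

    for (int i = 0; i < estimatedColumnCounts.Length; i++)
    {
        running += estimatedColumnCounts[i];
        if (running >= targetPointsPerTile)
        {
            // Cut a boundary at the end of this column; empty columns never trigger a cut.
            boundaries.Add(minX + (i + 1) * columnWidth);
            running = 0;
        }
    }

    double maxX = minX + estimatedColumnCounts.Length * columnWidth;
    if (boundaries[boundaries.Count - 1] < maxX)
        boundaries.Add(maxX);

    return boundaries;
}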

STAR2

During the final merge operation, calculate how much data could be loaded from each intermediate file to fill the large buffer and read much larger chunks before flushing the large buffer to the output.

  • Much faster merge than STAR due to larger sequential operations.
  • Does not require intermediate read handles to stay open.
  • Temporary file usage is the same.
  • More complex interleaving logic.
  • Increased memory usage during the merge, but since it reuses the large buffer, there is no overall increase.
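
The fill calculation at the heart of the interleaving can be sketched like this (illustrative names; the real merge operates on the segmented point buffers rather than raw byte counts):

// Sketch of the STAR2 merge interleaving (illustrative names, not the CloudAE API).
// segmentTileSizes[s][t] is the number of bytes that segment s holds for tile t.
// Returns the exclusive end tile index for the next merge pass starting at firstTile,
// chosen so that the combined data for tiles [firstTile, end) fits in the large buffer.
static int NextMergeWindow(long[][] segmentTileSizes, int firstTile, long largeBufferSize)
{
    int tileCount = segmentTileSizes[0].Length;
    long used = 0;
    int tile = firstTile;

    while (tile < tileCount)
    {
        long tileTotal = 0;
        foreach (var segment in segmentTileSizes)
            tileTotal += segment[tile]; // every segment contributes a slice of this tile

        if (used + tileTotal > largeBufferSize && tile > firstTile)
            break; // buffer is full; merge tiles [firstTile, tile) in this pass

        used += tileTotal;
        ++tile;
    }

    return tile;
}

Each pass then issues one large sequential read per intermediate file for its share of those tiles, interleaves the results into tile order in the large buffer, and flushes the buffer to the output.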

SATR

Traverse the file to create a high-resolution index. Create tile boundaries from the index, but skip populating them with actual counts. Create sparse logical segments from the index data so that they can efficiently read the tiles in the order that they will be written. For each segment, read the sparse file data into the large buffer in an optimized sequence, filtering out the extra points that do not belong in the associated tiles, and sort the remaining points into tile order. Append the buffer directly to the output file.

  • Faster than STAR2 due to halving the data written to disk.
  • No temporary files.
  • Depending on the point order and the buffer size, the sparse read multiplier might be 1.1-1.8x; since the reads are not sequential, the cost ends up similar to the previous 2x sequential read from STAR2.
  • The spatial indexes can be saved either for regenerating the spatialized file without the initial pass, or for direct use instead of spatializing (when disk space is a concern).
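
The segment construction from the index can be sketched roughly as follows (illustrative names; the real index also records where each tile's points live in the unsorted source file, which is what makes the sparse reads possible):

using System.Collections.Generic;

// Sketch: group output tiles into logical segments that fit the large buffer (illustrative names).
// tileSizeEstimates[t] is the estimated byte count for output tile t, taken from the high-res index.
static List<(int firstTile, int endTile)> BuildSegments(long[] tileSizeEstimates, long largeBufferSize)
{
    var segments = new List<(int firstTile, int endTile)>();
    int first = 0;
    long used = 0;

    for (int t = 0; t < tileSizeEstimates.Length; t++)
    {
        if (used + tileSizeEstimates[t] > largeBufferSize && t > first)
        {
            segments.Add((first, t)); // this segment covers tiles [first, t)
            first = t;
            used = 0;
        }
        used += tileSizeEstimates[t];
    }

    segments.Add((first, tileSizeEstimates.Length));
    return segments;
}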

SATR2

During the segment sort operation, extract representative low-res grid points from each tile and append them to a common buffer. Medium-res points can be shifted to the beginning of each tile. After the segments have been written, append the common buffer with the low-res points to the end of the file so they can all be read at once. Retrieving tile points now requires pulling the subset of associated low-res points from the common buffer and adding them to the remaining points in the tile location. As part of the emphasis on rendering, grids now use square cells for all tile calculations.

  • Low-resolution points are available immediately in a sequential region.
  • Negligible performance difference for tile operations.
  • Reading the tiles is slightly more complicated internally, but it is hidden by the API.
  • Tile metadata is increased by the addition of a low-res count.
  • Points are "out-of-order", but since they are already in an artificial order, it hardly matters.
  • Slightly slower spatialization than SATR1 due to extra point shifting (it only becomes measurable when using low-res point criteria such as nearest to grid cell center).
  • More complex and time-consuming to debug.
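
A minimal sketch of extracting the low-res representatives with the nearest-to-cell-center criterion mentioned above (the types and names are illustrative, not the actual CloudAE structures):

using System;
using System.Collections.Generic;

// Illustrative point type; the real implementation works on quantized points in place.
struct Point3D { public double X, Y, Z; }

static class LowResExtractor
{
    // Picks one representative point per low-res cell: the point nearest the cell center.
    // tileMinX/tileMinY and cellSize define a square low-res grid over the tile.
    public static List<Point3D> Extract(IList<Point3D> tilePoints, double tileMinX, double tileMinY, double cellSize, int cellsPerSide)
    {
        var best = new int[cellsPerSide * cellsPerSide];   // index of the best point per cell
        var bestDist = new double[best.Length];
        for (int i = 0; i < best.Length; i++) { best[i] = -1; bestDist[i] = double.MaxValue; }

        for (int i = 0; i < tilePoints.Count; i++)
        {
            var p = tilePoints[i];
            int cx = Math.Min(Math.Max((int)((p.X - tileMinX) / cellSize), 0), cellsPerSide - 1);
            int cy = Math.Min(Math.Max((int)((p.Y - tileMinY) / cellSize), 0), cellsPerSide - 1);
            int cell = cy * cellsPerSide + cx;

            // Distance from the point to the center of its cell.
            double dx = p.X - (tileMinX + (cx + 0.5) * cellSize);
            double dy = p.Y - (tileMinY + (cy + 0.5) * cellSize);
            double d2 = dx * dx + dy * dy;
            if (d2 < bestDist[cell]) { bestDist[cell] = d2; best[cell] = i; }
        }

        // The selected points would be appended to the common low-res buffer;
        // the remaining points stay behind in the tile (with medium-res points shifted to the front).
        var result = new List<Point3D>();
        foreach (int i in best)
            if (i >= 0) result.Add(tilePoints[i]);
        return result;
    }
}
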
Joshua Morey | 2014-05-08

WebGL LAS Viewer

I have been on a hiatus from development recently, due to injury, but I am now starting to do some coding for a couple of hours a day to build my hand strength back up. Since I have the time, I decided to learn about WebGL and get myself up to speed with modern JavaScript libraries. When I first developed my CloudAE platform, I used WPF 3D for rendering meshes, but I found it to be extremely limited in functionality and performance. I decided to make a WebGL viewer to replace it, starting with rendering simple LAS point clouds and tacking on functionality as I have time. As I brought myself up to date, I found that developers have made an enormous number of complex and powerful WebGL demos and applications. Although it was good to know that WebGL and HTML5 would probably be able to handle my needs, it was disconcerting to realize how out-of-date my knowledge was, due to my focus on desktop/server development.

My first basic implementation was in straight-up WebGL, but I found that due to my hand limitations, I just wasn't adequately productive, so I looked into the available JavaScript 3D libraries and eventually decided on three.js. Three.js is a reasonably well-maintained library that doesn't have too many limitations for the current scope of my viewer. Eventually, I will have to break down and go back to something closer to raw WebGL, but not just yet. The main technical issues I have right now are the inability to do buffer interleaving properly and limitations on the number of scene nodes. It can also be difficult to develop against the unstable three.js API: I can often find two or three different solutions to a problem, none of which are valid in the current version, which forces me to delve into the source (the migration page is often inadequate).

I have uploaded the demo version of what I am giving the temporary (and uninteresting) name CloudView. Among other things, it is a learning ground for JavaScript frameworks, web workers, and the HTML5 File API.

Next time, I will outline the iterations of my CloudAE spatialization algorithms, including the current SATR2 implementation which is being improved for optimal use by CloudView.

Joshua Morey | 2013-03-14

SATR Development Finalized

I finally found some time to finish my SATR tiling algorithm for CloudAE. Unlike the previous STAR algorithms, which were built around segmented tiling, this approach uses more efficient spatial indexing to eliminate temporary files and to minimize both read and write operations. Another difference with this approach is that increasing the allowed memory usage for the tiling process will substantially improve the performance by lowering the duplicate read multiplier and taking advantage of more sequential reads. For instance, increasing the indexing segment size from 256 MB to 1 GB will generally reduce the tiling time by 50% on magnetic media, and 30% on an SSD.

Joshua Morey | 2012-10-29

PropertyManager

When I started programming as a lad, I initially used INI files to manage configuration settings. Lightweight and easy to parse, they were a simple way to get started, whether I was rolling my own parsing or, later, using existing parsing utilities. Soon, however, the allure of the Windows Registry drew me in, and I began using it almost exclusively for configuration settings. I found the registry convenient for most purposes, and only resorted to INI files for portable applications that I would run from a disk. This state of affairs lasted almost a decade, until I started to encounter registry permission problems on newer operating systems with improved user security controls. I finally started adopting some different configuration mechanisms.

The Application Settings mechanism is the default way to persist and manage application configuration settings in .NET. For those who prefer to adjust the behavior, it supports custom persistence implementations. This feature allows design-time development of application and user settings, and is improved by adding ConfigurationValidatorAttribute constraints.

And now, here I go, flying in the face of convention. I dislike managing settings outside the scope of the code that will use them, so I have written a PropertyManager which uses generics, reflection and delegates to provide a good alternative to the built-in Application Settings. It allows the declaration of properties at a more reasonable scope, automated discovery, and simple run-time management.

private static readonly IPropertyState<ByteSizesSmall> PROPERTY_SEGMENT_SIZE;
private static readonly IPropertyState<bool> PROPERTY_REUSE_TILING;
static ProcessingSet()
{
    PROPERTY_SEGMENT_SIZE = Context.RegisterOption(Context.OptionCategory.Tiling, "MaxSegmentSize", ByteSizesSmall.MB_256);
    PROPERTY_REUSE_TILING = Context.RegisterOption(Context.OptionCategory.Tiling, "UseCache", true);
}

Once the properties have been defined, they can be used easily through the Value property, similar to the usage of a Nullable<T>.

if (PROPERTY_REUSE_TILING.Value)
{
    // do something exciting
}

So, what makes this all happen? Other than some validation, it boils down to the call to PropertyManager.Create().

public static IPropertyState<T> RegisterOption<T>(OptionCategory category, string name, T defaultValue)
{
    if (!Enum.IsDefined(typeof(OptionCategory), category))
        throw new ArgumentException("Invalid category.");

    if (string.IsNullOrWhiteSpace(name))
        throw new ArgumentException("Option registration is empty.", "name");

    if (name.IndexOfAny(new[] { '.', '\\', ' ', ':' }) >= 0)
        throw new ArgumentException("Option registration contains invalid characters.", "name");

    string categoryName = Enum.GetName(typeof(OptionCategory), category);
    string optionName = String.Format("{0}.{1}", categoryName, name);

    IPropertyState<T> state = null;
    var propertyName = PropertyManager.CreatePropertyName(optionName);

    if (c_registeredProperties.ContainsKey(propertyName))
    {
        state = c_registeredProperties[propertyName] as IPropertyState<T>;
        if (state == null)
            throw new Exception(String.Format("Duplicate option registration with a different type for {0}.", propertyName));

        WriteLine("Duplicate option registration: {0}", propertyName);
    }
    else
    {
        state = PropertyManager.Create(propertyName, defaultValue);
        c_registeredProperties.Add(propertyName, state);
        c_registeredPropertiesList.Add(state);
    }

    return state;
}

The static property manager contains the necessary methods to create, update and retrieve properties. It wraps an IPropertyManager instance which knows the details about persistence and conversion for the storage mode that it represents. I have standard implementations for the Registry, XML, and a web service.

public interface IPropertyManager
{
    PropertyName CreatePropertyName(string name);
    PropertyName CreatePropertyName(string prefix, string name);

    IPropertyState<T> Create<T>(PropertyName name, T defaultValue);

    bool SetProperty(PropertyName name, ISerializeStateBinary value);
    bool GetProperty(PropertyName name, ISerializeStateBinary value);
    bool SetProperty(IPropertyState state);
    bool GetProperty(IPropertyState state);
}

As for data binding, just create a DataGrid with a TwoWay binding on Value, and we have ourselves a property editor.

dataGrid.ItemsSource = Context.RegisteredProperties;

The main downside with this approach to application and user settings is that configuration validators cannot be used as attributes on the Value property of the IPropertyState<T>. The workaround for this is validation delegates which work just as well, but are not quite as nice visually.
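
For example, a property can be constrained at registration time with a delegate, something like the following (the SetValidation member is a hypothetical name for illustration; only RegisterOption appears in the code above):

// Hypothetical validation delegate; the actual hook in PropertyManager may be named differently.
PROPERTY_SEGMENT_SIZE.SetValidation(value => Enum.IsDefined(typeof(ByteSizesSmall), value));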

Joshua Morey | 2012-10-29

MatSu Point MacKenzie in LAZ

The Matanuska-Susitna Borough's 2011 LiDAR & Imagery Project has been around for a while now, but I recently discovered that the Point MacKenzie data on the public mirror has been made available as compressed LAZ. The LASzip format is not ideal for all purposes, but it is always a major improvement when data servers provide data in LAZ format because of the massive reduction in bandwidth and download times.

Even when CloudAE did not support LAZ directly, it was always well worth it to download the compressed data and convert it to LAS. Now that CloudAE supports LAZ, everything is much simpler.

Joshua Morey | 2012-10-26

Point Enumeration in CloudAE

In my last post, I referenced the block-based nature of point cloud handling in CloudAE. The following example shows the basic format for enumerating over point clouds using the framework. At this level, the point source could be a text file, an LAS or LAZ file, or a composite of many individual files of the supported types. The enumeration hides all such details from the consumer.

using (var process = progressManager.StartProcess("ChunkProcess"))
{
    foreach (var chunk in source.GetBlockEnumerator(process))
    {
        byte* pb = chunk.PointDataPtr;
        while (pb < chunk.PointDataEndPtr)
        {
            SQuantizedPoint3D* p = (SQuantizedPoint3D*)pb;

            // evaluate point

            pb += chunk.PointSizeBytes;
        }
    }
}

This can be simplified even more by factoring the chunk handling into IChunkProcess instances, which can encapsulate analysis, conversion, or filtering operations.

var chunkProcesses = new ChunkProcessSet(
    quantizationConverter,
    tileFilter,
    segmentBuffer
);

using (var process = progressManager.StartProcess("ChunkProcessSet"))
{
    foreach (var chunk in source.GetBlockEnumerator(process))
        chunkProcesses.Process(chunk);
}

The chunk enumerators handle progress reporting and checking for cancellation messages. In addition, they hide any source implementation details, transparently reading from whatever IStreamReader is implemented for the underlying sequential sources.
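
As an illustration, a trivial IChunkProcess implementation might look like the following (the interface and chunk type names are assumptions based on the usage above):

// Hypothetical IChunkProcess implementation; IChunkProcess and IPointDataChunk are
// assumed names based on the usage shown above.
public unsafe class PointCountProcess : IChunkProcess
{
    public long Count { get; private set; }

    public void Process(IPointDataChunk chunk)
    {
        byte* pb = chunk.PointDataPtr;
        while (pb < chunk.PointDataEndPtr)
        {
            ++Count;                    // evaluate/accumulate per point here
            pb += chunk.PointSizeBytes;
        }
    }
}

An instance like this can be passed to a ChunkProcessSet alongside the other processes, and it will see every chunk exactly once per enumeration.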

Joshua Morey | 2012-10-25

Using LASzip from C#

Compiling LASzip is simple, but what does performance look like when using LASzip in a managed environment? The first thing to realize is that accessing points individually is very expensive across a managed boundary. That means that using an equivalent of P/Invoke individually for each point will add a substantial amount of overhead in a C# context. To reduce the number of interop thunks which need to occur, the most important step is to write an intermediate class in native C++ which can retrieve the individual points and return them in blocks, transforming calls from this:

bool LASunzipper::read(unsigned char * const * point);

...to something like this:

int LAZBlockReader::Read(unsigned char* buffer, int offset, int count);

The next, less important, performance consideration is to create a C++/CLI interop layer to interface with the block reader/writer. This allows us to hide details like marshaling and pinning, and it uses C++ interop, which provides better performance than P/Invoke.

For my situation, this is exactly what I want, since CloudAE is built around chunk processing anyway. For other situations, both the "block" transformation and the interop layer can be an annoying sort of overhead, so it should definitely be benchmarked to determine whether the thunk reduction cost is worth it.
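
On the managed side, consuming the block reader looks something like this (LAZBlockReader here stands in for the hypothetical C++/CLI wrapper; only the native Read signature above comes from the actual code):

// Counting points through a hypothetical block-reading wrapper: one interop
// transition per block instead of one per point.
static long CountPoints(string path, int pointSizeBytes)
{
    byte[] buffer = new byte[pointSizeBytes * 65536];
    long count = 0;

    using (var reader = new LAZBlockReader(path))
    {
        int bytesRead;
        while ((bytesRead = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // The buffer now holds decompressed LAS point records.
            count += bytesRead / pointSizeBytes;
        }
    }

    return count;
}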

The final factor determining the performance of LASzip is the file I/O. In LAStools, Martin Isenburg uses a default io_buffer_size parameter that is currently 64KB. Using a similarly appropriate buffer size is the easiest way to get reasonable performance. Choosing an ideal buffer size is a complex topic that has no single answer, but anything from 64KB to 1MB is generally acceptable. For those not familiar with the LASzip API, LASunzipper can use either a FILE handle or an iostream instance, and either of these types can use a custom buffer size.

One caveat that I mentioned in my last post is that when compiling a C++/CLI project in VS 2010, the behavior of customizing iostream buffer sizes is buggy. As a result, I ended up using a FILE handle and setvbuf(). The downside of this approach is that LAZ support in my application cannot currently use all my optimized I/O options, such as using FILE_FLAG_NO_BUFFERING when appropriate.

For an example of using the LASzip API from C++, check out the libLAS source.

Joshua Morey | 2012-09-19

LAZ Support

I have now added LASzip support to CloudAE. LASzip is a compression library that was developed by Martin Isenburg [1] for compressing LAS points into an LAZ stream. Using the LASzip library, an LAZ file can be decompressed transparently as if it was an LAS source. This differs from the approach taken by LizardTech for the MG4 release of LiDAR Compressor, which does not necessarily maintain conformance to the LAS point types. Due to the compression efficiency and compatibility of the LAZ format, it has become popular for storing archive tiles in open data services such as OpenTopography and NLSF.

I link to the LASzip library in a similar fashion as libLAS, while providing a C++/CLI wrapper for operating on blocks of bytes. As a result, I am able to pretend that the LAZ file is actually an LAS file at the byte level rather than the point level. This allows me to support the format easily within my Source/Segment/Composite/Enumerator framework. I merely needed to add a simple LAZ Source and StreamReader, and the magic doth happen. There is minimal overhead with this approach, since the single extra memcpy for each point is not much compared to decompression time.

LAZ writer support is similarly straightforward, but I am sticking with LAS output for now, until I have more time to determine performance impacts.

  1. Thanks to Martin for his suggestions regarding implementation performance. It turns out there is a bug in the ifstream/streambuf when compiling with CLR support. I had to extract the stream operations into a fully native class in order to achieve the desired performance.

Joshua Morey | 2012-06-26

The Cost of Double.TryParse

After extensive testing, I have optimized the text to binary conversion for ASCII XYZ files. These files are a simple delimited format with XYZ values along with any number of additional attributes. My original naive approach used StreamReader.ReadLine() and Double.TryParse(). I quickly discovered that when converting billions of points from XYZ to binary, the parse time was far slower than the output time, making it the key bottleneck for the process.

Although the TryParse methods are convenient for normal use, they are far too slow for my purposes, since the points are in a simple floating point representation. I implemented a reader to optimize the parse for that case, ignoring culture rules, scientific notation, and many other cases that are normally handled within the TryParse. In addition, the parse performance of atof()-style operations varies considerably between implementations. The best performance I could come up with was a simple variation of a common idea with the addition of a lookup table. The main cost of the parse is still the conditional branches.

In the end, I used custom parsing to identify lines directly in bytes without the overhead of memory allocation/copying and converting to unicode strings. From there, I parse out the digits using the method I described. I also made a variation that used only incremented pointers instead of array indices, but in the current .NET version, the performance was practically identical, so I reverted to the index version for ease of debugging.

The following test code provides reasonable performance for parsing three double-precision values per line.

bool ParseXYZ(byte* p, int start, int end, double* xyz)
{
    for (int i = 0; i < 3; i++)
    {
        long digits = 0;

        // find start
        while (start < end && (p[start] < '0' || p[start] > '9'))
            ++start;

        // accumulate digits (before decimal separator)
        int currentstart = start;
        while (start < end && (p[start] >= '0' && p[start] <= '9'))
        {
            digits = 10 * digits + (p[start] - '0');
            ++start;
        }

        // check for decimal separator
        if (start > currentstart && start < end && p[start] == '.')
        {
            int decimalPos = start;
            ++start;

            // accumulate digits (after decimal separator)
            while (start < end && (p[start] >= '0' && p[start] <= '9'))
            {
                digits = 10 * digits + (p[start] - '0');
                ++start;
            }

            xyz[i] = digits * c_reciprocal[start - decimalPos - 1];
        }
        else
        {
            xyz[i] = digits;
        }

        if (start == currentstart || digits < 0)
            return false; // no digits or too many (overflow)
    }

    return true;
}
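
The c_reciprocal table is not shown above; based on how it is indexed, it is just a precomputed table of negative powers of ten, along these lines (the size and construction here are illustrative):

// c_reciprocal[n] == 10^-n, so a value with n digits after the decimal separator
// is recovered as (all accumulated digits) * c_reciprocal[n].
static readonly double[] c_reciprocal = BuildReciprocalTable(19);

static double[] BuildReciprocalTable(int size)
{
    var table = new double[size];
    double value = 1.0;
    for (int i = 0; i < size; i++)
    {
        table[i] = value;
        value /= 10.0;
    }
    return table;
}
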
Joshua Morey | 2012-02-27

Simulating points in WPF 3D

I have implemented a simple 3D point cloud visualization control intended to test whether WPF 3D can be reasonably used as a point cloud viewer. Up to this point, I have been rendering clouds using TIN geometry because WPF only supports meshes.

Using indexed meshes, I generated geometry that could represent points in 3D space. At first, I tried to make the points visible from all angles, but that required too much memory from the 3D viewer even for simple clouds. I reduced the memory footprint by only generating geometry for the top face of the pseudo-points, which allowed small clouds to be rendered. For large clouds, I had to thin down to fewer than one million points before generating geometry. Even then, I was forced to make the point geometry larger than desired so that it would not disappear when rendered.

The conclusion from this test is that WPF 3D is totally unsuitable for any type of large-scale 3D point rendering. Eventually, I will need to move to OpenGL, Direct3D, or some API such as OSG or XNA.

Joshua Morey | 2011-12-24

Edge Detection

After completing the segmentation, I went ahead and implemented a fast edge detection filter.

Joshua Morey | 2011-12-23

Segmentation

Since the framework performance has reached the goals that I set for myself, I have paused work on tiling optimizations and moved on to algorithm development. The first step was to build a raster format data source. I decided to go with a non-interleaved tiled binary format, which fits well with my intended usage. In the future, I will make a more robust determination of my needs with regard to raster support.

One of the first high-performance algorithms that I developed in years past was a basic region-grow. I have now implemented my tile-based segmentation algorithm (region-growing) for 2D analysis. This allows me to perform segmentation on an interpolated cloud or raster, using the tiles to allow a low memory footprint.
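
The core of the region-grow is simple; a minimal in-memory 2D sketch looks like this (the real implementation streams tiles to keep the memory footprint low, and the actual growth criteria are more involved than a single tolerance):

using System;
using System.Collections.Generic;

static class RegionGrow
{
    // Labels 4-connected regions of cells whose values stay within 'tolerance' of their neighbors.
    // Returns a label raster (0 = unlabeled); 'noData' cells are skipped.
    public static int[,] Segment(float[,] raster, float tolerance, float noData)
    {
        int rows = raster.GetLength(0), cols = raster.GetLength(1);
        var labels = new int[rows, cols];
        int nextLabel = 0;
        var queue = new Queue<(int r, int c)>();

        for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
        {
            if (labels[r, c] != 0 || raster[r, c] == noData)
                continue;

            int label = ++nextLabel;
            labels[r, c] = label;
            queue.Enqueue((r, c));

            // Grow the region outward from the seed cell.
            while (queue.Count > 0)
            {
                var (cr, cc) = queue.Dequeue();
                foreach (var (nr, nc) in new[] { (cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1) })
                {
                    if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                    if (labels[nr, nc] != 0 || raster[nr, nc] == noData) continue;
                    if (Math.Abs(raster[nr, nc] - raster[cr, cc]) > tolerance) continue;

                    labels[nr, nc] = label;
                    queue.Enqueue((nr, nc));
                }
            }
        }

        return labels;
    }
}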

Soon, I will need to make a vector output source, possibly SHP. That will give me better geometry to visualize than the textured meshes I am currently using in the 3D viewer.

Joshua Morey | 2011-12-23

Tile Stitching

I have completed a basic tile stitching algorithm for use in completing the 3D mesh view. At this point, it only stitches equal resolutions, so it will stitch low-res tiles together and high-res tiles together, but it will leave a visible seam between tiles of different resolutions. Obviously, that is not a difficult problem, since I am only stitching regular meshes at this point. However, I do plan to get irregular mesh support eventually. Currently, I have an incremental Delaunay implementation, but the C# version is much slower than the original C++, to the extent that it is not worth running. The S-hull implementation that I downloaded is much faster, but does not produce correct results.

The primary limitation of the tile-based meshing and stitching is that WPF 3D has performance limitations regarding the number of MeshGeometry3D instances in the scene. For small to medium-sized files, there may be up to a few thousand tiles with the current tile-sizing algorithms. WPF 3D performance degrades substantially when the number of mesh instances gets to be approximately 12,000. Massive files may have far more tiles than that, and adding mesh instances for the stitching components makes the problem even worse. I have some ideas for workarounds, but I have not yet decided if they are worth implementing since there are already so many limitations with WPF 3D and this 3D preview was never intended to be a real viewer.

Joshua Morey | 2011-12-23

CloudAE Preview Announcement

CloudAE development has reached a preview stage, so I am beginning this blog to share my development efforts. I solve a number of problems as I develop applications, and I no longer have the energy to share solutions on forums and message boards. Perhaps I will find the motivation to provide details here as I run benchmarks and resolve issues that interest the community. We shall see.