Improving Storage Size and Random Access Time in k²-raster Compact Data Structure for Hyperspectral Scenes
Tuesday, September 22, 2020 |
2:25 PM - 2:50 PM |
Speaker
Attendee8
Universitat Autònoma de Barcelona
Improving Storage Size and Random Access Time in k²-raster Compact Data Structure for Hyperspectral Scenes
Abstract Submission
Compact data structures are a type of losslessly compressed data structures that provide efficient storage and real-time processing. The data can be loaded into memory and accessed directly using the rank and select functions available from these structures. They facilitate better transmission through communication channels with limited bandwidth and provide faster data access. Queries to their data elements can be done without any need for full decompression.
Hyperspectral data are scenes taken from instruments on board an aircraft such as AVIRIS (Airborne Visible/Infrared Imaging Spectrometer), or on a satellite in space such as Hyperion and IASI (Infrared Atmospheric Sounding Interferometer). These scenes are made up of multiple bands from across the electromagnetic spectrum. Data extracted from these bands are useful in many diverse applications, for example, oil field search, mineral search, weather prediction, and wildfire soil studies, to name just a few. Hyperspectral scenes are usually compressed to reduce their large sizes in order to facilitate their transmission.
In our research, we have been using a compact data structure called k²-raster, proposed by Ladra et al. [1], to compress hyperspectral data. The k²-raster was developed from k²-tree, another compact data structure used for web graphs and social networks. The difference between the two is that k²-raster is built from a matrix with integers while k²-tree is from a bit matrix. In our previous papers [2,3], we presented some of the experimental results that show that k²-raster can help reduce the size of hyperspectral data to as much as 52%. Another major advantage of this structure is its ability to randomly access data without full decompression, saving access time. DACs used in the original paper of k²-raster [1] is the only integer encoder that has been examined to allow the structure to do that. But as explained below, we show that other integer encoders also have this random-access capability with competitive results and therefore can be used as a substitute for DACs.
In this paper, we describe some of the improvements we made on two important aspects of k²-raster: data storage and access speed. First for storage, before the k²-raster tree is built, the original matrix may need to be extended in size so that it can be partitioned into equal-sized subquadrants. Doing so will facilitate the search for child nodes. However, this also creates a lot of empty nodes filled with zeros and the k²-raster results in a larger, sometimes bloated, storage size. To resolve this, we examined a method where the original matrix size is used to build the k²-raster tree without any expansion. The paper describes the steps to do so and shows some experimental results.
Another improvement is to find an alternative to the DACs integer encoder [4] which is used together with k²-raster to provide fast random access. We have been studying the possibility of using word-aligned integer encoders such as Simple9 [5], Simple16 [6] and PForDelta [7] to attain a better storage size and random-access time. The results are reported and compared to those of DACs.
Bibliography:
[1] Ladra, S., Paramá, J. R., & Silva-Coira, F. (2017). Scalable and queryable compressed storage structure for raster data. Information Systems, 72, 179-204.
[2] Chow, K., Tzamarias, D. E. O., Blanes, I., & Serra-Sagristà, J. (2019). Using Predictive and Differential Methods with K²-Raster Compact Data Structure for Hyperspectral Image Lossless Compression. Remote Sensing, 11(21), 2461.
[3] Chow, K., Tzamarias, D. E. O., Hernández-Cabronero, M., Blanes, I., & Serra-Sagristà, J. (2020). Analysis of Variable-Length Codes for Integer Encoding in Hyperspectral Data Compression with the k²-Raster Compact Data Structure. Remote Sensing, 12(12), 1983.
[4] Brisaboa, N. R., Ladra, S., & Navarro, G. (2013). DACs: Bringing direct access to variable-length codes. Information Processing & Management, 49(1), 392-404.
[5] Anh, V. N., & Moffat, A. (2005). Inverted index compression using word-aligned binary codes. Information Retrieval, 8(1), 151-166.
[6] Zhang, J., Long, X., & Suel, T. (2008, April). Performance of compressed inverted list caching in search engines. In Proceedings of the 17th international conference on World Wide Web (pp. 387-396).
[7] Zukowski, M., Heman, S., Nes, N., & Boncz, P. (2006, April). Super-scalar RAM-CPU cache compression. In 22nd International Conference on Data Engineering (ICDE'06) (pp. 59-59). IEEE.
Hyperspectral data are scenes taken from instruments on board an aircraft such as AVIRIS (Airborne Visible/Infrared Imaging Spectrometer), or on a satellite in space such as Hyperion and IASI (Infrared Atmospheric Sounding Interferometer). These scenes are made up of multiple bands from across the electromagnetic spectrum. Data extracted from these bands are useful in many diverse applications, for example, oil field search, mineral search, weather prediction, and wildfire soil studies, to name just a few. Hyperspectral scenes are usually compressed to reduce their large sizes in order to facilitate their transmission.
In our research, we have been using a compact data structure called k²-raster, proposed by Ladra et al. [1], to compress hyperspectral data. The k²-raster was developed from k²-tree, another compact data structure used for web graphs and social networks. The difference between the two is that k²-raster is built from a matrix with integers while k²-tree is from a bit matrix. In our previous papers [2,3], we presented some of the experimental results that show that k²-raster can help reduce the size of hyperspectral data to as much as 52%. Another major advantage of this structure is its ability to randomly access data without full decompression, saving access time. DACs used in the original paper of k²-raster [1] is the only integer encoder that has been examined to allow the structure to do that. But as explained below, we show that other integer encoders also have this random-access capability with competitive results and therefore can be used as a substitute for DACs.
In this paper, we describe some of the improvements we made on two important aspects of k²-raster: data storage and access speed. First for storage, before the k²-raster tree is built, the original matrix may need to be extended in size so that it can be partitioned into equal-sized subquadrants. Doing so will facilitate the search for child nodes. However, this also creates a lot of empty nodes filled with zeros and the k²-raster results in a larger, sometimes bloated, storage size. To resolve this, we examined a method where the original matrix size is used to build the k²-raster tree without any expansion. The paper describes the steps to do so and shows some experimental results.
Another improvement is to find an alternative to the DACs integer encoder [4] which is used together with k²-raster to provide fast random access. We have been studying the possibility of using word-aligned integer encoders such as Simple9 [5], Simple16 [6] and PForDelta [7] to attain a better storage size and random-access time. The results are reported and compared to those of DACs.
Bibliography:
[1] Ladra, S., Paramá, J. R., & Silva-Coira, F. (2017). Scalable and queryable compressed storage structure for raster data. Information Systems, 72, 179-204.
[2] Chow, K., Tzamarias, D. E. O., Blanes, I., & Serra-Sagristà, J. (2019). Using Predictive and Differential Methods with K²-Raster Compact Data Structure for Hyperspectral Image Lossless Compression. Remote Sensing, 11(21), 2461.
[3] Chow, K., Tzamarias, D. E. O., Hernández-Cabronero, M., Blanes, I., & Serra-Sagristà, J. (2020). Analysis of Variable-Length Codes for Integer Encoding in Hyperspectral Data Compression with the k²-Raster Compact Data Structure. Remote Sensing, 12(12), 1983.
[4] Brisaboa, N. R., Ladra, S., & Navarro, G. (2013). DACs: Bringing direct access to variable-length codes. Information Processing & Management, 49(1), 392-404.
[5] Anh, V. N., & Moffat, A. (2005). Inverted index compression using word-aligned binary codes. Information Retrieval, 8(1), 151-166.
[6] Zhang, J., Long, X., & Suel, T. (2008, April). Performance of compressed inverted list caching in search engines. In Proceedings of the 17th international conference on World Wide Web (pp. 387-396).
[7] Zukowski, M., Heman, S., Nes, N., & Boncz, P. (2006, April). Super-scalar RAM-CPU cache compression. In 22nd International Conference on Data Engineering (ICDE'06) (pp. 59-59). IEEE.