Sparse Virtual File System 0.4.1
A Sparse Virtual File System.
Sparse Virtual File System Documentation

Introduction

Sometimes you don't need the whole file. Sometimes you don't want the whole file, especially if it is huge and on some remote server. But you might know which parts of the file you want, and svfsc can help you store them locally so it looks as if you have access to the complete file but with just the pieces of interest.

svfsc is targeted at reading very large binary files such as TIFF, RP66V1 and HDF5, where the structure is well known. For example, you might want to parse a TIFF file for its metadata or for a particular image tile or strip, which is a tiny fraction of the file itself.

svfsc implements a Sparse Virtual File, a specialised in-memory cache where a particular file might not be available but parts of it can be obtained without reading the whole file. A Sparse Virtual File (SVFS::SparseVirtualFile) is represented internally as a map of blocks of data with the key being their file offsets. Any write to an SVFS::SparseVirtualFile will coalesce these blocks where possible.
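As a rough illustration of "a map of blocks keyed by file offset" with coalescing writes, here is a toy sketch. It is not the svfsc implementation: the class name ToySparseFile is hypothetical, and where old and new bytes overlap the new bytes simply win, whereas svfsc may treat overlapping writes differently. Blocks that overlap or touch are merged, so the map always holds disjoint, non-adjacent blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Toy sparse file: disjoint blocks of bytes keyed by their file offset.
// Hypothetical illustration only, not the svfsc implementation.
class ToySparseFile {
public:
    // Write data at fpos, coalescing with every block it overlaps or touches.
    // Where old and new bytes overlap, the new bytes win in this sketch.
    void write(std::size_t fpos, const std::string &data) {
        if (data.empty()) {
            return;
        }
        std::size_t start = fpos;
        std::size_t end = fpos + data.size();  // exclusive
        std::string merged = data;             // bytes for [start, end)
        // Only the last block starting at or before fpos can reach fpos.
        auto it = blocks_.upper_bound(fpos);
        if (it != blocks_.begin()) {
            --it;
        }
        while (it != blocks_.end() && it->first <= end) {
            std::size_t b_start = it->first;
            std::size_t b_end = b_start + it->second.size();
            if (b_end < start) {  // ends before us without touching
                ++it;
                continue;
            }
            // Build the union: old block first, then the merged bytes on top.
            std::size_t u_start = std::min(start, b_start);
            std::size_t u_end = std::max(end, b_end);
            std::string u(u_end - u_start, '\0');
            u.replace(b_start - u_start, it->second.size(), it->second);
            u.replace(start - u_start, merged.size(), merged);
            start = u_start;
            end = u_end;
            merged = std::move(u);
            it = blocks_.erase(it);
        }
        blocks_[start] = std::move(merged);
    }

    const std::map<std::size_t, std::string> &blocks() const { return blocks_; }

private:
    std::map<std::size_t, std::string> blocks_;
};
```

Writing "ABCDEF" at 14 and then "GH" at 20 leaves a single block "ABCDEFGH" at offset 14, because the second write starts exactly where the first one ends.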

A Sparse Virtual File System (SVFS::SparseVirtualFileSystem) is an extension of this to provide a key/value store where the key is a file ID and the value a SVFS::SparseVirtualFile.
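At the file-system level the structure is essentially an associative container from file ID to sparse file. A minimal sketch, assuming a plain std::map of blocks as a stand-in for SVFS::SparseVirtualFile; the class ToySparseFileSystem and its method names are hypothetical and are not the svfsc API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>

// Stand-in for a sparse file: blocks keyed by file offset (hypothetical).
using ToyBlocks = std::map<std::size_t, std::string>;

// Toy key/value store: file ID -> sparse file. Illustrative only.
class ToySparseFileSystem {
public:
    // Create an empty entry for id; returns false if it already exists.
    bool insert(const std::string &id) {
        return files_.emplace(id, ToyBlocks{}).second;
    }
    bool has(const std::string &id) const { return files_.count(id) != 0; }
    // Access the sparse file for id; throws std::out_of_range if absent.
    ToyBlocks &at(const std::string &id) { return files_.at(id); }
    // Remove the entry for id; returns false if it was not present.
    bool remove(const std::string &id) { return files_.erase(id) != 0; }
    std::size_t size() const { return files_.size(); }

private:
    std::unordered_map<std::string, ToyBlocks> files_;
};
```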

svfsc is written in C++. It is thread safe.

Usage

A SVFS::SparseVirtualFile might be used like this:

  • The user requests some data (for example TIFF metadata) from a remote file using a Parser that knows the TIFF structure.
  • The Parser consults the SVFS::SparseVirtualFile. If the SVFS::SparseVirtualFile has the data, the Parser parses it and gives the results to the user.
  • If the SVFS::SparseVirtualFile does not have the data, the Parser asks the SVFS::SparseVirtualFile what data is needed, then issues the appropriate GET request(s) to the remote server.
  • That data is used to update the SVFS::SparseVirtualFile; the Parser can then use it and give the results to the user.
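The steps above can be sketched in a self-contained way. This is a minimal sketch under stated assumptions: ToyCache is a hypothetical stand-in for SVFS::SparseVirtualFile (it stores whole fetched ranges and does no coalescing), and Fetch is a hypothetical callback standing in for the GET request. The Parser step reads the 8-byte TIFF header: byte order `II` (little-endian) or `MM` (big-endian), the magic number 42, then the offset of the first IFD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical fetch callback: returns len bytes at fpos (e.g. an HTTP GET).
using Fetch = std::function<std::string(std::size_t, std::size_t)>;

// Toy cache standing in for the SVF: stores fetched ranges keyed by offset.
struct ToyCache {
    std::map<std::size_t, std::string> blocks;
    int fetches = 0;  // counts remote requests, to show cache hits

    // Return [fpos, fpos + len), fetching only if no single block covers it.
    std::string get(std::size_t fpos, std::size_t len, const Fetch &fetch) {
        auto it = blocks.upper_bound(fpos);
        if (it != blocks.begin()) {
            --it;
            if (it->first + it->second.size() >= fpos + len) {
                return it->second.substr(fpos - it->first, len);  // cache hit
            }
        }
        ++fetches;
        std::string data = fetch(fpos, len);
        blocks[fpos] = data;  // update the cache, then use the data
        return data;
    }
};

// Parser step: read the TIFF header via the cache, return the first IFD offset.
std::uint32_t first_ifd_offset(ToyCache &cache, const Fetch &fetch) {
    std::string h = cache.get(0, 8, fetch);
    if (h.size() < 8) {
        throw std::runtime_error("short read");
    }
    auto b = [&h](std::size_t i) {
        return static_cast<std::uint32_t>(static_cast<unsigned char>(h[i]));
    };
    bool little = (h[0] == 'I' && h[1] == 'I');
    if (!little && !(h[0] == 'M' && h[1] == 'M')) {
        throw std::runtime_error("not a TIFF file");
    }
    std::uint32_t magic = little ? (b(2) | b(3) << 8) : (b(2) << 8 | b(3));
    if (magic != 42) {
        throw std::runtime_error("bad TIFF magic number");
    }
    return little ? (b(4) | b(5) << 8 | b(6) << 16 | b(7) << 24)
                  : (b(4) << 24 | b(5) << 16 | b(6) << 8 | b(7));
}
```

Calling first_ifd_offset twice issues only one fetch: the second call is served from the cache.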

Here is a conceptual example of a SVFS::SparseVirtualFile running on a local file system containing data from a single file.

        CLIENT SIDE                 |      LOCAL FILE SYSTEM
                                    .
/------\      /--------\            |            /-------------\
| User | <--> | Parser | <-- read(fpos, len) --> | File System |
\------/      \--------/            |            \-------------/
                  |                 .
                  |                 |
              /-------\             .
              |  SVF  |             |
              \-------/             .
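In the local case the read(fpos, len) arrow is an ordinary positioned read. A minimal sketch using std::ifstream; read_local is a hypothetical helper name, and the bytes it returns are what would then be written into the SVF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical fetch for the local case: read len bytes at fpos from path.
// Returns fewer bytes if the file ends first.
std::string read_local(const std::string &path, std::size_t fpos, std::size_t len) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    in.seekg(static_cast<std::streamoff>(fpos));
    std::string data(len, '\0');
    in.read(&data[0], static_cast<std::streamsize>(len));
    data.resize(static_cast<std::size_t>(in.gcount()));  // keep only bytes read
    return data;
}
```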

Here is a conceptual example of a SVFS::SparseVirtualFile running with a remote file system.

        CLIENT SIDE                 |        SERVER SIDE
                                    .
/------\      /--------\            |           /--------\
| User | <--> | Parser | <-- GET(fpos, len) --> | Server |
\------/      \--------/            |           \--------/
                  |                 .               |
                  |                 |               |
              /-------\             .        /-------------\
              |  SVF  |             |        | File System |
              \-------/             .        \-------------/
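The GET(fpos, len) arrow does not prescribe a protocol. If the server speaks HTTP, one option (an assumption, not something svfsc requires) is to turn the (file_position, read_length) pairs into an HTTP Range header. HTTP byte ranges are inclusive, so the last byte is file_position + read_length - 1. make_range_header is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: convert (file_position, read_length) pairs into one
// HTTP Range header value such as "bytes=8-13,20-31".
std::string make_range_header(
        const std::vector<std::pair<std::size_t, std::size_t>> &need) {
    std::string header = "bytes=";
    bool first = true;
    for (const auto &val : need) {
        if (val.second == 0) {
            continue;  // an empty range cannot be expressed in HTTP
        }
        if (!first) {
            header += ",";
        }
        // HTTP ranges are inclusive of the last byte.
        header += std::to_string(val.first) + "-" +
                  std::to_string(val.first + val.second - 1);
        first = false;
    }
    return header;
}
```

Putting several ranges in one header saves round trips, but a server may answer with a multipart/byteranges body or ignore the Range header entirely, so issuing one GET per pair is often the simpler choice.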

Example C++ Usage

svfsc is written in C++, so it can be used directly:

#include <iostream>

#include "svf.h"

int main() {
    // Using an arbitrary modification time of 0.0
    SVFS::SparseVirtualFile svf("Some file ID", 0.0);
    // Write six chars at file position 14, covering bytes [14, 20)
    svf.write(14, "ABCDEF", 6);
    // Read two bytes from file position 16 into a buffer: 'C', 'D'
    char read_buffer[2];
    svf.read(16, 2, read_buffer);
    // What do I have to do to read 24 bytes from file position 8?
    // This returns a std::vector<std::pair<size_t, size_t>>
    // as ((file_position, read_length), ...)
    auto need = svf.need(8, 24);
    // Bytes [8, 14) and [20, 32) are missing, so this prints ((8, 6),(20, 12),)
    std::cout << "(";
    for (auto &val: need) {
        std::cout << "(" << val.first << ", " << val.second << "),";
    }
    std::cout << ")" << std::endl;
    return 0;
}
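To show where the (file_position, read_length) pairs come from, here is a toy gap computation over a map of blocks. For a file holding only bytes 14 to 19, a request for 24 bytes from position 8 needs (8, 6) and (20, 12). The function toy_need and the alias SeekReads are hypothetical; svfsc's own need() may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using SeekReads = std::vector<std::pair<std::size_t, std::size_t>>;

// Toy need(): given disjoint blocks keyed by file offset, return the gaps in
// [fpos, fpos + len) as (file_position, read_length) pairs.
SeekReads toy_need(const std::map<std::size_t, std::string> &blocks,
                   std::size_t fpos, std::size_t len) {
    SeekReads result;
    std::size_t pos = fpos;              // first byte not yet accounted for
    const std::size_t end = fpos + len;  // exclusive
    auto it = blocks.upper_bound(fpos);
    if (it != blocks.begin()) {
        --it;  // this block may cover fpos
    }
    for (; it != blocks.end() && pos < end; ++it) {
        std::size_t b_start = it->first;
        std::size_t b_end = b_start + it->second.size();
        if (b_end <= pos) {
            continue;  // block lies entirely before pos
        }
        if (b_start >= end) {
            break;  // block lies entirely after the request
        }
        if (b_start > pos) {
            result.emplace_back(pos, b_start - pos);  // gap before this block
        }
        pos = b_end;
    }
    if (pos < end) {
        result.emplace_back(pos, end - pos);  // trailing gap
    }
    return result;
}
```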

The basic operation is to check whether the SVFS::SparseVirtualFile has the data; if not, obtain the missing pieces and write them to the SVFS::SparseVirtualFile. Then read directly from it:

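#include <cstddef>
#include <string>

#include "svf.h"

// read_through and fetch are illustrative names, not part of svfsc.
// fetch(fpos, len) stands in for however the data is obtained,
// for example a GET request to a remote file; it returns len bytes.
template <typename Fetch>
void read_through(SVFS::SparseVirtualFile &svf, size_t file_position,
                  size_t length, char *buffer, Fetch fetch) {
    if (!svf.has_data(file_position, length)) {
        // Iterate through the minimal block set to read.
        for (auto &val: svf.need(file_position, length)) {
            std::string data = fetch(val.first, val.second);
            svf.write(val.first, data.data(), val.second);
        }
    }
    // Now read directly into buffer, which must hold at least length bytes.
    svf.read(file_position, length, buffer);
}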