Performance
###########

This describes some of the measured performance of ``SVFS``.
Platform is a 2018 Mac Book Pro, 2.7 GHz Intel Core i7, 16 GB RAM.
Memory read/write performance on this machine is typically 4,000 to 5,000 MB/s.

C++ Performance
===============

Read
----

This test simulates reading from a one Megabyte of data arranged in a sparse form with different block sizes from 1 byte to 512 bytes.
For the one byte case there are 1,000,000 blocks each of 1 byte, for the 512 byte case there are 2,048 blocks each of 512 bytes.
In the extreme right the data is coalesced into a single one Megabyte block.

The y axis shows the time to read all blocks.

.. image:: ../../plots/images/cpp_1mb_read.png

The one byte case corresponds to 6.3 MB/s, the 512 bytes case corresponds to 4,540 MB/s, the single 1MB block case
corresponds to 2,300 MB/s.

Need
----

This test simulates writing a low level RP66V1 index and then running need on it.
Total bytes written around 1Mb.
Blocks are about 800 bytes apart.
There are 238,310 blocks.

.. image:: ../../plots/images/cpp_need.png

This shows good linear performance.

Write
-----

This show the performance of writing 1MB of data to a ``SVF`` in two ways:

- Each write is contiguous with a previous one so the blocks are always coalesced. The ``SVF`` always contains only one block.
- Each write is *not* contiguous with a previous one so the blocks are *never* coalesced. The ``SVF`` eventually contains as many blocks as writes.

.. image:: ../../plots/images/cpp_1mb_write.png

In the case of storing 1M one byte blocks the ``SVF`` consumes 34,603,192 bytes of memory, so x33.
In the case of a 256 byte block size the ``SVF`` consumes 1,179,832 bytes of memory, just a 12.5% premium.

The one byte block size performance corresponds to 14 MB/s (coalesced) and 3.1 MB/s (un-coalesced).
The 256 byte block size performance corresponds to 445 MB/s (coalesced) and 456 MB/s (un-coalesced).

Multi-threaded Writes
---------------------

This looks at the performance where many threads might be writing independently to a single ``SVF``.
This requires the code be compiled with ``SVF_THREAD_SAFE`` and ``SVFS_THREAD_SAFE``.

This test is done with the test function ``test_write_multithreaded_coalesced()`` and
``test_write_multithreaded_un_coalesced()`` with a varying number of threads.

This test writes/overwrites a 1MB file with 8 bytes writes.
In the coalesced case these writes are all to one block.
In the un-coalesced case these writes are all to multiple (1024 * 1024 / 8) blocks.

.. image:: ../../plots/images/cpp_write_multithreaded.png

Python Performance
==================

Read
--------------------

This test simulates reading from a one Megabyte of data arranged in a sparse form with different block sizes from 1 byte to 512 bytes.
For the one byte case there are 1,000,000 blocks each of 1 byte, for the 512 byte case there are 2,048 blocks each of 512 bytes.
In the extreme right the data is coalesced into a single one Megabyte block.

The y axis shows the time to read all blocks.

.. image:: ../../plots/images/py_read_uncoalesced.png

The Python performance is about 5x compared to C++ for the one byte case and nearly equal to C++ for the large block cases.

Need
-------------

This measures the performance of ``need()`` for a 1MB SVF under various conditions:

One Megabyte of data is loaded into an SVF un-coalesced equal sized blocks, the block sizes range from 1 byte to 512 bytes.
For the one byte case there are 1,000,000 blocks each of 1 byte, for the 512 byte case there are 2,048 blocks each of 512 bytes and so on.
In the extreme right the data is coalesced into a single one Megabyte block.

A ``need()`` request is made for various 'need' sizes (1KB, 64KB, 1025KB) and file positions (0, 512KB).
For example a ``need()`` request of 64KB  on the un-coalesced SVF of 1 byte blocks will generate a need list 64K long.
The same request on the un-coalesced SVF of 128 byte blocks will generate a need list of length 64K / 128 = 512.

.. image:: ../../plots/images/py_need_1MB.png

Observations:

- The ``need()`` time is pretty much independent of file position.
- The ``need()`` time for a particular size is proportional to the fragmentation of the SVF (inversely proportional to the block size).
- The ``need()`` time is roughly proportional to the size of the need request regardless of the fragmentation of the SVF.
- All the configurations converge on the extreme right as it is a coalesced 1MB SVF so the need list is empty.  This represents the lower bound for ``need()``, typically 0.2 µs.

Write
--------------------

This show the performance of writing 1MB of data to a ``SVF`` in two ways:

- Each write is contiguous with a previous one so the blocks are always coalesced. The ``SVF`` always contains only one block.
- Each write is *not* contiguous with a previous one so the blocks are *never* coalesced. The ``SVF`` eventually contains as many blocks as writes.

.. image:: ../../plots/images/py_write.png

The Python performance is about 3x compared to C++ for the one byte case and nearly equal to C++ for the large block cases.

Multi Threaded Writes
---------------------

The Python module is compiled *without* ``SVF_THREAD_SAFE`` and ``SVFS_THREAD_SAFE`` so that the C++ mutexes are not used.
Instead Python thread locks are used with ``AcquireLockSVF`` and ``AcquireLockSVFS`` that are wrappers around ``PyThread_acquire_lock()`` and ``PyThread_release_lock()``.

This test writes/overwrites a 1MB file with 8 bytes writes.
In the coalesced case these writes are all to one block.
In the un-coalesced case these writes are all to multiple (1024 * 1024 / 8) blocks.

.. image:: ../../plots/images/py_multi_threaded_write.png

The result is quite different from the C++ result given above.