Despite what the man page says, Linux spares only the initial partial
page from being discarded. The ending page is discarded whether it is
partial or not.
Page-align the fadvise size to take care of this.
Also, while we are at it, roll the initial fadvise offset back to the
previous page boundary so that this page actually gets thrown away. We
no longer need it: its first part was read in the previous call and its
second part has been read now.
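
The offset and size arithmetic described above can be sketched as
follows. This is a minimal illustration in Python, not the patched code
itself; the function name aligned_discard_range and the use of
mmap.PAGESIZE as the page size are assumptions made for the example.

```python
import mmap

# Assumption: mmap.PAGESIZE stands in for however the real code obtains the page size.
PAGE_SIZE = mmap.PAGESIZE


def aligned_discard_range(offset, length, page_size=PAGE_SIZE):
    """Return (start, size) covering only whole pages of [offset, offset + length).

    start is rolled back to the previous page boundary, so the page that was
    partly read by the previous call is included. The end is rounded down to
    a page boundary, so a partially read final page is left in the cache.
    """
    start = offset - offset % page_size
    end = offset + length
    end -= end % page_size
    # A size of 0 means "nothing to discard"; the caller should skip the
    # fadvise call in that case, because a length of 0 passed to
    # posix_fadvise() means "until end of file".
    return start, max(end - start, 0)
```

With 4096-byte pages, aligned_discard_range(5000, 10000, 4096) yields
(4096, 8192): the range starts at the page containing byte 5000 and
stops at the last full page boundary before byte 15000.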
This patch has a noticeable impact in my Linux testing when the file
is on rotating media. The total test runtime decreased by a bit
over 10%, but since over half of that time was actually CPU time,
the actual iowait time decreased by around 20%.
[1]
This worked only by accident, because OSes tend to return at least one
page's worth of data when EOF is not reached. Increasing WINDOW_SIZE
beyond the page size might have led to data loss.
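
One way to avoid depending on how much a single read() returns is to
loop until the requested amount has arrived or read() reports EOF. The
sketch below illustrates that idea in Python; it is not the code this
message refers to, and read_fully and ShortReader are hypothetical
names introduced for the example.

```python
def read_fully(fobj, n):
    """Read up to n bytes from fobj, looping over short reads.

    Fewer than n bytes are returned only when read() signals EOF by
    returning an empty result.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        data = fobj.read(remaining)
        if not data:  # empty result: treated as EOF
            break
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)


class ShortReader:
    """Hypothetical test double: hands out at most `limit` bytes per read()."""

    def __init__(self, payload, limit):
        self._payload = payload
        self._limit = limit
        self._pos = 0

    def read(self, n):
        end = self._pos + min(n, self._limit)
        data = self._payload[self._pos:end]
        self._pos += len(data)
        return data
```

Here read_fully(ShortReader(b"0123456789", 3), 8) collects three short
reads into b"01234567" instead of treating the first short read as the
end of the data.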
[2]
If read() of the passed Python object returned something other than
bytes, PyBytes_Size returns -1 (ssize_t), which becomes a very large
number when converted to memcpy()'s size_t.
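
The effect of the signed-to-unsigned conversion can be reproduced from
Python with ctypes. This is only an illustration of the arithmetic and
of the kind of type check that prevents it; as_size_t and
checked_length are hypothetical helper names, not part of the code
being fixed.

```python
import ctypes


def as_size_t(n):
    """Return the value n takes when stored in a C size_t (wraps modulo 2**bits)."""
    return ctypes.c_size_t(n).value


def checked_length(obj):
    """Return len(obj) for a bytes object, and raise TypeError for anything else.

    Rejecting non-bytes results up front means no error value (-1) can
    ever reach a length parameter.
    """
    if not isinstance(obj, bytes):
        raise TypeError("read() must return bytes, got %s" % type(obj).__name__)
    return len(obj)
```

On a platform with a 64-bit size_t, as_size_t(-1) is 2**64 - 1, which
is the length memcpy() would have been asked to copy.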
If we have an OS file handle, we can read directly into the final destination: one memcpy fewer.
If we have a Python file object, we get a Python bytes object as the read result (can't save the memcpy here).
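
The difference between the two paths can be illustrated with a Python
analogy: readinto() fills the destination buffer in place, while read()
produces a separate bytes object that still has to be copied. The names
fill_direct and fill_via_bytes are hypothetical, and the sketch only
mirrors the idea; the change itself is at the C level.

```python
def fill_direct(raw, buf, offset, n):
    """Read up to n bytes straight into buf[offset:offset + n].

    raw is assumed to offer readinto(), as unbuffered file objects
    wrapping an OS handle do. No intermediate bytes object is created.
    """
    view = memoryview(buf)[offset:offset + n]
    got = raw.readinto(view)
    return got or 0  # readinto() may return None on a non-blocking handle


def fill_via_bytes(fobj, buf, offset, n):
    """Read via fobj.read(n), then copy the result into buf at offset."""
    data = fobj.read(n)
    if not isinstance(data, bytes):
        raise TypeError("read() must return bytes")
    if len(data) > n:
        raise ValueError("read() returned more data than requested")
    buf[offset:offset + len(data)] = data  # this is the extra copy
    return len(data)
```

Both functions return the number of bytes placed in buf, so a caller
can treat the two kinds of source uniformly after the read.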