A colleague recently asked me an interesting question: how can I reduce the size of a physical IO on Linux?
So here’s a blog post that hopefully answers that question.
Linux Operating System
For this post I am using a Linux server whose kernel has a default maximum IO size of 1280KB (older kernels may still default to 512KB, another reason to upgrade), connected to a Pure Storage FlashArray, which supports a maximum transfer size of 4MB.
The Linux kernel exposes two parameters under /sys/block/<device>/queue to manage physical block IO sizes: max_hw_sectors_kb and max_sectors_kb.
max_hw_sectors_kb (read-only)
This is the maximum number of kilobytes supported in a single data transfer by the underlying device.
This value is read-only. The driver sets it to reflect the driver/hardware limit. The block layer enforces this limit as well, taking the minimum of max_hw_sectors_kb and the kernel default block limit so that every I/O request stays within the size the hardware/driver can support.
max_sectors_kb (read/write)
This is the maximum number of kilobytes that the block layer will allow for a filesystem request. This value can be changed, but it must be less than or equal to the maximum size allowed by the hardware (max_hw_sectors_kb).
[root@z-oracle:/dev/mapper]# ls -l /sys/block/dm-18/queue/* | egrep 'sector_size|sectors_kb'
-r--r--r-- 1 root root 4096 Jan 16 10:02 /sys/block/dm-18/queue/hw_sector_size
-r--r--r-- 1 root root 4096 Jan 16 09:52 /sys/block/dm-18/queue/max_hw_sectors_kb
-rw-r--r-- 1 root root 4096 Jan 3 15:39 /sys/block/dm-18/queue/max_sectors_kb
From the above we can see that max_sectors_kb is writable, while max_hw_sectors_kb is read-only.
[root@z-oracle:/dev/mapper]# cat /sys/block/dm-18/queue/max_hw_sectors_kb
4096
[root@z-oracle:/dev/mapper]# cat /sys/block/dm-18/queue/max_sectors_kb
4096
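To compare the two limits across every block device rather than one at a time, a small loop over sysfs can help. This is a minimal sketch: show_io_limits is a hypothetical helper name, and its optional argument (an alternative root directory) exists only so the function can be pointed somewhere other than /sys/block.

```shell
# Print the hardware limit and the current limit for each block device.
# show_io_limits is a hypothetical helper, not a standard utility.
# $1 (optional): directory to scan, defaults to /sys/block.
show_io_limits() {
    local root="${1:-/sys/block}"
    local q dev
    for q in "$root"/*/queue; do
        # Skip entries without a readable max_sectors_kb (or an unmatched glob).
        [ -r "$q/max_sectors_kb" ] || continue
        dev=$(basename "$(dirname "$q")")
        printf '%s max_hw_sectors_kb=%s max_sectors_kb=%s\n' \
            "$dev" "$(cat "$q/max_hw_sectors_kb")" "$(cat "$q/max_sectors_kb")"
    done
}

show_io_limits
```

On the server used in this post, the dm-18 line would be expected to show 4096 for both values, matching the cat output above.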
Using dd we can generate direct read IO against the device and check its throughput, for example:
[root@z-oracle:~]# dd if=/dev/dm-18 of=/dev/null iflag=direct bs=4M count=100000 &
Output from iostat -tkx 1 -d /dev/dm-18 shows that the average request size (avgrq-sz) for my FlashArray volume never exceeds 8192 sectors, or 4MB (8192 * 512B), as expected.
[root@z-oracle:~]# iostat -tkx 1 -d /dev/dm-18
...
16/01/24 11:26:53
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-18 0.01 0.01 26.39 1.15 683.67 26.29 51.56 0.06 2.08 2.14 0.65 0.0441 0.12
dm-18 486.00 0.00 162.00 0.00 663552.00 0.00 8192.00 0.99 6.10 6.10 0.00 6.1049 98.90
dm-18 483.00 0.00 165.00 2.00 659520.00 20.00 7898.68 0.99 5.93 6.01 0.00 5.9162 98.80
dm-18 513.00 0.00 171.00 0.00 700416.00 0.00 8192.00 0.98 5.74 5.74 0.00 5.7602 98.50
dm-18 510.00 0.00 170.00 0.00 696320.00 0.00 8192.00 0.99 5.81 5.81 0.00 5.8000 98.60
dm-18 507.00 0.00 177.00 2.00 692352.00 20.00 7736.00 0.99 5.52 5.58 0.00 5.4860 98.20
dm-18 504.00 0.00 168.00 0.00 688128.00 0.00 8192.00 0.98 5.82 5.82 0.00 5.8155 97.70
dm-18 513.00 0.00 171.00 0.00 700416.00 0.00 8192.00 0.98 5.75 5.75 0.00 5.7485 98.30
dm-18 510.00 0.00 178.00 2.00 696448.00 20.00 7738.53 0.98 5.46 5.52 0.00 5.4333 97.80
...
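Because avgrq-sz is reported in 512-byte sectors while max_sectors_kb is in kilobytes, it can be handy to convert between the two when reading iostat output. A minimal sketch of the arithmetic, using a hypothetical helper name (sectors_to_kb):

```shell
# Convert an iostat avgrq-sz value (512-byte sectors) to kilobytes.
# sectors_to_kb is a hypothetical helper name used only for this example.
sectors_to_kb() {
    echo $(( $1 * 512 / 1024 ))
}

sectors_to_kb 8192   # the avgrq-sz seen during the 4MB test
```

For 8192 sectors this gives 4096KB, which is the max_sectors_kb value in effect during the test.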
So let’s reduce the max_sectors_kb from 4096 to 128 and repeat the test.
[root@z-oracle:~]# echo 128 > /sys/block/dm-18/queue/max_sectors_kb
[root@z-oracle:~]# cat /sys/block/dm-18/queue/max_sectors_kb
128
[root@z-oracle:~]# dd if=/dev/dm-18 of=/dev/null iflag=direct bs=4M count=100000 &
Output from iostat -mx 1 -d /dev/dm-18 shows that the average request size (avgrq-sz) now never exceeds 256 sectors, or 128KB (256 * 512B), as we hoped.
[root@z-oracle:~]# iostat -mx 1 -d /dev/dm-18
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
dm-18 0.18 0.01 26.46 1.15 0.89 0.03 67.95 0.06 2.09 2.15 0.65 0.0557 0.15
dm-18 0.00 0.00 13321.00 1.00 1665.12 0.00 255.98 25.61 1.92 1.92 0.00 0.0715 95.30
dm-18 0.00 0.00 13139.00 0.00 1642.37 0.00 256.00 25.29 1.93 1.93 0.00 0.0728 95.60
dm-18 0.00 0.00 13123.00 1.00 1639.94 0.02 255.91 25.72 1.96 1.96 0.00 0.0731 95.90
dm-18 0.00 0.00 13064.00 1.00 1633.00 0.00 255.98 25.40 1.94 1.94 1.00 0.0733 95.80
dm-18 0.00 0.00 13016.00 0.00 1627.00 0.00 256.00 25.11 1.93 1.93 0.00 0.0736 95.80
dm-18 0.00 0.00 13256.00 1.00 1656.12 0.02 255.85 25.70 1.94 1.94 1.00 0.0726 96.20
dm-18 0.00 0.00 12961.00 1.00 1620.13 0.00 255.98 25.53 1.97 1.97 0.00 0.0739 95.80
dm-18 0.00 0.00 12991.00 0.00 1623.88 0.00 256.00 25.32 1.95 1.95 0.00 0.0732 95.10
...
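As a rough sanity check on both runs, reads per second should be close to read throughput divided by request size. The sketch below uses awk for the division; expected_iops is a hypothetical helper name, and the inputs are taken from the first busy sample of each iostat run (the second converted from MB/s to kB/s).

```shell
# Estimate reads per second from throughput (kB/s) and request size (KB).
# expected_iops is a hypothetical helper name, not part of iostat.
expected_iops() {
    awk -v kbps="$1" -v req_kb="$2" 'BEGIN { printf "%.0f\n", kbps / req_kb }'
}

expected_iops 663552 4096       # 4MB requests: rkB/s from the first run
expected_iops 1705082.88 128    # 128KB requests: 1665.12 MB/s * 1024
```

These work out to about 162 and 13321 reads per second, in line with the r/s column in the two outputs: the smaller IO size means the same dd workload is split into many more requests.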
Great. OK, what if we try to set max_sectors_kb to a value higher than the storage device supports?
[root@z-oracle:~]# echo 8192 > /sys/block/dm-18/queue/max_sectors_kb
-bash: echo: write error: Invalid argument
Good news: if you try to set an unsupported value, the write fails with an error, and you will need to choose a value no greater than max_hw_sectors_kb.
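If you would rather catch this before the kernel rejects the write, the new value can be compared with max_hw_sectors_kb first. This is a minimal sketch under stated assumptions: set_max_sectors_kb is a hypothetical helper name, it takes the device's queue directory as its first argument, and it only checks the upper bound shown in this post.

```shell
# Set max_sectors_kb only if the value does not exceed the hardware limit.
# set_max_sectors_kb is a hypothetical helper, not a standard utility.
# $1: queue directory, e.g. /sys/block/dm-18/queue
# $2: new limit in kilobytes
set_max_sectors_kb() {
    local queue_dir="$1" new_kb="$2" hw_kb
    hw_kb=$(cat "$queue_dir/max_hw_sectors_kb") || return 1
    if [ "$new_kb" -gt "$hw_kb" ]; then
        echo "refusing: ${new_kb}KB exceeds hardware limit of ${hw_kb}KB" >&2
        return 1
    fi
    echo "$new_kb" > "$queue_dir/max_sectors_kb"
}
```

For example, running set_max_sectors_kb /sys/block/dm-18/queue 128 as root would apply the change, while passing 8192 on this system would be refused with a message naming the 4096KB limit.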
Summary
In this blog post I have shared how we can use the Linux kernel's max_sectors_kb setting to limit physical IO sizes, run a test using the Linux dd command, and check the results of the change with iostat.
