Thursday, 10 April 2008

Physical storage space used versus physical file size

Problem (Abstract)
The physical storage space used when a file is saved on the TSM Server depends on the storage block size relative to the size of the object being saved.

Resolving the problem
In general, the TSM Server uses the following block sizes for data storage:
Random-access disk pool, devtype=disk ==>> 4KB block
Sequential-access disk pool, devtype=file ==>> 256KB block
Sequential tape pool, devtype=tape (e.g. lto) ==>> 256KB block

* An object smaller than 4K that is saved to a DISK stgpool allocates and uses an entire 4K block of space. This is true for each object saved to the DISK stgpool.
* An object saved to sequential media (FILE or TAPE devclass) likewise uses an entire block of space, which is generally 256K for sequential storage. Even if the object is moved from a FILE stgpool to a TAPE stgpool, it still uses 256K.

For example, if a small 3K object is saved to the DISK stgpool, it will use the entire 4K block of space. When this object is then moved to a sequential FILE stgpool, it will occupy a whole 256K block. Any move thereafter to another sequential stgpool, either FILE or TAPE, will still use just 256K of physical space. This is the case for each object that is saved to the stgpool.

A file larger than 4K uses as many 4K DISK storage blocks as are needed to hold it.
For example, a 10K file uses 3 blocks:
Two 4K blocks will be full and the third block will hold the remaining 2K of data.
Even though the third block is not full, it is considered to be used and cannot be utilized by anything else.

This will work the same way for the sequential stgpool.
If the object is smaller than 256K, it will allocate an entire 256K block of physical space and consider it to be used. The block will not be full, but it cannot be used for anything else.
If the object is larger than 256K, it will use as many 256K blocks as are needed. For example, a 300K object takes two 256K blocks to hold it. The first 256K block will be full and the second block will contain the remaining 44K that did not fit in the first block. This second block will not be full, but it is again considered used and cannot be utilized for any other data.
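
The block arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rounding rule described in this note, not TSM code: the function name physical_kb and the BLOCK_KB table are hypothetical, and the block sizes are the general values listed earlier.

```python
import math

# Block sizes in KB, as listed in this note (general values, assumed here).
BLOCK_KB = {"disk": 4, "file": 256, "tape": 256}

def physical_kb(object_kb, pool_type):
    """Hypothetical helper: space (KB) used once the object is rounded up to whole blocks."""
    block = BLOCK_KB[pool_type]
    blocks = math.ceil(object_kb / block)  # a partly filled block still counts as used
    return blocks * block

print(physical_kb(3, "disk"))    # 4
print(physical_kb(10, "disk"))   # 12 (three 4K blocks)
print(physical_kb(300, "file"))  # 512 (two 256K blocks)
```

The same 3K object costs 4K in a DISK stgpool but 256K in a FILE or TAPE stgpool, which is why many small objects can occupy far more sequential storage than their logical size suggests.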

Some sequential devices can also have their own minimum requirements for physical space that is used when storing data. For example, the LTO requires a minimum amount of space usage per write, that is called a dataset. The following are the minimum LTO dataset sizes:
LTO1/LTO2 ==>> ~400KB
LTO3 ==>> ~1.6MB

The physical space used when writing to the LTO is specific to the manner in which the data is saved. When the data is flushed to the LTO device, it will minimally use the size of the dataset. A flush of the data will occur at the end of the transaction processing.

For example, if a single 3K object is sent within a single session to the LTO3, the object is written and a flush is then performed at the end of the transaction. This flush to the device will end on the dataset boundary, which is 1.6MB for the LTO3. In this example, the 3K object written straight to LTO3 would use 1.6MB of physical space.

However, if many objects are stored to the LTO3 in one transaction, the data streams to the tape and is written in 256K blocks. This is the case when migration is writing to the LTO3. For example, suppose 40 objects of 10K each are saved on the DISK stgpool. When migration is run to copy this data to LTO3, it writes one 256K block per object, using 40 of the 256K blocks. After the 40 objects are saved, the end-of-transaction processing does not allocate any additional space for the dataset, since the physical space used is already greater than the minimum size of 1.6MB. Thus the physical storage used would be 40 x 256K = 10MB.
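
As a rough sketch of the two LTO examples above, the following Python models a transaction as the sum of its 256K blocks, with the dataset size acting as a minimum at the flush. The names transaction_kb and LTO_MIN_DATASET_KB are hypothetical, the approximate dataset sizes are taken as 400 KB and 1600 KB for simplicity, and treating the dataset as a plain minimum is an assumption based on the description in this note rather than on the device's actual behavior.

```python
import math

SEQ_BLOCK_KB = 256  # sequential block size given in this note

# Approximate minimum dataset sizes in KB (assumed round figures for ~400KB and ~1.6MB).
LTO_MIN_DATASET_KB = {"LTO1": 400, "LTO2": 400, "LTO3": 1600}

def transaction_kb(object_sizes_kb, generation):
    """Hypothetical helper: space (KB) used by one transaction flushed to LTO."""
    # Each object is rounded up to whole 256K blocks.
    written = sum(math.ceil(size / SEQ_BLOCK_KB) * SEQ_BLOCK_KB
                  for size in object_sizes_kb)
    # The flush at end of transaction uses at least one dataset.
    return max(written, LTO_MIN_DATASET_KB[generation])

print(transaction_kb([3], "LTO3"))        # 1600, a single 3K object costs a full dataset
print(transaction_kb([10] * 40, "LTO3"))  # 10240, i.e. 40 x 256K = 10MB
```

Under these assumptions, batching small objects into one transaction (as migration does) avoids paying the dataset minimum once per object.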
