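The block allocation mechanism relies on several key data structures.
A block request is described by ext4_allocation_request; whether an allocation is actually performed then depends on the result of the block lookup and on what the upper layer requires.
While an allocation is in progress, its state has to be recorded somewhere, and that is the job of struct ext4_allocation_context.
The search for free space may draw on preallocated space, so a further descriptor, ext4_prealloc_space, describes the size and other attributes of a preallocation.
The sections below analyse each of these key structures in detail.
1. Block request descriptor: ext4_allocation_request
The attributes of a block allocation request are described by the request descriptor ext4_allocation_request: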
struct ext4_allocation_request {
	/* target inode for block we're allocating */
	struct inode *inode;
	/* how many blocks we want to allocate */
	unsigned int len;
	/* logical block in target inode */
	ext4_lblk_t logical;
	/* the closest logical allocated block to the left */
	ext4_lblk_t lleft;
	/* the closest logical allocated block to the right */
	ext4_lblk_t lright;
	/* phys. target (a hint) */
	ext4_fsblk_t goal;
	/* phys. block for the closest logical allocated block to the left */
	ext4_fsblk_t pleft;
	/* phys. block for the closest logical allocated block to the right */
	ext4_fsblk_t pright;
	/* flags. see above EXT4_MB_HINT_* */
	unsigned int flags;
};
This request descriptor is initialized in ext4_ext_map_blocks() (note: ext4_ext_map_blocks() looks up or allocates the requested blocks and completes their mapping to buffer space).
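To make the neighbour fields (lleft, lright, pleft, pright) concrete, here is a minimal user-space sketch. It is not kernel code: the sketch_ types, the simplified extent array, and the has_left/has_right flags are all hypothetical, and the way the values are derived follows only the field comments in the struct above, not the kernel helpers that actually fill them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for ext4_lblk_t and ext4_fsblk_t (assumption). */
typedef uint32_t sketch_lblk_t;
typedef uint64_t sketch_fsblk_t;

/* Hypothetical simplified extent: len logical blocks starting at lblk
 * are stored contiguously starting at physical block pblk. */
struct sketch_extent {
	sketch_lblk_t lblk;
	sketch_fsblk_t pblk;
	uint32_t len;
};

/* Mirrors only the neighbour-related fields of ext4_allocation_request. */
struct sketch_request {
	sketch_lblk_t logical;	/* block being requested */
	unsigned int len;	/* number of blocks wanted */
	sketch_lblk_t lleft;	/* closest allocated logical block to the left */
	sketch_lblk_t lright;	/* closest allocated logical block to the right */
	sketch_fsblk_t pleft;	/* physical block backing lleft */
	sketch_fsblk_t pright;	/* physical block backing lright */
	int has_left;		/* sketch-only: a left neighbour exists */
	int has_right;		/* sketch-only: a right neighbour exists */
};

/*
 * Fill the neighbour fields for a request at `logical`. The extents must
 * be sorted by lblk and must not overlap; `logical` is assumed to lie in
 * a hole between (or after) them.
 */
void sketch_fill_neighbours(struct sketch_request *ar,
			    const struct sketch_extent *ext, size_t n,
			    sketch_lblk_t logical, unsigned int len)
{
	size_t i;

	assert(ar != NULL);
	ar->logical = logical;
	ar->len = len;
	ar->lleft = ar->lright = 0;
	ar->pleft = ar->pright = 0;
	ar->has_left = ar->has_right = 0;

	for (i = 0; i < n; i++) {
		if (ext[i].lblk + ext[i].len <= logical) {
			/* Extent ends before the hole: its last block is
			 * the nearest allocated block seen so far. */
			ar->lleft = ext[i].lblk + ext[i].len - 1;
			ar->pleft = ext[i].pblk + ext[i].len - 1;
			ar->has_left = 1;
		} else if (ext[i].lblk > logical) {
			/* First extent after the hole: its first block is
			 * the nearest allocated block on the right. */
			ar->lright = ext[i].lblk;
			ar->pright = ext[i].pblk;
			ar->has_right = 1;
			break;
		}
	}
}
```

For a file with extents {lblk 0, pblk 1000, len 8} and {lblk 20, pblk 2000, len 4}, a request at logical block 10 gets lleft = 7 with pleft = 1007, and lright = 20 with pright = 2000.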
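Among the fields of ext4_allocation_request, only goal needs a closer look. goal holds a physical block number, and its implicit meaning matters: although it is only a hint, a well-chosen goal goes a long way toward preserving the file's locality and keeping its physical blocks contiguous.
goal is computed by the function ext4_ext_find_goal():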
static ext4_fsblk_t ext4_ext_find_goal(struct inode *inode,
				       struct ext4_ext_path *path,
				       ext4_lblk_t block)
{
	if (path) {
		int depth = path->p_depth;
		struct ext4_extent *ex;
		/*
		 * Try to predict block placement assuming that we are
		 * filling in a file which will eventually be
		 * non-sparse --- i.e., in the case of libbfd writing
		 * an ELF object sections out-of-order but in a way
		 * the eventually results in a contiguous object or
		 * executable file, or some database extending a table
		 * space file. However, this is actually somewhat
		 * non-ideal if we are writing a sparse file such as
		 * qemu or KVM writing a raw image file that is going
		 * to stay fairly sparse, since it will end up
		 * fragmenting the file system's free space. Maybe we
		 * should have some hueristics or some way to allow
		 * userspace to pass a hint to file system,
		 * especially if the latter case turns out to be
		 * common.
		 */
		ex = path[depth].p_ext;
		if (ex) {
			ext4_fsblk_t ext_pblk = ext4_ext_pblock(ex);
			ext4_lblk_t ext_block = le32_to_cpu(ex->ee_block);
			if (block > ext_block)
				return ext_pblk + (block - ext_block);
			else
				return ext_pblk - (ext_block - block);
		}
		/* it looks like index is empty;
		 * try to find starting block from index itself */
		if (path[depth].p_bh)
			return path[depth].p_bh->b_blocknr;
	}
	/* OK. use inode's group */
	return ext4_inode_to_goal_block(inode);
}
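The three-level fallback in ext4_ext_find_goal() can be tried out in user space with the sketch below. It is an illustration under stated assumptions, not the kernel implementation: struct sketch_path_tail is a hypothetical flattening of path[depth], the inode-group goal is passed in as a plain number, and the assert() is a sketch-only guard against unsigned wrap-around that the kernel code shown above does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t sketch_lblk_t;   /* stand-in for ext4_lblk_t */
typedef uint64_t sketch_fsblk_t;  /* stand-in for ext4_fsblk_t */

/* Hypothetical flattened view of the last path element, path[depth]. */
struct sketch_path_tail {
	int has_extent;			/* path[depth].p_ext != NULL */
	sketch_lblk_t ext_block;	/* first logical block of the extent */
	sketch_fsblk_t ext_pblk;	/* first physical block of the extent */
	int has_bh;			/* path[depth].p_bh != NULL */
	sketch_fsblk_t bh_blocknr;	/* block number of that tree block */
};

/*
 * Same decision order as ext4_ext_find_goal():
 *   1. project the request onto the leaf extent,
 *   2. else use the tree block at the end of the path,
 *   3. else fall back to the goal derived from the inode's group.
 */
sketch_fsblk_t sketch_find_goal(const struct sketch_path_tail *tail,
				sketch_lblk_t block,
				sketch_fsblk_t inode_group_goal)
{
	if (tail) {
		if (tail->has_extent) {
			if (block > tail->ext_block)
				return tail->ext_pblk +
				       (block - tail->ext_block);
			/* Sketch-only guard: do not wrap below block 0. */
			assert(tail->ext_pblk >= tail->ext_block - block);
			return tail->ext_pblk - (tail->ext_block - block);
		}
		if (tail->has_bh)
			return tail->bh_blocknr;
	}
	return inode_group_goal;
}
```

With an extent starting at logical block 100 and physical block 5000, a request for logical block 140 yields goal 5040 and a request for logical block 90 yields goal 4990: in both cases the goal keeps the same distance from the extent start in physical space as the request has in logical space.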
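Looking closely at ext4_ext_find_goal(): if a path from the root of the extent tree toward the requested logical block exists, the goal physical block is derived from that path.
(1) If the path ends at a data extent, the path runs from root to leaf. When the requested block number is greater than the starting logical block ext_block of that leaf extent (whose starting physical block is ext_pblk), the logical distance is (block - ext_block); to keep the physical layout as contiguous as possible, the allocator should return the free block closest to ext_pblk + (block - ext_block). When the requested block number is smaller than ext_block, the allocator should instead look for the free block closest to ext_pblk - (ext_block - block). The goal is therefore ext_pblk + (block - ext_block) in the first case and ext_pblk - (ext_block - block) in the second.
(2) If the path exists but its last level holds no extent, the answer is simple: the goal is the physical block number of the extent-tree block at the end of the path, path[depth].p_bh->b_blocknr.
(3) The remaining case is that no path is given (the code also falls through to here when the last path level has neither an extent nor a buffer head). In my view this corresponds to an inode that has just been created. A dedicated function, ext4_inode_to_goal_block(), handles it: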
ext4_fsblk_t ext4_inode_to_goal_block(struct inode *inode)
{
	struct ext4_inode_info *ei = EXT4_I(inode);
	ext4_group_t block_group;
	ext4_grpblk_t colour;
	int flex_size = ext4_flex_bg_size(EXT4_SB(inode->i_sb));
	ext4_fsblk_t bg_start;
	ext4_fsblk_t last_block;
	block_group = ei->i_block_group;
	if (flex_size >= EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME) {
		/*
		 * If there are at least EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME
		 * block groups per flexgroup, reserve the first block
		 * group for directories and special files. Regular
		 * files will start at the second block group. This
		 * tends to speed up directory access and improves
		 * fsck times.
		 */
		block_group &= ~(flex_size-1);
		if (S_ISREG(inode->i_mode))
			block_group++;
	}
	bg_start = ext4_group_first_block_no(inode->i_sb, block_group);
	last_block = ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es) - 1;
	/*
	 * If we are doing delayed allocation, we don't need take
	 * colour into account.
	 */
	if (test_opt(inode->i_sb, DELALLOC))
		return bg_start;
	if (bg_start + EXT4_BLOCKS_PER_GROUP(inode->i_sb) <= last_block)
		colour = (current->pid % 16) *
			(EXT4_BLOCKS_PER_GROUP(inode->i_sb) / 16);
	else
		colour = (current->pid % 16) * ((last_block - bg_start) / 16);
	return bg_start + colour;
}
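The idea is this: if a flex group contains at least EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME block groups, the starting physical block bg_start for a regular file is the first block of the second block group in the inode's flex group, while directories and special files start at the first block group. Otherwise bg_start is the first block of the inode's own block group.
Of course, if every file in the flex group used bg_start as its goal, they would contend for the same blocks. The colour term reduces that contention: it is derived from current->pid % 16, so different processes are spread over 16 offsets inside the block group. With delayed allocation (DELALLOC) enabled, colour is skipped and bg_start is returned as is.
In this last case the goal is therefore bg_start plus a process-dependent offset inside the chosen block group of the inode's flex group.
[Note: only when flex_size is at least EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME is the first group of the flex group set aside for directories and special files, with regular files allocated from the second group onward. According to the code comment, this speeds up directory access and improves fsck times.]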
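The group rounding and the colour offset are easy to check with concrete numbers. The following user-space sketch reproduces the arithmetic of ext4_inode_to_goal_block() under several assumptions: struct sketch_fs and all sketch_ names are hypothetical, the first data block of the file system is taken to be 0, the threshold that the kernel calls EXT4_FLEX_SIZE_DIR_ALLOC_SCHEME is passed in as a parameter rather than assumed, and the pid is an ordinary argument instead of current->pid.

```c
#include <assert.h>
#include <stdint.h>

/* Values the kernel reads from the superblock and mount options. */
struct sketch_fs {
	uint32_t blocks_per_group;
	uint32_t flex_size;		/* block groups per flex group */
	uint32_t dir_alloc_threshold;	/* stand-in for the kernel constant */
	uint64_t last_block;		/* total block count minus one */
	int delalloc;			/* nonzero if delayed allocation is on */
};

uint64_t sketch_inode_goal(const struct sketch_fs *fs, uint32_t inode_group,
			   int is_regular_file, uint32_t pid)
{
	uint32_t group = inode_group;
	uint64_t bg_start, colour;

	/* The mask below only works for a power-of-two flex_size. */
	assert(fs->flex_size != 0 &&
	       (fs->flex_size & (fs->flex_size - 1)) == 0);

	if (fs->flex_size >= fs->dir_alloc_threshold) {
		/* Round down to the first group of the flex group ... */
		group &= ~(fs->flex_size - 1);
		/* ... and move regular files to the second group. */
		if (is_regular_file)
			group++;
	}
	/* Assumes the first data block is 0. */
	bg_start = (uint64_t)group * fs->blocks_per_group;

	/* Delayed allocation: no colour, start of the group is the goal. */
	if (fs->delalloc)
		return bg_start;

	if (bg_start + fs->blocks_per_group <= fs->last_block)
		colour = (uint64_t)(pid % 16) * (fs->blocks_per_group / 16);
	else	/* Short last group: scale the 16 slots to what is left. */
		colour = (uint64_t)(pid % 16) *
			 ((fs->last_block - bg_start) / 16);
	return bg_start + colour;
}
```

Take 32768 blocks per group, 16 groups per flex group, a threshold of 4, and an inode in block group 21. A regular file is moved to group 17, so bg_start is 557056; for pid 35 the colour is 3 * 2048 = 6144 and the goal is 563200. A directory in the same inode group stays in group 16 and gets 530432. Note that the offset is deterministic per process rather than random: two writes from the same pid always land in the same one of the 16 slots.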
2. Allocation context descriptor: ext4_allocation_context
The state of an allocation in progress is recorded in struct ext4_allocation_context:
struct ext4_allocation_context {
	struct inode *ac_inode;
	struct super_block *ac_sb;
	/* original request */
	struct ext4_free_extent ac_o_ex;
	/* goal request (normalized ac_o_ex) */
	struct ext4_free_extent ac_g_ex;
	/* the best found extent */
	struct ext4_free_extent ac_b_ex;
	/* copy of the best found extent taken before preallocation efforts */
	struct ext4_free_extent ac_f_ex;
	__u16 ac_groups_scanned;
	__u16 ac_found;
	__u16 ac_tail;
	__u16 ac_buddy;
	__u16 ac_flags;		/* allocation hints */
	__u8 ac_status;
	__u8 ac_criteria;
	__u8 ac_2order;		/* if request is to allocate 2^N blocks and
				 * N > 0, the field stores N, otherwise 0 */
	__u8 ac_op;		/* operation, for history only */
struct page *ac_b