PyTorch GPU Memory Management


References
1. https://zhuanlan.zhihu.com/p/680769942
2. https://zhuanlan.zhihu.com/p/681651660

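This will be rough, and it only covers the most basic material. It's just a quick note for the record, a throwaway post that took less than five minutes to write.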

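As shown in the figure below, two pools are used to manage free and in-use blocks. (In PyTorch's source, the two BlockPool objects are actually large_blocks and small_blocks. Both hold cached free blocks, divided at 1 MB, while blocks in use are tracked in a separate active_blocks set.)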

Structs

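This is the Block struct from the figure above. Blocks are essentially threaded together as a doubly linked list: prev/next link the blocks that were split from the same larger allocation.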

// Excerpt from PyTorch's c10/cuda/CUDACachingAllocator.cpp; the exact fields vary between versions.
// stream_set, BlockPool and History are declared elsewhere in that file.
struct Block {
  int device; // gpu
  cudaStream_t stream; // stream the block was allocated on
  stream_set stream_uses; // streams that have used this block
  size_t size; // block size in bytes
  BlockPool* pool; // owning memory pool
  void* ptr; // memory address
  bool allocated; // in-use flag
  Block* prev; // prev block if split from a larger allocation
  Block* next; // next block if split from a larger allocation
  int event_count; // number of outstanding CUDA events
  int gc_count; // counter for prioritizing older / less useful blocks for
                // garbage collection
  std::unique_ptr<History> history;
  History* history_last;
};
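To make the prev/next links concrete, here is a minimal sketch of splitting a free block in two. It uses a hypothetical MiniBlock, with an offset standing in for ptr, and keeps only size, allocated, prev and next. It illustrates the linking idea and is not PyTorch's actual code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal block: only the fields needed to show the linked list.
struct MiniBlock {
  std::size_t offset;  // offset inside the segment (stands in for ptr)
  std::size_t size;    // block size in bytes
  bool allocated = false;
  MiniBlock* prev = nullptr;
  MiniBlock* next = nullptr;
};

// Carve `size` bytes off the front of free block `b`. The new front block
// takes b's old position in the list, and b becomes the remainder after it.
MiniBlock* split_front(MiniBlock* b, std::size_t size) {
  assert(!b->allocated && size < b->size);
  MiniBlock* front = new MiniBlock{b->offset, size};
  front->prev = b->prev;
  if (front->prev) front->prev->next = front;
  front->next = b;
  b->prev = front;
  b->offset += size;
  b->size -= size;
  return front;
}

// Walk back to the head of the segment, then sum sizes going forward.
std::size_t segment_size(const MiniBlock* b) {
  while (b->prev) b = b->prev;
  std::size_t total = 0;
  for (; b; b = b->next) total += b->size;
  return total;
}
```

Splitting a 1024-byte block into 256 + 512 + 256 leaves three linked blocks whose sizes still add up to the original segment, which is what later makes merging neighbors possible.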

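The pool struct that manages the blocks in the figure above mainly uses a std::set container, which keeps blocks sorted and makes lookup convenient. (Note that std::set cannot be indexed by position. It is an ordered tree, so lower_bound can find the smallest block that is at least the requested size in O(log n).)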

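// Comparison is a function pointer used to order the blocks:
// typedef bool (*Comparison)(const Block*, const Block*);
struct BlockPool {
    BlockPool(Comparison comparator, bool small, PrivatePool *private_pool = nullptr)
        : blocks(comparator)
        , is_small(small)
        , owner_PrivatePool(private_pool)
    {
    }
    std::set<Block *, Comparison> blocks;  // blocks, kept sorted by the comparator
    const bool is_small;                   // true for the small-block pool
    PrivatePool *owner_PrivatePool;        // owning PrivatePool, if any
};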
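A small sketch of why an ordered std::set is convenient here: with a comparator that orders by stream, then size, then address, lower_bound returns the best-fit free block for a stream. Blk, compare_by_size, Pool and find_best_fit are hypothetical names, and the stream is an int standing in for cudaStream_t. PyTorch's own comparator appears to order blocks in a similar way, but treat the details as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical, simplified block: only the fields the comparator needs.
struct Blk {
  int stream;          // stands in for cudaStream_t
  std::size_t size;    // block size in bytes
  std::uintptr_t ptr;  // address, used as a tie-breaker
};

// Same shape as the allocator's Comparison: a plain function pointer.
typedef bool (*Comparison)(const Blk*, const Blk*);

// Order by stream, then size, then address, so that within one stream
// the smallest adequate block comes first.
bool compare_by_size(const Blk* a, const Blk* b) {
  if (a->stream != b->stream) return a->stream < b->stream;
  if (a->size != b->size) return a->size < b->size;
  return a->ptr < b->ptr;
}

using Pool = std::set<const Blk*, Comparison>;

// Best fit: the smallest free block on `stream` with size >= `size`.
const Blk* find_best_fit(const Pool& pool, int stream, std::size_t size) {
  Blk key{stream, size, 0};  // ptr 0 sorts before any real block of this size
  auto it = pool.lower_bound(&key);
  if (it == pool.end() || (*it)->stream != stream) return nullptr;
  return *it;
}
```

Because the comparator is a function pointer, the set must be constructed with it explicitly, e.g. `Pool pool(compare_by_size);`; this mirrors how BlockPool passes `comparator` to `blocks`. The stream check after lower_bound is needed because the search can run past the last block of the requested stream.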

Overall logic

It's actually quite simple.

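Allocating a block: look for a free block that is large enough; if it is bigger than needed, split it and keep the remainder as a free block.
Freeing a block: check whether the neighboring blocks on the left and right (prev/next) are free, and if so merge them into one block.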
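The two steps above can be sketched as a toy allocator over a single fixed arena. This is a simplified illustration under loud assumptions: there is no cudaMalloc (the arena size is fixed up front), and streams, events and the small/large pool split are ignored. ToyAllocator, TBlock and BySize are hypothetical names, not PyTorch's API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical block: an offset into one fixed arena stands in for ptr.
struct TBlock {
  std::size_t offset;
  std::size_t size;
  bool allocated;
  TBlock* prev;  // physically preceding block in the arena
  TBlock* next;  // physically following block in the arena
};

// Order free blocks by size, then offset, so lower_bound gives best fit.
struct BySize {
  bool operator()(const TBlock* a, const TBlock* b) const {
    if (a->size != b->size) return a->size < b->size;
    return a->offset < b->offset;
  }
};

class ToyAllocator {
 public:
  explicit ToyAllocator(std::size_t capacity)
      : head_(new TBlock{0, capacity, false, nullptr, nullptr}) {
    free_.insert(head_);
  }
  ~ToyAllocator() {
    // The offset-0 block is never absorbed by a merge, so it stays the head.
    for (TBlock* b = head_; b != nullptr;) {
      TBlock* next = b->next;
      delete b;
      b = next;
    }
  }
  ToyAllocator(const ToyAllocator&) = delete;
  ToyAllocator& operator=(const ToyAllocator&) = delete;

  // Allocate: best-fit lookup; if the block is larger than needed, split it.
  TBlock* allocate(std::size_t size) {
    TBlock key{0, size, false, nullptr, nullptr};
    auto it = free_.lower_bound(&key);
    if (it == free_.end()) return nullptr;  // a real allocator would cudaMalloc here
    TBlock* b = *it;
    free_.erase(it);
    if (b->size > size) {
      // The unused tail becomes a new free block linked right after b.
      TBlock* rest = new TBlock{b->offset + size, b->size - size, false, b, b->next};
      if (b->next) b->next->prev = rest;
      b->next = rest;
      b->size = size;
      free_.insert(rest);
    }
    b->allocated = true;
    return b;
  }

  // Free: mark the block free, then merge with free neighbors on each side.
  void release(TBlock* b) {
    assert(b->allocated);
    b->allocated = false;
    if (b->prev && !b->prev->allocated) b = merge(b->prev, b);
    if (b->next && !b->next->allocated) b = merge(b, b->next);
    free_.insert(b);
  }

  std::size_t free_block_count() const { return free_.size(); }
  std::size_t largest_free() const {
    return free_.empty() ? 0 : (*free_.rbegin())->size;
  }

 private:
  // Absorb `right` into `left`. Both leave the set before `left` grows,
  // because changing size in place would corrupt the set's ordering.
  TBlock* merge(TBlock* left, TBlock* right) {
    free_.erase(left);
    free_.erase(right);
    left->size += right->size;
    left->next = right->next;
    if (right->next) right->next->prev = left;
    delete right;
    return left;
  }

  TBlock* head_;
  std::set<TBlock*, BySize> free_;
};
```

For example, with a 1024-byte arena, allocating 256, 256 and 512 bytes uses it up; releasing the outer two leaves two separate free blocks, and releasing the middle one merges all three back into a single 1024-byte free block.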
