

[FlexGen: run large language models like OPT-175B/GPT-3 on a single GPU, up to 100x faster than other offloading-based systems] 'FlexGen - Running large language models like OPT-175B/GPT-3 on a single GPU. Up to 100x faster than other offloading systems.' By Foundation Model Inference. GitHub: github.com/FMInference/FlexGen
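
To give a rough idea of what "offloading-based" inference means here: weights that do not fit in GPU memory are kept in CPU RAM (or on disk) and streamed to the GPU only while the corresponding layer is computing. The sketch below illustrates this generic technique only, not FlexGen's actual implementation or API; the `OffloadedMLP` class and its parameters are hypothetical names made up for illustration.

```python
# Minimal sketch of the generic offloading idea (not FlexGen's implementation):
# keep each layer's weights on the CPU and move them to the GPU only for the
# duration of that layer's forward pass, then evict them again.
import torch
import torch.nn as nn

class OffloadedMLP(nn.Module):
    def __init__(self, dim=4096, num_layers=8):
        super().__init__()
        # Weights live in CPU RAM between uses, so GPU memory only ever holds one layer.
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

    def forward(self, x):
        for layer in self.layers:
            layer.to("cuda")           # stream this layer's weights to the GPU
            x = torch.relu(layer(x))   # compute on the GPU
            layer.to("cpu")            # evict the weights to free GPU memory
        return x

if __name__ == "__main__":
    model = OffloadedMLP()
    out = model(torch.randn(2, 4096, device="cuda"))
    print(out.shape)
```

Systems like FlexGen go well beyond this naive loop (e.g., overlapping transfers with compute and batching aggressively), which is where the claimed speedup over other offloading systems comes from.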
