Here are some resources on LLM deployment
paper link: https://arxiv.org/abs/2312.15234
citation:
@misc{miao2023efficient,
title={Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems},
author={Xupeng Miao and Gabriele Oliaro and Zhihao Zhang and Xinhao Cheng and Hongyi Jin and Tianqi Chen and Zhihao Jia},
year={2023},
eprint={2312.15234},
archivePrefix={arXiv},
primaryClass={cs.LG}
}