We observed that enabling both use_gzip and dump_self_cuda_time_total in the vLLM torch profiler introduces significant overhead during profiling. For example, when profiling 10 randomly generated ...