Abstract: The exploration-exploitation dilemma persists in off-policy reinforcement learning (RL) algorithms, impeding improvements in policy performance and sample efficiency. To ...