Maximal clique enumeration (MCE) in large-scale graphs is critical across various application domains, including social network analysis, bioinformatics, and computer vision. However, existing GPU-based MCE solutions suffer from inefficient load-balancing mechanisms. These mechanisms force busy workers to pause and hand over workloads to idle workers, which introduces significant synchronization overhead and increases memory usage. To address these limitations, we introduce a root-down exposure mechanism where busy workers dynamically expose their current root, enabling idle workers to pull workloads from the exposed node directly without synchronization. We then propose a bitmap-centric MCE scheme and an aggressive node generation rule to further simplify the memory layout and accelerate enumeration. We combine them into RDMCE, a Root-Down MCE solution on GPUs. Across large real-world graphs with up to 145 billion maximal cliques, RDMCE is 1.25-5.38x faster than any next-best state-of-the-art GPU-based solutions and is the only one that completes enumeration on every test dataset, offering a more efficient and scalable MCE solution.