Synthesizing physically plausible human motions in 3D scenes is a challenging problem. Kinematics-based methods cannot avoid inherent artifacts (e.g., penetration and foot skating) because they lack physical constraints. Meanwhile, existing physics-based methods cannot generalize to multi-object scenarios because a policy trained with reinforcement learning has limited modeling capacity. In this work, we present a framework that enables physically simulated characters to perform long-term interaction tasks in diverse, cluttered, and unseen scenes. The key idea is to decompose human-scene interactions into two fundamental processes, Interacting and Navigating, which motivates us to construct two reusable controllers, i.e., InterCon and NavCon. Specifically, InterCon contains two complementary policies that enable characters to enter and leave the interacting state (e.g., sitting on a chair and getting up). To generate interactions with objects at different locations, we further design NavCon, a trajectory-following policy that keeps characters' locomotion in the free space of 3D scenes. Benefiting from this divide-and-conquer strategy, we can train the policies in simple environments and generalize to complex multi-object scenes. Experimental results demonstrate that our framework can synthesize physically plausible long-term human motions in complex 3D scenes.
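
To make the decomposition concrete, the alternation between the two controllers can be viewed as a small schedule over target objects. The following is only an illustrative sketch under our own assumptions, not the implementation of the framework: the names `Phase` and `plan_phases` and the string object identifiers are hypothetical, the learned policies are abstracted away entirely, and the choice to end in the interacting state at the last object is arbitrary.

```python
from enum import Enum
from typing import List, Tuple


class Phase(Enum):
    """Hypothetical labels for which controller/policy is active."""
    NAVIGATE = "NavCon: follow trajectory"
    ENTER = "InterCon: enter interaction"
    LEAVE = "InterCon: leave interaction"


def plan_phases(targets: List[str]) -> List[Tuple[Phase, str]]:
    """Expand a list of target objects into an ordered controller schedule.

    For each object: navigate to it, enter the interacting state, then leave
    that state before moving on. The final object is not left, so the
    character ends in the interacting state (an assumption of this sketch).
    """
    schedule: List[Tuple[Phase, str]] = []
    for i, obj in enumerate(targets):
        schedule.append((Phase.NAVIGATE, obj))
        schedule.append((Phase.ENTER, obj))
        if i < len(targets) - 1:
            schedule.append((Phase.LEAVE, obj))
    return schedule


if __name__ == "__main__":
    # Hypothetical two-object task: sit on a chair, get up, then sit on a sofa.
    for phase, obj in plan_phases(["chair", "sofa"]):
        print(f"{phase.value} -> {obj}")
```

Because each phase depends only on the current target, a long multi-object task reduces to a sequence of single-object subtasks, which is consistent with training the policies in simple environments and composing them in complex scenes.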