In this work, we present an approach to jointly segment a rigid object in a two-dimensional (2D) image and estimate its three-dimensional (3D) pose, using the knowledge of a 3D model. We naturally couple the two processes together into a shape optimization problem and minimize a unique energy functional through a variational approach. Our methodology differs from the standard monocular 3D pose estimation algorithms since it does not rely on local image features. Instead, we use global image statistics to drive the pose estimation process. Experimental results on both synthetic and real images are provided, which highlight the robust performance of the technique in challenging tracking and segmentation applications.