Clock gating is one of the important techniques to achieve low power and small area in high-performance synchronous circuit design. In this paper, we propose a three-phase clock gating optimization methodology by using clustering and merging algorithm to construct a gated clock tree with minimal number of clock gating cells and buffers. In addition, according to the fan-out numbers of a clock gating cell, we derive a parameter r that may used to adjust the tradeoff between clock gating cell and buffer. The experimental results show that the number of clock gating cells and buffers reduced in each phase in our algorithm. Our solutions are better than greedy approach.