This study investigates how raters make scoring decisions when assessing tape-mediated speaking test performance. Twenty-four Chinese EFL teachers were trained and then analytically scored five sample tapes selected from TEM4-Oral, a national EFL speaking test designed for college sophomores majoring in English in China. The raters' verbal reports of what they were thinking while making their scoring decisions were audio-recorded during and immediately after each assessment. Post-scoring interviews supplemented these reports in probing the scoring process. A qualitative analysis of the data showed that the raters tended to give weight to content, to penalize both grammar and pronunciation errors, and to reward the use of impressive and uncommon words. Moreover, the overall decision-making process proved to be cyclic in nature. A flow chart describing this cyclic process of hypothesis forming and testing is then proposed and discussed.