China’s Standards of English Language Ability, officially issued in February 2018, is expected to exert a profound influence on English language teaching and testing in China. To use the Standards fairly, it is important to understand the different levels of the various subscales, particularly the features that distinguish these levels.
As an initial effort to identify the distinguishing features, text mining was performed on all the descriptors of the Reading subscale of the Standards, using the ROST Content Mining System (Shen, 2009). In essence, the system ran a series of algorithms comparable to verbal protocol analysis based on grounded theory. The descriptors, in the form of can-do statements, were fed to the system, which automatically segmented each statement into words and calculated the frequency of each word, as well as the frequency with which each possible pair of words cooccurred in the same statement. The resulting frequency list and cooccurrence matrix provided the information for identifying the distinguishing features of eight of the nine levels of the Reading subscale, detailed below.
a) The top 20 high-frequency keywords in the descriptors covered a variety of features, including linguistic features, cognitive operations, topical features, text types, and types of message.
b) Linguistic features provided a crude division between upper and lower levels. The keyword “complex (language)” appeared above Level 3 whereas the keyword “simple (language)” disappeared beyond Level 4.
c) Cognitive operations provided more fine-grained dividing lines. The two top levels, 8 and 9, were no longer associated with “understanding”, but only with “evaluation”, which was also associated with Levels 6 and 7. Level 7 was the only level that was associated to some degree with “analyzing”. “Extracting (messages)” was associated with the lower Levels 3 and 4.
d) In terms of topical features, “people” was associated with a lower level (Level 2), “society” and “(familiar) topics” with the middle (Level 5), and “(one’s own) field” with the higher Level 8.
e) Text types and types of message had only sporadic associations with the levels.
These findings can be interpreted in terms of meaning making of the levels in the Reading subscale of the Standards and used to guide English language teaching and testing practices. They can also be used to guide the ongoing alignment efforts between the Standards and some international tests of English as a foreign or second language.
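The counting step described above (segmenting each descriptor into words, then tallying word frequencies and the cooccurrence of word pairs within a statement) can be illustrated with a minimal Python sketch. It is not the ROST Content Mining System and does not reproduce its segmentation algorithm. The whitespace tokenizer, the function names, and the three sample descriptors are all assumptions made up for illustration; the sample statements are not taken from the Standards.

```python
from collections import Counter
from itertools import combinations


def tokenize(statement):
    """Hypothetical tokenizer: lowercase, split on whitespace, strip punctuation.

    This is a simplifying assumption; a real system would use a proper
    word segmenter suited to the language of the descriptors.
    """
    tokens = (word.strip(".,;:()").lower() for word in statement.split())
    return [token for token in tokens if token]


def count_words_and_pairs(descriptors):
    """Return (word frequencies, pair cooccurrence frequencies)."""
    word_freq = Counter()
    pair_freq = Counter()
    for statement in descriptors:
        tokens = tokenize(statement)
        # Word frequency: every occurrence counts.
        word_freq.update(tokens)
        # Cooccurrence: each unordered pair of distinct words is counted
        # once per statement; sorting gives every pair a canonical order.
        pair_freq.update(combinations(sorted(set(tokens)), 2))
    return word_freq, pair_freq


# Invented sample descriptors, for illustration only.
descriptors = [
    "Can understand simple texts about people.",
    "Can extract messages from simple texts.",
    "Can evaluate complex texts in own field.",
]

word_freq, pair_freq = count_words_and_pairs(descriptors)
print(word_freq.most_common(5))
print(pair_freq[("simple", "texts")])
```

Grouping the descriptors by level before counting would yield one frequency list per level, which is the kind of information needed to see where a keyword such as “simple” or “complex” appears or drops out.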