During the operation of electrical equipment, inspectors record a wealth of defect description texts. These texts contain abundant defect information that can greatly aid the fault diagnosis of electrical equipment. However, this valuable information remains untapped because the records are unstructured, rich in professional terminology, and mixed with numbers and units. This paper makes two contributions. First, a text preprocessing stage is established. Second, an attention-based deep learning network is proposed to construct the mapping between defect texts and defect severity. In particular, the word embeddings representing text are fine-tuned, while the word embeddings representing numbers and units are kept fixed. The experimental results show that the proposed model has superior classification ability compared with shallow learning models. This research not only provides a new technique for processing grid text data, but also helps to build a smart grid that integrates multi-source heterogeneous data.
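The idea of fine-tuning word embeddings for text while keeping those for numbers and units fixed can be illustrated with a minimal sketch. It is not the paper's implementation: the toy vocabulary, the embedding values, the gradients, and the helper name `sgd_step_with_frozen_rows` are all hypothetical, and plain SGD stands in for whatever optimizer the authors used. The sketch only shows one plausible mechanism, namely skipping the update for rows that belong to number and unit tokens.

```python
from typing import Dict, List, Set


def sgd_step_with_frozen_rows(
    embeddings: List[List[float]],
    grads: Dict[int, List[float]],
    frozen_ids: Set[int],
    lr: float = 0.1,
) -> None:
    """Apply one in-place SGD update, skipping rows whose ids are frozen."""
    for token_id, grad in grads.items():
        if token_id in frozen_ids:
            continue  # number/unit tokens keep their original vectors
        row = embeddings[token_id]
        for k, g in enumerate(grad):
            row[k] -= lr * g


# Hypothetical toy vocabulary: two ordinary words, one number, one unit.
vocab = {"oil": 0, "leakage": 1, "35": 2, "kV": 3}
frozen_ids = {vocab["35"], vocab["kV"]}

# Assumed 2-dimensional embeddings and gradients, chosen only for illustration.
embeddings = [[0.5, 0.5], [0.1, -0.2], [1.0, 0.0], [0.0, 1.0]]
grads = {0: [1.0, -1.0], 2: [1.0, 1.0], 3: [-1.0, 2.0]}

sgd_step_with_frozen_rows(embeddings, grads, frozen_ids, lr=0.1)
```

After the step, only the row for "oil" has moved; the rows for "35" and "kV" are unchanged even though they received gradients, and "leakage" is unchanged because it received none. In a deep learning framework the same effect is usually obtained by holding the number and unit vectors in a separate, non-trainable embedding table.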