
Knowledge tracking models have long been a research hotspot in educational data mining. Knowledge tracking can automatically discover students' weak knowledge points, which helps improve students' motivation to learn and enables personalized guidance. Existing KT models have shortcomings, such as limitations in how knowledge growth is calculated and an imperfect forgetting mechanism. To this end, we propose a new knowledge tracking model based on the learning process (LPKT). LPKT applies the idea of the Memory-Augmented Neural Network (MANN). When modeling students' learning process, we consider two additional important factors: one is the students' current knowledge state when updating the dynamic matrix of the neural network, and the other is an improved forgetting mechanism. In this paper we verify the effectiveness and superiority of LPKT through comparative experiments, and show that the model improves the knowledge tracking effect while making the deep knowledge tracking process easier to understand.

With the continuous development of Internet education, the use of artificial intelligence technology to promote education has become an inevitable trend [

Although DKVMN has made a breakthrough in the field of knowledge tracking, greatly improving both the efficiency of knowledge tracking and the interpretability of deep knowledge tracking models, several problems remain:

First, there are limitations in the calculation of knowledge growth. In DKVMN, knowledge growth is computed by multiplying the embedding of a student's question-answering activity with a trained embedding matrix, which means that the knowledge growth gained after each answering activity depends only on that activity. In fact, from the perspective of the human cognitive process, the knowledge growth a student gains while learning should also depend on the student's current knowledge state [

Second, DKVMN relies too heavily on the forgetting mechanism of the model itself. Its update of the student's knowledge state is inspired by the LSTM forgetting mechanism: an "erase" vector is first computed by a hidden layer with a sigmoid activation function, and then the "erase" vector and the student's knowledge growth are used to update the student's dynamic knowledge state matrix [

Finally, the forgetting mechanism is not considered in the prediction process. DKVMN exploits the large memory capacity of MANN to model students' learning process. MANN was originally widely used in intelligent question answering and machine translation, where it stores knowledge learned from large document collections in a dynamic matrix, so answering and translating resemble retrieval. In the field of knowledge tracking, however, predicting whether a student can correctly answer the next question is not a simple retrieval process: it must also account for the student's memory forgetting during learning, which DKVMN clearly does not consider.

From the above, we can see that the superiority of current deep knowledge tracking methods is attributed to the deep learning models themselves. In essence, to achieve a better knowledge tracking effect, we need to start from human cognitive psychology and complete the knowledge tracking process by simulating students' learning and memory process. We therefore propose a knowledge tracking model based on the learning process (LPKT), which adopts the idea of the Memory-Augmented Neural Network to model students' learning. We make two improvements: one is to consider the students' current knowledge state when updating the dynamic matrix of MANN; the other is to improve the model's forgetting mechanism so that its reading and writing processes conform to the human learning and forgetting mechanism.

The knowledge tracking model based on learning process (LPKT) aims to complete knowledge tracking by simulating students’ learning and memory process. Its structure is shown in

The attention mechanism in MANN can be understood as locating the knowledge points involved in the questions students answer. Attention is used in both the reading and writing processes of MANN. It is computed as follows:

First, the question encountered by the student at time $t$ is multiplied by a trained embedding matrix $A$ to obtain the vector $k_t$; the attention vector $w_t$ is then obtained by comparing $k_t$ with the static key matrix $K$:

$$w_t(i) = \operatorname{softmax}\left(k_t^{\top} \cdot K(i)\right) \qquad (1)$$

where $K(i)$ denotes the $i$-th row of the key matrix $K$ and corresponds to the $i$-th knowledge point $c_i$; $w_t(i)$ is the attention paid to the $i$-th knowledge point, i.e., the proportion to which the question involves that knowledge point; and the symbol $\cdot$ denotes the inner product of vectors.
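As a minimal pure-Python sketch of Equation (1), with a toy key matrix and illustrative dimensions (not the paper's trained parameters):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(k_t, K):
    """Eq. (1): w_t(i) = softmax(k_t . K(i)), one weight per knowledge point."""
    scores = [sum(a * b for a, b in zip(k_t, K_i)) for K_i in K]
    return softmax(scores)

# toy example: 3 knowledge points, embedding dimension 2 (illustrative values)
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k_t = [2.0, 0.5]
w_t = attention(k_t, K)
```

The weights sum to 1, so $w_t$ can be read as how the question distributes over the knowledge points.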

The reading process is the prediction process of knowledge tracking. First, according to the attention vector, the student's mastery of the knowledge points involved in the question is read from the student's knowledge state matrix. In DKVMN, this is calculated as:

$$r_t = \sum_i w_t(i)\, V_t(i) \qquad (2)$$

However, to account for forgetting in the learning process, two additional steps are required. First, we compute the student's amount of forgetting from the current knowledge state $V_t$:

$$e_t^{k} = \operatorname{sigmoid}\left(E_e^{k} V_t + b_e^{k}\right) \qquad (3)$$

Then, following the LSTM forgetting mechanism, the knowledge state $V_t'$ that conforms to the students' learning law is computed from the forgetting vector, the attention vector, and the input vector:

$$V_t'(i) = V_t(i)\left[1 - w_t(i)\, e_t^{k}\right] \qquad (4)$$

So we can modify the formula to:

$$r_t = \sum_i w_t(i)\, V_t'(i) \qquad (5)$$
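The read step with forgetting, Equations (3)-(5), can be sketched in pure Python; the matrix shapes and parameter values below are illustrative assumptions, not the paper's trained weights:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def read_with_forgetting(V, w_t, E_k, b_k):
    """Eqs. (3)-(5): forget part of each memory slot, then read attentively.
    V: N x d value (knowledge state) matrix, w_t: attention over N slots,
    E_k: d x d forgetting weights, b_k: length-d forgetting bias."""
    N, d = len(V), len(V[0])
    r_t = [0.0] * d
    for i in range(N):
        # Eq. (3): forgetting vector computed from the slot's current state
        e_k = [sigmoid(sum(E_k[j][l] * V[i][l] for l in range(d)) + b_k[j])
               for j in range(d)]
        # Eq. (4): decayed state V'(i) = V(i) * (1 - w_t(i) * e_k)
        V_i = [V[i][j] * (1.0 - w_t[i] * e_k[j]) for j in range(d)]
        # Eq. (5): accumulate the attentive read r_t = sum_i w_t(i) V'(i)
        for j in range(d):
            r_t[j] += w_t[i] * V_i[j]
    return r_t
```

When the forgetting vector is near zero, the read reduces to DKVMN's plain weighted sum in Equation (2).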

Then, the read vector $r_t$ and the input vector $k_t$ are passed through a multi-layer perceptron to obtain the vector $f_t$, which reflects both the student's knowledge state and the characteristics of the question itself (such as its difficulty), i.e., the student's comprehensive knowledge state for a specific question:

$$f_t = \tanh\left(W_1 [r_t, k_t] + b_1\right) \qquad (6)$$

Finally, the vector $f_t$ is passed through a sigmoid output layer:

$$p_t = \operatorname{sigmoid}\left(W_2^{\top} f_t + b_2\right) \qquad (7)$$

This yields the probability that the student answers the question correctly. This completes the reading process of the knowledge tracking method based on the learning and memory process.
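The prediction layer of Equations (6)-(7) can be sketched as follows; the weight values and layer sizes are illustrative assumptions:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(r_t, k_t, W1, b1, W2, b2):
    """Eqs. (6)-(7): summary vector f_t, then the probability of a correct answer."""
    x = r_t + k_t  # concatenation [r_t, k_t]
    # Eq. (6): f_t = tanh(W1 [r_t, k_t] + b1)
    f_t = [math.tanh(sum(W1[j][l] * x[l] for l in range(len(x))) + b1[j])
           for j in range(len(b1))]
    # Eq. (7): p_t = sigmoid(W2^T f_t + b2)
    return sigmoid(sum(W2[j] * f_t[j] for j in range(len(f_t))) + b2)

# illustrative shapes: len(r_t) = len(k_t) = 2, hidden size 3
p = predict([0.1, 0.2], [0.3, 0.4],
            W1=[[0.1] * 4, [0.2] * 4, [0.3] * 4], b1=[0.0] * 3,
            W2=[0.5, 0.5, 0.5], b2=0.0)
```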

The writing process in MANN updates the student's dynamic knowledge state in knowledge tracking. First, following the MANN mechanism, a question-answering activity $x_t = (q_t, a_t)$ is multiplied by another embedding matrix $B$ to obtain the vector $v_t$, which represents the knowledge increment gained by the student. Ha points out that a knowledge increment that depends only on the model is not enough to express what students gain during learning, and proposes that the student's knowledge state should also be considered when calculating the increment; the student's knowledge increment is therefore expressed as $v_t'$:

$$v_t' = [v_t, f_t] \qquad (8)$$

After obtaining the student's knowledge increment, we update the dynamic matrix $V$ in MANN with a method similar to the "forget gate" mechanism in LSTM, called "erase" in DKVMN. The "erase" vector that determines the amount of forgetting is generally computed as:

$$e_t = \operatorname{sigmoid}\left(E_e v_t' + b_e\right) \qquad (9)$$

From this formula, however, it follows that for the same student, the "erase" vector is identical whenever the knowledge growth is identical, which is clearly contrary to common sense. Moreover, Ha points out that computing the forgetting vector this way, as DKVMN does, leads to too much content being forgotten. Although Ha proposes a regularization method as a correction, that correction is not very interpretable.

According to human cognitive process [

$$e_t' = \lambda_1 e_t + \lambda_2 e_t^{k} \qquad (10)$$

where $\lambda_1 \in (0, 1)$, $\lambda_2 \in (0, 1)$, and $\lambda_1 + \lambda_2 = 1$; both are initialized to 0.5.
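The blended erase vector of Equations (9)-(10) can be sketched as follows; the weights passed in are illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def erase_vector(v_t_prime, E_e, b_e):
    """Eq. (9): erase vector computed from the knowledge increment v_t'."""
    return [sigmoid(sum(E_e[j][l] * v_t_prime[l] for l in range(len(v_t_prime))) + b_e[j])
            for j in range(len(b_e))]

def blended_erase(e_t, e_t_k, lam1=0.5, lam2=0.5):
    """Eq. (10): e_t' = lam1 * e_t + lam2 * e_t^k, with lam1 + lam2 = 1."""
    assert abs(lam1 + lam2 - 1.0) < 1e-9
    return [lam1 * a + lam2 * b for a, b in zip(e_t, e_t_k)]
```

Blending the increment-based erase $e_t$ with the state-based forgetting $e_t^k$ is what makes two students with the same increment but different knowledge states forget differently.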

After the values in the dynamic matrix $V$ are "erased", the update vector is computed from the knowledge growth vector, similarly to LSTM:

$$\alpha_t = \tanh\left(W_a v_t' + b_a\right) \qquad (11)$$

Finally, by first "erasing" and then updating, the student's dynamic knowledge state is updated as follows:

$$V_t(i) = V_{t-1}(i)\left[1 - w_t(i)\, e_t'\right] + w_t(i)\, \alpha_t \qquad (12)$$

That is, after the student's answering behavior at time $t$, the dynamic matrix is transformed from $V_{t-1}$ to $V_t$.
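The erase-then-add update of Equations (11)-(12) can be sketched as follows; shapes and values are illustrative:

```python
import math

def update_state(V_prev, w_t, v_t_prime, e_t_prime, W_a, b_a):
    """Eqs. (11)-(12): erase part of each slot, then add the new knowledge.
    V_prev: N x d value matrix, w_t: attention over N slots,
    e_t_prime: blended erase vector (Eq. 10), W_a/b_a: add-vector parameters."""
    N, d = len(V_prev), len(V_prev[0])
    # Eq. (11): add vector alpha_t = tanh(W_a v_t' + b_a)
    alpha_t = [math.tanh(sum(W_a[j][l] * v_t_prime[l] for l in range(len(v_t_prime))) + b_a[j])
               for j in range(d)]
    # Eq. (12): V_t(i) = V_{t-1}(i)[1 - w_t(i) e_t'] + w_t(i) alpha_t
    return [[V_prev[i][j] * (1.0 - w_t[i] * e_t_prime[j]) + w_t[i] * alpha_t[j]
             for j in range(d)] for i in range(N)]
```

Slots the attention ignores ($w_t(i) = 0$) are left untouched, so only the knowledge points involved in the question are erased and rewritten.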

The optimization goal of our model is to minimize the difference between the predicted and actual answer results, i.e., to minimize the cross entropy of $p_t$ and $a_t$. The loss function is:

$$L = -\sum_t \left(a_t \log p_t + (1 - a_t)\log(1 - p_t)\right) \qquad (13)$$

We train the model with stochastic gradient descent.
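The loss in Equation (13) is standard binary cross entropy; a minimal sketch (the epsilon guard against $\log 0$ is an implementation convenience, not part of the paper's formula):

```python
import math

def cross_entropy_loss(preds, labels):
    """Eq. (13): L = -sum_t (a_t log p_t + (1 - a_t) log(1 - p_t))."""
    eps = 1e-12  # guards against log(0) for saturated predictions
    return -sum(a * math.log(p + eps) + (1 - a) * math.log(1 - p + eps)
                for p, a in zip(preds, labels))
```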

We verify the effectiveness of our method on the ASSISTments 2009 [

This paper uses Apache MXNet, an open-source deep learning framework, to implement the knowledge tracking model based on the learning and memory process, and compares the experimental results with current knowledge tracking methods.

Standard DKT model: Piech implemented the DKT model in the Lua scripting language on the Torch framework. To facilitate our experiments, we reimplemented DKT in Python 3.6 on the TensorFlow-GPU 1.9.0 framework, referring to publicly available code.

DKVMN model: DKVMN is currently the best-performing knowledge tracking model. We use the public code provided by Zhang on GitHub.

In addition, in order to compare with DKVMN, the memory-augmented neural network settings in the two models follow Zhang's model settings. As shown in

| Datasets | #Students | #Questions | #Records |
| --- | --- | --- | --- |
| ASSISTments 2009 | 3091 | 110 | 315,527 |
| ASSISTments 2015 | 14,228 | 100 | 628,507 |

We divide the data set into a training set (60%), a cross-validation set (20%), and a test set (20%). We use cross entropy as the loss function, train with the SGD optimization algorithm, and set the learning rate to 0.005. We measure model performance with the area under the receiver operating characteristic (ROC) curve (AUC). AUC is a widely used performance measure in machine learning, especially for class-imbalance problems; moreover, most papers in the KT field use AUC as the evaluation index, so our experimental results can easily be compared with others in the field.
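AUC can be computed as the Mann-Whitney rank statistic: the probability that a randomly chosen correctly answered question receives a higher predicted score than a randomly chosen incorrectly answered one. A minimal sketch (a real experiment would typically use a library routine such as scikit-learn's `roc_auc_score`):

```python
def auc(scores, labels):
    """AUC via pairwise comparison of positives and negatives, O(P*N).
    scores: predicted probabilities p_t, labels: actual answers a_t (0/1)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```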

Our experimental results are shown in

It can be seen that on the ASSISTments 2009 data set, the AUC of the LPKT model proposed in this paper (82.35%) is 1.71% and 1.17% higher than that of the DKT model (80.64%) and the DKVMN model (81.18%), respectively, while on the ASSISTments 2015 data set, the AUC of LPKT (73.83%) is 4.62% and 1.15% higher than that of DKT (69.21%) and DKVMN (72.68%). The comparison shows that LPKT achieves a better tracking effect. Compared with DKT, which uses an LSTM, DKVMN and LPKT use MANN and thus have a larger capacity to remember more content, so the knowledge tracking effect is significantly improved. The advantage of LPKT over DKVMN is that its improved forgetting mechanism and knowledge-growth calculation are more consistent with the human learning and memory process, so it shows a better knowledge tracking effect.

| Datasets | Parameter | Value |
| --- | --- | --- |
| ASSISTments 2009 | Dimension of embedded matrix A | 110 × 50 |
| | Dimension of embedded matrix B | 220 × 200 |
| | Dimension of key matrix K | 110 × 50 |
| | Dimension of value matrix V | 110 × 200 |
| ASSISTments 2015 | Dimension of embedded matrix A | 100 × 50 |
| | Dimension of embedded matrix B | 200 × 100 |
| | Dimension of key matrix K | 100 × 50 |
| | Dimension of value matrix V | 100 × 100 |

| Model | ASSISTments 2009 | ASSISTments 2015 |
| --- | --- | --- |
| DKT | 80.64% | 69.21% |
| DKVMN | 81.18% | 72.68% |
| LPKT | 82.35% | 73.83% |

To further analyze the tracking effect of the three knowledge tracking models on the test set, we observe how their test-set results change as the number of training iterations increases. As shown in

In addition, we analyze the training-set and validation-set results of DKVMN and LPKT during training. As shown in the training curves, the gap between the training and validation results is not large. This means that although LPKT increases the parameter scale of MANN, it does not cause the model to overfit; that is, the increase in parameters is reasonable.

Based on the human cognitive learning process, we proposed a knowledge tracking model based on the learning process by improving the forgetting mechanism and knowledge-growth mechanism of existing knowledge tracking models, which makes knowledge tracking more consistent with the human learning process and enhances the interpretability of the model.

In this paper, the LPKT model is compared with the DKVMN and DKT models on two datasets. The experimental results show that the AUC score of LPKT is significantly higher than that of DKVMN and DKT on both datasets, and that no overfitting occurs despite the increased parameter scale. This demonstrates the effectiveness and superiority of our model. LPKT can be applied to a variety of online education platforms to help educators achieve personalized guidance.

The authors declare no conflicts of interest regarding the publication of this paper.

Zou, Y., Yan, X.D. and Li, W. (2020) Knowledge Tracking Model Based on Learning Process. Journal of Computer and Communications, 8, 7-17. https://doi.org/10.4236/jcc.2020.810002