Research papers investigating “retrieval practice” continue to be published rather regularly. Fortunately, they arrive at a pace I can almost keep up with.
This paper by Zheng et al. (2023) caught my attention because you rarely read papers that consider both ‘retrieval practice’ and the impact of ‘working memory’ on the well-reported and well-established ‘testing effect’. (The testing effect: testing or quizzing is a more effective retention strategy than, in this case, restudy. ‘Testing’ is just one component of a suite of strategies that make up test-enhanced learning.)
You know the routine. A learning phase, then a practice phase in which half of the learned items are studied again (i.e., restudy) and the other half are tested (i.e., retrieval practice). Then a final test phase. Items in the tested condition are recalled better than items in the restudy condition.
What Zheng et al. (2023) want to know is “how WM capacity affects the testing effect,” both backward (retrieval) and forward (potentiated), where WM is working memory.
First, Zheng et al. (2023) are quick to point out that both Tse et al. (2010) and Agarwal et al. (2017) reported that participants with lower WM capacity benefit more from retrieval practice than those with higher WM capacity, but also that this effect only emerged when feedback was given. Certainly, in lessons and even more so in homework, some of the pupils I teach report feeling “comfortable with quizzing.” (We call retrieval practice neither ‘retrieval practice’ nor ‘testing’.)
Second, when looking at test-enhanced learning with working memory in mind, Zheng et al. (2023) are also keen to separate the two processes set in motion by retrieval, whether it is prompted by a cue, a quiz, or a question: a retrieval-attempt process and a post-retrieval re-encoding process. The retrieval attempt involves maintaining the retrieval cue (holding the question in mind), actively searching for the target that matches that cue, and maintaining the retrieved information. The post-retrieval re-encoding process either strengthens the target, or assimilates and modifies it.
Simply put, retrieval draws heavily on a learner’s working memory resources, so as teachers there is a lot to consider when presenting a retrieval opportunity. Not least the length of the cue, its complexity, its diagnosticity, the breadth of content the target can be drawn from, how recently that target was last accessed, and the demands of maintaining the retrieved information, especially if it is multi-part, extended or complex. (Note: keep failure rates low.)
Unsurprisingly, therefore, Zheng et al. (2023) predict that if working memory resources are abundant, the testing effect will emerge regardless of the working memory demands of the stimuli. In contrast, for participants with limited working memory capacity, the testing effect will only emerge when working memory demands are low. (And yet, as a teacher, I have no measure of my pupils’ WM to work with.)
Working memory demands were manipulated by making items either high-frequency (HF) or low-frequency (LF) through three weeks of pretraining on Fribbles. Yes, Fribbles: novel stimuli used in behavioural research.
Note also that, given this extended commitment, the statistical power of this study is relatively low. And yet, from an educational perspective, it is one of the few retrieval studies that ‘feels’ like a teaching cycle.
What did we learn?
The researchers again confirmed the testing effect. They also found that the testing effect was moderated by both individual working memory capacity and working memory demands. That is, participants with “spare” working memory capacity benefited from retrieval practice regardless of stimulus frequency.
Takeaway
The results demonstrated that, as far as working memory is concerned, retrieval practice is a costly learning technique. Therefore, so is any quizzing or questioning-style task. That is not a bad thing, but it is worth teachers knowing.
The results also suggest that successfully retrieving an item does not guarantee that the memory is strengthened more effectively than by restudying the item. The testing effect involves a post-retrieval process that further strengthens memory, and that is important. This is why elaborative interrogation of BOTH correct and incorrect responses is a viable secondary strategy for teachers: “That is interesting! What makes you say that?” (Do remember there is an efficiency cost to these diversions.)
The “bottleneck of working memory only emerges when working memory demands challenge working memory capacity.” That makes sense. Why is this important? Teachers need to know what to do when working memory capacity is exceeded. (Giving pupils with limited working memory capacity time is crucial.)
As teachers, we are dealing with cues or questions, not Fribbles. Here are two recommendations for lowering working memory demands:
- Regarding the cue. Does it need simplifying (language) or shortening? Does its diagnosticity need tightening? Can we offer a hint within or alongside the question?
- Regarding the response. Always show the “correct and accurate” response and, importantly, do not overlook the demands of the “re-encoding process” that occurs after the correct answer is obtained. (This is a new consideration for me as a teacher.) Always expect pupils to upgrade their own response to at least match the “correct and accurate” response. This matters for pupil agency and responsibility, and for learning.
On designing cues or questions – and the requirement for responses to match the target.
This is just an aside to help illustrate the importance of re-encoding above. First, remember that giving feedback has a cost: it costs time. Second, test-enhanced learning means cues and targets. Pupils want to know if they were correct. (I want to know if pupils were confidently, correctly, and accurately correct.)
I use a 2, 1 and 0 mark scheme and the prompt: “Were you correct and accurate? Then 2 marks. 1 mark if you were correct.”
The phrase “correct and accurate” helps us leverage the empirically supported benefits of self-assessment while reducing the time spent following up on self-assessment queries.
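The 2, 1 and 0 scheme is simple enough to write down as a rule. As a minimal sketch, assuming each self-assessed answer can be reduced to two yes/no judgements (correct? accurate?), the Python below encodes the prompt and totals a quiz at 2 marks per question. The names `Mark`, `self_assess` and `quiz_total` are hypothetical, for illustration only.

```python
from enum import IntEnum


class Mark(IntEnum):
    """Hypothetical encoding of the 2, 1, 0 self-assessment scheme."""
    INCORRECT = 0
    CORRECT = 1               # right idea, but incomplete or imprecise
    CORRECT_AND_ACCURATE = 2  # matches the full model answer


def self_assess(correct: bool, accurate: bool) -> Mark:
    """Apply the prompt: 'Were you correct and accurate? Then 2 marks.
    1 mark if you were correct.' An answer that is not correct scores 0,
    whatever the accuracy judgement says."""
    if correct and accurate:
        return Mark.CORRECT_AND_ACCURATE
    if correct:
        return Mark.CORRECT
    return Mark.INCORRECT


def quiz_total(responses: list[tuple[bool, bool]]) -> tuple[int, int]:
    """Return (marks awarded, marks available); every question is out of 2."""
    awarded = sum(self_assess(correct, accurate) for correct, accurate in responses)
    return awarded, 2 * len(responses)
```

For example, a three-question quiz self-assessed as correct and accurate, correct only, and incorrect gives `quiz_total([(True, True), (True, False), (False, False)])` of `(3, 6)`. Once pupils have upgraded every answer in their books to match the model response, the same quiz would be worth full marks.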
Take the three versions of the “diamond” question below. All query the same knowledge, with varying complexity and difficulty.
| Question or cue | Response | Hint |
|---|---|---|
| On what code and what colour scale are diamonds graded? | D (colourless) to Z (light yellow) | Alpha codes |
| Diamonds are coded D-Z and graded c_________ to l_____ y_______. | D___________ are coded ___-___ and graded colourless to light yellow. | Alpha codes |
| Diamonds are graded D-Z. What colours do grades D and Z represent? | Colourless to light yellow | Is D a colour at all? |
It is tricky. The first possible error is that the question probes two parts: code and colour. Solution: write two separate questions. What code? What colour?
As for the responses, what mark(s) would you give the following answers to “On what code and colour scale are diamonds graded?”
- Colourless and light yellow (pupils would be expected to add D to colourless and Z to light yellow)
- Colourless and yellow (pupils would be expected to add D to colourless, and Z and ‘light’ to yellow)
- D-Z (pupils would be expected to add colourless to D and light yellow to Z)
In essence, the finished quiz, complete with these additions, would be worth full marks.
Answers 1 and 3 are correct, but none are accurate?
Hence we always mark questions out of 2 marks, we use self-assessment 90% of the time, and we always expect the full, accurate and correct answer to be recorded in exercise books following self-assessment.