Automated Scoring of Inquiry Learning Using Natural Language Processing

Donnelly, Dermot F.

New automated scoring technologies can score essays and drawings for forms of scientific reasoning that are valued in the workplace. We report on research showing that automated scores of students’ essays and drawings are as accurate as scores assigned by instructors. Our research explores effective ways to use these scores to guide students toward more coherent ideas about scientific phenomena. Guidance that encourages students to gather more evidence and construct a better explanation leads to better learning outcomes than guidance that mainly encourages students to try again or mainly provides the right answer.

For example, one of our automated items in a thermodynamics unit asks students to explain which spoon (metal, wood, plastic) will feel the hottest after being left in hot water for 15 seconds. Students submit an initial response, which is scored using Natural Language Processing (NLP) to detect scientific concepts. Students then receive targeted feedback based on this score. The feedback encourages students to revisit a relevant visualization and gather further evidence before writing a new response. Some examples of initial and new responses from our findings are:

Group 1 Initial Response: ‘the metal spoon will fell [student spelling] the hotter’

Automated Feedback: Good start. Now improve your answer using evidence from the finger/bowl activity. Explain which spoon will feel the hottest, and why.

Group 1 New Response: ‘the metal spoon will feel hotter because it conducts more heat.’

Group 2 Initial Response: ‘the metal spoon will feel the hottest because it will conduct heat the best.’

Automated Feedback: You are on the right track. Now revise your explanation using evidence from the finger/bowl activity. Explain the relationship between how the spoons feel and their temperature.

Group 2 New Response: ‘the metal spoon will be the hottest because it will conduct the best. Also metal feels more hot even if they're the same temp.’
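
The score-then-feedback loop illustrated by these examples can be sketched roughly as follows. This is a minimal illustration only, not the scoring system used in the research: simple keyword patterns stand in for the NLP concept detection, and the concept names, patterns, numeric score levels, and the functions `detect_concepts`, `score_response`, and `feedback_for` are all hypothetical. Only the two feedback messages are taken from the examples above.

```python
import re

# Hypothetical concept patterns. The source does not say how concepts are
# detected; keyword matching is a stand-in for the actual NLP scoring.
CONCEPT_PATTERNS = {
    "metal_feels_hottest": re.compile(r"\bmetal\b.*\b(hotter|hottest)\b"),
    "conduction": re.compile(r"\bconduct\w*\b"),
    "feel_vs_temperature": re.compile(r"\bsame temp\w*\b"),
}

# Feedback keyed by score. Messages for scores 1 and 2 are quoted from the
# examples above; mapping them to these numeric scores is an assumption.
FEEDBACK = {
    1: ("Good start. Now improve your answer using evidence from the "
        "finger/bowl activity. Explain which spoon will feel the hottest, "
        "and why."),
    2: ("You are on the right track. Now revise your explanation using "
        "evidence from the finger/bowl activity. Explain the relationship "
        "between how the spoons feel and their temperature."),
}

# Placeholder for score levels whose feedback the source does not give.
NO_MESSAGE = "(no feedback message given in the source for this score)"


def detect_concepts(response: str) -> set[str]:
    """Return the names of the hypothetical concepts found in a response."""
    text = response.lower()
    return {name for name, pattern in CONCEPT_PATTERNS.items()
            if pattern.search(text)}


def score_response(response: str) -> int:
    """Score a response as the number of distinct concepts it mentions."""
    return len(detect_concepts(response))


def feedback_for(score: int) -> str:
    """Look up the targeted feedback for a score."""
    return FEEDBACK.get(score, NO_MESSAGE)


if __name__ == "__main__":
    initial = "the metal spoon will fell the hotter"
    print(score_response(initial), feedback_for(score_response(initial)))
```

Under these assumed patterns, the Group 1 initial response scores 1 and receives the "Good start" message, while the Group 2 initial response scores 2 and receives the "right track" message.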

These findings can help instructors engage students in exploring complex topics such as photosynthesis, global climate change, or density rather than memorizing details. This presentation will showcase these automated scoring items and rubrics, demonstrating how inquiry can be accurately and reliably assessed to enhance student understanding of science.