Constructing the Exam
Video Transcription
Welcome to part three on this module on constructing valid examinations for our students. In this third module, we're really going to get to the meat of what we've been talking about, which is actually writing the exam items, also known as questions that go on our exams. The objective for this module is going to be to construct valid items utilizing standard item writing conventions. As a professional educator for many years and having served on exam committees for our National Certification Board, I have a lot of experience in writing exam questions, and I'm going to share some of those with you in this module so that you can put together exam questions that are valid and get to the point of what you're trying to assess in your student and to make sure that the questions are clear and doing the thing that you intend them to do, which is to assess the appropriate competencies and knowledge in your students. Just as a review of what led us to this point, we talk about the purpose of assessments being to ensure accountability for useful, applicable knowledge in our students, and I go over that point and kind of hammer that point over and over again because it can be very easy, as we'll see here in a minute, as we write test questions and exam items, it's really easy to write things that just assess a very basic kind of memorization of random facts. Those are very simple and easy to put together, and they probably, unfortunately, do not assess the useful, applicable knowledge that we want our students to have. So that's an important thing to always keep in our mind. Second thing is we're going to use exams and various assessments to assess students' progress toward meeting cascading objectives, cascading meaning the modular objectives that lead up to the course objectives that lead up to the program objectives. 
So we want to keep that in mind also as we put our exam together. Although we could construct an examination that's filled with a bunch of facts and figures for the person to memorize, the question we have to ask ourselves is whether that would really demonstrate to us that the student has met the objectives, which probably look a lot more lofty, such as being able to make valid decisions about differentiating medications or therapies, or sequencing the steps that need to be done in taking care of patients undergoing surgery. In other words, we need to assess the elements which demonstrate the achievement of those objectives. So that's an important piece. And then we also want to think about the cognitive level that lends authenticity to the exam. Once again, same concept. We want to make sure that we're not always defaulting to the very basic cognitive level of knowledge-based questions. In other words, we don't want to just ask the student to memorize something and regurgitate some fact, or just state what the definition of this is or what the dosage of that is. We want to use the right cognitive level, based on where the student is in their educational process.
So what is the challenge as we put together assessments that are valid? Well, first of all, it's very easy to put together an exam that assesses a low level of cognitive skills. It's easy to write questions that look like that, even though that's not necessarily what we need for our students. Another challenge is that the exam might not relate to the thought processes or the decision making that will be required in practice. Sometimes we can ask the students to recite certain things, but we have to ask ourselves first, is that really what they're going to need to do in practice? Is this really the type of thought process that they're going to have to pull out in practice?
So if we say to the student, for example, what are the five symptoms of this thing, that might or might not be the thought process they need to have in practice. In practice, they might need to differentiate the symptoms of this from the symptoms of something else that is related to it, so they can follow the correct path. Another challenge is that exam items sometimes become invalid due to non-exclusive answer choices, or due to content-irrelevant or construct-irrelevant variance, which we'll talk about in this module. So the challenge is that it's not so easy just to put together a question and some answer choices and think that it's necessarily going to tell you that the student is meeting the objectives that you think they are meeting.
Some of the terminology you're going to hear me using: first of all, we'll talk about items. The technical term for what we typically call a test question is not really a question, but a test item. It's a subtle distinction, but an important one, because we're not just asking questions. Sometimes an item might be an instruction, or pairs of choices that the person needs to put together, or things that they need to put in a certain order. So technically, we're going to call test questions test items. The stem of the item is the lead-in. It's the presentation. It's the thing that says to the student, here's the scenario. The responses are the answer choices, or the single answer choice, that the student is going to give in response. And of the potential responses, the one that is correct is called the key. In a multiple choice question, if you have A, B, C, and D choices, choice B might be the key. That might be the correct answer.
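The terminology above can be pictured as a small data structure. This is only an illustrative sketch, not something shown in the video: the class name `ExamItem`, its field names, and the placeholder item are all assumptions made for the example. The responses that are not the key are the ones referred to elsewhere in this module as distractors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExamItem:
    """Hypothetical model of one test item: stem, responses, and key."""
    stem: str                   # the lead-in that presents the scenario
    responses: tuple[str, ...]  # all answer choices shown to the student
    key_index: int              # position of the correct response

    def __post_init__(self):
        # The key must be one of the listed responses.
        if not 0 <= self.key_index < len(self.responses):
            raise ValueError("key_index must point at one of the responses")

    @property
    def key(self) -> str:
        return self.responses[self.key_index]

    @property
    def distractors(self) -> tuple[str, ...]:
        # Every response other than the key.
        return tuple(r for i, r in enumerate(self.responses) if i != self.key_index)


# Placeholder content, used only to show how the parts relate.
item = ExamItem(
    stem="Which response is the key in this illustration?",
    responses=("Response A", "Response B", "Response C", "Response D"),
    key_index=1,
)
print(item.key)
print(item.distractors)
```

Keeping the key as an index into the responses, rather than as separate text, is one way to make sure the key is always one of the choices actually presented.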
When we think about putting the question together, for example on the right side of the screen here, we're going to start with our blueprint. Again, we're going to start with the objectives. What is it that the student needs to know or needs to do as a result of passing this particular module in the course? That objective might be, for example, to apply gas laws to real-life situations. In that case, the stem of the question is going to be this part right over here: An endotracheal tube is inserted and the cuff is inflated to a pressure of 20 centimeters water. According to Charles' Law, what will happen to the cuff pressure as the air in the cuff is warmed up to body temperature? That's the stem. The responses are A, B, and C: it will go up, it will go down, or it will not change. Out of those three response choices, response A is the key. The pressure would go up, so a student who chooses A would be correct. So we've got the blueprint, we've got the stem of the question, and we've got the responses down here. One of those responses, and sometimes more than one, is going to be called the key, the correct response that we're expecting from the student.
Okay, some very basic rules then as we think about putting together our exam items. First of all, the stem should be straightforward. It should avoid extraneous information that is not required to get to the point that we're trying to have the student demonstrate for us. The distractors should be plausible, but clearly incorrect. That's really a big part of the challenge when you put together a multiple choice item: to come up with distractors, in other words, choices that are clearly not correct, but that also could be a consideration for the student. Because if you have distractors that are too obviously incorrect, then you're really not giving the student very much to think about.
Sometimes you end up with one clear answer and three clearly wrong answers, and you're not really assessing very much beyond the person's ability to pick one thing out of a list where it doesn't fit. The key should be clearly correct. The distractors should be mutually exclusive. In other words, there should be no overlap between one distractor and another. Choices B and C, for example, should not both be related to the same thing or part and parcel of the same thing. Because once again, it becomes easy for the student to game the exam and look at that item and say, well, these two are kind of similar, so neither of them could be correct if I just have to choose one correct response for this question. We'll look at some examples as we talk through some of these rules.
So, done incorrectly, for example, here's the stem. An endotracheal tube is inserted, and the cuff is inflated to a pressure of 20 centimeters water. What will happen to the cuff pressure as the air is warmed to body temperature? It will go up. It will go down. It will not change. It will get higher. So I added a response choice here compared to when I used this example just a second ago, and you will notice that there's some overlap now. One choice says it will go up and another says it will get higher. Those both basically say the same thing. In this kind of example, I said, well, up, down, not change, what else could there be? Because I want four choices. I put in a fourth choice just to fill the blank, and the choices now overlap. Going up is the same as getting higher. And so now one of two things is going to happen. It's actually an invalid question, because both A and D could be considered correct answers. But the smart student would look at that and say, well, gosh, A and D are both the same thing.
So neither of them could be correct. And so it's going to lead the student to an error. They're probably going to choose B or C, thinking that that's what makes more logical sense, even though neither of those is actually the correct key to the question.
Here's another example. A trauma patient wrecked their car in the snow and comes to the OR after a prolonged extrication. You intubate and inflate the cuff to a pressure of 20 centimeters water. According to Charles' Law, what will be the cuff pressure as the air is warmed to body temperature? Now we've got a problem with the stem and problems with the responses in this case. First of all, the stem is a problem because it is very wordy, and this kind of wordiness leads to something called construct-irrelevant variance. In other words, it leads the student to think about things and to consider things that you didn't intend them to consider. We're really just trying to test one construct, that the student understands one thing: what is the relationship between the pressure and the temperature of a gas? And in this case, we've added a lot of extraneous information. Sometimes we do this because we feel like it's going to give them something to think about, that we're going to make them think through a more difficult process. What it actually does, for a student who's really very astute, is invite them to read into it, because they'll say, well, you put this information here for a reason, so I'm going to use it. In this case, they might say: they wrecked the car in the snow and came in after a prolonged extrication, so I'm thinking that they were out in the snow for a long period of time. Maybe their body temperature is very low. And maybe what you're asking me is that the cuff pressure does not go up because they're very hypothermic. So putting this extra kind of window dressing in the stem is actually problematic, because it does not make for a more robust question per se.
It just makes for a more confusing question. So we want to be thoughtful about constructing a stem that is really succinct and gets right to the point of what we're looking for.
The other problem that we have here in the response choices is that we've changed from "it will go up," "it will go down," and "it will not change" to putting some numbers in. This kind of makes sense, because with this type of example it can be tempting to say, well, I just really want them to say it goes up or goes down, so I could put in some numbers. If you know that we started at 20, then 15 would be the same as saying it goes down and 25 would be the same as saying it goes up. But actually, 25 and 30 both would mean it goes up. And unless we know very specifically the degree to which it would change, which we don't, then we've got two answer choices that really should probably both be correct. The other thing is that we've put a non-plausible distractor in here. Once again, the answer choices should be plausible. A distractor should be something that gives the student pause to consider and say, that could be right. When we put in things that don't really make sense at all, that does not contribute to the student's thought process or have them thinking about the actual construct that we are trying to assess.
So, a quality stem. The stem of the item will focus on a single problem or a single theme. We want to try to avoid that window dressing, even though it sounds kind of interesting and fun and gives the students a lot to think about. Unfortunately, sometimes it gives them a lot to think about that distracts them from getting to the point that you're trying to assess in the first place.
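To make the problem with the numeric responses concrete, here is a minimal sketch that maps each number back to the direction it implies relative to the starting pressure of 20. The function names are made up for this illustration, and only the three values mentioned in the narration (15, 25, 30) are used, since the remaining choice on the slide is not stated.

```python
def direction(baseline: float, value: float) -> str:
    """Translate a numeric response into the change it implies."""
    if value > baseline:
        return "up"
    if value < baseline:
        return "down"
    return "no change"


def responses_implying(baseline: float, responses: list[float], expected: str) -> list[float]:
    """List every numeric response that implies the expected direction."""
    return [r for r in responses if direction(baseline, r) == expected]


baseline_cm_h2o = 20
numeric_responses = [15, 25, 30]  # values taken from the narration only

matches = responses_implying(baseline_cm_h2o, numeric_responses, "up")
print(matches)  # [25, 30]
```

More than one match means the item no longer has a single clearly correct key, unless the stem gives enough information to calculate the exact value.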
We want to avoid excessive wordiness as a general rule, and that leads to the idea of construct-irrelevant variance. That concept really has to do with a lot of different things. Typically it means an exaggerated increase in the performance on the exam, in other words, an artificially elevated score or performance. It could be due to a number of things. It could be due to cueing on the test. We'll talk about cueing in a minute. It could be that test items are shared across cohorts: you don't have good exam security, and one class tells the next class down about a particular question that stood out in their mind, and so they tend to get that question right just because they heard it before. Construct-irrelevant variance could be related to reading ability, very simple things like that. Whether you use very complex words or very simple words can change the performance on the exam. But, as I'll mention a couple of times as we talk about the test items themselves, you really want to think about it in the context of a single test item. We want to be careful not to put stuff in there that misleads the student into thinking that we're asking about something else, or that leads the student to bring in details that we don't intend to be there, because they are not related to the construct, or the concept, that we're trying to assess with a given exam item.
The article referenced at the bottom of the screen is about reducing construct-irrelevant variance in context-rich scenario items. In other words, when you're trying to write more clinically valid, clinically relevant items, how do you avoid that construct-irrelevant variance, that kind of distraction?
When we write a quality stem, we want it to be grammatically correct, both alone and in conjunction with the responses.
So we want to make sure that if the responses finish the sentence that we start in the stem, for example, it makes sense the way it reads. Because if anything seems to be an outlier, if the responses don't seem to go along with the stem, once again, it's going to make the student think down a different line. They're going to be thinking, well, it doesn't make sense grammatically, so therefore it's probably not the right answer.
We want to have the stem point to the desired response. We don't want these very generic, very wide open stems like "succinylcholine is" followed by a bunch of responses for the person to guess from. We really want to point the student by saying "succinylcholine is which type of muscle relaxant" or "what classification of medication" or something like that.
We want to avoid teaching in the stem. We want to avoid a stem that says something like: because denervation injury causes proliferation of extrajunctional receptors, potassium increase can occur after succinylcholine. In what case should succinylcholine not be given? We want to avoid that sort of teaching because we assume that the teaching is done in the classroom. And again, keeping the wordiness out of the stem means that we don't want to give the student a whole lot of background or lead-in. You just want to get to the point. What is it that I want you to know? What is the question I'm asking the student? Try to put that down as succinctly as possible in the stem of the item that you're writing.
The last thing on this point is that we want to avoid negative wording. We want to avoid saying "which of these does not happen," and double negatives are doubly bad: "this does not happen except when this other thing happens." If you do have to call things out, put them in capitals.
If you're saying which of these does not occur, put that in capital letters, because if the student gets the item wrong, we don't want them to get it wrong because they misread it grammatically. We want them to be very clear on what the item is asking them to do. Then if they get it right or wrong, that tells us whether they had the concept or not, not that they just misread it. If we're asking about something that's a negative, or something that we're flip-flopping back and forth on, such as "don't do this unless this other thing happens," those are inherently confusing. So try to call out those kinds of connector words, with underlining or bold or however you want to call them out, to make sure that the student understands exactly what it is that you are trying to have them think about.
Last couple of points about quality stems. Avoid the use of personal pronouns. This sounds like a very minor point, but avoid saying something like: the patient rolls into the operating room. What's the first thing that you do? When I use the pronoun "you," the examinee now has the opportunity to sit there and think, well, me, I'm a student, so the first thing I do is call the staff person. I call my faculty person to come in. It might not be getting them to the exact construct that you're trying to assess them on. So avoid using personal pronouns that, again, add some element to the consideration that's not what you're actually trying to assess.
And use present tense. The patient arrives in the operating room. What's the first thing to do? Present tense does two things. First of all, it gives you a standard: everything's present tense. The person arrives, I place the endotracheal tube. Second, if you use past tense and say the person arrived in the operating room, what are you doing right now? This introduces the idea of a time span.
And if there's a time span, once again, the student can start to think about other things. Well, let's see, they were in the operating room already, so they're probably already intubated, and now they're going to get the sequencing off. So we want to be as standardized as possible as we put the items together, and as straightforward as possible in making it very clear what we want the student to respond to.
I'll show you some examples of a very broad stem. Succinylcholine is... is what? Just by reading the stem, a person who is really smart about the topic should be able to guess what the answer would probably be. If you tell me that the question is "succinylcholine is dot, dot, dot," I have no way of even taking a wild guess. Are you going to ask me about the class of drug? About when it's contraindicated? About the cost of the drug? It really doesn't tell me anything at all. And then there are the responses in this case: it's dosed at one milligram per kilo, it's contraindicated in pediatrics, it's the only depolarizing relaxant. Choice number four here is an interesting point, metabolized by pseudocholinesterase, and you'll notice I intentionally misspelled pseudocholinesterase there with a double U, double letter U. That's a really interesting point because we want to proofread our exams. We'll get back to proofreading later as we wrap this up, because misspelling is another thing that can cause construct-irrelevant variance. The student might look at that and say, well, that's not spelled correctly, so I don't think that's the correct choice, right? Succinylcholine, of course, is metabolized by pseudocholinesterase, spelled correctly. But as a student reading this and taking this exam, I might say, hmm, that's not spelled right.
That's not the right thing. That doesn't trick me. The flip side of that, and some novice educators sometimes fall into this trap, is doing that on purpose, actually trying to trick the student. Ha, I caught him. They said it's choice four, but that's not really the right word. But again, back to the competencies. What is it we're really trying to assess in the student? If we want them to know what enzyme breaks down succinylcholine, it doesn't matter whether they know it's spelled correctly or incorrectly. What we want to know is, do they know pseudocholinesterase is what breaks down succinylcholine? So be really careful about intentionally or unintentionally putting in misspellings and things like that, because that is something that's going to influence the performance of the item in a way that is irrelevant to the student's actual understanding of the construct that you're trying to assess.
All right, that's putting together quality stems. Now let's think about the responses, on the other hand. The stem is the lead-in part. The responses are the answer choices. We want to think about having plausible responses to the stem. Sometimes, once again, we have this concept that we want four choices, four responses to a multiple choice question. And we might find ourselves writing something like this: which medication is the first-line therapy for acute bronchospasm? Epinephrine, theophylline, albuterol, and, gosh, I can't think of another one, I'll just put peanut butter. If you're doing this, you're doing nothing good for the student. You're doing nothing for the exam. You're better off just making it three choices, because that last choice is really just a distraction that no one's going to choose. We hope, right? No one's going to choose it, and it's really not doing anything good there. So avoid those things. You really want to put together a good exam.
This is no different than just having the three choices, because this is not giving the person a one in four chance of getting the thing correct.
When you put down answer responses, you might think about putting them in alphabetical order. We want to be careful about our own bias. Our own bias might be, well, if I'm putting out four choices, I don't want to put the correct key as choice A because they're going to find it right away. I don't want to put it as choice B because they're going to see it early on. I don't want to put it as D because they're going to see it last and they're going to remember it. And so where do we commonly tend to put our correct choice? It's answer C. A really great way to avoid our own human bias is to put things in order, numerically for numbers or alphabetically for text-based responses. If you always put things in ascending or descending order, one way or the other, it will help you to avoid the natural human tendency to try to hide the key somewhere in the middle of the response choices.
You also want to make sure that the responses are distinct from one another, and that can be a really challenging piece. For example, suppose we ask: what is the appropriate size endotracheal tube for a three-year-old? We've got a few problems with the answer responses here. We've got 8 millimeter, 4.5 millimeter, 4 millimeter, 5 millimeter, 17 French. One problem is that depending on how we calculate this, if we use Cole's formula based on age, we would come up with 4.5 as the correct size. If we use the Motoyama formula, we'd come up with 4.0, and so we don't have distinctness. We've got two answer choices that both could be correct. We want to think about referencing our exam items and answers to the literature.
If there is more than one answer that can be correct, if there's some normal variability to it, then we want to think about other ways of presenting that item to the student. In this item also, the choice of 8.0 millimeter is like a peanut butter answer. If a student has been in the operating room even one day and thinks that an 8.0 tube would go in a three-year-old, they're probably way, way off base. The likelihood is that no one's going to choose that answer if they're taking this pediatric exam, and so that's not doing you any good. Then the other thing is the 17 French. Technically, 17 French is the same as a 4.0 internal diameter tube. We want to have consistency in the responses. We want to make sure there's not overlap in the responses. If four is my answer, you see what I did, that human tendency: I tried to tuck it right in the middle so that maybe they wouldn't see it first and they wouldn't see it last. But really what I ought to do in this case is say 4, 4.5, 5, 5.5, 6 or something along those lines, or think of another way to present this.
In a similar way, as we think about overlap, there's a question about microshock. The amount of electrical energy needed to induce V-fib when applied to the myocardium is 100-200, 200-300. Well, there's a little problem, because if I think 200 is correct, then choice 1 or choice 2 could be the correct answer, so there's some overlap there. Technically, I should say 100-200, then greater than 200 up to 300, maybe something like that. But the answer choices have also now become very close together, because I didn't want to spread them out too much, since it would be too easy for them to know I'm looking for a low number. So that becomes difficult. Am I really assessing what I think I'm assessing?
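The overlap between range-style responses can be checked mechanically. The sketch below is only an illustration: the helper `overlapping_pairs` is a hypothetical name, ranges are treated as closed intervals with both endpoints included, and the repaired set assumes whole-number values so that "greater than 200" can be written as 201.

```python
from itertools import combinations


def overlapping_pairs(ranges: list[tuple[float, float]]) -> list[tuple]:
    """Return every pair of closed ranges (low, high) that share a value."""
    return [
        (a, b)
        for a, b in combinations(ranges, 2)
        if a[0] <= b[1] and b[0] <= a[1]
    ]


# Ranges as read out in the example: 200 belongs to both choices.
as_written = [(100, 200), (200, 300)]
print(overlapping_pairs(as_written))  # [((100, 200), (200, 300))]

# One possible repair, assuming whole-number values only.
repaired = [(100, 200), (201, 300)]
print(overlapping_pairs(repaired))  # []
```

A check like this only catches the overlap. It does not answer the separate question raised above, which is whether such closely spaced ranges assess the intended concept at all.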
When I ask the students to make the distinction between 100-200 and 200-300, those are both very small amounts of electricity, which is the point that we're trying to get at. So in this case, maybe you're better off with a wider range. You might ask, what's the minimum amount? Maybe you just put 100, 300, 500, 800, one milliamp, two milliamps. Put a wider range and just ask, what's the minimum amount that would be needed to induce V-fib in microshock, for example. Because with these very fine distinctions between different levels of energy, they're all essentially the same. The point is that a very small amount of electricity is going to induce V-fib in microshock. So making the student memorize the fact that 100-200, versus 200-300, versus 300-400, was what was in the book doesn't really get to the concept that you're trying to assess in the student. So we want to make sure that items appropriately reference the literature and also allow for natural variability.
When we talk about natural variability, think about things like this. There's a lot of variability in clinical practice, which is one of the things that makes writing clinically based questions pretty difficult. In this case, for example: in a patient with a 60 percent burn, which of the following should be avoided? Rocuronium, morphine, succinylcholine but only 24 hours after the injury, and Coumadin. We've got a couple of problems in the construction of this item. One of them is subtle, but it gets at this point about referencing the literature. Some of the literature will say that burn patients become tolerant to opioids. So a student might say, well, gosh, morphine should be avoided, because the stem doesn't really point to the key issue, right? "Should be avoided" versus "is contraindicated" would be a big distinction.
With "should be avoided," you might say, well, gosh, I know when I did this clinical case the other day, they said to avoid morphine because the patients don't really respond to it, and that's why we use ketamine.
The other problem with the responses on this test item is choice C. Once again, we tucked the answer into choice C, hoping that the student wouldn't notice it. But choice C, you'll notice, is also very different in construction. The student would look at this and say, let's see, it's capitalized, number one, which makes it stand out, and it also has a lot of details. There's a caveat in there: succinylcholine, but only if it's 24 hours past the acute injury. The student's going to look at that and say, why did they feel the need to put the caveat in there? Obviously, because they had to make that be the correct answer, since succinylcholine in and of itself would not necessarily be the correct answer without that caveat. The fact that you had to put it in there makes it really stand out. So we want to be really careful about writing a response that cues the student, that's called cueing, to know that this is probably the answer or the key that you wanted.
We want to be careful to keep the responses similar in length. We want to avoid cueing. We want to avoid things in the responses that give the student a little heads up that this one stands out for one reason or another. It's grammatically different. It's different in length. It's capitalized. It's written as a full sentence, or whatever it is. We also want to keep the responses succinct and put the details in the stem.
That's the other piece that we want to do. For example, in this case, if we had more details and wanted to put that caveat in there, we might rewrite the item as "which is true," which is once again a very unfocused stem, followed by three choices about succinylcholine that are all the same length. Now we've fixed that piece, except that we've got three responses that each talk about something just a little bit different. One talks about the timing of it. One talks about use in burn patients in general. One talks about the potassium level. Maybe a better, more straightforward approach would be to put a little more detail in the stem: when can succinylcholine be used in a burn patient, period. Now we can have three answer choices that don't have great variability. Of course, choice A might say it should never be used in a burn patient, just to keep real consistency between the responses. But now we've got three response choices that are a lot more straightforward and a lot easier for the student to look at and get to the point that you're trying to get them to.
The final thing is that we don't want to use absolutes: always, absolutely, never, all of the above, none of the above. Those really don't help us, because they tend to shift the student's attention either toward or away from those answers. It gives them a big cue. As you know, in anesthesia very rarely is anything always done or always the right answer. We want to be really careful about using those words in the distractors, because usually it cues the student to know that that's not going to be the right thing.
Now, the multiple choice question is probably the most common item type that we tend to use. But there are other formats that we can use besides the MCQ, the multiple choice question. The intention then is to lend more authenticity to the assessment.
In other words, that we are assessing the student's ability to do something other than connect some words or connect some concepts that are presented in a textual basis, but rather to make distinctions, to make ordering between things, order and prioritize things. We can use various alternate formats to provide this opportunity for the test taker to demonstrate these competencies. They're not just related to pick out the right answer out of a number of choices. We'll talk about some of these individual item types and how you might be able to employ them and when they might be useful. Multiple correct response is a good one because it provides more discrimination. When we say discrimination in assessment parlance, we're talking about discriminating. What the exam is supposed to always do is discriminate between someone who is highly performing or highly achieving in these various concepts that we're trying to assess or low performers on the concept. So discriminatory value just means to what extent does the item individually or the exam as a whole tell you what you're trying to figure out by giving the test, which is who knows this stuff versus who does not know this stuff. Obviously, you might say, well, of course, that's what a test does. But when we think about that construct irrelevant variance, if there are distracting, misleading things, if there are misspellings in the test items, someone might get the test question right or wrong, not because they knew the concept or not, but because of something else. And so that type of thing is a great example of something which lowers the discriminatory value of the exam. It does not do as good a job of actually telling you who understood and did not understand. It just said that your exam was kind of confusing. So multiple correct responses have better discriminatory value. They will ask the examinee to create finer distinctions. So it might be which of these four out of these eight things is the most common. 
You can ask them to prioritize things. You can ask them to order things. What are the first four steps? These might all be correct steps that you do, but what are the first four steps or the first three steps or the first eight steps? One of the conventions of putting these multiple correct response items together is that you want to provide twice as many choices or responses as the correct number that you're seeking. So if you ask them to choose four of something, then you're going to want to have eight responses. It's not as useful to say, choose the four out of six that are presented. As a general convention, you want to have twice as many responses as the number of choices that the examinee is going to choose. And then in terms of scoring, you can choose to score these with partial credit or as a binary right or wrong answer, meaning that they need to get them all right: it's either all right or all wrong. There are some reasons why you might go one way or the other. If the answer choices are all sort of different concepts, you might say, which of these four medications will be appropriate in this particular scenario? Maybe you'd be happy that the person could identify two of them, and if they didn't get all four, you'd still give them partial credit. If you ask them, put these four steps in order, and they get two steps right but two steps wrong, you might say, well, they wouldn't be able to do the thing in the right order, and I'm going to give them no credit whatsoever for it. So that's up to you as you determine how to put the exam together, whether you want to do all or nothing or partial credit on these multiple correct response items. An example of this format might be something like this. While pulmonary embolism can present in various ways, which are the most common presenting signs? 
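To make the two scoring choices concrete, here is a minimal sketch in Python of all-or-nothing versus partial-credit scoring for a multiple correct response item, plus a check of the twice-as-many-options convention. The function names and the option letters are hypothetical, made up for illustration, and the partial-credit rule (fraction of keyed responses selected) is just one reasonable assumption; your testing platform may well score differently.

```python
def score_binary(selected, key):
    """All-or-nothing: full credit only if the selection matches the key exactly."""
    return 1.0 if set(selected) == set(key) else 0.0


def score_partial(selected, key):
    """Partial credit: fraction of the keyed responses that were selected.

    Assumes the examinee was told to choose exactly len(key) options,
    so there is no penalty term for over-selecting.
    """
    return len(set(selected) & set(key)) / len(key)


def follows_two_to_one(n_options, n_correct):
    """Check the convention of offering (at least) twice as many options as
    correct answers. Reading 'twice as many' as a minimum is an assumption."""
    return n_options >= 2 * n_correct


# Hypothetical item: eight options lettered A-H, four of them keyed as correct.
key = {"A", "C", "D", "G"}
selected = {"A", "C", "B", "E"}  # examinee found two of the four

print(score_binary(selected, key))   # no credit under all-or-nothing
print(score_partial(selected, key))  # half credit under partial scoring
print(follows_two_to_one(8, 4), follows_two_to_one(6, 4))
```

The same response earns nothing under one rule and half credit under the other, which is why the decision belongs with the item writer: it depends on whether getting some of the choices right means anything on its own.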
Now, one of the conventions that our students will see on the certification exam is that because the exam is an adaptive exam, it can only score an item as correct or incorrect in order to know where to send the student for the next item. And so for anything that's a multiple correct response item, it's going to tell the student, choose X number. It's going to tell the student how many choices that they need to come up with. And that's important because if you say, which of these are found in pulmonary embolism? Actually, all of these could be found in pulmonary embolism. And so it's just sort of a little bit of a gamble for the student to figure out where's the cutoff? You know, what point did you want me to cut off? Which ones are more common or less common? So it's a really nice convention. You know, if you want to help the student to be really clear on what your expectations are is to say, choose four of these or choose two out of the four or choose three out of the six or whatever it is. And this works well if you're doing the binary scoring, because now you're not asking the student to make that distinction of where the cutoff is anymore. But you're actually saying to them, I want you to identify the top four or the first four steps or whatever the choices are that you're looking for. In this case, for example, chest pain, dyspnea, cough, tachypnea occur in more than 50% of patients with pulmonary embolism. So those are the four choices that I want the student to be able to choose. And the other ones, while they are potential signs, they occur statistically less frequently. The other thing that you will notice if you look at how this item is constructed is that I put them in alphabetical order. So it prevented me from the potential to try to let some human bias come in here and say, well, I want to spread the correct answers out. I don't want the correct answer as the first choice. I put them alphabetically. 
And then wherever the correct answer is, is where it is, or are, in this case. And that's a good convention for how you order the response choices in any kind of assessment. The nice thing about the multiple correct response also is that it does get to a higher level of thinking, a higher level of cognitive ability, because you're now not just asking the student, you know, here's the stem, what's the answer to that? You're asking them to make distinctions. You're asking them to put different things together. Here's a great example of a multiple correct response question, and maybe this would be a great one to do as partial credit, because each of the responses is a little bit different. So each one does not necessarily rely on another one being correct. In this case, we're asking the student, with a patient breathing through a circle system, a finding of five millimeters of mercury of inspired carbon dioxide could be caused by which of these issues? So first of all, the student needs to recognize that five millimeters of mercury of inspired carbon dioxide means that the person is rebreathing. That's the first thing they need to know. But then as they look at the answer choices, they need to make an independent judgment about each one. They need to say, well, absorber drain plug, that would lead to a leak. That would not lead to rebreathing. A hole in the bellows housing might lead to a loss of tidal volume on the ventilator, but it would not lead to rebreathing. Ventilator relief valve stuck open. Well, the relief valve is on the backside of the machine. So that would not cause the patient to rebreathe. So they really need to assess each of these things independently. And this would be a great example of where I might give them partial credit, because it's almost like each of these is a separate thought that they need to figure out. 
And then they're gonna come down to saying things like, well, the unidirectional valves or the soda lime being exhausted are all things that would cause the patient to rebreathe carbon dioxide. Fresh gas flow being too low in a partial rebreathing system, yes, but not in a circle system. Hotspot questions are a great opportunity. If you have a testing system that will allow you to do this, hotspot questions can really allow you to get at some very good clinical scenarios, clinical skill competencies. And that's one of the challenges and criticisms people sometimes have of written tests: how does this written assessment say that I'm gonna be a good anesthetist? And obviously it's all the clinical evaluation that goes on in our programs that assesses that piece. But it's nice, and I think it feels good for the examinee, and it feels good for us, when we can feel like we're assessing these clinical decision skills as well. And the hotspot items help us to do that in a really interesting way. So a hotspot is basically a photo or some kind of graphic that the student clicks on using their mouse. And so it provides for essentially an infinite number of possible responses within the parameters that you set. The one downside I'll caution you about is that it can really be overused. We wanna use all of these formats where they give us what we need and the feedback that we seek on the examinee's performance, not just as window dressing to make the exam seem kind of interesting. So I'll show you examples of this done well and not so well. This is not so well done as a hotspot. Click in this picture where the endotracheal tube should be placed. Well, I could ask the student to bring their mouse and click over here on the glottis. But if the student knows that it goes in the glottis, well, they're just gonna see that word there and know, well, that's where it goes. 
The fact that it's a picture, but it's a black and white drawing, does not really approximate what I see when I look down into the throat doing a laryngoscopy. So it's not really lending that bit of clinical applicability to it. And so as a hotspot, this really isn't doing very much at all. Especially with the word glottis there, I could just make this a multiple choice question and say, where does the endotracheal tube go, with one of the responses being that it goes through the glottis. Another way I might approach this is like this. Again, not well done. Click in the picture where to perform a cricothyroidotomy. Okay, getting a little bit better, because I'm asking the student to actually click somewhere, and it's something that there might be some variability to. But you also notice these lines here. These lines, where I've obviously taken away what were previously the labels, unfortunately tell the student that these are the discrete choices. So we've now gone from an infinite number of choices to a small number of very discrete choices. And if you're going to provide something like this, you might as well just label these A, B, C, D, E, F and have it as a multiple choice question: which letter is indicating the anatomical part where you would perform a cricothyroidotomy? So the way that you might employ a hotspot question in a way that really does the best for you is something like this. Stick with that same concept about airway management. The stem might say something like this: palpating solid structures, which are outlined in blue in the figure, click in the figure on the appropriate place to perform a cricothyroidotomy. So now you're not asking the student just to memorize a phrase, like, well, make a hole in the cricothyroid membrane. You're first of all asking the student to know it clinically, without any labeling, because the patient's not going to tell me what's what. 
If I feel something hard up here, the first thing I need to know is that that's the hyoid bone. And if I feel this larger thing of a different size and shape, that's the thyroid cartilage. And the next hard thing I'm going to feel is the cricoid cartilage. And in between there is the cricothyroid membrane. This gives them an infinite number of choices, because even within the correct choice, however many pixels are in between there, they could click anywhere within what you would define as correct. Maybe you would define a box that would look sort of like this as the correct area. But it also gets away from the more discrete figure that I just showed you with the lines on it, because someone might confuse a cricothyroidotomy with a superior laryngeal nerve block. And they might say, oh, we go back here to the greater horn of this cartilage. So they might think that it's done somewhere more laterally. They might think that it's something done down here in the sternal notch. So this is a great example of the way you can use a hotspot item to bring the student to really demonstrate something that feels really good in terms of clinical relevance and really gets them away from just words and text and things like that. Another good example of that is with peripheral nerve stimulator placement. For example, you would say, in this picture, click on the correct location for placement of the peripheral nerve stimulator. Now you could present this as a multiple choice or text-based question, and the options would say something like, on the ulnar side of the arm between the elbow and the wrist, or something to that effect. But again, you're assessing the person's ability to memorize a certain word pattern, you know, words and terminology, as opposed to a situation like this, where you will define what you consider the correct response area for the hotspot question. 
And then when the student clicks anywhere in there, they might click here closer to the elbow, they might click down here closer to the wrist, and any of those will be considered correct. Now, a little caveat, and I use this picture intentionally to show you this caveat: depending on your testing platform, do think about these little pitfalls. Sometimes the platform only allows you to draw a rectangular correct area, such as you see here. Now, probably no student is going to click out here in, you know, empty space. But sometimes, depending on what you're showing, you'll end up with the correct area incorporating some incorrect areas. So be careful, in terms of how the platform that you're building these questions in works, that the correct area doesn't inadvertently include some area that some students might click in, and the exam system is going to say, yes, that was a correct response to the item. Fill in the blank type questions are really useful to get away from some of the pitfalls I talked about with the multiple choice type questions. Think about some of those things like endotracheal tube sizes, calculations of drug dosages, calculation of mean arterial blood pressure, calculating the body surface area, all these things that have different formulas that are all considered correct, where you can't nail it down that there's just one specific right way to do it. Even with some things that involve decimals, if there's any sort of calculation that involves decimal points, you always have this issue about students rounding. Do they round up or down? Do they round before or after they came to the final answer? And so fill in the blank can be really beneficial in those types of cases, because it doesn't require you to give the correct choice. So it allows a much wider range. 
So if the student goes way off on their calculation, they're not going to be able to say, oh, wait a minute, that was not one of the choices, so I know that I did that wrong. And it also allows you to accept a range of answers. So if you asked a question, for example, like this: what would be the calculated intubating dose of succinylcholine for a 50 kilogram patient? Now, you might consider the answer to be anything between 50 and 75 milligrams, if you're using one to one and a half milligrams per kilogram as your reference for the dosing. And you could key the answer that way. So you could tell the computer, and I'm assuming a computer based exam, that anything between those numbers is going to be considered correct. Do be careful, again, thinking about the testing platform that you're using, because it might be looking for a specific answer. So if you type in that the key is 50 and 75, and the examinee writes 50 dash 75, or 60, or 70, or something like that, you want to make sure that those aren't counted as being incorrect. So do be thoughtful about how you key these responses so that they can encompass everything that the student might put that would be considered correct. This will also come down to problems with decimals sometimes. So if the calculation comes out to 50.1, and you put 50 as the correct answer, but a particular student says, well, I'm going to be really precise and put 50.1, you don't want that to be counted as incorrect, because again, it's construct irrelevant. The fact that the student would think that 50.1 is a correct dose would not mean that they're going to give an incorrect dose clinically when they're working this out. So you want to think ahead to the range of responses that would be considered correct, to make sure that your answer key encompasses all of those. 
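One way to picture a correct response area is as a hit test: the platform checks whether the click coordinates fall inside a region you drew. The sketch below, in Python, compares a rectangular area with a polygon traced around a diagonal strip (think of a forearm running across the picture at an angle). All coordinates and names are hypothetical, and a real testing platform may offer only one of these shapes; this is an illustration of the idea, not any platform's actual API.

```python
def in_rectangle(click, rect):
    """True if the click falls inside an axis-aligned rectangle.

    rect is (left, top, right, bottom) in pixel coordinates.
    """
    x, y = click
    left, top, right, bottom = rect
    return left <= x <= right and top <= y <= bottom


def in_polygon(click, vertices):
    """Ray-casting point-in-polygon test for a list of (x, y) vertices."""
    x, y = click
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray from the click cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical correct area: a diagonal strip across the image.
STRIP = [(0, 0), (20, 0), (100, 80), (80, 80)]
# The smallest rectangle that contains the whole strip.
BOUNDING_BOX = (0, 0, 100, 80)

on_target = (50, 40)   # a click in the middle of the strip
off_target = (90, 10)  # a click in the empty corner of the rectangle

print(in_polygon(on_target, STRIP), in_rectangle(on_target, BOUNDING_BOX))
print(in_polygon(off_target, STRIP), in_rectangle(off_target, BOUNDING_BOX))
```

The off-target click is outside the strip but inside the rectangle, so a rectangle-only platform would score it as correct. When the structure of interest sits at an angle, it is worth test-clicking the corners of the keyed area before the exam goes live.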
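The keying advice above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the half-milligram rounding tolerance is an arbitrary choice for the example, and a real testing platform will have its own way of entering a range key. The dose range uses the 1 to 1.5 mg/kg reference from the succinylcholine example.

```python
import re


def dose_range_mg(weight_kg, low_mg_per_kg, high_mg_per_kg):
    """Acceptable dose range in mg for a weight and a per-kg reference range."""
    return weight_kg * low_mg_per_kg, weight_kg * high_mg_per_kg


def parse_numbers(response):
    """Pull every number out of a free-text response such as '50-75 mg'."""
    return [float(tok) for tok in re.findall(r"\d+(?:\.\d+)?", response)]


def is_correct(response, low, high, tolerance=0.5):
    """Accept a single value or a range, as long as every number given
    falls inside the keyed range, widened by a small rounding tolerance.

    The tolerance value is an assumption chosen for this example.
    """
    numbers = parse_numbers(response)
    if not numbers:
        return False
    return all(low - tolerance <= n <= high + tolerance for n in numbers)


low, high = dose_range_mg(50, 1.0, 1.5)  # 50 kg patient, 1 to 1.5 mg/kg

for response in ["60", "50-75", "50 - 75 mg", "50.1", "500", ""]:
    print(repr(response), is_correct(response, low, high))
```

Keying the item as a range rather than a single string means that 60, 50-75, and 50.1 are all accepted, while a calculation that went badly wrong, like 500, is not. Whatever the platform, it is worth trying a handful of plausible student responses against the key before the exam is delivered.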
Matching is another format where you have the students match things together. And so this is another another good way of getting beyond just the kind of, you know, putting the connection, here's a question, here's the answer, but putting a number of things together, making connections between numerous items, and sometimes matching type questions can be used for ordering as well, putting things in the correct order. So it can be used for ordering and prioritization, it can be used to differentiate things, connect different concepts, connect concepts that go together. You do want to, again, be thoughtful about the directions that you give the student in terms of how they put, how they do, what they're supposed to do on the matching items. So in some cases, you might see, so this particular item is asking to match basic gas laws along with the variable which is constant in those individual laws. This could have a line after number one where the person would say, oh, Boyle's temperature is constant so I'm going to put a B. That might be the way I respond to this question. The platform might be that I can use my mouse and pick up this blue box and drag it onto this number one orange box and that's the way that I do the match. So your stem should give the person the right direction, what you want them to do. So you might say, drag the constant temperatures from the right side onto their corresponding gas laws on the left side or however it is. But think about giving the students some clear direction on how they're going to respond to this question. Okay, with that overview of the various item types, let's do some practice. So let's look at some. And one of the important things, just like writing, just like professional writing or scientific writing, the writing is the one thing but it's the rewriting and the revising, the reviewing that really gets you to a quality product. And so test item writing is a really difficult thing. Don't expect to jump into it. 
If you're new at this, don't expect to jump in and be able to knock it out of the park right from the outset. Expect to look over them and revise them and review them and have other people look at them and proofread them for you because it does take time to really get down to getting all these little pieces right to where the test item does just what you want it to do without being confusing or misleading or sometimes leading the examinees to answer things that are not intended. So let's look at some items. And what we'll do here is I would suggest read the item. I'll read through the item and then I'd suggest pause. Pause the video and then just think about it for your own self. Think about it for yourself. You know, give me a second, a couple seconds in between here to pause, think about it and then start up again and then hear me talk through what I think would be a better way to approach these different items. And so we can do a little self-assessment here as we go. So in this first item, the stem is that a 16-year-old male is admitted into the OR for knee arthroscopy, possible open repair following an injury sustained in a football game. He's well developed and quite athletic. He is induced with propofol and succinylcholine is given to facilitate endotracheal intubation. After induction, his capnograph begins to rise. What is the most likely problem? So go ahead and pause there for a second and then think about what you think about this stem without any responses. It's just the stem and then come back and we'll talk about it. Okay, so this stem, just very straightforward. It's too much. It's just too much. It's too wordy. There's too much going on in here. There's a lot and you can see the intent of the exam item writer. They're writing this exam. They want to be interesting. They want to, they want the person to put themselves in this scenario. This 16-year-old kid came in. 
He's having an arthroscopy and he had a football game, but none of that really means anything. And the problem with that is that the more details you put in, the more likely it becomes that an astute student finds some connection to something that you did not intend. And so now they might put together, well, he's a football player and he's athletic, and maybe that has something to do with metabolic disease. Meanwhile, there's not actually a lot of context for the actual concept itself, which just says that he's induced with propofol and sux and the capnograph begins to rise. It doesn't say anything about, was he intubated? Was he not intubated? Were there breath sounds auscultated after the intubation? So it leaves a lot of important details out while putting a lot of irrelevant details in. So this is a stem that's not actually very focused, in spite of the fact that it is very verbose. It's very wordy and really does not direct the examinee to the actual issue that we're looking for. All right, next item. Isoflurane. Answer responses. One, decreases systemic vascular resistance. Two, has direct myocardial depression. Three, inhibits myocardial steal syndrome. Four, has sympathetic effects. So let me give you a second to think about that one. Pause the video. Think about what you see right or wrong with this and then start up again. We'll talk about it. So this stem goes to the opposite end of the spectrum. The last one was very verbose and very wordy, but also sort of unfocused. This one is also very unfocused, because it does not give any direction at all. It does not say, I want you to tell me about isoflurane as it relates pharmacologically, or as it relates physiologically. It's just isoflurane. So if I can't read the stem and have some idea of what the answer is going to be, then it's not a good stem. Secondly, as we look at the response choices, they're all a little bit different. 
Some are capitalized. Some are not capitalized. Some have, most of them have periods. One does not have a period. That can be a cue to the examinee to say, well, that's probably the right one because it's the one that stands out to me. There's also no sort of context that goes with it. Decreases SVR, direct myocardial depression. Sure, but at what doses? That could be a yes or no. That could be a correct or incorrect answer based on the dose at which you are giving it. And so really a better stem would say isoflurane at a dose of 2 MAC or 2% will have what effect on the heart or will have what effect on the SVR or will have which effects which would tend to reduce the blood pressure. There's a lot of different ways that you might construct a question related to talking about the effects of isoflurane, but just isoflurane with a colon does not get you to where you want to be in terms of directing the student toward the concept that we're trying to test them on. Okay, next one. A patient has chest pain following induction. This is most likely due to response choices are one, decreases systemic vascular resistance, two, myocardial ischemia, three, overdose of propofol, four, he is scared, he has cancer. Pause the video. Think about this question for a minute and then come back and let's talk through it. Okay, so we've got a number of problems with this item. A patient has chest pain following induction. This is most likely due to, now first of all, there's not enough context there. So there's not enough to say what type of patient. Are we talking about a patient who had an MI two days ago? That's the first thing. So the stem is not very focused. Secondly, the response choices are really kind of wonky because we have a lot of inconsistency between them. If we look at the choices, number one, for example, says decreases systemic vascular resistance. 
Well, a decreased systemic vascular resistance could be a cause of chest pain, but as I look at that, I see two things that don't make sense. First of all, the D is capitalized and so it doesn't seem to follow as the second part of the sentence. The stem is kind of written as a sentence as you would read through. And so grammatically, that doesn't make sense, right? Most likely due to decreases and not decreased. So the fact that it doesn't fit grammatically makes me as a test taker say, that seems like it couldn't be right because it just doesn't, it doesn't make sense. It doesn't quite flow. Myocardial ischemia, overdoses of propofol. So again, I made an intentional misspelling here because once again, it leads to a construct irrelevant variance. It leads that student who on one hand, the student that doesn't know how to spell propofol might, might like that and say, sure, that sounds good. But another student might look at that and say, well, that's not the actual word of the induction drug. And even though a high dose of propofol could, we could construct an idea where that would lead to chest pain. It's incorrectly spelled. So I think that it's probably wrong. And then the fourth response is, is just one of those weird ones. He's scared he has cancer. It does not make sense. And it has no, it makes no sense in the, in the construct of the stem, first of all, and it doesn't really fit with the scenario. And so it's one of those things. It's another one of those peanut butter responses that it doesn't really do too much in terms of helping the person to, to have to make some distinctions between things because they could look at that and say, sure, maybe that could cause chest pain following induction in this random patient that I have no context about. Yes or no, who knows? It's, it's just a gamble. I could flip a coin. A better way to approach this, if we want to approach this concept though, would be probably a multiple correct response, right? 
We might say if a patient has chest pain following induction, which of these would be a consideration or which of these might, might you think would lead to that? And the answer choices could be things like hypotension, tachycardia, ST changes, you know, would be related to it. So you could come up with a, with a list of things that, that you would want the student to recognize as either being contributory or not contributory to it. But the way the stem and the responses are put together is all very kind of disheveled and doesn't really flow very well together. Okay. Let's look at another one. What is the dose of midazolam for anxiolysis in an adult? So the response choices are 1 to 2, 5 to 10, 2 to 3 milligrams, 0.5 to 1 milliliter. Pause the video. Give that some thought. Think about how you would, what do you think is right or wrong about that? How you might fix that? And then start up again. Let's talk through it. Okay. What is the dose of midazolam for anxiolysis in an adult? Now this is an important concept, and I want to put this example in here because it's an important concept that people understand, know how to dose drugs correctly, but also it, there's a lot of variability in that, right? So there's a, there's a big amount of variability in, you know, from one textbook to the next and from one patient to the next. And so that already makes this kind of difficult. More importantly, as you look at the response choices, there's not a lot of consistency there. And sometimes when we have a number of choices, a novice mistake is in order to try to make them seem discreet, we change up the presentation. 1-2, second choice is written out 5, 5 to 10, written out longhand instead of Arabic numerals. Next choice is 2 to 3 with a milligram. So does that differentiate it from 1 to 2 in choice A? Is 1 to 2 also indicating milligrams? And the fourth choice is 0.5 to 1 milliliter. Obviously that's going to be highly dependent on the concentration that we're using. 
And so avoid this pitfall of trying to create artificial differences between what, what tends to be kind of a low number of choices, because you can't put like 50 to 100, because everyone, you know, hopefully is going to know that that's not the correct choice. And so sometimes in an effort to make the response choices discreet and distinct, people try to change them up, but we're changing them up in ways that again, just creates more confusion. It's not actually looking at the person's ability to, to know what's an appropriate dose or not. So a much better way to approach this would be very simply with a fill in the blank. What is the dose of midazolam? Or what is the dose range? Again, make sure that your answer key will encompass all the things that would be considered appropriately correct choices for the given question. Okay. Which of the following does not argue against the presence of malignant hyperthermia? Response choices being masseter rigidity, general flaccidity, bradycardia, and hypocapnia. Let's go ahead and pause, take a look at that, and then we'll come back and talk about it. Okay. So a big problem with this one is that we've got the double negative, does not argue against the presence of malignant hyperthermia. When you have these negatives, when you say, which of these does not do something or which does not argue against, does that mean that I'm looking for something that argues for it or something that just is not contraindicated or, or not, you know, does not knock along with the thing. So just figuring out what you're looking for, you know, the concept that you're looking for is a challenge for the examinee. And so that's already a problem. So you want to be careful about those negatives. 
When you do have those negatives, you know, you should call them out when you can't find any other way around it, call them out to make it really clear, you know, what you want, because you don't want the person to miss the question, get the question wrong, not because they don't understand the concept, but, but just because they were, they were confused about which, which side of the negative we were, we were on. Okay. Let's look at another one. Final one here that we're going to work through together, which does not cause serotonin syndrome. So Prozac, sumatriptan, trazodone, fentanyl, take a look at those, pause your video while you look at that and then come back. We'll work through this last one. Okay. As we look at this one, a couple of problems. One is think about using generic names for drugs instead of trade names. So again, when, when we, when people miss a question, we don't want them to miss it because they were familiar with the name or not familiar with a certain name or, or something like that. The other thing that really stands out as I look at the answer choices here is that one of these drugs really stands out as being in a whole different category as all the rest. And so it can be really easy for me to see fentanyl and say, well, gosh, that, that has, it's, you know, really unrelated to any of the other drugs here. And so, so to fix this one, we might, first of all, change Prozac, do fluoxetine. We want to use all kind of generic names. We would try to think about keeping more consistency, maybe find another drug that's along the same lines of something that might be useful for something in, in the CNS, you know, more, more like an antidepressant or something like that. And then maybe a more straightforward look or more straightforward approach would be to say something like choose three, put six choices, choose three of these, which would cause or would risk causing exacerbation of serotonin syndrome. 
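Several of the cueing problems that came up in these practice items (one response much longer than the rest, mixed capitalization, a missing period, absolutes like always and never) are mechanical enough that a quick script can flag them before a human proofreader looks at the item. Below is a minimal sketch in Python. The function name, the length-ratio threshold, and the sample response lists are all assumptions made for this example; it is a rough screen, not a substitute for having someone review the items.

```python
import re

# Absolute terms named in the talk as cues to avoid.
ABSOLUTES = re.compile(
    r"\b(always|never|absolutely|all of the above|none of the above)\b"
)


def cueing_warnings(options, max_length_ratio=2.0):
    """Return a list of surface-level cueing problems found in the options.

    max_length_ratio is an arbitrary threshold chosen for illustration.
    """
    warnings = []

    # One response much longer than another tends to stand out.
    lengths = [len(opt) for opt in options]
    if min(lengths) > 0 and max(lengths) / min(lengths) > max_length_ratio:
        warnings.append("length")

    # Some responses capitalized, some not.
    if len({opt[:1].isupper() for opt in options}) > 1:
        warnings.append("capitalization")

    # Some responses end in a period, some do not.
    if len({opt.rstrip().endswith(".") for opt in options}) > 1:
        warnings.append("punctuation")

    # Absolute wording in any response.
    if any(ABSOLUTES.search(opt.lower()) for opt in options):
        warnings.append("absolute")

    return warnings


# Hypothetical response lists, written to mimic the problems discussed.
inconsistent = [
    "Decreases systemic vascular resistance.",
    "has direct myocardial depression.",
    "inhibits myocardial steal syndrome",
    "Has sympathetic effects.",
]
consistent = ["hypotension", "tachycardia", "bradycardia"]
with_absolute = [
    "never used in burn patients",
    "used after 24 hours",
    "used within 24 hours",
]

print(cueing_warnings(inconsistent))
print(cueing_warnings(consistent))
print(cueing_warnings(with_absolute))
```

A screen like this only catches formatting cues. It cannot tell you that one drug is in a different category from the rest, or that a stem is unfocused, so it works best as a first pass ahead of the proofreading step.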
So that's kind of a good overview and a little bit of practice for you there. As we think about now summarizing constructing the exam, we want to, first of all, reflect on the student outcome objectives. What is it that we want the student to be able to do as a result of this module, whose achievement we're using this assessment to test? We want to determine what format of exam item will best assess the knowledge or cognitive skill desired on each of those individual pieces, which will reflect, again, back to the testing blueprint. We want to be clear and concise in the delivery. We don't want a lot of verbosity. We don't want to put a lot of extra words and extra window dressing in there. We want to think about the concept that we're assessing. Think about how to clearly and concisely tell the student, this is what I want you to demonstrate for me. And then here are clearly the response options that you can choose to demonstrate that you understand or can do this cognitive task, whatever it is. Avoid distraction, avoid cueing in your responses, meaning things that make a certain response stand out, and avoid trickery. Sometimes people do that sort of on purpose. And what you're actually doing there is, yes, you're catching someone and catching some cognitive ability, but it's not the cognitive ability that you're actually trying to assess as part of the course. If you're putting in something that's confusing, and they're getting the thing wrong because they're just confused, or there was an incorrect spelling or something like that, then you're not really assessing the actual concept. Finally, proofread your exam or have a friend do it.
And then use review principles, which we'll talk about in the very next module, to evaluate the quality of the work and how well the exam did what you thought it was supposed to do, which is to assess the abilities of the students on the individual concepts and constructs that belong in this particular module. So thanks for being with us. One more module is going to tell us what to do after we give the exam. So I hope you'll join me for that one.
Video Summary
In Part Three of this educational module, the focus is on creating valid examination questions, known as test items. The aim is to design items that effectively assess students' competencies and knowledge while adhering to item writing conventions. The presenter, leveraging significant experience from exam committees, emphasizes crafting clear, relevant questions that test applicable knowledge over rote memorization.
Key points cover the necessity of aligning exams with modular, course, and program objectives to ensure they assess the competencies required in real-world scenarios, like making clinical decisions or differentiating treatments. Attention is given to ensuring question stems are straightforward and focused, avoiding extraneous details that can cause construct-irrelevant variance, a situation where non-relevant factors affect a student's exam performance.
Other critical aspects discussed include the importance of having plausible distractors in multiple-choice questions, ensuring responses are mutually exclusive and grammatically consistent. Emphasis is placed on avoiding grammatical cues or unequal lengths in answers that might inadvertently hint at the correct response.
The module also explores various item formats beyond typical multiple-choice, like multiple correct responses, hotspot questions, fill-in-the-blank, and matching, each offering unique ways to better gauge higher-order thinking skills and clinical competencies.
Finally, the presenter underscores the importance of careful proofreading and reviewing of test items to minimize errors and ensure that they accurately assess students' understanding of the material. The discussion illustrates how well-constructed exams should lead to clear, fair, and comprehensive assessments of student knowledge and skills.
Keywords
examination questions
test items
item writing conventions
competencies assessment
clinical decisions
plausible distractors
mutually exclusive responses
item formats
higher-order thinking
proofreading
AANA® is a registered trademark of the American Association of Nurse Anesthesiology. Privacy policy. Copyright © 2024 American Association of Nurse Anesthesiology. All rights reserved.