# Lesson Observations – some mathematical modelling

If you use Twitter, you’ll know there’s been a lot of talk recently about lesson observations and what Ofsted say they will do in lessons. One of the most noticeable things is that Mike Cladingbowl (@mcladingbowl), Ofsted’s Director of Schools, has published this document:

#### Why do Ofsted inspectors observe individual lessons and how do they evaluate teaching in schools?

I’d encourage you to read it if you haven’t already.

Here are two parts:

You should not expect a lesson grade after being observed by an Ofsted Inspector and to re-emphasise:

So, although it’s far more complex than this makes out and there are plenty of grey areas, you can see how this would generate lots of discussion.

Now, @LearningSpy , @DeputyMitchell and @DavidBrownHMI were discussing observations and what head teachers might do to make sure they know what the quality of teaching is in their school. It is also noticeable that Ofsted are starting to point out that the way **they** inspect was not intended to be mimicked by SLTs. Of course, many point out that this was inevitable. As part of this discussion, David referred to this report:

#### Do We Know a Successful Teacher When We See One? Experiments in the Identification of Effective Teachers

In which the validity of lesson gradings is tested. I’ve not read the report (because you have to pay) but one of the key points tweeted by David was that lesson gradings are only 67% accurate. [Edit: This should say they are **inaccurate** 67% of the time.] That prompted this tweet about how certain Deputy Mitchell could be about his evidence from lesson observations. (Note the winking face means he’s not serious.)

Still, this creates an interesting mathematical point and an opportunity for some modelling. So, here goes. Note that you could skip the maths-heavy parts and still pick up some points. The following assumes that the assertion that 67% of lesson gradings are wrong is true; I, of course, do not know that this is a correct assumption.

————————————————————————————————————

Let’s model a faculty of 10 teachers. They’ve all had a lesson graded by observation. The assumption is that there’s a 67% chance that each grading is wrong and a 33% chance that it is correct.

Let X be the number of teachers who have been given the **correct** grade. The grading is either correct or not. We assume the gradings are independent (this is perhaps questionable) and with a constant probability of being correctly graded.

A binomial model: X ~ B(10, 0.33)
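The post itself contains no code, but the model above can be sketched in a few lines of Python (my choice of language, not the author’s), using the standard binomial probability mass function:

```python
# A sketch of the binomial model X ~ B(10, 0.33):
# 10 teachers, each independently having a 33% chance of a correct grading.
from math import comb

N, P = 10, 0.33

def pmf(k, n=N, p=P):
    """P(X = k) for X ~ B(n, p): comb(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Sanity check: the probabilities over all possible outcomes sum to 1.
total = sum(pmf(k) for k in range(N + 1))
print(round(total, 10))  # 1.0
```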

************************************************************************************

Firstly, let us ask the question *“What is the chance that all 10 grades are in fact the correct grade?”*

*p(all 10 are correct) = 0.33^10 = 0.0000153*

There is, to all intents and purposes, a 0% chance that all the gradings are correct.
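The figure above is quick to check in Python (again, just an illustrative sketch):

```python
# All 10 gradings are correct only if each independent 33% event occurs,
# so the probability is 0.33 raised to the 10th power.
p_all_correct = 0.33 ** 10
print(p_all_correct)  # ≈ 1.53e-05, i.e. about 0.0015%
```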

**************************************************************************************

Secondly, *“What is the chance a specific number of lesson gradings are correct?”*

n is the number of lessons being graded correctly. Percentages have been rounded.

Also shown are the cumulative percentages for less than or equal to n and greater than or equal to n.

So if, for example, you decide “well, let’s hope at least half are right”, then there’s only a 21% chance of that being the case.
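The table of percentages can be regenerated from the model itself. A short Python sketch (my choice of language) that prints P(X = n) alongside both cumulative columns:

```python
# Rebuild the table: for each n, the chance that exactly n of the 10
# gradings are correct, plus the cumulative chances for <= n and >= n.
from math import comb

N, P = 10, 0.33

def pmf(k):
    return comb(N, k) * P**k * (1 - P)**(N - k)

print(" n   P(X=n)  P(X<=n)  P(X>=n)")
cum = 0.0
for n in range(N + 1):
    p = pmf(n)
    cum += p
    # P(X >= n) = 1 - P(X <= n - 1) = 1 - (cum - p)
    print(f"{n:2d}   {p:6.1%}  {cum:7.1%}  {1 - cum + p:7.1%}")
```

The row for n = 5 gives P(X ≥ 5) ≈ 20.6%, which matches the “only a 21% chance” claim above after rounding.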

*************************************************************************************

Just by way of example, here are 10 hypothetical situations where a group of 10 teachers have had a lesson graded. Still assuming that the 33% correct applies, below is a summary of how many of them could be correct and how many wrong. It’s natural to think that 3 would be right and 7 wrong, but this is just to illustrate that it isn’t as simple as assuming 33% will be right.

Of course if we were to run the sample again, we would get different answers.

However, in this set of 10, you can see that in one case, there was only 1 lesson actually correct and in the best case, there were 5 correct. As you’d expect, 2, 3 and 4 are the most common number correct (3.3 would be the mean average).
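Hypothetical samples like these are easy to re-create. A Python sketch (results will differ from the numbers above on every run, which is rather the point):

```python
# Simulate 10 faculties of 10 teachers each; for every faculty, count how
# many lessons happened to be graded correctly (each with probability 0.33).
import random

correct_counts = [
    sum(random.random() < 0.33 for _ in range(10))  # one faculty of 10
    for _ in range(10)                              # ten faculties
]
print(correct_counts)
```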

————————————————————————————————————

Again, the statistician in me needs to emphasise that this is a hypothetical situation and there are a number of assumptions being made (as outlined above). The samples are just to illustrate a point and repeating the experiment would lead to different results.

My main conclusion is that if the 33% assumption is correct and there are 10 people in the faculty then there is only a 21% chance that 5 or more gradings are correct.

If I were a head teacher, I would not be too confident about lesson observation grading data.

[Notes: If you’d like anything explaining further, I’ll give it a try. If you’re better at modelling than me or can spot some obvious (or subtle) omissions/improvements, I’d be interested in hearing them.]

Dave, I’ve become confused: you seem to quote a figure of 67% accuracy, but then model using the assumption of 67% inaccurate? Is there a typo?

Ah, yes. That should say 67% are not correct.

Figured as much, thanks