coach_hanes

Jim Menick's post on Mutual Pref Judging


I think you misunderstand what Menick thinks the point of MJP is.  It's not to give you judges you like; it's to (roughly) ensure that no team has an advantage in the round because of the judge assigned.  So, assuming judge rankings generated by a team correspond to some actual preference of judges for the team, getting a 2-1 judge is more likely to result in a loss for you than getting a 5-5 judge.  (To the degree that the assumption is wrong, MJP fails logically.  How big an issue this is depends on how accurate team rankings of judges are in practice.)

 

To phrase this more probabilistically: given 5 usable tiers of judges, the MJP model assumes that a 1-1 judge will vote for either team 50% of the time if the only predictive variable is the judge, as will a 2-2, 3-3, 4-4, or 5-5.  (That is, there's no inherent bias in favor of either team in a mutually preffed judge).  Because we're only using the judge as the predictive variable, we're ignoring differences in skill when making this prediction.  (Obviously a better team will do better than 50% against a worse team in front of a mutually preffed judge.  But that's not due to judge bias, that's due to skill differential.  The MJP model doesn't so much ignore skill as want it to be the sole cause of deviations from the 50-50 expectation, which is why it tries to control the judge factor).

 

Now, let's assume that a 2-1 judge is biased such that, ignoring skill differences, their vote will tend to 40-60 split.  Perfectly winnable as the better team, but there's a bias against you.  3-1 goes to 30-70, 4-1 goes to 20-80, and 5-1 is an almost insurmountable 10-90.
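
To make the arithmetic concrete, here is a minimal sketch of that toy model. The 10-points-per-tier step is just the number assumed in this post, not a measured value, and `judge_split` is a made-up helper name. It deliberately leaves skill out, the same way the model does.

```python
def judge_split(pref_you: int, pref_opp: int, step: int = 10) -> tuple[int, int]:
    """Return (your %, opponent's %) chance of getting the ballot, ignoring skill.

    pref_you and pref_opp are each team's rank for the judge (1 = most preferred).
    Assumption from the post's example: every tier of gap moves the split
    `step` percentage points toward the team that ranked the judge higher.
    """
    gap = pref_you - pref_opp                      # positive: you like the judge less
    yours = max(0, min(100, 50 - step * gap))      # clamp to a valid percentage
    return yours, 100 - yours


# Print the split for the pairings discussed above.
for you, opp in [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (5, 5)]:
    yours, theirs = judge_split(you, opp)
    print(f"{you}-{opp} judge: {yours}-{theirs}")
```

Under these assumptions a 5-5 judge comes out at 50-50 and a 2-1 judge at 40-60, which is the whole tradeoff in two lines of output.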

 

So, as a debater, would you rather have skill be the sole determiner of outcome (Menick's model of judge preference), or get a judge you like better, even if his paradigm/approach to debate is biased in favor of your opponent?  How highly should we value a fair judge over a preferred judge?  (I think there's an argument to be made that 2-1 is superior to 5-5, but it isn't as clear that 2-1 should be preferred to 3-3, and any such intermediate position is going to be about balancing debater's preferences for both fair rounds and judges they like, instead of always choosing one or the other).


Thank you for your reply. I do understand Jim's idea, just as you have explained it. I'm curious if this is the community's understanding of MPJ -- because it sure wasn't mine. Is Jim out there on a limb, or is his system the way most tournaments do MPJ? I suspect the former (because tournaments that assigned lots of 4s and 5s wouldn't see many schools return next year), but I'm disconnected from the national circuit these days and don't know.

 

You make a clear case for the importance of mutuality, but I think your argument overstates it. Most debaters dash off these preference sheets quickly, based on reputation and without much careful thought. You can't impute much difference between a judge ranked 2 and a judge ranked 3. The debaters' hurried opinions on the judges are not enough information to say that a 3-2 ranked judge is actually more biased than a 3-3 ranked judge. Now, I would agree with you that a 5-1 judge is clearly biased, forcing someone to do a lot of adapting while the other debater gets to be on home ground, but an 80-point differential is too high. This would imply that one of the worst teams at a tournament (which, let's say, happens to be policy-oriented) and one of the best teams at the tournament (which might happen to be kritik-oriented) are on EQUAL footing if they get a policy-oriented 5-1 judge. Granted, the kritik team might have to dig deep in their backfiles, but saying they only have a 50-50 chance to win is too much.

 

To answer your question: the judge I liked better, definitely. If you had told me that I might have been given a 2-1 judge (on the downside) but instead got a 5-5 judge, I would have been hopping mad. I would gladly have accepted a one- or two-rank differential if it meant I got a judge I liked better (1-1, 1-2, 2-3, and 1-3 would all have been good by me).


As a coach for a UDL school, I've never actually attended a tournament using MPJ.  Certainly I think Jim's interpretation is right *given the name*.  (Mutual judge preference should mean that you favor judges who are equally preferred when possible).  But that says nothing about how it is actually used in practice.

 

Now, I have serious reservations about the assumption that debaters' ratings of judges are meaningful in any fine-grained sense.  You probably get your strikes right, and maybe there's a real difference between 1s and 5s, but 1s and 3s?  I'm not convinced.  In fact, I'm pretty sure much of the 3-4 range is probably 'we've never had this judge before, ever'.


Usually 4 is saved for "this judge is sketchy, but we could win their ballot."


Now, I have serious reservations about the assumption that debaters' ratings of judges are meaningful in any fine-grained sense.  You probably get your strikes right, and maybe there's a real difference between 1s and 5s, but 1s and 3s?  I'm not convinced.  In fact, I'm pretty sure much of the 3-4 range is probably 'we've never had this judge before, ever'.

 

I would agree with this. Really, it should just be a thumbs up, thumbs down, don't know/neutral rating system. Or if we're going to do a 6-rank scale, the tournament could require each judge to complete a brief survey (experience, preferences, etc.), which would be made available to the debaters before doing the rankings.


http://judgephilosophies.wikispaces.com/


Definitely a good thing. It would take a long time to read them all to rank all your judges. I meant more like the NFL used to do (still does?): about 10-12 numerical scale questions. As I recall, it asked about experience, comfort with speed, and willingness to vote on topicality, theory, counterplans, and kritiks. Something like that would be easy for the debaters to scan quickly.


I agree--the Ohio league had these when I was debating. In fact, I think I'm going to suggest that CDL start doing this for their tournaments.

 

Added: I can't find one on the OHSSL site for policy, but here's one for LD that I think is a good starting point to modify for policy: http://www.ohssl.org/ohsslfiles/state/ld_paradigm.pdf

Edited by Edgehopper


Ohio uses the same sheet as the NFL, which you can find at http://www.joyoftournaments.com/nfl/nationals/paradigms/Blank%20CX%20Paradigm.pdf


Edgehopper and jgorman, those are just the forms I was thinking of.

 

I remember sitting down with my debaters before rounds began at the NFL and looking through the list of their judges and these responses. We figured out strategies and tactics to try to pick up both ballots in every round. Those numerical responses were so clear that it was easy to know what a judge liked (an "8" on whether CPs were unacceptable, versus "I guess counterplans can be ok but I'll vote on whatever..." like on the philosophy wiki).

 

If debaters got that kind of numerical judge response in a spreadsheet or CSV, they could sort it quickly by whatever characteristics they value most to figure out their top judges. For example: "Show me all judges who put a 1-3 on CPs and an 8 or 9 on kritiks."
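
A rough sketch of what that query could look like, assuming the responses were exported as a CSV. The column names (`judge`, `counterplans`, `kritiks`), the helper name `filter_judges`, and the sample rows are all invented placeholders, not the layout of any real paradigm sheet.

```python
import csv
import io

# Invented sample data standing in for an exported judge-response sheet.
SAMPLE_CSV = """judge,counterplans,kritiks
Judge A,2,9
Judge B,8,3
Judge C,3,8
Judge D,1,5
"""


def filter_judges(csv_text, cp_range=(1, 3), k_range=(8, 9)):
    """Return names of judges whose CP and kritik answers fall in the given ranges.

    Both ranges are inclusive (low, high) pairs on the sheet's numerical scale.
    """
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cp = int(row["counterplans"])
        k = int(row["kritiks"])
        if cp_range[0] <= cp <= cp_range[1] and k_range[0] <= k <= k_range[1]:
            matches.append(row["judge"])
    return matches


# "Show me all judges who put a 1-3 on CPs and an 8 or 9 on kritiks."
print(filter_judges(SAMPLE_CSV))
```

With a real file you would read the text from disk instead of `SAMPLE_CSV`; the filter itself wouldn't need to change as long as the columns hold plain integers.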


The one problem with just numeric ratings on things like 'Is the K acceptable' is that it conceals a lot of variation.

 

I'm perfectly fine with kritikal argumentation, but I'm a Popperian empiricist at heart, and so I disagree rather strongly with what is considered 'evidence against' and what is considered a 'refutation' in much kritikal argumentation today.  And while I'm willing to pretend to a different epistemology of argumentation in round, I can't imagine what such a thing actually looks like, so I'm certainly not going to assume one unless I have one laid out for me.

 

(I also hold a counterfactualist interpretation of fiat, which makes 'fiat bad' kind of nonsensical unless they're arguing nihilism.  Again, I'll listen to argumentation on the matter, but you'd better define what fiat is so I know what you think is bad, and can assess the degree to which the affirmative is actually doing that.  Since many Ks start from a position of 'fiat is bad, and what we should be talking about is stuff relevant to people in the room', this has a significant bearing on K argumentation).

 

So I'd give questions like 'Is the K acceptable' a 9, but I don't necessarily mean the same thing as another judge who gives it a 9.


All true, but compare that to the SQ. On the national circuit, few judges go into that level of detail on their philosophy page. In CDL, most don't say anything. And you can always scribble the additional info on your card or write "ask me" if it won't fit.

 

It should be more useful off the national circuit, where there's greater variety and wider experience in judges. Numerical ratings won't capture too much difference between you and a typical college judge, but they easily capture the difference between you and old school traditional judges, lay judges, and confused English teachers who think all in-round definitions have to come from the Oxford English Dictionary.
