Monday 17 October 2022

Judging in PS - considerations

Hello everyone, it’s been a while. While I don’t train that seriously nowadays, I still pick up the mod pretty often, think about PS quite a lot, and have gained some interesting insights from reading about medical education as well. This is the first post about PS judging in this 'series'; an attempt at addressing many of the questions raised here comes in the second post.

So PSO22 is coming around and it will once again try a slightly different way of judging. Anyone who’s been around the competitive PS scene for a few years will know the countless discussions (in worse cases - drama, arguments, salt, grudges) surrounding any attempt to assess our artform.


One can ask whether PS (or arts in general) should be competitive, or have criteria, or have numerical scores - unfortunately, humans are competitive by nature, and competitions, criteria, and scoring exist in PS and other arts, for better or worse. So let’s move on to more practical questions.


The most important question is ‘what purpose does PS competition serve?’ It is the most important question because everything about how we judge should follow from the answer.


Q: Is it to award a title to someone? 


A: We will never agree who the #1 is, because different people prioritise different elements of PS (be it finesse of technique, technical skill, innovation or other things). A title in itself has reduced meaning if it has been awarded through unreliable/invalid methods of assessment, or if competitors are not all ‘serious’ about preparation. Unlike professional sports or arts, a title in PS is not related to one’s primary income. Nonetheless, because competition exists, communities have been trying different ways of assessing PS for competing. 


Q: Is it to define what makes a better spinner or a better combo?


A: On a general level, it’s easy to define a ‘good combo’ - all its elements contribute to visuals and mechanics: combos that don’t do this will have wasteful or detrimental material. Of course, individuals may disagree on whether certain elements contribute in a positive way. It follows that a ‘good spinner’ is someone who makes many ‘good combos’. Do competitions exist merely to be satisfied with attaining ‘good’ rather than ‘exceptional’ or ‘groundbreaking’?


Q: Is it to promote activity and progression in the hobby?


A: I feel this is getting closer than the previous 2 questions. Direct competition fuels improvement and drives people to experiment outside of their comfort zones, in a way that PS collabs do not seem to. While there have been many historic CV combos in the aesthetic sense, the majority of groundbreaking combos in more technical (i.e. material and theory-based) aspects have been in tournaments. A perfectly disciplined human would continue pushing themselves in the same way regardless of whether an easily tangible goal like CV/tourney/solo exists, but there are no perfect humans. The excitement and discussions about tournament submissions and results also increase activity.


If we summarise the above, we end up with ‘competition should reward different aspects of PS in the many ways one can make a good combo, while giving further rewards to people who push the boundaries’. Perhaps personal projects like solo videos are more suited to experimental revolutionary material, but it is illogical if the highest level competitive event of our artform does not differentiate revolutionary combos or revolutionary spinners.


Before I discuss the system I want to try in the themes I’m judging for PSO22 (which can be generalised for themeless battles like WT), I will discuss what has been tried or suggested before, but didn’t work that well.


Q: Why can’t we just use comments only, no subcriteria?


A: In a world with great judges, PS competitions would produce reasonable winners with comments only. We don’t live in a perfect world. While numbered scores are arbitrary in their divisions and judges may not follow the example videos for what a certain score represents, a comments-only system will change those 5-7 arbitrary numbers into 1 large arbitrary result (i.e. the vote towards who wins). This works if we trust that the judges represent the views we desire for that competition.


It’s easier to think about a recent example - it’s justifiable to vote Mond over Saltient in WT21 R3 by prioritising execution. On a personal level, it is equally valid to prioritise basic control, or finesse of technique, or technical difficulty, or novel material. But how well does this align with what international PS competitions aim to do? Would voting Mond in a comments-only judgment encourage spinners to keep pushing boundaries, or would it encourage them to stay in their comfort zones? While subcriteria cannot stop this (and should not explicitly stop specific judgments), judges should be held accountable for considering the specified elements. Comments-only does not address disagreements about judges overly prioritising certain aspects of spinning, nor does it address different understandings of elements like ‘structure’ or ‘creativity’. Having no subcriteria would also make criticisms harder to specify, since the judge can just brush them off with ‘my general impression was this, I already explained myself, I define this element differently to how you do’ etc.


There have been events with comments-only judging where judges provide examples of combos they prefer (which may work for smaller events, and was done in one JEB tournament before), but for an international event this risks suggesting spinners should try to replicate existing impressions rather than create more evolved versions of their own paths. In past arguments over numbered scores, disagreement was often about the outcome rather than the scores (as in, even when reasonable, detailed justifications above and beyond the initially submitted comments were given, people still had complaints). Comments-only would not fix the fact that people get annoyed over their friend or favourite not winning, since this is an issue of sportsmanship and maturity.


Q: What if we make judges assess some varied, tough combos before they are allowed to judge the real event?


A: This was tried in WT21 judge selections. It didn’t work as well as expected. From previous events e.g. WT17 and WT19, as the tournament progresses (i.e. judges get more experience judging the spinners in the event, and fewer combos are sent = more time spent assessing each combo, more detailed comments), judging appears to improve in quality. Humans pay more attention to their performance when they know they are being assessed. While making sample judgments helps exclude some blatantly ‘off’ judgments, it isn’t that great at stopping strange judgments in the first half of a WT. Judge selection is surely more influential in determining results than the fine details of the criteria themselves, but it is harder to deal with. Judgments in previous events are the best determiner. Sample judgments still have a role in assessing new candidates, while allowing discussion before the actual event.


Q: Assuming one concedes the above and agrees to using subcriteria, why should they be given numerical scores? Numbers are annoying, introduce more variability and create strange inconsistencies.


A: Judges could be instructed to consider and comment specifically on various subcriteria in their comments-only judgments. However, it would be impossible to know whether the final win/loss vote appropriately accounted for those elements, and the judge could still cast a final vote contrary to what the tone of their comments indicates.


A system where the judge votes on which spinner did better in each subcriterion (e.g. spinner A is better in exec, spinner B is better in difficulty), then totals those votes (perhaps with weighting for whatever aspects the tournament or theme wants to prioritise), could be tried. However, this would not account for large gaps in the respective elements. It was tried in SCT18 and worked well since the competitors who passed the qualification round were all solid, but for a larger event with more varied submissions, I doubt it would work well. It’s strange if someone who submits a barely landed combo, or an extremely easy combo, or an extremely uncreative combo (e.g. 2/10 vs 8/10 in current scoring) suffers only the same penalty as someone who sends something that is just slightly worse (7/10 vs 8/10).
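
To make the contrast concrete, here is a minimal Python sketch of the two systems above. The subcriteria names, weights and scores are made up purely for illustration; they are not taken from any actual tournament criteria.

# Illustrative only: compare a per-subcriterion vote tally with a
# weighted numerical total, using the same hypothetical weights.
WEIGHTS = {"execution": 1.0, "difficulty": 1.0, "creativity": 1.0}

def vote_tally(scores_a, scores_b):
    """Each subcriterion gives one (weighted) vote to whoever scored
    higher, regardless of how large the gap is."""
    a = b = 0.0
    for crit, w in WEIGHTS.items():
        if scores_a[crit] > scores_b[crit]:
            a += w
        elif scores_b[crit] > scores_a[crit]:
            b += w
    return a, b

def weighted_total(scores):
    """Numerical scoring: the size of each gap carries through."""
    return sum(WEIGHTS[crit] * s for crit, s in scores.items())

# Spinner A barely lands the combo (2/10 exec) but edges out B elsewhere.
a = {"execution": 2, "difficulty": 8, "creativity": 8}
b = {"execution": 8, "difficulty": 7, "creativity": 7}

print(vote_tally(a, b))                      # (2.0, 1.0) -> A wins on votes
print(weighted_total(a), weighted_total(b))  # 18.0 vs 22.0 -> B wins on totals

Under the vote tally the barely landed combo still wins 2 votes to 1, while the numerical totals let the size of the execution gap count against it - which is exactly the problem described above.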


If we are to use numbers, it may be helpful to have fewer subcriteria with more unified score weightings (e.g. all subcriteria scored out of 5, rather than having some out of 10, some out of 5 and some out of 3; if certain elements are to be prioritised, it’s easier to just apply a separate multiplier afterwards). In a talk about how communication and professionalism can be assessed in medical students (given to my uni’s medicine faculty by a professor of medical education), using more detailed subcriteria served to annoy the examiners while making the results less reliable on statistical analysis. The WT19 criteria had too many subcriteria (with some unintuitive definitions that overlapped in some regards).
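
As a companion to the sketch above, here is the ‘unified scale plus separate multiplier’ idea in the same hypothetical terms: every subcriterion is scored out of 5, and any prioritisation is applied as a multiplier afterwards rather than by giving some subcriteria a larger maximum. Again, the names and multiplier values are invented for illustration.

# Illustrative only: all subcriteria scored out of 5, with prioritisation
# handled by separate multipliers instead of different maximum scores.
MULTIPLIERS = {"execution": 2.0, "difficulty": 1.0, "creativity": 1.0}

def combo_score(subscores):
    """subscores: dict mapping each subcriterion to a value from 0 to 5."""
    assert all(0 <= v <= 5 for v in subscores.values()), "scores must be out of 5"
    return sum(MULTIPLIERS[c] * v for c, v in subscores.items())

# e.g. execution 4/5, difficulty 3/5, creativity 5/5 -> 2*4 + 3 + 5 = 16.0
print(combo_score({"execution": 4, "difficulty": 3, "creativity": 5}))

Changing what the event prioritises then only means changing the multipliers, while judges keep scoring on the same 0-5 scale throughout.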


Q: Why don’t we try to get consensus about the criteria by asking a lot of representatives from different countries?


A: This approach works in established fields with established theories and established experts (who usually conduct such discussions in a mutually understood language), and has been done for many curricula, guidelines and regulations in professional fields. Even if we could overcome the language barriers between countries, PS still struggles to explain many foundational concepts in any given language - e.g. the English-speaking community is still struggling to express ‘good structure’ or ‘good pacing’ in words. While there are many established 'good' spinners, they may not be able to express their understanding in words; they probably do not agree with each other (and may never agree with certain other established spinners in the discussion); nor may their views align with what the tournament aims to prioritise.


There are many ways to describe important elements, which in turn have varying overlap, which then have varied practicality when used as subcriteria. Fortunately, the discussions stemming from previous competitions serve as a good proxy for this topic. Unfortunately, these past discussions tell us that we are unlikely to achieve a consensus. The same arguments have come up for years, every time we have a world event: the weight of execution-related elements, how one assesses difficulty, what degree or kinds of repetition constitute bad structure, what the penalty for reusing identical or similar linkages should be (and where the material was previously shown), what constitutes good flow, and so on. More discussion is a good thing, but the impression I got is that we’ve ended up repeating old points without introducing any new helpful ideas.


Q: I feel [insert element here] is important. Why shouldn’t it be a subcriterion?


A: There are many ways to break up the puzzle pieces of what a ‘good combo’ consists of. However, just because a certain way of grouping certain elements makes for a good term, e.g. ‘pacing’, ‘tension’, ‘structure’, ‘coherence’ or ‘refinement’, does not mean it makes a good subcriterion. To elaborate, ‘structure’ can be considered as the arrangement of mechanics (like ‘density’), the arrangement of visual impressions (like ‘effect’), or the arrangement of new ideas (like ‘integration’). While ‘structure’ is a useful way of considering PS, it is hard to use as a subcriterion. A similar point can be made about ‘density’, ‘integration’ and ‘effect’, which were part of the WT19 subcriteria under ‘effectiveness’. Similarly, ‘coherence’ overlaps with ‘structure’ and is influenced by ‘pacing’ and ‘tension’. ‘Pacing’ can be considered as the use of speed and the effect of movements of the hand, wrist and mod, as well as the visual effect of the mod during the tricks performed, which in turn is influenced by the chosen angle, the background/mod colours, etc. While many abstract terms are useful for general discussion, is there a more practical way of dividing things for judging purposes?


Q: So you’ve raised all these criticisms but what’s your constructive proposal? If you are only criticising but not making active suggestions, what’s the point?


A: If you’ve read up to here, congratulations! Now we can move on to my suggestions for addressing a lot of these things: judging proposals. I won’t claim there is any definitive ‘solution’ since there surely isn’t one, and there won’t be a way to satisfy everyone, but at least I can offer what is (probably) a more practical way of breaking things down.

