Potential Problems & Scalability
A potential concern with POMMS as presented above would be that testers are rewarded only for quantity, but not quality. This is averted by having very high minimum requirements placed on what counts as a point. For instance, a bug report must be filed in the bug-tracking tool and contain at least a proper title, meaningful tags, and a detailed description of how to reproduce it.
The quality of the work done by the testers is only controlled indirectly -- evidence of shoddy work will often surface sooner rather than later. This system also reinforces the much needed trust between the testers and the developers, where direct control would only hurt the system as people start to be afraid of making errors instead of posting what they find. We all make mistakes; it is the density of mistakes that matters, not their absolute number.
Another important issue is the scalability of the system. Switching between testing systems is a difficult and tedious prospect, and once a system such as POMMS is set up, the developers have to be certain that it is not outgrown any time soon.
To date, our implementation has shown a very favorable, almost logarithmic scalability for small numbers of testers up to about 50. As an estimate, managing 100 testers is about twice as much work as having only 10 testers, although many more than 100 testers would not be easily manageable without automated tools.
System Fairness and Performance
Many testers battling for the very same points can lead to pretty stiff competition. This is specifically the case for focused bug reporting, which always works on a first-come, first-served basis. At first glance, this seems to be a good thing for the developers, as progress is rapid, but ultimately may harm the overall quality of the testing environment. This problem can generally be circumvented by always providing rewarding alternative testing activities via the quest board for less competition-minded people.
One easy-to-miss point is the immense importance of system fairness. Basically, your freshly implemented POMMS beta testing behaves like an MMO, and the testers are its players. Even the tiniest imbalances and unfairness will surface, multiplied by 10k in magnitude, to then be thrown into the developers' faces in a way that would suggest the tester's life just has been destroyed. At times it is scarily similar to the average MMO forum.
The crux is to make sure the scoring and reporting system is 110 percent fair. Make no exceptions to rules, because if you do, everyone else will cry for exceptions too. Initially, your testers will be on your side, but that can change very quickly if you don't handle the set-up of the system and its maintenance with great care. Always be transparent with any changes to rules or structures, and explain the why and when. Show appreciation for the work people do and accept that things can go wrong on both sides. Even minor direct and personal acknowledgement and appreciation from the devs toward individual testers can work wonders for motivation and the testing climate.
In the six months since our first implementation, the conclusion that can be drawn so far is very positive. About 90 percent of the testers very much like working within the framework of POMMS, while 10 percent mainly complained about its competitive nature and left. This is unfortunate, and probably due to several mistakes that were made during the setup phase, before the rules were fair to both competitive and non-competitive testers.
Comparing the amount of work that is done by the testing team now to how much was done during the time we ran an unstructured beta, we have effectively multiplied productivity by 10. Efforts are coordinated without much shouting, and the general atmosphere is positive albeit a bit competitive at times.
Summary and Conclusion
In summary, the herein described testing structure comes with huge advantages for small and medium scale software productions, entailing only few complications. It does not require a lot of manpower once set up, and has proven to be very efficient already in its first implementation.