Well, we have been talking about multiple ways of selecting a bunch of high achievers from a larger -- much, much larger -- pool of highly driven souls with tremendous motivation (and in quite a few cases, money and other resources). At the end of the day (er, exam), you still have to use one procedure (or one combination of procedures) and arrive at a list of candidates who can go on to the next stage.
Phani has a great post (with graphs, distributions, and stuff!) on the considerations that will inform an organization's choices during this entire process.
Obviously, no such thing as a perfect exam exists. An exam must be completed before the examinees drop dead and must be evaluated in a reasonable time frame, and as objectively as possible. This puts a limit on the range of questions (and therefore, marks) available for the measurement, and defines the measurement window [...]. There will then be examinees who cluster at the boundaries (shown by the arrows), their actual number depending on where the boundaries of this window lie within the range of variation of actual ability.
A public/finishing exam is usually conducted for a very large population at that level. The measurement window is chosen to spread across a wide range. If the exam is designed and conducted properly, and in the absence of negative marks, the percentage of marks obtained and the percentile will be similar and will correlate reasonably well with actual ability. However, the unavoidable clustering at the top, due to the finite size of the measurement window, can be a problem. For example, suppose one needs to resolve and sequence the clever examinees for admission to a course with a limited number of seats -- so limited that only the top 2% are to be picked. Selecting a small fraction and sequencing them reliably is the perceived need. The problem arises because of the inherent scatter in the data.
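To see why scatter makes that last step unreliable, here is a minimal simulation sketch (my own illustration, not from Phani's post; the ability and noise spreads are made-up assumptions): each examinee has a true ability, exam-day scatter is added on top, and we check how many of the true top 2% the exam's top 2% actually captures.

```python
import random

def top_2pct_overlap(n=10000, ability_sd=15.0, noise_sd=5.0, seed=0):
    """Simulate one exam: true ability plus exam-day scatter,
    then measure what fraction of the true top 2% the exam selects."""
    rng = random.Random(seed)
    # Hypothetical ability distribution: mean 50, sd 15 (an assumption).
    ability = [rng.gauss(50, ability_sd) for _ in range(n)]
    # Observed marks = ability + random scatter on the day of the exam.
    marks = [a + rng.gauss(0, noise_sd) for a in ability]
    k = n // 50  # the top 2% of the pool
    true_top = set(sorted(range(n), key=lambda i: -ability[i])[:k])
    selected = set(sorted(range(n), key=lambda i: -marks[i])[:k])
    return len(true_top & selected) / k

print(top_2pct_overlap())
```

Even with scatter much smaller than the spread of ability, the overlap comes out well short of 100%: the exam picks a top 2%, just not quite *the* top 2%, which is exactly the sequencing problem described above.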