It’s not that they have to be measured differently, but it’s kinda turned out that way.
In days gone by there were no standards for how power was to be measured, and even now that standards exist, no one is required to use them or tell you how they made their measurement. High-end PA systems are engineered, so real math has to happen, and engineers won’t use things without valid specifications. The music market tends to be a little more marketing oriented.
For analogue amps, power has often been measured by putting a broadband (across the frequency range of interest) pink noise signal into an amp and turning it up until it distorts. The question is how much distortion is allowable for the power rating. Maybe a high quality amp allows 0.05% Total Harmonic Distortion, maybe something else 1%, maybe something else 10%. The nice thing about analogue amps is that they are more forgiving if you go above their limit. They just have some gentle distortion that in many cases people like. If you drive an analogue amp into distortion, you are running it above its power rating.
When we talk about power or level numbers in audio, we are concerned with the average and the peak. Sharp attacks are going to hit peaks and the note ringing out is going to be our average. The pink noise signal used to determine the power rating usually has a crest factor (distance between the average and the peak) of 6dB. Each 3dB is a doubling of power. Soooo, if we measure an amp to have an average power rating of 200 Watts for a particular total harmonic distortion, we are going to double that twice (6dB) to get the peak power rating of 800 Watts (because the pink noise used has a crest factor of 6dB).
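If you want to check the crest factor arithmetic yourself, here is a rough Python sketch. The function names are made up for illustration, and the 200 Watt / 6dB figures are just the example numbers from above. Note that "3dB is a doubling" is a rule of thumb: the exact ratio for 6dB is about 3.98, so the precise answer comes out a hair under 800 Watts.

```python
def db_to_power_ratio(db):
    """Convert a level difference in dB to a power ratio (10 dB = 10x power)."""
    return 10 ** (db / 10)


def peak_power(average_watts, crest_factor_db=6.0):
    """Peak power implied by an average power rating and a crest factor.

    Assumes the crest factor is expressed as a power ratio in dB,
    as in the pink noise example above.
    """
    return average_watts * db_to_power_ratio(crest_factor_db)


# 3 dB is very nearly (not exactly) a doubling of power.
print(round(db_to_power_ratio(3), 3))   # about 1.995

# 200 W average with a 6 dB crest factor: roughly 4x, i.e. close to 800 W.
print(round(peak_power(200)))           # about 796
```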
Digital devices don’t distort nicely at all when overloaded. If you get into distortion, they fall off a cliff and become unusable very quickly. So, we really care a lot about that peak number and then have to back down from there. The difference is that a well-designed digital amp has very low harmonic distortion all the way up to its maximum peak; it doesn’t really degrade. And with a well-designed digital amp, it is also possible to sustain that high level for a long time. But these nice digital amps were getting shortchanged on the spec sheet because their advantages didn’t shine using the traditional test methods.
There was some talk that music is a lot different from pink noise and that you can actually get much more power out of an amp with music than the pink noise test would say you could. So the Audio Engineering Society (AES) wrote an amp power test standard using a burst signal rather than pink noise, which digital amp makers began to use. This produces a higher baseline power rating but leaves less headroom, so it is closer to the peak number. This skews the power ratings higher in modern digital amps. If manufacturers don’t say which number they are giving you, it may be the peak rather than the average.
In this forum we usually talk about compression in the context of the musical character of the playing, but here compression applies to how loud your average level can be. The more you compress the sharp transient sounds, the more you can turn up the amp and make your average level louder. Compression means you are decreasing the distance between the average and the peak. This means you will have more rumble level, but less impact on the pluck.
The linked article brings up two other relevant factors. A valve amp has connections for both a 4 Ohm and an 8 Ohm cabinet, so it will deliver its max rated power into either one. If you hook most digital amps up to an 8 Ohm cabinet, you only get half the power rating on the label. (A few exceptions exist.)
The other is the sensitivity of the cabinet. In one cabinet maybe you get 97dB-SPL out of it at 1 meter away if you put in 1 Watt. In another cabinet maybe you get 91dB-SPL out of it at 1 meter away if you put in 1 Watt. If you put a 200 Watt amp on the 97dB cabinet, you will need to put an 800 Watt amp on the 91dB cabinet to get the same level (a 6dB difference requires doubling the power twice). This makes power and level comparisons very difficult.
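Both of those factors are easy to play with in a few lines of Python. This is only a sketch under some assumptions: the function names are invented for illustration, the 500 Watt at 4 Ohm amp is a hypothetical example, and the impedance part assumes the amp is limited by its output voltage swing (P = V²/R), which real amps only approximate. The SPL part ignores speaker power compression and room effects.

```python
import math


def power_into_load(rated_watts, rated_ohms, load_ohms):
    """Power into a different load, assuming a fixed maximum output voltage.

    Assumption: the amp is voltage limited, so P = V^2 / R.
    """
    v_rms = math.sqrt(rated_watts * rated_ohms)
    return v_rms ** 2 / load_ohms


def spl_at_1m(sensitivity_db, watts):
    """Level at 1 meter from a cabinet's 1 W / 1 m sensitivity and input power."""
    return sensitivity_db + 10 * math.log10(watts)


def matching_power(ref_watts, ref_sens_db, other_sens_db):
    """Power the other cabinet needs to reach the same level as the reference."""
    return ref_watts * 10 ** ((ref_sens_db - other_sens_db) / 10)


# Hypothetical amp rated 500 W into 4 Ohm, connected to an 8 Ohm cabinet.
print(power_into_load(500, 4, 8))            # roughly 250

# 200 W into the 97 dB cabinet.
print(round(spl_at_1m(97, 200), 1))          # about 120.0 dB-SPL

# Power needed on the 91 dB cabinet to match it.
print(round(matching_power(200, 97, 91)))    # about 796, i.e. "800 W" by rule of thumb
```

The 796 versus 800 gap is just the difference between the exact dB math and the "3dB is double" shortcut.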