I was curious about the actions in HumanML3D, so I made a simple word cloud out of the descriptions. After removing a few nuisance words (in, at, out, up, to, etc.; there were a lot of ‘person’ and ‘man’ as well), I got the following:

Simple HumanML3D word cloud from 29232 files, including the mirrored actions. We removed a lot of positional and directional words to make the actions stand out more. It seems we have a lot of walking actions here.
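
The filtering step behind a cloud like this can be sketched roughly as below. This is a minimal sketch, not the exact script I used: the `NUISANCE_WORDS` set contains only the words named above (the real list was longer), and `word_frequencies` and the two sample captions are made up for illustration.

```python
import re
from collections import Counter

# Partial, assumed nuisance list: only the words named in the post.
NUISANCE_WORDS = {"in", "at", "out", "up", "to", "person", "man"}


def word_frequencies(captions, nuisance_words=NUISANCE_WORDS):
    """Count lowercase words across captions, skipping nuisance words."""
    counts = Counter()
    for caption in captions:
        # Keep alphabetic runs only, so punctuation is dropped.
        words = re.findall(r"[a-z]+", caption.lower())
        counts.update(w for w in words if w not in nuisance_words)
    return counts


# Made-up captions, not taken from the dataset.
captions = [
    "a person walks forward and turns to the left.",
    "a man walks up the stairs.",
]
freqs = word_frequencies(captions)
print(freqs["walks"])  # 2
```

If the `wordcloud` package is what draws the picture, a counter like `freqs` can be handed to `WordCloud.generate_from_frequencies`.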

Then I realized I didn’t need to do this. I should be able to get all the verbs from the helpful part-of-speech markers that I had discarded the first time round.

Word cloud using only the words tagged with the VERB marker.
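
Pulling out the tagged verbs might look something like the sketch below. It assumes each annotation line has the layout `caption#word/TAG word/TAG ...#start#end`; the helper name `extract_verbs` and the sample line are invented for illustration, so check the layout against your own copy of the text files.

```python
def extract_verbs(annotation_line):
    """Return the words tagged VERB in one annotation line.

    Assumed layout: caption#word/TAG word/TAG ...#start#end
    """
    fields = annotation_line.strip().split("#")
    if len(fields) < 2:
        return []  # no tagged-token field on this line
    verbs = []
    for token in fields[1].split():
        # Split on the last slash so the tag is always the final part.
        word, sep, tag = token.rpartition("/")
        if sep and tag == "VERB":
            verbs.append(word.lower())
    return verbs


# Made-up line in the assumed layout, not copied from the dataset.
line = (
    "a man walks forward and sits down."
    "#a/DET man/NOUN walk/VERB forward/ADV and/CCONJ sit/VERB down/ADP"
    "#0.0#0.0"
)
print(extract_verbs(line))  # ['walk', 'sit']
```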

At this point we can probably get something a little more concrete by counting the unique verbs and how many times each one appears. We ended up with 1671 unique verbs; the top ten are:

As noted before, the walking action is quite dominant. The low-count verbs included quite a lot of misspelt and possibly mistagged words, such as ‘rotatibg’ and ‘leftwhilst’.
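
The counting itself, plus a quick way to surface the rare verbs for inspection, can be sketched with `collections.Counter`. The function name `summarize_verbs`, its `rare_max` threshold, and the toy verb list are assumptions for illustration only; the numbers printed here have nothing to do with the real dataset counts.

```python
from collections import Counter


def summarize_verbs(verbs, top_n=10, rare_max=1):
    """Return (number of unique verbs, top_n most common, rare verbs)."""
    counts = Counter(verbs)
    top = counts.most_common(top_n)
    # Verbs seen at most rare_max times are candidates for typos or mistags.
    rare = sorted(v for v, c in counts.items() if c <= rare_max)
    return len(counts), top, rare


# Toy input, not real dataset counts.
verbs = ["walk", "walk", "walk", "turn", "turn", "rotatibg", "leftwhilst"]
n_unique, top, rare = summarize_verbs(verbs, top_n=2)
print(n_unique)  # 4
print(top)       # [('walk', 3), ('turn', 2)]
print(rare)      # ['leftwhilst', 'rotatibg']
```

Skimming the `rare` list by eye is usually enough to spot typos like the ones above.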