The command line and GUI replaced voice as an interaction model decades ago, because for most tasks it’s easier to do the work yourself than to tell a secretary to do it for you. (And make no mistake, much of the time secretaries were more intelligent and capable than the people who commanded them.) Until the advent of weakly superhuman artificial general intelligences, the absolute best case for voice and natural language as an interaction model will be strictly worse than instructing a human subordinate.
The sorry steady state of most large organizations (the SNAFU) is the direct result of the flaws of natural language as an interaction model. As an organization becomes more close-knit and efficient, jargon proliferates and language becomes more exact for the things that matter, until what people say to one another resembles one of the less well-thought-out programming languages (Perl, PHP, BASIC, Fortran). In other words, the end goal of any natural language interface (even between humans) is to become a command line interface.
Of course, between humans, the learning curve is hidden because it is merged with the organic emergence of a shared cant. But this hiding of the learning curve can only be performed once: people who come later to a community must learn the language of that community. Why not standardize, when we can? After all, machines aren’t very good at improvising collaboratively generated vocabularies based on implication, but they are very good at consistently interpreting the same conventions the same way (and copying those conventions exactly to other machines).