We now specifically wait for the length to decrease by one. This seems
like a more deterministic condition to wait on.
Previously we were waiting till the id of the deleted message remained
visible; intuitively, this should have worked but it seems that there
is some race condition that was causing the test to fail sporadically.