I have a large matrix (approx. 80,000 X 60,000), and I basically want to scramble all the entries (that is, randomly permute both rows and columns independently).
I believe it'll work if I loop over the columns, and use randperm to randomly permute each column. (Or, I could equally well do rows.) Since this involves a loop with 60K iterations, I'm wondering if anyone can suggest a more efficient option?
I've also been working with numpy/scipy, so if you know of a good option in python, that would be great as well.
Thanks for all the thoughtful answers! Some more info: the rows of the matrix represent documents, and the data in each row is a vector of tf-idf weights for that document. Each column corresponds to one term in the vocabulary. I'm using pdist to calculate cosine similarities between all pairs of papers. And I want to generate a random set of papers to compare to.
I think that just permuting the columns will work, then, because each paper gets assigned a random set of term frequencies. (Permuting the rows just means reordering the papers.) As Jonathan pointed out, this has the advantage of not making a new copy of the whole matrix, and it sounds like the other options all will.
reshapethe matrix to a 1 × 4800000000 "array",
randpermit, and finally
reshapeit back to a 80000 × 60000 matrix.
EDIT: Actually Matlab automatically uses linear indexing, so the first
is not needed. Just
reshape(x(randperm(4800000000), 80000, 60000))
is enough (thus reducing 1 unnecessary potential copying).
Note that, this assumes you have a dense matrix. If you have a sparse matrix, you could extract the values, and then randomly reassign indices to them. If there are N nonzero entries, then only 8N copying are needed at worst (3 numbers are required to describe one entry).