Reorder vector using a vector of indices

Reorder vector using a vector of indices

I'd like to reorder the items in a vector, using another vector to specify the order:

char A = 'a', 'b', 'c' ; size_t ORDER = 1, 0, 2 ; vector<char> vA(A, A + sizeof(A) / sizeof(*A)); vector<size_t> vOrder(ORDER, ORDER + sizeof(ORDER) / sizeof(*ORDER)); reorder_naive(vA, vOrder); // A is now 'b', 'a', 'c'

The following is an inefficient implementation that requires copying the vector:

void reorder_naive(vector<char>& vA, const vector<size_t>& vOrder) assert(vA.size() == vOrder.size()); vector vCopy = vA; // Can we avoid this? for(int i = 0; i < vOrder.size(); ++i) vA[i] = vCopy[ vOrder[i] ];

Is there a more efficient way, for example, that uses swap()?

Don't use all caps in your function names. Or anything, for that matter. Only #define's can get away with that.
– GManNickG
Aug 12 '09 at 18:32

#define

And constants, Gman.
– jkeys
Aug 12 '09 at 21:50

There is an ambiguity about the content of ORDER. Is it the index at which the corresponding letter should be stored, or is it the index of the letter that should be stored at this position ? Both interpretation would be correct with the given example, though they are different.
– chmike
Sep 20 '12 at 13:26

I had in mind the latter, but both interpretations give exactly the same result.
– Marc Eaddy
Oct 3 '12 at 22:09

To anybody looking for an efficient version of reorder_naive above, DO NOT use the solutions proposed below. They calculate the first, and not the latter interpretation(see comments above) of the question, but DO NOT provide the same result.
– gg349
Feb 26 '14 at 16:53

reorder_naive

13 Answers
13

I improved chmike's algorithm. This function agrees with his for all 11! permutations of (0..10) passed as the reordering vector. Also it doesn't modify reordering vector.

template< class T > void reorder(vector<T> &v, vector<size_t> const &order ) for ( int s = 1, d; s < order.size(); ++ s ) for ( d = order[s]; d < s; d = order[d] ) ; if ( d == s ) while ( d = order[d], d != s ) swap( v[s], v[d] );

Here's an STL style version which I put a bit more effort into. It's about 47% faster (that is, almost twice as fast over (0..10)!) because it does all the swaps as early as possible and then returns. The reorder vector consists of a number of orbits, and each orbit is reordered upon reaching its first member. It's faster when the last few elements do not contain an orbit.

template< typename order_iterator, typename value_iterator > void reorder( order_iterator order_begin, order_iterator order_end, value_iterator v ) typedef typename iterator_traits< value_iterator >::value_type value_t; typedef typename iterator_traits< order_iterator >::value_type index_t; typedef typename iterator_traits< order_iterator >::difference_type diff_t; diff_t remaining = order_end - 1 - order_begin; for ( index_t s = index_t(), d; remaining > 0; ++ s ) for ( d = order_begin[s]; d > s; d = order_begin[d] ) ; if ( d == s ) -- remaining; value_t temp = v[s]; while ( d = order_begin[d], d != s ) swap( temp, v[d] ); -- remaining; v[s] = temp;

And finally, just to answer the question once and for all, a variant which does destroy the reorder vector. (It fills it with -1's.) It's about 16% faster than the preceding version. This one uses an ugly typecast, but deal with it. This covers 11! ~= 40 mil permutations of 11 characters in 4.25 seconds, not counting overhead, on my 2.2 GHz laptop.

template< typename order_iterator, typename value_iterator > void reorder_destructive( order_iterator order_begin, order_iterator order_end, value_iterator v ) typedef typename iterator_traits< value_iterator >::value_type value_t; typedef typename iterator_traits< order_iterator >::value_type index_t; typedef typename iterator_traits< order_iterator >::difference_type diff_t; diff_t remaining = order_end - 1 - order_begin; for ( index_t s = index_t(); remaining > 0; ++ s ) index_t d = order_begin[s]; if ( d == (diff_t) -1 ) continue; -- remaining; value_t temp = v[s]; for ( index_t d2; d != s; d = d2 ) swap( temp, v[d] ); swap( order_begin[d], d2 = (diff_t) -1 ); -- remaining; v[s] = temp;

@Potatoswatter: yes, that's the way stackoverflow works..., never really understood the reasoning when one guy does the (vast majority of the) editing i. It's like they want to discourage you from improving your answer or something...
– Pieter
Sep 10 '09 at 8:43

Complain on meta
– jmucchiello
Oct 31 '09 at 19:32

Do you have any example code showing that this works? I can't even compile the second version with gcc. And the first version is not reordering my vector correctly.
– Alec Jacobson
Dec 2 '10 at 15:37

@mangledorf: ideone.com/TsWbu - Note that the reorder vector contains the final location for each corresponding element in the data vector, which is different from a reorder vector that selects which initial data vector element should appear at each final location. The OP example was ambiguous in this regard. My code can probably be adjusted to work the other way, but I don't have time right now :v( .
– Potatoswatter
Dec 3 '10 at 3:30

It seems this algorithm (at least second version) does swaps even if the element is already on its place. So if you call it with order vector being 0, 1, 2, ... it will actually do lots of copying around.
– Adam Badura
May 7 '12 at 7:05

vector

0

1

2

Here is the correct code

void REORDER(vector<char>& vA, vector<size_t>& vOrder) assert(vA.size() == vOrder.size()); // for all elements to put in place for( int i = 0; i < va.size() - 1; ++i ) // while the element i is not yet in place while( i != vOrder[i] ) // swap it with the element at its final place int alt = vOrder[i]; swap( vA[i], vA[alt] ); swap( vOrder[i], vOrder[alt] );

note that you can save one test because if n-1 elements are in place the last nth element is certainly in place.

On exit vA and vOrder are properly ordered.

This algorithm performs at most n-1 swapping because each swap moves the element to its final position. And we'll have to do at most 2N tests on vOrder.

Your approach of updating the order vector really improves the code, but do you really need the internal while loop? I believe you don't with the following rationale: In the first step of the algorithm you do not need the while loop. It will work in just one step. The fancy part is that you are actually performing a reduction of the original problem.
– David Rodríguez - dribeas
May 8 '09 at 9:30

@dribeas this is unfortunately not true but what I also thought in the first place. Try with the sequence 2 3 1 0 you'll see that 1 and 0 wont be in the correct place.
– chmike
May 8 '09 at 11:05

it's broken, vA[i] ends up containing vA[k] where k is the last index in a cycle
– Gregory Pakosz
Sep 19 '12 at 11:50

vA[i]

vA[k]

k

Could you be more explicit ? What cycle are you referring to ? The only error I see in my code is that int should be size_t.
– chmike
Sep 20 '12 at 12:53

The only error I see in my code is that I should have used size_t instead of int for i and alt. The code is correct and is fast but at the cost of modifying vOrder. If vOrder can't be modified, which was not specified, then a slightly different algorithm must be used.
– chmike
Sep 20 '12 at 13:09

If it is ok to modify the ORDER array then an implementation that sorts the ORDER vector and at each sorting operation also swaps the corresponding values vector elements could do the trick, I think.

Code please! :)
– Marc Eaddy
Aug 19 '09 at 22:59

It appears to me that vOrder contains a set of indexes in the desired order (for example the output of sorting by index). The code example here is a variation of cycle sort. Every swap places at least one element in it's proper place. This code example effectively reorders vA according to vOrder, while "unordering" or "unpermuting" vOrder back to its original state (0 :: n-1). If vA contained the values 0 through n-1 in order, then after reorder, vA would end up where vOrder started.

template <class T> void reorder(vector<T>& vA, vector<size_t>& vOrder) assert(vA.size() == vOrder.size()); // for all elements to put in place for( size_t i = 0; i < vA.size(); ++i ) // while vOrder[i] is not yet in place // every swap places at least one element in it's proper place while( vOrder[i] != vOrder[vOrder[i]] ) swap( vA[vOrder[i]], vA[vOrder[vOrder[i]]] ); swap( vOrder[i], vOrder[vOrder[i]] );

This can also be implemented a bit more efficiently using moves instead swaps. A temp object is needed to hold an element during the moves. Example C code, reorders A according to indexes in I, also sorts I :

void reorder(int *A, int *I) int i, j, k; int tA; /* reorder A according to I */ /* every move puts an element into place */ /* time complexity is O(n) */ for(i = 0; i < sizeof(A)/sizeof(A[0]); i++) if(i != I[i]) tA = A[i]; j = i; while(i != (k = I[j])) A[j] = A[k]; I[j] = j; j = k; A[j] = tA; I[j] = j;

Never prematurely optimize. Meassure and then determine where you need to optimize and what. You can end with complex code that is hard to maintain and bug-prone in many places where performance is not an issue.

With that being said, do not early pessimize. Without changing the code you can remove half of your copies:

template <typename T> void reorder( std::vector<T> & data, std::vector<std::size_t> const & order ) std::vector<T> tmp; // create an empty vector tmp.reserve( data.size() ); // ensure memory and avoid moves in the vector for ( std::size_t i = 0; i < order.size(); ++i ) tmp.push_back( data[order[i]] ); data.swap( tmp ); // swap vector contents

This code creates and empty (big enough) vector in which a single copy is performed in-order. At the end, the ordered and original vectors are swapped. This will reduce the copies, but still requires extra memory.

If you want to perform the moves in-place, a simple algorithm could be:

template <typename T> void reorder( std::vector<T> & data, std::vector<std::size_t> const & order ) for ( std::size_t i = 0; i < order.size(); ++i ) std::size_t original = order[i]; while ( i < original ) original = order[original]; std::swap( data[i], data[original] );

This code should be checked and debugged. In plain words the algorithm in each step positions the element at the i-th position. First we determine where the original element for that position is now placed in the data vector. If the original position has already been touched by the algorithm (it is before the i-th position) then the original element was swapped to order[original] position. Then again, that element can already have been moved...

This algorithm is roughly O(N^2) in the number of integer operations and thus is theoretically worse in performance time as compare to the initial O(N) algorithm. But it can compensate if the N^2 swap operations (worst case) cost less than the N copy operations or if you are really constrained by memory footprint.

Your algorithm is optimal is vOrder can't be modified or if N is small. Yours may spend more time scanning vOrder but will only swap vA values. Mine swap vOrder and vA values which may end up to be more expensive than to rescan vOrder with small N values.
– chmike
May 8 '09 at 8:49

Hm, shouldn't 'tmp.resize( data.size() );' be 'tmp.reserve(...)' as you're using 'tmp.push_back(...)' ? To keep 'resize', the insertion should be 'tmp[i] = ...'.
– rotoglup
Oct 21 '11 at 7:31

You could do it recursively, I guess - something like this (unchecked, but it gives the idea):

// Recursive function template<typename T> void REORDER(int oldPosition, vector<T>& vA, const vector<int>& vecNewOrder, vector<bool>& vecVisited) // Keep a record of the value currently in that position, // as well as the position we're moving it to. // But don't move it yet, or we'll overwrite whatever's at the next // position. Instead, we first move what's at the next position. // To guard against loops, we look at vecVisited, and set it to true // once we've visited a position. T oldVal = vA[oldPosition]; int newPos = vecNewOrder[oldPosition]; if (vecVisited[oldPosition]) // We've hit a loop. Set it and return. vA[newPosition] = oldVal; return; // Guard against loops: vecVisited[oldPosition] = true; // Recursively re-order the next item in the sequence. REORDER(newPos, vA, vecNewOrder, vecVisited); // And, after we've set this new value, vA[newPosition] = oldVal; // The "main" function template<typename T> void REORDER(vector<T>& vA, const vector<int>& newOrder) // Initialise vecVisited with false values vector<bool> vecVisited(vA.size(), false); for (int x = 0; x < vA.size(); x++) REORDER(x, vA, newOrder, vecVisited);

Of course, you do have the overhead of vecVisited. Thoughts on this approach, anyone?

In a first read it seems to me as if this roughly ends up with the same memory and time costs. While you are not copying to another vector, you are copying to just as many local variables as elements in the vector. I would have to work harder on the interpretation to test for correctness.
– David Rodríguez - dribeas
May 8 '09 at 8:09

Yeah, that was my thought too. In fact, it would probably use up more space because of the stack frames, and take more time because of method-calling overhead. But it was a fun intellectual exercise anyway ;-P
– Smashery
May 8 '09 at 8:15

To iterate through the vector is O(n) operation. Its sorta hard to beat that.

He's talking about space efficiency, not algorithm efficiency.
– Justicle
Aug 12 '09 at 23:01

Your code is broken. You cannot assign to vA and you need to use template parameters.

vA

vector<char> REORDER(const vector<char>& vA, const vector<size_t>& vOrder) assert(vA.size() == vOrder.size()); vector<char> vCopy(vA.size()); for(int i = 0; i < vOrder.size(); ++i) vCopy[i] = vA[ vOrder[i] ]; return vA;

The above is slightly more efficient.

The OP wants to reorder vA vector. So the prototype should be: void REORDER(vector<char>& vA, const vector<size_t>& vOrder).
– Donotalo
May 8 '09 at 6:51

... or return a vector.
– dirkgently
May 8 '09 at 7:15

returning a vector will end up with the result of having to create a copy, which, to my interpretation, is what the OP wants to avoid. Another optimization of your approach would be not creating a vector of size elements but rather reserving memory and using push_backs. This would remove the cost of default constructing all elements before you copy.
– David Rodríguez - dribeas
May 8 '09 at 8:12

@dirkgently: You're right, vA shouldn't be const.
– Marc Eaddy
Aug 19 '09 at 23:04

It is not clear by the title and the question if the vector should be ordered with the same steps it takes to order vOrder or if vOrder already contains the indexes of the desired order.
The first interpretation has already a satisfying answer (see chmike and Potatoswatter), I add some thoughts about the latter.
If the creation and/or copy cost of object T is relevant

template <typename T> void reorder( std::vector<T> & data, std::vector<std::size_t> & order ) std::size_t i,j,k; for(i = 0; i < order.size() - 1; ++i) j = order[i]; if(j != i) for(k = i + 1; order[k] != i; ++k); std::swap(order[i],order[k]); std::swap(data[i],data[j]);

If the creation cost of your object is small and memory is not a concern (see dribeas):

Note that the two pieces of code in dribeas answer do different things.

I was trying to use @Potatoswatter's solution to sort multiple vectors by a third one and got really confused by output from using the above functions on a vector of indices output from Armadillo's sort_index. To switch from a vector output from sort_index (the arma_inds vector below) to one that can be used with @Potatoswatter's solution (new_inds below), you can do the following:

sort_index

arma_inds

new_inds

vector<int> new_inds(arma_inds.size()); for (int i = 0; i < new_inds.size(); i++) new_inds[arma_inds[i]] = i;

It's an interesting intellectual exercise to do the reorder with O(1) space requirement but in 99.9% of the cases the simpler answer will perform to your needs:

void permute(vector<T>& values, const vector<size_t>& indices) vector<T> out; out.reserve(indices.size()); for(size_t index: indices) assert(0 <= index && index < values.size()); out.push_back(values[index]); values = std::move(out);

Beyond memory requirements, the only way I can think of this being slower would be due to the memory of out being in a different cache page than that of values and indices.

out

values

indices

I came up with this solution which has the space complexity of O(max_val - min_val + 1), but it can be integrated with std::sort and benefits from std::sort's O(n log n) decent time complexity.

O(max_val - min_val + 1)

std::sort

O(n log n)

std::vector<int32_t> dense_vec = 1, 2, 3; std::vector<int32_t> order = 1, 0, 2; int32_t max_val = *std::max_element(dense_vec.begin(), dense_vec.end()); std::vector<int32_t> sparse_vec(max_val + 1); int32_t i = 0; for(int32_t j: dense_vec) sparse_vec[j] = order[i]; i++; std::sort(dense_vec.begin(), dense_vec.end(), [&sparse_vec](int32_t i1, int32_t i2) return sparse_vec[i1] < sparse_vec[i2];);

The following assumptions made while writing this code:

std::sort

This should avoid copying the vector:

void REORDER(vector<char>& vA, const vector<size_t>& vOrder) assert(vA.size() == vOrder.size()); for(int i = 0; i < vOrder.size(); ++i) if (i < vOrder[i]) swap(vA[i], vA[vOrder[i]]);

I believe this to be broken. Order: [ 2, 0, 1 ], first step will swap first and third element. In the second step i > order[i] and nothing is done, in the third step i > order[i] again and nothing is done.
– David Rodríguez - dribeas
May 8 '09 at 7:24

This code works with the example but wouldn't work with the vOrder sequence [3 0 1 2]. For i == 0 , 3 and 2 are swapped, then nothing is changed for subsequent i values.
– chmike
May 8 '09 at 7:35

Thanks for pointing out the mistake. I should've checked more before posting the code. :(
– Donotalo
May 8 '09 at 9:29

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

IN kYcf SChmlTe dz,mQ 14dIFMu20QYwrU3J

搜尋此網誌

Cyufjy