
freeze in case of large integer secondary keys (over 2^63) #11

Open
pallinger opened this issue Mar 25, 2014 · 8 comments

Comments

@pallinger

EDIT: For a concise description of the bug in the title, see my last comment.

Using ten dumped keys, I created a small key list file, and tried to dump those keys:
java -jar riak-data-migrator-0.2.6.jar -d -r ./ --loadkeys sample.keys.txt -h IP -p 8087 -H 8098
The output is:
Dumping bucket BUCKET
and the program just sits there doing nothing for at least half an hour (after which I killed it).

After that I tried getting the objects in sample.keys.txt using curl, and succeeded.

After this, I created a test bucket with some data in it manually, and the dump seemed to be successful.

What should I do now? How can I trace what the seemingly halted program is doing? How can I avoid that timeout?

@dmitrizagidulin

A couple of questions: what version of Riak are you testing this against?
Can you check the riak-side error logs, to see if there's anything suspicious there?

@pallinger
Author

riak version: 1.4.8
There were a few entries in the console.log in the 10 minutes I waited for a dump, but I do not think they are related:

2014-03-26 09:46:10.162 [info] <0.367.0>@riak_kv_entropy_manager:perhaps_log_throttle_change:826 Changing AAE throttle from 0 -> 50 msec/key, based on maximum vnode mailbox size 652 from '[email protected]'
2014-03-26 09:46:25.213 [info] <0.367.0>@riak_kv_entropy_manager:perhaps_log_throttle_change:826 Changing AAE throttle from 50 -> 0 msec/key, based on maximum vnode mailbox size 0 from '[email protected]'
2014-03-26 09:50:37.898 [info] <0.98.0>@riak_core_sysmon_handler:handle_event:92 monitor busy_dist_port <0.723.0> [{initial_call,{riak_kv_index_hashtree,init,1}},{almost_current_function,{gen_server,loop,6}},{message_queue_len,0}] {#Port<0.149447>,'[email protected]'}
2014-03-26 09:58:10.270 [info] <0.367.0>@riak_kv_entropy_manager:perhaps_log_throttle_change:826 Changing AAE throttle from 0 -> 10 msec/key, based on maximum vnode mailbox size 241 from '[email protected]'
2014-03-26 09:58:25.232 [info] <0.367.0>@riak_kv_entropy_manager:perhaps_log_throttle_change:826 Changing AAE throttle from 10 -> 0 msec/key, based on maximum vnode mailbox size 2 from '[email protected]'
2014-03-26 09:58:40.233 [info] <0.367.0>@riak_kv_entropy_manager:perhaps_log_throttle_change:826 Changing AAE throttle from 0 -> 50 msec/key, based on maximum vnode mailbox size 622 from '[email protected]'
2014-03-26 09:58:55.235 [info] <0.367.0>@riak_kv_entropy_manager:perhaps_log_throttle_change:826 Changing AAE throttle from 50 -> 0 msec/key, based on maximum vnode mailbox size 0 from '[email protected]'
2014-03-26 09:59:10.235 [info] <0.367.0>@riak_kv_entropy_manager:perhaps_log_throttle_change:826 Changing AAE throttle from 0 -> 10 msec/key, based on maximum vnode mailbox size 237 from '[email protected]'
2014-03-26 09:59:25.253 [info] <0.367.0>@riak_kv_entropy_manager:perhaps_log_throttle_change:826 Changing AAE throttle from 10 -> 0 msec/key, based on maximum vnode mailbox size 0 from '[email protected]'

Meanwhile, I found that certain objects seem to cause this behaviour (if I exclude their keys from the key list, the dump succeeds). The offenders I found (by analysing them manually via the REST interface) have a larger number of secondary indices (over 100), while the others have fewer.

My real problem with riak-data-migrator is that I can query all of the failing keys I found (about 10 of the 50 I checked) together in less than a minute using REST, and e.g. an entry with 122 secondary keys can be wget-ed in 0.02 s, while riak-data-migrator chokes on it completely.

One more interesting thing: it does not matter where the offending key is in the list; I do not even get a partial result, even if I specify --riakworkercount 1.

@pallinger changed the title from "timeout or no data is dumped on real bucket" to "no data is dumped in case of large number of secondary keys" on Mar 26, 2014
@pallinger
Author

Upon further investigation, it seems that the dump hangs if there are objects with integer-type secondary keys. I will try to get more definitive proof by modifying my test bucket.

@pallinger
Author

Testing with integer indices, I found that the error manifests itself if there is an integer index greater than or equal to 2^63 (2^63-1 is fine), so this seems to be a long int problem. I am not sure how to resolve it, as it probably affects the serialisation format currently used. If I have some more time, I will try to tinker with the source to create a patch.
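For context, 2^63-1 is exactly Long.MAX_VALUE, so the second value used in the reproduction below cannot be represented as a Java long at all. This is a plain-Java illustration of the boundary only, not code from the migrator or the client:

```java
import java.math.BigInteger;

public class LongBoundary {
    public static void main(String[] args) {
        String fits   = "9223372036854775807"; // 2^63 - 1 == Long.MAX_VALUE, dump succeeds
        String tooBig = "9223372036854775808"; // 2^63, one past the signed 64-bit range, dump hangs

        System.out.println(Long.parseLong(fits) == Long.MAX_VALUE); // true

        try {
            Long.parseLong(tooBig); // throws: value does not fit in a long
        } catch (NumberFormatException e) {
            System.out.println("2^63 does not fit in a long: " + e.getMessage());
        }

        // The REST (and PB) interfaces accept arbitrary-precision integer index
        // values, which BigInteger can hold without loss.
        System.out.println(new BigInteger(tooBig)); // 9223372036854775808
    }
}
```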

To reproduce:

IP=192.168..
curl -X POST -H 'x-riak-index-test_int: 9223372036854775807' -d 'just for testing integer secondary indices' http://$IP:8098/buckets/test/keys/integersecondary9223372036854775807
java -jar riak-data-migrator-0.2.6.jar -d -r ./ -b test -h $IP -p 8087 -H 8098 # Will succeed
curl -X POST -H 'x-riak-index-test_int: 9223372036854775808' -d 'just for testing integer secondary indices' http://$IP:8098/buckets/test/keys/integersecondary9223372036854775808
java -jar riak-data-migrator-0.2.6.jar -d -r ./ -b test -h $IP -p 8087 -H 8098 # Will hang

Interestingly, I do get a "0% completed. Read 0 @ 0 obj/sec" progress report after about 10 seconds (and before the hang), which I do not get when querying the real database, but that's the only difference in behaviour.

Also, there is nothing in riak-data-migrator-error.log or riak-data-migrator-status.log, both of which are created during the process.

@pallinger changed the title from "no data is dumped in case of large number of secondary keys" to "freeze in case of large integer secondary keys over 2 to 63" on Mar 26, 2014
@pallinger changed the title from "freeze in case of large integer secondary keys over 2 to 63" to "freeze in case of large integer secondary keys (over 2^63)" on Mar 26, 2014
@dmitrizagidulin

Ah, ok, yeah, I think this might be related to basho/riak-java-client#112
I'll need to check that we're using the correct java client version in the data migrator.

@dmitrizagidulin

Also, I have to warn you -- 100 indices on an object is way too many. Your read and write performance (let alone backups) is going to suffer.
What sort of use case are you using those for? Maybe I can recommend alternative ways of achieving it?

@pallinger
Author

basho/riak-java-client#112 seems to indicate that they now use long instead of int. However, that would still produce the exact behaviour I observed, as arbitrary-precision integers would be needed to conform with the PB and REST interfaces.
That means this error should mostly be addressed in the lower layer, but, if possible, the migrator should at least emit some warnings, as it currently seems to silently swallow exceptions and get stuck in an infinite loop or something similar.
I posted a new issue in riak-java-client regarding this: basho/riak-java-client#402
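A minimal sketch of the kind of handling I mean, assuming a hypothetical parse step somewhere in the dump path (the helper name and structure are made up for illustration and are not taken from riak-data-migrator or riak-java-client): parse integer index values as BigInteger so values at or above 2^63 survive, and fail loudly instead of swallowing the exception.

```java
import java.math.BigInteger;

// Illustrative only: parseIntIndex is a hypothetical helper, not an actual
// method of riak-data-migrator or riak-java-client.
public final class IndexValues {

    static BigInteger parseIntIndex(String raw) {
        try {
            // BigInteger has no 2^63 - 1 ceiling, unlike Long.parseLong.
            return new BigInteger(raw.trim());
        } catch (NumberFormatException e) {
            // Surface the problem instead of silently eating it, so the caller
            // can log the offending key and skip it rather than hang.
            throw new IllegalArgumentException("unparseable integer index value: " + raw, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIntIndex("9223372036854775808")); // prints 9223372036854775808
    }
}
```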

@pallinger
Author

We do have to have 100 or more indices -- we use them for an inverted index on the documents (a kind of Yokozuna replacement, which won't be available anyway until Riak 2.0). We could of course use primary keys and manual conflict (non-)resolution to simulate the current setup, but we have not had the time for that yet. We have really considered the options, but none of them seems feasible.
