ArrayIndexOutOfBoundsException #5

Open
flamingofugang opened this issue Jul 14, 2016 · 6 comments
Comments

@flamingofugang
Contributor

flamingofugang commented Jul 14, 2016

I got the following exception when running the Hadoop MapReduce job:

```
Sampling started
16/07/14 09:25:29 INFO input.FileInputFormat: Total input paths to process : 0
16/07/14 09:25:29 INFO partition.InputSampler: Using 0 samples
16/07/14 09:25:29 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
16/07/14 09:25:29 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
        at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:340)
        at org.rdfhdt.mrbuilder.HDTBuilderDriver.runDictionaryJob(HDTBuilderDriver.java:242)
        at org.rdfhdt.mrbuilder.HDTBuilderDriver.main(HDTBuilderDriver.java:112)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
```

Here is the code that throws the exception:

```java
InputSampler.writePartitionFile(job, new InputSampler.IntervalSampler<Text, Text>(this.conf.getSampleProbability()));
```
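The log line `Using 0 samples` right before the failure suggests what happened: the sampler returned an empty array, and choosing partition split points then read element 0. The sketch below is a simplified, hypothetical model of that step (plain `int` keys instead of Hadoop's key types, and a made-up class name `SplitPointSketch`); it is not Hadoop's source, only an illustration of why an empty sample set fails at index 0 and where a fail-fast guard could go.

```java
import java.util.Arrays;

// Simplified, hypothetical model of picking total-order split points from
// sampled keys. Plain int keys stand in for Hadoop's key types; this is
// not Hadoop's actual InputSampler code.
public class SplitPointSketch {

    // Picks numPartitions - 1 roughly evenly spaced split keys from the samples.
    public static int[] pickSplitPoints(int[] samples, int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        // Guard: with zero samples there is no key to pick. Code that goes on
        // to read samples[0] fails with ArrayIndexOutOfBoundsException: 0,
        // which matches the reported trace.
        if (samples.length == 0) {
            throw new IllegalStateException(
                "0 samples: check that the job's input paths contain files");
        }
        int[] sorted = samples.clone();
        Arrays.sort(sorted);
        float stepSize = sorted.length / (float) numPartitions;
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            // Clamp so rounding never steps past the last sample.
            int k = Math.min(Math.round(stepSize * i), sorted.length - 1);
            splits[i - 1] = sorted[k];
        }
        return splits;
    }

    public static void main(String[] args) {
        // Six samples, three partitions: two split keys.
        System.out.println(Arrays.toString(pickSplitPoints(new int[] {9, 1, 7, 3, 5, 8}, 3)));
        try {
            pickSplitPoints(new int[0], 3);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Running `main` prints `[5, 8]` for the six sample keys, then the guard's message for the empty array instead of an index error.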

It seems the input files are not found, even though I created an `input` directory and put N-Triples (`.nt`) files in it.
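`Total input paths to process : 0` means the input format matched no files at all. A quick pre-flight check is sketched below for a local directory only; the directory name `input` and the `.nt` suffix come from this report, while the class name `InputDirCheck` is hypothetical. Note that the job resolves paths through Hadoop's `FileSystem`, which may point at HDFS rather than the local disk, so a passing local check does not prove the job sees the same files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local pre-flight check: fail fast if the input directory is
// missing or holds no files with the expected suffix.
public class InputDirCheck {

    // Returns the regular files in dir whose names end with suffix, sorted.
    public static List<Path> findInputs(Path dir, String suffix) throws IOException {
        if (!Files.isDirectory(dir)) {
            throw new IOException("Input directory not found: " + dir.toAbsolutePath());
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(suffix))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "input");
        List<Path> inputs = findInputs(dir, ".nt");
        if (inputs.isEmpty()) {
            System.err.println("No .nt files in " + dir.toAbsolutePath()
                + "; sampling would see 0 input paths");
            System.exit(1);
        }
        System.out.println(inputs.size() + " input file(s): " + inputs);
    }
}
```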

Any idea?

Best,
Gang

@artob
Contributor

artob commented Aug 8, 2016

@flamingofugang Just to check: is this a regression caused by the changes of the last two months, or is this the furthest anyone has got so far in making HDT-MR actually work?

@flamingofugang
Contributor Author

flamingofugang commented Aug 8, 2016

There is still an LZO compression library issue; I will report it later.

@artob
Contributor

artob commented Aug 11, 2016

Related pull request: #4

@artob
Contributor

artob commented Aug 11, 2016

@flamingofugang Does your pull request #4 resolve this?

@flamingofugang
Contributor Author

The Java program takes LZO-compressed N-Triples files as input, and, as far as I understand, the LZO files must be indexed.

I changed pom.xml to depend on a locally built hadoop-lzo package, with the native LZO library available.

I recommend explaining this in the README:

1. Install lzo and lzop.
2. Build the hadoop-lzo package: https://github.com/twitter/hadoop-lzo
3. Register the resulting jar in the local `.m2` repository, then build this hdt-mr package.
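Since the `.lzo` inputs are expected to be indexed, a small pre-flight check can catch unindexed files before the job starts. As far as we know, hadoop-lzo's indexer (`com.hadoop.compression.lzo.LzoIndexer`) writes each index next to its file as `<name>.lzo.index`; the sketch below assumes that naming and only inspects a local directory. The class name `LzoIndexCheck` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical pre-flight check, assuming the "<file>.lzo.index" naming used
// by hadoop-lzo's indexer. Local filesystem only.
public class LzoIndexCheck {

    // Returns the .lzo files in dir that have no sibling "<name>.index" file.
    public static List<Path> unindexed(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".lzo"))
                .filter(p -> !Files.exists(p.resolveSibling(p.getFileName() + ".index")))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "input");
        List<Path> missing = unindexed(dir);
        if (missing.isEmpty()) {
            System.out.println("All .lzo files in " + dir + " have an index.");
        } else {
            System.err.println("Missing .lzo.index for: " + missing);
            System.exit(1);
        }
    }
}
```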

@tangina-sultana

Hi, can you share the installation process of HDT-MR?
