Hi,

My own train file:

Abdera implementation of the Atom Syndication Format and Atom Publishing Protocol
Accumulo secure implementation of BigTable
ActiveMQ message broker supporting different communication protocols and clients, including a full Java Message Service (JMS) 1.1 client.
Allura Python-based an open source implementation of a software forge
Ant Java-based build tool
Apache Arrow "A high-performance cross-system data layer for columnar in-memory analytics".[1][2]
APR Apache Portable Runtime, a portability library written in C
Archiva Build Artifact Repository Manager
Apache Beam ,an uber-API for big data
Beehive Java visual object model
Bloodhound defect tracker based on Trac[3]
Calcite dynamic data management framework
Camel declarative routing and mediation rules engine which implements the Enterprise Integration Patterns using a Java-based domain specific language

Training and saving the NER model [TrainGeneTag.java program]:
import com.aliasi.chunk.CharLmHmmChunker;
import com.aliasi.corpus.parsers.GeneTagParser;
import com.aliasi.hmm.HmmCharLmEstimator;
import com.aliasi.tokenizer.IndoEuropeanTokenizerFactory;
import com.aliasi.tokenizer.TokenizerFactory;
import com.aliasi.util.AbstractExternalizable;
import java.io.File;
import java.io.IOException;

public class LingPipeNER {

    static final int MAX_N_GRAM = 8;
    static final int NUM_CHARS = 256;
    static final double LM_INTERPOLATION = MAX_N_GRAM; // default behavior

    public static void main(String[] args) throws IOException {
        File corpusFile = new File("D:\\resources\\lingPipe-technologies-trainFile.txt");
        File modelFile = new File("D:\\resources\\muc6.HmmChunker");

        System.out.println("Setting up Chunker Estimator");
        TokenizerFactory factory = IndoEuropeanTokenizerFactory.INSTANCE;
        HmmCharLmEstimator hmmEstimator = new HmmCharLmEstimator(MAX_N_GRAM, NUM_CHARS, LM_INTERPOLATION);
        CharLmHmmChunker chunkerEstimator = new CharLmHmmChunker(factory, hmmEstimator);

        System.out.println("Setting up Data Parser");
        GeneTagParser parser = new GeneTagParser();
        parser.setHandler(chunkerEstimator);

        System.out.println("Training with Data from File=" + corpusFile);
        parser.parse(corpusFile);

        System.out.println("Compiling and Writing Model to File=" + modelFile);
        AbstractExternalizable.compileTo(chunkerEstimator, modelFile);
    }
}
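Since the training program hands the file straight to GeneTagParser, one thing that may be worth checking first is whether the train file carries any annotation at all. The sketch below rests on an assumption that should be verified against the LingPipe documentation: that the parser expects GENETAG-style input in which each token has an underscore-separated tag suffix (of the form `word_TAG`). It does not use LingPipe; `TrainFileCheck` and `countTaggedTokens` are hypothetical names introduced only for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of LingPipe.
class TrainFileCheck {

    // Assumption: an annotated token looks like text_LABEL, with LABEL in capitals/digits.
    private static final Pattern TAGGED = Pattern.compile("\\S+_[A-Z][A-Z0-9]*");

    /** Counts whitespace-separated tokens that carry a _LABEL suffix. */
    static int countTaggedTokens(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (TAGGED.matcher(token).matches()) {
                    ++count;
                }
            }
        }
        return count;
    }

    // Usage: java TrainFileCheck <path-to-train-file>
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("tagged tokens = " + countTaggedTokens(lines));
    }
}
```

A count of zero would mean the file contains plain descriptions with no labelled tokens. Under the assumption above, that would be consistent with a model that never returns a chunk, but it remains a guess until the expected input format is confirmed.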
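Now loading the model and passing an input sentence to get the entity output [RunChunker.java program]:

import com.aliasi.chunk.Chunk;
import com.aliasi.chunk.Chunker;
import com.aliasi.chunk.Chunking;
import com.aliasi.util.AbstractExternalizable;
import java.io.File;
import java.util.Iterator;
import java.util.Set;

public class RunChunker {

    public static void main(String[] args) throws Exception {
        // Setup reconstructed: the post omits these lines.
        // The model path is the one written by the training program.
        File modelFile = new File("D:\\resources\\muc6.HmmChunker");
        Chunker chunker = (Chunker) AbstractExternalizable.readObject(modelFile);

        // Input sentences; the actual values were not included in the post.
        String[] strList = args;

        for (int i = 0; i < strList.length; ++i) {
            String sentence = strList[i];
            Chunking chunking = chunker.chunk(strList[i]);
            // System.out.println("Chunking=" + chunking);
            Set<Chunk> chunkSet = chunking.chunkSet();
            Iterator<Chunk> it = chunkSet.iterator();
            while (it.hasNext()) {
                Chunk chunk = it.next();
                int start = chunk.start();
                int end = chunk.end();
                String text = sentence.substring(start, end);
                System.out.println(" chunk=" + chunk + " text=" + text);
            }
        }
    }
}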
Using LingPipe version 3.9.3:

<dependency>
    <groupId>com.aliasi</groupId>
    <artifactId>lingpipe</artifactId>
    <version>3.9.3</version>
</dependency>
Expected output:
Technology: camel
Technology: camel
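The expected lines pair a chunk type with the matched text, whereas the loop in RunChunker prints the chunk object followed by its text. A minimal sketch of that formatting step is given here. To keep it runnable without the library it uses a hypothetical `Span` class as a stand-in for LingPipe's `Chunk` (assumed to expose a start offset, an end offset and a type), and the sample sentence is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkFormatter {

    // Hypothetical stand-in for com.aliasi.chunk.Chunk: offsets into the sentence plus a type.
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Renders each span as "type: text", where text is sentence[start, end). */
    static List<String> format(String sentence, List<Span> spans) {
        List<String> lines = new ArrayList<>();
        for (Span span : spans) {
            lines.add(span.type + ": " + sentence.substring(span.start, span.end));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Invented sentence; "camel" occupies offsets 7 to 12.
        String sentence = "Apache camel routes messages";
        List<Span> spans = Arrays.asList(new Span(7, 12, "Technology"));
        for (String line : format(sentence, spans)) {
            System.out.println(line); // prints "Technology: camel"
        }
    }
}
```

With a real model, the same formatting would apply to each chunk's type and offsets inside the existing loop, provided the chunker returns any chunks.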
But the output is empty: chunker.chunk(strList[i]) returns no chunks for any input. Please guide me.
If you have a better example program for training a model and finding named entities, please share it.
Thanks in advance.