Commit ef23500

committed
fix bugs
1 parent 9e5a579 commit ef23500

9 files changed, +73 -80 lines changed

WEBERT.py

+19 -19
@@ -46,10 +46,10 @@ class BERT:
 
  :param inputs: input data
  :param file: name of the document.
- :param language: input language (english).
- :param stopwords: boolean variable for the stopword remotion (False).
- :param model: base or large model (base).
- :param cased: boolean variable to compute cased or lower-case model (False).
+ :param language: input language (By defalut: english).
+ :param stopwords: boolean variable for removing stopwords (By defalut: False).
+ :param model: base or large model (By defalut: base).
+ :param cased: boolean variable to compute cased or lower-case model (By defalut: False).
  :returns: WEBERT object
  """
 
@@ -155,11 +155,11 @@ def __data_preparation(self):
 
  def get_bert_embeddings(self, path, dynamic=True, static=False):
  """
- Bert embeddings computation using Transformes. It store and transforms the texts into BERT embeddings. The embedings are stored in csv files.
+ Bert embeddings computation using Transformes. It store and transforms the texts into BERT embeddings. The embeddings are stored in csv files.
 
  :param path: path to save the embeddings
- :param dynamic: boolean variable to compute the dynamic embeddings (True).
- :param static: boolean variable to compute the static embeddings (False).
+ :param dynamic: boolean variable to compute the dynamic embeddings (By defalut: True).
+ :param static: boolean variable to compute the static embeddings (By defalut: False).
  :returns: static embeddings if static=True
 
  """
@@ -251,16 +251,16 @@ def get_bert_embeddings(self, path, dynamic=True, static=False):
  class BETO:
  """
  WEBERT-BETO computes BETO to get static or dynamic embeddings.
- BETO is a pretrained BERT model from spanish corpus (https://github.com/dccuchile/beto)
+ BETO is a pretrained BERT model from spanish corpus (https://github.com/dccuchile/beto).
  BETO uses Transformers (https://github.com/huggingface/transformers).
  It can be computed using only spanish model.
  Also considers cased or uncased options, and remotion of stopwords.
 
  :param inputs: input data
  :param file: name of the document.
- :param stopwords: boolean variable for the stopword remotion (False).
- :param model: base or large model (base).
- :param cased: boolean variable to compute cased or lower-case model (False).
+ :param stopwords: boolean variable for removing stopwords (By defalut: False).
+ :param model: base or large model (By defalut: base).
+ :param cased: boolean variable to compute cased or lower-case model (By defalut: False).
  :returns: WEBERT object
  """
 
@@ -365,11 +365,11 @@ def __data_preparation(self):
 
  def get_bert_embeddings(self, path, dynamic=True, static=False):
  """
- BETO embeddings computation using Transformes. It store and transforms the texts into BETO embeddings. The embedings are stored in csv files.
+ BETO embeddings computation using Transformes. It store and transforms the texts into BETO embeddings. The embeddings are stored in csv files.
 
  :param path: path to save the embeddings
- :param dynamic: boolean variable to compute the dynamic embeddings (True).
- :param static: boolean variable to compute the static embeddings (False).
+ :param dynamic: boolean variable to compute the dynamic embeddings (By defalut: True).
+ :param static: boolean variable to compute the static embeddings (By defalut: False).
  :returns: static embeddings if static=True
 
  """
@@ -468,8 +468,8 @@ class SciBERT:
 
  :param inputs: input data
  :param file: name of the document.
- :param stopwords: boolean variable for the stopword remotion (False).
- :param cased: boolean variable to compute cased or lower-case model (False).
+ :param stopwords: boolean variable for removing stopwords (By defalut: False).
+ :param cased: boolean variable to compute cased or lower-case model (By defalut: False).
  :returns: WEBERT object
  """
 
@@ -575,11 +575,11 @@ def __data_preparation(self):
 
  def get_bert_embeddings(self, path, dynamic=True, static=False):
  """
- SciBert embeddings computation using Transformes. It store and transforms the texts into SciBERT embeddings. The embedings are stored in csv files.
+ SciBert embeddings computation using Transformes. It store and transforms the texts into SciBERT embeddings. The embeddings are stored in csv files.
 
  :param path: path to save the embeddings
- :param dynamic: boolean variable to compute the dynamic embeddings (By defalut: True).
- :param static: boolean variable to compute the static embeddings (False).
+ :param dynamic: boolean variable to compute the dynamic embeddings (By defalut: True).
+ :param static: boolean variable to compute the static embeddings (By defalut: False).
  :returns: static embeddings if static=True
 
  """
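
Note on the `dynamic` and `static` flags documented above: the docstrings say that static embeddings are returned when `static=True`, but the diff does not show how they are derived. The sketch below is only an illustration under a stated assumption, namely that a static embedding is a fixed-length summary (per-dimension mean and standard deviation) of the token-level dynamic vectors. The function name `static_from_dynamic` is hypothetical and is not part of WEBERT.

```python
from statistics import mean, pstdev


def static_from_dynamic(token_vectors):
    """Collapse token-level vectors into one fixed-length vector.

    ASSUMPTION: "static" means per-dimension summary statistics of the
    dynamic (per-token) embeddings. This is not confirmed by the commit.
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    # Transpose so each tuple holds one embedding dimension across all tokens.
    columns = list(zip(*token_vectors))
    means = [mean(column) for column in columns]
    stds = [pstdev(column) for column in columns]
    # Concatenate the statistics: first all means, then all deviations.
    return means + stds


# Two tokens with two-dimensional embeddings (toy values, not real BERT output).
dynamic = [[1.0, 2.0], [3.0, 6.0]]
static = static_from_dynamic(dynamic)
```

The result always has twice the embedding dimension, regardless of how many tokens the document contains, which is what makes it usable as a fixed-size feature vector.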
171 Bytes
Binary file not shown.

docs/build/doctrees/index.doctree

455 Bytes
Binary file not shown.

docs/build/html/_modules/WEBERT.html

+19 -23
@@ -180,10 +180,6 @@ <h1>Source code for WEBERT</h1><div class="highlight"><pre>
  <span class="kn">from</span> <span class="nn">scipy.stats</span> <span class="k">import</span> <span class="n">kurtosis</span><span class="p">,</span> <span class="n">skew</span>
 
 
-
-
-
-
  <span class="c1">#%%</span>
  <span class="c1"># specify GPU device</span>
 
@@ -205,10 +201,10 @@ <h1>Source code for WEBERT</h1><div class="highlight"><pre>
  <span class="sd"> </span>
  <span class="sd"> :param inputs: input data</span>
  <span class="sd"> :param file: name of the document.</span>
- <span class="sd"> :param language: input language (english).</span>
- <span class="sd"> :param stopwords: boolean variable for the stopword remotion (False).</span>
- <span class="sd"> :param model: base or large model (base).</span>
- <span class="sd"> :param cased: boolean variable to compute cased or lower-case model (False).</span>
+ <span class="sd"> :param language: input language (By defalut: english).</span>
+ <span class="sd"> :param stopwords: boolean variable for removing stopwords (By defalut: False).</span>
+ <span class="sd"> :param model: base or large model (By defalut: base).</span>
+ <span class="sd"> :param cased: boolean variable to compute cased or lower-case model (By defalut: False).</span>
  <span class="sd"> :returns: WEBERT object</span>
  <span class="sd"> &quot;&quot;&quot;</span>
 
@@ -314,11 +310,11 @@ <h1>Source code for WEBERT</h1><div class="highlight"><pre>
 
  <div class="viewcode-block" id="BERT.get_bert_embeddings"><a class="viewcode-back" href="../index.html#WEBERT.BERT.get_bert_embeddings">[docs]</a> <span class="k">def</span> <span class="nf">get_bert_embeddings</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">dynamic</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">static</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
  <span class="sd">&quot;&quot;&quot;</span>
- <span class="sd"> Bert embeddings computation using Transformes. It store and transforms the texts into BERT embeddings. The embedings are stored in csv files.</span>
+ <span class="sd"> Bert embeddings computation using Transformes. It store and transforms the texts into BERT embeddings. The embeddings are stored in csv files.</span>
  <span class="sd"> </span>
  <span class="sd"> :param path: path to save the embeddings</span>
- <span class="sd"> :param dynamic: boolean variable to compute the dynamic embeddings (True).</span>
- <span class="sd"> :param static: boolean variable to compute the static embeddings (False).</span>
+ <span class="sd"> :param dynamic: boolean variable to compute the dynamic embeddings (By defalut: True).</span>
+ <span class="sd"> :param static: boolean variable to compute the static embeddings (By defalut: False).</span>
  <span class="sd"> :returns: static embeddings if static=True</span>
  <span class="sd"> </span>
  <span class="sd"> &quot;&quot;&quot;</span>
@@ -410,16 +406,16 @@ <h1>Source code for WEBERT</h1><div class="highlight"><pre>
  <div class="viewcode-block" id="BETO"><a class="viewcode-back" href="../index.html#WEBERT.BETO">[docs]</a><span class="k">class</span> <span class="nc">BETO</span><span class="p">:</span>
  <span class="sd">&quot;&quot;&quot;</span>
  <span class="sd"> WEBERT-BETO computes BETO to get static or dynamic embeddings. </span>
- <span class="sd"> BETO is a pretrained BERT model from spanish corpus (https://github.com/dccuchile/beto)</span>
+ <span class="sd"> BETO is a pretrained BERT model from spanish corpus (https://github.com/dccuchile/beto).</span>
  <span class="sd"> BETO uses Transformers (https://github.com/huggingface/transformers). </span>
  <span class="sd"> It can be computed using only spanish model.</span>
  <span class="sd"> Also considers cased or uncased options, and remotion of stopwords.</span>
  <span class="sd"> </span>
  <span class="sd"> :param inputs: input data</span>
  <span class="sd"> :param file: name of the document.</span>
- <span class="sd"> :param stopwords: boolean variable for the stopword remotion (False).</span>
- <span class="sd"> :param model: base or large model (base).</span>
- <span class="sd"> :param cased: boolean variable to compute cased or lower-case model (False).</span>
+ <span class="sd"> :param stopwords: boolean variable for removing stopwords (By defalut: False).</span>
+ <span class="sd"> :param model: base or large model (By defalut: base).</span>
+ <span class="sd"> :param cased: boolean variable to compute cased or lower-case model (By defalut: False).</span>
  <span class="sd"> :returns: WEBERT object</span>
  <span class="sd"> &quot;&quot;&quot;</span>
 
@@ -524,11 +520,11 @@ <h1>Source code for WEBERT</h1><div class="highlight"><pre>
 
  <div class="viewcode-block" id="BETO.get_bert_embeddings"><a class="viewcode-back" href="../index.html#WEBERT.BETO.get_bert_embeddings">[docs]</a> <span class="k">def</span> <span class="nf">get_bert_embeddings</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">dynamic</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">static</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
  <span class="sd">&quot;&quot;&quot;</span>
- <span class="sd"> BETO embeddings computation using Transformes. It store and transforms the texts into BETO embeddings. The embedings are stored in csv files.</span>
+ <span class="sd"> BETO embeddings computation using Transformes. It store and transforms the texts into BETO embeddings. The embeddings are stored in csv files.</span>
  <span class="sd"> </span>
  <span class="sd"> :param path: path to save the embeddings</span>
- <span class="sd"> :param dynamic: boolean variable to compute the dynamic embeddings (True).</span>
- <span class="sd"> :param static: boolean variable to compute the static embeddings (False).</span>
+ <span class="sd"> :param dynamic: boolean variable to compute the dynamic embeddings (By defalut: True).</span>
+ <span class="sd"> :param static: boolean variable to compute the static embeddings (By defalut: False).</span>
  <span class="sd"> :returns: static embeddings if static=True</span>
  <span class="sd"> </span>
  <span class="sd"> &quot;&quot;&quot;</span>
@@ -627,8 +623,8 @@ <h1>Source code for WEBERT</h1><div class="highlight"><pre>
  <span class="sd"> </span>
  <span class="sd"> :param inputs: input data</span>
  <span class="sd"> :param file: name of the document.</span>
- <span class="sd"> :param stopwords: boolean variable for the stopword remotion (False).</span>
- <span class="sd"> :param cased: boolean variable to compute cased or lower-case model (False).</span>
+ <span class="sd"> :param stopwords: boolean variable for removing stopwords (By defalut: False).</span>
+ <span class="sd"> :param cased: boolean variable to compute cased or lower-case model (By defalut: False).</span>
  <span class="sd"> :returns: WEBERT object</span>
  <span class="sd"> &quot;&quot;&quot;</span>
 
@@ -734,11 +730,11 @@ <h1>Source code for WEBERT</h1><div class="highlight"><pre>
 
  <div class="viewcode-block" id="SciBERT.get_bert_embeddings"><a class="viewcode-back" href="../index.html#WEBERT.SciBERT.get_bert_embeddings">[docs]</a> <span class="k">def</span> <span class="nf">get_bert_embeddings</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">dynamic</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">static</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
  <span class="sd">&quot;&quot;&quot;</span>
- <span class="sd"> SciBert embeddings computation using Transformes. It store and transforms the texts into SciBERT embeddings. The embedings are stored in csv files.</span>
+ <span class="sd"> SciBert embeddings computation using Transformes. It store and transforms the texts into SciBERT embeddings. The embeddings are stored in csv files.</span>
  <span class="sd"> </span>
  <span class="sd"> :param path: path to save the embeddings</span>
- <span class="sd"> :param dynamic: boolean variable to compute the dynamic embeddings (True).</span>
- <span class="sd"> :param static: boolean variable to compute the static embeddings (False).</span>
+ <span class="sd"> :param dynamic: boolean variable to compute the dynamic embeddings (By defalut: True).</span>
+ <span class="sd"> :param static: boolean variable to compute the static embeddings (By defalut: False).</span>
  <span class="sd"> :returns: static embeddings if static=True</span>
  <span class="sd"> </span>
  <span class="sd"> &quot;&quot;&quot;</span>

docs/build/html/_sources/index.rst.txt

+5 -5
@@ -1,7 +1,7 @@
  Welcome to WEBERT's documentation!
  ============================================================
  This toolkit computes word embeddings using Bidirectional Encoder Representations from Transformers (BERT) for cased and large models in spanish and english automatically.
- BERT embeddings are computed using Transformers (https://github.com/huggingface/transformers). The project is ongoing.
+ BERT embeddings are computed using Transformers (https://github.com/huggingface/transformers). The project is currently ongoing.
 
  The code for this project is available at https://github.com/PauPerezT/WEBERT
 
@@ -30,7 +30,7 @@ To install the requeriments, please run::
  Executing commands
  ^^^^^^^^^^^^^^^^^^
 
- This are
+
 
  Run it automatically from linux terminal
  ----------------------------------------
@@ -45,12 +45,12 @@ To compute Bert embeddings automatically
  Optional arguments   Optional Values     Description
  ==================== =================== =====================================================================================
  -h                                       Show this help message and exit
- -f                                       File folder of the set of txt documents.
+ -f                                       Path folder of the txt documents (Only txt format).
 
-                                          By defaul './texts'
+                                          By default './texts'
  -s                                       Path to save the embeddings.
 
-                                          By defaul './bert_embeddings'
+                                          By default './bert_embeddings'
  -bm                  Bert,Beto,SciBert   Choose between three different BERT models.
 
                                           By default BERT
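
For readers who want to see how the options in the table above fit together, here is a minimal `argparse` sketch. It is not taken from the WEBERT sources: the function name `build_parser`, the `dest` names, and the exact default spelling `"Bert"` are assumptions made for illustration, and the real script documents further options that this hunk does not show.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented -f, -s and -bm options."""
    parser = argparse.ArgumentParser(
        description="Compute BERT embeddings from a folder of txt documents."
    )
    # -h is added by argparse automatically.
    parser.add_argument(
        "-f", dest="files_path", default="./texts",
        help="Path folder of the txt documents (only txt format).",
    )
    parser.add_argument(
        "-s", dest="save_path", default="./bert_embeddings",
        help="Path to save the embeddings.",
    )
    parser.add_argument(
        "-bm", dest="bert_model", default="Bert",  # assumed spelling of the default
        choices=["Bert", "Beto", "SciBert"],
        help="Choose between three different BERT models.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-bm", "Beto"])
print(args.files_path, args.save_path, args.bert_model)
```

Restricting `-bm` with `choices` makes an unsupported model name fail at parse time rather than later, when the model is loaded.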
