This repository was archived by the owner on Jan 6, 2025. It is now read-only.
CONTRIBUTING.md (+3 -3)
@@ -16,7 +16,7 @@ As the [Requests Code Of Conduct](http://docs.python-requests.org/en/master/dev/
 ## Your first contribution

-A great way to start contributing to Camelot is to pick an issue tagged with the [help wanted](https://github.com/socialcopsdev/camelot/labels/help%20wanted) tag or the [good first issue](https://github.com/socialcopsdev/camelot/labels/good%20first%20issue) tag. If you're unable to find a good first issue, feel free to contact the maintainer.
+A great way to start contributing to Camelot is to pick an issue tagged with the [help wanted](https://github.com/camelot-dev/camelot/labels/help%20wanted) tag or the [good first issue](https://github.com/camelot-dev/camelot/labels/good%20first%20issue) tag. If you're unable to find a good first issue, feel free to contact the maintainer.

 ## Setting up a development environment
@@ -36,7 +36,7 @@ $ pip install ".[dev]"
 ### Submit a pull request

-The preferred workflow for contributing to Camelot is to fork the [project repository](https://github.com/socialcopsdev/camelot) on GitHub, clone, develop on a branch and then finally submit a pull request. Here are the steps:
+The preferred workflow for contributing to Camelot is to fork the [project repository](https://github.com/camelot-dev/camelot) on GitHub, clone, develop on a branch and then finally submit a pull request. Here are the steps:

 1. Fork the project repository. Click on the ‘Fork’ button near the top of the page. This creates a copy of the code under your account on the GitHub.
@@ -106,7 +106,7 @@ The function docstrings are written using the [numpydoc](https://numpydoc.readth
 ## Filing Issues

-We use [GitHub issues](https://github.com/socialcopsdev/camelot/issues) to keep track of all issues and pull requests. Before opening an issue (which asks a question or reports a bug), please use GitHub search to look for existing issues (both open and closed) that may be similar.
+We use [GitHub issues](https://github.com/camelot-dev/camelot/issues) to keep track of all issues and pull requests. Before opening an issue (which asks a question or reports a bug), please use GitHub search to look for existing issues (both open and closed) that may be similar.
README.md (+6 -6)
@@ -16,7 +16,7 @@
 ---

-**Here's how you can extract tables from PDF files.** Check out the PDF used in this example [here](https://github.com/atlanhq/camelot/blob/master/docs/_static/pdf/foo.pdf).
+**Here's how you can extract tables from PDF files.** Check out the PDF used in this example [here](https://github.com/camelot-dev/camelot/blob/master/docs/_static/pdf/foo.pdf).

 <pre>
 >>> import camelot
@@ -57,7 +57,7 @@ There's a [command-line interface](https://camelot-py.readthedocs.io/en/master/u
 - Each table is a **pandas DataFrame**, which seamlessly integrates into [ETL and data analysis workflows](https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873).
 - **Export** to multiple formats, including JSON, Excel, HTML and Sqlite.

-See [comparison with other PDF table extraction libraries and tools](https://github.com/atlanhq/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).
+See [comparison with other PDF table extraction libraries and tools](https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools).

 ## Installation
@@ -82,7 +82,7 @@ $ pip install camelot-py[cv]
 After [installing the dependencies](https://camelot-py.readthedocs.io/en/master/user/install.html#using-pip), clone the repo using:
-Camelot uses [Semantic Versioning](https://semver.org/). For the available versions, see the tags on this repository. For the changelog, you can check out [HISTORY.md](https://github.com/atlanhq/camelot/blob/master/HISTORY.md).
+Camelot uses [Semantic Versioning](https://semver.org/). For the available versions, see the tags on this repository. For the changelog, you can check out [HISTORY.md](https://github.com/camelot-dev/camelot/blob/master/HISTORY.md).

 ## License

-This project is licensed under the MIT License, see the [LICENSE](https://github.com/atlanhq/camelot/blob/master/LICENSE) file for details.
+This project is licensed under the MIT License, see the [LICENSE](https://github.com/camelot-dev/camelot/blob/master/LICENSE) file for details.
docs/dev/contributing.rst (+4 -4)
@@ -29,8 +29,8 @@ Your first contribution
 A great way to start contributing to Camelot is to pick an issue tagged with the `help wanted`_ or the `good first issue`_ tags. If you're unable to find a good first issue, feel free to contact the maintainer.
+.. _good first issue: https://github.com/camelot-dev/camelot/labels/good%20first%20issue

 Setting up a development environment
 ------------------------------------
@@ -51,7 +51,7 @@ Submit a pull request
 The preferred workflow for contributing to Camelot is to fork the `project repository`_ on GitHub, clone, develop on a branch and then finally submit a pull request. Here are the steps:
 1. Fork the project repository. Click on the ‘Fork’ button near the top of the page. This creates a copy of the code under your account on the GitHub.
@@ -134,7 +134,7 @@ Filing Issues
 We use `GitHub issues`_ to keep track of all issues and pull requests. Before opening an issue (which asks a question or reports a bug), please use GitHub search to look for existing issues (both open and closed) that may be similar.
 See `comparison with other PDF table extraction libraries and tools`_.

 .. _ETL and data analysis workflows: https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873
-.. _comparison with other PDF table extraction libraries and tools: https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools
+.. _comparison with other PDF table extraction libraries and tools: https://github.com/camelot-dev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools
docs/user/advanced.rst (+6 -6)
@@ -224,12 +224,12 @@ Table areas that you want Camelot to analyze can be passed as a list of comma-se
 .. csv-table::
 :file: ../_static/csv/table_areas.csv

-.. note:: ``table_areas`` accepts strings of the form x1,y1,x2,y2 where (x1, y1) -> top-left and (x2, y2) -> bottom-right in PDF coordinate space. In PDF coordinate space, the bottom-left corner of the page is the origin, with coordinates (0, 0).
+.. note:: ``table_areas`` accepts strings of the form x1,y1,x2,y2 where (x1, y1) -> top-left and (x2, y2) -> bottom-right in PDF coordinate space. In PDF coordinate space, the bottom-left corner of the page is the origin, with coordinates (0, 0).

 Specify table regions
 ---------------------

-However there may be cases like `[1] <../_static/pdf/table_regions.pdf>`__ and `[2] <https://github.com/socialcopsdev/camelot/blob/master/tests/files/tableception.pdf>`__, where the table might not lie at the exact coordinates every time but in an approximate region.
+However there may be cases like `[1] <../_static/pdf/table_regions.pdf>`__ and `[2] <https://github.com/camelot-dev/camelot/blob/master/tests/files/tableception.pdf>`__, where the table might not lie at the exact coordinates every time but in an approximate region.

 You can use the ``table_regions`` keyword argument to :meth:`read_pdf() <camelot.read_pdf>` to solve for such cases. When ``table_regions`` is specified, Camelot will only analyze the specified regions to look for tables.
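The coordinate convention in the note above is easy to get backwards, because the origin sits at the bottom-left and the top edge of an area therefore has the larger y value. Below is a minimal sketch of a helper that builds the ``x1,y1,x2,y2`` strings and rejects swapped corners. The helper name ``area_string`` and the sample coordinates are illustrative assumptions, not part of Camelot, and the ``read_pdf`` call is left commented out because it needs Camelot and a real PDF.

```python
def area_string(top_left, bottom_right):
    """Format an area as 'x1,y1,x2,y2' in PDF coordinate space.

    Hypothetical helper, not part of Camelot.
    """
    x1, y1 = top_left
    x2, y2 = bottom_right
    # The origin is the bottom-left corner of the page, so the top edge
    # (y1) must be greater than the bottom edge (y2).
    if not (x1 < x2 and y1 > y2):
        raise ValueError("expected (x1, y1) top-left and (x2, y2) bottom-right")
    return ",".join(str(v) for v in (x1, y1, x2, y2))


# Illustrative coordinates only; measure real ones on your own PDF.
areas = [area_string((316, 499), (566, 337))]
regions = [area_string((170, 370), (560, 270))]

# Assumed usage, requires camelot and an actual file:
# import camelot
# tables = camelot.read_pdf("table_areas.pdf", flavor="stream", table_areas=areas)
# tables = camelot.read_pdf("table_regions.pdf", table_regions=regions)
```

Both keyword arguments take a list of such strings, so the same helper can serve either one.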
@@ -316,7 +316,7 @@ In this case, the text that `other tools`_ return, will be ``24.912``. This is r
 You can solve this by passing ``flag_size=True``, which will enclose the superscripts and subscripts with ``<s></s>``, based on font size, as shown below.
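Once cells carry ``<s></s>`` markers, downstream code usually has to separate the flagged fragments from the base text. The following is a minimal sketch of that post-processing step using only the standard library. The function name ``split_flagged`` and the sample cell value are assumptions made for illustration; real cell text depends on the PDF.

```python
import re

# Non-greedy match so that several flagged fragments in one cell stay separate.
FLAGGED = re.compile(r"<s>(.*?)</s>")


def split_flagged(cell):
    """Return (base_text, flagged_fragments) for one extracted cell.

    Hypothetical helper, not part of Camelot.
    """
    flagged = FLAGGED.findall(cell)
    base = FLAGGED.sub("", cell)
    return base, flagged


# Illustrative cell text: a value of 24.9 followed by a superscript "12".
base, notes = split_flagged("24.9<s>12</s>")
print(base, notes)
```

Keeping the flagged fragments in a separate list, instead of discarding them, leaves footnote markers available for later lookup.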
@@ -340,7 +340,7 @@ You can solve this by passing ``flag_size=True``, which will enclose the supersc
 Strip characters from text
 --------------------------

-You can strip unwanted characters like spaces, dots and newlines from a string using the ``strip_text`` keyword argument. Take a look at `this PDF <https://github.com/socialcopsdev/camelot/blob/master/tests/files/tabula/12s0324.pdf>`_ as an example, the text at the start of each row contains a lot of unwanted spaces, dots and newlines.
+You can strip unwanted characters like spaces, dots and newlines from a string using the ``strip_text`` keyword argument. Take a look at `this PDF <https://github.com/camelot-dev/camelot/blob/master/tests/files/tabula/12s0324.pdf>`_ as an example, the text at the start of each row contains a lot of unwanted spaces, dots and newlines.

 ::
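To make the effect of ``strip_text`` concrete, here is a small standard-library sketch of what removing a set of characters from a cell might look like. It assumes that every occurrence of each listed character is removed, not only leading and trailing ones; that is an assumption about the behaviour, and ``strip_chars`` is a hypothetical stand-in, not Camelot's own function.

```python
def strip_chars(text, strip=""):
    """Remove every occurrence of each character in `strip` from `text`.

    Hypothetical stand-in used to illustrate strip_text-style cleaning.
    """
    # str.maketrans with a third argument maps those characters to None.
    return text.translate(str.maketrans("", "", strip))


# Illustrative row label padded with dots, spaces and a newline.
cleaned = strip_chars("...Alabama .... \n", " .\n")
print(repr(cleaned))

# Assumed usage with Camelot, requires camelot and an actual file:
# import camelot
# tables = camelot.read_pdf("12s0324.pdf", flavor="stream", strip_text=" .\n")
```

Under this assumption, interior spaces inside multi-word cells would be removed as well, so the character set is worth choosing conservatively.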
@@ -366,7 +366,7 @@ You can strip unwanted characters like spaces, dots and newlines from a string u
 Improve guessed table areas
 ---------------------------

-While using :ref:`Stream <stream>`, automatic table detection can fail for PDFs like `this one <https://github.com/socialcopsdev/camelot/blob/master/tests/files/edge_tol.pdf>`_. That's because the text is relatively far apart vertically, which can lead to shorter textedges being calculated.
+While using :ref:`Stream <stream>`, automatic table detection can fail for PDFs like `this one <https://github.com/camelot-dev/camelot/blob/master/tests/files/edge_tol.pdf>`_. That's because the text is relatively far apart vertically, which can lead to shorter textedges being calculated.

 .. note:: To know more about how textedges are calculated to guess table areas, you can see pages 20, 35 and 40 of `Anssi Nurminen's master's thesis <http://dspace.cc.tut.fi/dpub/bitstream/handle/123456789/21520/Nurminen.pdf?sequence=3>`_.
@@ -626,7 +626,7 @@ We don't need anything else. Now, let's pass ``copy_text=['v']`` to copy text in
 Tweak layout generation
 -----------------------

-Camelot is built on top of PDFMiner's functionality of grouping characters on a page into words and sentences. In some cases (such as `#170 <https://github.com/socialcopsdev/camelot/issues/170>`_ and `#215 <https://github.com/socialcopsdev/camelot/issues/215>`_), PDFMiner can group characters that should belong to the same sentence into separate sentences.
+Camelot is built on top of PDFMiner's functionality of grouping characters on a page into words and sentences. In some cases (such as `#170 <https://github.com/camelot-dev/camelot/issues/170>`_ and `#215 <https://github.com/camelot-dev/camelot/issues/215>`_), PDFMiner can group characters that should belong to the same sentence into separate sentences.

 To deal with such cases, you can tweak PDFMiner's `LAParams kwargs <https://github.com/euske/pdfminer/blob/master/pdfminer/layout.py#L33>`_ to improve layout generation, by passing the keyword arguments as a dict using ``layout_kwargs`` in :meth:`read_pdf() <camelot.read_pdf>`. To know more about the parameters you can tweak, you can check out `PDFMiner docs <https://euske.github.io/pdfminer/>`_.
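Since ``layout_kwargs`` is a plain dict that is forwarded to PDFMiner, a misspelled key is easy to introduce. The sketch below checks the keys before the dict is passed on. The parameter names listed are the LAParams options as found in the linked PDFMiner source, and they may differ between PDFMiner versions; ``make_layout_kwargs`` is a hypothetical helper, not part of Camelot.

```python
# LAParams option names, assumed from the linked PDFMiner layout.py;
# check them against the PDFMiner version you have installed.
KNOWN_LAPARAMS = {
    "line_overlap",
    "char_margin",
    "line_margin",
    "word_margin",
    "boxes_flow",
    "detect_vertical",
    "all_texts",
}


def make_layout_kwargs(**kwargs):
    """Return kwargs as a dict after rejecting unknown LAParams names."""
    unknown = set(kwargs) - KNOWN_LAPARAMS
    if unknown:
        raise ValueError(f"unknown LAParams option(s): {sorted(unknown)}")
    return dict(kwargs)


layout_kwargs = make_layout_kwargs(detect_vertical=False)

# Assumed usage, requires camelot and an actual file:
# import camelot
# tables = camelot.read_pdf("foo.pdf", layout_kwargs=layout_kwargs)
```

Validating up front keeps a typo such as ``char_margins`` from surfacing later as an error raised inside PDFMiner.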