[tests]: test metadataproviders in online/offline modes #23

Open
paulolimac wants to merge 1 commit into base: master

@paulolimac paulolimac commented Oct 30, 2017

Addresses #20, partially.

Implements unit tests for the Amazon and Wikipedia MetaDataProviders,
in two modes:

  • online (normal use)
  • offline (using Amazon and Wikipedia responses previously downloaded as HTML files).

For more info, see commits.

What do:
Test the Amazon and Wikipedia MetaDataProviders with unit tests.
Download Amazon and Wikipedia responses as HTML files.

Why do:
- Test the software online (typical use).
  - Create some test cases.
  - Improve reliability and quality.
  - Meet the requirements.
  - Avoid errors, misunderstandings, defects, failures, and so on.
- Test the software offline.
  - Same as online, but using only local resources.
  - Hit the servers at a low rate.
  - Avoid an easy ban from the servers.
  - Keep the number of requests seen by the servers low.

How do:
Use Python's built-in `unittest` module.
All files are under the `./tests/` folder.
Implement 4 tests:
- Attribute `VALID_URL` from the `Amazon` module.
- Attribute `VALID_URL` from the `Wikipedia` module.
- Function `lookup` from the `Amazon` module.
- Function `lookup` from the `Wikipedia` module.
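As a rough illustration of what the `VALID_URL` patterns are meant to accept, the sketch below copies the two patterns from `tests/resources/constants.py` and matches them against the test URLs. The helper name `provider_for` is made up for this example and is not part of the project.

```python
import re

# Patterns copied from tests/resources/constants.py in this PR
# (written here as raw strings; the string values are the same).
VALID_URLS = {
    'amazon': r'https?://(?:\w+\.)?amazon\..*/.*',
    'wikipedia': r'https?://(?:\w+\.)?wikipedia\..*/.*',
}


def provider_for(url):
    """Hypothetical helper: name of the first provider whose pattern matches."""
    for name, pattern in VALID_URLS.items():
        if re.match(pattern, url):
            return name
    return None


amazon_hit = provider_for('https://www.amazon.com/Dogs-Eating-blink-182/dp/B00B054FFA')
wikipedia_hit = provider_for('https://en.wikipedia.org/wiki/Dogs_Eating_Dogs')
no_hit = provider_for('https://example.com/')
```

A behavioural check like this could complement the string comparison done in `test_Amazon_VALID_URL` and `test_Wikipedia_VALID_URL`.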
Implement the environment constants in `./tests/resources/constants.py`.
Before running the tests, edit `./tests/resources/constants.py` to configure your run:
- `UPDATE_HTML_FILES_CAPTURED = True`: use it on your first test run to download the HTML files. After that run, set it to `False`, and set it back to `True` only occasionally, when you want to refresh the captured HTML files.
- `DO_ONLINE_TESTS = True`: runs the online tests. This slows the test run down, so keep it `False` and set it to `True` only occasionally.
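One possible alternative to checking the flag inside each test body is `unittest.skipUnless`, which makes the runner report the online test as skipped. This is only a sketch: `ExampleLookupTest`, `run_example`, and the inline `DO_ONLINE_TESTS` value are placeholders for illustration, not code from this PR.

```python
import unittest

# Stand-in for tests/resources/constants.py (assumed value for this example).
DO_ONLINE_TESTS = False


class ExampleLookupTest(unittest.TestCase):  # hypothetical test case
    def test_offline(self):
        # Always runs; a real test would read the captured HTML files.
        self.assertTrue(True)

    @unittest.skipUnless(DO_ONLINE_TESTS, 'online tests disabled in constants')
    def test_online(self):
        # Runs only when DO_ONLINE_TESTS is True; a real test would hit the servers.
        self.assertTrue(True)


def run_example():
    """Run the example test case and return the collected TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleLookupTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With this approach the verbose output would show the online test as `skipped` rather than silently passing, which makes it easier to see which mode a run used.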
Run the tests with the command below:
`$ python3 -m unittest discover -s tests -v`
If `DO_ONLINE_TESTS = True` and `UPDATE_HTML_FILES_CAPTURED = True`,
the result will be something like this:
```
New HTML file downloaded in:  tests/resources/html_files_captured/latest_html_files_captured/amazon_with_song_table.html
New HTML file downloaded in:  tests/resources/html_files_captured/2017-10-30T13:23:54/amazon_with_song_table.html
New HTML file downloaded in:  tests/resources/html_files_captured/latest_html_files_captured/amazon_without_song_table.html
New HTML file downloaded in:  tests/resources/html_files_captured/2017-10-30T13:23:54/amazon_without_song_table.html
New HTML file downloaded in:  tests/resources/html_files_captured/latest_html_files_captured/wikipedia_with_song_table.html
New HTML file downloaded in:  tests/resources/html_files_captured/2017-10-30T13:23:54/wikipedia_with_song_table.html
New HTML file downloaded in:  tests/resources/html_files_captured/latest_html_files_captured/wikipedia_without_song_table.html
New HTML file downloaded in:  tests/resources/html_files_captured/2017-10-30T13:23:54/wikipedia_without_song_table.html
test_Amazon_VALID_URL (test_MetaDataProviders.test_Amazon.TestRepo) ... ok
test_Amazon_lookup (test_MetaDataProviders.test_Amazon.TestRepo) ... ok
test_Wikipedia_VALID_URL (test_MetaDataProviders.test_Wikipedia.TestRepo) ... ok
test_Wikipedia_lookup (test_MetaDataProviders.test_Wikipedia.TestRepo) ... ok

----------------------------------------------------------------------
Ran 4 tests in 12.259s

OK
```
The HTML files are saved in two folders:
- DIR1: ./tests/resources/html_files_captured/latest_html_files_captured/
- DIR2: ./tests/resources/html_files_captured/0000-00-00T00:00:00/
DIR1 holds only the most recently downloaded HTML files and is the folder used by the offline tests.
DIR2 is named after the download datetime, so all older HTML files are kept, grouped by the datetime of the run that downloaded them.
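The two-folder layout can be sketched roughly as follows. This is a simplified stand-in for `write_html_file` in this PR, not a copy of it: `save_capture` and `demo` are hypothetical names, and the demo writes into a temporary directory instead of `./tests/resources/`.

```python
import pathlib
import tempfile


def save_capture(base_dir, page_name, html, stamp):
    """Write the same HTML to the 'latest' folder and to a datetime-named folder."""
    written = []
    for sub_dir in ('latest_html_files_captured', stamp):
        target = pathlib.Path(base_dir) / sub_dir / (page_name + '.html')
        # Create the folder on first use; later runs reuse it.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(html, encoding='utf-8')
        written.append(target)
    return written


def demo():
    """Save one page and report (folder name, file content) for each copy."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = save_capture(tmp, 'amazon_with_song_table',
                             '<html></html>', '2017-10-30T13:23:54')
        return [(p.parent.name, p.read_text(encoding='utf-8')) for p in paths]
```

Note that `:` is not allowed in Windows file names, so a datetime-named folder in this format would need a different separator on that platform.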
Run the tests with the command below:
`$ python3 -m unittest discover -s tests -v`
If `DO_ONLINE_TESTS = True` and `UPDATE_HTML_FILES_CAPTURED = False`,
the result will be something like this:
```
test_Amazon_VALID_URL (test_MetaDataProviders.test_Amazon.TestRepo) ... ok
test_Amazon_lookup (test_MetaDataProviders.test_Amazon.TestRepo) ... ok
test_Wikipedia_VALID_URL (test_MetaDataProviders.test_Wikipedia.TestRepo) ... ok
test_Wikipedia_lookup (test_MetaDataProviders.test_Wikipedia.TestRepo) ... ok

----------------------------------------------------------------------
Ran 4 tests in 14.166s

OK
```
Run the tests with the command below:
`$ python3 -m unittest discover -s tests -v`
If `DO_ONLINE_TESTS = False` and `UPDATE_HTML_FILES_CAPTURED = False`,
the result will be something like this:
```
test_Amazon_VALID_URL (test_MetaDataProviders.test_Amazon.TestRepo) ... ok
test_Amazon_lookup (test_MetaDataProviders.test_Amazon.TestRepo) ... ok
test_Wikipedia_VALID_URL (test_MetaDataProviders.test_Wikipedia.TestRepo) ... ok
test_Wikipedia_lookup (test_MetaDataProviders.test_Wikipedia.TestRepo) ... ok

----------------------------------------------------------------------
Ran 4 tests in 1.109s

OK
```
The running time decreases, and we avoid a ban or a 503 from the servers.
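The offline mode relies on replacing the network call with a local read. A minimal sketch of that idea, assuming the code under test fetches pages through `urllib`'s `OpenerDirector.open`, is shown below. `CAPTURED_PAGES`, `fake_open`, `fetch`, and `fetch_offline` are illustrative names, and the page content is a placeholder rather than a real capture.

```python
from io import BytesIO
from unittest.mock import patch
from urllib.request import build_opener

WIKIPEDIA_URL = 'https://en.wikipedia.org/wiki/Dogs_Eating_Dogs'

# Hypothetical in-memory stand-in for the captured HTML files on disk.
CAPTURED_PAGES = {
    WIKIPEDIA_URL: b'<html>captured page</html>',
}


def fake_open(url):
    """Return a file-like object for the URL without any network access."""
    return BytesIO(CAPTURED_PAGES[url])


def fetch(url):
    """Stand-in for code under test that reads a page through urllib."""
    return build_opener().open(url).read()


def fetch_offline(url):
    """Call fetch() with OpenerDirector.open patched to serve local data."""
    with patch('urllib.request.OpenerDirector.open', side_effect=fake_open):
        return fetch(url)
```

Because the patch is applied on the class, every opener created inside the `with` block is affected, and the real `open` is restored when the block exits.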

Where do:
$ git diff --staged -M --stat
 .gitignore                                     |  3 ++-
 MetaDataProviders/__init__.py                  |  0
 __init__.py                                    |  0
 tests/__init__.py                              |  0
 tests/resources/__init__.py                    |  0
 tests/resources/constants.py                   | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/resources/fake_response.py               | 35 +++++++++++++++++++++++++++++++++++
 tests/resources/html_file_downloader.py        | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 tests/test_MetaDataProviders/__init__.py       |  0
 tests/test_MetaDataProviders/test_Amazon.py    | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/test_MetaDataProviders/test_Wikipedia.py | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/test_split.py                            | 22 ++++++++++++++++++++++
 12 files changed, 256 insertions(+), 1 deletion(-)
$ git diff --staged -M
diff --git a/.gitignore b/.gitignore
index 788494e..b3b6999 100644
--- a/.gitignore
+++ b/.gitignore
@@ -5,4 +5,5 @@ venv/
 splits/
 __pycache__/
 .idea
-tracks.txt
\ No newline at end of file
+tracks.txt
+tests/resources/html_files_captured/
diff --git a/MetaDataProviders/__init__.py b/MetaDataProviders/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/__init__.py b/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/tests/__init__.py b/tests/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/tests/resources/__init__.py b/tests/resources/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/tests/resources/constants.py b/tests/resources/constants.py
new file mode 100644
index 0000000..5afdd65
--- /dev/null
+++ b/tests/resources/constants.py
@@ -0,0 +1,49 @@
+#constants.py
+
+#UPDATE_HTML_FILES_CAPTURED
+#boolean (default: 'False')
+#Mark 'False' to not update your HTML files, from Amazon and Wikipedia by running 'html_file_downloader.py'.
+#Mark 'True' to update your HTML files, from Amazon and Wikipedia by running 'html_file_downloader.py'.
+#'html_file_downloader.py' downloads the newest HTML files for use in offline tests and stores all older HTML files in a datetime-named subfolder.
+UPDATE_HTML_FILES_CAPTURED = False
+
+#DO_ONLINE_TESTS
+#boolean (default: 'False')
+#Mark 'False' to run only offline tests, and 'True' to run online tests.
+#If you mark 'True', the online tests will consume two things from you:
+#1- Your internet bandwidth;
+#2- Your accesses in Amazon.
+DO_ONLINE_TESTS = False
+
+#RESPONSES_IN_HTML_FILES_DIR_PATH
+#string (default: './tests/resources/html_files_captured/latest_html_files_captured/')
+#Points to the directory (folder) that contains all *.html responses exported from Amazon and Wikipedia.
+#The HTML files are used in offline tests to simulate the responses.
+#The folder contains only HTML from the 2 sites used in tests (Amazon, Wikipedia).
+RESPONSES_IN_HTML_FILES_DIR_PATH = './tests/resources/html_files_captured/latest_html_files_captured/'
+
+#TRACK_FILENAME_INFO
+#Name of the file that contains track and time information.
+TRACK_FILENAME = 'tracks.txt'
+
+#AMAZON TEST INFO
+#Relation of Amazon Urls to be accessed
+AMAZON_URLS = {
+    'with_song_table' : 'https://www.amazon.com/Dogs-Eating-blink-182/dp/B00B054FFA',
+    'without_song_table' : 'https://www.amazon.com/p/feature/rzekmvyjojcp6uc',
+    #'with_404' : 'https://www.amazon.com/404',
+}
+
+WIKIPEDIA_URLS = {
+    'with_song_table' : 'https://en.wikipedia.org/wiki/Dogs_Eating_Dogs',
+    'without_song_table' : 'https://en.wikipedia.org/wiki/Wikipedia:About',
+    #'with_404' : '',
+}
+
+VALID_URLS = {
+    'amazon' : r'https?://(?:\w+\.)?amazon\..*/.*',
+    'wikipedia' : r'https?://(?:\w+\.)?wikipedia\..*/.*',
+}
+
+
+
diff --git a/tests/resources/fake_response.py b/tests/resources/fake_response.py
new file mode 100644
index 0000000..51c9d57
--- /dev/null
+++ b/tests/resources/fake_response.py
@@ -0,0 +1,35 @@
+#fake_response.py
+
+import unittest
+
+import requests
+from urllib.parse import urlparse
+import os.path
+from io import BytesIO
+
+from resources import constants
+
+#Return a fake response (a BytesIO) loaded from the captured HTML file that matches the url
+def fake_requests_get(url):
+
+    root_domain = urlparse(url).hostname.split('.')[1]
+    kind_of_file = ''
+
+    for rd, d in [('amazon', constants.AMAZON_URLS),('wikipedia', constants.WIKIPEDIA_URLS)]:
+        if root_domain == rd:
+            for kind, site in d.items():
+                if url == site:
+                    kind_of_file = kind
+
+    resource_file = os.path.normpath(
+        constants.RESPONSES_IN_HTML_FILES_DIR_PATH
+        + root_domain
+        + '_'
+        + kind_of_file
+        + '.html')
+    faked_response = None
+
+    with open(resource_file, mode='rb') as f:
+        data = f.read()
+        faked_response = BytesIO(data)
+    return faked_response
diff --git a/tests/resources/html_file_downloader.py b/tests/resources/html_file_downloader.py
new file mode 100644
index 0000000..08ac3b2
--- /dev/null
+++ b/tests/resources/html_file_downloader.py
@@ -0,0 +1,48 @@
+import os
+import re
+import datetime
+#import requests
+from urllib.request import build_opener
+
+from resources import constants
+
+def write_html_file(page_name, data_to_be_saved, iso_date):
+    for specific_dir in ['latest_html_files_captured', iso_date]:
+        filename = os.path.normpath(
+            './tests/resources/html_files_captured/'
+            + specific_dir
+            + '/'
+            + page_name
+            + '.html')
+        dirpath = os.path.dirname(filename)
+        if not os.path.exists(dirpath):
+            os.makedirs(dirpath)
+        print('New HTML file downloaded in: ', filename)
+        with open(filename, 'w') as file_html:
+            file_html.write(data_to_be_saved)
+
+
+def access_html_from(url):
+    opener = build_opener()
+    opener.addheaders = [('User-agent', 'Album-Splitter')]
+    page_html = opener.open(url).read()
+    return page_html.decode()
+
+
+def domain_from(url):
+    for domain, url_regex in constants.VALID_URLS.items():
+        pattern = re.compile(url_regex)
+        if pattern.match(url):
+            return domain
+
+iso_date = datetime.datetime.utcnow().replace(microsecond=0).isoformat()
+for d in [constants.AMAZON_URLS, constants.WIKIPEDIA_URLS]:
+    for key, url in d.items():
+        site_name = domain_from(url)
+        if site_name:
+            file_name = site_name + '_' + key
+            page_html = access_html_from(url)
+            write_html_file(file_name, page_html, iso_date)
+
+
+
diff --git a/tests/test_MetaDataProviders/__init__.py b/tests/test_MetaDataProviders/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/tests/test_MetaDataProviders/test_Amazon.py b/tests/test_MetaDataProviders/test_Amazon.py
new file mode 100644
index 0000000..ccbc93f
--- /dev/null
+++ b/tests/test_MetaDataProviders/test_Amazon.py
@@ -0,0 +1,50 @@
+#test_Amazon.py
+#run this command to test all: 'python3 -m unittest discover -s tests -v'
+
+import unittest
+from unittest.mock import patch
+
+from urllib.request import OpenerDirector
+from http.client import HTTPResponse
+
+from MetaDataProviders import Amazon
+from resources import constants
+from resources import fake_response
+
+
+class TestRepo(unittest.TestCase):
+
+    @classmethod
+    def setUpClass(cls):
+        cls.module_Amazon = Amazon
+
+
+    def test_Amazon_VALID_URL(self):
+        self.assertEqual(self.module_Amazon, Amazon)
+        self.assertEqual(type(self.module_Amazon), type(Amazon))
+        self.assertEqual(self.module_Amazon.VALID_URL, constants.VALID_URLS['amazon'])
+        self.assertEqual(self.module_Amazon.VALID_URL, Amazon.VALID_URL)
+
+
+    def test_Amazon_lookup(self):
+        with_song_table = constants.AMAZON_URLS['with_song_table']
+        without_song_table = constants.AMAZON_URLS['without_song_table']
+        #with_404 = constants.AMAZON_URLS['url_with_404']
+
+        tracks_filename = constants.TRACK_FILENAME
+
+        with patch('http.client.HTTPResponse.read') as mocked_read:
+            with patch('urllib.request.OpenerDirector.open', side_effect = fake_response.fake_requests_get) as mocked_response:
+                self.assertEqual(self.module_Amazon.lookup(with_song_table, tracks_filename), True)
+                self.assertEqual(self.module_Amazon.lookup(without_song_table, tracks_filename), None)
+                #self.assertRaises(HTTPError, self.module_Amazon.lookup(with_404, tracks_filename))
+
+        if constants.DO_ONLINE_TESTS:
+            self.assertEqual(self.module_Amazon.lookup(with_song_table, tracks_filename), True)
+            self.assertEqual(self.module_Amazon.lookup(without_song_table, tracks_filename), None)
+            #self.assertRaises(HTTPError, self.module_Amazon.lookup(with_404, tracks_filename))
+
+
+if __name__ == '__main__':
+    unittest.main()
+
diff --git a/tests/test_MetaDataProviders/test_Wikipedia.py b/tests/test_MetaDataProviders/test_Wikipedia.py
new file mode 100644
index 0000000..0f465c6
--- /dev/null
+++ b/tests/test_MetaDataProviders/test_Wikipedia.py
@@ -0,0 +1,50 @@
+#test_Wikipedia.py
+#run this command to test all: 'python3 -m unittest discover -s tests -v'
+
+import unittest
+from unittest.mock import patch
+
+from urllib.request import OpenerDirector
+from http.client import HTTPResponse
+
+from MetaDataProviders import Wikipedia
+from resources import constants
+from resources import fake_response
+
+
+class TestRepo(unittest.TestCase):
+
+    @classmethod
+    def setUpClass(cls):
+        cls.module_Wikipedia = Wikipedia
+
+
+    def test_Wikipedia_VALID_URL(self):
+        self.assertEqual(self.module_Wikipedia, Wikipedia)
+        self.assertEqual(type(self.module_Wikipedia), type(Wikipedia))
+        self.assertEqual(self.module_Wikipedia.VALID_URL, constants.VALID_URLS['wikipedia'])
+        self.assertEqual(self.module_Wikipedia.VALID_URL, Wikipedia.VALID_URL)
+
+
+    def test_Wikipedia_lookup(self):
+        with_song_table = constants.WIKIPEDIA_URLS['with_song_table']
+        without_song_table = constants.WIKIPEDIA_URLS['without_song_table']
+        #with_404 = constants.WIKIPEDIA_URLS['url_with_404']
+
+        tracks_filename = constants.TRACK_FILENAME
+
+        with patch('http.client.HTTPResponse.read') as mocked_read:
+            with patch('urllib.request.OpenerDirector.open', side_effect = fake_response.fake_requests_get) as mocked_response:
+                self.assertEqual(self.module_Wikipedia.lookup(with_song_table, tracks_filename), True)
+                self.assertEqual(self.module_Wikipedia.lookup(without_song_table, tracks_filename), None)
+                #self.assertRaises(HTTPError, self.module_Wikipedia.lookup(with_404, tracks_filename))
+
+        if constants.DO_ONLINE_TESTS:
+            self.assertEqual(self.module_Wikipedia.lookup(with_song_table, tracks_filename), True)
+            self.assertEqual(self.module_Wikipedia.lookup(without_song_table, tracks_filename), None)
+            #self.assertRaises(HTTPError, self.module_Wikipedia.lookup(with_404, tracks_filename))
+
+
+if __name__ == '__main__':
+    unittest.main()
+
diff --git a/tests/test_split.py b/tests/test_split.py
new file mode 100644
index 0000000..3f21fae
--- /dev/null
+++ b/tests/test_split.py
@@ -0,0 +1,22 @@
+#test_split.py
+#run this command to test all: 'python3 -m unittest discover -s tests -v'
+
+import unittest
+
+from tests.resources import constants
+
+if constants.UPDATE_HTML_FILES_CAPTURED:
+    from tests.resources import html_file_downloader
+
+class TestSplit(unittest.TestCase):
+
+    @classmethod
+    def setUpClass(self):
+        pass
+
+    #def test_compare_compare(self):
+    #    pass
+
+
+if __name__ == '__main__':
+    unittest.main()
@crisbal
Owner

crisbal commented Oct 30, 2017

Hey, thanks for the PR, I will take a look at this when I have some time. It looks good.

Also, thanks for the detailed explanation.
