IRSA_ibe list_mission method not working due to an HTML parsing error of some kind. #1423

Closed
odysseus9672 opened this issue Apr 24, 2019 · 6 comments


@odysseus9672
Contributor

I'm trying to use the ibe module of astroquery. As a first step, I wanted to go through the list_missions, list_datasets, and list_tables methods to get an idea of what has been implemented. When I call the list_missions method, though, I get a list of empty strings instead of mission names.

In [1]: from astroquery.ibe import IbeClass 
In [2]: Irsa = IbeClass()
In [3]: Irsa.list_missions()
/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/bs4/__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 5 of the file /opt/local/bin/ipython-3.6. To get rid of this warning, change code that looks like this:

 BeautifulSoup(YOUR_MARKUP})

to this:

 BeautifulSoup(YOUR_MARKUP, "lxml")

  markup_type=markup_type))
Out[3]: 
['',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '']

I've manually constructed the URL and checked it both in my browser and with astroquery's BaseQuery._request, and I can confirm that my system is receiving HTML of some kind. Something seems to go wrong in these lines of the list_missions method, though:

root = BeautifulSoup(response.text)
links = root.findAll('a')
missions = [os.path.basename(a.attrs['href']) for a in links]
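To make the symptom reproducible without a network call, here is a minimal sketch. It swaps the standard library's html.parser.HTMLParser in for BeautifulSoup (so no third-party parser is involved), and the sample HTML and href values are hypothetical, shaped like directory links that end in a trailing slash; they are not copied from a real IBE response.

```python
import os.path
from html.parser import HTMLParser

# Hypothetical directory-style listing: every link ends with "/".
SAMPLE_HTML = """
<html><body>
<a href="/ibe/search/wise/">wise</a>
<a href="/ibe/search/ptf/">ptf</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (stdlib stand-in for root.findAll('a'))."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


parser = LinkCollector()
parser.feed(SAMPLE_HTML)

# Same expression as the list comprehension quoted above: os.path.basename
# returns everything after the last "/", which is "" for these hrefs.
missions = [os.path.basename(href) for href in parser.hrefs]
print(missions)  # ['', '']
```

The parser choice does not matter here: the links are found correctly, and the empty strings come from os.path.basename itself.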

System information: MacBook Pro, macOS 10.14.4, using Python 3.6 installed via MacPorts (version 3.6.8), astropy installed using the MacPorts package py36-astropy (version 3.1), astroquery installed using the MacPorts pip-3.6 (pip version 19.0.3, astroquery version 0.3.9), and Beautiful Soup version 4.6.0 (it looks like it was installed by pip). I don't know which version of lxml I'm using or what installed it.

@odysseus9672
Contributor Author

odysseus9672 commented Apr 24, 2019

Upon further experimentation, the problem appears to be the use of os.path.basename. Because each href attribute ends with a /, os.path.basename returns an empty string. In my test code, modifying line 279 of astroquery/ibe/core.py to read

 missions = [os.path.basename(a.attrs['href'].rstrip('/')) for a in links]

produces non-empty mission names. A nice property of this approach is that it still works if the string doesn't end in /.
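A minimal sketch of the rstrip('/') fix, pulled out into a hypothetical helper (mission_names is not an astroquery function) and run on made-up href values, one with and one without a trailing slash:

```python
import os.path


def mission_names(hrefs):
    """Return the last path component of each href, ignoring any trailing '/'.

    Hypothetical helper mirroring the one-line fix proposed above.
    """
    return [os.path.basename(href.rstrip("/")) for href in hrefs]


# Hypothetical hrefs: the trailing slash is stripped before basename runs,
# and an href without one is left unchanged by rstrip.
names = mission_names(["/ibe/search/wise/", "/ibe/search/ptf"])
print(names)  # ['wise', 'ptf']
```

Stripping first and then taking the basename keeps the original one-liner's shape, which is why it was such a small change to line 279.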

@keflavich
Contributor

This issue is real, and fixable:

            root = BeautifulSoup(response.text, 'html5')
            links = root.findAll('a')
            splitattrs = [a.attrs['href'].split("/") for a in links]
            missions = [entry[entry.index('search')+1] for entry in splitattrs]

At least this gets the mission list. I'll dig further and see whether you already implemented this...
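For comparison, here is a sketch of the split-based approach from the snippet above, applied to hypothetical href strings. It assumes every mission link contains a "/search/<mission>" segment, which is what entry.index('search') relies on; missions_after_search is an illustrative name, not part of astroquery.

```python
def missions_after_search(hrefs):
    """Return the path segment immediately after 'search' in each href.

    Hypothetical helper following the split-based snippet above.
    """
    missions = []
    for href in hrefs:
        parts = href.split("/")
        # list.index raises ValueError if an href has no 'search' segment.
        missions.append(parts[parts.index("search") + 1])
    return missions


# Hypothetical hrefs, with and without a trailing slash.
result = missions_after_search(["/ibe/search/wise/", "/ibe/search/ptf"])
print(result)  # ['wise', 'ptf']
```

Unlike the basename fix, this picks the mission name even when the href continues past it, but it fails loudly (ValueError) on any link that lacks a "search" segment.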

@odysseus9672
Contributor Author

odysseus9672 commented Jan 3, 2021

I'm confused, @keflavich. Did the old fix in #1424 stop working?

@keflavich
Contributor

Apparently that pull request was... approved but not merged? It looks like I approved it, and then you closed it? Either way, that change was never incorporated:
https://github.com/astropy/astroquery/blob/master/astroquery/ibe/core.py#L279
https://github.com/astropy/astroquery/blame/dc9bbe232f2a0ebf71ed6d2a1673a871da22fb73/astroquery/ibe/core.py#L279

@odysseus9672
Contributor Author

odysseus9672 commented Jan 3, 2021

Ah, I see. I can't reopen the old pull request, but the fix was exceedingly simple: a one-line change. I'll submit a new pull request instead.

New pull request at: #1923

It's also possible that this was lost in the shuffle during the discussion about deleting unused code in pull request #1430?

@keflavich
Contributor

Re-solved in #1923, so I'm closing this one. But indeed, #1430 kinda ate the original.
