
[ENH] Compatibility with emscripten/WASM environments #667

Open
psychemedia opened this issue Dec 16, 2024 · 26 comments
Labels
enhancement New feature or request external The problem is not caused by FastF1

Comments

@psychemedia

psychemedia commented Dec 16, 2024

Proposed new feature or change:

The rapidfuzz package is not currently available as a pure Python wheel, or as an emscripten/WASM targeted wheel.

The rapidfuzz package is listed as a requirement of fastf1, which means that although the fastf1 package itself can be installed without dependencies, all of the other dependencies except rapidfuzz then need to be explicitly identified and installed manually.

Trying to import the fastf1 package, or some of the packages inside it, fails in fastf1/__init__.py at the from fastf1.events import get_session import when loading fastf1 in WASM environments such as marimo notebooks, JupyterLite, etc.

These environments offer an install-free way of getting started with Python, and it would be nice to be able to run fastf1 in them. (The ergast module works fine, e.g. in marimo, loaded via from fastf1.ergast import Ergast.)

@psychemedia
Author

I've raised the availability of a universal or wasm32 build of the rapidfuzz package as an issue on that repo, so this issue may be best resolved upstream.

@theOehrly
Owner

@psychemedia I've already noticed that issue in the rapidfuzz repo this morning and I have subscribed to notifications for it. I'm in favour of resolving this upstream as well. For now, I'll keep an eye on the issue there. If necessary, I might offer some help for getting this done.

@theOehrly theOehrly added enhancement New feature or request external The problem is not caused by FastF1 labels Dec 17, 2024
@psychemedia
Author

@theOehrly It seems to have been addressed — the package is being rebuilt under CI at the moment, and it looks like it should be uploaded to PyPI soon.

@psychemedia
Author

It is now possible to upload and manually install the rapidfuzz dependency into a JupyterLite environment that runs purely in the browser, and then also install fastf1. Loading JSON data seems to work okay, but trying to access jsonStream data fails:

(screenshot of the error)

@psychemedia
Author

psychemedia commented Jan 9, 2025

Ah, a CORS issue.... Using a proxy such as https://corsproxy.io/ fixes it.

For example:

import requests
from jupyterlite_simple_cors_proxy import xurl
url = "https://livetiming.formula1.com/static/2019/2019-09-08_Italian_Grand_Prix/2019-09-08_Race/SessionInfo.jsonStream"

# Proxied as:
# https://corsproxy.io/https%3A//livetiming.formula1.com/static/2019/2019-09-08_Italian_Grand_Prix/2019-09-08_Race/SessionInfo.jsonStream

r = requests.get(xurl(url))
r.text
(screenshot of the returned text)

So is there a simple trick/hack that can be applied to proxy the URL called as part of _api.fetch_page()? Presumably this line:

r = Cache.requests_get(base_url + path + pages[name], headers=headers)

@theOehrly
Owner

I did some quick research and I haven't found any way to get around the CORS issue in pyodide without using a CORS proxy. It is easy to detect that FastF1 is running in pyodide or similar (https://pyodide.org/en/stable/usage/faq.html#how-to-detect-that-code-is-run-with-pyodide). So we could just silently enable a CORS proxy in the background.
I'm not a huge fan of silently sending requests through some kind of proxy without telling users. If this was done at all, it would need to be limited to the livetiming API specifically. I still don't like it a lot, but there aren't many other options, are there?
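For reference, the detection described in the linked Pyodide FAQ can be sketched as a small helper (the function name is mine, not a FastF1 API):

```python
import sys


def running_in_pyodide() -> bool:
    """Return True when running under Pyodide/emscripten."""
    # Pyodide sets sys.platform to "emscripten"; checking sys.modules
    # for "pyodide" also covers cases where the module is already loaded.
    return sys.platform == "emscripten" or "pyodide" in sys.modules
```

On a regular CPython interpreter this returns False, so a proxy would only ever be considered inside a WASM runtime.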

@psychemedia
Author

Handling URL requests is a real pain in emscripten/wasm environments.

If a proxy were to be used, I think a warning should be raised to users, e.g. on detecting an emscripten environment, requiring them to explicitly enable a proxy service and potentially also to pick a service from a list of options. (In mapping libraries, users are often given options for which map tiles to use, for example.) A more complex offering might allow a user to specify their own proxy somehow.
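That opt-in flow might look roughly like this (a hypothetical sketch: the function, the dict, and the listed proxy URLs are illustrative, not part of FastF1):

```python
# Illustrative only: an explicit, user-facing proxy chooser.
KNOWN_PROXIES = {
    "allorigins": "https://api.allorigins.win/raw?url=",
    "corsproxy": "https://corsproxy.io/",
}


def choose_cors_proxy(name=None, custom_url=None):
    """Require the user to opt in to a proxy instead of enabling one silently."""
    if custom_url:  # a user-supplied proxy takes precedence
        return custom_url
    if name is None:
        raise RuntimeError(
            "Running under WASM: requests need a CORS proxy. "
            f"Pick one of {sorted(KNOWN_PROXIES)} or supply custom_url."
        )
    return KNOWN_PROXIES[name]
```

Raising by default keeps the "no silent proxying" property the maintainer asked for: nothing is routed anywhere until the user makes a choice.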

@theOehrly
Owner

To be honest, while it would be cool if this was possible, there aren't that many users interested in this (yet). I'm all for making this work, but I have a bunch of things to fix in FastF1 that are quite a bit more important, in my opinion. I don't see myself working on it this year.
In terms of code complexity, this shouldn't be too difficult. The open questions are more about how to implement it properly, and about doing some research on CORS proxies in emscripten/wasm environments or finding alternatives. I'm happy to review and support a PR for this if anyone wants to work on it.

@theOehrly theOehrly added the good first issue Good for newcomers label Jan 11, 2025
@theOehrly theOehrly changed the title [ENH] Provide a fallback to using rapidfuzz for WASM environments [ENH] Compatibility with emscripten/WASM environments Jan 11, 2025
@psychemedia
Author

Example third party proxies:

Creating your own proxy using Cloudflare Workers (ryanking13/cors)

@psychemedia
Author

psychemedia commented Jan 13, 2025

Out of curiosity, I prompted claude.ai, which came up with the code below.

Usage:

import fastf1


# Then enable CORS proxy with debug logging
enable_cors_proxy(
    domains=['api.formula1.com', 'livetiming.formula1.com'],
    debug=True,
#    proxy_url='https://corsproxy.io/'
)

session = fastf1.get_session(2019, 'Bahrain', 'Q')
session.load(telemetry=False, laps=True, weather=False)
# etc

It seems to work with non-cached requests. There is a more general problem with cached requests in the use of sqlite. I seem to recall sqlite issues in the past, and will take a further look when I get a chance. [UPDATE: we can set fastf1.Cache.enable_cache('/tmp'), and then, if we need to persist the cache across sessions, import shutil; shutil.copytree("/tmp", "/drive/fastf1cache") etc., and then copy back to /tmp. (Alternatively, create a directory with import os; os.mkdir("/fastf1cache") etc. and use that as an ephemeral cache directory.)]
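The copy-out/copy-back idea from the update above can be sketched as two helpers (paths are illustrative; "/drive" is assumed to be the JupyterLite persistent filesystem):

```python
import os
import shutil


def persist_cache(cache_dir="/tmp", backup_dir="/drive/fastf1cache"):
    # Copy the ephemeral cache out to persistent storage at the end
    # of a session so it survives a page reload.
    if os.path.exists(backup_dir):
        shutil.rmtree(backup_dir)
    shutil.copytree(cache_dir, backup_dir)


def restore_cache(backup_dir="/drive/fastf1cache", cache_dir="/tmp"):
    # Copy a previously saved cache back before enabling it with
    # fastf1.Cache.enable_cache(cache_dir).
    if os.path.exists(backup_dir):
        shutil.copytree(backup_dir, cache_dir, dirs_exist_ok=True)
```

`dirs_exist_ok=True` (Python 3.8+) lets the restore write into an existing /tmp directory.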

import logging
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
from urllib.parse import quote, urlparse

import requests

import fastf1

@dataclass
class ProxyConfig:
    """Configuration for the CORS proxy."""
    proxy_url: str = "https://api.allorigins.win/raw?url="
    domains: Optional[List[str]] = None
    debug: bool = False
    retry_count: int = 3
    timeout: int = 30

class CORSProxyPatcher:
    """Patches FastF1 to handle CORS requests through a proxy service."""
    
    def __init__(self, config: ProxyConfig = None):
        """
        Initialize the CORS proxy patcher for FastF1.
        
        Args:
            config (ProxyConfig): Configuration object for the proxy
        """
        self.config = config or ProxyConfig()
        self.domains = self.config.domains or []
        
        self._setup_logging()
        self._setup_session()
        
    def _setup_logging(self) -> None:
        """Configure logging based on debug setting."""
        self.logger = logging.getLogger('CORSProxyPatcher')
        if self.config.debug:
            logging.basicConfig(
                level=logging.DEBUG,
                format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
            )
    
    def _setup_session(self) -> None:
        """Set up the requests session with retry functionality."""
        self.session = requests.Session()
        retry_strategy = requests.adapters.Retry(
            total=self.config.retry_count,
            backoff_factor=0.5,
            status_forcelist=[500, 502, 503, 504]
        )
        adapter = requests.adapters.HTTPAdapter(max_retries=retry_strategy)
        self.session.mount("http://", adapter)
        self.session.mount("https://", adapter)
    
    def should_proxy(self, url: str) -> bool:
        """
        Check if the URL should be routed through proxy based on domain.
        
        Args:
            url (str): URL to check
            
        Returns:
            bool: True if URL should be proxied
        """
        parsed = urlparse(url)
        should_proxy = any(domain in parsed.netloc for domain in self.domains)
        if self.config.debug:
            self.logger.debug(f"URL: {url} - Should proxy: {should_proxy}")
        return should_proxy
    
    def get_proxied_url(self, url: str) -> str:
        """
        Get the proxied version of the URL if needed.
        
        Args:
            url (str): Original URL
            
        Returns:
            str: Proxied URL if needed, original URL otherwise
        """
        if self.should_proxy(url):
            if 'allorigins' in self.config.proxy_url:
                proxied = f"{self.config.proxy_url}{quote(url, safe='')}"
            else:
                proxied = f"{self.config.proxy_url}{url}"
            if self.config.debug:
                self.logger.debug(f"Original URL: {url}")
                self.logger.debug(f"Proxied URL: {proxied}")
            return proxied
        return url
        
    def modify_headers(self, headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
        """
        Modify request headers to handle CORS.
        
        Args:
            headers (dict, optional): Original headers
            
        Returns:
            dict: Modified headers
        """
        modified_headers = headers.copy() if headers else {}
        modified_headers.update({
            'Origin': 'null',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Site': 'cross-site',
            'Accept': 'application/json, text/plain, */*',
            'User-Agent': 'Mozilla/5.0 (compatible; FastF1/Python)'
        })
        return modified_headers

    def log_response(self, response: requests.Response, url: str) -> None:
        """
        Log response details for debugging.
        
        Args:
            response (Response): Response object
            url (str): Original URL
        """
        if self.config.debug:
            self.logger.debug(f"\nRequest to: {url}")
            self.logger.debug(f"Status Code: {response.status_code}")
            self.logger.debug(f"Headers: {dict(response.headers)}")
            try:
                self.logger.debug(f"Response Text: {response.text[:500]}...")
            except Exception as e:
                self.logger.debug(f"Couldn't read response text: {e}")

    def make_request(self, method: str, url: str, headers: Optional[Dict[str, str]] = None, 
                    **kwargs: Any) -> requests.Response:
        """
        Make an HTTP request with proper error handling and logging.
        
        Args:
            method (str): HTTP method ('get' or 'post')
            url (str): URL to request
            headers (dict, optional): Request headers
            **kwargs: Additional request parameters
            
        Returns:
            Response: Response object
            
        Raises:
            requests.exceptions.RequestException: If request fails
        """
        proxied_url = self.get_proxied_url(url)
        modified_headers = self.modify_headers(headers)
        kwargs['headers'] = modified_headers
        kwargs['timeout'] = kwargs.get('timeout', self.config.timeout)
        
        try:
            if fastf1.Cache._requests_session_cached and not fastf1.Cache._tmp_disabled:
                session = fastf1.Cache._requests_session_cached
            else:
                session = self.session
                
            response = getattr(session, method)(proxied_url, **kwargs)
            response.raise_for_status()
            
            self.log_response(response, url)
            return response
            
        except requests.exceptions.RequestException as e:
            if self.config.debug:
                self.logger.error(f"Request failed: {str(e)}")
            raise

    def patch_fastf1(self) -> None:
        """Patch FastF1's request methods to use CORS proxy."""
        def wrapped_get(cls, url: str, headers: Optional[Dict[str, str]] = None, **kwargs: Any) -> requests.Response:
            return self.make_request('get', url, headers, **kwargs)
            
        def wrapped_post(cls, url: str, headers: Optional[Dict[str, str]] = None, **kwargs: Any) -> requests.Response:
            return self.make_request('post', url, headers, **kwargs)
            
        fastf1.Cache.requests_get = classmethod(wrapped_get)
        fastf1.Cache.requests_post = classmethod(wrapped_post)

def enable_cors_proxy(
    domains: List[str],
    proxy_url: Optional[str] = None,
    debug: bool = False,
    retry_count: int = 3,
    timeout: int = 30
) -> CORSProxyPatcher:
    """
    Enable CORS proxy support for FastF1.
    
    Args:
        domains (list): List of domains to route through the proxy
        proxy_url (str, optional): Base URL of the CORS proxy service
        debug (bool): Enable debug logging
        retry_count (int): Number of retry attempts for failed requests
        timeout (int): Request timeout in seconds
        
    Returns:
        CORSProxyPatcher: Configured proxy patcher instance
    """
    config = ProxyConfig(
        proxy_url=proxy_url or "https://api.allorigins.win/raw?url=",
        domains=domains,
        debug=debug,
        retry_count=retry_count,
        timeout=timeout
    )
    
    patcher = CORSProxyPatcher(config)
    patcher.patch_fastf1()
    
    return patcher

@psychemedia
Author

psychemedia commented Jan 13, 2025

@theOehrly Trying to use fastf1 with the xeus-python jupyterlite kernel, the package installation fails because websockets (a dependency of signalr_aio) is required but is not available. Is the signalr_aio package only required for live timing? Could it (i.e. websockets) be relaxed as a requirement?

UPDATE: I can work around this by installing from the wheel directly and installing the other requirements manually (except rapidfuzz, which is waiting on being added to emscripten-forge). ANOTHER UPDATE: this may just have been an issue with me trying to install from conda-forge/emscripten-forge; there is a universal ("any") wheel for websockets.

@psychemedia
Author

Demo of fastf1 running in JupyterLite/pyodide kernel: https://github.com/f1datajunkie/jupyterlite-fastf1/blob/main/README.md

@theOehrly
Owner

The demo is surprisingly performant. I somehow expected this to run a lot slower.

Yes, websockets is only used for livetiming. This could in theory be relaxed. I think it would be possible to add a platform-specific exclusion in the dependency specification. But it seems we could only exclude wasm/emscripten as a whole, not specifically just the xeus kernel, so this isn't really a good approach.

I see two options for going forward with this.

  1. Implement this into FastF1. In that case, the code above is a good starting point but would need some modifications before I'm happy to merge. Among other things, use FastF1's logger and don't patch because we can of course tie it in directly.
  2. Add a guide for installing and using FastF1 in Pyodide environments to the documentation. Link to this guide from the installation section of the readme and the docs. You keep the patcher as a separate package.

I'm honestly open to both solutions.

@psychemedia
Author

I tend to work in quite an ill-disciplined way, iterating and breaking things as I try to actually use them. I hope to try quite a few things out in JupyterLite with fastf1 over the next few months, which may well generalise and shake things out a bit more. The xeus kernel should start to work as packages hopefully become available in emscripten-forge, and I also want to try f1dataR in an R kernel; doing those two experiments may raise further issues here. It would also be nice to be able to demo fastf1 in marimo notebooks (I'm not sure which package repos that can install from). The live data support needs thinking about differently (maybe finding a way to log the data into pglite in one notebook and then reading it from another? I have some separate jupyter_anywidget_pglite experiments, but I haven't looked at trying reads/writes and live updates across different notebooks yet.)

The above proxy code has the feel that it could be split out as a generic utility like requests-cache (and/or built around requests-cache), with fastf1 as a dependency; it could perhaps also be iterated separately, using the fastf1 logger, for a tighter binding/integration here. But I think I need to play with it a bit more to see if there are issues: a) in use; b) in generalising for use with other kernels.

I'm happy to try to make useful demos/landing pages/content that can be reused here as docs, but it will all be a bit of a shakedown as I explore what works, where, and how...

@theOehrly
Owner

Ok, so from what I understand from your comment, you aren't really in favour of either of the two options I suggested?

I think you have gathered some valuable information here already. And I think it would be great to make that available to other users as guidance. People will not easily find this discussion here. You already have the demo repository that you shared above. What about putting some of the important details in the readme there? Then I could refer people there for more information. That is of course if you'd want to do that.

What else would you suggest? The current state is obviously not great for users.

Regarding livetiming recording, few people use that anyway. I don't think it's a huge issue to tell people that this doesn't work in emscripten environments for now.

@psychemedia
Author

I'm happy to try to add some more docs on my repo, and then also add some here (I was just wary of adding them here at the moment because I tend to iterate my own experiments quite quickly, and often create breaking changes as I do so). My spare playtime is also being spent on Dakar doodles at the moment, but as soon as that is done, and when F1 testing starts, I'll hopefully have more playtime to spend on the fastf1 sketches.

@psychemedia
Author

I updated some docs after a fashion on the repo https://github.com/f1datajunkie/jupyterlite-fastf1 and as a Github Pages site https://f1datajunkie.github.io/jupyterlite-fastf1/book/

@theOehrly
Owner

Looks good, I think that's pretty helpful for people who are trying to get started with FastF1 in pyodide environments.

Guide to using this repository available as a Jupyter Book / ebook here.

The link to the "book" in the readme isn't working

You're OK with me referring to this from the installation section of FastF1?

When you experiment more with FastF1 on pyodide and figure out more stuff, I'd appreciate it if you shared it here :) It looks like you're currently the person with the best overview of this.

@psychemedia
Author

Ah - thanks - link fixed, as here.

Re: linking to that - yes, of course, please do. I apologise for not offering to contribute more directly here, but I think I could be rather disruptive in terms of my "informal" hack-it-and-see approach to development!

Re: further pyodide / wasm explorations, for sure I'll keep this thread posted as and when other additional upstream changes make more things possible.

@theOehrly
Owner

@psychemedia I totally understand. Also, you're still providing valuable information with your "hack-it-and-see" approach :)

I've added a short section about pyodide/wasm/... to the installation section and linked to your repo now. Thanks a lot for that.

Maybe at some point (and maybe once the ecosystem has matured some more), me or someone else can take that information and integrate it more directly into FastF1 and/or its documentation. But it's very valuable if someone has already done the research to figure out what is and isn't possible.

@theOehrly theOehrly removed the good first issue Good for newcomers label Jan 17, 2025
@psychemedia
Author

I'm happy to try to iterate closer to production fastf1 code, but generality is of interest to me, so I think watching what happens with the requirements for the xeus kernel over the next couple of months would be sensible.

I've also been wondering about caching strategies. Is there a reason you save files to disk rather than putting everything into e.g. sqlite? Also, is pickle guaranteed to pickle the same way across different Python versions? Things like parquet are interesting, I think, in terms of queryability, as well as remote queries using things like duckdb.

@theOehrly
Owner

The single sqlite file is created by requests-cache. It's their default cache backend and I don't want to mess with that really.
Saving the other data to files was done for convenience and simplicity. It's quick to implement and I don't need to provide special functionality for clearing parts of the cache. People can just delete files.

To be fair, I have no idea what pickle guarantees. Apart from that, updating pandas can make the cache loading fail. Therefore, pickle.load has exception handling that invalidates the specific cache file, reloads the data and overwrites the old cache in case of an error. The same mechanism would make pickle-related problems fail gracefully, too.

It's something I'd probably implement differently today. But also, it's working without issues at the moment.
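The invalidate-on-error pattern described above can be sketched roughly like this (my illustration of the idea, not FastF1's actual code):

```python
import pickle


def load_or_invalidate(path):
    # If unpickling fails (e.g. after a pandas upgrade changed the
    # pickled objects), treat the cache file as stale: return None so
    # the caller re-downloads the data and overwrites the cache file.
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except Exception:
        return None
```

The broad except is deliberate here: any failure to deserialise, whatever the cause, degrades gracefully to a fresh download.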

@psychemedia
Author

Would an option to set a different cache backend be of interest? I am interested in guaranteed reproducibility, and my gut (historical) feeling is that pickle output is not guaranteed to be stable.

@theOehrly
Owner

This is probably getting a bit out of scope for this specific issue.

So first, the current pickle-based cache hasn't been a problem for approximately the last three years, I think. I can't remember any real issues related to it.
Second, the original cache implementation has already been extended with various additional features and has gotten messy.

In general, I am not against different cache backends. But I feel like implementing this is going to be the point where we might need to consider rewriting the cache implementation completely. And I'm not sure if we can do this while keeping backwards compatibility. So that might end up being FastF1 v4.0 then. In any case, it'll be a bit of work.

@psychemedia
Author

psychemedia commented Jan 22, 2025

Latest update: I have an example of using fastf1 in a pyodide environment using Python Shinylive (py-shinylive).

Demo here: https://f1datajunkie.github.io/jupyterlite-fastf1/shinylive/app1

Code here: https://github.com/f1datajunkie/jupyterlite-fastf1/tree/main/shinyapp

Blog post: https://blog.ouseful.info/2025/01/22/tinkering-with-in-browser-shinylive-python-pyodide-dashboards/

@psychemedia
Author

Using fastf1 in marimo

If we load rapidfuzz separately, we can run fastf1 in marimo notebooks. Example notebook.
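In Pyodide-based runtimes such as marimo's WASM kernel, that separate install can presumably be done with micropip; a guarded sketch (the wrapper function is mine):

```python
import sys


async def install_fastf1_deps():
    # Only meaningful under Pyodide, which reports sys.platform as
    # "emscripten"; micropip does not exist on regular CPython, so the
    # import is kept inside the guard.
    if sys.platform == "emscripten":
        import micropip
        await micropip.install("rapidfuzz")  # needs a wasm/universal wheel
        await micropip.install("fastf1")
```

Outside a WASM environment the coroutine is a no-op, so the same notebook code can run unchanged on desktop Python.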

