Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow performance when opening files or invoking functions #39

Open
nadavspi opened this issue Dec 5, 2022 · 21 comments
Open

Slow performance when opening files or invoking functions #39

nadavspi opened this issue Dec 5, 2022 · 21 comments

Comments

@nadavspi
Copy link

nadavspi commented Dec 5, 2022

Love the idea of this package!

I have about 5000 files and I'm seeing a delay of a few seconds whenever I open a file in my Obsidian path, or callobsidian-jump, obsidian-backlink-jump, etc. I'm using a pretty much out of box Doom config. Files within my obsidian directory open quickly when obsidian.el isn't configured.

Is this expected? Anything I can do to help debug?

@licht1stein
Copy link
Owner

licht1stein commented Dec 5, 2022 via email

@nadavspi
Copy link
Author

nadavspi commented Dec 5, 2022

I figured as much after looking through the code 😀

@licht1stein
Copy link
Owner

licht1stein commented Dec 5, 2022 via email

@nadavspi
Copy link
Author

nadavspi commented Dec 5, 2022

I'm not sure my lisp chops will allow for that, but I may try.

@licht1stein
Copy link
Owner

licht1stein commented Dec 5, 2022 via email

@DrBanana419
Copy link

Did this issue get fixed for anyone? Jumping to notes is still slow for me.

@licht1stein
Copy link
Owner

Did this issue get fixed for anyone? Jumping to notes is still slow for me.

yes

@jayemar
Copy link
Collaborator

jayemar commented Jan 30, 2024

Did this issue get fixed for anyone? Jumping to notes is still slow for me.

In my own branch I've commented out the (obsidian-update) call inside the obsidian-jump function. This greatly increases how quickly I can jump between notes, but obviously sacrifices some of the guarantees of updated tags and file cache.

@smallstepman
Copy link

im also experiencing slowness

In my own branch I've commented out the (obsidian-update) call inside the obsidian-jump function. This greatly increases how quickly I can jump between notes, but obviously sacrifices some of the guarantees of updated tags and file cache.

@licht1stein would be nice to have an option to disable this in the config

@licht1stein
Copy link
Owner

Sure, I will look into it when I have time, if somebody else doesn't implement it

@fstamour
Copy link

I did a bunch of tests today. For my use case, I managed to make things 1000 times faster.

I've made a very very low quality commit of the things I've tried here.

Here's a summary:

  • avoid building big intermediary lists (e.g. pass the predicates directly to directory-files-recursively)
  • try to make obsidian-file-p as fast as possible
    • run faster, simpler predicates earlier and the heaviest later
    • check if it ends in .md early
    • check if path is inside .git or node_modules early
    • avoid obsidian-descendant-of-p
  • listing files using projectile-dir-files-alien instead of directory-files-recursively (This wouldn't be generally applicable, as it assumes that projectile is installed and that we're currently in a project.)
  • reduce the number of with-temp-buffer, this one I discovered using the profiler

Here's some utilities that I used over and over to help be optimize stuff:

(defmacro fsta/comment (&rest body)
  "Comment BODY."
  nil)

(defmacro fsta/measure-time (&rest body)
  "Measure the time it takes to evaluate BODY."
  `(let ((time (current-time)))  ;; TODO Time should be gensym'd
     (prog1
         (progn ,@body)
       (message "%.06fs" (float-time (time-since time))))))

(defun fsta/profile (fn)
  (unwind-protect
      (progn
        (profiler-start 'cpu+mem)
        (funcall fn)
        (profiler-report))
    (profiler-stop)))

Here are the functions that takes the most time:

  • obsidian-file-p: check if a path is a markdown file, it's not slow per-se, but it is called repeatedly
  • obsidian-reset-cache: it lists all the files)
  • obsidian-update-tags-list: it reads all files and looks for tags
  • obsidian--update-all-from-front-matter: it reads all the files and parse their "front matter"

Here are some other ideas that I didn't have time to test:

  • use a hash-table instead of a list for caching the list of files
  • use one timestamp per file (easy to do once you use a hash-table)
    • if the entry is expired, check the file's modification time to decide whether to re-read it or not
  • use emacs' file watching facilities to update the cache as soon as a file is changed
  • use hooks (like after-save-hook) to update a file's entry in the cache as soon as it's saved
  • in obsidian-file-p: if a file is already in the cache, then it means it already passed the predicate, we don't need to check every conditions again
  • extracting tags and extracting the front matter should be done at the same time, currently (AFAIR) every files are read twice when calling (obsidian-update)

(I might make a PR, if I have time, no promise 😅)

@licht1stein
Copy link
Owner

Wow, this is great work! I would be very grateful for the PR, with exception of the projectile-related part. I also think that hash-tables would be a better way to handle things.

@fstamour
Copy link

Notes for myself:

@licht1stein
Copy link
Owner

Notes for myself:

That would actually be a great addition to readme

@fstamour
Copy link

It would have been funny if it was already there, because I didn't even look at the readme 😆

@Dima-369
Copy link

I find the lag on (obsidian-jump) particularly annoying.

Here is my approach (which org-roam also uses) to soften this a bit, so at least jumping to existing notes is quick.

(defvar dima-obsidian-vault-path "~/Documents/Obsidian Vault/"
  "The directory to the Obsidian Vault.")

(defun dima-obsidian-update-after-save-hook ()
  (when (s-starts-with-p default-directory (f-expand dima-obsidian-vault-path))
    (message "Updating Obsidian Vault...")
    (obsidian-update)))

(add-hook 'after-save-hook #'dima-obsidian-update-after-save-hook)

Then create a custom (obsidian-jump) function like this.

Mine also invokes (obsidian-capture) when no match is found, which also mimics how org-roam does its find note.

(defun dima-obsidian-jump ()
    "Jump to Obsidian note.

Patches:
- remove slow call to `obsidian-update'
- when no target is there, create a new note"
    (interactive)
    (let ((gc-cons-threshold most-positive-fixnum))
      (let* ((files (obsidian-list-all-files))
             (dict (make-hash-table :test 'equal))
             (_ (-map (lambda (f) (puthash (file-relative-name f obsidian-directory) f dict)) files))
             (choices (-sort #'string< (-distinct (-concat (obsidian--all-aliases) (hash-table-keys dict)))))
             (choice (completing-read "Jump to: " choices))
             (target (obsidian--get-alias choice (gethash choice dict))))
        (if target
            (find-file target)
          ;; inline `obsidian-capture' without title variable
          (let* ((filename
                  (s-concat obsidian-directory "/" obsidian-inbox-directory "/" choice ".md"))
                 (clean-filename (s-replace "//" "/" filename)))
            (find-file (expand-file-name clean-filename) t)
            (save-buffer)
            (add-to-list 'obsidian-files-cache clean-filename))))))

Still, (obsidian-update) seems annoyingly slow to me with just 1500 Markdown files, so I will check the notes of @fstamour.

@jayemar
Copy link
Collaborator

jayemar commented Jul 19, 2024

I've put together a prototype branch to experiment with trying to improve performance using some of the ideas from this thread. The main change is that the files cache is now a hashtable instead of a list, with the values being a metadata hash table holding the tags, aliases, and links.

{<filepath> : {'tags: (list of tags associated with file)
               'aliases: (list of aliases associated with file)
               'links: {<linked-file-name: (response from markdown-link-at-pos)}}}

This simple change ended up leading to many other changes as it could then made sense to use the information in the cache for many other functions. If you do a diff you'll see that I've changed a lot. It's definitely more than is necessary to change as much of this was just me renaming things, removing things, and moving things around to help me better understand how things works. This is definitely not a PR, I just made it a WIP PR to make it easier to view and see the diffs to help get feedback.

This hashtable cache is populated once, and then an 'after-save-hook' is added to repopulate things for the saved file to keep things up to date. I think this should work well for files modified within obisidan.el, but it won't pick up changes to files modified by other processes. The idea I implemented is to have a timer running that periodically checks the list of cached files against the list of files on disk, and then removes the files that no longer exists and adds any new ones. This should work for files added/deleted by other process, but not files modified by other processes. So I don't love this solution.

I did look at emacs's file-notify-add-watch, and I think it could work. But it doesn't recursively watch directories, so a new watcher would need to be created for each directory, and I wasn't sure if having a watcher per directory in a directory-heavy vault would issues. Probably something to test.

I've added a few more tests to help make sure I wasn't totally breaking things, but I'd definitely like to have some more that test more of the tags and alias functionality.

One things I'd like to explore more is whether it makes more sense to have a single data structure as the source of data for all functions, or whether parallel function-specific data structures should be maintained that are kept in sync with the files cache.

For example, we currently use a separate aliases hashtable that is used for alias-related functions. I did implement the necessary changes so that this hashtable is updated when the files cache is updated. But this does add additional complexity when it may be find to only generate the aliases hashtable when it's needed.

Similarly, we could have a tags data structure and a backlinks data structure. That would make retrieving this information, but it would certainly complicate the code with more happening on every update. A tags hashtable is used when jumping to a tag, and this is generated at the time of the call to find a tag. It's a simpler implementation at the cost of more computation happening at the time of a function call.

@jayemar
Copy link
Collaborator

jayemar commented Oct 7, 2024

I've been using that branch referenced above for the past few months and it's been working really well, so I cleaned up the code a bit and submitted it as PR #100. There are a lot of changes and I've been the only one banging on it, so I'd appreciate any feedback other may have.

It's very responsive and has even allowed me to create a backlinks panel to immediately show all of the backlinks when I open a new note file, although I didn't include that in the PR as I was trying to keep it as minimal as possible.

@fstamour
Copy link

163 commits 😮

@fstamour
Copy link

Any chances you could split that in multiple (smaller) PRs? For example, keep the backlink panel in a separate PR, to keep thinks reviewable

@jayemar
Copy link
Collaborator

jayemar commented Dec 13, 2024

@fstamour from talking with @licht1stein back in October he advised pulling in the backlinks feature and making it a branch for a 2.0 release. You can see the conversation over in PR #100 (you'll need to expand the view to see the comments because, yeah, lots of commits). So it's kind of a beast of a PR, but it is now ready to be pulled in, pending review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants