Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import notes from Google Keep using takeout archive #4533

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gudvinr
Copy link

@gudvinr gudvinr commented Mar 17, 2025

This PR adds importer infrastructure to automatically add notes from external sources.

Currently, only Google Takeout archives for Google Keep are recognized but other sources can be easily added as long as they provide machine-readable output.

There's also an ImportService that provides an API for clients to use.
Only field available is binary content field which (at the moment) can only have a zip archive directly from takeout.

If any other source ever added, file name and/or mime type could be added to request.

No input source selection is needed because there are likely no note taking apps in the world that can export your data and export format is similar enough but incompatible.

Only server-side is implemented. I don't understand the structure of the web ui source code so I didn't even try to implement import UI.
Luckily, there only should be a file selector of some sort which then should call ImportMemos shoving content of the file to the server.

@gudvinr gudvinr requested a review from boojack as a code owner March 17, 2025 12:56
Copy link
Contributor

@ehsundar ehsundar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To sum up, somehow feels like it can be handled completely client side. I don't think simple RPCs are adequate here and we need to use grpc streams. Adds extra complications and also requires scheduling etc. Honestly I think parsing the export files in the web app and call apis accordingly is the best option.

wss, err := s.Store.GetWorkspaceStorageSetting(ctx)
if err != nil {
slog.ErrorContext(ctx, "failed to get workspace storage setting", "err", err)
return -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not acceptable as the returned error would not relevant to the issue at all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as the returned error would not relevant to the issue at all

That's why there's also a log message.

If you see what's inside GetWorkspaceStorageSetting, it becomes apparent that errors from there are basically impossible to get here. It's either database error or broken config data. Both will make app fail earlier than you can even call this method. And both can not be fixed from the client anyway.

So it's merely a safeguard for the case if user set upload limit less than default MaxUploadBufferSizeBytes.

}
}

message ImportResult {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
message ImportResult {}
message ImportMemosResponse {
int32 count = 1;
}

}

func (imp *Importer) Convert(ctx context.Context, content []byte) (*ImportResult, error) {
for _, converter := range converters {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be avoided by taking an enum input

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't and there's a note in PR description:

No input source selection is needed because there are likely no note taking apps in the world that can export your data and export format is similar enough but incompatible.

Import source selector is useless here. Even if you have one, you still need to try and hence validate said import package. But from the user perspective it's an additional step that adds nothing.

With this approach you only need a file selector and that's it.

To make it easier to filter out converters, mime type and file name can be passed from the frontend but I already noted that.

type memoConverter func(context.Context, []byte) (*ImportResult, error)

var converters = []struct {
name string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is js style. Might be better if you use an interface.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't use JS. And even if it's why is it a bad thing?

Interface doesn't bring anything here because converter should be stateless thus every converter will look like this just for the sake of using interfaces:

type xxxConverter struct{}

func (xxxConverter) Name() string
func (xxxConverter) Convert()

function gives you stateless guarantees for free (unless you use global variables) and with concrete interface implementations you should always be aware that you can't store anything inside struct{}.

name isn't really needed here, it's just to make errors more useful. And I didn't make map because ordering here is important if other converters are ever added.

return fs.WalkDir(s.dir, ".", s.scan)
}

func (s *keepScanner) scan(fp string, d fs.DirEntry, err error) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like these kind of logic can be a separate package in /usememos. There's not any good note export parser out there and this could be an advantage. We may develop it to also cover Evernote, OneNote, etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean not in /server/ but in /? I honestly don't care and if it is important it's not hard to move.

@gudvinr
Copy link
Author

gudvinr commented Mar 18, 2025

To sum up, somehow feels like it can be handled completely client side.

For takeout or another json export yes, it is doable. But what will you do if you need to import some weird obscure format? It's easier to do on the server.

E.g. evernote uses XML which already sucks ass to work with in JS.
OneNote uses proprietary onepkg archive with one files representing notes.

And there's moememos which doesn't even use frontend from memos. Thus it will either be impossible to import notes when you use mobile client or they need to implement importer one more time. Same for each other client.

I don't think simple RPCs are adequate here

Doesn't stop resources to be uploaded this way.

and we need to use grpc streams.

Not really. gRPC stream is for chunked data and each import (at least for takeout) is a singular package.
For cases where exports are bunch of jsons it might be better, here I agree.
However, changes required to support this case if needed are minimal.
If you know how other services export data, you can share your findings and we can adjust API to accommodate this.

Honestly I think parsing the export files in the web app and call apis accordingly is the best option.

Good luck making these imports atomic. Current implementation reuses existing store methods which do not carry any transaction information but it is possible to push everything in one go.

With client side imports if you fail midway, you'll end up with inconsistent state.
Sure you can workaround that with whatever marks or using special UIDs but it's easier (and more betterer) to avoid the issue altogether.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants