Window Size Configuration #27

Open
echeipesh opened this issue Nov 20, 2015 · 3 comments
@echeipesh
Contributor

There are a number of parameters that modellab-geoprocessing uses that have some impact on performance for a typical case. These should be configurable so they can be tuned in deployment.

Default values for these should be placed in reference.conf so they can be overridden by application.conf in the application directory. Consult for reference: http://doc.akka.io/docs/akka/snapshot/general/configuration.html
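For example, the defaults might look something like this in reference.conf. The key names and values here are purely illustrative; the actual keys should match whatever the utility class ends up reading:

```hocon
# reference.conf -- hypothetical defaults; key names and values are illustrative only
modellab {
  window-size     = 8   # base-layer chunk size, in storage tiles
  operation-size  = 4   # operation evaluation size, in storage tiles
  tile-cache-path = ""  # if non-empty, the S3 catalog caches tiles here
}
```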

It may be nice if there were a fallback to ENV variables when application.conf is not present; @hectcastro may have input on this.
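One option worth noting: Typesafe Config's HOCON syntax already supports optional environment-variable overrides via `${?VAR}` substitution, so a fallback could live in the config file itself. Key and variable names below are hypothetical:

```hocon
modellab {
  window-size = 8                         # default
  window-size = ${?MODELLAB_WINDOW_SIZE}  # overrides the default only when the env var is set
}
```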

akka-config should be used to read these, as done here: https://github.com/azavea/modellab-geoprocessing/blob/develop/src/main/scala/com/azavea/modellab/Instrumented.scala#L14
The structure of the utility class may need to change.
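A minimal sketch of what reading these values with Typesafe Config could look like, in the style of Instrumented.scala. The object name, config keys, and fields are assumptions, not the actual implementation:

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Loads reference.conf, overridden by application.conf when present.
object Settings {
  private val config: Config = ConfigFactory.load()

  // Hypothetical key names; align these with whatever reference.conf defines.
  val windowSize: Int    = config.getInt("modellab.window-size")
  val operationSize: Int = config.getInt("modellab.operation-size")
}
```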

Here are the parameters that need to be tuned:

Window Size:
This is the size of the chunks of base layers that get loaded from S3. The larger the chunk, the slower the initial load, but the more efficient the operations themselves:
https://github.com/azavea/modellab-geoprocessing/blob/develop/src/main/scala/com/azavea/modellab/LayerRegistry.scala#L16

Operation Size:
The size at which the operations are evaluated. Each operation window results in a call to the base-layer window above. This should be smaller than the window size, but I'm not sure by how much:
https://github.com/azavea/modellab-geoprocessing/blob/develop/src/main/scala/com/azavea/modellab/LayerRegistry.scala#L18

NOTE: We measure these in storage tiles, which are 512.

Tile Cache:
If a local path is given, the S3 catalog should use the tile cache: https://github.com/azavea/modellab-geoprocessing/blob/develop/src/main/scala/com/azavea/modellab/Catalog.scala#L35

This will matter only between service restarts that do not reboot the machine, since these files go into the temp folder. The window reader caches things in memory; caching tiles to disk would be useful when memory has been evicted or the service has restarted. @caseypt and @hectcastro should +1 this feature. I am not totally sure this is practically useful in the current deployment.

@echeipesh echeipesh changed the title Performance Configuration Window Size Configuration Nov 20, 2015
@hectcastro
Contributor

It may be nice if there were a fallback to ENV variables when application.conf is not present; @hectcastro may have input on this.

I think this is a nice-to-have that isn't necessary. The only scenario where it becomes necessary is if we have a value that needs to change across more than one AWS environment. Right now we're only in one, so skipping this feature should be fine.

Tile Cache:
If a local path is given, the S3 catalog should use the tile cache: https://github.com/azavea/modellab-geoprocessing/blob/develop/src/main/scala/com/azavea/modellab/Catalog.scala#L35

Is there anything specific about the user in these requests? For example, if two users provide the same inputs to this service, is it OK for them to get the same outputs? Mostly asking because I'm trying to determine what layers of caching are acceptable. If different users can share a cache, I think we might be able to borrow Nginx's caching capabilities, which have lots of tunables.
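If shared caching turns out to be acceptable, a minimal Nginx proxy-cache setup might look like the sketch below. The upstream name, cache path, sizes, and TTLs are all placeholders:

```nginx
# Shared on-disk cache keyed by request URI; all values are illustrative.
proxy_cache_path /var/cache/nginx/tiles levels=1:2 keys_zone=tiles:10m
                 max_size=1g inactive=60m;

server {
    listen 80;

    location / {
        proxy_pass http://modellab-geoprocessing;
        proxy_cache tiles;
        proxy_cache_key $request_uri;  # safe only if responses are not user-specific
        proxy_cache_valid 200 60m;     # cache successful responses for an hour
    }
}
```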

@caseycesari
Contributor

Is there anything specific about the user in these requests?

Not inherently so. We may run into something as multiple people use the site, but the requests don't carry any user information, and if two users generate the same request, they should get the same results. Having one user's request cache the result for the other would be helpful in this case.

@echeipesh
Contributor Author

Confirming there is no user-specific state at the HTTP level, so Nginx is totally an option.

An S3 cache on the file system would help only in the very specific case where the service restarts without blowing away the filesystem. It would provide a warm start for operations that have not been issued yet, so it can't be covered by caching the TMS tiles.
