Metacello re-fetches baselines even if they were fetched before #8161

Open
chisandrei opened this issue Dec 22, 2020 · 1 comment

@chisandrei

Moved from Metacello/metacello#539

Happens in Pharo 8, but seems the same in Pharo 9.


Hello 👋

Recently, we decided to refactor the baselines in a lower layer of our decently sized project. The refactoring split baselines that declare a lot of packages into multiple baselines, each of which clearly specifies the dependencies of a smaller set of packages by referencing other baselines from other repositories. This significantly increased loading times, in particular for the fetch step that creates a linear list of load directives. Upon closer inspection it turned out that dependent baselines are analysed over and over again, even if Metacello has supposedly visited them already.

Project structure

To simplify the debugging process we recreated our project structure in a playground organization https://github.com/bugginrack.

In that project we have a bunch of libraries (https://github.com/bugginrack/MyLibrary) with the following baseline dependencies:
[Image: Dependencies-MyLybraryD]

On top of that there is a framework (https://github.com/bugginrack/MyFramework):
[Image: Dependencies-MyFramework]

Next, we have a few projects (https://github.com/bugginrack/MyProject), some of which depend on each other (A, B and C):
[Image: Dependencies-MyProject]

Code size independence

Our thesis is that the size of the code base does not have a significant influence on the fetching performance. To test that, we loaded the same baseline structure with a large (generated) code base (video) and without any code (video).

It took ~46s to finish the fetching phase for a project with a significant amount of code:
[Image: FetchingEnd-GitHub-WithCode]

and about the same time, ~47s, for a project without code:
[Image: FetchingEnd-GitHub-WithoutCode]

Connection independence

To show that the problem is also independent of the network connection, we repeated the same experiment while loading the code locally.
With code (video):


Without code (video):

[Image: FetchingEnd-Local-WithoutCode]

The problem

The issue is that doubling the number of same-level projects doubles the time it takes to fetch, while increasing the dependency depth increases the fetching time exponentially.
For our real system the loading time exceeded 2 hours.
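
The exponential growth can be reproduced with a toy model that does not involve Metacello at all. The sketch below assumes a hypothetical layered graph in which every baseline depends on two baselines of the next layer, and both of those share the same dependencies. The block name naiveVisits and the graph shape are illustrative assumptions, not our real project structure. It counts how often a traversal that never remembers what it has already processed ends up visiting a baseline, and prints that next to the number of distinct baselines.

```smalltalk
"Toy model, not Metacello code.
 Hypothetical graph: one root, then 'depth' layers of two baselines each;
 every baseline depends on both baselines of the next layer."
| naiveVisits |
naiveVisits := nil.
naiveVisits := [ :depth |
	"One visit for this baseline, plus a full re-traversal for each of its two dependencies"
	depth = 0
		ifTrue: [ 1 ]
		ifFalse: [ 1 + (2 * (naiveVisits value: depth - 1)) ] ].

#(1 2 4 8 16) do: [ :depth |
	Transcript
		show: 'depth ' , depth printString;
		show: ' distinct baselines: ' , (1 + (2 * depth)) printString;
		show: ' naive visits: ' , (naiveVisits value: depth) printString;
		cr ]
```

In this model a depth of 4 already costs 31 visits for 9 distinct baselines, and each additional layer roughly doubles the number of visits while adding only two baselines.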

Solution

The interim workaround is of course to uglify, flatten and merge the baselines, reducing the number of interconnections to the minimum.

Q: Would it be possible to improve Metacello baseline fetching to skip already processed baselines?

Thank you!

@chisandrei

The linear load directive for Metacello visits every baseline every time. For example, in the load directive for BaselineOfMyLibraryD, BaselineOfMyLibraryCore is visited three times.

Smalltalk globals
	at: #MyRepository
	put: 'github://bugginrack'.

recorder := Metacello new
	baseline: 'MyLibraryD';
	repository: MyRepository,'/MyLibrary/src';
	record.

recorder roots first printString
linear load : 
	linear load : baseline [BaselineOfMyLibraryD]
		load : BaselineOfMyLibraryB
	linear load : baseline [BaselineOfMyLibraryD]
		load : BaselineOfMyLibraryC
	linear load : baseline [BaselineOfMyLibraryD]
		load : BaselineOfMyLibraryCore
	linear load : baseline [BaselineOfMyLibraryD]
		explicit load : MyLibraryCore
			load : BaselineOfMyLibraryCore-CompatibleUserName.1608377720
		linear load : baseline [BaselineOfMyLibraryCore]
			load : MyLibrary-Core
			load : MyLibrary-Core-Extra
		explicit load : MyLibraryB
			load : BaselineOfMyLibraryB-CompatibleUserName.1608377720
		linear load : baseline [BaselineOfMyLibraryB]
			load : BaselineOfMyLibraryCore
		linear load : baseline [BaselineOfMyLibraryB]
			linear load : baseline [BaselineOfMyLibraryCore]
				load : MyLibrary-Core
				load : MyLibrary-Core-Extra
			load : MyLibrary-B
			load : MyLibrary-B-Extra
		explicit load : MyLibraryC
			load : BaselineOfMyLibraryC-CompatibleUserName.1608377720
		linear load : baseline [BaselineOfMyLibraryC]
			load : BaselineOfMyLibraryA
		linear load : baseline [BaselineOfMyLibraryC]
			load : BaselineOfMyLibraryCore
		linear load : baseline [BaselineOfMyLibraryC]
			linear load : baseline [BaselineOfMyLibraryCore]
				load : MyLibrary-Core
				load : MyLibrary-Core-Extra
			explicit load : MyLibraryA
				load : BaselineOfMyLibraryA-CompatibleUserName.1608377720
			linear load : baseline [BaselineOfMyLibraryA]
				load : BaselineOfMyLibraryCore
			linear load : baseline [BaselineOfMyLibraryA]
				load : MyLibrary-A
			load : MyLibrary-C
		load : MyLibrary-D

With some logging added to MetacelloMCVersion>>#executeLoadFromArray:, we can see that the method is called 4 times to process BaselineOfMyLibraryCore.

executeLoadFromArray: 
baseline [BaselineOfMyLibraryD]
a Set('MyLibrary-D' 'MyLibraryB' 'MyLibraryCore' 'MyLibraryC')

executeLoadFromArray: 
baseline [BaselineOfMyLibraryCore]
a Set('MyLibrary-Core' 'MyLibrary-Core-Extra')

executeLoadFromArray: 
baseline [BaselineOfMyLibraryB]
a Set('MyLibrary-B-Extra' 'MyLibraryCore' 'MyLibrary-B')

executeLoadFromArray: 
baseline [BaselineOfMyLibraryCore]
a Set('MyLibrary-Core' 'MyLibrary-Core-Extra')

executeLoadFromArray: 
baseline [BaselineOfMyLibraryC]
a Set('MyLibrary-C' 'MyLibraryA' 'MyLibraryCore')

executeLoadFromArray: 
baseline [BaselineOfMyLibraryCore]
a Set('MyLibrary-Core' 'MyLibrary-Core-Extra')

executeLoadFromArray: 
baseline [BaselineOfMyLibraryA]
a Set('MyLibrary-A' 'MyLibraryCore')

executeLoadFromArray: 
baseline [BaselineOfMyLibraryCore]
a Set('MyLibrary-Core' 'MyLibrary-Core-Extra')
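
The call sequence in this log is what a plain depth-first traversal without any record of processed baselines would produce. The following playground sketch is not Metacello code: it only replays the project references that appear in the sets logged above (the dictionary deps and the block visit are names made up for illustration) and records each visit.

```smalltalk
"Sketch, not Metacello code: replay the project references from the log
 with a traversal that keeps no record of what it has already processed."
| deps calls visit |
deps := Dictionary new.
deps at: #MyLibraryD put: #(#MyLibraryCore #MyLibraryB #MyLibraryC).
deps at: #MyLibraryB put: #(#MyLibraryCore).
deps at: #MyLibraryC put: #(#MyLibraryCore #MyLibraryA).
deps at: #MyLibraryA put: #(#MyLibraryCore).
deps at: #MyLibraryCore put: #().

calls := OrderedCollection new.
visit := nil.
visit := [ :name |
	"Record the visit, then descend into every dependency again"
	calls add: name.
	(deps at: name) do: [ :each | visit value: each ] ].
visit value: #MyLibraryD.

Transcript
	show: 'total calls: ' , calls size printString; cr;
	show: 'calls for MyLibraryCore: ' , (calls occurrencesOf: #MyLibraryCore) printString; cr
```

For this graph the sketch records 8 visits in total, 4 of them for MyLibraryCore, which matches the 8 executeLoadFromArray: entries in the log.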

MetacelloMCVersion>>#loadVersion: introduces a cache, but that cache is cleared after every load of a baseline in MetacelloMCProjectSpec>>#ensureLoadedForDevelopmentUsing:.
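
To illustrate what skipping already processed baselines would save, here is a minimal sketch of the same kind of traversal with a record of processed baselines that survives across nested loads. It is not a patch for Metacello: the dictionary deps, the set processed and the block visit are hypothetical names, and the sketch assumes that a baseline can be identified by its name alone. A real fix would presumably also have to take into account which packages or groups each reference requests.

```smalltalk
"Sketch, not Metacello code: the MyLibrary project references,
 traversed with a set of processed baselines that is never cleared."
| deps processed calls visit |
deps := Dictionary new.
deps at: #MyLibraryD put: #(#MyLibraryCore #MyLibraryB #MyLibraryC).
deps at: #MyLibraryB put: #(#MyLibraryCore).
deps at: #MyLibraryC put: #(#MyLibraryCore #MyLibraryA).
deps at: #MyLibraryA put: #(#MyLibraryCore).
deps at: #MyLibraryCore put: #().

processed := Set new.
calls := OrderedCollection new.
visit := nil.
visit := [ :name |
	"Only process a baseline the first time it is reached"
	(processed includes: name) ifFalse: [
		processed add: name.
		calls add: name.
		(deps at: name) do: [ :each | visit value: each ] ] ].
visit value: #MyLibraryD.

Transcript
	show: 'total calls: ' , calls size printString; cr;
	show: 'calls for MyLibraryCore: ' , (calls occurrencesOf: #MyLibraryCore) printString; cr
```

With the set kept for the whole traversal, each of the 5 baselines is processed exactly once, so the work grows with the number of baselines rather than with the number of paths that lead to them.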
