Skip to content

Implemented draft of RMMR algorithm. #41

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 43 commits into
base: master
Choose a base branch
from
Open

Conversation

RamSaw
Copy link
Contributor

@RamSaw RamSaw commented Mar 28, 2018

First draft of an RMMR algotrithm. Algorithm is based on calculating distance between a method and a class where it can be moved. Distance between the method and the class is mean of distances between the method and methods in the class. Distance between two methods is calculated by the following formula: 1 - size_of_intersection_of_conceptual_sets / size_of_union_of_conceptual_sets. Conceptual set consists of classes (only classes that are in specified scope) whose methods are called in the body of given method and classes of method arguments.

In code the following was done. I created my classes for searching entities which my algorithm needs. They are RmmrStrategy and RmmrEntitySearcher. Implementation is very similar to the ones that are implemented already (NewStrategy and EntitySearcher). The difference is that my implementation do not consider any fields and additionally processes new statements in java language. Also I added RMMR algorithm implementation. As a basis I took ARI algorithm implementation so they must very be similar.

I added switch in RefactoringExecutionContext to run different entity searchers for RMMR and others algorithms. And changed RefactoringDetectionTest: it ran all algos but assertions do not fit for RMMR.

Further actions are: test algorithm (I began it and added some test classes) and check results with these that are described in the article, think about the best accuracy formula.

@ujohnny
Copy link
Collaborator

ujohnny commented Apr 11, 2018

Добрый день,

Давайте напишем описание для пул реквеста, как мини-документацию. Со ссылкой на документ, описывающий алгоритм, чтобы можно было в будущем (и текущем) к нему обратиться и понять, что здесь сделано верно, а что нет.

RamSaw added 2 commits April 11, 2018 22:04
…eak and wrong variable name (attribute instead of attributeType). Also added several test cases and changed accuracy formula to be more accurate.
…re not correct (it doesn't supports fields), so I added exact list of algos to run.
public class RMMR extends Algorithm {
public static final String NAME = "RMMR";

private static final Logger LOGGER = Logging.getLogger(MRI.class);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

private double getDistance(MethodEntity methodEntity1, MethodEntity methodEntity2) {
// TODO: Maybe add to methodEntity2 source class where it is located?
int sizeOfIntersection = intersection(methodEntity1.getRelevantProperties().getClasses(),
methodEntity2.getRelevantProperties().getClasses()).size();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might like to extract 2 variables there and these 4 lines will become more readable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

methodEntity2.getRelevantProperties().getClasses()).size();
int sizeOfUnion = union(methodEntity1.getRelevantProperties().getClasses(),
methodEntity2.getRelevantProperties().getClasses()).size();
return sizeOfIntersection == 0 ? 1 : 1 - (double) sizeOfIntersection / sizeOfUnion;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't hesitate to use parenthesis around conditional statements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood you correctly, you suggested to write so: (sizeOfIntersection == 0). If yes, then I've corrected it.

String className = null;
PsiType type = expression.getType();
if (type instanceof PsiClassType) {
className = ((PsiClassType) expression.getType()).getClassName();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(PsiClassType) type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

progressCount.set(0);

entities.getMethods().forEach(methodEntity -> {
List<ClassEntity> methodClass = entities.getClasses().stream()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entities -> List of Entity
classEntities -> List of ClassEntity
methodClass -> ???

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to methodClassEntity. This list must be size of 1 and consist of class entity in which method is located.

}
final String targetClassName = targetClass.getName();
double accuracy = (1 - minDistance) * difference; // TODO: Maybe consider amount of entities?
if (!targetClassName.equals(entity.getClassName())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a bit odd check looking on the code. if CLASS whose method is being analyzed there is taken into account when calculating the distance -- algorithm above might give a closer metric and you'll lose correct result. Probably more correct approach is to perform this check before the call to getDistance. Please correct me if I'm wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it seems to me to be correct. We measure distance between CLASS and method, and distances between other classes and method. getDistance considers if it is calculating distance between CLASS and method so it will be counted as described in the article (without measuring method with itself which would be always 0 distance). And we suggest refactoring if there is less distance than with CLASS.

public void visitFile(PsiFile file) {
indicator.checkCanceled();
if (strategy.acceptFile(file)) {
LOGGER.info("Indexing " + file.getName());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't look consistent to methods below

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as well as info level for file visits is too high.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made consistent. Not sure that I understood correctly about too high level. Does it mean that this information is not so important to be logged in info? I peeped this code in EntitySearcher implementation, it is exactly like there. If it is really too high level then we need to correct it in EntitySearcher also.

LOGGER.info("Properties calculated");
LOGGER.info("Generated " + classes.size() + " class entities");
LOGGER.info("Generated " + methods.size() + " method entities");
LOGGER.info("Generated " + 0 + " field entities. Fields are not supported.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this line make any sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. I report that this algorithm does not work with fields. I think it serves for consistency, because log information in EntitySearcher is structured the same.

double differenceWithSourceClassCoefficient = (1 - minDistance) * differenceWithSourceClass;
double accuracy = (0.7 * differenceWithSourceClassCoefficient + 0.3 * difference) *
sourceClassCoefficient * targetClassCoefficient;
// accuracy = 1;
if (accuracy >= 0.01 && !targetClassName.equals(entity.getClassName())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accuracy >= 0.01
is it just a comparison with epsilon?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to final static constant.

RamSaw and others added 16 commits May 21, 2018 21:37
…mize and rearrange code, fix todo in code and add stack for classes.
…d stack for nested classes entity searching. Also changed ArrayList<Double> to double[] to avoid boxing and unboxing, so now calculation runs in acceptable time.
…nces on collections returned by method values() of the map. It is backed by the map, so it is more optimized in terms of memory than previous version.
…mming/ArchitectureReloaded into RMMR_implementation

Because PorterStemmer.jar was added externally.
…"as", "i" (as variable in for loop) by length, now accept only with length > 2.
…us vector in one entity became small and memory usage became small, this accelerated algorithm significantly.
RamSaw added 18 commits July 10, 2018 12:20
…gic moved to PropertiesCalculator, so now we walk through PSI tree only two times, not three times.
…t so different, but from my point of view a little bit better.
…ySearcherTest. Reason: it tightly depends on RMMR configuration.
…acy mostly was lower than 0.7 and accuracy between 0.5 and 0.7 was giving pretty reasonable refactorings. But other algorithms have normal accuracy
# Conflicts:
#	src/main/java/org/jetbrains/research/groups/ml_methods/algorithm/entity/OldEntity.java
#	src/main/java/org/jetbrains/research/groups/ml_methods/refactoring/RefactoringExecutionContext.java
#	src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/AlgorithmAbstractTest.java
#	src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/AriTest.java
#	src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/CcdaTest.java
#	src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/HacTest.java

Moved bag and contextualVector to RelevantProperties. Architecture design must be reconsidered.
# Conflicts:
#	src/main/java/org/jetbrains/research/groups/ml_methods/algorithm/entity/OldEntity.java
#	src/main/java/org/jetbrains/research/groups/ml_methods/refactoring/RefactoringExecutionContext.java
#	src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/AlgorithmAbstractTest.java
#	src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/AriTest.java
#	src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/CcdaTest.java
#	src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/HacTest.java

Moved bag and contextualVector to RelevantProperties. Architecture design must be reconsidered.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants