-
Notifications
You must be signed in to change notification settings - Fork 3
Implemented draft of RMMR algorithm. #41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Добрый день, Давайте напишем описание для пул реквеста, как мини-документацию. Со ссылкой на документ, описывающий алгоритм, чтобы можно было в будущем (и текущем) к нему обратиться и понять, что здесь сделано верно, а что нет. |
…eak and wrong variable name (attribute instead of attributeType). Also added several test cases and changed accuracy formula to be more accurate.
…re not correct (it doesn't supports fields), so I added exact list of algos to run.
public class RMMR extends Algorithm { | ||
public static final String NAME = "RMMR"; | ||
|
||
private static final Logger LOGGER = Logging.getLogger(MRI.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected.
private double getDistance(MethodEntity methodEntity1, MethodEntity methodEntity2) { | ||
// TODO: Maybe add to methodEntity2 source class where it is located? | ||
int sizeOfIntersection = intersection(methodEntity1.getRelevantProperties().getClasses(), | ||
methodEntity2.getRelevantProperties().getClasses()).size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might like to extract 2 variables there and these 4 lines will become more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected.
methodEntity2.getRelevantProperties().getClasses()).size(); | ||
int sizeOfUnion = union(methodEntity1.getRelevantProperties().getClasses(), | ||
methodEntity2.getRelevantProperties().getClasses()).size(); | ||
return sizeOfIntersection == 0 ? 1 : 1 - (double) sizeOfIntersection / sizeOfUnion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't hesitate to use parenthesis around conditional statements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood you correctly, you suggested to write so: (sizeOfIntersection == 0). If yes, then I've corrected it.
String className = null; | ||
PsiType type = expression.getType(); | ||
if (type instanceof PsiClassType) { | ||
className = ((PsiClassType) expression.getType()).getClassName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(PsiClassType) type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected.
progressCount.set(0); | ||
|
||
entities.getMethods().forEach(methodEntity -> { | ||
List<ClassEntity> methodClass = entities.getClasses().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entities -> List of Entity
classEntities -> List of ClassEntity
methodClass -> ???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to methodClassEntity. This list must be size of 1 and consist of class entity in which method is located.
} | ||
final String targetClassName = targetClass.getName(); | ||
double accuracy = (1 - minDistance) * difference; // TODO: Maybe consider amount of entities? | ||
if (!targetClassName.equals(entity.getClassName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a bit odd check looking on the code. if CLASS whose method is being analyzed there is taken into account when calculating the distance -- algorithm above might give a closer metric and you'll lose correct result. Probably more correct approach is to perform this check before the call to getDistance. Please correct me if I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it seems to me to be correct. We measure distance between CLASS and method, and distances between other classes and method. getDistance considers if it is calculating distance between CLASS and method so it will be counted as described in the article (without measuring method with itself which would be always 0 distance). And we suggest refactoring if there is less distance than with CLASS.
public void visitFile(PsiFile file) { | ||
indicator.checkCanceled(); | ||
if (strategy.acceptFile(file)) { | ||
LOGGER.info("Indexing " + file.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't look consistent to methods below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as well as info level for file visits is too high.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made consistent. Not sure that I understood correctly about too high level. Does it mean that this information is not so important to be logged in info? I peeped this code in EntitySearcher implementation, it is exactly like there. If it is really too high level then we need to correct it in EntitySearcher also.
LOGGER.info("Properties calculated"); | ||
LOGGER.info("Generated " + classes.size() + " class entities"); | ||
LOGGER.info("Generated " + methods.size() + " method entities"); | ||
LOGGER.info("Generated " + 0 + " field entities. Fields are not supported."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this line make any sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. I report that this algorithm does not work with fields. I think it serves for consistency, because log information in EntitySearcher is structured the same.
…bugging that is why single thread version is commented not deleted.
… it is called by class name but IDEA doesn't considers it.
double differenceWithSourceClassCoefficient = (1 - minDistance) * differenceWithSourceClass; | ||
double accuracy = (0.7 * differenceWithSourceClassCoefficient + 0.3 * difference) * | ||
sourceClassCoefficient * targetClassCoefficient; | ||
// accuracy = 1; | ||
if (accuracy >= 0.01 && !targetClassName.equals(entity.getClassName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accuracy >= 0.01
is it just a comparison with epsilon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to final static constant.
…d accuracy formula, added annotations @NotNull.
…mize and rearrange code, fix todo in code and add stack for classes.
…d stack for nested classes entity searching. Also changed ArrayList<Double> to double[] to avoid boxing and unboxing, so now calculation runs in acceptable time.
…o speed up calculations.
…nces on collections returned by method values() of the map. It is backed by the map, so it is more optimized in terms of memory than previous version.
…mming/ArchitectureReloaded into RMMR_implementation Because PorterStemmer.jar was added externally.
…"as", "i" (as variable in for loop) by length, now accept only with length > 2.
…us vector in one entity became small and memory usage became small, this accelerated algorithm significantly.
…gic moved to PropertiesCalculator, so now we walk through PSI tree only two times, not three times.
…thod test started pass.
…t so different, but from my point of view a little bit better.
…: it tightly depends on RMMR configuration.
…ySearcherTest. Reason: it tightly depends on RMMR configuration.
…: ClassA.methodFromClassA()
…acy mostly was lower than 0.7 and accuracy between 0.5 and 0.7 was giving pretty reasonable refactorings. But other algorithms have normal accuracy
# Conflicts: # src/main/java/org/jetbrains/research/groups/ml_methods/algorithm/entity/OldEntity.java # src/main/java/org/jetbrains/research/groups/ml_methods/refactoring/RefactoringExecutionContext.java # src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/AlgorithmAbstractTest.java # src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/AriTest.java # src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/CcdaTest.java # src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/HacTest.java Moved bag and contextualVector to RelevantProperties. Architecture design must be reconsidered.
# Conflicts: # src/main/java/org/jetbrains/research/groups/ml_methods/algorithm/entity/OldEntity.java # src/main/java/org/jetbrains/research/groups/ml_methods/refactoring/RefactoringExecutionContext.java # src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/AlgorithmAbstractTest.java # src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/AriTest.java # src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/CcdaTest.java # src/test/java/org/jetbrains/research/groups/ml_methods/algorithm/HacTest.java Moved bag and contextualVector to RelevantProperties. Architecture design must be reconsidered.
First draft of an RMMR algotrithm. Algorithm is based on calculating distance between a method and a class where it can be moved. Distance between the method and the class is mean of distances between the method and methods in the class. Distance between two methods is calculated by the following formula: 1 - size_of_intersection_of_conceptual_sets / size_of_union_of_conceptual_sets. Conceptual set consists of classes (only classes that are in specified scope) whose methods are called in the body of given method and classes of method arguments.
In code the following was done. I created my classes for searching entities which my algorithm needs. They are RmmrStrategy and RmmrEntitySearcher. Implementation is very similar to the ones that are implemented already (NewStrategy and EntitySearcher). The difference is that my implementation do not consider any fields and additionally processes new statements in java language. Also I added RMMR algorithm implementation. As a basis I took ARI algorithm implementation so they must very be similar.
I added switch in RefactoringExecutionContext to run different entity searchers for RMMR and others algorithms. And changed RefactoringDetectionTest: it ran all algos but assertions do not fit for RMMR.
Further actions are: test algorithm (I began it and added some test classes) and check results with these that are described in the article, think about the best accuracy formula.