forked from fiskra-zz/HtmlParser
Commit
feride
authored and
feride
committed
Apr 3, 2017
1 parent
92822ba
commit cd57c9e
Showing
3 changed files
with
266 additions
and
0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
#!/bin/bash | ||
mvn clean package |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,91 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> | ||
<modelVersion>4.0.0</modelVersion> | ||
|
||
<groupId>com.fiskra</groupId> | ||
<artifactId>technical.challenge</artifactId> | ||
<version>0.0.1-SNAPSHOT</version> | ||
<packaging>jar</packaging> | ||
|
||
<name>SampleParser</name> | ||
<description>Scout24 Technical Challenge</description> | ||
|
||
<parent> | ||
<groupId>org.springframework.boot</groupId> | ||
<artifactId>spring-boot-starter-parent</artifactId> | ||
<version>1.5.2.RELEASE</version> | ||
<relativePath /> | ||
</parent> | ||
|
||
<properties> | ||
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> | ||
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding> | ||
<java.version>1.8</java.version> | ||
</properties> | ||
|
||
<dependencies> | ||
<dependency> | ||
<groupId>org.springframework.boot</groupId> | ||
<artifactId>spring-boot-starter-actuator</artifactId> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.springframework.boot</groupId> | ||
<artifactId>spring-boot-starter-security</artifactId> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.springframework.security.oauth</groupId> | ||
<artifactId>spring-security-oauth2</artifactId> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.springframework.boot</groupId> | ||
<artifactId>spring-boot-starter-web</artifactId> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.springframework.boot</groupId> | ||
<artifactId>spring-boot-starter-test</artifactId> | ||
<scope>test</scope> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.springframework.boot</groupId> | ||
<artifactId>spring-boot-configuration-processor</artifactId> | ||
<optional>true</optional> | ||
</dependency> | ||
|
||
<dependency> | ||
<groupId>org.webjars</groupId> | ||
<artifactId>angularjs</artifactId> | ||
<version>1.4.3</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.webjars</groupId> | ||
<artifactId>jquery</artifactId> | ||
<version>2.1.1</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.webjars</groupId> | ||
<artifactId>bootstrap</artifactId> | ||
<version>3.2.0</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.webjars</groupId> | ||
<artifactId>webjars-locator</artifactId> | ||
</dependency> | ||
|
||
<dependency> | ||
<groupId>org.jsoup</groupId> | ||
<artifactId>jsoup</artifactId> | ||
<version>1.8.3</version> | ||
</dependency> | ||
|
||
</dependencies> | ||
|
||
<build> | ||
<plugins> | ||
<plugin> | ||
<groupId>org.springframework.boot</groupId> | ||
<artifactId>spring-boot-maven-plugin</artifactId> | ||
</plugin> | ||
</plugins> | ||
</build> | ||
</project> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,173 @@ | ||
# Html Parser | ||
|
||
> Simplicity is key. | ||
### How to build | ||
|
||
Run build.sh, located in the project root, from a terminal. | ||
|
||
### How to run | ||
Run the following command from the target directory under the project root: | ||
|
||
java -jar technical.challenge-0.0.1-SNAPSHOT.jar | ||
|
||
### Simple HTML Parser Form | ||
|
||
This application parses the page at a URL supplied by the user and shows the following information: | ||
|
||
* Version of Document | ||
* Page title | ||
* Number of headings grouped by heading level | ||
* Number of hypermedia links grouped by internal/external | ||
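The internal/external grouping in the list above can be sketched with the JDK alone. This is a minimal illustration, not the project's code: the class and method names are hypothetical, it assumes the hrefs have already been extracted from the page (for example with JSoup), and it assumes "internal" means "same host as the analysed page".

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not taken from the project): groups hrefs into internal/external.
class LinkClassifier {

    /** Returns link counts under the keys "internal" and "external". */
    static Map<String, Integer> countByType(String pageUrl, List<String> hrefs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("internal", 0);
        counts.put("external", 0);
        URI base = URI.create(pageUrl);
        for (String href : hrefs) {
            String key = isInternal(base, href) ? "internal" : "external";
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** Assumption: a link is internal when it resolves to the same host as the page. */
    static boolean isInternal(URI base, String href) {
        try {
            // Relative links such as "/about" or "#top" resolve against the page URL.
            URI resolved = base.resolve(new URI(href.trim()));
            String host = resolved.getHost();
            return host != null && host.equalsIgnoreCase(base.getHost());
        } catch (URISyntaxException e) {
            // Unparsable hrefs are counted as external in this sketch.
            return false;
        }
    }
}
```

Links without a host, such as `mailto:` addresses, fall into the external group here; a real implementation might prefer to skip them instead.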
|
||
The application also has a login form that supports GitHub, Facebook and Spiegel logins. | ||
|
||
|
||
## Technical Infrastructure | ||
|
||
### Language | ||
Java 8 | ||
|
||
### Runtime | ||
JDK 8 | ||
|
||
### Build | ||
Maven is the build tool. Just run build.sh in the project root. | ||
|
||
### HTML Parser | ||
JSoup | ||
|
||
### Framework | ||
This application uses the Spring framework, specifically Spring Boot, Spring MVC and Spring Security, together with Angular.js on the front end. | ||
|
||
### Why Spring? | ||
It provides a strong infrastructure and gets us productive as quickly as possible. | ||
It is simple to integrate with other Java frameworks. | ||
Spring's dependency injection helps us write testable, robust code. | ||
|
||
### Spring Boot | ||
Spring Boot's motto is "just run". You don't need to worry about the server side because Tomcat/Jetty is embedded, so there is no need to deploy a WAR externally. | ||
Dependency management is easy: adding "spring-boot-starter-web" to the Maven pom.xml pulls in all the libraries needed to develop an MVC application. | ||
As we know, Tomcat is a servlet container. In Spring Boot, the Application class is a plain Java class annotated with @SpringBootApplication. | ||
When we run it, [http://localhost:8080/] becomes reachable. | ||
|
||
``` | ||
@SpringBootApplication | ||
public class Application | ||
{ | ||
public static void main(String[] args) | ||
{ | ||
SpringApplication.run(Application.class, args); | ||
} | ||
} | ||
``` | ||
|
||
This code block alone is enough to get our application up and running. Simple and cool. | ||
|
||
|
||
### Spring MVC | ||
It is more flexible than other Java frameworks such as Struts. Besides JSP, it supports multiple view technologies (XSLT, XL, Velocity). | ||
It provides a Model-View-Controller architecture, and there are many advantages to implementing enterprise applications with Spring MVC. | ||
|
||
### Spring Security | ||
We need a security framework to make our application secure. Spring Security focuses on authentication and authorization. This application supports | ||
GitHub and Facebook (social login) with OAuth2 authentication, plus Der Spiegel login. | ||
There are several common approaches to authentication: session/cookie based, OAuth2/API, and stateless ones such as JSON Web Tokens and HTTP Basic Auth. | ||
I implemented OAuth2 authentication for the social login part using Spring Security and Spring Boot; the starting point is adding the dependencies to the Maven pom.xml. | ||
|
||
|
||
### JSoup | ||
It is a Java library for extracting and manipulating HTML data. | ||
|
||
### Test | ||
Spring boot test library | ||
|
||
### Angular.js | ||
Angular.js is a JavaScript framework and it works well with Spring MVC. | ||
P.S.: I am not good at Angular.js, this is my first time :) | ||
|
||
Default port: 8080 | ||
Application URL: localhost:8080 | ||
|
||
### Application Flow | ||
|
||
If no user is logged in, the application redirects to the login page. | ||
After login, there is a form with a text box for entering the desired URL (HTML page). | ||
There is a logout button on the page, and the username is shown next to it. After the form is submitted, the URL analysis is shown below it. | ||
|
||
To secure the application with Spring Security, we create a class that extends WebSecurityConfigurerAdapter and overrides the configure method: | ||
|
||
``` | ||
@Configuration | ||
@EnableWebSecurity | ||
public class WebSecurityConfiguration extends WebSecurityConfigurerAdapter { | ||
@Override | ||
protected void configure(HttpSecurity http) throws Exception { | ||
http.antMatcher("/**") | ||
// public pages: home, login endpoints and static webjars | ||
.authorizeRequests().antMatchers("/", "/login**", "/webjars/**").permitAll() | ||
.anyRequest().authenticated() | ||
// unauthenticated users are sent to the home (login) page | ||
.and().exceptionHandling().authenticationEntryPoint(new LoginUrlAuthenticationEntryPoint("/")) | ||
.and().logout().logoutSuccessUrl("/").permitAll() | ||
// CSRF token stored in a cookie that Angular.js can read | ||
.and().csrf().csrfTokenRepository(CookieCsrfTokenRepository.withHttpOnlyFalse()) | ||
// ssoFilter() is defined elsewhere in this class (not shown) | ||
.and().addFilterBefore(ssoFilter(), BasicAuthenticationFilter.class); | ||
} | ||
// remaining configuration methods omitted | ||
} | ||
``` | ||
|
||
The @EnableWebSecurity annotation enables Spring Security support and Spring MVC integration. | ||
I also add CSRF protection in the configure method. Angular.js handles the CSRF token slightly differently from Spring Security's default: it reads the token from a cookie, so | ||
csrfTokenRepository(CookieCsrfTokenRepository.withHttpOnlyFalse()) is added to expose the token in a cookie that Angular.js can read. | ||
|
||
Some configuration methods in this class help with the login process. | ||
The application.yml file holds the required token keys, token URIs and OAuth URLs for the social login accounts, | ||
as well as logging configuration properties such as log level and log format. | ||
|
||
The method below handles the redirects from Facebook and GitHub back to the application | ||
via a servlet filter: | ||
|
||
``` | ||
@Bean | ||
public FilterRegistrationBean oauth2ClientFilterRegistration(OAuth2ClientContextFilter filter) { | ||
FilterRegistrationBean registration = new FilterRegistrationBean(); | ||
registration.setFilter(filter); | ||
registration.setOrder(-100); | ||
return registration; | ||
} | ||
``` | ||
|
||
|
||
### Exception Mechanism | ||
I implemented the exception mechanism with the @ControllerAdvice annotation, which provides global exception handling. | ||
The @ExceptionHandler annotation is used on the handler methods. Although I handle only one custom exception type, | ||
the mechanism is capable of handling all global exception scenarios. | ||
|
||
|
||
### Logging | ||
I enabled Spring Security and Spring Web logging at error level, so I did not add extra logging code here. | ||
I prefer to add info/error log statements in the business logic. | ||
|
||
### Test | ||
Some useful methods were implemented to test the analysis functions. I implemented only unit test cases, but scenario-based cases should be considered in a real project. There are capable tools (like Selenium) for running such scenarios end to end. | ||
Spring Boot also offers integration test support. Integration tests are very helpful if your application is | ||
distributed. | ||
|
||
|
||
### Form Validation | ||
Angular.js makes forms easy and provides simple/quick form validation. I used Angular form validation. | ||
|
||
### Unimplemented Part | ||
I did the redirection to the login page with JavaScript. The main idea of the login part would be to fetch the HTML document elements via JavaScript | ||
calls and check whether login id/name attributes are present in the page. If the page does not include any login-related attributes, just close the popup and | ||
take the username as the authenticated user. GitHub provides OAuth authentication but Spiegel does not, so two different approaches have to be considered. | ||
|
||
### Optional Part | ||
Validating whether each link is available via HTTPS would be slow, especially if the requested page has many links. | ||
There are two ways to check links: | ||
First, we can implement it in Java on the server side. Performance is the issue there, but it can be improved with multithreading. | ||
Second, we can use asynchronous JavaScript requests. Running the requests asynchronously keeps the page responsive, and | ||
the rest of the code continues to run while the HTTP requests are in flight. | ||
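The server-side, multithreaded option could look roughly like the sketch below. It is not part of this commit: the class name, the method names, the pool size and the 3-second timeouts are all assumptions made for illustration. The probe is passed in as a parameter so the parallel part can be exercised without network access.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

// Hypothetical helper (not taken from the project): checks many links in parallel.
class ParallelLinkChecker {

    /** Runs the probe for every link on a fixed-size pool; results keep the input order. */
    static Map<String, Boolean> checkAll(List<String> links, Predicate<String> probe, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Boolean>> futures = new ArrayList<>();
            for (String link : links) {
                futures.add(CompletableFuture.supplyAsync(() -> probe.test(link), pool));
            }
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < links.size(); i++) {
                // join() blocks until the probe for this link has finished.
                results.put(links.get(i), futures.get(i).join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    /** One possible probe: a HEAD request that succeeds only for reachable https:// URLs. */
    static boolean httpsReachable(String link) {
        if (!link.startsWith("https://")) {
            return false;
        }
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(link).openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(3000); // assumed timeout, in milliseconds
            conn.setReadTimeout(3000);
            return conn.getResponseCode() < 400;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A call such as `ParallelLinkChecker.checkAll(links, ParallelLinkChecker::httpsReachable, 8)` would then probe up to eight links at a time, so the total wait is bounded by the slowest batches rather than by the sum of all requests.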
|
||
|
||
|