Commit 36a91a5

Merge pull request #27 from rog-golang-buddies/feature/parse_yml
Parsing of the open API to API Spec document.
2 parents 61627f5 + d842773 commit 36a91a5

33 files changed

+2152
-185
lines changed

README.md

+74-10
@@ -1,22 +1,86 @@
11
# Data Scraping Service
2+
23
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://github.com/pre-commit/pre-commit)
34
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/rog-golang-buddies/api-hub_data-scraping-service/main.svg)](https://results.pre-commit.ci/latest/github/rog-golang-buddies/api-hub_data-scraping-service/main)
45

56
## Description
6-
Service asynchronously process user request to add new Open API.
7-
In other words this service processes content of Open API file, transforms it to the ASD (API Specification Document) model and sends next to the storage and update service.
87

9-
### Main functions (To Do)
10-
1. Listen to queue events (links to open API yaml/json files)
11-
2. Check link availability
12-
3. Retrieve file content
13-
4. Validate content
14-
5. Parse content into an ASD model
15-
6. Put ASD model with metadata to the storage and update service queue
8+
The service asynchronously processes user requests to add a new Open API specification.
9+
In other words, this service processes the content of the Open API file, transforms it into the ASD (API
10+
Specification Document) model, and sends it on to the storage and update service.
1611

1712
### Starting service
13+
1814
The easiest way to start the application is with Docker.
1915
If you have Docker installed, you just need to run a command from the project root
2016
`docker-compose -f ./docker/docker-compose-dev.yml up -d --build`.
2117
And `docker-compose -f ./docker/docker-compose-dev.yml down` to stop.
22-
You can observe queues, and send and retrieve messages from queues using the web interface available by address http://localhost:15672 .
18+
You can observe queues, and send and retrieve messages, via the web interface available at
19+
the address http://localhost:15672.
20+
Login/password: guest/guest.
21+
22+
### MVP version
23+
24+
1. Listen for events containing static links to Open API specification files.
25+
2. Download and parse the OpenAPI specification into a common API specification document (ASD) (the view for the UI part).
26+
3. Send a notification to the API gateway if required (depends on the flag; see the 'Communication model' section).
27+
4. Post ASD to the result queue.
28+
29+
#### Communication model
30+
31+
The service consumes requests that contain a file URL and a notification flag.
32+
Default listen queue name: data-scraping-asd
33+
Request:
34+
35+
```json5
36+
{
37+
"file_url": "https://developer.atlassian.com/cloud/trello/swagger.v3.json",
38+
"is_notify_user": true
39+
}
40+
```
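A consumer could decode this request roughly as sketched below. This is a minimal sketch, not the service's actual code: the `ScrapeRequest` type and the `decodeRequest` helper are hypothetical names, and only the JSON field names are taken from the example above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ScrapeRequest mirrors the request JSON shown above.
// The type name is hypothetical; only the JSON field names come from the README.
type ScrapeRequest struct {
	FileURL      string `json:"file_url"`
	IsNotifyUser bool   `json:"is_notify_user"`
}

// decodeRequest (hypothetical helper) parses a raw queue message body
// and rejects requests that carry no file URL.
func decodeRequest(body []byte) (ScrapeRequest, error) {
	var req ScrapeRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return ScrapeRequest{}, fmt.Errorf("decode request: %w", err)
	}
	if req.FileURL == "" {
		return ScrapeRequest{}, errors.New("decode request: file_url is empty")
	}
	return req, nil
}

func main() {
	body := []byte(`{"file_url": "https://developer.atlassian.com/cloud/trello/swagger.v3.json", "is_notify_user": true}`)
	req, err := decodeRequest(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.FileURL, req.IsNotifyUser)
}
```

Rejecting an empty `file_url` at decode time is a design assumption of this sketch; the README does not say how the service validates requests.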
41+
42+
If "is_notify_user" is true, this service posts a notification to a separate queue. A notification contains a single
43+
"error" field holding an error model: it is populated if processing failed and nil otherwise.
44+
Default notification queue name: gateway-scrape-notifications
45+
Example:
46+
47+
```json5
48+
{
49+
"error": {
50+
"cause": "file exceed the limit: 5242880",
51+
"message": "error while processing url"
52+
}
53+
}
54+
```
55+
56+
If parsing completes successfully, the result is posted to the result queue and delivered to
57+
the 'storage and update service'.
58+
Default result queue name: storage-update-asd
59+
The model is too large to describe here; see the code for details.
60+
61+
#### How to check functionality manually using the RabbitMQ management page
62+
63+
1. Start the service as described in the 'Starting service' section
64+
2. Go to http://localhost:15672 and login as guest/guest
65+
3. Go to the Queues tab.
66+
4. Check that the data-scraping-asd queue is already present there
67+
5. Expand 'Add a new queue' section under the 'Overview' and add 2 queues: 'gateway-scrape-notifications' and
68+
'storage-update-asd'
69+
6. Go into the data-scraping-asd queue and expand the 'Publish message' section under the charts
70+
7. Add request body and publish a message
71+
8. You can check service logs with `docker logs dss`, return to the Queues tab and check result messages in the queues
72+
using the "Get messages" section
73+
74+
### Known current limitations (TO DO)
75+
76+
1. Only Swagger (OpenAPI) version 3.0 is supported.
77+
2. Field constraints (max length, etc.) are ignored.
78+
79+
### Main functions
80+
81+
1. Listen to queue events (links to open API yaml/json files)
82+
2. Check link availability
83+
3. Retrieve file content (the file size is limited; the default limit is 5 MiB, i.e. 5242880 bytes)
84+
4. Validate content
85+
5. Parse content into an ASD model
86+
6. Put ASD model with metadata to the storage and update service queue

docker/docker-compose-dev.yml

+9
@@ -1,9 +1,18 @@
11
version: '3.9'
22

3+
volumes:
4+
rabbit-data:
5+
driver: local
6+
37
services:
48
rabbit:
59
image: rabbitmq:3-management #you may open management UI via http://localhost:15672/#/ login&password == guest
610
container_name: rabbit
11+
#hostname required here to work with the volume on persistent queues.
12+
#RabbitMQ stores its data in directories whose names are derived from the hostname. To have the data restored on container restart, the hostname must be fixed.
13+
hostname: rabbit
14+
volumes:
15+
- rabbit-data:/var/lib/rabbitmq
716
ports:
817
- "5672:5672"
918
- "15672:15672"

go.mod

+6
@@ -3,6 +3,7 @@ module github.com/rog-golang-buddies/api-hub_data-scraping-service
33
go 1.18
44

55
require (
6+
github.com/getkin/kin-openapi v0.98.0
67
github.com/golang/mock v1.6.0
78
github.com/kelseyhightower/envconfig v1.4.0
89
github.com/rabbitmq/amqp091-go v1.4.0
@@ -13,8 +14,13 @@ require (
1314

1415
require (
1516
github.com/davecgh/go-spew v1.1.1 // indirect
17+
github.com/go-openapi/jsonpointer v0.19.5 // indirect
18+
github.com/go-openapi/swag v0.19.5 // indirect
19+
github.com/invopop/yaml v0.1.0 // indirect
20+
github.com/mailru/easyjson v0.0.0-20190626092158-b2ccc519800e // indirect
1621
github.com/pmezard/go-difflib v1.0.0 // indirect
1722
go.uber.org/atomic v1.7.0 // indirect
1823
go.uber.org/multierr v1.6.0 // indirect
24+
gopkg.in/yaml.v2 v2.4.0 // indirect
1925
gopkg.in/yaml.v3 v3.0.1 // indirect
2026
)

go.sum

+17
@@ -2,15 +2,27 @@ github.com/benbjohnson/clock v1.1.0 h1:Q92kusRqC1XV2MjkWETPvjJVqKetz1OzxZB7mHJLj
22
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
33
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
44
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
5+
github.com/getkin/kin-openapi v0.98.0 h1:lIACvCG9cxmFsEywz+LCoVhcZHFLUy+Nv5QSkb43eAE=
6+
github.com/getkin/kin-openapi v0.98.0/go.mod h1:w4lRPHiyOdwGbOkLIyk+P0qCwlu7TXPCHD/64nSXzgE=
7+
github.com/go-openapi/jsonpointer v0.19.5 h1:gZr+CIYByUqjcgeLXnQu2gHYQC9o73G2XUeOFYEICuY=
8+
github.com/go-openapi/jsonpointer v0.19.5/go.mod h1:Pl9vOtqEWErmShwVjC8pYs9cog34VGT37dQOVbmoatg=
9+
github.com/go-openapi/swag v0.19.5 h1:lTz6Ys4CmqqCQmZPBlbQENR1/GucA2bzYTE12Pw4tFY=
10+
github.com/go-openapi/swag v0.19.5/go.mod h1:POnQmlKehdgb5mhVOsnJFsivZCEZ/vjK9gh66Z9tfKk=
511
github.com/golang/mock v1.6.0 h1:ErTB+efbowRARo13NNdxyJji2egdxLGQhRaY+DUumQc=
612
github.com/golang/mock v1.6.0/go.mod h1:p6yTPP+5HYm5mzsMV8JkE6ZKdX+/wYM6Hr+LicevLPs=
13+
github.com/gorilla/mux v1.8.0/go.mod h1:DVbg23sWSpFRCP0SfiEN6jmj59UnW/n46BH5rLB71So=
14+
github.com/invopop/yaml v0.1.0 h1:YW3WGUoJEXYfzWBjn00zIlrw7brGVD0fUKRYDPAPhrc=
15+
github.com/invopop/yaml v0.1.0/go.mod h1:2XuRLgs/ouIrW3XNzuNj7J3Nvu/Dig5MXvbCEdiBN3Q=
716
github.com/kelseyhightower/envconfig v1.4.0 h1:Im6hONhd3pLkfDFsbRgu68RDNkGF1r3dvMUtDTo2cv8=
817
github.com/kelseyhightower/envconfig v1.4.0/go.mod h1:cccZRl6mQpaq41TPp5QxidR+Sa3axMbJDNb//FQX6Gg=
918
github.com/kr/pretty v0.1.0 h1:L/CwN0zerZDmRFUapSPitk6f+Q3+0za1rQkzVuMiMFI=
1019
github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
1120
github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
1221
github.com/kr/text v0.1.0 h1:45sCR5RtlFHMR4UwH9sdQ5TC8v0qDQCHnXt+kaKSTVE=
1322
github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
23+
github.com/mailru/easyjson v0.0.0-20190614124828-94de47d64c63/go.mod h1:C1wdFJiN94OJF2b5HbByQZoLdCWB1Yqtg26g4irojpc=
24+
github.com/mailru/easyjson v0.0.0-20190626092158-b2ccc519800e h1:hB2xlXdHp/pmPZq0y3QnmWAArdw9PqbmotexnWx/FU8=
25+
github.com/mailru/easyjson v0.0.0-20190626092158-b2ccc519800e/go.mod h1:C1wdFJiN94OJF2b5HbByQZoLdCWB1Yqtg26g4irojpc=
1426
github.com/pkg/errors v0.8.1 h1:iURUrRGxPUNPdy5/HRSm+Yj6okJ6UtLINN0Q9M4+h3I=
1527
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
1628
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
@@ -19,6 +31,7 @@ github.com/rabbitmq/amqp091-go v1.4.0/go.mod h1:JsV0ofX5f1nwOGafb8L5rBItt9GyhfQf
1931
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
2032
github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
2133
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
34+
github.com/stretchr/testify v1.5.1/go.mod h1:5W2xD1RspED5o8YsWQXVCued0rvSQ+mT+I5cxcmMvtA=
2235
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
2336
github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
2437
github.com/stretchr/testify v1.8.0 h1:pSgiaMZlXftHpm5L7V1+rVB+AZJydKsMxsQBIJw4PKk=
@@ -63,6 +76,10 @@ golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8T
6376
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
6477
gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127 h1:qIbj1fsPNlZgppZ+VLlY7N33q108Sa+fhmuc+sWQYwY=
6578
gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
79+
gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
80+
gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=
81+
gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=
6682
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
83+
gopkg.in/yaml.v3 v3.0.0/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
6784
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
6885
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

internal/app.go

+7-6
@@ -7,6 +7,7 @@ import (
77
"github.com/rog-golang-buddies/api-hub_data-scraping-service/internal/load"
88
"github.com/rog-golang-buddies/api-hub_data-scraping-service/internal/logger"
99
"github.com/rog-golang-buddies/api-hub_data-scraping-service/internal/parse"
10+
"github.com/rog-golang-buddies/api-hub_data-scraping-service/internal/parse/openapi"
1011
"github.com/rog-golang-buddies/api-hub_data-scraping-service/internal/process"
1112
"github.com/rog-golang-buddies/api-hub_data-scraping-service/internal/queue"
1213
"github.com/rog-golang-buddies/api-hub_data-scraping-service/internal/queue/handler"
@@ -29,7 +30,7 @@ func Start() int {
2930
return 1
3031
}
3132

32-
proc, err := createDefaultProcessor()
33+
proc, err := createDefaultProcessor(log, conf)
3334
if err != nil {
3435
log.Error("error while creating processor: ", err)
3536
return 1
@@ -65,11 +66,11 @@ func Start() int {
6566
return 0
6667
}
6768

68-
func createDefaultProcessor() (process.UrlProcessor, error) {
69-
recognizer := recognize.NewRecognizer()
70-
parsers := []parse.Parser{parse.NewJsonOpenApiParser(), parse.NewYamlOpenApiParser()}
71-
converter := parse.NewConverter(parsers)
72-
loader := load.NewContentLoader()
69+
func createDefaultProcessor(log logger.Logger, config *config.ApplicationConfig) (process.UrlProcessor, error) {
70+
recognizer := recognize.NewRecognizer(log)
71+
parsers := []parse.Parser{openapi.NewOpenApi(log)}
72+
converter := parse.NewConverter(log, parsers)
73+
loader := load.NewContentLoader(log, &config.Web)
7374

7475
return process.NewProcessor(recognizer, converter, loader)
7576
}

internal/config/application.go

+5-4
@@ -6,12 +6,13 @@ import (
66
)
77

88
type ApplicationConfig struct {
9-
Env Environment `default:"dev"`
10-
Logger LoggerConfig
11-
Queue QueueConfig
9+
Env Environment `default:"dev"`
10+
Logger LoggerConfig
11+
Queue QueueConfig
12+
Web Web
1213
}
1314

14-
//ReadConfig reads configuration from the environment and populates the structure with it
15+
// ReadConfig reads configuration from the environment and populates the structure with it
1516
func ReadConfig() (*ApplicationConfig, error) {
1617
var conf ApplicationConfig
1718
if err := envconfig.Process("", &conf); err != nil {

internal/config/web.go

+7
@@ -0,0 +1,7 @@
1+
package config
2+
3+
// Web is a web-related properties configuration
4+
type Web struct {
5+
//RespLimBytes represents the maximum file size (in bytes) to download.
6+
RespLimBytes int64 `default:"5242880"`
7+
}
