|
1 | 1 | # Data Scraping Service
|
| 2 | + |
2 | 3 | [](https://github.com/pre-commit/pre-commit)
|
3 | 4 | [](https://results.pre-commit.ci/latest/github/rog-golang-buddies/api-hub_data-scraping-service/main)
|
4 | 5 |
|
5 | 6 | ## Description
|
6 |
| -Service asynchronously process user request to add new Open API. |
7 |
| -In other words this service processes content of Open API file, transforms it to the ASD (API Specification Document) model and sends next to the storage and update service. |
8 | 7 |
|
9 |
| -### Main functions (To Do) |
10 |
| -1. Listen to queue events (links to open API yaml/json files) |
11 |
| -2. Check link availability |
12 |
| -3. Retrieve file content |
13 |
| -4. Validate content |
14 |
| -5. Parse content into an ASD model |
15 |
| -6. Put ASD model with metadata to the storage and update service queue |
| 8 | +The service asynchronously processes user requests to add a new OpenAPI specification. |
| 9 | +In other words, it processes the content of an OpenAPI file, transforms it into the ASD (API |
| 10 | +Specification Document) model, and sends the result to the storage and update service. |
16 | 11 |
|
17 | 12 | ### Starting service
|
| 13 | + |
18 | 14 | The easiest way to start an application is to do it with docker.
|
19 | 15 | If you have docker you just need to run a command from the project root
|
20 | 16 | `docker-compose -f ./docker/docker-compose-dev.yml up -d --build`.
|
21 | 17 | And `docker-compose -f ./docker/docker-compose-dev.yml down` to stop.
|
22 |
| -You can observe queues, and send and retrieve messages from queues using the web interface available by address http://localhost:15672 . |
| 18 | +You can observe queues, and publish and retrieve messages, via the RabbitMQ management web interface at |
| 19 | +http://localhost:15672. |
| 20 | +Login/password: guest/guest. |
| 21 | + |
| 22 | +### MVP version |
| 23 | + |
| 24 | +1. Listen for events containing static links to OpenAPI specification files. |
| 25 | +2. Download and parse the OpenAPI specification into a common API specification document (ASD), the view used by the UI. |
| 26 | +3. Send a notification to the API gateway if required (depends on the flag; see the 'Communication model' section). |
| 27 | +4. Post the ASD to the result queue. |
| 28 | + |
| 29 | +#### Communication model |
| 30 | + |
| 31 | +The service consumes requests that contain a file URL and a notification flag. |
| 32 | +Default listen queue name: data-scraping-asd |
| 33 | +Example request: |
| 34 | + |
| 35 | +```json5 |
| 36 | +{ |
| 37 | + "file_url": "https://developer.atlassian.com/cloud/trello/swagger.v3.json", |
| 38 | + "is_notify_user": true |
| 39 | +} |
| 40 | +``` |
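As an illustration of how a consumer might decode this request, here is a minimal Go sketch. Only the JSON field names `file_url` and `is_notify_user` come from the example above; the `ScrapeRequest` type, the `decodeRequest` helper, and the rule that an empty URL is rejected are assumptions made for this sketch, not the service's actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ScrapeRequest mirrors the request body shown above.
// The type name is hypothetical; only the JSON field names come from the README.
type ScrapeRequest struct {
	FileURL      string `json:"file_url"`
	IsNotifyUser bool   `json:"is_notify_user"`
}

// decodeRequest parses a raw queue message body into a ScrapeRequest.
// Rejecting an empty file_url is an assumption of this sketch.
func decodeRequest(body []byte) (ScrapeRequest, error) {
	var req ScrapeRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return ScrapeRequest{}, fmt.Errorf("decode request: %w", err)
	}
	if req.FileURL == "" {
		return ScrapeRequest{}, errors.New("decode request: file_url is empty")
	}
	return req, nil
}

func main() {
	body := []byte(`{"file_url": "https://developer.atlassian.com/cloud/trello/swagger.v3.json", "is_notify_user": true}`)
	req, err := decodeRequest(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.FileURL, req.IsNotifyUser)
}
```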
| 41 | + |
| 42 | +If "is_notify_user" is true, the service posts a notification to a separate queue. A notification contains a single |
| 43 | +"error" field: it holds the error model if processing failed and is nil otherwise. |
| 44 | +Default notification queue name: gateway-scrape-notifications |
| 45 | +Example: |
| 46 | + |
| 47 | +```json5 |
| 48 | +{ |
| 49 | + "error": { |
| 50 | + "cause": "file exceed the limit: 5242880", |
| 51 | + "message": "error while processing url" |
| 52 | + } |
| 53 | +} |
| 54 | +``` |
| 55 | + |
| 56 | +If parsing completes successfully, the result is posted to the result queue and delivered to |
| 57 | +the 'storage and update service'. |
| 58 | +Default result queue name: storage-update-asd |
| 59 | +The ASD model is too large to describe here; see the code for details. |
| 60 | + |
| 61 | +#### How to check functionality manually using the RabbitMQ management page |
| 62 | + |
| 63 | +1. Start the service as described in the 'Starting service' section. |
| 64 | +2. Go to http://localhost:15672 and log in as guest/guest. |
| 65 | +3. Go to the Queues tab. |
| 66 | +4. Check that the data-scraping-asd queue is already listed there. |
| 67 | +5. Expand the 'Add a new queue' section under the 'Overview' and add 2 queues: 'gateway-scrape-notifications' and |
| 68 | + 'storage-update-asd'. |
| 69 | +6. Open the data-scraping-asd queue and expand the 'Publish message' section under the charts. |
| 70 | +7. Add the request body and publish the message. |
| 71 | +8. Check the service logs with `docker logs dss`, then return to the Queues tab and inspect the result messages in the queues |
| 72 | + using the 'Get messages' section. |
| 73 | + |
| 74 | +### Known current limitations (TO DO) |
| 75 | + |
| 76 | +1. Only Swagger (OpenAPI) 3.0 is supported. |
| 77 | +2. Field constraints (max length, etc.) are ignored. |
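To make the first limitation concrete, a version check might look like the Go sketch below. It relies on the top-level `openapi` field that OpenAPI 3.x documents carry. The `isSupportedSpec` helper is hypothetical, it handles JSON input only (YAML would need a third-party parser), and it is not taken from the service's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// isSupportedSpec reports whether a JSON OpenAPI document declares a 3.0.x version.
// Hypothetical helper: it inspects only the top-level "openapi" field.
func isSupportedSpec(doc []byte) (bool, error) {
	var header struct {
		OpenAPI string `json:"openapi"`
	}
	if err := json.Unmarshal(doc, &header); err != nil {
		return false, fmt.Errorf("parse spec: %w", err)
	}
	return strings.HasPrefix(header.OpenAPI, "3.0."), nil
}

func main() {
	docs := []string{
		`{"openapi": "3.0.3", "info": {"title": "demo", "version": "1"}}`,
		`{"swagger": "2.0"}`, // Swagger 2.0 documents have no "openapi" field
	}
	for _, doc := range docs {
		ok, err := isSupportedSpec([]byte(doc))
		fmt.Println(ok, err)
	}
}
```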
| 78 | + |
| 79 | +### Main functions |
| 80 | + |
| 81 | +1. Listen to queue events (links to open API yaml/json files) |
| 82 | +2. Check link availability |
| 83 | +3. Retrieve file content (the file size is limited; the default limit is 5 MB) |
| 84 | +4. Validate content |
| 85 | +5. Parse content into an ASD model |
| 86 | +6. Put ASD model with metadata to the storage and update service queue |