[TTAHUB-1974] Monitoring Data #1803
Conversation
I'm not seeing too much to complain about with the code. If you could add instructions for local dev, I can attempt to set this up Monday morning and start testing it. We probably also want to deploy it to sandbox or dev ASAP now that CI is passing.

Is there a need to release all this in one go? Can this be broken out into manageable stages/segments? From the TL;DR these seem like obvious segments.

@kcirtapfromspace That would have been a good way to split this into stories had it been done before we got to this point, but now that this needs to be delivered already, it's too late to split.
[TTAHUB-2059] CLASS/monitoring FE. CLASS BE.
I marked the places where we need to remove comments.
Co-authored-by: kryswisnaskas <[email protected]>
…Head-Start-TTADP into TTAHUB-1974/monitoring-data
Description of change
This adds three major areas of functionality:
The immediate implementation uses cron to fetch the job definition for Monitoring files out of the database. It starts by streaming a zip file from Monitoring's SFTP server to a file in the TTA Hub S3 bucket. From there, we stream-decompress the component files into memory*. We stream-re-encode them from UTF-16 to UTF-8 text streams, then to parsed XML, then to rows, which we upsert into the database.
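A hedged sketch of that pipeline, using the class and method names introduced later in this description (the wiring and exact signatures are assumptions, not the PR's actual job runner):

```ts
import type { Readable, Transform } from 'node:stream';

// Loose declarations for the classes introduced below; only a sketch.
declare const sftpClient: { downloadAsStream(path: string): Promise<Readable> };
declare const s3Client: {
  uploadFileAsStream(key: string, body: Readable): Promise<void>;
  downloadFileAsStream(key: string): Promise<Readable>;
};
declare const ZipStream: new (zip: Readable) => {
  listFiles(): Promise<string[]>;
  getFileStream(path: string): Promise<Readable>;
};
declare const EncodingConverter: new (target: string, source?: string) => Transform;
declare const XMLStream: new (xml: Readable) => {
  initialize(): Promise<void>;
  getNextObject(): Promise<Record<string, unknown> | null>;
};
declare function upsertRow(row: Record<string, unknown>): Promise<void>;

async function importMonitoringZip(remotePath: string, s3Key: string) {
  // 1. Stream the zip from Monitoring's SFTP server into the TTA Hub S3 bucket.
  await s3Client.uploadFileAsStream(s3Key, await sftpClient.downloadAsStream(remotePath));

  // 2. Stream it back out of S3 and decompress the component files.
  const zip = new ZipStream(await s3Client.downloadFileAsStream(s3Key));
  for (const path of await zip.listFiles()) {
    // 3. Re-encode UTF-16 -> UTF-8, parse the XML, and upsert row by row,
    //    assuming getNextObject resolves null when the stream is exhausted.
    const file = await zip.getFileStream(path);
    const xml = new XMLStream(file.pipe(new EncodingConverter('utf8', 'utf16le')));
    await xml.initialize();
    for (let row = await xml.getNextObject(); row !== null; row = await xml.getNextObject()) {
      await upsertRow(row); // hypothetical persistence step
    }
  }
}
```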
The resulting tables are mirrors of the current state of the relevant columns of the relevant tables within the Monitoring source data. In addition, there is one link table added for each of the joining data elements: 1) grant numbers, 2) review IDs, and 3) status IDs. The link tables are kept populated by hooks on the main tables and do nothing but provide Sequelize with tables for which the join elements are primary keys.
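For illustration, a link table such as the grant-number one might be kept populated by a hook along these lines (the model and column names here are hypothetical, not the PR's actual schema):

```ts
import type { Model, ModelStatic, SaveOptions } from 'sequelize';

// Hypothetical models standing in for the real ones in this PR.
declare const MonitoringReviewGrantee: ModelStatic<Model>;
declare const GrantNumberLink: ModelStatic<Model>;

// Keep the link table populated: whenever a monitoring row is saved,
// ensure a link row exists whose primary key is the shared join element.
MonitoringReviewGrantee.addHook('afterSave', async (instance: Model, options: SaveOptions) => {
  await GrantNumberLink.findOrCreate({
    where: { grantNumber: instance.get('grantNumber') },
    transaction: options.transaction,
  });
});
```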
The overall implementation goals were to minimize the memory footprint, to minimize transformation of data between the source and TTA Hub's representation, and to allow easy reconfiguration of the process.
*This is the only time we have to hold a whole file in memory, due to zip-handling limitations that would be onerous to work around.
Additional functionality introduced to meet the goals above:
The `LockManager` class in `lockManager.ts` manages distributed locks with Redis, allowing execution of callbacks within these locks. It initializes with a lock key, TTL, and Redis configuration, and provides methods to acquire and release locks, check lock ownership, and renew lock TTL. The class ensures that only one instance can execute a callback with the lock at a time and handles lock renewal automatically. Errors during lock operations are logged and managed within the class methods.
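A hedged sketch of the Redis locking pattern described above (class and method names are illustrative, not the PR's actual API):

```ts
import Redis from 'ioredis';

class SimpleLockManager {
  constructor(
    private redis: Redis,
    private lockKey: string,
    private ttlMs: number,
  ) {}

  // SET ... PX ... NX acquires the key only if no other holder exists.
  async acquire(holderId: string): Promise<boolean> {
    const result = await this.redis.set(this.lockKey, holderId, 'PX', this.ttlMs, 'NX');
    return result === 'OK';
  }

  // Release only if we still own the lock (check-and-delete).
  async release(holderId: string): Promise<void> {
    if ((await this.redis.get(this.lockKey)) === holderId) {
      await this.redis.del(this.lockKey);
    }
  }

  // Run a callback under the lock, releasing it afterwards.
  async withLock(holderId: string, callback: () => Promise<void>): Promise<boolean> {
    if (!(await this.acquire(holderId))) return false;
    try {
      await callback();
    } finally {
      await this.release(holderId);
    }
    return true;
  }
}
```

A production version, like the class described above, would also renew the TTL automatically while the callback runs, and an atomic (Lua-scripted) release would avoid the race in the check-and-delete.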
The `BufferStream` class extends the `Writable` stream class from Node.js and is designed to collect data chunks into an internal buffer. It provides a mechanism to retrieve a `Readable` stream representing the collected data, either immediately if the stream has finished or via a promise that resolves once the stream finishes. The class overrides the `_write` method to push incoming data chunks into an array after converting them to `Buffer` objects. It also has a `finish` event listener that marks the stream as finished and resolves a promise with a `Readable` stream created from the buffered data. The `getReadableStream` method returns a promise that resolves with a `Readable` stream, which is created immediately if the stream has finished or deferred until the stream finishes. The `getSize` method returns the number of chunks in the buffer. The `getBuffer` method concatenates all chunks into a single `Buffer` object, which can throw a `TypeError` if any chunk is not a valid type for concatenation.
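A minimal sketch of this buffering pattern, assuming standard Node.js stream APIs (the real class also exposes `getSize` and `getBuffer`):

```ts
import { Writable, Readable } from 'node:stream';

class SimpleBufferStream extends Writable {
  private chunks: Buffer[] = [];
  private done: Promise<void>;

  constructor() {
    super();
    this.done = new Promise((resolve) => this.once('finish', () => resolve()));
  }

  // Collect every written chunk as a Buffer.
  _write(chunk: unknown, _encoding: BufferEncoding, callback: (error?: Error | null) => void) {
    this.chunks.push(Buffer.from(chunk as Buffer));
    callback();
  }

  // Resolve with a Readable over the collected data once writing finishes.
  async getReadableStream(): Promise<Readable> {
    await this.done;
    return Readable.from([Buffer.concat(this.chunks)]);
  }
}
```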
The `EncodingConverter` class extends the Node.js `Transform` stream to facilitate the conversion of text data between different encodings. It is initialized with a target encoding and an optional source encoding, both of which must be among the supported `BufferEncoding` types. If an unsupported encoding is provided, an error is thrown. The class uses the `chardet` library to detect the encoding of input data when the source encoding is not specified. It implements the `_transform` and `_flush` methods from the `Transform` class to handle the streaming and conversion of data, and it includes helper methods `convertBuffer` and `convertChunk` to perform the encoding conversion. The class maintains a private buffer for accumulating data and a static set of supported encodings to check the validity of the encodings used.

The `Hasher` class extends the `PassThrough` stream to provide a streaming interface for generating cryptographic hashes. It includes a private `hash` object from the `crypto` module, a promise `hashPromise` to handle the asynchronous result, and resolver/rejector functions `resolveHash` and `rejectHash`. The constructor accepts an `Algorithm` type with a default value and sets up the hash object and promise. Event listeners are attached to the stream for `data`, `end`, and `error` events to update the hash, resolve the promise with the final hash value in hexadecimal format, or reject the promise in case of an error. The `getHash` public method returns the promise that resolves to the hash value.
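A hedged sketch of the pass-through hashing pattern just described (simplified; the real class wires the promise through dedicated resolver/rejector functions and a typed `Algorithm` parameter):

```ts
import { PassThrough } from 'node:stream';
import { createHash } from 'node:crypto';

class SimpleHasher extends PassThrough {
  private hashPromise: Promise<string>;

  constructor(algorithm = 'sha256') {
    super();
    const hash = createHash(algorithm);
    this.hashPromise = new Promise((resolve, reject) => {
      // Update the digest as data flows through untouched.
      this.on('data', (chunk: Buffer) => hash.update(chunk));
      this.on('end', () => resolve(hash.digest('hex')));
      this.on('error', reject);
    });
  }

  getHash(): Promise<string> {
    return this.hashPromise;
  }
}
```

Piping a download through a hasher like this lets the job verify file integrity without ever buffering the whole file.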
The `S3Client` class in TypeScript is designed to interact with AWS S3 services. It provides methods to upload files as streams, download files as streams, retrieve file metadata, delete files, and list all files in a specified S3 bucket. The class constructor accepts a configuration object with an optional S3 configuration and a bucket name. The `uploadFileAsStream` method takes a file key and a readable stream, uploading the file to the S3 bucket. The `downloadFileAsStream` method returns a readable stream of the file content from the S3 bucket, given a file key. The `getFileMetadata` method retrieves metadata for a specified file in the S3 bucket. The `deleteFile` method deletes a file from the S3 bucket using the file key. Lastly, the `listFiles` method returns a list of all objects in the S3 bucket. All methods are asynchronous and return a promise. If any errors occur during the execution of these methods, they log the error using `auditLogger.error` and rethrow the error.
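For reference, the two streaming methods could look roughly like this with the AWS SDK v3 (an assumption; the SDK version and error handling of the PR's wrapper are not shown here):

```ts
import { Readable } from 'node:stream';
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';
import { Upload } from '@aws-sdk/lib-storage';

const client = new S3Client({});

// Multipart streaming upload: the body is consumed as it arrives.
async function uploadFileAsStream(bucket: string, key: string, body: Readable) {
  await new Upload({ client, params: { Bucket: bucket, Key: key, Body: body } }).done();
}

// Streaming download: in Node.js the response body is a Readable.
async function downloadFileAsStream(bucket: string, key: string): Promise<Readable> {
  const response = await client.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  return response.Body as Readable;
}
```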
The `SftpClient` class interacts with an SFTP server using the `ssh2` library. It includes methods for connecting (`connect`), disconnecting (`disconnect`), checking connection status (`isConnected`), listing files (`listFiles`), and downloading files as streams (`downloadAsStream`). The `connect` method establishes a connection and handles errors by logging them and updating the connection state. The `disconnect` method closes the connection and removes event listeners. The `listFiles` method asynchronously retrieves a list of files from the server, with options to filter by file name, list files after a certain file, and include read streams for each file. The `downloadAsStream` method initiates a download of a remote file as a readable stream, with an option to use compression. The class also handles system signals to ensure proper disconnection and cleanup. The code includes interfaces for connection configuration (`ConnectConfig`), file listing options (`ListFileOptions`), file information (`FileInfo`), and the file listing itself (`FileListing`). A utility function (`modeToPermissions`) is provided to convert numeric file modes to human-readable permission strings. The `SftpClient` class and related interfaces are exported for use in other modules.
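The core of the streaming download with `ssh2` might look like this (a sketch; the PR's class adds connection-state tracking, listing options, compression, and signal handling on top of this pattern):

```ts
import { Client } from 'ssh2';
import type { Readable } from 'node:stream';

function downloadAsStream(
  config: { host: string; username: string; password: string },
  remotePath: string,
): Promise<Readable> {
  return new Promise((resolve, reject) => {
    const conn = new Client();
    conn
      .on('ready', () => {
        conn.sftp((err, sftp) => {
          if (err) return reject(err);
          // createReadStream streams the remote file without buffering it.
          // (Closing conn when the stream ends is omitted for brevity.)
          resolve(sftp.createReadStream(remotePath));
        });
      })
      .on('error', reject)
      .connect(config);
  });
}
```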
The `XMLStream` class is designed to parse XML data from a `Readable` stream using a SAX parser. It provides methods to initialize the parser, count parsed objects, check if parsing is complete, retrieve the next parsed object, and obtain the JSON schema of the parsed XML. The class constructor accepts an XML stream and an optional flag for a virtual root node. The `initialize` method sets up event handlers for parsing and returns a promise that resolves when parsing is complete or rejects on error. The `getObjectCount` method returns the number of parsed objects, while `processingComplete` checks if the stream has been fully read. The `getNextObject` method asynchronously retrieves the next object from the parsed data, optionally simplifying it, and `getObjectSchema` returns a promise that resolves to the schema of the parsed data. The class handles XML node construction, schema generation, and text content processing, and it supports virtual root nodes to encapsulate the XML data.
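The SAX approach processes the document event by event instead of loading it whole. A minimal sketch using the `sax` package as an example (the PR's parser library and event wiring may differ):

```ts
import sax from 'sax';
import type { Readable } from 'node:stream';

// Count occurrences of a tag in a streamed XML document.
function countElements(xml: Readable, tagName: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const parser = sax.createStream(true); // strict mode
    let count = 0;
    parser.on('opentag', (node) => {
      if (node.name === tagName) count += 1;
    });
    parser.on('error', reject);
    parser.on('end', () => resolve(count));
    xml.pipe(parser);
  });
}
```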
The `ZipStream` class is designed to handle ZIP file streams for reading and extracting file information and contents. The class constructor accepts a ZIP file stream, an optional password for encrypted archives, and an array specifying which files need their streams extracted. It provides methods to list file paths (`listFiles`), get details of a specific file (`getFileDetails`), retrieve details for all files (`getAllFileDetails`), and obtain a stream for a particular file (`getFileStream`). Each method returns a `Promise`, ensuring that the requested data is available after the ZIP processing is complete. The class utilizes the `unzipper` library for parsing ZIP contents and manages errors and logging through an `auditLogger`. Additionally, the `FileInfo` interface is defined to structure the file details, and both the `ZipStream` class and `FileInfo` interface are exported for external use.

A collection of utilities for data manipulation, including object type checks, undefined value removal, object property remapping and pruning, deep value comparison, object merging, and key transformation. Functions handle equality checks for numbers and dates, collect changes between objects, flatten nested structures, cast values to their detected types, and convert object keys to lowercase. Errors are thrown for invalid inputs or operations.
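Two of those utilities, sketched for illustration (the actual names and signatures in the PR may differ):

```ts
// Remove properties whose value is undefined (nulls are kept).
function removeUndefined<T extends Record<string, unknown>>(obj: T): Partial<T> {
  return Object.fromEntries(
    Object.entries(obj).filter(([, value]) => value !== undefined),
  ) as Partial<T>;
}

// Convert all top-level keys of an object to lowercase.
function lowercaseKeys(obj: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [key.toLowerCase(), value]),
  );
}

// Example: normalize a parsed XML row before matching it to model columns.
const row = lowercaseKeys(removeUndefined({ GrantNumber: '14CH1234', Extra: undefined }));
// => { grantnumber: '14CH1234' }
```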
A collection of utility functions for working with Sequelize models and data types. It defines a custom type `SequelizeDataTypes` that represents all available data types in Sequelize. A mapping between Sequelize data types and TypeScript data types is established in `dataTypeMapping`. The `modelForTable` function retrieves a model for a given table name from a database object and throws an error if no model is found. `getColumnInformation` asynchronously retrieves column information from a model, including column names, data types, and nullability. `getColumnNamesFromModelForType` fetches column names from a model that match a specific Sequelize data type. `filterDataToModel` filters a data object based on a model's column information and returns objects with matched and unmatched properties. `includeToFindAll` is a function that includes a model and its associations, retrieves all records matching given conditions, and supports additional conditions and attribute selection. Lastly, `nestedRawish` transforms a nested object or array of objects by stripping out Sequelize model instance metadata, retaining only the raw data values, and preserving the input structure.
How to test
Issue(s)
Checklists
Every PR
Production Deploy
After merge/deploy