Integrity in data science applications is crucial because these applications exist to provide data-driven insights. As soon as the integrity of an application breaks down, people lose trust in its output and, as a result, refuse to base decisions on it. Go helps us maintain integrity in terms of reproducibility and deployment, which are common struggles for data scientists.
- Data science applications should consider integrity before performance or sophistication.
- A lack of reproducibility destroys the credibility of a data science application.
- Integrity cannot be maintained with a complicated deployment.
- If errors and edge cases are handled gracefully in Go, you can have confidence in how your application will behave.
- There are ways of deploying Go that maintain integrity, even if you utilize various dependencies for your statistics, ML, etc.
Example Python data science Dockerfile
Example Go Dockerfile
Parse a clean CSV with Python
Parse a clean CSV with Go
Force an integrity breakdown with Python CSV parsing
Maintain integrity in Go CSV parsing
Implement another way of handling the CSV parsing error we encountered above. That is, handle the missing value in some way other than stopping with an error.
All material is licensed under the Apache License Version 2.0, January 2004.