Commit

Minor updates - adding Visual Studio references, updating links & slack channels (#480)

* Update index.md

* Update index.md

* update get started page

* add vscode to create-a-derived-table guidance

* standardise capitalisation in modelling guidance

* add vscode to git guidance

* more minor updates

* make more succinct

---------

Co-authored-by: Holly Furniss <[email protected]>
hollyfurniss-moj and Holly Furniss authored Jan 13, 2025
1 parent 261e21c commit 45f7b7f
Showing 10 changed files with 25 additions and 22 deletions.
4 changes: 2 additions & 2 deletions source/documentation/data-docs/data-faqs/index.md
@@ -6,8 +6,8 @@ Data on the Analytical Platform can largely be split into four categories:

- `Raw`: data that has been uploaded to the Analytical Platform without any changes made to it
- `Curated`: data that has been validated, deduplicated and versioned by the Data Engineers ready to be used by Analytical Platform users
-- `Derived`: data that has been [denormalized], aggregated, and turned into a data model to fit specific needs of Analytical Platform users
-- `Processed`: data that has been processed by Analytical Platform users to fit their own needs
+- `Derived`: data that has been [denormalized] and turned into a data model to fit the general needs of Analytical Platform users
+- `Processed`: data that has been processed by Analytical Platform users to fit their specific needs

## Where do I find out what data is already on the Platform?

5 changes: 3 additions & 2 deletions source/documentation/get-started.md
@@ -4,7 +4,7 @@ This guide provides the instructions to set up the main accounts and services yo

- access the Analytical Platform Control Panel
- explore data on the Analytical Platform
-- begin developing your application in either JupyterLab or RStudio
+- begin developing your application in JupyterLab, RStudio or Visual Studio Code
- contribute to the Analytical Platform User Guidance

## Before you begin
@@ -26,7 +26,7 @@ A member of the Analytical Platform team will contact you.
For Analytical Platform best practice, you need to follow certain guidelines. Bookmark the following pages and ensure you follow them before you begin using the platform:

- [Acceptable use policy](aup.html): covers the way you should use the Analytical Platform and its associated tools and services
-- [Data and Analytical Services Directorate's (DASD) coding standards](https://moj-analytical-services.github.io/our-coding-standards/): principles outlining how you should write and review code
+- [Data and Analysis Directorates' coding standards](https://moj-analytical-services.github.io/our-coding-standards/): principles outlining how you should write and review code
- [MoJ Analytical IT Tools Strategy](https://moj-analytical-services.github.io/moj-analytical-it-tools-strategy/): describes recommended ways of working on the Analytical Platform

## 2. Create Slack account
@@ -57,6 +57,7 @@ Conversations in Slack are organised into channels, which each have a specific t

- the [#analytical-platform-support](https://asdslack.slack.com/archives/C4PF7QAJZ) channel is used for general discussion of the Analytical Platform -- it is also monitored by the Analytical Platform team, who can help out with any technical queries or requests. Also used to request new or existing apps and app data sources
- the [#ask-data-engineering](https://asdslack.slack.com/archives/C8X3PP1TN) channel is used for general discussion of data engineering and for getting in touch with the data engineering team with any technical queries or requests (such as airflow DAG reviews, database access, data discovery tool, etc).
+- the [#ask-data-modelling](https://moj.enterprise.slack.com/archives/C03J21VFHQ9) channel is used for general discussion of data modelling and for getting in touch with the analytics engineering team with any technical queries or requests (such as create-a-derived-table model reviews).
- the [#git](https://asdslack.slack.com/archives/C4VF9PRLK), [#r](https://asdslack.slack.com/archives/C1PUCG719) and [#python](https://asdslack.slack.com/archives/C1Q09V86S) channels can be used to get support from other users with any technical queries or questions -- the #[intro_r channel](https://asdslack.slack.com/archives/CGKSJV9HN) is aimed specifically at new users of R
- the [#data_science](https://asdslack.slack.com/archives/C1Z8Q18LS) channel is used for general discussion of data science tools and techniques
- the [#general](https://asdslack.slack.com/archives/C1PTUTC3F) channel is used for any discussions that don’t fit anywhere else
4 changes: 2 additions & 2 deletions source/documentation/github/command-line-git.md
@@ -1,8 +1,8 @@
# Work with git on the command line

-The command line is the text interface to your Analytical Platform tools. When googling, it may also be referred to as the shell, terminal, or console (and perhaps other names). In Jupyter, you can get the command line by selecting 'Terminal' from the launcher screen (the + button in the top left of JupyterLab). You can also use all these commands in RStudio by going to Tools -> Terminal -> New Terminal.
+The command line is the text interface to your Analytical Platform tools. When googling, it may also be referred to as the shell, terminal, or console (and perhaps other names). In Jupyter, you can get the command line by selecting 'Terminal' from the launcher screen (the + button in the top left of JupyterLab). You can also use all these commands in RStudio by going to Tools -> Terminal -> New Terminal. In Visual Studio Code, it's Terminal -> New Terminal.

-Once you are comfortable using the Terminal (in either R Studio or Jupyter) you can run all Git commands from the command line. If you are quite new to the command line, there are a few commands you may find useful to know, in addition to the git commands described later in this section:
+Once you are comfortable using the Terminal (in RStudio, Visual Studio Code or Jupyter) you can run all Git commands from the command line. If you are quite new to the command line, there are a few commands you may find useful to know, in addition to the git commands described later in this section:

- `mkdir`: create a new directory/folder
- `cd`: change directory
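
For readers new to the terminal, the two commands listed in this hunk can be tried out safely with something like the sketch below. It is an illustration only and is not part of the commit: the directory name `my-analysis` and the use of a temporary scratch folder are assumptions made for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work inside a throwaway directory so the example leaves nothing behind.
scratch="$(mktemp -d)"
cd "$scratch"

# mkdir: create a new directory/folder (the name here is hypothetical).
mkdir my-analysis

# cd: change directory into the folder that was just created.
cd my-analysis

# pwd prints the full path of the directory you are now in.
pwd
```

The same commands work unchanged in the JupyterLab, RStudio and Visual Studio Code terminals, since all three open a shell.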
10 changes: 5 additions & 5 deletions source/documentation/index.md
@@ -8,14 +8,14 @@ This site provides instructions on how to configure and use the Analytical Platf

## Intended users

-Primarily intended for Data Analysts, in the Data and Analytical Services Directorate, the Analytical Platform also hosts users from:
+Primarily intended for Data Analysts and Data Scientists in the Data and Analysis Directorates, the Analytical Platform also hosts users from:
- Criminal Injury Claims (CICA)
- HM Courts & Tribunals Service (HMCTS)
- HM Prison and Probation Service (HMPPS)
- Legal Aid Agency (LAA)
- Office of the Public Guardian (OPG)

-If you would like to use the Analytical Platform please contact us via the relevant (support)[https://github.com/ministryofjustice/data-platform-support/issues/new/choose] route.
+If you would like to use the Analytical Platform please contact us via the relevant [support](https://github.com/ministryofjustice/data-platform-support/issues/new/choose) route.

### Knowledge requirements

@@ -39,14 +39,14 @@ In additional to Python and R compatibility, benefits of using the Analytical Pl

- our Data Engineering team converts raw data from operational systems into structures and excerpts
- we hold data files in Amazon S3 for ease of use, to load into your code or run SQL queries directly using Amazon Athena
-- users can also upload data to the Analytial Platform from other sources and share them with granular access controls, subject to normal data protection processes; for more information, see [Information governance][information-governance.md]
+- users can also upload data to the Analytial Platform from other sources and share them with granular access controls, subject to normal data protection processes; for more information, see [Information governance](https://user-guidance.analytical-platform.service.justice.gov.uk/information-governance.html)

### Reproducible Analysis

The Analytical Platform provides tools to develop reproducible analytical pipelines (RAnalytical Platforms) to automate time–consuming and repetitive tasks, allowing you to focus on interpreting the results with the following elements:
- when datasets are imported into the Analytical Platform, snapshots of them are taken and versioned
- standardised system libraries in GitHub
-- a standardised virtual machine that can run RStudio or Jupyter, or code running in an explicitly defined Dockerfile
+- a standardised virtual machine that can run RStudio, Visual Studio Code or Jupyter, or code running in an explicitly defined Dockerfile

### Secure Environments

@@ -67,4 +67,4 @@ The Analytical Platform does not _currently_ provide the following:
- pure data archival: Amazon S3, which the Analytical Platform uses for data storage, does not offer index or search facilities
- we can set up a custom bucket policy to archive data to S3-IA or Glacier but recommend exploring SaaS alternatives, such as SharePoint or Google Drive

-If you would like to raise a feature request this can be done (here)[https://github.com/ministryofjustice/data-platform/issues/new/choose].
+If you would like to raise a feature request this can be done [here](https://github.com/ministryofjustice/data-platform/issues/new/choose).
2 changes: 1 addition & 1 deletion source/documentation/support.md
@@ -2,7 +2,7 @@

## Summary

-- The Analytical Platform team in MOJ Digital & Technology is responsible for providing access to software like R Studio and Jupyter.
+- The Analytical Platform team in MOJ Digital & Technology is responsible for providing access to software like R Studio, Visual Studio Code and Jupyter.
- Analysts themselves are responsible for the code they write in these tools, and the Platform team is not responsible for assisting with problems with this code. As a rule of thumb, if the problem you're experiencing would also occur with R Studio or Python installed on a standalone computer, then the platform team don't offer support.
- If you've read through the user guidance and are still stuck, the best place to go for support is the [#analytical-platform-support](https://app.slack.com/client/T1PU1AP6D/C4PF7QAJZ) Slack channel for issues with the platform, or the [#r](https://app.slack.com/client/T1PU1AP6D/C1PUCG719) channel and [#python](https://app.slack.com/client/T1PU1AP6D/C1Q09V86S) channel to get support from peers in your code.

@@ -1,6 +1,8 @@
-# Rstudio Set Up
+# Interactive Development Environment (IDE) Set Up

-You'll need an interactive development environment (IDE) to interact with the repository and write your SQL and YAML code, and a Python virtual environment for dbt to run in. The following sections will show you how to set that up. It's worth noting at this point that you'll just be using RStudio as an IDE to interact with the repository (and git), write SQL and YAML code, and to run dbt commands from the terminal in a Python virtual environment. There is no R programming going on. We're currently not planning to get Create a Derived Table up and running with JupyterLab, as the RStudio IDE is sufficient.
+You'll need an interactive development environment (IDE) to interact with the repository and write your SQL and YAML code, and a Python virtual environment for dbt to run in. The following sections will show you how to set that up in RStudio or Visual Studio Code.
+
+It's worth noting at this point that you'll just be using RStudio or Visual Studio Code as an IDE to interact with the repository (and git), write SQL and YAML code, and to run dbt commands from the terminal in a Python virtual environment. There is no R programming going on. We're currently not planning to get Create a Derived Table up and running with JupyterLab, as the RStudio and Visual Studio Code IDEs are sufficient.


## Clone the repository using the RStudio GUI
2 changes: 1 addition & 1 deletion source/documentation/tools/create-a-derived-table/index.md
@@ -12,7 +12,7 @@ Create a Derived Table is a tool for creating persistent derived tables in Athen
## Getting Started

- [Database Access](/tools/create-a-derived-table/database-access)
-- [RStudio Set Up](/tools/create-a-derived-table/rstudio-set-up)
+- [RStudio Set Up](/tools/create-a-derived-table/ide-set-up)
- [Collaborating with Git](/tools/create-a-derived-table/collaborating-with-git)

## Planning Models
@@ -5,7 +5,7 @@

# How we structure our create-a-derived-table projects

-## 1-Guide-overview
+## 1-guide-overview

### What is Analytics Engineering?

@@ -96,7 +96,7 @@ These concepts can be confusing, so if you have any further questions or just wa

Finally, it is worth flagging that create-a-derived-table has been evolving since its conception and for that reason project structure guidance, style guidance and best practice have all changed on several occasions. So when you start looking around the code base on create-a-derived-table you may find that code style, project structure or naming conventions don't match the guidance here. It will be a continuos effort from all that use it to slowly conform create-a-derived-table.

-## 2-Staging
+## 2-staging

The staging layer is where our journey begins. This is the foundation of our project, where we bring all the individual components we're going to use to build our more complex and useful models into the project.

@@ -484,7 +484,7 @@ select * from pivot
>**💡 TIP**
>Narrow the DAG, widen the tables. Until we get to the marts layer and start building our various outputs, we ideally want our DAG to look like an arrowhead pointed right. As we move from source-conformed to business-conformed, we’re also moving from numerous, narrow, isolated concepts to fewer, wider, joined concepts. We’re bringing our components together into wider, richer concepts, and that creates this shape in our DAG. This way when we get to the marts layer we have a robust set of components that can quickly and easily be put into any configuration to answer a variety of questions and serve specific needs. One rule of thumb to ensure you’re following this pattern on an individual model level is allowing multiple _inputs_ to a model, but **not** multiple _outputs_. Several arrows going _into_ our post-staging models is great and expected, several arrows coming _out_ is a red flag. There are absolutely situations where you need to break this rule, but it’s something to be aware of, careful about, and avoid when possible.

-## 4-Datamarts
+## 4-datamarts

### Datamarts: Introduction

@@ -593,7 +593,7 @@ select * from select_business_area

- **Troubleshoot via tables.** While stacking views and ephemeral models up until our marts — only building data into the warehouse at the end of a chain when we have the models we really want end users to work with — is ideal in production, it can present some difficulties in development. Particularly, certain errors may seem to be surfacing in our later models that actually stem from much earlier dependencies in our model chain (ancestor models in our DAG that are built before the model throws the errors). If you’re having trouble pinning down where or what a database error is telling you, it can be helpful to temporarily build a specific chain of models as tables so that the warehouse will throw the error where it’s actually occurring.

-## 5-Derived
+## 5-derived

### Derived: Introduction

@@ -54,9 +54,9 @@ This list comprises everything you need to do and consider to get set up and rea

9. If an MoJ Analytical Platform database is not listed as a source in [source_database_names.txt](https://github.com/moj-analytical-services/create-a-derived-table/blob/main/scripts/source_database_names.txt) then you can add it, see [Adding a new source](/tools/create-a-derived-table/source-and-ref-functions#adding-a-new-source).

-10. Set up the RStudio IDE; set up a project and clone the repo into it. See [Set up the RStudio working environment](/tools/create-a-derived-table/rstudio-set-up) for GUI instructions. Using Terminal navigate to where you want the `create-a-derived-table` project to sit and run `git clone git@github.com:moj-analytical-services/create-a-derived-table.git`.
+10. Set up an Interactive Development Environment (IDE); set up a project and clone the repo into it. See [Set up an IDE](/tools/create-a-derived-table/ide-set-up) for GUI instructions. Using Terminal navigate to where you want the `create-a-derived-table` project to sit and run `git clone git@github.com:moj-analytical-services/create-a-derived-table.git`.

-11. Navigate to the `create-a-derived-table` directory in Terminal and set up a Python virtual environment; activate it, upgrade pip, and install requirements. See [Setting up a Python virtual environment](/tools/create-a-derived-table/rstudio-set-up#setting-up-a-python-virtual-environment) and [Virtual environment set up](#virtual-environment-set-up).
+11. Navigate to the `create-a-derived-table` directory in Terminal and set up a Python virtual environment; activate it, upgrade pip, and install requirements. See [Setting up a Python virtual environment](/tools/create-a-derived-table/ide-set-up#setting-up-a-python-virtual-environment) and [Virtual environment set up](#virtual-environment-set-up).


12. Use Github Workflow method to collaborate on a project. Branch off `main` and create a main branch for your project, `project-name-main`; all subsequent developers should branch off `project-name-main` to create feature branches for this project. When raising a PR ensure you merge into this branch, before merging into `main`; the PR summary should read something like "`github-user` wants to merge *X* commits into `project-name-main` from `project-name-feature-branch`". See also [Collaborating with Git](/tools/create-a-derived-table/collaborating-with-git#collaborating-with-git) and [Git commands](#git-commands).
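
The branching pattern described in step 12 can be tried locally with something like the sketch below. It is an illustration only and is not part of the commit: it uses a throwaway repository rather than a clone of `create-a-derived-table`, the commit identity is a made-up placeholder, and a local `git merge` stands in for merging a pull request on GitHub. The branch names are the placeholders used in the guidance itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway local repository; the real workflow starts from a clone instead.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
# Name the initial branch "main" regardless of the local git default.
git symbolic-ref HEAD refs/heads/main

# Hypothetical identity so commits work without any global git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

git commit --quiet --allow-empty -m "Initial commit on main"

# Branch off main to create the project's own main branch.
git checkout --quiet -b project-name-main

# Each developer branches off project-name-main for feature work.
git checkout --quiet -b project-name-feature-branch
git commit --quiet --allow-empty -m "Feature work"

# Stand-in for a PR: merge the feature branch into project-name-main first,
# leaving main untouched until the project branch is ready.
git checkout --quiet project-name-main
git merge --quiet --no-ff -m "Merge feature branch" project-name-feature-branch

# List the branches that now exist.
git branch --list
```

Keeping `main` untouched until `project-name-main` is ready mirrors the guidance that PRs merge into the project branch before `main`.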
@@ -8,4 +8,4 @@ owner_slack: "#ask-data-modelling"
owner_slack_workspace: "mojdt"
---

-<%= partial 'documentation/tools/create-a-derived-table/rstudio-set-up' %>
+<%= partial 'documentation/tools/create-a-derived-table/ide-set-up' %>
