
Initial commit
dluc committed Apr 13, 2023
1 parent fdb3890 commit 655c583
Showing 98 changed files with 9,201 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .gitattributes
@@ -1,5 +1,5 @@
# Auto-detect text files, ensure they use LF.
* text=auto eol=lf working-tree-encoding=UTF-8
* text=auto eol=lf

# Bash scripts
*.sh text eol=lf
2 changes: 1 addition & 1 deletion docs/PLANNER.md
@@ -38,4 +38,4 @@ like "I want a job promotion."
The planner will operate within the skills it has available. In the event that a
desired skill does not exist, the planner can suggest that you create it.
Or, depending upon the level of complexity, the kernel can help you write the missing
skill.
skill.
22 changes: 22 additions & 0 deletions python/.conf/.pre-commit-config.yaml
@@ -0,0 +1,22 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: check-toml
      - id: check-yaml
      - id: end-of-file-fixer
      - id: mixed-line-ending
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/isort
    rev: 5.12.0
    hooks:
      - id: isort
        args: ["--profile", "black"]
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
        args: ["--config=python/.conf/flake8.cfg"]
3 changes: 3 additions & 0 deletions python/.conf/flake8.cfg
@@ -0,0 +1,3 @@
[flake8]
max-line-length = 88
extend-ignore = E203
29 changes: 29 additions & 0 deletions python/.editorconfig
@@ -0,0 +1,29 @@
# To learn more about .editorconfig see https://aka.ms/editorconfigdocs

# All files
[*]
indent_style = space
end_of_line = lf

# Docs
[*.md]
insert_final_newline = true
trim_trailing_whitespace = true

# Config/data
[*.json]
indent_size = 4
insert_final_newline = false
trim_trailing_whitespace = true

# Config/data
[*.yaml]
indent_size = 4
insert_final_newline = true
trim_trailing_whitespace = true

# Code
[*.py]
indent_size = 4
insert_final_newline = true
trim_trailing_whitespace = true
21 changes: 21 additions & 0 deletions python/.vscode/settings.json
@@ -0,0 +1,21 @@
{
    "python.analysis.extraPaths": [
        "./src"
    ],
    "explorer.compactFolders": false,
    "prettier.enable": true,
    "editor.formatOnType": true,
    "editor.formatOnSave": true,
    "editor.formatOnPaste": true,
    "python.formatting.provider": "autopep8",
    "python.formatting.autopep8Args": [
        "--max-line-length=160"
    ],
    "notebook.output.textLineLimit": 500,
    "cSpell.words": [
        "aeiou",
        "nopep",
        "OPENAI",
        "skfunction"
    ]
}
87 changes: 87 additions & 0 deletions python/DEV_SETUP.md
@@ -0,0 +1,87 @@
# System setup

To get started, you'll need VSCode and a local installation of Python 3.x.

You can run:

```bash
python3 --version ; pip3 --version ; code -v
```

to verify that you have the required dependencies.

## If you're on WSL

Check that you've cloned the repository to `~/workspace` or a similar folder.
Avoid `/mnt/c/` and prefer using your WSL user's home directory.

Ensure you have the WSL and Python extensions for VSCode installed.

You'll also need `pip3` installed. If you don't yet have a `python3` install in WSL,
you can run:

```bash
sudo apt-get update && sudo apt-get install python3 python3-pip
```

ℹ️ **Note**: if you don't have your PATH set up to find executables installed by `pip3`,
you may need to run `~/.local/bin/poetry install` and `~/.local/bin/poetry shell`
instead. You can fix this by adding `export PATH="$HOME/.local/bin:$PATH"` to
your `~/.bashrc` and closing/re-opening the terminal.

# LLM setup

Make sure you have an
[Open AI API Key](https://openai.com/api/) or
[Azure Open AI service key](https://learn.microsoft.com/azure/cognitive-services/openai/quickstart?pivots=rest-api).

ℹ️ **Note**: Azure OpenAI support is a work in progress and will be available soon.

Copy those keys into a `.env` file like this:

```
OPENAI_API_KEY=""
OPENAI_ORG_ID=""
AZURE_OPENAI_API_KEY=""
AZURE_OPENAI_ENDPOINT=""
```
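
If you want to load these settings programmatically, here is a minimal sketch assuming the simple `KEY="value"` format above (`load_dotenv_file` is a hypothetical helper for illustration, not part of SK):

```python
import os

def load_dotenv_file(path=".env"):
    # Hypothetical helper: parse KEY="value" lines into a dict and
    # export them into the environment without overwriting existing values.
    settings = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip().strip('"')
    for key, value in settings.items():
        os.environ.setdefault(key, value)
    return settings
```

In practice you may prefer an established package such as `python-dotenv` for this.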

We suggest adding a copy of the `.env` file under these folders:

- [python/tests](tests)
- [samples/notebooks/python](../samples/notebooks/python).

# Quickstart with Poetry

Poetry lets you use SK from the current repo, without worrying about paths, as
if you had the SK pip package installed. The SK pip package will be published after
porting all the major features and ensuring cross-compatibility with the C# SDK.

To install Poetry in your system:

```bash
pip3 install poetry
```

The following command installs the project dependencies:

```bash
poetry install
```

And the following activates the project virtual environment, making it easier
to run samples in the repo and develop apps using the Python SK:

```bash
poetry shell
```

To run the same checks that are run during the Azure Pipelines build, you can run:

```bash
poetry run pre-commit run -c .conf/.pre-commit-config.yaml -a
```

# VSCode Setup

Open any of the `.py` files in the project and run the `Python: Select Interpreter` command
from the command palette. Make sure the virtual env (venv) created by `poetry` is selected.
The Python interpreter you're looking for should be under `~/.cache/pypoetry/virtualenvs/semantic-kernel-.../bin/python`.

If VSCode doesn't find the `black` and `flake8` packages, it will prompt you to
install them.

# Tests

You should be able to run the example under the [tests](tests) folder.
199 changes: 199 additions & 0 deletions python/FEATURE_PARITY.md
@@ -0,0 +1,199 @@
# Achieving Feature Parity in Python and C#

This is a high-level overview of where things stand towards reaching feature parity with the main
[C# codebase](https://github.com/microsoft/semantic-kernel/tree/main/dotnet/src/SemanticKernel).

| | Python | Notes |
|------|------|-------|
| `./ai/embeddings` | 🔄 | Using Numpy for embedding representation. Vector operations not yet implemented |
| `./ai/openai` | 🔄 | Makes use of the OpenAI Python package. AzureOpenAI* not implemented |
| `./configuration` | | Direct port. Check inline docs |
| `./core_skills` | 🔄 | `TextMemorySkill` implemented. Others not |
| `./diagnostics` | | Direct port of custom exceptions and validation helpers |
| `./kernel_extensions` | 🔄 | Extensions take kernel as first argument and are exposed via `sk.extensions.*` |
| `./memory` | 🔄 | Can simplify by relying on Numpy NDArray |
| `./planning` | ❌ | Not yet implemented |
| `./semantic_functions/partitioning` | ❌ | Not yet implemented |

## Status of the Port

The port re-implements the bulk of the Semantic Kernel C# code, but it is not yet fully complete. Major things like `tests` and `docs` are still missing.
Here is a breakdown, by sub-module, of the status of this port:

### `./ai/embeddings` (Partial)

For now, `VectorOperations` from the original kernel will be skipped. We can use
`numpy`'s `ndarray` as an efficient embedding representation. We can also use
`numpy`'s optimized vector and matrix operations to do things like cosine similarity
quickly and efficiently.
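
As a rough illustration of the math that `numpy` would vectorize, cosine similarity between two embeddings can be sketched in pure Python:

```python
import math

def cosine_similarity(a, b):
    # The same computation numpy would express as a.dot(b) / (norm(a) * norm(b)).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

With `ndarray` embeddings, the same operation runs vectorized over entire collections at once, which is the point of leaning on `numpy` here.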

The `IEmbeddingIndex` interface has been translated to the `EmbeddingIndexBase` abstract
class. The `IEmbeddingGenerator` interface has been translated to the
`embedding_generator_base` abstract class.

The C# code makes use of extension methods to attach convenience methods to many interfaces
and classes. In Python we don't have that luxury. Instead, these methods are in the corresponding class definition.
(We can revisit this, but for good type hinting avoiding something fancy/dynamic works best.)

### `./ai/openai` (Partial)

The abstract clients (`(Azure)OpenAIClientAbstract`) have been ignored here. The `HttpSchema`
submodule is not needed given we have the `openai` package to do the heavy lifting (bonus: that
package will stay in-sync with OpenAI's updates, like the new ChatGPT API).

The `./ai/openai/services` module is retained and has the same classes/structure.

#### TODOs

The `AzureOpenAI*` alternatives are not yet implemented. This would be a great, low difficulty
task for a new contributor to pick up.

### `./ai` (Complete?)

The rest of the classes at the top-level of the `./ai` module have been ported
directly.

**NOTE:** Here, we've locked ourselves into getting a _single_ completion
from the model. This isn't ideal: getting multiple completions is sometimes a great
way to solve more challenging tasks (majority voting, re-ranking, etc.). We should look
at supporting multiple completions.
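
For example, a simple majority-voting scheme over several sampled completions might be sketched like this (a hypothetical helper, not part of the current port):

```python
from collections import Counter

def majority_vote(completions):
    # Pick the most frequent answer among several sampled completions,
    # normalizing surrounding whitespace before counting.
    counts = Counter(c.strip() for c in completions)
    return counts.most_common(1)[0][0]
```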

**NOTE:** Based on `CompleteRequestSettings`, there's no easy way to grab the `logprobs`
associated with the model's completion. This would be huge for techniques like re-ranking
and is also crucial data to capture for metrics. We should think about how to
support this. (We're currently a "text in, text out" library, but multiple completions
and logprobs seem to be fundamental in this space.)

### `./configuration` (Complete?)

Direct port, not much to do here. Probably check for good inline docs.

### `./core_skills` (Partial)

We've implemented the `TextMemorySkill` but are missing the following:

- `ConversationSummarySkill`
- `FileIOSkill`
- `HttpSkill`
- `PlannerSkill` (NOTE: planner is a big sub-module we're missing)
- `TextSkill`
- `TimeSkill`

#### TODOs

Any of these individual core skills would make great low- to medium-difficulty contributions
for those looking for something to do, ideally with good docs and corresponding tests.

### `./diagnostics` (Complete?)

Pretty direct port of these few custom exceptions and validation helpers.

### `./kernel_extensions` (Partial)

This is difficult: for good type hinting, there's a lot of duplication, and not having
the convenience of extension methods makes this cumbersome. Maybe, in the future, we'll
want to consider some form of "plugins" for the kernel?

For now, the kernel extensions take the kernel as the first argument and are exposed
via the `sk.extensions.*` namespace.

### `./memory` (Partial)

This was a complex sub-system to port. The C# code has lots of interfaces and nesting
of types and generics. In Python, we can simplify this a lot: an embedding
is an `ndarray`, which comes with lots of great pre-built features. The
rest of the system is a pretty direct port, but the layering can be a bit confusing.
I.e., what's the real difference between storage, memory, a memory record, a
data entry, an embedding, a collection, etc.?
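
As a hypothetical illustration of the layering (these names are illustrative, not the actual SK classes), the core relationship might be summarized as:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MemoryRecord:
    # A single stored item: the original text plus its embedding vector.
    id: str
    text: str
    embedding: List[float]  # an ndarray in the actual implementation

@dataclass
class MemoryCollection:
    # A named group of records, roughly what "storage" hands back.
    name: str
    records: List[MemoryRecord] = field(default_factory=list)
```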

#### TODOs

Review of this subsystem. Lots of good testing. Maybe some kind of overview
documentation about the design. Maybe a diagram of how all these classes and interfaces
fit together?

### `./orchestration` (Complete?)

This was a pretty core piece and another direct port. Worth double checking. Needs good docs and tests.

### `./planning` (TODO: nothing yet)

Completely ignored planning for now (and, selfishly, planning isn't a priority for
SK-based experimentation).

### `./reliability` (Complete?)

Direct port. Nothing much going on in this sub-module. It likely could use more strategies
for retry. Also, it's not clear whether this is integrated with the kernel/backends
(i.e., is the retry code actually in use, or is it never hit?).

#### TODOs

Implement a real retry strategy, perhaps with backoff. Make sure this code is integrated
and actually in use.

### `./semantic_functions` (Complete?)

Another core piece. The different config classes start to feel cumbersome here
(function config, prompt config, backend config, kernel config: so, so much config).

### `./semantic_functions/partitioning` (TODO: nothing yet)

Skipped this sub-sub-module for now. Good task for someone to pick up!

### `./skill_definition` (Complete?)

Another core piece, another pretty direct port.

**NOTE:** the attributes in C# become decorators in Python. We could probably
make it feel a bit more Pythonic (instead of having multiple decorators, have just
one or two).

**NOTE:** The skill collection, read-only skill collection, etc. became a bit
confusing (in terms of the relationships between everything). It would be good to
double-check my work there.
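
To illustrate the attribute-to-decorator mapping, here is a hypothetical sketch (the decorator name and metadata field are illustrative, not the actual SK API):

```python
def sk_function(description):
    # Hypothetical decorator standing in for C#'s skill-function attribute:
    # it tags a method with metadata the kernel could later discover.
    def wrapper(func):
        func.sk_description = description
        return func
    return wrapper

class TextSkill:
    @sk_function("Convert the input text to uppercase")
    def uppercase(self, text: str) -> str:
        return text.upper()
```

The decorated method behaves normally when called, while the attached metadata is available for a skill registry to inspect.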

### `./template_engine` (Complete?)

Love the prompt templates! Have tried some basic prompts, prompts w/ vars,
and prompts that call native functions. Seems to be working.

**NOTE:** this module definitely needs some good tests. Subtle errors can sneak
into the prompt tokenization/rendering code here.
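
As an illustration of the simplest case a template engine handles, variable substitution for `{{$name}}` placeholders could be sketched as follows (a toy renderer, not the actual SK implementation):

```python
import re

def render(template: str, variables: dict) -> str:
    # Replace each {{$name}} placeholder with its value (missing vars -> "").
    # Native function calls are out of scope for this toy sketch.
    return re.sub(
        r"\{\{\$(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), "")),
        template,
    )
```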

### `./text` (TODO: nothing yet)

Ignored this module for now.

### `<root>` (Partial)

We have a working `Kernel` and a working `KernelBuilder`. The base interface
and custom exception are ported. The `Kernel` in particular
is missing some things, has some bugs, and could be cleaner.

## Overall TODOs

We are currently missing a lot of the doc comments from C#. So a good review
of the code and a sweep for missing doc comments would be great.

We are also missing any _testing_. We should figure out how we want to test
(I think this project is already set up for `pytest`).

Finally, we are missing a lot of examples. It'd be great to have Python notebooks
that show off many of the features, many of the core skills, etc.


## Design Choices

We want the overall design of the kernel to be as similar as possible to C#.
We also want to minimize the number of external dependencies to make the Kernel as lightweight as possible.

Right now, compared to C# there are two key differences:

1. Use `numpy` to store embeddings and do things like vector/matrix ops
2. Use `openai` to interface with (Azure) OpenAI

There are also many more subtle differences that come with moving to Python:
things like static properties, no method overloading, no extension methods, etc.
