forked from microsoft/semantic-kernel
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 98 changed files with 9,201 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
# Auto-detect text files, ensure they use LF. | ||
* text=auto eol=lf working-tree-encoding=UTF-8 | ||
* text=auto eol=lf | ||
|
||
# Bash scripts | ||
*.sh text eol=lf |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
repos: | ||
- repo: https://github.com/pre-commit/pre-commit-hooks | ||
rev: v4.0.1 | ||
hooks: | ||
- id: check-toml | ||
- id: check-yaml | ||
- id: end-of-file-fixer | ||
- id: mixed-line-ending | ||
- repo: https://github.com/psf/black | ||
rev: 22.3.0 | ||
hooks: | ||
- id: black | ||
- repo: https://github.com/PyCQA/isort | ||
rev: 5.12.0 | ||
hooks: | ||
- id: isort | ||
args: ["--profile", "black"] | ||
- repo: https://github.com/pycqa/flake8 | ||
rev: 6.0.0 | ||
hooks: | ||
- id: flake8 | ||
args: ["--config=python/.conf/flake8.cfg"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
[flake8] | ||
max-line-length = 88 | ||
extend-ignore = E203 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
# To learn more about .editorconfig see https://aka.ms/editorconfigdocs | ||
|
||
# All files | ||
[*] | ||
indent_style = space | ||
end_of_line = lf | ||
|
||
# Docs | ||
[*.md] | ||
insert_final_newline = true | ||
trim_trailing_whitespace = true | ||
|
||
# Config/data | ||
[*.json] | ||
indent_size = 4 | ||
insert_final_newline = false | ||
trim_trailing_whitespace = true | ||
|
||
# Config/data | ||
[*.yaml] | ||
indent_size = 4 | ||
insert_final_newline = true | ||
trim_trailing_whitespace = true | ||
|
||
# Code | ||
[*.py] | ||
indent_size = 4 | ||
insert_final_newline = true | ||
trim_trailing_whitespace = true |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
{ | ||
"python.analysis.extraPaths": [ | ||
"./src" | ||
], | ||
"explorer.compactFolders": false, | ||
"prettier.enable": true, | ||
"editor.formatOnType": true, | ||
"editor.formatOnSave": true, | ||
"editor.formatOnPaste": true, | ||
"python.formatting.provider": "autopep8", | ||
"python.formatting.autopep8Args": [ | ||
"--max-line-length=160" | ||
], | ||
"notebook.output.textLineLimit": 500, | ||
"cSpell.words": [ | ||
"aeiou", | ||
"nopep", | ||
"OPENAI", | ||
"skfunction" | ||
], | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
# System setup | ||
|
||
To get started, you'll need VSCode and a local installation of Python 3.x. | ||
|
||
You can run: | ||
|
||
python3 --version ; pip3 --version ; code -v | ||
|
||
to verify that you have the required dependencies. | ||
|
||
## If you're on WSL | ||
|
||
Check that you've cloned the repository to `~/workspace` or a similar folder. | ||
Avoid `/mnt/c/` and prefer using your WSL user's home directory. | ||
|
||
Ensure you have the WSL extension for VSCode installed (and the Python extension | ||
for VSCode installed). | ||
|
||
You'll also need `pip3` installed. If you don't yet have a `python3` install in WSL, | ||
you can run: | ||
|
||
```bash | ||
sudo apt-get update && sudo apt-get install python3 python3-pip | ||
``` | ||
|
||
ℹ️ **Note**: if you don't have your PATH setup to find executables installed by `pip3`, | ||
you may need to run `~/.local/bin/poetry install` and `~/.local/bin/poetry shell` | ||
instead. You can fix this by adding `export PATH="$HOME/.local/bin:$PATH"` to | ||
your `~/.bashrc` and closing/re-opening the terminal. | ||
|
||
# LLM setup | ||
|
||
Make sure you have an | ||
[OpenAI API key](https://openai.com/api/) or | ||
[Azure OpenAI service key](https://learn.microsoft.com/azure/cognitive-services/openai/quickstart?pivots=rest-api). | ||
|
||
ℹ️ **Note**: Azure OpenAI support is a work in progress and will be available soon. | ||
|
||
Copy those keys into a `.env` file like this: | ||
|
||
``` | ||
OPENAI_API_KEY="" | ||
OPENAI_ORG_ID="" | ||
AZURE_OPENAI_API_KEY="" | ||
AZURE_OPENAI_ENDPOINT="" | ||
``` | ||
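As a quick sanity check before running samples, the keys in the `.env` file can be verified with a few lines of standard-library Python. This is only a minimal sketch: `parse_env_text` and `missing_keys` are hypothetical helper names, not part of SK, and the parser assumes the simple `KEY="value"` layout shown above.

```python
from typing import Dict, Iterable, List


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(
    values: Dict[str, str], required: Iterable[str] = ("OPENAI_API_KEY",)
) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env_text(open(".env").read()))` returns an empty list once `OPENAI_API_KEY` has been filled in.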
|
||
We suggest adding a copy of the `.env` file under these folders: | ||
|
||
- [python/tests](tests) | ||
- [samples/notebooks/python](../samples/notebooks/python). | ||
|
||
# Quickstart with Poetry | ||
|
||
Poetry lets you use SK from the current repo, without worrying about paths, as | ||
if you had the SK pip package installed. The SK pip package will be published after | ||
porting all the major features and ensuring cross-compatibility with the C# SDK. | ||
|
||
To install Poetry on your system: | ||
|
||
pip3 install poetry | ||
|
||
The following command installs the project dependencies: | ||
|
||
poetry install | ||
|
||
And the following activates the project virtual environment, which makes it easier | ||
to run samples in the repo and develop apps using Python SK: | ||
|
||
poetry shell | ||
|
||
To run the same checks that are run during the Azure Pipelines build, you can run: | ||
|
||
poetry run pre-commit run -c .conf/.pre-commit-config.yaml -a | ||
|
||
# VSCode Setup | ||
|
||
Open any of the `.py` files in the project and run the `Python: Select Interpreter` command | ||
from the command palette. Make sure the virtual env (venv) created by `poetry` is selected. | ||
The Python interpreter you're looking for should be under `~/.cache/pypoetry/virtualenvs/semantic-kernel-.../bin/python`. | ||
|
||
If prompted, install `black` and `flake8` (if VSCode doesn't find those packages, | ||
it will prompt you to install them). | ||
|
||
# Tests | ||
|
||
You should be able to run the example under the [tests](tests) folder. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,199 @@ | ||
# Achieving Feature Parity in Python and C# | ||
|
||
This is a high-level overview of where things stand towards reaching feature parity with the main | ||
[C# codebase](https://github.com/microsoft/semantic-kernel/tree/main/dotnet/src/SemanticKernel). | ||
|
||
| | | | | ||
|------|------|------| | ||
| | Python | Notes | | ||
| `./ai/embeddings` | 🔄 | Using Numpy for embedding representation. Vector operations not yet implemented | | ||
| `./ai/openai` | 🔄 | Makes use of the OpenAI Python package. AzureOpenAI* not implemented | | ||
| `./configuration` | ✅ | Direct port. Check inline docs | | ||
| `./core_skills` | 🔄 | `TextMemorySkill` implemented. Others not | | ||
| `./diagnostics` | ✅ | Direct port of custom exceptions and validation helpers | | ||
| `./kernel_extensions` | 🔄 | Extensions take kernel as first argument and are exposed via `sk.extensions.*` | | ||
| `./memory` | 🔄 | Can simplify by relying on Numpy NDArray | | ||
| `./planning` | ❌ | Not yet implemented | | ||
| `./semantic_functions/partitioning` | ❌ | Not yet implemented | | ||
|
||
|
||
## Status of the Port | ||
|
||
The port has the bulk of the Semantic Kernel C# code re-implemented, but is not yet complete. Major things like `tests` and `docs` are still missing. | ||
Here is a breakdown by sub-module on the status of this port: | ||
|
||
### `./ai/embeddings` (Partial) | ||
|
||
For now, `VectorOperations` from the original kernel will be skipped. We can use | ||
`numpy`'s `ndarray` as an efficient embedding representation. We can also use | ||
`numpy`'s optimized vector and matrix operations to do things like cosine similarity | ||
quickly and efficiently. | ||
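To illustrate the point about `numpy`, here is a minimal sketch of cosine similarity over `ndarray` embeddings. The function names are illustrative assumptions and not part of the port; only standard `numpy` calls (`np.linalg.norm`, `np.dot`, the `@` operator) are used.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two 1-D embeddings; 0.0 if either is all zeros."""
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0.0:
        return 0.0
    return float(np.dot(a, b) / norm_product)


def cosine_similarities(query: np.ndarray, embeddings: np.ndarray) -> np.ndarray:
    """Similarity of one query against every row of an (n, d) embedding matrix."""
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
    dots = embeddings @ query
    scores = np.zeros(len(embeddings), dtype=float)
    nonzero = norms > 0
    # Only divide where the norm is non-zero, so zero vectors score 0.0.
    scores[nonzero] = dots[nonzero] / norms[nonzero]
    return scores
```

The batched version computes all scores with one matrix-vector product, which is the kind of operation `VectorOperations` would otherwise have to provide by hand.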
|
||
The `IEmbeddingIndex` interface has been translated to the `EmbeddingIndexBase` abstract | ||
class. The `IEmbeddingGenerator` interface has been translated to the | ||
`embedding_generator_base` abstract class. | ||
|
||
The C# code makes use of extension methods to attach convenience methods to many interfaces | ||
and classes. In Python we don't have that luxury. Instead, these methods are in the corresponding class definition. | ||
(We can revisit this, but for good type hinting avoiding something fancy/dynamic works best.) | ||
|
||
### `./ai/openai` (Partial) | ||
|
||
The abstract clients (`(Azure)OpenAIClientAbstract`) have been ignored here. The `HttpSchema` | ||
submodule is not needed given we have the `openai` package to do the heavy lifting (bonus: that | ||
package will stay in-sync with OpenAI's updates, like the new ChatGPT API). | ||
|
||
The `./ai/openai/services` module is retained and has the same classes/structure. | ||
|
||
#### TODOs | ||
|
||
The `AzureOpenAI*` alternatives are not yet implemented. This would be a great, low-difficulty | ||
task for a new contributor to pick up. | ||
|
||
### `./ai` (Complete?) | ||
|
||
The rest of the classes at the top-level of the `./ai` module have been ported | ||
directly. | ||
|
||
**NOTE:** here, we've locked ourselves into getting a _single_ completion | ||
from the model. This isn't ideal. Getting multiple completions is sometimes a great | ||
way to solve more challenging tasks (majority voting, re-ranking, etc.). We should look | ||
at supporting multiple completions. | ||
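To make the majority-voting idea concrete, here is a minimal sketch of what a caller could do if several completions were available. `majority_vote` is a hypothetical helper, not an SK API, and it assumes the completions are short answers that can be compared after trimming and lowercasing.

```python
from collections import Counter
from typing import Sequence


def majority_vote(completions: Sequence[str]) -> str:
    """Return the most frequent completion after light normalization."""
    if not completions:
        raise ValueError("at least one completion is required")
    # Normalize so that " 42 " and "42" count as the same answer.
    normalized = [completion.strip().lower() for completion in completions]
    winner, _count = Counter(normalized).most_common(1)[0]
    return winner
```

With a single completion per request, as the port works today, there is nothing to vote on, which is why this note suggests looking at multiple completions.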
|
||
**NOTE:** Based on `CompleteRequestSettings`, there is no easy way to grab the `logprobs` | ||
associated with the model's completion. This would be huge for techniques like re-ranking | ||
and is also crucial data to capture for metrics. We should think about how to | ||
support this. (We're currently a "text in text out" library, but multiple completions | ||
and logprobs seem to be fundamental in this space.) | ||
|
||
### `./configuration` (Complete?) | ||
|
||
Direct port, not much to do here. Probably check for good inline docs. | ||
|
||
### `./core_skills` (Partial) | ||
|
||
We've implemented the `TextMemorySkill` but are missing the following: | ||
|
||
- `ConversationSummarySkill` | ||
- `FileIOSkill` | ||
- `HttpSkill` | ||
- `PlannerSkill` (NOTE: planner is a big sub-module we're missing) | ||
- `TextSkill` | ||
- `TimeSkill` | ||
|
||
#### TODOs | ||
|
||
Any of these individual core skills would be a great low-to-medium difficulty contribution | ||
for those looking for something to do, ideally with good docs and corresponding tests. | ||
|
||
### `./diagnostics` (Complete?) | ||
|
||
Pretty direct port of these few custom exceptions and validation helpers. | ||
|
||
### `./kernel_extensions` (Partial) | ||
|
||
This is difficult: good type hinting requires a lot of duplication. Not having the | ||
convenience of extension methods makes this cumbersome. In the future, we may | ||
want to consider some form of "plugins" for the kernel. | ||
|
||
For now, the kernel extensions take the kernel as the first argument and are exposed | ||
via the `sk.extensions.*` namespace. | ||
|
||
### `./memory` (Partial) | ||
|
||
This was a complex sub-system to port. The C# code has lots of interfaces and nesting | ||
of types and generics. In Python, we can simplify this a lot. An embedding | ||
is an `ndarray`. There are lots of great pre-built features that come with that. The | ||
rest of the system is a pretty direct port but the layering can be a bit confusing. | ||
For example, what's the real difference between storage, memory, memory record, | ||
data entry, an embedding, a collection, etc.? | ||
|
||
#### TODOs | ||
|
||
Review of this subsystem. Lots of good testing. Maybe some kind of overview | ||
documentation about the design. Maybe a diagram of how all these classes and interfaces | ||
fit together? | ||
|
||
### `./orchestration` (Complete?) | ||
|
||
This was a pretty core piece and another direct port. Worth double checking. Needs good docs and tests. | ||
|
||
### `./planning` (TODO: nothing yet) | ||
|
||
Completely ignored planning for now (and, selfishly, planning isn't a priority for | ||
SK-based experimentation). | ||
|
||
### `./reliability` (Complete?) | ||
|
||
Direct port. Nothing much going on in this sub-module. Likely could use more strategies | ||
for retry. Also, it isn't clear whether this is integrated with the kernel/backends. | ||
(Are we actually using the retry code, or is it never hit?) | ||
|
||
#### TODOs | ||
|
||
Implement a real retry strategy, perhaps one with backoff. Make sure this code is integrated | ||
and actually in use. | ||
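One possible shape for such a strategy is exponential backoff with jitter. The sketch below is an assumption about what it could look like, not the code in this sub-module; `retry_with_backoff` and its parameters are hypothetical names, and the `sleep` argument is injectable so the function can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on any exception with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Delay doubles each attempt (capped), plus jitter to spread retries.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

A real implementation would probably retry only on specific transient errors (rate limits, timeouts) instead of every `Exception`.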
|
||
### `./semantic_functions` (Complete?) | ||
|
||
Another core piece. The different config classes start to feel cumbersome here | ||
(func config, prompt config, backend config, kernel config, so so much config). | ||
|
||
### `./semantic_functions/partitioning` (TODO: nothing yet) | ||
|
||
Skipped this sub-sub-module for now. Good task for someone to pick up! | ||
|
||
### `./skill_definition` (Complete?) | ||
|
||
Another core piece, another pretty direct port. | ||
|
||
**NOTE:** the attributes in C# become decorators in Python. We could probably | ||
make it feel a bit more Pythonic (instead of having multiple decorators, have just | ||
one or two). | ||
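As a rough illustration of the "one decorator" idea, a single decorator could carry all the metadata as keyword arguments. Everything here is hypothetical: the `sk_function` name, its parameters, and the `__sk_*__` attribute names are assumptions made for this sketch and may not match the port.

```python
from typing import Any, Callable, Optional


def sk_function(
    *,
    name: Optional[str] = None,
    description: str = "",
    input_description: str = "",
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Attach skill metadata to a function with one decorator call."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        # Store the metadata on the function object; the function itself
        # is returned unchanged, so type hints and signatures are preserved.
        func.__sk_function__ = True
        func.__sk_function_name__ = name or func.__name__
        func.__sk_function_description__ = description
        func.__sk_function_input_description__ = input_description
        return func

    return decorator


class TextSkill:
    @sk_function(description="Trim whitespace", input_description="Text to trim")
    def trim(self, text: str) -> str:
        return text.strip()
```

A skill loader could then discover functions by checking for the `__sk_function__` attribute, instead of stacking one decorator per piece of metadata.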
|
||
**NOTE:** The skill collection, read only skill collection, etc. became a bit | ||
confusing (in terms of the relationship between everything). Would be good to | ||
double check my work there. | ||
|
||
### `./template_engine` (Complete?) | ||
|
||
Love the prompt templates! Have tried some basic prompts, prompts w/ vars, | ||
and prompts that call native functions. Seems to be working. | ||
|
||
**NOTE:** this module definitely needs some good tests. Some | ||
subtle errors may be sneaking into the prompt tokenization/rendering code here. | ||
|
||
### `./text` (TODO: nothing yet) | ||
|
||
Ignored this module for now. | ||
|
||
### `<root>` (Partial) | ||
|
||
Have a working `Kernel` and a working `KernelBuilder`. The base interface | ||
and custom exception are ported. The `Kernel` in particular | ||
is missing some things, has some bugs, could be cleaner, etc. | ||
|
||
## Overall TODOs | ||
|
||
We are currently missing a lot of the doc comments from C#. So a good review | ||
of the code and a sweep for missing doc comments would be great. | ||
|
||
We are also missing any _testing_. We should figure out how we want to test | ||
(I think this project is auto-setup for `pytest`). | ||
|
||
Finally, we are missing a lot of examples. It'd be great to have Python notebooks | ||
that show off many of the features, many of the core skills, etc. | ||
|
||
|
||
## Design Choices | ||
|
||
We want the overall design of the kernel to be as similar as possible to C#. | ||
We also want to minimize the number of external dependencies to make the Kernel as lightweight as possible. | ||
|
||
Right now, compared to C# there are two key differences: | ||
|
||
1. Use `numpy` to store embeddings and do things like vector/matrix ops | ||
2. Use `openai` to interface with (Azure) OpenAI | ||
|
||
There are also many more subtle differences that come with moving to Python, | ||
things like static properties, no method overloading, no extension methods, etc. |