Mirascope v2.0 Roadmap #896
Comments
I like this roadmap, and its clear focus on making Mirascope the standard interface for building on top of LLMs. Especially since intense competition and fast-follow dynamics are pushing foundational models towards commoditization, there's very clear value in building standards one layer up. It positions users of Mirascope as seamless beneficiaries of that competition and commoditization.

If I'm understanding correctly, there's a key shift in focus between 1.x and 2.0: 1.x was still "provider centric", but focused on providing consistent patterns for working with those providers in a way that facilitates portability, whereas in 2.0 we're writing a generic interface for LLMs in general, and the providers are an implementation detail that should mostly live under the hood.

Building on that (and on some offline conversations), I propose that Mirascope 2.0 reorient from being "provider agnostic" to "model agnostic", treating the model (and its capabilities) as the core abstraction, and the provider as an implementation detail. This makes it natural to do things like mix and match models from different providers. Thus, you don't care whether something is an "openai model" or an "anthropic model": you care whether it implements the specific Mirascope interfaces you depend on. At an API level, we would stop having pairs of provider-specific equivalents for the same concept.

The key benefits of focusing on models rather than providers are realized if we can encode model capabilities into the type signature. That way you can swap between models confidently, knowing the type system will warn you if your new model choice is missing capabilities that your app needs. It would enable users to very precisely manage their location on the cross-model cost/performance tradeoff spectrum, since they could potentially shop across providers for the cheapest image model, the cheapest audio model, etc. (And on the Lilypad / managed generation side, we could automatically do that for them, while ensuring that their performance on the automated evals stays high.) A rough sketch of what this could look like is included at the end of this comment.

Tracking per-model type signatures may be painful on the type declaration side (a la lines 47-3484 of llm/_override.py). However, I think it's addressable with code generation, and worthwhile if we can manage it. Since the general 2.0 roadmap seems oriented on "improve 1.x in preparation for landing 2.0", I think we could start by auto-generating type signatures as needed for the providers in 1.x, and see how we feel about it.

Beyond that, I'm just reflecting on what I see (and like) in the roadmap you've laid out.
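To make the capability-typing idea above concrete, here is a rough sketch of what capability-encoded model types could look like. Everything here is hypothetical (the protocol names and methods are not part of Mirascope); it just illustrates how a type checker could enforce capabilities independently of provider:

```python
# Hypothetical sketch: encode model capabilities as typing.Protocols so the type
# checker (not the provider) decides whether a model fits a given use case.
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsImages(Protocol):
    """Capability protocol for models that accept image inputs."""

    def call_with_images(self, prompt: str, images: list[bytes]) -> str: ...


@runtime_checkable
class SupportsAudio(Protocol):
    """Capability protocol for models that accept audio inputs."""

    def call_with_audio(self, prompt: str, audio: bytes) -> str: ...


def describe_photo(model: SupportsImages, photo: bytes) -> str:
    # Any model object implementing the capability works here, regardless of
    # which provider hosts it; a model without it fails type checking.
    return model.call_with_images("Describe this photo.", [photo])
```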
I'm glad this is well received! I will compile these notes into the main description and add an issue to the list of `v2.0` goals. This is certainly non-trivial work that I believe is worthwhile, as you've mentioned.
After some additional thought, I believe we should view the goal for Mirascope `v2.0` as including tight, first-class integration with Lilypad.

I imagine moving toward something like:

```python
from dataclasses import dataclass

from mirascope import lilypad, llm

lilypad.configure()


@dataclass
class Deps:
    user_name: str


def escalate(ctx: llm.AgentContext[Deps]) -> str: ...


@llm.agent(
    deps_type=Deps,
    model="google:gemini-2.0-flash",
    tools=[escalate],
)
def support_bot(ctx: llm.AgentContext[Deps]) -> str:
    return f"You are a customer support bot helping {ctx.deps.user_name}"


deps = Deps(user_name="William")
response = support_bot("I'm unable to access my account", deps=deps)
print(response.content)
```

Here, the agent is defined entirely with Mirascope, and `lilypad.configure()` is the only Lilypad-specific setup.

We could also then just use Mirascope types directly for managed generations:

```python
from mirascope import lilypad, llm


@lilypad.generation(managed=True)
def recommend_book(genre: str) -> llm.CallResponse: ...


response = recommend_book("fantasy")
print(response.content)
```
This way we don't have to build Lilypad wrappers around Mirascope stuff and can instead just use Mirascope directly. We could also then likely implement things like Managed Agents that enable building Agents in a fully no-code way.

Of course, I also imagine that running `lilypad deploy --agent support_bot.py` would deploy the bot to the platform.

For integrations, we would support e.g. MCP. Here, I imagine using MCP Community (or any other MCP server) with the Mirascope MCP Client, e.g. `lilypad deploy --mcp duckduckgo`, where this would deploy the MCP server to the Lilypad platform. We could also support deploying the server directly from the platform rather than the CLI if that makes sense.

It may also make sense to push MCP Community inside of the `mirascope` package:

```python
from mirascope import llm, mcp


@llm.agent(
    model="google:gemini-2.0-flash",
    tools=[mcp.DuckDuckGo],
)
def bot() -> str:
    return "You are a web search agent."


response = bot("Any recent news about LLMs?")
print(response.content)
```

This structure (imo) makes everything cleaner and much more clear. Furthermore, the tight coupling would open the door for some really interesting functionality.
Curious to hear what people think about this.
When I first read this, I thought the plan was to migrate the full Lilypad codebase into `mirascope` itself.

First, we keep all the Mirascope 2.0 goals from the top of the issue.

Then, we build on that minimized footprint by integrating the Mirascope-specific Lilypad SDK directly into `mirascope`.

Under the hood, this implies a restructure for Lilypad too—which refactors it to no longer depend on Mirascope itself, but instead to provide the underlying APIs needed for the Mirascope-Lilypad SDK that lives in `mirascope`.

This has the effect of more clearly positioning Mirascope-the-library as the top of the funnel into (optional, but very well supported) Lilypad usage, and folding Lilypad into the Mirascope brand, rather than it appearing to users as kind of a separate thing. So Lilypad is a key part of "the Mirascope platform" essentially.
This hits the nail on the head and provides necessary additional clarity. Thank you! I imagine we'll do something like:
The last bullet here is key. The Lilypad API would not be able to provide versioning beyond accepting the code for versioning. The actual closure computation is language-specific, so even if using provider SDKs directly (e.g. OpenAI), I think it makes sense for the SDK to handle the closure computation client-side. Also, tightly integrating e.g. automatic versioning with the …
Why a `v2.0` major version bump?

There are some items on my laundry list of TODOs for Mirascope that are breaking changes, so I've been ignoring them / pushing them off until the time felt right to do a big push and release a new major version (which is a lot of work).
Namely:

- Deprecating `core` in favor of `llm.call` as the default
- Removing `Base` from anything where users are not subclassing
  - Users are given `BaseMessageParam` but are not subclassing it, so `MessageParam` should suffice
  - `BaseDynamicConfig` etc.
- Making `pydantic` an optional dependency

Why push for `llm` as the default?
Originally we implemented Mirascope such that it would be easy to switch providers, but the interface was not truly agnostic. You would need to use higher-order functions (decorators) dynamically on `@prompt_template` decorated functions. This is not great.

We are making strides towards a truly provider-agnostic interface with `llm.call` and `llm.override` etc., but we still rely on the original provider-specific calls followed by conversion when requested or on construction. This results in a bunch of unnecessary compute time spent on all of the provider-specific class creation rather than just always converting everything to a common type.

If we're hoping to build the standardized base interface for building with LLMs, then everything must be provider-agnostic by default. Accessing provider-specific features should be possible with minimal changes to existing code and only necessary when Mirascope does not natively support such a feature (e.g. OpenAI releases a new feature that only they support).

To me, this means that for almost all use-cases, `llm.call` and other `llm` module methods (e.g. `override`) should be sufficient (and in fact be the right solution). We can then implement new/different interfaces that work with `llm` or are separate for things that are provider-specific.

For example, right now if a user wants to use a custom client (e.g. to access Vertex through the `google` module), they need to learn about the `genai` package. Instead, we could do something like `llm.client(provider="google")` that overloads based on `provider` and provides any additional arguments they may need (such as `vertexai=True`). When using the `llm.override` or `llm.context` methods, we would set `client` to `None` if the provider is different and no client is provided (which would use the default client internally for the given provider).
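As a rough usage sketch of that idea (both `llm.client` and passing the resulting client into `llm.call` here are assumptions about the proposed design, not the current Mirascope API):

```python
# Hypothetical sketch of the proposed llm.client helper described above.
from mirascope import llm

# The provider argument would select the correct overload, and provider-specific
# options (such as vertexai=True for Google) would be exposed as keyword
# arguments, so users never have to import the underlying genai SDK themselves.
client = llm.client(provider="google", vertexai=True)


@llm.call(provider="google", model="gemini-2.0-flash", client=client)
def recommend_book(genre: str) -> str:
    return f"Recommend a {genre} book"
```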
The purpose of custom messages is to enable accessing newly released provider-specific features while still being able to take advantage of the rest of the Mirascope eco-system. For example, when OpenAI released GPT-4-Vision, users could write prompts with images using custom messages but still take advantage of e.g. Response Models. I still think it's really important to support this, but I think users would still want provider-agnostic support downstream as mentioned. We could do something like allow provider-specific config return types (e.g. `OpenAIDynamicConfig`) that allow provider-specific messages, and then if we detect an `llm.context` with a different provider we raise a runtime user error saying that the function is provider-specific and cannot use a different provider. This would then ensure that all of the other override features (e.g. `json_mode` or `stream` etc.) are still available even in the provider-specific case.
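Roughly, the behavior described above might look like this from the user's side (a sketch only: the provider-specific return annotation on an `llm.call` function and the runtime error under a mismatched context are the proposal, not current behavior):

```python
# Sketch of the proposed provider-specific escape hatch for custom messages.
from mirascope import llm
from mirascope.core.openai import OpenAIDynamicConfig


@llm.call(provider="openai", model="gpt-4o")
def describe_image(url: str) -> OpenAIDynamicConfig:
    # Provider-specific messages let users adopt brand-new provider features
    # early while keeping the rest of the Mirascope ecosystem.
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {"type": "image_url", "image_url": {"url": url}},
                ],
            }
        ]
    }


# Overriding with a different provider would raise a runtime user error, since
# the messages are OpenAI-specific; overrides like json_mode or stream would
# still work within the same provider.
with llm.context(provider="anthropic", model="claude-3-5-sonnet-latest"):
    describe_image("https://example.com/photo.jpg")  # proposed: runtime error
```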
For strict structured outputs, I think we could solve this by implementing an additional `structured` or `parse` decorator that implements provider-agnostic support for strict structured outputs (and only allows `provider` settings that support it, such as OpenAI, Gemini, potentially Outlines in the future, etc.). Another option would be to differentiate between `ResponseModel` and `StrictResponseModel` and only allow certain providers to accept `StrictResponseModel` in the type hint overloads. This requires some of the stuff around making Pydantic optional, so see below for more details.
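As an illustration of the first option, usage might look something like this (the `llm.structured` decorator and its parameters are purely hypothetical names for the idea described above):

```python
# Hypothetical sketch of a provider-agnostic strict structured output decorator.
from dataclasses import dataclass

from mirascope import llm


@dataclass
class Book:
    title: str
    author: str


# The decorator would only accept provider/model combinations that support
# strict schemas (e.g. OpenAI, Gemini), enforced through typing overloads.
@llm.structured(provider="openai", model="gpt-4o", response_model=Book, strict=True)
def extract_book(text: str) -> str:
    return f"Extract the book from: {text}"


book = extract_book("The Name of the Wind by Patrick Rothfuss")
print(book.title, "-", book.author)
```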
For Anthropic prompt caching, I think we can just keep what we have (i.e. `CacheControlPart`) as a provider-agnostic way of implementing cache controls, and then we can raise a runtime user error if it's used with a non-Anthropic model. I feel this is the right path since there are other providers (such as Bedrock and Vertex) that also technically support this when using Claude models on their platforms, so it's not truly Anthropic-only (it's Claude-only).
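A minimal sketch of that runtime guard (the helper function is an illustration of the check, not an existing Mirascope utility):

```python
# Sketch: reject cache controls for models that don't support them. The point is
# that the check is on the model (Claude), not the provider (Anthropic/Bedrock/Vertex).
def validate_cache_controls(model: str, has_cache_control_parts: bool) -> None:
    if has_cache_control_parts and "claude" not in model.lower():
        raise ValueError(
            f"CacheControlPart is only supported by Claude models; got {model!r}."
        )


validate_cache_controls("anthropic:claude-3-5-sonnet-latest", has_cache_control_parts=True)  # ok
validate_cache_controls("bedrock:claude-3-5-sonnet", has_cache_control_parts=True)           # ok
# validate_cache_controls("openai:gpt-4o", has_cache_control_parts=True)                     # would raise
```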
I haven't yet figured everything out here, and there's a lot to figure out, but this is the general direction I want to take the library.

Performance matters
LLM API calls are slow. Mirascope should not make them any slower than necessary. As part of this major version push, we should strive for the best performance we can. Constructing a call should be as fast as possible. It should only happen once. Data should only be validated when absolutely necessary. Data should be restructured / formatted only when necessary. Classes should be created only when necessary.

Why should we create an `OpenAICallResponse` through the `openai.call` decorator under the hood that ultimately becomes a `CallResponse` instance when using the `llm.call` decorator? We should just start from `CallResponse`.

In fact, why is `CallResponse` a Pydantic model at all? What are we validating? Information that's already been validated by the provider-specific models. We should be using something like `attrs` instead to provide the same interface without the overhead. We could then provide additional support for e.g. serialization through `cattrs`, which would also make it much easier for users to implement their own custom serialization logic on top of `CallResponse`.
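For illustration, here is roughly what an `attrs`-based response type with `cattrs` serialization could look like (a simplified standalone sketch, not the actual `CallResponse` definition):

```python
# Simplified sketch of an attrs-based response class with cattrs serialization.
import attrs
import cattrs


@attrs.define
class CallResponseSketch:
    content: str
    model: str
    finish_reason: str = ""


converter = cattrs.Converter()

response = CallResponseSketch(content="Hello!", model="gpt-4o-mini")
data = converter.unstructure(response)            # plain dict, no validation pass
restored = converter.structure(data, CallResponseSketch)
print(data, restored.content)
```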
Why the shift toward "only having to learn Mirascope"?
I think it's extremely important that users who are learning Mirascope only need to learn Mirascope unless otherwise absolutely necessary. The fewer concepts needed to get started -- the less a user has to learn to find Mirascope valuable -- the better.
For example, why should a user who wants to use a custom client have to learn what client to import from a provider-specific package? Users should not need to learn about provider-specific SDK types unless accessing truly provider-specific features that require them. A good example of this would be using provider-specific call parameters (e.g. `OpenAICallParams`) rather than `CommonCallParams` because a certain call parameter is only supported by a certain provider (e.g. Google's safety configuration options). Here, it's necessary that a user learn this, and they have likely already learned it, since they most likely discovered it was possible by reading the provider-specific documentation.

And that's just for LLM providers. What about a user who wants to structure their outputs? Why should they have to learn about Pydantic if they've never heard of it before and just want to use a dataclass? Sure, if a user knows about Pydantic and wants to take advantage of certain validation or serialization features for Response Models, by all means that should be supported and possible. But it should be optional and not the default requirement that you learn Pydantic. We should opt for the default to use the Python everyone already knows and loves. Everything else should be opt-in.
What's wrong with the current naming conventions?
There's nothing inherently wrong with them. But I care a lot about semantics. Things should be immediately clear just from their naming. For example, the `Tool` class in the `llm` module is used only for the actual structured tool output. Users should use `BaseTool` for defining tools (which makes the use of `Base` make sense as a parent class users should subclass).

In this vein, it makes sense to remove `Base` from all things that we generally don't expect or recommend subclassing. For example, we could have `BaseDynamicConfig` internally for supporting provider-specific configs like `OpenAIDynamicConfig`, but a user using the `llm.call` decorator with dynamic configuration should just use `llm.DynamicConfig`. Same is true for e.g. `CommonCallParams` -> `CallParams`.

While not necessarily a huge deal for people, naming matters to me and I think it's worth the additional thought.
Why make Pydantic an optional dependency?
I think a better first question to ask is whether or not Pydantic is necessary. Don't get me wrong. Pydantic is great. The library has done a tremendous amount of good for Python and especially LLM-powered Python.
But it's a lot of overhead. As mentioned earlier, we shouldn't be using Pydantic for validation when we don't need validation. That's just additional compute spent for no reason.
Everything we receive from the LLM provider APIs has already been validated. We don't need to validate it again. For things like `model_dump` and other serialization, we can implement more native support through libraries such as `attrs` and `cattrs` without the additional overhead. This also means that users can more easily implement their own custom serialization without the overhead.

If, for some reason, users really want things like `CallResponse` to support Pydantic, we can just implement something like `PydanticCallResponse` that can be easily constructed from a `CallResponse` instance, but again I would only want to implement this if it's actually useful / desired.

Response Models are a different story. For most cases, Pydantic is overkill. If we're just validating the types, `attrs` and `cattrs` are sufficient, and we can easily push those under-the-hood through something like an `llm.response_model` decorator that converts an object into a `ResponseModel[OriginalClass]` type. This would also support something like `StrictResponseModel[OriginalClass]` through e.g. `llm.response_model(strict=True)` as mentioned before.
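To make that concrete, usage might look something like this (the decorator forms shown are hypothetical, matching the idea described above rather than an existing API):

```python
# Hypothetical sketch: plain classes as Response Models via llm.response_model.
from dataclasses import dataclass

from mirascope import llm


@llm.response_model  # would wrap the class as ResponseModel[Book] under the hood
@dataclass
class Book:
    title: str
    author: str


@llm.response_model(strict=True)  # would instead produce StrictResponseModel[Movie]
@dataclass
class Movie:
    title: str
    year: int


@llm.call(provider="openai", model="gpt-4o-mini", response_model=Book)
def extract_book(text: str) -> str:
    return f"Extract the book from: {text}"
```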
This is in line with the principle of only having to learn Mirascope. If we support `llm.response_model`, then Response Models become a Mirascope-specific thing (and not a Pydantic thing).

We would then of course add opt-in support for using Pydantic with Response Models so that users could take advantage of their additional validation features. For example, we could add the `mirascope[pydantic]` extra and then just allow `response_model` to accept a Pydantic `BaseModel` type definition. In the Learn section, we could put this at the very end with a link to Pydantic for those who want to learn more, but the core Response Model features would be Mirascope-specific from the user's perspective.

Roadmap
All-in-all I'm excited about this direction. There are a few items we should complete first as part of `v1.x` that are not breaking changes. Implementing them as part of `v1.x` will also give us the opportunity to see if we like the interfaces, and if there are breaking changes we want to make we can do so as part of the `v2.0` push once we've identified them.

I think the roadmap for this work breaks down as follows:
Remaining `v1.x` Implementation Goals

- Provider-agnostic support for strict structured outputs (`strict=True` response models)
- `mirascope.tools` -> MCP Community #904: I'm thinking that instead of premade tools we should implement them as MCP servers. This would make the tools usable even if not using Mirascope, which I think is important. We might even want to make this a separate library (e.g. `mirascope-mcp`). I will mark this item as done once I have had a chance to create a new issue around this idea (which I will then add to the `v2.0` goals).
- Doing as much of this as possible in `v1.x` is important also since it gives us the freedom to implement breaking changes as part of `v2.0` if necessary.
- Using `tools` and `response_model` together #756 and Update documentation to use `llm.call` as the default everywhere that makes sense #811 will provide nearly all of the documentation changes we will want for `v2.0` but are not blocked by it. We can leave any additional documentation updates (such as renaming updates) to the `v2.0` goals, which I've included as an item below. It's worth doing these in `v1.x` in case we identify breaking changes we want.
- `llm.context`: ContextManager for overriding llm call parameters #884 is another key step toward `v2.0` but is not blocked by it. There are still a lot of design decisions to make here and things to figure out. For example, it does not currently seem possible to use `llm.context` and `ctx.apply` to properly update type hint overrides because we do not have access to the original return type (which we need if no structural overrides are applied). We could probably do something like allow sending the original function as an optional argument to `llm.context` such that we can properly type it, and if the user calls `ctx.apply` on their function but doesn't provide it to `llm.context` then the return type will be `Unknown` or something (see the sketch after this list).
- Integrations should live in the integrating libraries: e.g. if a user wants to use `logfire` with Mirascope, they should be importing the instrumentation from `logfire` and not `mirascope`. If something is wrong with Mirascope, then we can make that update to Mirascope and all other things will benefit from the change/fix. If something is wrong with e.g. `logfire`, then we can make that update to Logfire and all other things will benefit from the change/fix.
- Support for the `:document` tag and parts for Anthropic. It's worth making sure this can be provider-agnostic like other features (e.g. images).
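For reference, here is a rough sketch of the `llm.context` option described in that list (the `function` argument and the `ctx.apply` behavior are assumptions about one possible design, not an existing API):

```python
# Hypothetical sketch: pass the original function into llm.context so ctx.apply
# can preserve the decorated function's return type.
from mirascope import llm


@llm.call(provider="openai", model="gpt-4o-mini")
def recommend_book(genre: str) -> str:
    return f"Recommend a {genre} book"


# With the original function supplied, the context could type ctx.apply(...)
# using recommend_book's return type; without it, the applied function could
# only be typed as Unknown (or similar).
with llm.context(
    provider="anthropic",
    model="claude-3-5-sonnet-latest",
    function=recommend_book,
) as ctx:
    overridden = ctx.apply(recommend_book)
    response = overridden("fantasy")
    print(response.content)
```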
Once we complete these items, we will freeze the `v1.x` minor version while we start work on `v2.0`. Of course, we will continue to implement bug fixes as necessary, which we will then merge over into the `v2` development branch.

`v2.0` Implementation Goals

As things progress and become more clear, I will likely convert the below roadmap into sub-issues that are individually fleshed out and implemented. I think this will provide the necessary clarity around the individual components we implement (and also make reviewing the work easier).
- The `llm` module as the default, updating `core` to provide utilities rather than provider-specific call decorators. I imagine this would be somewhat similar to our `costs` module where instead of having provider-specific modules we have utility-specific modules that accept a `provider` argument and then route to the correct provider-specific utility (such as message conversion). This will of course require import suppression etc. (see the sketch after this list).
- Renaming updates (removing `Base` etc.)
- A performance benchmark we can run on `v1.x` and then run on `v2.0` to ensure we're actually optimizing performance. It's important that we build the benchmark around the performance metrics that really matter.
- Documentation updates (beyond Update documentation to use `llm.call` as the default everywhere that makes sense #811). Push all things provider-specific into its own Provider Specific Usage page or something.
- Moving internals from Pydantic to `attrs` and `cattrs`.
- Provider-agnostic Response Models where the `llm.response_model` decorator is the default.
- No need for `BaseModel` when implementing an eval or agent -- just use `llm.response_model` or implement an `__init__`.
- Removing the `gemini` and `vertex` providers in favor of the `google` provider only.
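As a very rough sketch of that routing pattern (module layout and function names here are made up for illustration):

```python
# Illustrative sketch of a utility module that routes on a provider argument,
# similar in spirit to the costs module mentioned above.
from typing import Any, Callable

Message = dict[str, Any]


def _convert_messages_openai(messages: list[Message]) -> list[Message]:
    return messages  # OpenAI-specific conversion would live here


def _convert_messages_anthropic(messages: list[Message]) -> list[Message]:
    return messages  # Anthropic-specific conversion would live here


_CONVERTERS: dict[str, Callable[[list[Message]], list[Message]]] = {
    "openai": _convert_messages_openai,
    "anthropic": _convert_messages_anthropic,
}


def convert_messages(provider: str, messages: list[Message]) -> list[Message]:
    """Route common messages to the correct provider-specific converter."""
    try:
        return _CONVERTERS[provider](messages)
    except KeyError:
        raise ValueError(f"Unsupported provider: {provider}") from None
```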
I will further flesh out the above `v2.0` roadmap into sub-issues once things become more clear around specifics of implementation details and plan.

Feedback
Any and all feedback, comments, questions, etc. are welcome with open arms!
I will compile everything here as makes sense and update the roadmap accordingly.
Compiled Notes
- `v2.0` should be "model-agnostic" (rather than "provider-agnostic"). This distinction is important. The provider is an implementation detail (e.g. the same exact model can be hosted on various providers). From the user's perspective, it's really the model's capabilities that matter, and we need to make sure we properly handle and type things on that basis (e.g. enable passing in reasoning options if the model supports reasoning).
- We can start by auto-generating per-model type signatures for the providers in `v1.x` and see how we feel about it.

Final Notes
Let's make Mirascope the standard interface for building with LLMs! I'm really excited about the library and direction, so I hope everyone else is too.