This repository contains a Python script (`repo_to_text.py`) designed to process a local GitHub repository by concatenating the content of its files into text files, with a specified word limit per file. The script creates a consolidated view of a repository's content while excluding certain directories and file types.
Once the text files are generated, you can upload them to a Google NotebookLM notebook and start interacting with the code, asking questions about implementations and other details.
- Processes all files in a specified repository.
- Skips specified directories and file patterns.
- Concatenates file contents into text files with a maximum word limit.
- Outputs the processed content into numbered text files.
- Provides command-line options for customization.
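The steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual contents of `repo_to_text.py`: the function names (`should_skip`, `iter_repo_files`, `chunk_by_words`, `repo_to_text`), the per-file header format, and the `repo_part_N.txt` output naming are all hypothetical.

```python
import fnmatch
import os
from pathlib import Path


def should_skip(name, patterns):
    """Return True if a file name matches any glob pattern (e.g. "*.md")."""
    return any(fnmatch.fnmatch(name, pat) for pat in patterns)


def iter_repo_files(repo_path, skip_dirs, skip_patterns):
    """Yield file paths under repo_path, pruning skipped directories."""
    for root, dirs, files in os.walk(repo_path):
        # Prune in place so os.walk never descends into skipped directories.
        dirs[:] = sorted(d for d in dirs if d not in skip_dirs)
        for name in sorted(files):
            if not should_skip(name, skip_patterns):
                yield Path(root) / name


def chunk_by_words(texts, max_words):
    """Group texts into chunks whose total word count stays within max_words.

    A single text longer than max_words is kept whole in its own chunk.
    """
    chunk, count = [], 0
    for text in texts:
        n = len(text.split())
        if chunk and count + n > max_words:
            yield "\n".join(chunk)
            chunk, count = [], 0
        chunk.append(text)
        count += n
    if chunk:
        yield "\n".join(chunk)


def repo_to_text(repo_path, output_dir, max_words=200_000,
                 skip_dirs=(".git",), skip_patterns=("*.png",)):
    """Write numbered text files and return their paths."""
    texts = []
    for path in iter_repo_files(repo_path, set(skip_dirs), skip_patterns):
        body = path.read_text(encoding="utf-8", errors="ignore")
        rel = path.relative_to(repo_path)
        texts.append(f"--- FILE: {rel} ---\n{body}")  # hypothetical header format
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, chunk in enumerate(chunk_by_words(texts, max_words), start=1):
        out = out_dir / f"repo_part_{i}.txt"  # hypothetical naming scheme
        out.write_text(chunk, encoding="utf-8")
        written.append(out)
    return written
```

In this sketch the header line counts toward the word limit, and files are never split across output files; the real script may make different choices.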
- Clone the repository:
git clone https://github.com/jmlb/repoClerk.git
cd repoClerk
- Install the required Python packages:
pip install -r requirements.txt
Run the script using the following command:
python repo_to_text.py --cfg <path-to-config-yaml-file> --repo_path <path-to-repo> [options]
- `--cfg`: Path to the config file listing the file extensions and directories to skip during processing.
- `--repo_path`: Path to the local GitHub repository (absolute or relative).
- `-w`, `--max_words`: Maximum number of words per output file (default: 200,000).
- `-o`, `--output_dir`: Directory to save the output files (default: current directory).
- `--skip_patterns`: Additional file patterns to skip (e.g., `"*.md" "*.txt"`).
- `--skip_dirs`: Additional directories to skip.
- `-v`, `--verbose`: Enable verbose output.
Here is a longer example:
python repo_to_text.py --cfg config.yaml --repo_path ./my_repo -w 100000 -o ./output --skip_patterns "*.md" "*.txt" --skip_dirs "tests" -v
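For readers who want to see how options like these are typically wired up, here is a minimal `argparse` sketch that mirrors the documented flags. It is not taken from the script itself: treating `--cfg` and `--repo_path` as required, and accepting several values after `--skip_patterns` and `--skip_dirs` via `nargs="*"`, are assumptions based on the usage line and example above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (a sketch)."""
    parser = argparse.ArgumentParser(
        description="Concatenate a repository into word-limited text files.")
    parser.add_argument("--cfg", required=True,
                        help="Path to the YAML config listing what to skip")
    parser.add_argument("--repo_path", required=True,
                        help="Path to the local repository (absolute or relative)")
    parser.add_argument("-w", "--max_words", type=int, default=200_000,
                        help="Maximum number of words per output file")
    parser.add_argument("-o", "--output_dir", default=".",
                        help="Directory to save the output files")
    # nargs="*" lets several values follow a single flag.
    parser.add_argument("--skip_patterns", nargs="*", default=[],
                        help='Additional file patterns to skip, e.g. "*.md" "*.txt"')
    parser.add_argument("--skip_dirs", nargs="*", default=[],
                        help="Additional directories to skip")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable verbose output")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--cfg", "config.yaml", "--repo_path", "./my_repo", "-w", "100000",
     "--skip_patterns", "*.md", "*.txt", "--skip_dirs", "tests", "-v"])
print(args.max_words, args.skip_patterns, args.skip_dirs, args.verbose)
```

Quoting the patterns on the command line matters: an unquoted `*.md` would be expanded by the shell before the script ever sees it.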
- Clone the repo:
git clone https://github.com/crewAIInc/crewAI.git
- Run `repo_to_text.py` on the repo directory:
python repo_to_text.py --cfg config.yaml --repo_path repos/crewai
- Upload the generated files to NotebookLM: https://notebooklm.google.com/
You are now ready to talk with the repo. Here are a few examples of interactions with NotebookLM.
Prompt / Response
User Prompt
Implement a random example of a crew of agents.
Gemini Response
Here is an example of a simple crew of agents, drawn from the provided sources:
- The example uses the `crewai` framework, specifically demonstrating how to create a basic crew with two agents: a `researcher` and a `writer`.
- The `researcher` agent is assigned the task of conducting research on the latest advancements in AI in 2024, identifying trends, technologies, and industry impacts. They are expected to compile their findings into a detailed report.
- The `writer` agent uses the research findings from the `researcher` to create a blog post about the advancements.
- This example utilizes a sequential process, meaning the tasks are completed one after the other.
- The `Crew` object is initialized with the agents, tasks, and additional settings.
Code Snippet:
import os

from crewai import Agent, Task, Crew
from crewai_tools import SerperDevTool

os.environ["SERPER_API_KEY"] = "Your Key"  # serper.dev API key
os.environ["OPENAI_API_KEY"] = "Your Key"

# Loading Tools
search_tool = SerperDevTool()

# Define your agents with roles, goals, tools, and additional attributes
researcher = Agent(
    role='Senior Research Analyst',
    goal='Provide in-depth research and analysis of the AI landscape',
    backstory="An experienced researcher with a passion for uncovering insights.",
    verbose=True,
    allow_delegation=True,
    tools=[search_tool],
    cache=False,  # Disable cache for this agent
)
writer = Agent(
    role='AI Content Writer',
    goal='Craft engaging and informative content about AI',
    backstory="With a deep understanding of the tech industry, you transform complex concepts into compelling narratives.",
    verbose=True,
    allow_delegation=True,
    tools=[search_tool],
    cache=False,  # Disable cache for this agent
)

# Create tasks for your agents
task1 = Task(
    description=(
        "Conduct a comprehensive analysis of the latest advancements in AI in 2024. "
        "Identify key trends, breakthrough technologies, and potential industry impacts. "
        "Compile your findings in a detailed report."
    ),
    expected_output='A detailed report on the latest AI advancements in 2024',
    agent=researcher,
)
task2 = Task(
    description=(
        "Craft a captivating blog post based on the provided research report. "
        "Highlight the most significant AI advancements, their implications, and future potential. "
        "Ensure the post is engaging, informative, and accessible to a broad audience."
    ),
    expected_output='A compelling 3 paragraphs blog post formatted as markdown about the latest AI advancements in 2024',
    agent=writer,
    context=[task1],  # Pass the output of task1 as context for task2
)

# Instantiate your crew with a sequential process
crew = Crew(
    agents=[researcher, writer],
    tasks=[task1, task2],
    verbose=True,
    memory=True,
    planning=True,  # Enable planning feature for the crew
)

# Get your crew to work!
result = crew.kickoff()
print("######################")
print(result)
This example showcases the fundamental structure of creating a simple crew using the `crewai` framework. By defining roles, goals, and tasks for each agent, and configuring the `Crew` object, you can establish a basic multi-agent system for task automation.
Prompt / Response
User Prompt
I want each agent within my multi-agent setup to log their individual token usage during interactions. How can I implement this functionality to monitor and print each agent's token usage after every interaction? Additionally, where in the existing code should this feature be integrated for optimal tracking and minimal performance impact? Please provide guidance on which methods or modules would be most appropriate to modify or extend to capture token usage data effectively.
Gemini Response
While the sources don't explicitly outline how to print the token usage of each agent individually, they do provide insights into how token usage is tracked and accessed within the CrewAI framework. Based on this information, we can devise a potential implementation strategy.
The `Crew` class in CrewAI offers a `usage_metrics` attribute, which stores a summary of token usage across all tasks and agents. You can access this data after the crew's execution using:
crew = Crew(agents=[agent1, agent2], tasks=[task1, task2])
crew.kickoff()
print(crew.usage_metrics)
This provides the total token usage for the entire crew but doesn't break it down per agent.
The sources suggest that token usage is tracked at the agent level using the `_token_process` attribute. This attribute is an instance of the `TokenProcess` class, which contains methods for summing prompt tokens, completion tokens, and successful requests. You can find this in:
- [Source 2: `TokenProcess` Class Definition]
- [Source 2: `calculate_usage_metrics` Method]
To achieve your goal of printing each agent's token usage, you'd need to:
a) Access the `_token_process` attribute: After the crew's execution, iterate through the `agents` list in the `Crew` object and access the `_token_process` attribute of each agent.
b) Call the `get_summary()` method: Invoke the `get_summary()` method on the `_token_process` object to retrieve a `UsageMetrics` object containing the agent's token usage summary.
c) Print the summary: Print the relevant information from the `UsageMetrics` object, such as `total_tokens`, `prompt_tokens`, and `completion_tokens`.
crew = Crew(agents=[agent1, agent2], tasks=[task1, task2])
crew.kickoff()
for agent in crew.agents:
    # _token_process is a private attribute and may change between versions
    token_summary = agent._token_process.get_summary()
    print(f"Agent: {agent.role}")
    print(f" Total Tokens: {token_summary.total_tokens}")
    print(f" Prompt Tokens: {token_summary.prompt_tokens}")
    print(f" Completion Tokens: {token_summary.completion_tokens}")
- Private Attribute: The `_token_process` attribute is prefixed with an underscore, indicating that it's intended for internal use within the class. Accessing it directly might be considered a violation of encapsulation principles.
  - Potential Solution: You could consider contributing to the CrewAI project by proposing a public method or property that exposes this information in a controlled manner.
- Framework Updates: Keep in mind that software frameworks evolve. The specific implementation details might change in future versions of CrewAI.
- The `calculate_usage_metrics` method in the `Crew` class ([Source 2]) demonstrates how to aggregate token usage data from multiple agents. This might be useful for understanding how token usage is accumulated at the crew level.
- The `TokenCalcHandler` class ([Source 2]) is used as a callback to track token usage during agent execution. This highlights how CrewAI integrates token tracking into its execution pipeline.
By combining the insights from the sources and the proposed implementation strategy, you should be able to enhance the CrewAI framework to provide more granular visibility into agent-level token usage, allowing for better resource monitoring and optimization.
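Because the response above leans on a private attribute that may be renamed or removed, a more defensive version of the reporting loop may be worth considering. The sketch below is an assumption-laden illustration: `agent_token_usage` is a hypothetical helper, it only relies on the attribute names quoted in the response (`_token_process`, `get_summary`, `total_tokens`, `prompt_tokens`, `completion_tokens`), and it is exercised with stand-in objects rather than real CrewAI classes, so it has not been checked against any particular CrewAI version.

```python
from types import SimpleNamespace


def agent_token_usage(agents):
    """Collect per-agent token counts, tolerating a missing private attribute.

    Returns a dict keyed by agent role. The value is None for any agent where
    `_token_process` (or its `get_summary` method) is absent.
    """
    report = {}
    for agent in agents:
        process = getattr(agent, "_token_process", None)
        get_summary = getattr(process, "get_summary", None)
        if get_summary is None:
            report[agent.role] = None
            continue
        summary = get_summary()
        report[agent.role] = {
            "total_tokens": getattr(summary, "total_tokens", None),
            "prompt_tokens": getattr(summary, "prompt_tokens", None),
            "completion_tokens": getattr(summary, "completion_tokens", None),
        }
    return report


# Stand-in objects (not CrewAI classes) so the sketch runs without the framework.
fake_summary = SimpleNamespace(total_tokens=30, prompt_tokens=20, completion_tokens=10)
fake_agents = [
    SimpleNamespace(role="researcher",
                    _token_process=SimpleNamespace(get_summary=lambda: fake_summary)),
    SimpleNamespace(role="writer"),  # simulates a version without the attribute
]
for role, usage in agent_token_usage(fake_agents).items():
    print(role, usage)
```

With a real crew, the call would presumably be `agent_token_usage(crew.agents)` after `crew.kickoff()`. Note that agents sharing the same role would overwrite each other in this dict.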
`wav2vid.py` is a Python script that generates a video using a fixed image and an audio file (WAV format), overlaying today's date on the image. The resulting video displays the image throughout the entire duration of the audio.
- Overlays today's date on the input image.
- Synchronizes the image display with the audio duration.
- Outputs a high-quality video in MP4 format.
- Command:
python wav2vid.py path/to/image.jpg path/to/audio.wav path/to/output_video.mp4
- Example:
python wav2vid.py sample_image.jpg background_audio.wav output_video.mp4
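How `wav2vid.py` actually builds the video is not shown in this README. The sketch below only illustrates two of the steps described above using the Python standard library: reading the audio duration (which determines how long the image must be shown) and producing the date text to overlay. The function names and the ISO date format are assumptions made for the example.

```python
import datetime
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def date_label(today=None):
    """Return the text to overlay on the image (ISO format is an assumption)."""
    today = today or datetime.date.today()
    return today.isoformat()


# Demo: write one second of 16-bit mono silence, then measure it.
tmp = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(tmp, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 2 bytes per sample = 16-bit audio
    wav.setframerate(8000)
    wav.writeframes(b"\x00\x00" * 8000)

print(wav_duration_seconds(tmp), date_label(datetime.date(2024, 10, 5)))
```

The duration returned here is the value a video writer would use as the length of the still-image clip.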