Replies: 3 comments
-
The approach suggested focuses on the ongoing flow of an app, leveraging `git diff` for real-time test generation. However, for `git diff` to be meaningful, the agent needs initial context or pre-existing test coverage. Without this, the agent wouldn't know what the changes are affecting, or what is expected behavior versus a genuine issue. For example, if we assigned this task to a human QA, they would first need to understand what the app does, establish a baseline of test cases (with an instance of the app before the merge/commit), and then analyze what the diff changes. For AI to generate relevant tests, it needs a baseline: a snapshot of the app's expected behavior. This means:

1. **Pre-existing tests or context.** Even without a fixed test suite, the system needs a way to understand:

   This could come from:
2. **Understanding changes in context.**
3. **Generating targeted test cases.** Once the AI understands the context of the change, it can dynamically:
To summarize, a first-time setup flow is important for helping the agent succeed in ongoing flows. This step doesn't have to be manual (as it is currently); it can be automated. I outlined this in a scoping doc last week, summarizing:
This way, every subsequent `git diff` is relative to something, not just an isolated patch of code.
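To make the baseline idea concrete, here is a minimal sketch of how a diff could be interpreted relative to a baseline. It assumes a one-time setup pass has already produced a map from source files to the behaviors they cover; the `BASELINE` map and function names are illustrative, not part of any existing tool.

```python
# Hypothetical baseline built during first-time setup:
# source file -> behaviors the baseline run observed it affecting.
BASELINE = {
    "app/checkout/page.tsx": ["checkout flow", "payment validation"],
    "app/login/page.tsx": ["login flow"],
    "lib/cart.ts": ["cart totals"],
}

def affected_behaviors(changed_files):
    """Map changed files (e.g. output of `git diff --name-only`)
    to the baseline behaviors they may affect, preserving order."""
    hits = []
    for path in changed_files:
        for behavior in BASELINE.get(path, []):
            if behavior not in hits:
                hits.append(behavior)
    return hits

# A diff touching checkout and cart code points the agent at three behaviors.
changed = ["app/checkout/page.tsx", "lib/cart.ts"]
print(affected_behaviors(changed))
```

With a map like this, an isolated patch becomes a pointer into known expected behavior, which is exactly what the agent is missing without the setup step.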
-
A first MVP towards this is having tools that can detect the framework used (e.g. NextJS), analyze the code, write a test plan, write tests, and then execute the tests. High-level steps (actual commands may differ):
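The first step above, framework detection, could be as simple as inspecting `package.json`. A sketch of that heuristic, assuming the agent has the file's contents as text (the function name and returned labels are made up for illustration):

```python
import json

def detect_framework(package_json_text):
    """Guess the web framework from package.json dependencies.
    A deliberately simple heuristic for the MVP's first step."""
    pkg = json.loads(package_json_text)
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
    if "next" in deps:
        return "nextjs"
    if "react" in deps:
        return "react"
    return "unknown"

sample = '{"dependencies": {"next": "14.2.0", "react": "18.3.1"}}'
print(detect_framework(sample))  # nextjs
```

Knowing the framework lets the later steps pick sensible defaults (dev-server command, routing conventions, where pages live) when writing the test plan.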
-
From some conversations last week, I was interested in exploring further the idea of not having QA as part of the codebase / build step at all. Instead, it would only be responsible for maintaining a high-quality production experience; business logic would still be tested by unit tests.
This is how a QA agent could work for production deployments on `main`.
For feature deployments, bugs are left as comments on the PR instead.
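The routing rule above (production findings go to a tracker, feature-deployment findings become PR comments) can be sketched in a few lines. The destination labels and the deployment dict shape are assumptions for illustration, not an existing API:

```python
def report_issue(deployment, issue_text):
    """Route a found issue based on deployment type.
    Sketch only; 'issue_tracker' / 'pr_comment' destinations are illustrative."""
    if deployment["type"] == "production":
        return {"destination": "issue_tracker", "body": issue_text}
    # Feature deployments: leave the bug as a comment on the PR.
    return {"destination": f"pr_comment:{deployment['pr']}", "body": issue_text}

print(report_issue({"type": "feature", "pr": 42}, "Broken nav on mobile"))
```

Keeping the routing decision in one place means the generator and runner prompts don't need to know where findings end up.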
Shortest generator prompt:
Shortest test runner prompt:
The resulting issues would be recorded with a single tool, `note issue`, which takes a text note and screenshots of the UX (ideally, we would support something like instant replay somehow).
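A minimal sketch of what that single tool's record and call shape could look like; the `IssueNote` type and in-memory `issues` list are hypothetical stand-ins for whatever store the agent actually writes to:

```python
from dataclasses import dataclass, field

@dataclass
class IssueNote:
    """Record produced by the hypothetical `note issue` tool:
    a free-text note plus screenshot paths of the UX at failure time."""
    note: str
    screenshots: list = field(default_factory=list)

issues = []  # stand-in for the agent's real issue store

def note_issue(note, screenshots=None):
    """The one tool the test runner is given for reporting findings."""
    issues.append(IssueNote(note, screenshots or []))

note_issue("Checkout button unresponsive after coupon applied",
           screenshots=["shots/checkout-01.png"])
print(len(issues), issues[0].note)
```

A replay artifact (e.g. a session recording path) could later be added as a third field without changing the tool's call signature for the model.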