Optimize journal table indices #496

Closed


Arkatufus
Contributor

Fixes #495

Changes

Optimize journal table query speed by adding indices on (persistence_id) and (persistence_id, ordering)
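
A minimal T-SQL sketch of what the two indices described above might look like. The table and column names come from the queries later in this thread, and `IX_journal_persistence_id` is the name used in the forced-index query; the name of the composite index is a placeholder assumed here, not taken from the PR diff.

```sql
-- Sketch only: the actual DDL in the PR may differ.
-- Single-column index on persistence_id (name as used in the forced-index query).
CREATE NONCLUSTERED INDEX [IX_journal_persistence_id]
    ON [journal] ([persistence_id]);

-- Composite index on (persistence_id, ordering).
-- The index name below is hypothetical.
CREATE NONCLUSTERED INDEX [IX_journal_persistence_id_ordering]
    ON [journal] ([persistence_id], [ordering]);
```

Every additional nonclustered index has to be maintained on each insert into `journal`, which is why the write benchmarks matter as much as the read benchmarks here.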

@Arkatufus
Contributor Author

BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5131/22H2/2022Update)
AMD Ryzen 9 3900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 8.0.111
  [Host]     : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX2

dev branch tag table - csv benchmark

| Method | TagMode | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---:|---:|---:|---:|---:|---:|---:|
| QueryByTag10 | Csv | 2,422.617 ms | 48.3860 ms | 59.4224 ms | - | - | - | 526.11 KB |
| QueryByTag100 | Csv | 2,386.379 ms | 28.3730 ms | 25.1520 ms | - | - | - | 1355.14 KB |
| QueryByTag1000 | Csv | 2,399.557 ms | 45.5216 ms | 42.5809 ms | 1000.0000 | - | - | 10571.98 KB |
| QueryByTag10000 | Csv | 2,514.846 ms | 34.4976 ms | 30.5812 ms | 12000.0000 | 3000.0000 | - | 101477.38 KB |
| QueryByTag10 | TagTable | 4.303 ms | 0.1250 ms | 0.3647 ms | 39.0625 | - | - | 347.56 KB |
| QueryByTag100 | TagTable | 5.713 ms | 0.1133 ms | 0.2558 ms | 148.4375 | 31.2500 | - | 1240.29 KB |
| QueryByTag1000 | TagTable | 23.715 ms | 0.4728 ms | 0.9332 ms | 1281.2500 | 468.7500 | - | 10461.22 KB |
| QueryByTag10000 | TagTable | 209.548 ms | 4.1742 ms | 11.4268 ms | 12500.0000 | 3000.0000 | 500.0000 | 101584.23 KB |

This PR benchmark

| Method | TagMode | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---:|---:|---:|---:|---:|---:|---:|
| QueryByTag10 | Csv | 2,412.107 ms | 44.5888 ms | 41.7083 ms | - | - | - | 527.56 KB |
| QueryByTag100 | Csv | 2,456.264 ms | 48.5973 ms | 69.6968 ms | - | - | - | 1355.56 KB |
| QueryByTag1000 | Csv | 2,413.896 ms | 42.7739 ms | 37.9180 ms | 1000.0000 | - | - | 10510.35 KB |
| QueryByTag10000 | Csv | 2,567.690 ms | 50.4452 ms | 51.8035 ms | 12000.0000 | 3000.0000 | - | 101474.84 KB |
| QueryByTag10 | TagTable | 4.054 ms | 0.0805 ms | 0.1662 ms | 39.0625 | - | - | 347.65 KB |
| QueryByTag100 | TagTable | 5.465 ms | 0.1061 ms | 0.2307 ms | 148.4375 | 31.2500 | - | 1250.44 KB |
| QueryByTag1000 | TagTable | 23.638 ms | 0.4622 ms | 0.7464 ms | 1281.2500 | 562.5000 | - | 10441.6 KB |
| QueryByTag10000 | TagTable | 206.246 ms | 4.0650 ms | 7.2256 ms | 12500.0000 | 3000.0000 | 500.0000 | 101577.28 KB |

@Arkatufus
Contributor Author

MS SQL Server dev VS new benchmark comparison

| Test | dev | Optimized | Optimized VS dev |
|---|---:|---:|---:|
| Persist | 2,068.46 | 2,043.64 | -1.20% |
| PersistAsync | 239,005.74 | 216,919.74 | -9.24% |
| PersistAll | 58,146.30 | 54,936.00 | -5.52% |
| PersistAllAsync | 239,291.70 | 210,748.16 | -11.93% |
| PersistGroup10 | 10,799.25 | 10,547.86 | -2.33% |
| PersistGroup25 | 23,002.78 | 20,963.04 | -8.87% |
| PersistGroup50 | 38,626.44 | 33,891.41 | -12.26% |
| PersistGroup100 | 51,856.46 | 52,222.05 | 0.70% |
| PersistGroup200 | 41,349.65 | 36,129.78 | -12.62% |
| Recovering | 67,114.09 | 64,267.35 | -4.24% |
| RecoveringTwo | 43,224.55 | 42,140.75 | -2.51% |
| RecoveringFour | 50,352.47 | 50,511.43 | 0.32% |
| Recovering8 | 56,927.35 | 55,031.99 | -3.33% |
| Recovering 100 | 63,678.04 | 64,110.37 | 0.68% |
| Recovering 500 | 64,248.77 | 64,826.29 | 0.90% |
| Recovering 1000 | 64,854.17 | 64,861.44 | 0.01% |
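
The "Optimized VS dev" column appears to be the relative change of the optimized run against dev, in percent. The sketch below assumes that reading and checks it against the Persist row of the SQL Server comparison; the variable names are illustrative only.

```sql
-- Relative change of the optimized run against dev, in percent.
-- Input values are the Persist row from the SQL Server comparison.
DECLARE @dev       DECIMAL(18, 2) = 2068.46;
DECLARE @optimized DECIMAL(18, 2) = 2043.64;

-- (optimized - dev) / dev * 100, rounded to two decimal places.
SELECT CAST((@optimized - @dev) / @dev * 100 AS DECIMAL(9, 2)) AS [optimized_vs_dev_pct];
```

For the Persist row this works out to (2,043.64 - 2,068.46) / 2,068.46 × 100 ≈ -1.20, which matches the value reported in the table.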

@Arkatufus
Contributor Author

PostgreSQL dev VS new benchmark comparison

| Test | dev | Optimized | Optimized VS dev |
|---|---:|---:|---:|
| Persist | 3,895.90 | 3,782.61 | -2.91% |
| PersistAsync | 124,906.32 | 123,609.39 | -1.04% |
| PersistAll | 129,466.60 | 117,439.81 | -9.29% |
| PersistAllAsync | 125,786.16 | 124,626.12 | -0.92% |
| PersistGroup10 | 17,884.93 | 18,042.07 | 0.88% |
| PersistGroup25 | 36,952.18 | 33,708.62 | -8.78% |
| PersistGroup50 | 56,593.10 | 50,735.67 | -10.35% |
| PersistGroup100 | 92,293.49 | 86,482.75 | -6.30% |
| PersistGroup200 | 70,487.07 | 59,269.80 | -15.91% |
| Recovering | 105,485.23 | 106,837.61 | 1.28% |
| RecoveringTwo | 42,817.38 | 43,020.00 | 0.47% |
| RecoveringFour | 51,314.95 | 51,092.09 | -0.43% |
| Recovering8 | 57,016.61 | 57,240.98 | 0.39% |
| Recovering 100 | 63,678.04 | 63,829.65 | 0.24% |
| Recovering 500 | 64,248.77 | 64,380.64 | 0.21% |
| Recovering 1000 | 64,301.36 | 64,451.55 | 0.23% |

@Arkatufus Arkatufus marked this pull request as ready for review December 10, 2024 22:30
@Arkatufus
Contributor Author

To be honest, the numbers are not good. It gives a fraction of query/read speed at the cost of write speed.

@Aaronontheweb
Member

> To be honest, the numbers are not good. It gives a fraction of query/read speed at the cost of write speed.

Yes, but these benchmarks don't test what happens with a large, pre-existing data set sitting inside the journal and tag tables, which is how 99% of successful Akka.NET applications run given a modest amount of time, so the measurements aren't realistic here.

@Aaronontheweb
Member

Relevant issue: akkadotnet/akka.net#5503

@Arkatufus
Contributor Author

Arkatufus commented Dec 19, 2024

Execution Plan Benchmarking

We ran all of the SQL queries generated by LinqToDb against a SQL Server 2022 instance running in Docker and captured the actual execution plan for each. These were the findings:

Observations

Actor recovery SQL query

/* Actor recovery */
DECLARE @take Int -- Int32
SET     @take = 1000
DECLARE @persistenceId NVarChar(255) -- String
SET     @persistenceId = N'PersistPid1'
DECLARE @fromSequenceNr BigInt -- Int64
SET     @fromSequenceNr = 1
DECLARE @toSequenceNr BigInt -- Int64
SET     @toSequenceNr = 1000

SELECT TOP (@take)
  [r].[ordering],
  [r].[created],
  [r].[deleted],
  [r].[persistence_id],
  [r].[sequence_number],
  [r].[message],
  [r].[manifest],
  [r].[identifier],
  [r].[writer_uuid]
FROM
  [journal] [r]
WHERE
  [r].[persistence_id] = @persistenceId AND
  [r].[sequence_number] >= @fromSequenceNr AND
  [r].[sequence_number] <= @toSequenceNr AND
  [r].[deleted] = 0
ORDER BY
  [r].[sequence_number]

image

GetHighestOrderingNr SQL Query

/* Highest ordering */
SELECT
  Max([r].[ordering])
FROM
  [journal] [r]

image

CurrentPersistenceIds SQL Query

/* CurrentPersistenceIds query */
DECLARE @take Int -- Int32
SET     @take = 2147483647
SELECT DISTINCT TOP (@take)
  [r].[persistence_id]
FROM
  [journal] [r]
WHERE
  [r].[deleted] = 0

image

CurrentPersistenceIds SQL Query With Forced Index

/* CurrentPersistenceIds query */
DECLARE @take Int -- Int32
SET     @take = 2147483647
SELECT DISTINCT TOP (@take)
  [r].[persistence_id]
FROM
  [journal] [r]
WITH(INDEX(IX_journal_persistence_id)) -- Force query to use the new index
WHERE
  [r].[deleted] = 0

image
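
One way to compare the optimizer's own plan against the forced-index plan is to run both variants with timing and I/O statistics switched on. This is only a sketch, assuming the `journal` table and the `IX_journal_persistence_id` index exist as in the queries above; it is not necessarily how the measurements in this thread were taken.

```sql
SET STATISTICS TIME ON;  -- report parse/compile and execution times per statement
SET STATISTICS IO ON;    -- report logical and physical reads per table

-- Variant 1: let the optimizer choose the plan.
-- TOP (@take) is omitted here; with @take = 2147483647 it does not limit the result.
SELECT DISTINCT
  [r].[persistence_id]
FROM
  [journal] [r]
WHERE
  [r].[deleted] = 0;

-- Variant 2: force the new index with a table hint.
SELECT DISTINCT
  [r].[persistence_id]
FROM
  [journal] [r] WITH (INDEX(IX_journal_persistence_id))
WHERE
  [r].[deleted] = 0;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The statistics appear in the Messages output of the client, one block per statement, so the two variants can be compared side by side.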

CurrentEventsByTag SQL Query

DECLARE @take Int -- Int32
SET     @take = 500
DECLARE @Offset BigInt -- Int64
SET     @Offset = 0
DECLARE @MaxOffset BigInt -- Int64
SET     @MaxOffset = 0
DECLARE @Tag NVarChar(64) -- String
SET     @Tag = N'Tag1'
SELECT TOP (@take)
  [x].[ordering],
  [x].[created],
  [x].[deleted],
  [x].[persistence_id],
  [x].[sequence_number],
  [x].[message],
  [x].[manifest],
  [x].[identifier],
  [x].[writer_uuid],
    (
      SELECT STRING_AGG([r].[tag], N';')
      FROM [tags] [r]
      WHERE [r].[ordering_id] = [x].[ordering]
    )
FROM
  [journal] [x]
    LEFT JOIN [tags] [jtr] ON [jtr].[ordering_id] = [x].[ordering]
WHERE
  [jtr].[ordering_id] > @Offset AND
  [jtr].[ordering_id] <= @MaxOffset AND
  [x].[deleted] = 0 AND
  [jtr].[tag] = @Tag
ORDER BY
  [x].[ordering]

image

Findings

The new indices were not used in any of the generated SQL statements, and forcing the queries to use them actually hurts performance (execution time rises from 5 seconds to 1 minute 25 seconds).
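
Beyond reading execution plans, index usage can also be checked from SQL Server's usage counters. The query below is a sketch assuming the table is named `journal` in the current database; it lists every index on the table with its read and write counters.

```sql
-- Seek/scan/lookup/update counters per index on the journal table.
-- The counters are reset when the SQL Server instance restarts.
-- An index that has not been touched since then has no row in the DMV,
-- so the LEFT JOIN returns NULL counters for it.
SELECT
  i.[name]          AS [index_name],
  s.[user_seeks],
  s.[user_scans],
  s.[user_lookups],
  s.[user_updates]
FROM
  sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS s
      ON  s.[object_id]   = i.[object_id]
      AND s.[index_id]    = i.[index_id]
      AND s.[database_id] = DB_ID()
WHERE
  i.[object_id] = OBJECT_ID(N'journal');
```

An index whose `user_updates` keeps growing while `user_seeks`, `user_scans`, and `user_lookups` stay at zero is being maintained on writes without ever serving a read, which would be consistent with the finding above.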

Conclusion

The new indices do not help with actor recovery or persistence query performance, so we're dropping this PR and closing the issue.

Development

Successfully merging this pull request may close these issues.

Missing indices on tables
2 participants