analyzer/block: Hardcode pre-Eden stale accounts #654

mitjat · 2024-03-07T07:42:44Z

This PR adds lists of accounts that had stale native runtime balances soon(ish) after the start of Eden.

Those stale balances are considered unavoidable because

- before Eden, the runtimes did not emit all relevant/required events, and
- in testnet, we don't have very old archives set up, so nexus has never seen those old txs (applies to emerald only).

See in-code comments for more details.

The PR also fixes statecheck so it can run against testnet.

This PR is a continuation of #617; I created a new PR because I'll be pushing it to completion, and that is easier if I own the PR rather than ptrus.

Testing:

- Ran locally with and without NEXUS_FORCE_MARK_STALE_ACCOUNTS=1.
- Applied in staging mainnet+testnet with NEXUS_FORCE_MARK_STALE_ACCOUNTS=1. Even emerald testnet with 120k stale accounts was enqueued without problems in <10s.

Stats

Output of statechecks:

Mainnet sapphire:
    runtime_test.go:172: Number of discrepancies in account balances: 610 (out of: 12171).
                        - Does not include accounts that are only listed in Nexus or on the chain.
                        - Exempted accounts because of known stale balance in Nexus: 0
                
    runtime_test.go:176: Number of unexpected addresses found in Nexus: 0
    runtime_test.go:177: Number of expected addresses not found in Nexus: 459 (with non-zero balance: 340)

Mainnet emerald:
    runtime_test.go:172: Number of discrepancies in account balances: 12375 (out of: 98594).
                        - Does not include accounts that are only listed in Nexus or on the chain.
                        - Exempted accounts because of known stale balance in Nexus: 2
                
    runtime_test.go:176: Number of unexpected addresses found in Nexus: 0
    runtime_test.go:177: Number of expected addresses not found in Nexus: 3191 (with non-zero balance: 773)

Testnet sapphire:
    runtime_test.go:172: Number of discrepancies in account balances: 574 (out of: 29307).
                        - Does not include accounts that are only listed in Nexus or on the chain.
                        - Exempted accounts because of known stale balance in Nexus: 2
                
    runtime_test.go:176: Number of unexpected addresses found in Nexus: 0
    runtime_test.go:177: Number of expected addresses not found in Nexus: 2544 (with non-zero balance: 1464)
    
Testnet emerald:
    runtime_test.go:172: Number of discrepancies in account balances: 110 (out of: 4995).
                        - Does not include accounts that are only listed in Nexus or on the chain.
                        - Exempted accounts because of known stale balance in Nexus: 0
                
    runtime_test.go:176: Number of unexpected addresses found in Nexus: 0
    runtime_test.go:177: Number of expected addresses not found in Nexus: 168744 (with non-zero balance: 75477)
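
To compare the four runs at a glance, the discrepancy counts above can be expressed as a share of the accounts that were compared. The snippet below is only a convenience sketch that reuses the numbers printed above; the helper name `discrepancy_rate` is made up here and is not part of the statecheck.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): print discrepancies as a
# percentage of the accounts compared, rounded to two decimals.
discrepancy_rate() {
  awk -v d="$1" -v total="$2" 'BEGIN { printf "%.2f%%\n", 100 * d / total }'
}

# Counts copied from the statecheck output above.
echo "mainnet sapphire: $(discrepancy_rate 610 12171)"
echo "mainnet emerald:  $(discrepancy_rate 12375 98594)"
echo "testnet sapphire: $(discrepancy_rate 574 29307)"
echo "testnet emerald:  $(discrepancy_rate 110 4995)"
```

As the output itself notes, these rates leave out accounts that are listed only in Nexus or only on the chain.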

Number of stale accounts extracted from logs, before and after ignoring the ones with balance 0. This PR uses the "after" version.

Before:
  15566 analyzer/runtime/static/pre_eden/mainnet_emerald_9225025.addrs
   1069 analyzer/runtime/static/pre_eden/mainnet_sapphire_2680354.addrs
  17162 analyzer/runtime/static/pre_eden/testnet_emerald_4882110.addrs
   3118 analyzer/runtime/static/pre_eden/testnet_sapphire_5194684.addrs
After:
  13150 analyzer/runtime/static/pre_eden/mainnet_emerald_9225025.addrs
   1003 analyzer/runtime/static/pre_eden/mainnet_sapphire_2680354.addrs
  17162 analyzer/runtime/static/pre_eden/testnet_emerald_4882110.addrs
 129440 analyzer/runtime/static/pre_eden/testnet_emerald_4896940.addrs
   2046 analyzer/runtime/static/pre_eden/testnet_sapphire_5194684.addrs

Methodology of collecting addresses

This is more for my records/reproducibility than anything else. I obtained the list of stale accounts by manually running the statecheck (in k8s), with commands like manual_cronjob mainnet emerald:

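# Re-create and start a one-off statecheck job from the corresponding cronjob.
manual_cronjob() {
  local net=$1
  local layer=$2

  local cronjob="${net}-indexer-statecheck-${layer}"
  local job="oneoff-statecheck-${net}-${layer}-mitjat"
  # Remove any previous one-off job with the same name before re-creating it.
  kubectl -n monitoring delete job "$job"
  kubectl -n monitoring create job --from="cronjob/$cronjob" "$job"
}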
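
Before touching the cluster, it can help to see which commands the function would issue. A dry-run sketch follows; `manual_cronjob_dry_run` and its optional third `suffix` argument are hypothetical additions for illustration (the original hardcodes `mitjat`), and nothing here calls kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run variant: print the kubectl commands instead of running them.
manual_cronjob_dry_run() {
  local net=$1
  local layer=$2
  local suffix=${3:-mitjat}  # assumption: makes the hardcoded owner suffix configurable

  local cronjob="${net}-indexer-statecheck-${layer}"
  local job="oneoff-statecheck-${net}-${layer}-${suffix}"
  echo "kubectl -n monitoring delete job $job"
  echo "kubectl -n monitoring create job --from=cronjob/$cronjob $job"
}

manual_cronjob_dry_run mainnet emerald
```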

Then I fetched the logs with fetch_logs mainnet emerald oneoff:

# Fetch statecheck logs and extract the addresses of stale accounts with a non-zero balance.
# Usage: fetch_logs <net> <layer> [oneoff|<kubectl log target>]
fetch_logs() {
  local net=$1
  local layer=$2
  local target=${3:-}

  if [[ -z "$target" ]]; then
    # No target given: use the first job listed for the regular cronjob.
    local cronjob="${net}-indexer-statecheck-${layer}"
    local job
    job="$(kubectl -n monitoring get jobs | grep "$cronjob" | head -n1 | cut -d' ' -f1)"
    target="jobs/$job"
  elif [[ "$target" == "oneoff" ]]; then
    target="jobs/oneoff-statecheck-${net}-${layer}-mitjat"
  fi
  echo "Fetching logs from $target"

  local base="/tmp/${net}_${layer}"
  kubectl -n monitoring logs "$target" > "$base.log"
  echo "Dumped logs into $base.log"

  local height
  height="$(grep -oE '(Fetching accounts information at height|Fetching state dump at height) [0-9]+' "$base.log" | grep -oE '[0-9]+')"
  local addrs_file="${base}_${height}.addrs"
  # Filter out addrs with balance 0: awk prints a line only if the line after it
  # is not a "Balance: 0" line (the last line of the log is never printed).
  local count
  count="$(awk '{ if (!($0 ~ /runtime_test.go:162: Balance: 0/)) { print prev_line } prev_line=$0 }' "$base.log" \
    | grep -oE 'oasis1[a-z0-9]{40}' | sort -u | tee "$addrs_file" | wc -l)"
  echo "Extracted $count addresses into $addrs_file"
  less "$base.log"
}
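
The two extraction steps in fetch_logs can be tried without a cluster. The sketch below runs the same awk and grep patterns on a tiny made-up log. The `Address:` line format, the `runtime_test.go:161` prefix, the two addresses, and the helper names `sample_log`, `extract_height` and `nonzero_addrs` are assumptions for illustration; only the `Balance: 0` and `Fetching ... at height` patterns are taken from the function above.

```shell
#!/usr/bin/env bash
# Two made-up addresses that match oasis1[a-z0-9]{40}.
addr_zero="oasis1$(printf '%040d' 0 | tr 0 a)"
addr_stale="oasis1$(printf '%040d' 0 | tr 0 b)"

# Hypothetical log excerpt: each address line is followed by its balance line.
sample_log() {
  cat <<EOF
Fetching accounts information at height 9225025
runtime_test.go:161: Address: $addr_zero
runtime_test.go:162: Balance: 0
runtime_test.go:161: Address: $addr_stale
runtime_test.go:162: Balance: 1500
done
EOF
}

# Same two-stage grep as in fetch_logs: match the phrase, then keep the number.
extract_height() {
  grep -oE '(Fetching accounts information at height|Fetching state dump at height) [0-9]+' \
    | grep -oE '[0-9]+'
}

# Same awk + grep as in fetch_logs: print a line only when the line after it
# is not a "Balance: 0" line, then keep the unique addresses.
nonzero_addrs() {
  awk '{ if (!($0 ~ /runtime_test.go:162: Balance: 0/)) { print prev_line } prev_line=$0 }' \
    | grep -oE 'oasis1[a-z0-9]{40}' | sort -u
}

sample_log | extract_height
sample_log | nonzero_addrs
```

Only the address followed by a non-zero balance survives the filter. Note that the `Balance: 0` pattern is not anchored at the end, so a line such as `Balance: 0.5` would also be treated as zero; whether such lines occur depends on the actual log format.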

For testnet emerald, the logs were too large and were truncated by k8s. So I had to fetch them from GCP instead of using kubectl logs:

gcloud logging read 'resource.labels.pod_name="oneoff-statecheck-testnet-emerald-mitjat-b5sm5"' --project=oasis-infra-production --format='get(textPayload)' --order=asc

ptrus

Nice, thanks for finishing this up!

mitjat force-pushed the mitjat/pre-eden-stale-accounts branch from 3cf404a to 70e39e3 Compare March 8, 2024 11:56

mitjat marked this pull request as ready for review March 8, 2024 11:58

mitjat requested review from Andrew7234, pro-wh and ptrus as code owners March 8, 2024 11:58

mitjat changed the title ~~analyzer//block: Hardcode pre-Eden stale accounts~~ analyzer/block: Hardcode pre-Eden stale accounts Mar 8, 2024

mitjat force-pushed the mitjat/pre-eden-stale-accounts branch from 70e39e3 to e1761aa Compare March 8, 2024 12:35

ptrus mentioned this pull request Mar 11, 2024

wip: hardcode known stale accounts at Eden #617

Closed

ptrus approved these changes Mar 11, 2024

ptrus and others added 8 commits March 12, 2024 17:43

analyzer/runtime/static: hardcode known stale accounts at Eden

2f4434d

analyzer/block: log before long init check

07b1ccd

analyzer/runtime/static: load stale accts with go:embed

2b22b32

analyzer/runtime/static: call every runtime round

d3825d4

statecheck: Allow running against testnet

541c154

analyzer/runtime/static: force-apply stale accounts to the DB

8e2a942

analyzer/runtime/static: Add lists of known stale accts

617ea25

statecheck/runtime: Print balance diffs in human-readable format

fdb9e9a

mitjat force-pushed the mitjat/pre-eden-stale-accounts branch from 3fc6509 to fdb9e9a Compare March 12, 2024 16:43

mitjat enabled auto-merge March 12, 2024 16:43

mitjat merged commit ff929e6 into main Mar 12, 2024
10 checks passed

mitjat deleted the mitjat/pre-eden-stale-accounts branch March 12, 2024 16:48
