I'm not including a playground link because this is a race condition that is really hard to reproduce. In our case it happens once every few days, while we issue thousands of queries per second to our databases.
The issue shouldn't be difficult to understand, given that the scanning code uses the SQL API provided by Go incorrectly.
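The rows.Err() method called here (gorm/scan.go, line 353 in a9d2729) can cause a deadlock. This method should be called if and only if we are sure that the underlying sql.Rows instance is closed. Unfortunately, that's not always the case here, as it can be called just after the scan operation is done: https://github.com/swojtasiak/gorm/blob/a9d27293de2267a36fa6c9f8892977d3159cf8ea/scan.go#L346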
As the documentation says, every call to rows.Err() should be guarded by a call to rows.Next(); only if Next() returns false can we be sure that it's safe to call rows.Err().
Calling it earlier is not safe, because every call to rows.Scan() may leave the rows instance in a read-locked state by acquiring a read lock on an internal RWMutex. The lock is held until rows.Close() or rows.Next() is called.
When rows.Err() is called, it tries to acquire the read lock again before accessing the error instance. In most cases that's not an issue, because the lock is held only for a moment and an RWMutex may be read-locked more than once at the same time; it just increments its internal reader count.
The problem arises when a call to rows.Scan() leaves the rows instance in a locked state and before rows.Err() is called, another goroutine tries to close the rows instance due to a context cancellation, like here:
The rows.close() method tries to acquire a WRITE lock on the same RWMutex and blocks on it. (Remember that the call to rows.Scan() did not release the read lock; holding it lets the scanning code read the returned rows safely by preventing closes.) Then rows.Err() is called and tries to acquire the READ lock again, and we end up with a deadlock: once a writer is waiting on an RWMutex, new read locks are no longer granted, so Err() waits behind the pending close while the close waits for the read lock held since Scan().
The simplest fix would be to wrap the Err() call in another Next() check. ScanRows looks like it was meant to be called only once per rows instance:
if !rows.Next() {
	if err := rows.Err(); err != nil && err != db.Error {
		db.AddError(err)
	}
}
But for some reason this is not the case in the tests, where it's called twice on a single Rows instance. Since I have no clue what the author had in mind when implementing this operation, I won't provide a PR.
It's a really critical bug for heavily loaded databases, because it causes DB locks that have to be cleared manually.
edit:
OK, I found that it's supposed to be called multiple times for one rows instance. Here's a sample from the documentation:
rows, err := db.Model(&User{}).Where("name = ?", "jinzhu").Rows()
defer rows.Close()
for rows.Next() {
	var user User
	// ScanRows scans a row into a struct
	db.ScanRows(rows, &user)
	// Perform operations on each user
}
So it shouldn't call rows.Err() at all.