Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading BLOB from SQLite3 no longer working #1173

Open
Tectu opened this issue Oct 22, 2024 · 7 comments · May be fixed by #1189
Open

Reading BLOB from SQLite3 no longer working #1173

Tectu opened this issue Oct 22, 2024 · 7 comments · May be fixed by #1189

Comments

@Tectu
Copy link
Contributor

Tectu commented Oct 22, 2024

Hi,

I've updated an application using SOCI from 9bcc5f8a9181886f4c73ea5b4671b35d8722cb3a to a55c80984a447bf8c788a7d6a23ff93c022069fa.

The only problem I encountered so far is with regards of reading a BLOB from an SQLite3 database.
Previously, the application did this (inside of a custom/user type conversion):

std::string str = v.get<std::string>("my_column");

Where:

  • my_column is an SQLite3 column of type BLOB
  • v is an lvalue reference to soci::base_type

This used to work well. However, after the update the code is throwing an std::bad_cast in soci::details::holder::get() (called by soci::row::get()).

According to the git log there have been some activities around blob support.

What is the new, correct way of reading (and writing!) a blob to/from SQLite3?

@vadz
Copy link
Member

vadz commented Oct 22, 2024

Thanks for reporting this, it's definitely a regression and should be fixed because I don't see anything wrong with this code.

I thought we had some testing code doing essentially the same thing, but I was clearly wrong. It would be great if you could please make a patch adding the (now) failing test case, this will help debug/bisect it and will ensure that it remains fixed in the future.

@Tectu
Copy link
Contributor Author

Tectu commented Oct 22, 2024

Sure, I'll look into adding a test. Will take a while to get familiar with your testing setup tho.

I think the underlying "issue" or "cause" is that soci::details::holder::get() is clearly intentionally throwing when the type is db_blob:

case db_double:
return soci_cast<T, double>::cast(*val_.d);
case db_date:
return soci_cast<T, std::tm>::cast(*val_.t);
case db_blob:
// blob is not copyable
break;
case db_xml:
case db_string:
return soci_cast<T, std::string>::cast(*val_.s);
}
throw std::bad_cast();
}

There's a comment indicating that this is due to blob being non-copyable.

Seems like this was introduced in commit ae07cd5.

@Krzmbrzl
Copy link
Contributor

Krzmbrzl commented Jan 1, 2025

You can't obtain a BLOB as a string anymore. You need to do

soci::blob blob = v.move_as<soci::blob>("my_column");

Relevant test case for this:

soci::blob intoBlob = currentRow.move_as<soci::blob>(0);

Quoting from #992:

When selecting a blob type via a rowset, they are now actually represented by a blob object instead of a std::string. The blob can be obtained from a given row via the new move_as<> function that complements the already existing get<> function for non-copyable types (such as blobs). Trying to use get<> for blob types will yield a compiler error stating that this is not supported.


What is the new, correct way of reading (and writing!) a blob to/from SQLite3?

see

TEST_CASE_METHOD(common_tests, "BLOB", "[core][blob]")
{
soci::session sql(backEndFactory_, connectString_);
auto_table_creator tableCreator(tc_.table_creator_blob(sql));
if (!tableCreator.get())
{
try
{
soci::blob blob(sql);
FAIL("BLOB creation should throw, if backend doesn't support BLOBs");
}
catch (const soci_error &)
{
// Throwing is expected if the backend doesn't support BLOBs
}
WARN("BLOB type not supported by the database, skipping the test.");
return;
}
const char dummy_data[] = "abcdefghijklmnopqrstuvwxyz";
// Cross-DB usage of BLOBs is only possible if the entire lifetime of the blob object
// is covered in an active transaction.
soci::transaction transaction(sql);
SECTION("Read-access on just-constructed blob")
{
soci::blob blob(sql);
CHECK(blob.get_len() == 0);
char buf[5];
std::size_t read_bytes = blob.read_from_start(buf, sizeof(buf));
// There should be no data that could be read
CHECK(read_bytes == 0);
// Reading from any offset other than zero is invalid
CHECK_THROWS_AS(blob.read_from_start(buf, sizeof(buf), 1), soci_error);
}
SECTION("validity & initialization")
{
soci::blob blob;
CHECK_FALSE(blob.is_valid());
blob.initialize(sql);
CHECK(blob.is_valid());
soci::blob other;
CHECK_FALSE(other.is_valid());
other = std::move(blob);
CHECK(other.is_valid());
CHECK_FALSE(blob.is_valid());
}
SECTION("BLOB I/O")
{
soci::blob blob(sql);
std::size_t written_bytes = blob.write_from_start(dummy_data, 5);
CHECK(written_bytes == 5);
CHECK(blob.get_len() == 5);
char buf[5];
static_assert(sizeof(buf) <= sizeof(dummy_data), "Underlying assumption violated");
std::size_t read_bytes = blob.read_from_start(buf, sizeof(buf));
CHECK(read_bytes == sizeof(buf));
for (std::size_t i = 0; i < sizeof(buf); ++i)
{
CHECK(buf[i] == dummy_data[i]);
}
written_bytes = blob.append(dummy_data + 5, 3);
CHECK(written_bytes == 3);
CHECK(blob.get_len() == 8);
read_bytes = blob.read_from_start(buf, sizeof(buf), 3);
CHECK(read_bytes == 5);
for (std::size_t i = 0; i < sizeof(buf); ++i)
{
CHECK(buf[i] == dummy_data[i + 3]);
}
blob.trim(2);
CHECK(blob.get_len() == 2);
read_bytes = blob.read_from_start(buf, sizeof(buf));
CHECK(read_bytes == 2);
for (std::size_t i = 0; i < read_bytes; ++i)
{
CHECK(buf[i] == dummy_data[i]);
}
// Reading from an offset >= the current length of the blob is invalid
CHECK_THROWS_AS(blob.read_from_start(buf, sizeof(buf), blob.get_len()), soci_error);
written_bytes = blob.append("z", 1);
CHECK(written_bytes == 1);
CHECK(blob.get_len() == 3);
read_bytes = blob.read_from_start(buf, 1, 2);
CHECK(read_bytes == 1);
CHECK(buf[0] == 'z');
// Writing more than one position beyond the blob is invalid
// (Writing exactly one position beyond is the same as appending)
CHECK_THROWS_AS(blob.write_from_start(dummy_data, 2, blob.get_len() + 1), soci_error);
}
SECTION("Inserting/Reading default-constructed blob")
{
{
soci::blob input_blob(sql);
sql << "insert into soci_test (id, b) values(5, :b)", soci::use(input_blob);
}
soci::blob output_blob(sql);
soci::indicator ind;
sql << "select b from soci_test where id = 5", soci::into(output_blob, ind);
CHECK(ind == soci::i_ok);
CHECK(output_blob.get_len() == 0);
}
SECTION("Ensure reading into blob overwrites previous contents")
{
soci::blob blob(sql);
blob.write_from_start("hello kitty", 10);
CHECK(blob.get_len() == 10);
{
soci::blob write_blob(sql);
write_blob.write_from_start("test", 4);
sql << "insert into soci_test (id, b) values (5, :b)", soci::use(write_blob);
}
sql << "select b from soci_test where id = 5", soci::into(blob);
CHECK(blob.get_len() == 4);
char buf[5];
std::size_t read_bytes = blob.read_from_start(buf, sizeof(buf));
CHECK(read_bytes == 4);
CHECK(buf[0] == 't');
CHECK(buf[1] == 'e');
CHECK(buf[2] == 's');
CHECK(buf[3] == 't');
}
SECTION("Blob-DB interaction")
{
soci::blob write_blob(sql);
static_assert(sizeof(dummy_data) >= 10, "Underlying assumption violated");
write_blob.write_from_start(dummy_data, 10);
const int first_id = 42;
// Write and retrieve blob from/into database
sql << "insert into soci_test (id, b) values(:id, :b)", soci::use(first_id), soci::use(write_blob);
// Append to write_blob - these changes must not reflect in the BLOB stored in the DB
write_blob.append("ay", 2);
CHECK(write_blob.get_len() == 12);
char buf[15];
std::size_t read_bytes = write_blob.read_from_start(buf, sizeof(buf));
CHECK(read_bytes == 12);
for (std::size_t i = 0; i < 10; ++i)
{
CHECK(buf[i] == dummy_data[i]);
}
CHECK(buf[10] == 'a');
CHECK(buf[11] == 'y');
soci::blob read_blob(sql);
sql << "select b from soci_test where id = :id", soci::use(first_id), soci::into(read_blob);
CHECK(sql.got_data());
CHECK(read_blob.get_len() == 10);
std::size_t bytes_read = read_blob.read_from_start(buf, sizeof(buf));
CHECK(bytes_read == read_blob.get_len());
CHECK(bytes_read == 10);
for (std::size_t i = 0; i < bytes_read; ++i)
{
CHECK(buf[i] == dummy_data[i]);
}
// Update original blob and insert new db-entry (must not change previous entry)
const int second_id = first_id + 1;
write_blob.trim(0);
static_assert(sizeof(dummy_data) >= 15 + 5, "Underlying assumption violated");
write_blob.write_from_start(dummy_data + 15, 5);
sql << "insert into soci_test (id, b) values (:id, :b)", soci::use(second_id), soci::use(write_blob);
// First, check that the original entry has not been changed
sql << "select b from soci_test where id = :id", soci::use(first_id), soci::into(read_blob);
CHECK(read_blob.get_len() == 10);
// Then check new entry can be read
sql << "select b from soci_test where id = :id", soci::use(second_id), soci::into(read_blob);
bytes_read = read_blob.read_from_start(buf, sizeof(buf));
CHECK(bytes_read == read_blob.get_len());
CHECK(bytes_read == 5);
for (std::size_t i = 0; i < bytes_read; ++i)
{
CHECK(buf[i] == dummy_data[i + 15]);
}
}
SECTION("Binary data")
{
const std::uint8_t binary_data[12] = {0, 1, 2, 3, 4, 5, 6, 7, 22, 255, 250 };
soci::blob write_blob(sql);
std::size_t bytes_written = write_blob.write_from_start(binary_data, sizeof(binary_data));
CHECK(bytes_written == sizeof(binary_data));
sql << "insert into soci_test (id, b) values (1, :b)", soci::use(write_blob);
soci::blob read_blob(sql);
sql << "select b from soci_test where id = 1", soci::into(read_blob);
CHECK(read_blob.get_len() == sizeof(binary_data));
std::uint8_t buf[20];
std::size_t bytes_read = read_blob.read_from_start(buf, sizeof(buf));
CHECK(bytes_read == sizeof(binary_data));
for (std::size_t i = 0; i < sizeof(binary_data); ++i)
{
CHECK(buf[i] == binary_data[i]);
}
}
SECTION("Rowset Blob recognition")
{
soci::blob blob(sql);
// Write and retrieve blob from/into database
int id = 1;
sql << "insert into soci_test (id, b) values(:id, :b)", soci::use(id), soci::use(blob);
soci::rowset< soci::row > rowSet = sql.prepare << "select id, b from soci_test";
bool containedData = false;
for (auto it = rowSet.begin(); it != rowSet.end(); ++it)
{
containedData = true;
const soci::row &currentRow = *it;
CHECK(currentRow.get_properties(1).get_data_type() == soci::dt_blob);
}
CHECK(containedData);
}
SECTION("Blob binding")
{
// Add data
soci::blob blob1(sql);
soci::blob blob2(sql);
static_assert(10 <= sizeof(dummy_data), "Underlying assumption violated");
blob1.write_from_start(dummy_data, 10);
blob2.write_from_start(dummy_data, 10);
const int id1 = 42;
const int id2 = 42;
sql << "insert into soci_test (id, b) values(:id, :b)", soci::use(id1), soci::use(blob1);
sql << "insert into soci_test (id, b) values(:id, :b)", soci::use(id2), soci::use(blob2);
SECTION("into")
{
soci::blob intoBlob(sql);
sql << "select b from soci_test where id=:id", soci::use(id1), soci::into(intoBlob);
char buffer[20];
std::size_t written = intoBlob.read_from_start(buffer, sizeof(buffer));
CHECK(written == 10);
for (std::size_t i = 0; i < 10; ++i)
{
CHECK(buffer[i] == dummy_data[i]);
}
}
SECTION("move_as")
{
soci::rowset< soci::row > rowSet = (sql.prepare << "select b from soci_test where id=:id", soci::use(id1));
bool containedData = false;
for (auto it = rowSet.begin(); it != rowSet.end(); ++it)
{
containedData = true;
const soci::row &currentRow = *it;
soci::blob intoBlob = currentRow.move_as<soci::blob>(0);
CHECK(intoBlob.get_len() == 10);
char buffer[20];
std::size_t written = intoBlob.read_from_start(buffer, sizeof(buffer));
CHECK(written == 10);
for (std::size_t i = 0; i < 10; ++i)
{
CHECK(buffer[i] == dummy_data[i]);
}
}
CHECK(containedData);
}
SECTION("reusing bound blob")
{
int secondID = id2 + 1;
sql << "insert into soci_test(id, b) values(:id, :b)", soci::use(secondID), soci::use(blob2);
// Selecting the blob associated with secondID should yield the same result as selecting the one for id
soci::blob intoBlob(sql);
sql << "select b from soci_test where id=:id", soci::use(secondID), soci::into(intoBlob);
char buffer[20];
std::size_t written = intoBlob.read_from_start(buffer, sizeof(buffer));
CHECK(written == 10);
for (std::size_t i = 0; i < 10; ++i)
{
CHECK(buffer[i] == dummy_data[i]);
}
}
}
SECTION("Statements")
{
unsigned int id;
soci::blob myBlob(sql);
soci::statement insert_stmt = (sql.prepare << "insert into soci_test (id, b) values (:id, :b)", soci::use(id), soci::use(myBlob));
id = 1;
myBlob.write_from_start(dummy_data + id, id);
insert_stmt.execute(true);
id = 5;
myBlob.write_from_start(dummy_data + id, id);
insert_stmt.execute(true);
soci::statement select_stmt = (sql.prepare << "select id, b from soci_test order by id asc", soci::into(id), soci::into(myBlob));
char contents[16];
select_stmt.execute();
CHECK(select_stmt.fetch());
CHECK(id == 1);
std::size_t blob_size = myBlob.get_len();
CHECK(blob_size == id);
std::size_t read = myBlob.read_from_start(contents, blob_size);
CHECK(read == blob_size);
for (unsigned int i = 0; i < blob_size; ++i)
{
CHECK(contents[i] == dummy_data[id + i]);
}
CHECK(select_stmt.fetch());
CHECK(id == 5);
blob_size = myBlob.get_len();
CHECK(blob_size == id);
read = myBlob.read_from_start(contents, blob_size);
CHECK(read == blob_size);
for (unsigned int i = 0; i < blob_size; ++i)
{
CHECK(contents[i] == dummy_data[id + i]);
}
CHECK(!select_stmt.fetch());
}
}


@vadz In order to restore backwards compatibility with the .get<std::string>() API, we could try to implement some mechanism to use the underlying soci::blob object to read its entire content into a string and then return that.
However, for that to work we would have to make .get<T>() work when T and the underlying type are in principle incompatible (std::string and soci::blob in this case). I don't think this can be done by means of a simple cast, can it?
Also, we would likely want to emit some sort of deprecation warning as this seems to be less efficient (in general) that using the soci::blob object directly (as using it does not necessarily require reading the entire blob content into a string, which might become huge).

@Tectu
Copy link
Contributor Author

Tectu commented Jan 1, 2025

@Krzmbrzl Thank you for chiming in on this.

move_as() does not seem to be an available function when implementing a specialization of type_conversion (i.e. there is no base_type::move_as()).
What am I missing here?

@vadz vadz added this to the 4.1.0 milestone Jan 2, 2025
@vadz
Copy link
Member

vadz commented Jan 2, 2025

@vadz In order to restore backwards compatibility with the .get<std::string>() API, we could try to implement some mechanism to use the underlying soci::blob object to read its entire content into a string and then return that.

Yes, I'd like to do this, the code in the original code example looks fine to me and I think it's perfectly reasonable to represent (C)LOBs with std::string internally (even if std::vector<byte> would be better).

However, for that to work we would have to make .get<T>() work when T and the underlying type are in principle incompatible (std::string and soci::blob in this case).

Do you understand how this worked before?

I don't think this can be done by means of a simple cast, can it? Also, we would likely want to emit some sort of deprecation warning as this seems to be less efficient (in general) that using the soci::blob object directly (as using it does not necessarily require reading the entire blob content into a string, which might become huge).

I don't think this warrants a warning, there are plenty of use cases when your BLOBs are not all that huge where this is fine. And anybody working with gigabyte-sized BLOBs should hopefully already understand that loading them in a string is not the brightest idea.

@Krzmbrzl
Copy link
Contributor

Krzmbrzl commented Jan 2, 2025

@Tectu

move_as() does not seem to be an available function when implementing a specialization of type_conversion (i.e. there is no base_type::move_as()).
What am I missing here?

The move_as functions are direct member functions of the row class and are implemented in terms of type_conversions<T>::move_as():
Note the template specialization for blobs though:

soci/src/core/row.cpp

Lines 112 to 125 in c5678e1

template <>
blob row::move_as<blob>(std::size_t pos) const
{
typedef typename type_conversion<blob>::base_type base_type;
base_type & baseVal = holders_.at(pos)->get<base_type>(value_reference_tag{});
blob ret;
type_conversion<blob>::move_from_base(baseVal, *indicators_.at(pos), ret);
// Re-initialize blob object so it can be used in further queries
baseVal.initialize(ret.get_backend()->get_session_backend().make_blob_backend());
return ret;
}


@vadz

Do you understand how this worked before?

I had something in mind about BLOBs being recognized as string types internally, but I don't think that is actually true... I'm going to look this up and see if I can write a fix for this issue.

Krzmbrzl added a commit to Krzmbrzl/soci that referenced this issue Jan 2, 2025
"Suitable" means that the container uses contiguous storage in memory
and holds entries of type T where sizeof(T) == 1. If this is given, then
the blob's data can be copied into a new instance of such a container,
which is then returned in the row::get<>() call.

This restores the previous possibility of reading blobs into objects of
type std::string but also generalizes the concept to any suitable
container (such as std::vector<unsigned char>).

Note: The check for whether the given container object uses contiguous
object storage is based on its iterators being classified as
random-access iterators. This is not a perfect check as in principle
there could be random-access iterators that don't iterate a contiguous
memory region. However, for typical containers this is very unexpected
as this would typically lead to rather inefficient access patterns when
making use of the random-access capability.
A container that is wrongly determined to be contiguous will likely
result in a program crash due to a segfault.

Fixes SOCI#1173
@Krzmbrzl Krzmbrzl linked a pull request Jan 2, 2025 that will close this issue
@Krzmbrzl
Copy link
Contributor

Krzmbrzl commented Jan 2, 2025

Turns out that this has been a lot more straight forward than I expected :D

The issue at hand is fixed via #1189

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants