Description
Describe the bug
Some applications create vCard files with non-standard labels for phone numbers (and possibly for other properties, such as addresses).
Here’s an example as created by the contacts app in some Android versions (see https://issuetracker.google.com/issues/37093253):
TEL;X-CUSTOM(CHARSET=UTF-8,ENCODING=QUOTED-PRINTABLE,=74=68=69=73=20=69=73=20=61=20=63=75=73=74=6F=6D=20=74=79=70=65):392571421
In contrast, a vCard 2.1 line that specifies a phone number without a type (i.e. with the default type) would look like this:
TEL:392571421
… or using a pre-defined type:
TEL;CELL:392571421
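For reference, the parenthesised part of the custom type in the first example ends with a quoted-printable encoded label. The Python sketch below (standard library only) decodes it. The assumption that the label is always the last comma-separated item inside X-CUSTOM(...) is inferred from this single example, not from any specification.

```python
import quopri
import re

# The custom TEL line from the Android example above.
LINE = (
    "TEL;X-CUSTOM(CHARSET=UTF-8,ENCODING=QUOTED-PRINTABLE,"
    "=74=68=69=73=20=69=73=20=61=20=63=75=73=74=6F=6D=20=74=79=70=65):392571421"
)

# Assumption (from this one example only): X-CUSTOM(...) holds comma-separated
# items, and the last item is the label itself, encoded as quoted-printable.
match = re.fullmatch(r"TEL;X-CUSTOM\((?P<params>[^)]*)\):(?P<number>.*)", LINE)
assert match is not None
encoded_label = match.group("params").split(",")[-1]
label = quopri.decodestring(encoded_label.encode("ascii")).decode("utf-8")

print(label)                  # this is a custom type
print(match.group("number"))  # 392571421
```

So the custom type carries a free-text, human-readable label ("this is a custom type") rather than a machine-readable type.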
When a vCard file contains custom labels/types, ingestion of the entire file fails, even if the file contains multiple contacts and only some of them use custom types.
While custom labels/types such as the one in the first example seem to be non-standard, it might make sense to handle them anyway if they are fairly common. However, as we’re using an (unmaintained) third-party library to parse vCards (and parsing is non-trivial), it might be difficult to actually implement support for this. In that case, it would be great if Aleph could ignore only the lines or contacts it cannot parse while still ingesting well-formatted contacts.
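The fallback suggested above (skip what cannot be parsed, keep the rest) could look roughly like the following Python sketch. It assumes the ingestor is able to pass the text of a single card to its vCard library. split_vcards, parse_tolerantly and strict_parse are hypothetical names; strict_parse only imitates a parser that rejects the custom syntax. Nested cards are not handled.

```python
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def split_vcards(text: str) -> Iterator[str]:
    """Yield the raw text of each top-level BEGIN:VCARD ... END:VCARD block."""
    current: List[str] = []
    inside = False
    for line in text.splitlines():
        marker = line.strip().upper()
        if marker == "BEGIN:VCARD":
            # Simplification: nested cards (e.g. vCard 2.1 AGENT) are not supported.
            inside = True
            current = [line]
        elif inside:
            current.append(line)
            if marker == "END:VCARD":
                yield "\r\n".join(current) + "\r\n"
                inside = False


def parse_tolerantly(
    text: str, parse_card: Callable[[str], T]
) -> Tuple[List[T], List[Exception]]:
    """Parse each card separately so one malformed card cannot fail the whole file."""
    parsed: List[T] = []
    errors: List[Exception] = []
    for raw in split_vcards(text):
        try:
            parsed.append(parse_card(raw))
        except Exception as exc:  # deliberately broad: skip only the offending card
            errors.append(exc)
    return parsed, errors


# Hypothetical stand-in for a strict third-party parser that rejects X-CUSTOM(...).
def strict_parse(raw: str) -> str:
    if "X-CUSTOM(" in raw:
        raise ValueError("unsupported parameter syntax")
    for line in raw.splitlines():
        if line.upper().startswith("FN:"):
            return line[len("FN:"):]
    raise ValueError("card has no FN property")


SAMPLE = "\r\n".join([
    "BEGIN:VCARD", "VERSION:2.1", "FN:John Doe", "TEL;CELL:+49123456789", "END:VCARD",
    "BEGIN:VCARD", "VERSION:2.1", "FN:Max Mustermann",
    "TEL;X-CUSTOM(CHARSET=UTF-8,ENCODING=QUOTED-PRINTABLE,"
    "=74=68=69=73=20=69=73=20=61=20=63=75=73=74=6F=6D=20=74=79=70=65):+49123456789",
    "END:VCARD",
])

cards, errors = parse_tolerantly(SAMPLE, strict_parse)
print(cards)        # ['John Doe']
print(len(errors))  # 1
```

Isolating each card keeps a single malformed contact from discarding every other contact in the file; the collected errors could still be logged or reported.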
To Reproduce
Steps to reproduce the behavior:
- Create a new investigation.
- Create a file with the following contents:
BEGIN:VCARD
VERSION:2.1
FN:John Doe
TEL;CELL:+49123456789
END:VCARD
BEGIN:VCARD
VERSION:2.1
FN:Max Mustermann
TEL;X-CUSTOM(CHARSET=UTF-8,ENCODING=QUOTED-PRINTABLE,=74=68=69=73=20=69=73=20=61=20=63=75=73=74=6F=6D=20=74=79=70=65):+49123456789
END:VCARD
- Upload the file to the investigation.
- Wait until the file has been ingested.
- You should be able to confirm that the file failed to be ingested properly. No Person entities have been created.
Expected behavior
- Ideally, the file is ingested successfully and two Person entities (each with a phone number) are created.
- Alternatively, if that is not possible, two Person entities should be created, but the entity for "Max Mustermann" may lack a phone number (as the phone number uses a non-standard custom type).
- Alternatively, if even the previous option is not possible, at least one Person entity for "John Doe" should be created, as this vCard component uses only standard features.
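For the first two options, one possible approach is to read properties line by line and keep a phone number even when its type cannot be interpreted. The Python sketch below is an illustration only and does not reflect how the existing ingestor works: extract_person and KNOWN_TEL_TYPES are hypothetical, the type list is assumed to match the telephone types of the vCard 2.1 specification, and folded lines and quoted parameter values are not handled. The result is a plain dictionary, not an Aleph entity.

```python
from typing import Dict, List, Optional

# Assumed set of standard vCard 2.1 telephone types; anything else is ignored.
KNOWN_TEL_TYPES = {
    "PREF", "WORK", "HOME", "VOICE", "FAX", "MSG", "CELL",
    "PAGER", "BBS", "MODEM", "CAR", "ISDN", "VIDEO",
}


def extract_person(raw_card: str) -> Dict[str, object]:
    """Pull FN and TEL out of one card, dropping TEL types it does not recognise."""
    name: Optional[str] = None
    phones: List[Dict[str, object]] = []
    for line in raw_card.splitlines():
        # Simplification: split at the first ":"; quoted parameter values
        # containing ":" and folded (continued) lines are not handled.
        head, sep, value = line.partition(":")
        if not sep:
            continue
        prop, *params = head.split(";")
        prop = prop.strip().upper()
        if prop == "FN":
            name = value.strip()
        elif prop == "TEL":
            types: List[str] = []
            for param in params:
                # vCard 2.1 allows both "CELL" and "TYPE=CELL".
                candidate = param.strip().upper()
                if candidate.startswith("TYPE="):
                    candidate = candidate[len("TYPE="):]
                if candidate in KNOWN_TEL_TYPES:
                    types.append(candidate)
            # The number is kept even if every type was unrecognised.
            phones.append({"number": value.strip(), "types": types})
    return {"name": name, "phones": phones}


MAX_CARD = "\r\n".join([
    "BEGIN:VCARD",
    "VERSION:2.1",
    "FN:Max Mustermann",
    "TEL;X-CUSTOM(CHARSET=UTF-8,ENCODING=QUOTED-PRINTABLE,"
    "=74=68=69=73=20=69=73=20=61=20=63=75=73=74=6F=6D=20=74=79=70=65):+49123456789",
    "END:VCARD",
])

person = extract_person(MAX_CARD)
print(person)
# {'name': 'Max Mustermann', 'phones': [{'number': '+49123456789', 'types': []}]}
```

Dropping only the unrecognised type, rather than the whole property, preserves the phone number, which matches the first option; discarding the TEL entry entirely would correspond to the second.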
Aleph version
3.15.5
Additional context
- Relevant ingestor: https://github.com/alephdata/ingest-file/blob/main/ingestors/email/vcard.py
- As a workaround, source vCard files can be pre-processed before ingestion by manually removing custom types. Aleph doesn’t handle these custom types anyway.
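The pre-processing workaround can be scripted instead of done by hand. The Python sketch below assumes that custom types always take the ;X-CUSTOM(...) form seen in the Android example (other applications may use different syntax) and that the files are UTF-8 encoded; strip_custom_types and preprocess_file are hypothetical helper names.

```python
import re

# Assumption: a custom type is written as ";X-CUSTOM(...)" and the
# parenthesised part contains no ")" (true for the Android example).
CUSTOM_TYPE = re.compile(r";X-CUSTOM\([^)]*\)", re.IGNORECASE)


def strip_custom_types(text: str) -> str:
    """Remove X-CUSTOM(...) parameters but keep each property and its value."""
    return CUSTOM_TYPE.sub("", text)


def preprocess_file(src: str, dst: str) -> None:
    """Write a copy of the vCard file at src to dst without custom types."""
    # newline="" keeps the original CRLF/LF line endings untouched.
    with open(src, encoding="utf-8", newline="") as infile:
        cleaned = strip_custom_types(infile.read())
    with open(dst, "w", encoding="utf-8", newline="") as outfile:
        outfile.write(cleaned)


line = (
    "TEL;X-CUSTOM(CHARSET=UTF-8,ENCODING=QUOTED-PRINTABLE,"
    "=74=68=69=73=20=69=73=20=61=20=63=75=73=74=6F=6D=20=74=79=70=65):+49123456789"
)
print(strip_custom_types(line))  # TEL:+49123456789
```

Only the custom parameter is removed, so the contact keeps its phone number (with the default type) after ingestion.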