No country information is being stored from the BOLD dataset #9

Open
jrdh opened this issue Feb 19, 2025 · 4 comments

@jrdh
Member

jrdh commented Feb 19, 2025

We have 18,186,363 specimens in the database, but none of them have any country data stored against them. I would guess BOLD has changed something in its schema (e.g. the field name), so that'll need investigating and then fixing.
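
One quick way to test the schema-change guess would be to compare the field names an importer expects against the ones actually present in an export header. The sketch below assumes a tab-separated export with a header row; `missing_fields` and the expected name `country` are hypothetical and not taken from this project's code.

```python
def missing_fields(header_line, expected, sep="\t"):
    """Return the expected field names that are absent from a header line."""
    present = set(header_line.rstrip("\r\n").split(sep))
    return sorted(name for name in set(expected) if name not in present)


# Illustrative header only; a real export has many more columns.
header = "processid\tcountry/ocean\tcountry_iso\n"

# "country" stands in for whatever name the importer currently looks up
# (an assumption for illustration).
print(missing_fields(header, ["processid", "country"]))
```

Any name this reports is a field the importer asks for that the export no longer provides, which would explain a column that is empty for every record.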

jrdh added the bug (Something isn't working) label Feb 19, 2025
jrdh self-assigned this Feb 19, 2025
@bwprice
Member

bwprice commented Feb 19, 2025

either: "country/ocean" or "country.ocean"

@bwprice
Member

bwprice commented Feb 19, 2025

@jrdh
Member Author

jrdh commented Feb 19, 2025

Or country_iso? I'd probably prefer to use that as it's more standardised but happy to take your opinions 😀

For reference, here's some summary counts from the latest BOLD public export for both those fields to give you an idea of what kind of data is in there:

country/ocean value counts
value count
Costa Rica 5813223
Canada 2744355
None 1345649
United States 891430
South Africa 852597
Australia 487657
Unrecoverable 290989
China 287565
United Kingdom 278118
Germany 275500
Pakistan 254051
Thailand 222780
Mexico 189124
Malaysia 185234
Ecuador 165396
Argentina 165137
Indonesia 151655
Norway 148371
Tanzania 134498
Brazil 130624
Japan 122634
Peru 121187
Madagascar 108675
France 102796
New Zealand 99542
Russia 92981
Bangladesh 89634
Finland 88988
Gabon 87705
Italy 81907
Papua New Guinea 81515
Kenya 80159
Spain 76473
India 76238
Philippines 75895
Honduras 67903
Ghana 67367
Colombia 66628
Panama 60869
Portugal 58608
Sweden 49858
Bulgaria 49464
Egypt 47015
French Guiana 46420
Austria 45452
Suriname 45418
Singapore 39708
Lebanon 39299
Vietnam 38600
Israel 38283
Greenland 37851
Greece 36453
Cameroon 34854
Turkey 31555
Puerto Rico 30892
South Korea 29496
Poland 29444
Saudi Arabia 27270
Antarctica 25853
Chile 25253
Taiwan 24452
Atlantic Ocean 22931
Iran 22890
Pacific Ocean 22762
Mozambique 21375
Georgia 20526
Belarus 17733
Montenegro 17548
Switzerland 17503
Sao Tome and Principe 15548
Netherlands 14364
French Polynesia 13940
Czech Republic 13870
Hungary 13083
Democratic Republic of the Congo 12708
Uganda 12049
Croatia 11849
Morocco 11779
New Caledonia 11665
Bolivia 11039
Slovakia 10473
Guyana 10429
Namibia 9817
Belgium 9648
Romania 8990
Belize 8375
Ethiopia 8293
Southern Ocean 7740
Denmark 7729
Nigeria 7624
Zambia 7449
Myanmar 7319
Brunei 6652
Ukraine 6646
Mongolia 6629
Kyrgyzstan 6399
Kazakhstan 6226
Sri Lanka 6152
Slovenia 6086
Venezuela 5890
Guatemala 5726
North Sea 5683
Vanuatu 5498
Tunisia 5134
Laos 4837
Dominican Republic 4736
Cuba 4431
North Macedonia 4278
Cyprus 4036
Cambodia 3967
Serbia 3926
Central African Republic 3908
Reunion 3845
Lithuania 3779
Bhutan 3734
Algeria 3674
Malawi 3648
Antigua and Barbuda 3594
Nepal 3584
Bahamas 3573
Solomon Islands 3549
Indian Ocean 3519
Iraq 3511
Oman 3416
Senegal 3394
Ireland 3279
North Pacific Ocean 3223
Palau 3072
Albania 2974
Republic of the Congo 2918
Cote d'Ivoire 2897
Paraguay 2832
Iceland 2809
Liberia 2797
Yemen 2787
Fiji 2774
Seychelles 2760
Zimbabwe 2605
Guinea 2557
Estonia 2478
Uzbekistan 2470
Angola 2420
Martinique 2390
Armenia 2387
Jamaica 2306
Nicaragua 2279
Mauritius 2251
United Arab Emirates 2010
Uruguay 1987
Mali 1976
Guadeloupe 1946
Tajikistan 1903
Arctic Ocean 1835
Comoros 1741
Latvia 1702
Bosnia and Herzegovina 1663
North Atlantic Ocean 1628
Trinidad and Tobago 1623
Benin 1603
Malta 1584
Azerbaijan 1555
Sudan 1443
Luxembourg 1435
Exception - Zoological Park 1427
Cape Verde 1423
British Indian Ocean Territory 1340
Djibouti 1316
Bermuda 1290
Curacao 1268
Guam 1248
Jordan 1194
El Salvador 1176
Botswana 1037
Rwanda 1024
Tonga 944
Mediterranean Sea 939
Afghanistan 917
South Atlantic Ocean 916
Mayotte 869
United States Virgin Islands 867
Mauritania 858
Burkina Faso 848
Maldives 800
Kuwait 765
Micronesia 761
Somalia 760
South Sudan 754
Syria 739
Equatorial Guinea 684
Sierra Leone 665
Moldova 664
Togo 647
Burundi 619
Baltic Sea 590
Turkmenistan 586
Dominica 574
Samoa 559
Cayman Islands 507
Kiribati 481
Lesotho 468
Northern Mariana Islands 467
Saint Helena Ascension and Tristan da Cunha 464
Tasman Sea 451
South Georgia and the South Sandwich Islands 440
South China Sea 434
Timor-Leste 434
Niger 397
Qatar 396
South Pacific Ocean 332
Eswatini 318
British Virgin Islands 284
Haiti 282
Cook Islands 271
Gulf of Mexico 260
Gambia 259
Grenada 257
Saint Kitts and Nevis 244
Exception - Quarantine Capture 230
Kosovo 226
Libya 225
Andorra 200
Aruba 185
Saint Vincent and the Grenadines 181
Barbados 179
Falkland Islands 179
Chad 174
Guinea-Bissau 173
Tokelau 155
Saint Lucia 153
Marshall Islands 142
North Korea 137
Norfolk Island 113
Sint Maarten 103
Exception - Culture 102
Liechtenstein 92
Eritrea 80
Faeroe Islands 72
Tuvalu 59
Caribbean Sea 57
Bahrain 56
Exception - Laboratory Colony 48
Isle of Man 46
San Marino 41
Niue 38
Nauru 33
Exception - Cultivated 31
Monaco 28
Spratly Islands 26
Montserrat 25
Exception - Aquarium 25
Gibraltar 24
Jersey 17
Anguilla 15
Pitcairn Islands 12
Guernsey 8
506 1
Mu village 1
None 1
COI-5P 1
Caspian Sea 1

country_iso value counts
value count
CR 5813223
CA 2744355
None 1652608
US 891430
ZA 852597
AU 487657
CN 287565
GB 278118
DE 275500
PK 254051
TH 222780
MX 189124
MY 185234
EC 165396
AR 165137
ID 151655
NO 148371
TZ 134498
BR 130624
JP 122634
PE 121187
MG 108675
FR 102796
NZ 99542
RU 92981
BD 89634
FI 88988
GA 87705
IT 81907
PG 81515
KE 80159
ES 76473
IN 76238
PH 75895
HN 67903
GH 67367
CO 66628
PA 60869
(blank) 59222
PT 58608
SE 49858
BG 49464
EG 47015
GF 46420
AT 45452
SR 45418
SG 39708
LB 39299
VN 38600
IL 38283
GL 37851
GR 36453
CM 34854
TR 31555
PR 30892
KR 29496
PL 29444
SA 27270
AQ 25853
CL 25253
TW 24452
IR 22890
MZ 21375
GE 20526
BY 17733
ME 17548
CH 17503
ST 15548
NL 14364
PF 13940
CZ 13870
HU 13083
CD 12708
UG 12049
HR 11849
MA 11779
NC 11665
BO 11039
SK 10473
GY 10429
NA 9817
BE 9648
RO 8990
BZ 8375
ET 8293
DK 7729
NG 7624
ZM 7449
MM 7319
BN 6652
UA 6646
MN 6629
KG 6399
KZ 6226
LK 6152
SI 6086
VE 5890
GT 5726
VU 5498
TN 5134
LA 4837
DO 4736
CU 4431
MK 4278
CY 4036
KH 3967
RS 3926
CF 3908
RE 3845
LT 3779
BT 3734
DZ 3674
MW 3648
AG 3594
NP 3584
BS 3573
SB 3549
IQ 3511
OM 3416
SN 3394
IE 3279
PW 3072
AL 2974
CG 2918
CI 2897
PY 2832
IS 2809
LR 2797
YE 2787
FJ 2774
SC 2760
ZW 2605
GN 2557
EE 2478
UZ 2470
AO 2420
MQ 2390
AM 2387
JM 2306
NI 2279
MU 2251
AE 2010
UY 1987
ML 1976
GP 1946
TJ 1903
KM 1741
LV 1702
BA 1663
TT 1623
BJ 1603
MT 1584
AZ 1555
SD 1443
LU 1435
CV 1423
IO 1340
DJ 1316
BM 1290
CW 1268
GU 1248
JO 1194
SV 1176
BW 1037
RW 1024
TO 944
AF 917
YT 869
VI 867
MR 858
BF 848
MV 800
KW 765
FM 761
SO 760
SS 754
SY 739
GQ 684
SL 665
MD 664
TG 647
BI 619
TM 586
DM 574
WS 559
KY 507
KI 481
LS 468
MP 467
SH 464
GS 440
TL 434
NE 397
QA 396
SZ 318
VG 284
HT 282
CK 271
GM 259
GD 257
KN 244
XK 226
LY 225
AD 200
AW 185
VC 181
BB 179
FK 179
TD 174
GW 173
TK 155
LC 153
MH 142
KP 137
NF 113
SX 103
LI 92
ER 80
FO 72
TV 59
BH 56
IM 46
SM 41
NU 38
NR 33
MC 28
MS 25
GI 24
JE 17
AI 15
PN 12
GG 8
Canada 1
None 1
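
For anyone wanting to reproduce summary counts like the ones above, here is a minimal sketch using only the standard library. It assumes the export is tab-separated with a header row; `value_counts` is a hypothetical helper, not the code that generated these tables, and the three sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def value_counts(lines, field):
    """Count how often each value of `field` occurs in a tab-separated export."""
    # QUOTE_NONE keeps stray quote characters in free-text fields from
    # being treated as CSV quoting.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row.get(field) for row in reader)


# Made-up sample rows; a real run would pass an open file handle instead.
sample = (
    "processid\tcountry/ocean\tcountry_iso\n"
    "A1\tCosta Rica\tCR\n"
    "A2\tCanada\tCA\n"
    "A3\tCosta Rica\tCR\n"
)

counts = value_counts(io.StringIO(sample), "country_iso")
for value, count in counts.most_common():
    print(value, count)
```

Because the reader streams rows one at a time, the same function should cope with a full export without loading it into memory.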

@bwprice
Member

bwprice commented Feb 19, 2025

I'm happy with country_iso :)

jrdh added a commit that referenced this issue Feb 20, 2025
At some point, BOLD changed the country fields so there's now "country/ocean" and "country_iso". Country ISO is more standard and easier to search so let's go with that (see issue for conversation with Ben).

Closes: #9
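
As an illustration of what reading `country_iso` might involve, given the value counts earlier in the thread, here is a hedged sketch of a cleaning step. `clean_country_iso` is a hypothetical helper and not the code from the referenced commit. Treating `None`, blank values, and anything that is not two uppercase letters (such as the single stray `Canada`) as missing is an assumption, not a decision made in this issue.

```python
import re
from typing import Optional

# Two uppercase ASCII letters, the shape of the codes seen in the counts.
# This checks the format only, not membership of the ISO 3166-1 list.
_ISO2 = re.compile(r"[A-Z]{2}")


def clean_country_iso(value) -> Optional[str]:
    """Return a two-letter country code, or None if the value looks missing."""
    if value is None:
        return None
    text = str(value).strip()
    # The export appears to contain both blank values and the literal
    # string "None"; both are treated as missing here (assumption).
    if text in ("", "None"):
        return None
    return text if _ISO2.fullmatch(text) else None


for raw in ["CR", " GB ", "", "None", "Canada", None]:
    print(repr(raw), "->", clean_country_iso(raw))
```

Returning `None` rather than the raw string for malformed values keeps the stored field searchable by code alone, at the cost of silently dropping the few outliers.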