Netquack is a DuckDB extension that simplifies working with domains, URIs, and web paths directly within your database queries. Whether you're extracting top-level domains (TLDs), parsing URI components, or analyzing web paths, Netquack provides a suite of functions for the job. It is built for data engineers, analysts, and developers.
With Netquack, you can unlock deeper insights from your web-related datasets without the need for external tools or complex workflows.
Table of Contents
- DuckDB Netquack Extension
- Installation
- Usage Examples
- Roadmap
- Contributing
- Issues
netquack is distributed as a DuckDB Community Extension and can be installed using SQL:
INSTALL netquack FROM community;
LOAD netquack;
If you previously installed the netquack extension, upgrade it using the FORCE command:
FORCE INSTALL netquack FROM community;
LOAD netquack;
Once installed, the macro functions provided by the extension can be used just like built-in functions.
The extract_domain function extracts the main domain from a URL. To do this, the extension fetches the public suffix list from publicsuffix.org and uses it to locate the main domain.
The public suffix list is downloaded automatically the first time the function is called. After that, the list is stored in the public_suffix_list table to avoid downloading it again.
D SELECT extract_domain('a.example.com') AS domain;
┌─────────────┐
│   domain    │
│   varchar   │
├─────────────┤
│ example.com │
└─────────────┘
D SELECT extract_domain('https://b.a.example.com/path') AS domain;
┌─────────────┐
│   domain    │
│   varchar   │
├─────────────┤
│ example.com │
└─────────────┘
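For intuition about how a public suffix list yields the main domain, here is a minimal Python sketch that runs outside DuckDB. It is not the extension's implementation: the `SUFFIXES` set and the `split_host` helper are hypothetical, and the real list also has wildcard and exception rules that this sketch ignores.

```python
# Hand-picked suffixes for illustration only; the real public suffix list
# is far larger and also contains wildcard and exception rules.
SUFFIXES = {"com", "ac", "com.ac", "uk", "co.uk"}


def split_host(host):
    """Return (subdomain, main_domain, suffix) for a bare host name."""
    host = host.lower().strip(".")
    labels = host.split(".")
    # Walk from the longest candidate suffix to the shortest, so that
    # "com.ac" wins over "ac" when both are listed.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            start = max(i - 1, 0)  # main domain = suffix plus one more label
            return (
                ".".join(labels[:start]),
                ".".join(labels[start:]),
                ".".join(labels[i:]),
            )
    return "", host, ""  # no known suffix matched


print(split_host("b.a.example.com"))      # ('b.a', 'example.com', 'com')
print(split_host("test.example.com.ac"))  # ('test', 'example.com.ac', 'com.ac')
```

The same three-way split is what separates the subdomain, main domain, and TLD of a host once the matching suffix is known.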
You can use the update_suffixes function to update the public suffix list manually.
D SELECT update_suffixes();
┌───────────────────┐
│ update_suffixes() │
│      varchar      │
├───────────────────┤
│ updated           │
└───────────────────┘
The extract_path function extracts the path from a URL.
D SELECT extract_path('https://b.a.example.com/path/path') AS path;
┌────────────┐
│    path    │
│  varchar   │
├────────────┤
│ /path/path │
└────────────┘
D SELECT extract_path('example.com/path/path/image.png') AS path;
┌──────────────────────┐
│         path         │
│       varchar        │
├──────────────────────┤
│ /path/path/image.png │
└──────────────────────┘
The extract_host function extracts the host from a URL.
D SELECT extract_host('https://b.a.example.com/path/path') AS host;
┌─────────────────┐
│      host       │
│     varchar     │
├─────────────────┤
│ b.a.example.com │
└─────────────────┘
D SELECT extract_host('example.com:443/path/image.png') AS host;
┌─────────────┐
│    host     │
│   varchar   │
├─────────────┤
│ example.com │
└─────────────┘
The extract_schema function extracts the scheme (schema) from a URL. Schemes supported for now:
- http | https
- ftp
- mailto
- tel | sms
D SELECT extract_schema('https://b.a.example.com/path/path') AS schema;
┌─────────┐
│ schema  │
│ varchar │
├─────────┤
│ https   │
└─────────┘
D SELECT extract_schema('mailto:[email protected]') AS schema;
┌─────────┐
│ schema  │
│ varchar │
├─────────┤
│ mailto  │
└─────────┘
D SELECT extract_schema('tel:+123456789') AS schema;
┌─────────┐
│ schema  │
│ varchar │
├─────────┤
│ tel     │
└─────────┘
The extract_query_string function extracts the query string from a URL.
D SELECT extract_query_string('example.com?key=value') AS query;
┌───────────┐
│   query   │
│  varchar  │
├───────────┤
│ key=value │
└───────────┘
D SELECT extract_query_string('http://example.com.ac/path/?a=1&b=2&') AS query;
┌──────────┐
│  query   │
│ varchar  │
├──────────┤
│ a=1&b=2& │
└──────────┘
The extract_port function extracts the port from a URL.
D SELECT extract_port('https://example.com:8443/') AS port;
┌─────────┐
│  port   │
│ varchar │
├─────────┤
│ 8443    │
└─────────┘
D SELECT extract_port('[::1]:6379') AS port;
┌─────────┐
│  port   │
│ varchar │
├─────────┤
│ 6379    │
└─────────┘
The extract_extension function extracts the file extension from a URL and returns it without the leading dot.
D SELECT extract_extension('http://example.com/image.jpg') AS ext;
┌─────────┐
│   ext   │
│ varchar │
├─────────┤
│ jpg     │
└─────────┘
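To cross-check results from the URL-component functions (extract_schema, extract_host, extract_port, extract_path, extract_query_string, extract_extension) outside DuckDB, a rough Python equivalent can be built on the standard library. This is only a sketch for comparison, not how the extension works internally: the `url_parts` helper is hypothetical, it returns the port as an integer rather than a varchar, and edge cases may differ from Netquack's behavior.

```python
import posixpath
from urllib.parse import urlsplit

# Schemes without a "//" authority part; these are passed through untouched.
OPAQUE_PREFIXES = ("mailto:", "tel:", "sms:")


def url_parts(url):
    """Split a URL into the components covered by the functions above."""
    # urlsplit only recognises a host after a "//" marker, so scheme-less
    # inputs such as "example.com:443/path" get one prepended.
    if "://" not in url and not url.lower().startswith(OPAQUE_PREFIXES):
        url = "//" + url
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        # File extension of the path, without the leading dot.
        "extension": posixpath.splitext(parts.path)[1].lstrip("."),
    }


print(url_parts("example.com:443/path/image.png"))
print(url_parts("http://example.com.ac/path/?a=1&b=2&"))
```

Comparing this output with the SQL results on a sample of your own URLs is a quick way to spot inputs that deserve a closer look.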
The extract_tld function extracts the top-level domain from a URL using the public suffix list. Check the Extracting The Main Domain section for more information about the public suffix list.
D SELECT extract_tld('https://example.com.ac/path/path') AS tld;
┌─────────┐
│   tld   │
│ varchar │
├─────────┤
│ com.ac  │
└─────────┘
D SELECT extract_tld('a.example.com') AS tld;
┌─────────┐
│   tld   │
│ varchar │
├─────────┤
│ com     │
└─────────┘
The extract_subdomain function extracts the subdomain from a URL. It uses the public suffix list to identify the TLD first. Check the Extracting The Main Domain section for more information about the public suffix list.
D SELECT extract_subdomain('http://a.b.example.com/path') AS dns_record;
┌────────────┐
│ dns_record │
│  varchar   │
├────────────┤
│ a.b        │
└────────────┘
D SELECT extract_subdomain('test.example.com.ac') AS dns_record;
┌────────────┐
│ dns_record │
│  varchar   │
├────────────┤
│ test       │
└────────────┘
The get_tranco_rank function returns the Tranco rank of a domain. The update_tranco function updates the Tranco list manually.
D SELECT update_tranco(true);
┌─────────────────────────────────────┐
│ update_tranco(CAST('t' AS BOOLEAN)) │
│               varchar               │
├─────────────────────────────────────┤
│ Tranco list updated                 │
└─────────────────────────────────────┘
The update_tranco function fetches the latest Tranco list and saves it into the tranco_list table. After the call, there will be a tranco_list_%Y-%m-%d.csv file in the current directory. The extension reuses this file to avoid downloading the list again.
To ignore the file and force the extension to download the list again, call the function with true as the parameter. If you don't want to download the list again, call it with false.
D SELECT update_tranco(false);
Because the latest Tranco list covers the previous day, you can also download a list manually and rename it to tranco_list_%Y-%m-%d.csv to use it with the extension.
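If you prepare the CSV yourself, the file name has to match the tranco_list_%Y-%m-%d.csv pattern. The Python sketch below shows how that strftime pattern expands and how a file-based cache check of this kind can work. The helper names are hypothetical, and which date the extension looks for is not stated here, so the date is left as a parameter.

```python
from datetime import date
from pathlib import Path


def tranco_csv_name(day):
    """Expand the tranco_list_%Y-%m-%d.csv pattern for a given date."""
    return day.strftime("tranco_list_%Y-%m-%d.csv")


def needs_download(directory, day, force=False):
    """Mimic the cache check: download when forced or when the file is missing."""
    return force or not (Path(directory) / tranco_csv_name(day)).exists()


print(tranco_csv_name(date(2025, 1, 5)))  # tranco_list_2025-01-05.csv
```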
You can use this function to get the ranking of a domain:
D SELECT get_tranco_rank('microsoft.com') AS rank;
┌─────────┐
│  rank   │
│ varchar │
├─────────┤
│ 2       │
└─────────┘
D SELECT get_tranco_rank('cloudflare.com') AS rank;
┌─────────┐
│  rank   │
│ varchar │
├─────────┤
│ 13      │
└─────────┘
You can use the get_tranco_rank_category function to retrieve a domain's rank category. The category value is on a log10 scale with half steps (e.g., top 1k, top 5k, top 10k, top 50k, top 100k, top 500k, top 1M, top 5M, etc.), and each category excludes the previous ones (e.g., top 5k actually contains 4k domains because it excludes the top 1k).
D SELECT get_tranco_rank_category('microsoft.com') AS category;
┌──────────┐
│ category │
│ varchar  │
├──────────┤
│ top1k    │
└──────────┘
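The half-step bucketing can be illustrated with a small Python sketch. It is an approximation of the scheme described above, not the extension's code: the `rank_category` helper is hypothetical, and only the top1k label is confirmed by the example output; the spelling of the other labels is assumed.

```python
# Upper bound of each bucket on the log10 scale with half steps.
# Only "top1k" appears in the example output; the other labels are assumed.
BUCKETS = [
    (1_000, "top1k"),
    (5_000, "top5k"),
    (10_000, "top10k"),
    (50_000, "top50k"),
    (100_000, "top100k"),
    (500_000, "top500k"),
    (1_000_000, "top1m"),
    (5_000_000, "top5m"),
]


def rank_category(rank):
    """Return the first bucket whose upper bound covers the rank."""
    for limit, label in BUCKETS:
        if rank <= limit:
            return label
    return None  # rank falls outside the buckets listed here


print(rank_category(2))      # top1k
print(rank_category(1_001))  # top5k, because top1k is excluded from it
```

Because the first matching bucket wins, ranks 1,001 to 5,000 land in top5k, which is why that category holds 4k domains.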
This extension provides various functions for manipulating and analyzing IP addresses, including calculating networks, hosts, and subnet masks.
The ipcalc function takes an IP address and netmask and calculates the resulting broadcast, network, wildcard mask, and host range.
It's a table function that provides various details about IP addresses, including:
- Address
- Netmask
- Wildcard
- Network / Hostroute
- HostMin
- HostMax
- Broadcast
- Hosts count
You can use this table function with your data easily:
D CREATE OR REPLACE TABLE ips AS SELECT '127.0.0.1' AS ip UNION ALL SELECT '192.168.1.0/22';
D SELECT i.IP,
         (
             SELECT hostsPerNet
             FROM ipcalc(i.IP)
         ) AS hosts
  FROM ips AS i;
┌────────────────┬───────┐
│       ip       │ hosts │
│    varchar     │ int64 │
├────────────────┼───────┤
│ 127.0.0.1      │   254 │
│ 192.168.1.0/22 │  1022 │
└────────────────┴───────┘
Warning: ipcalc is an experimental function.
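The arithmetic behind these fields can be reproduced with Python's standard ipaddress module, which is handy for sanity-checking results. This is a sketch under assumptions, not the extension's implementation: the helper name and dictionary keys are made up, it handles IPv4 only, and the /24 fallback for addresses without a prefix is inferred from the 254 hosts shown for 127.0.0.1 above.

```python
import ipaddress


def ipcalc(address, default_prefix=24):
    """IPv4-only sketch; default_prefix is an assumption, not a documented default."""
    if "/" not in address:
        address = f"{address}/{default_prefix}"
    iface = ipaddress.IPv4Interface(address)
    net = iface.network
    if net.prefixlen >= 31:
        # Too small for separate network/broadcast addresses: count every address.
        host_min, host_max = net.network_address, net.broadcast_address
        hosts = net.num_addresses
    else:
        # Usable hosts exclude the network and broadcast addresses.
        host_min = net.network_address + 1
        host_max = net.broadcast_address - 1
        hosts = net.num_addresses - 2
    return {
        "address": str(iface.ip),
        "netmask": str(net.netmask),
        "wildcard": str(net.hostmask),
        "network": str(net),
        "host_min": str(host_min),
        "host_max": str(host_max),
        "broadcast": str(net.broadcast_address),
        "hosts": hosts,
    }


print(ipcalc("192.168.1.0/22")["hosts"])  # 1022
print(ipcalc("127.0.0.1")["hosts"])       # 254
```

A /22 network spans 2^10 = 1024 addresses, and removing the network and broadcast addresses leaves the 1022 hosts shown in the table.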
You can use the netquack_version function to get the extension version.
D select * from netquack_version();
┌─────────┐
│ version │
│ varchar │
├─────────┤
│ v1.2.0  │
└─────────┘
- Create a TableFunction for extract_query_parameters that returns each key-value pair as a row.
- Implement extract_custom_format function
- Implement parse_uri function
- Save Tranco data as Parquet
- Implement GeoIP functionality
- Return default value for get_tranco_rank
- Support internationalized domain names (IDNs)
Don't be shy and reach out to us if you want to contribute.
- Fork it!
- Create your feature branch: git checkout -b my-new-feature
- Commit your changes: git commit -am 'Add some feature'
- Push to the branch: git push origin my-new-feature
- Submit a pull request
Every project has its problems. You can contribute to the development of this one by reporting any issues you find.