Introduction
Welcome to Endatabas!
Endatabas is a SQL document database with complete history. It will store anything, forever.
The best way to understand Endatabas is to use it. Head on over to the Quickstart to try it out.
If you're not ready to run your first query yet, there's plenty of explanatory material in our What? and Why? documents.
Elsewhere
If you came to the Endatabas book directly, you may also be interested in:
Tutorial
This quick tutorial intends to walk you through the core aspects of Endb.
By the end, you will have endb installed (if you want) and you will have used it to learn some of what it can do.
Warning: Early Beta
Endb is still in early beta.
While in beta, new versions are not guaranteed to have binary-compatible storage with previous versions. We encourage you to experiment with Endb but please do not use it in production until a General Availability release.
Quickstart
Install Endb
mkdir -p endb_data
docker pull endatabas/endb
docker run --rm -p 3803:3803 -v endb_data:/app/endb_data docker.io/endatabas/endb
See Installation for more ways to install and build endb.
See Try It! for methods of connecting to Endb and running queries.
Try Online
If you do not want to install a full Endb instance, you can try the Live Demo instead.
The Wasm Console runs Endb directly in your web browser, so you can ignore the steps in Try It! and jump straight to Endb SQL Basics.
NOTE: The Wasm Console is not a full Endb instance. It does not provide APIs and has other performance and infrastructure limitations.
Try It!
You can send SQL statements to endb over HTTP or WebSockets. Any HTTP or WebSocket client will do, and Endb ships with a console and client libraries.
curl
Our first couple of examples will use curl, which you probably already have installed, to send queries directly to the API.
curl -d "INSERT INTO users (name) VALUES ('Tianyu')" -H "Content-Type: application/sql" -X POST http://localhost:3803/sql
curl -d "SELECT * FROM users" -H "Content-Type: application/sql" -X POST http://localhost:3803/sql
You don't need to use the API directly if there is a client library available for your language of choice, but it is helpful to know that the underlying API is human-readable and based on open standards. Read more in the full HTTP API docs.
Console
Endb provides a small console for writing Endb SQL directly:
pip install endb_console
endb_console # connects to localhost by default
Assuming you inserted a user with curl above, you can query that table directly:
-> SELECT * FROM users;
Read more in the Console doc.
Client Libraries
Endb ships with permissively-licensed (MIT) client libraries:
pip install endb
npm install @endatabas/endb
You can copy and modify their source code for any purpose. Read more about how to use them in the client libraries docs.
Learning Endb SQL
You can use any of these tools (or any other HTTP client you prefer) for the rest of this tutorial.
Endb SQL Basics
If you know SQL, Endb SQL will feel instantly familiar. It is not "SQL-like". It is SQL. However, Endb SQL is dynamic, strongly-typed, time-aware, and shuns language embedding. Hopefully it is pleasant to use without feeling foreign.
Just Begin
Endb is a schemaless document database.
You do not need CREATE TABLE — tables are dynamically created when you insert data. The following SQL is valid as soon as you start endb:
INSERT INTO posts (id, user_id, text) VALUES (123, 456, 'Hello World');
Try querying it out:
SELECT * FROM posts;
Immutable
Endb is immutable, so it does not permit destructive UPDATE or DELETE. For example, if you run an UPDATE, your previous INSERT isn't lost. Before we update the record, we'll make note of the current time, according to the database. (Any time after the INSERT and before the UPDATE would suffice.)
-- make note of this time to use below
SELECT CURRENT_TIMESTAMP;
Multiple statements can be separated by semicolons. This time, we'll update the record and view it at once:
UPDATE posts SET text = 'Hello Immutable World' WHERE id = 123;
SELECT * from posts;
You'll note that Hello World from your original insert isn't visible. That's because it only exists in the past and, by default, SELECT will show the state of the database as of now. To see the old version, you can time-travel back to a time when the old record was visible. Copy the timestamp you noted, without the quotes, into a query like SELECT * from posts FOR SYSTEM_TIME AS OF 2024-01-01T00:00:00.000000Z;:
SELECT * from posts FOR SYSTEM_TIME AS OF {YOUR_NOTED_TIMESTAMP};
NOTE: Although there is no destructive DELETE in an immutable database, there is ERASE, which exists to remove data for user safety and compliance with laws like GDPR.
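For illustration, ERASE takes the same shape as DELETE. A sketch (a no-op here, since no post has id 999):
ERASE FROM posts WHERE id = 999;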
Dynamic Joins
Relationships are also dynamic.
You can join any two tables on any two columns.
Adding a user with id 456 allows a join with the previous posts table.
INSERT INTO users (id, name) VALUES (456, 'Vikram');
SELECT * from posts p JOIN users u ON p.user_id = u.id;
Semi-Structured Data
Endb allows you to insert asymmetrical, jagged data. Let's add another user with more columns.
INSERT INTO users (id, name, email) VALUES (789, 'Daniela', 'daniela@endatabas.com');
SELECT * from users;
Note that SELECT * is an implicitly dynamic query. It has no difficulty with the previous user document, even though that document lacks an email column. In practice, most applications and SQL queries should specify exactly the columns they want to query. SELECT * is really only for exploratory queries, so it shows you everything visible in the table.
Data "Migration"
It may seem strange to leave jagged columns lying around. Endb doesn't discourage you from cleaning up your data, if you can:
UPDATE users SET email = 'vikram@stockholm.se' WHERE name = 'Vikram';
SELECT * from users;
The difference in Endb is that we haven't "migrated" the old data — it's still there.
If you query for Vikram's user document as of 2 minutes ago, you will see the old record without an email.
Queries in Endb always default to as-of-now, which is why the results of the query above shouldn't be surprising.
Nested Data
Endb eschews JSONB columns in favour of a native, strongly-typed, document-relational model.
INSERT INTO users (id, name, friends)
VALUES (123, 'Anastasia', [{name: 'Heikki', country: 'Finland'},
{name: 'Amit', country: 'Japan'}]);
SELECT users.friends[1] FROM users WHERE id = 123;
The users.friends[1] expression above is a path expression. A detailed explanation of Endb's path navigation is provided in the SQL Reference Path Navigation docs.
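Path expressions compose. A sketch, building only on the syntax already shown (zero-based array indexing plus field access):
SELECT users.friends[0].name FROM users WHERE id = 123;
-- expected to return Heikki's name, since arrays are zero-indexed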
Documents
Because of Endb's native document-relational model, rows are documents and vice-versa.
You can use an INSERT statement to add a document directly to the database:
INSERT INTO users {id: 890,
name: 'Aaron',
friends: [{name: 'Jeff', country: 'Canada'},
{name: 'Kaia', country: 'Japan'}]};
Error Messages
Endb will always do its best to provide you with meaningful error messages that point you to a solution:
SELECT * FROM im_not_here;
Learn More
Much more detail on Endb SQL is provided in the SQL Reference.
SQL Reference
The SQL Reference contains details and edge cases about Endb SQL.
NOTE: The Endb SQL dialect is under active development.
While most major features of Endb SQL are stable, some more esoteric features (MATCH, path navigation, etc.) may experience naming and semantics changes before Endb 1.0 is released.
- Intention
- Data Manipulation
- Queries
- SQL Data Types
- Operators
- Functions
- Time Queries
- Path Navigation
- Schema
- Views
- Assertions
Intention
The goal of Endb's SQL dialect is to be small, coherent, and powerful. The SQL specification is massive, with pages numbering in the thousands. Rather than implement the entire SQL specification from scratch (a gargantuan task) or mimic the SQL dialect of another database, Endb chooses a tiny core and builds powerful, composable features on top of that.
This tiny core draws inspiration from many sources, but SQLite in particular. If SQLite supports a particular operator or function, Endb SQL also tries to.
Endb SQL also draws strong inspiration from the SQL specification itself (and its predecessors) and from PostgreSQL. Endb SQL's nested data is also heavily inspired by JSONPath, SQL/JSON, and their derivatives found in major SQL databases.
Light inspiration is drawn from PartiQL, SQL++, and XQuery.
For more information on Endb's influences, please see our bibliography.
Data Manipulation
Creating, updating, and deleting data in Endb is done using standard SQL Data Manipulation Language (DML). Endb is also immutable and schemaless, so it contains a number of shorthands and document-oriented conveniences.
Endb does not require any Data Definition Language (DDL), such as CREATE TABLE.
INSERT
To create a new document, you can use the standard SQL INSERT command.
INSERT INTO products (product_no, name, price) VALUES (1, 'Tofu', 7.99);
To create multiple new documents at once, delimit their value lists with commas:
INSERT INTO products (product_no, name, price) VALUES (1, 'Butter', 5.99), (2, 'Margarine', 4.99);
It is also possible to insert a document directly using an OBJECT literal.
INSERT INTO products {product_no: 3, name: 'Tea', price: 3.99};
To insert multiple documents directly, delimit documents with commas:
INSERT INTO products {name: 'Coffee', price: 3.99}, {name: 'Croissant', price: 2.99};
It is possible to insert the results of a query:
INSERT INTO cheap_products SELECT * FROM products WHERE price < 4.00;
UPDATE
To update an existing row, you can use the standard SQL UPDATE command:
UPDATE products SET price = 4.99 WHERE name = 'Coffee';
Set multiple columns by separating them with commas:
UPDATE products SET price = 4.99, name = 'Kaapi' WHERE name = 'Coffee';
Because Endb is schemaless, each document (or row) has its own schema.
As a result, you may want to remove a column from an individual row.
You can do this with the UNSET operator:
UPDATE products UNSET product_no WHERE name = 'Coffee';
UPDATE products REMOVE product_no WHERE name = 'Coffee';
REMOVE is an alias for UNSET.
It is possible to set and unset values in a single update. Unsetting a column which doesn't exist is not an error:
UPDATE products SET price = 5.98 UNSET product_no WHERE name = 'Coffee';
UPDATE PATCH
Endb provides a PATCH operator, similar to the PATCH function. The PATCH operator is used in conjunction with UPDATE to set fields on a document (columns on a row) in a declarative fashion.
UPDATE products PATCH {price: 1.98, product_no: products.product_no + 1000} WHERE price = 2.00;
PATCH is based on RFC 7386: JSON Merge Patch.
UPDATE SET $path
The SET operator permits paths on its left-hand side. The behaviour of the form UPDATE <table> SET <path> = <value> is identical to that of the path_set function.
UPDATE users SET $.addresses[0].city = 'Chicago' WHERE name = 'Steven';
UPDATE UNSET $path
The UNSET (synonym: REMOVE) operator permits paths on its left-hand side. The behaviour of the form UPDATE <table> UNSET <path> is identical to that of the path_remove function.
UPDATE users UNSET $.addresses[0].city WHERE name = 'Steven';
DELETE
To delete an existing row, use the standard SQL DELETE command.
DELETE FROM products WHERE price = 5.98;
You may delete all rows from a table by eliding the WHERE clause:
DELETE FROM products;
Note: In Endb, DELETE does not remove any data. It is always possible to view data prior to the DELETE with time queries. If you need to remove data for compliance (with laws such as GDPR or PIPEDA), use ERASE.
ON CONFLICT (Upsert)
Endb provides flexible upserts with the common ON CONFLICT clause. When the INSERT command detects a conflict, it will perform the instructions in the DO clause.
The following command needs to be executed twice to see the upsert effect.
INSERT INTO products {name: 'Pepper', price: 9.99} ON CONFLICT (name, price) DO UPDATE SET v = 2;
To specify no operation on conflict, use DO NOTHING:
INSERT INTO products {name: 'Pepper', price: 9.99} ON CONFLICT (name, price) DO NOTHING;
To reference the document currently being inserted, the DO clause provides a statement-local relation named excluded.
INSERT INTO products {name: 'Salt', price: 6};
INSERT INTO products {name: 'Salt', price: 7} ON CONFLICT (name) DO UPDATE SET price = excluded.price;
Similarly, the existing table is still available in the DO clause to provide further filtering:
INSERT INTO products {product_no: 99, name: 'Cumin', price: 3.00, v: 5};
INSERT INTO products {product_no: 99, name: 'Cumin', price: 5.00, v: 6} ON CONFLICT (product_no, name) DO UPDATE SET price = excluded.price, v = excluded.v WHERE products.v < 6;
ERASE
ERASE completely removes documents (rows) from visibility to any queries. Once a document has been erased, it is no longer possible to query for it at all.
ERASE FROM products WHERE name = 'Salt';
WITH (Common Table Expressions)
The WITH keyword can create Common Table Expressions (CTEs) for DML, just as it can for queries.
WITH top_margin_products AS (SELECT product_no, name FROM products WHERE (price - cost) > 5.00)
INSERT INTO banner_products SELECT product_no, name FROM top_margin_products;
Parameters
Parameters to DML are documented under the HTTP API.
Transactions
Transactions in Endb are implicit. Run multiple DML statements in a single transaction by providing multiple statements (delimited by semicolons) to a single POST to the HTTP API.
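For example, the following two statements, sent together in one request body, run in a single transaction (a sketch using a hypothetical product):
INSERT INTO products {name: 'Nori', price: 3.50};
UPDATE products SET price = 3.25 WHERE name = 'Nori';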
Queries
Endb SQL strives to empower the query author. Basic SQL queries should be familiar. Advanced SQL queries should be simple.
SELECT *
The most common hand-written SQL query tends to be the easiest:
SELECT * FROM products;
Without a WHERE clause (discussed below), the entire table is returned. Because Endb is schemaless, * has a special meaning. Each document (row) in Endb carries with it its own schema. As a result, * refers to the widest possible set of columns, across all the rows returned.
In many SQL dialects, columns are strictly ordered. Because Endb columns cannot have a strict order, * returns them in alphabetical order. You can select the widest set of columns for a specific table with <table>.*:
SELECT p.* FROM products p JOIN coupons c ON p.price = c.price;
SELECT
In programmatic environments, it is almost always preferable to query for specific columns by name:
SELECT product_no, price FROM products;
Select a list of columns by delimiting them with commas.
SELECT product_no, v, price, name FROM products;
Select a column with spaces by delimiting it with backticks or square brackets:
SELECT products['product no'] FROM products;
SELECT `product no` FROM products;
NOTE: Whether or not your Endb client respects column ordering is dependent on the content type it uses in Accept Headers. It is worth reading over the Accept Header documentation, in this regard.
Limit query results to distinct rows, or expand to all rows, with DISTINCT and ALL.
SELECT DISTINCT name FROM products;
SELECT ALL product_no FROM products p JOIN sales s ON p.name = s.name;
FROM
Alias Tables
For convenience, tables can be given aliases immediately following their names in the FROM clause.
SELECT p.name FROM products p;
The AS operator can also (optionally) be used to provide an alias for a table.
SELECT p.name FROM products AS p;
More usefully, it can give a temporary table name to an expression. The temporary table name can have either anonymous columns or named columns. (The VALUES keyword is explained in the following VALUES Lists section.)
SELECT p.column1 FROM (VALUES ('Paprika', 4.77)) AS p;
SELECT p.price FROM (VALUES ('Paprika', 4.77)) AS p(name, price);
Alias Columns
The AS keyword is also used to alias columns. This is useful when column names conflict in a join. (Joins are explained below.) If the same column is specified more than once, the last reference to that column name is the one which will be returned:
SELECT c.price, p.price FROM products p JOIN coupons c ON p.name = c.name;
If both columns are required, AS can be used to rename one or both of the columns:
SELECT p.price AS regular_price, c.price FROM products p JOIN coupons c ON p.name = c.name;
JOIN
Because Endb is schemaless, documents (rows) can be joined on any fields (columns) which have equivalent values. Joins are written in the form <table1> JOIN <table2> ON <table1>.<column1> <operator> <table2>.<column2>. Join operators are generally comparisons or booleans.
INSERT INTO coupons {name: 'Salt', price: 3.0};
SELECT * FROM products p JOIN coupons c ON p.name = c.name;
LEFT JOIN, LEFT OUTER JOIN, INNER JOIN, and CROSS JOIN are all supported.
NOTE: Endb does not restrict name clashes in unqualified column selection or SELECT *. If two tables share the same column name, the results from the column of the second table in the join will be visible in the returned result, but not the results of the first.
USING
When the columns to join share the same name between two tables, USING is a shorthand that permits joining on the equality of those two columns. A USING query also supports unambiguous use of the unqualified column name in the SELECT clause.
SELECT project_name, users.name, projects.budget FROM users JOIN projects USING (project_name);
WHERE (Filtering)
Rather than returning the entire table, documents (rows) can be filtered with a WHERE clause.
SELECT * FROM products WHERE price > 4;
Advanced Filtering
More advanced filters are documented in Operators and Functions.
ORDER BY (Sorting Results)
Results from queries can be ordered with standard SQL ORDER BY.
SELECT * FROM products ORDER BY price;
By default, ordering is ascending. For descending order, suffix the ORDER BY clause with DESC:
SELECT * FROM products ORDER BY price DESC;
To force ascending order, use ASC:
SELECT * FROM products ORDER BY price ASC;
It is also possible to order by an expression:
SELECT * FROM products ORDER BY LENGTH(name);
SELECT * FROM products ORDER BY -price;
In the example above, LENGTH is an example of a function. A complete list of functions can be found in the Functions documentation.
GROUP BY
GROUP BY accepts a list of columns and creates aggregated rows based on each of those columns, in order. Each aggregate is returned as a single row. Each column returned must either be a column specified in the GROUP BY clause or a column created with an aggregate function, such as SUM.
SELECT name, price FROM products GROUP BY name, price;
SELECT name, SUM(price) FROM products GROUP BY name;
HAVING
HAVING adds a search condition to an aggregate query.
SELECT name, SUM(price) FROM products GROUP BY name HAVING LENGTH(name) > 4;
It is most often used with GROUP BY (seen above), but it is also legal with other aggregates:
SELECT SUM(products.price) FROM products HAVING SUM(products.price) = 13;
LIMIT
LIMIT specifies the maximum number of rows to be returned by the query.
SELECT * FROM products LIMIT 2;
It always makes sense to control the order of returned rows so LIMIT always returns the same rows for the same query -- unless you don't care which rows are returned.
SELECT * FROM products ORDER BY price ASC LIMIT 2;
OFFSET allows queries to skip rows before returning a limited set.
SELECT * FROM products ORDER BY price ASC LIMIT 2 OFFSET 2;
VALUES Lists
The VALUES keyword is used to create a static table of documents (rows). Each row is denoted by a pair of parentheses. All rows must have Union Compatibility which, for Endb, means they have the same number of columns.
VALUES (1, 'Salt'), (2, 'Pepper'), (3, 'Vinegar');
Endb assigns anonymous columns the names column1, column2, etc. Columns can instead be given names with a table alias:
SELECT * FROM (VALUES (1, 'Salt'), (2, 'Pepper'), (3, 'Vinegar')) AS t (product_no, name);
OBJECTS Lists
The OBJECTS keyword is used to create a static table comprised of object literals, each representing a document (row). Each row is directly denoted by an object literal. OBJECTS lists do not require Union Compatibility, so jagged lists are permitted.
OBJECTS {product_no: 1, name: 'Salt'}, {product_no: 2, name: 'Pepper', price: 3.99};
SELECT * FROM (OBJECTS {product_no: 1, name: 'Salt'}, {product_no: 2, name: 'Pepper'}) as t;
Set Operations: UNION, INTERSECT, EXCEPT
The set operations union, intersection, and difference are available to the results of two queries.
UNION
Append the results of one query to another. Duplicate rows are removed.
SELECT * FROM products UNION SELECT * FROM new_products;
To keep duplicate rows, use UNION ALL:
SELECT * FROM products UNION ALL SELECT * FROM new_products;
INTERSECT
The intersection of two queries returns results which are found in both.
SELECT * FROM products INTERSECT SELECT * FROM new_products;
EXCEPT
The difference of two queries returns only results from the first query which are not found in the second. Another way of thinking about this is that results of the second query are removed from the first.
SELECT * FROM products EXCEPT SELECT * FROM ignored_products;
Union-Compatibility
"Union Compatibility" refers to the ability of two queries to be used in a union, intersection, or difference. Because Endb is dynamically-typed, the only constraint on union compatibility is the number of columns returned.
In general, it only makes sense to use set operations on two queries which return either: (1) explicit columns, so order and naming are respected, or (2) columns with the same names, so they are guaranteed to return in order, in the case of * queries. When applying set operations to * queries, keep in mind that the widest column set (across the entire history of the table) will be returned.
If the queries return a different number of columns, set operations will result in an error:
-> SELECT * FROM products UNION SELECT * FROM new_products;
400 Bad Request
Number of UNION left columns: 3 does not match right columns: 2
WITH (Common Table Expressions)
The WITH keyword is used to create Common Table Expressions, or CTEs. CTEs act like temporary tables or views within the context of a query or DML statement. CTEs are used in place of a sub-select to simplify the appearance of a query. WITH clauses take the form WITH <cte-name> AS (<cte-select>).
WITH top_margin_products AS (SELECT product_no FROM products WHERE (price - cost) > 5.00)
SELECT name FROM products
WHERE product_no IN (SELECT product_no FROM top_margin_products);
WITH RECURSIVE
The RECURSIVE keyword can be added to WITH to create recursive CTEs, which can refer to themselves. Recursive CTEs will always have the form <initial-select> UNION <recursive-select> or <initial-select> UNION ALL <recursive-select>.
Here is a naive example, demonstrating the recursive construction of a Fibonacci Sequence:
WITH RECURSIVE fib(previous, current) AS (
VALUES (0, 1)
UNION ALL
SELECT fib.current, fib.previous + fib.current
FROM fib
WHERE fib.previous + fib.current < 5000
)
SELECT * FROM fib;
The most beneficial uses for WITH RECURSIVE are walking hierarchical and graph-shaped data sets -- capabilities ordinary SQL lacks. However, Endb recursive queries are also capable of solving Sudoku puzzles and constructing fractals, as seen in the test suite. (Credit goes to SQLite's delightful Outlandish Recursive Query Examples.)
LATERAL
Lateral queries permit access to named values from the FROM clause in the SELECT clause. The LATERAL keyword is optional in Endb, as all queries are lateral queries by default.
SELECT halved FROM products, LATERAL (SELECT price / 2.0 AS halved) h;
SELECT halved FROM products, (SELECT price / 2.0 AS halved) h;
Repeatable Reads: SAVEPOINT, ROLLBACK, RELEASE
Repeatable reads are achieved in Endb by creating savepoints, which queries can return to later. When returning to a savepoint in the future, queries will return results according to the state of the database when the savepoint was captured.
By default, savepoints have a 60-second lifespan, after which they expire. Use of a savepoint during its lifespan will renew its lease, allowing it to be used for another 60 seconds. Savepoints can be forcibly expired with RELEASE. Attempting to use a savepoint that has been expired or released will return No active savepoint: <savepoint>.
NOTE: Repeatable reads are only available to the HTTP API.
Minimal Example
Other SQL dialects have more transaction-specific semantics for SAVEPOINT, ROLLBACK, and RELEASE. Because this difference in behaviour may be confusing to users familiar with other databases, we provide the example below. The result of the final query returns the first two dungeons, which were inserted prior to the savepoint, but not "Tower of Hera".
INSERT INTO dungeons {name: 'Eastern Palace'};
INSERT INTO dungeons {name: 'Desert Palace'};
SAVEPOINT desert_palace;
INSERT INTO dungeons {name: 'Tower of Hera'};
ROLLBACK TO desert_palace; SELECT * FROM dungeons;
NOTE: The ROLLBACK and SELECT above must be executed together, in the same stateless "transaction". (Normally this will mean executing both statements in a single HTTP request.)
SAVEPOINT
The SAVEPOINT operator captures a new savepoint at the time it executes. Savepoints can be named or anonymous. Anonymous savepoints are named with a UUID. The name of the savepoint is returned as an attribute named result.
SAVEPOINT desert_palace;
-- [{'result': 'desert_palace'}]
SAVEPOINT;
-- [{'result': '0b12de43-1c92-4d92-ab7c-51c5a5129074'}]
ROLLBACK
Inside the scope of a stateless "transaction" (normally a single HTTP request), ROLLBACK is used to return to a savepoint. ROLLBACK TO <savepoint> returns to a named savepoint (by name) or an anonymous savepoint (by string UUID). Queries executed inside such a transaction return results according to the state of the database when the savepoint was created. The use of ROLLBACK renews the lease of the associated savepoint.
ROLLBACK TO desert_palace; SELECT * FROM dungeons;
ROLLBACK TO 'eab07765-de6f-4f74-8052-838dd29ee8e7'; SELECT * FROM dungeons;
RELEASE
The RELEASE keyword expires a savepoint so it can no longer be used. Anonymous savepoints can be released by string UUID.
RELEASE desert_palace;
RELEASE 'f7c314dd-47b9-4c85-9502-b8e35c82b935';
Transactions
Standard SQL keywords are used to begin, commit, and rollback explicit transactions.
NOTE: Explicit transactions are only available to the WebSocket API.
Minimal Example
This minimal example assumes a connection via the WebSocket API.
BEGIN;
INSERT INTO dungeons {name: 'Dark Palace'};
INSERT INTO dungeons {name: 'Swamp Palace'};
INSERT INTO dungeons {name: 'Skull Woods'};
ROLLBACK;
SELECT * FROM dungeons;
BEGIN [TRANSACTION]
The BEGIN keyword starts a new transaction.
BEGIN TRANSACTION;
COMMIT [TRANSACTION]
The COMMIT keyword commits the open transaction to disk. END is a synonym for COMMIT.
COMMIT TRANSACTION;
ROLLBACK [TRANSACTION]
The ROLLBACK keyword throws away any uncommitted changes made since the transaction began.
ROLLBACK TRANSACTION;
NOTE: The ROLLBACK keyword for transactions should not be confused with the ROLLBACK keyword for savepoints. Both can be called without mentioning a savepoint or transaction, but their behaviour differs based on whether the client is in an explicit transaction or not.
SQL Data Types
NULL
Null serves a variety of purposes in Endb:
- Explicit: You can provide an explicit NULL value.
- "Unknown": As with any SQL, you will receive a NULL when 3-Valued Logic cannot determine if a statement is true or false.
- "Missing": Jagged rows will return NULL for columns projected for a document which does not contain them (see the sketch after this list).
TEXT (CHAR, VARCHAR)
Endb accepts unbounded, variable-length strings with either single or double quotes. CHAR and VARCHAR are synonyms for TEXT.
INSERT INTO users (name, title) VALUES ('River', "Prodigy");
When casting using the CHAR synonym, an optional integer argument indicates the desired length of the resulting string, padded with spaces:
SELECT CAST(123 AS CHAR(10));
-- [{'column1': '123       '}]
BOOLEAN
Boolean values can be TRUE, FALSE, or NULL.
INTEGER (BIGINT)
A 64-bit integer, capable of auto-promotion to a 128-bit integer.
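For instance, an expression which overflows 64 bits should auto-promote rather than wrap around (a sketch; 9223372036854775807 is the 64-bit maximum):
SELECT 9223372036854775807 + 1;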
REAL (DOUBLE)
A 64-bit IEEE 754 floating point number, or double.
TIMESTAMP
Timestamps can be represented as either SQL timestamps, according to the SQL Specification, or ISO timestamps. The following are legal timestamp literals:
2007-01-01T00:00:00
2007-01-01T00:00:00.123Z
2007-01-01T00:00:00.000000Z
TIMESTAMP '2007-01-01 00:00:00'
TIMESTAMP '2007-01-01 00:00:00.000000Z'
DATE
Dates can be represented as either SQL dates, according to the SQL Specification, or ISO dates. The following are legal date literals:
2007-01-01
DATE '2007-01-01'
TIME
Times can be represented as either SQL times, according to the SQL Specification, or ISO times. The following are legal time literals:
23:59:12
23:59:12.12345
TIME '23:59:12'
TIME '23:59:12.12345'
PERIOD
A Period is a window of time between two date/time types. Periods are used with Period Predicates.
{start: 2001-01-01, end: 2001-04-01}
[2001-04-01T00:00:00Z, 2001-05-01]
It is also possible to create a Period with the PERIOD constructor function.
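A sketch of a Period used with the CONTAINS temporal predicate listed under Time Queries (assuming, per SQL:2011, that a date may appear on the right-hand side):
SELECT {start: 2001-01-01, end: 2001-04-01} CONTAINS 2001-02-14;
-- expected: TRUE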
INTERVAL (DURATION)
An interval (or duration) is created whenever two times are subtracted.
SELECT 2001-01-02 - 2001-01-01;
Interval literals can be constructed with ISO 8601 syntax:
PT12H30M5S
P1Y2M10DT2H30M
This is equivalent to the same ISO 8601 syntax provided as a string to the DURATION constructor:
DURATION('PT12H30M5S')
DURATION('P1Y2M10DT2H30M')
Interval literals can also be constructed with the classic SQL intervals DSL:
INTERVAL '1-2' YEAR TO MONTH
INTERVAL '0 12:34:56.789' DAY TO SECOND
BLOB (VARBINARY)
Binary Large Objects can be encoded as hexadecimal literals or cast from strings.
x'DEADBEEF'
CAST("hello" AS BLOB)
ARRAY
Endb SQL uses zero-based arrays. Arrays can be created with array literals similar to JSON.
["one", "two", "three"]
Alternatively, arrays can be created using a literal syntax similar to that in the SQL Specification, or with a constructor function.
ARRAY ["one", "two", "three"]
ARRAY("one", "two", "three")
Array literals can contain the spread operator:
SELECT [1, 2, ...[3, 4], 5];
-- [{'column1': [1, 2, 3, 4, 5]}]
Array equality is tested lexicographically.
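For example, two arrays should compare equal only when their elements are pairwise equal (a sketch):
SELECT [1, [2, 3]] = [1, [2, 3]];
-- expected: TRUE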
OBJECT
Objects (which can also be thought of as documents, or rows) can be created with object literals enclosed in curly braces, similar to JSON. Keys in object literals can be quoted or unquoted.
{name: "Hanna", birthday: 1982-12-31}
{'name': "Hanna", 'birthday': 1982-12-31}
Alternatively, objects can be created using an OBJECT constructor keyword, similar to that in the SQL Specification.
OBJECT(name: 'Hanna', birthday: 1982-12-31)
Object literals can contain spreads, computed fields, shorthands, and row literals.
SELECT { a: 1, ...[2, 3] };
-- [{'column1': {'0': 2, '1': 3, 'a': 1}}]
SELECT { foo: 2, ['foo' || 2]: 5 };
-- [{'column1': {'foo': 2, 'foo2': 5}}]
SELECT {p.name, c.discounted} FROM products p JOIN coupons c ON p.name = c.name;
-- [{'column1': {'discounted': 2.99, 'name': 'Salt'}}]
SELECT {product: {p.*}, discounted: c.discounted} FROM products p JOIN coupons c ON p.name = c.name;
-- [{'column1': {'discounted': 2.99, 'product': {'name': 'Salt', 'price': 5.99}}}]
Object equality is tested by comparing each key-value pair as an array.
Dynamic Literals
Row Literals
It is possible to return an entire document (row) as a single literal value. The syntax is akin to Postgres ROW literals.
Unlike table.*, which pads non-existent columns with NULL, a row literal returns exactly the schema specified for each individual row.
{ table.* }
Example usage:
SELECT { products.* } FROM products;
As a shorthand, a table's name may be used in the SELECT clause to return entire rows as documents:
-> SELECT users FROM users;
-- [{'users': {'email': 'patrick@oracle.com', 'name': 'Patrick'}},
-- {'users': {'email': 'preethi@shopify.ca', 'name': 'Preethi'}}]
NOTE: When a table contains a column of the same name, the column takes precedence and a single column is returned, as usual:
-> SELECT status FROM status;
-- [{'status': 'error'}, {'status': 'ok'}]
Spread
The Spread Operator (..., sometimes known as "splat") can be used to directly flatten/unnest one collection (an array, object, or row literal) into another. Strings are treated as character collections.
SELECT [1, 2, ...[3, 4], 5];
-- [{'column1': [1, 2, 3, 4, 5]}]
SELECT [1, 2, ..."foo", 5];
-- [{'column1': [1, 2, 'f', 'o', 'o', 5]}]
If an array is spread into an object, its ordinals will be used as properties:
SELECT { a: 1, ...{b: 2} };
-- [{'column1': {'a': 1, 'b': 2}}]
SELECT { a: 1, ...[2, 3] };
-- [{'column1': {'0': 2, '1': 3, 'a': 1}}]
Computed Fields
In the key/property position, square brackets are used to construct computed fields in object literals. Computed fields are implicitly cast to string.
SELECT { foo: 2, [2 + 2]: 5 };
-- [{'column1': {'4': 5, 'foo': 2}}]
SELECT { foo: 2, ['foo' || 2]: 5 };
-- [{'column1': {'foo': 2, 'foo2': 5}}]
Shorthands
Column names can be referred to in place of key-value pairs in object literals.
SELECT {p.name, c.discounted} FROM products p JOIN coupons c ON p.name = c.name;
-- [{'column1': {'discounted': 2.99, 'name': 'Salt'}}]
Note on timezones
Endb date/time data types currently only support times encoded as UTC.
Note on type widening
Operations performed on scalars will attempt to widen those scalars for the purpose of the operation, if reasonable.
-> select 2.0 = 2;
[{'column1': True}]
This widening includes joins on scalars, but not on collections (ARRAY or OBJECT):
-> INSERT INTO zig {at: 2023-12-21, val: 2}
[{'result': 1}]
-> INSERT INTO zag {at: 2023-12-21T00:00:00, val: 2.0}
[{'result': 1}]
-> SELECT * FROM zig i JOIN zag a ON i.at = a.at;
[{'at': datetime.datetime(2023, 12, 21, 0, 0, tzinfo=datetime.timezone.utc),
'val': 2.0}]
-> SELECT * FROM zig i JOIN zag a ON i.val = a.val;
[{'at': datetime.datetime(2023, 12, 21, 0, 0, tzinfo=datetime.timezone.utc),
'val': 2.0}]
Operators
Comparison
Two values can be compared using standard SQL comparison operators:
- =, == (equals)
- > (greater than)
- < (less than)
- >= (greater than or equal to)
- <= (less than or equal to)
- <>, != (not equal to)
SELECT * FROM products WHERE NOT name = 'Coffee';
SELECT * FROM products WHERE name = 'Coffee' AND name <> 'Kaapi';
SELECT * FROM products WHERE name > 'Cake' AND price >= 5.00;
BETWEEN
BETWEEN returns TRUE when a value is greater-than-or-equal-to the first limit and less-than-or-equal-to the second. It has the form BETWEEN x AND y. It can be negated with the form NOT BETWEEN x AND y.
SELECT * FROM products WHERE price BETWEEN 2.00 AND 4.00;
SELECT * FROM products WHERE price NOT BETWEEN 2.00 AND 4.00;
NOTE: BETWEEN can also be used with System Time.
Boolean Operators
WHERE and HAVING clauses can be modified and combined with standard SQL boolean operators.
IS, IS NOT
IS and IS NOT behave like = (==) and <> (!=), respectively. They are usually used to augment equality checks to test for NULL, which is the third boolean value, representing "unknown". The literal UNKNOWN is permitted in IS / IS NOT expressions in place of NULL.
- When both sides of IS evaluate to NULL, it returns TRUE.
- When only one side of IS NOT evaluates to NULL, it returns TRUE.
- When only one side of IS evaluates to NULL, it returns FALSE.
- When both sides of IS NOT evaluate to NULL, it returns FALSE.
SELECT * FROM products WHERE product_no IS NULL;
SELECT * FROM products WHERE product_no IS UNKNOWN;
SELECT * FROM products WHERE product_no IS NOT NULL;
SELECT * FROM products WHERE product_no IS 386;
SELECT * FROM products WHERE product_no IS NOT 444;
NOTE: A WHERE clause of the form <column> IS NULL will not return rows for which <column> does not exist, as positive equality is only tested against extant columns. For example, the query SELECT * FROM products WHERE name IS NULL; will not return rows for which the column name does not exist. However, SELECT * FROM products WHERE name IS NOT NULL; will return neither rows where the name column has a value of NULL nor rows where the name column is missing. Thus, IS and IS NOT are not symmetrical for jagged data.
IS [NOT] DISTINCT FROM
IS DISTINCT FROM is a synonym for IS NOT. IS NOT DISTINCT FROM is a synonym for IS.
NOTE: The IS [NOT] DISTINCT FROM form is provided for SQL specification compatibility and is not recommended, as it tends to be verbose and confusing.
NOT, AND, OR
NOT can be prefixed to any clause to negate it:
SELECT * FROM products WHERE NOT (name = 'Coffee');
AND returns true if two clauses both return true:
SELECT * FROM products WHERE name = 'Coffee' AND price > 2.99;
OR returns true if either of two clauses returns true:
SELECT * FROM products WHERE name = 'Coffee' OR name = 'Kaapi';
Math
Standard SQL mathematical operators are available to any two numeric values:
- + (addition)
- - (subtraction)
- * (multiplication)
- / (division)
- % (modulo; integer remainder of division)
- << (left bit shift)
- >> (right bit shift)
- +NUMBER (unary plus)
- -NUMBER (unary minus)
SELECT 1 + 3.555;
SELECT 1 - 3.555;
SELECT 2 * 3.555;
SELECT 2 / 3.555;
SELECT 2 % 3.555;
SELECT 62 << 2;
SELECT 62 >> 2;
SELECT +128.5;
SELECT -128.5;
NOTE: Mathematical functions are documented under Functions.
Bitwise Operators
Standard SQL bitwise manipulation operators are available to any two values:
- & (bitwise and)
- | (bitwise or)
The bitwise not operator is also available to a single value:
- ~ (bitwise not)
SELECT 1 & 2;
SELECT 1 | 2;
SELECT ~1;
LIKE
LIKE is the operator equivalent of the LIKE function. LIKE returns TRUE if a string matches the supplied LIKE pattern, as defined below:
A pattern can be a string literal. It can also contain underscores (_) and/or percentage symbols (%). An underscore matches exactly one character. A percentage symbol matches zero or more characters. Backslash escapes the following character to make it a literal. Use ESCAPE to override the default backslash escape character.
SELECT * FROM products WHERE name LIKE 'Tofu';
SELECT * FROM products WHERE name LIKE 'Tof_';
SELECT * FROM products WHERE name LIKE '%of%';
SELECT * FROM products WHERE name LIKE '\%of\%';
SELECT * FROM products WHERE name LIKE 'X%ofX%' ESCAPE 'X';
NOT LIKE is used to invert the results of the match.
SELECT * FROM products WHERE name NOT LIKE '%of%';
NOTE: Endb LIKE is case-sensitive.
REGEXP
REGEXP returns TRUE if a string matches the supplied regular expression. REGEXP may be prefixed with NOT.
SELECT * FROM products WHERE name REGEXP '.*ee|.*ea';
SELECT * FROM products WHERE name NOT REGEXP '.*[fst]+.*';
GLOB
GLOB returns TRUE if a string matches the supplied UNIX glob. GLOB may be prefixed with NOT.
SELECT * FROM products WHERE name GLOB '*of*';
SELECT * FROM avatars WHERE filename NOT GLOB '/opt/local/avatars/*/*.png';
NOTE: GLOB is case-sensitive. It conforms to standard UNIX globs and thus does not support "globstar" (recursive directory) expansion like **/*.png.
MATCH (Containment)
MATCH returns TRUE if the value on the left contains the value on the right, at the top level. Note that a top-level array to the right of the MATCH refers to a set of values that all need to match, not a literal array. The following expressions return TRUE:
SELECT 'foo' MATCH 'foo';
SELECT [1, 2, 3] MATCH [3, 1];
SELECT {user: 'foo', age: 42} MATCH {age: 42};
SELECT {a: [1, 2, {c: 3, x: 4}], c: 'b'} MATCH {a: [{x: 4}, 1]};
The following expressions return FALSE:
SELECT [1, 2, [1, 3]] MATCH [1, 3];
SELECT {foo: {bar: 'baz'}} MATCH {bar: 'baz'};
SELECT {a: [1, 2, {c: 3, x: 4}], c: 'b'} MATCH {a: [{x: 4}, 3]};
NOTE: The @> operator is a synonym for MATCH. It is provided as a convenience for users accustomed to the equivalent JSON Containment Operator in Postgres. It also has a symmetric operator, <@, which returns TRUE if the value on the right contains the value on the left, at the top level. No symmetric keyword exists for MATCH.
ANY, SOME
SOME is a synonym for ANY. ANY qualifies a subquery by comparing a single column or literal value with the result of that subquery. ANY is used in the form <expression> <operator> ANY (<subquery>). It returns true if the subquery returns one or more values for which the operator is true. The operator must return a boolean and the subquery must return a single column.
SELECT 1500 < SOME (SELECT price FROM products);
ALL
ALL qualifies a subquery by comparing a single column or literal value with the result of that subquery. ALL is used in the form <expression> <operator> ALL (<subquery>). It returns true only if all values returned by the subquery are true for the operator provided. The operator must return a boolean and the subquery must return a single column.
SELECT "ok" = ALL (SELECT status_code FROM statuses);
EXISTS
EXISTS returns TRUE if the subquery which follows it returns at least one row.
SELECT name FROM products WHERE EXISTS (SELECT 1 FROM coupons WHERE name = products.name);
IN
The standard SQL IN clause can be used to test lists and subqueries for containment of a value.
SELECT * FROM products WHERE price IN (5.00, 5.99);
SELECT * FROM products WHERE price IN (SELECT price FROM coupons);
NOTE: Use MATCH to test for containment of a value in an array.
NOT IN
The standard SQL NOT IN clause can be used to test lists and subqueries for absence of a value.
SELECT * FROM products WHERE price NOT IN (5.00, 5.99);
SELECT * FROM products WHERE price NOT IN (SELECT price FROM coupons);
NOTE: Use MATCH, negated with NOT, to test for absence of a value in an array.
|| (Concatenation)
The || operator concatenates two strings or arrays supplied as arguments. When concatenating to an array element: other elements, arrays, and blobs are accepted as the second argument. When concatenating to an array: arrays, blobs, and array elements are accepted as the second argument. Elements other than strings are cast to strings when concatenated with each other. Multiple operators can be chained together.
SELECT "Hello" || "World";
SELECT [1, 2, 3] || [4, 5, 6];
SELECT 1 || 2;
SELECT "Hello" || ["World"];
SELECT ["Hello"] || "World";
SELECT "Hello" || "World" || "And" || "Friends";
The Concatenation Operator is equivalent to the CONCAT function.
Vector Operators
Indexing
At the moment, Endb performs exact nearest neighbor search, which provides perfect recall. Endb may support approximate vector indexes in the future, which trade accuracy for speed.
<-> (L2 or Euclidean Distance)
The L2 Distance operator (<->) compares two vectors by Euclidean distance. It is symmetrical to the L2_DISTANCE function.
SELECT * FROM (VALUES ([0,0,0]), ([1,2,3]), ([1,1,1]), (NULL), ([1,2,4])) AS t(val) WHERE val NOT NULL ORDER BY t.val <-> [3,3,3];
-- [{'val': [1, 2, 3]},
-- {'val': [1, 2, 4]},
-- {'val': [1, 1, 1]},
-- {'val': [0, 0, 0]}]
<=> (Cosine Distance)
The Cosine Distance operator (<=>) compares two vectors by the complement of their Cosine Similarity. It is symmetrical to the COSINE_DISTANCE function.
SELECT val FROM (VALUES ([0,0,0]), ([1,2,3]), ([1,1,1]), ([1,2,4])) AS t(val) WHERE t.val <=> [3,3,3] NOT NULL ORDER BY t.val <=> [3,3,3];
-- [{'val': [1, 1, 1]},
-- {'val': [1, 2, 3]},
-- {'val': [1, 2, 4]}]
<#> (Inverse Inner Product)
The Inverse Inner Product operator (<#>) compares two vectors by the inverse of their Inner Product. It is the inverse of the INNER_PRODUCT function.
SELECT val FROM (VALUES ([0,0,0]), ([1,2,3]), ([1,1,1]), (NULL), ([1,2,4])) AS t(val) WHERE val IS NOT NULL ORDER BY t.val <#> [3,3,3];
-- [{'val': [1, 2, 4]},
-- {'val': [1, 2, 3]},
-- {'val': [1, 1, 1]},
-- {'val': [0, 0, 0]}]
Functions
String Functions
CHARACTER_LENGTH
The CHARACTER_LENGTH function returns the number of unicode characters in a string.
SELECT CHARACTER_LENGTH('josé');
-- 4
SELECT CHARACTER_LENGTH('❤️🥫');
-- 3
OCTET_LENGTH
The OCTET_LENGTH function returns the length of a string, in bytes (octets).
SELECT OCTET_LENGTH('josé');
-- 5
SELECT OCTET_LENGTH('❤️🥫');
-- 10
TRIM, LTRIM, RTRIM
The TRIM, LTRIM, and RTRIM functions trim surrounding whitespace, whitespace to the left, and whitespace to the right of a string, respectively.
SELECT TRIM(' hello ');
-- 'hello'
SELECT LTRIM(' hello ');
-- 'hello '
SELECT RTRIM(' hello ');
-- ' hello'
LOWER, UPPER
The LOWER and UPPER functions downcase and upcase a string, respectively.
SELECT LOWER('Relatable Algebra');
-- 'relatable algebra'
SELECT UPPER('Shouting Calculus');
-- 'SHOUTING CALCULUS'
REPLACE
The REPLACE function returns the string in the first parameter, with the second parameter (if found) replaced by the third.
SELECT REPLACE('Relatable Algebra', 'Rela', 'Infla');
INSTR
The INSTR function returns the position of the first character of a substring match on the second parameter, if found, and 0 if it is not found.
SELECT INSTR('Coffee', 'ee');
SUBSTRING
The SUBSTRING function returns the substring starting from the index provided as the second parameter. If the (optional) third parameter is provided, the substring will be of that length (or less, if the end of the source string is reached). SUBSTR is a synonym for SUBSTRING.
SELECT SUBSTRING('Hello Edgar', 4);
SELECT SUBSTR('Hello Edgar', 4, 2);
POSITION
The POSITION pseudo-function returns the first position of the first character of the first matched substring in another string. If the substring is not detected, POSITION returns 0.
SELECT POSITION("h" IN "Hawaii");
SELECT POSITION("i" IN "Hawaii");
SELECT POSITION("Ha" IN "Hawaii");
NOTE: POSITION is a "pseudo-function" because internally it uses custom syntax of the form <substring> IN <string>.
UNICODE
The UNICODE function returns an integer unicode value for the first character of the parameter given.
SELECT UNICODE('Adam');
CHAR
The CHAR function returns a string corresponding to the supplied integer character codes.
SELECT CHAR(65, 66, 67);
CONCAT
CONCAT is equivalent to the Concatenation Operator (||), except that CONCAT is limited to 2-arity applications and || can be chained.
LIKE
The LIKE function serves the same purpose as the LIKE operator. However, the argument order is (effectively) reversed for the LIKE function, to match the signature used in SQLite. For the function version, the pattern is the first argument. Optionally, an alternative escape character can be provided as a third argument.
SELECT * FROM users WHERE LIKE('Stev%', name);
SELECT * FROM users WHERE LIKE('EdgarX%', name, 'X');
Collection Functions
LENGTH
The LENGTH function counts the number of entries in a collection. When supplied with a string, it is a synonym for CHARACTER_LENGTH.
SELECT LENGTH([3, 2]);
SELECT LENGTH({name: 'Peas', price: 8.99, product_no: 77});
SELECT LENGTH('josé');
NOTE: CARDINALITY is a synonym for LENGTH.
OBJECT_KEYS
An object's keys can be selected using OBJECT_KEYS.
SELECT OBJECT_KEYS({original_price: 1.99, sale_price: 1.50, coupon_price: 1.40});
OBJECT_VALUES
An object's values can be selected using OBJECT_VALUES.
SELECT OBJECT_VALUES({original_price: 1.99, sale_price: 1.50, coupon_price: 1.40});
OBJECT_ENTRIES
Returns an array of key-value pairs representing the given object.
SELECT OBJECT_ENTRIES({a: 1, b: 2, c: 3});
-- [['a', 1], ['b', 2], ['c', 3]]
OBJECT_FROM_ENTRIES
Constructs an object from an array of key-value pairs.
SELECT OBJECT_FROM_ENTRIES([['a', 1], ['b', 2], ['c', 3]]);
-- {a: 1, b: 2, c: 3}
PATCH
The PATCH function takes two documents. The document returned is the first document "patched" with any fields found in the second document. If the second document does not specify a field, that field is left untouched. If the second document specifies any fields with values of NULL, those fields are removed.
SELECT PATCH(
{name: 'Salt', nutrition: {sodium: 100, ingredients: 'Kosher Salt'}},
{name: 'Sea Salt', nutrition: {ingredients: NULL}}
);
The PATCH function has an equivalent operator for data manipulation: UPDATE PATCH.
Table Value Functions
Table Value Functions are only valid within the FROM clause.
UNNEST
The UNNEST function can be thought of as the inverse of ARRAY_AGG, although it offers more power than just unlinking elements. It takes an array or object and pulls its elements into separate rows.
SELECT * FROM UNNEST([1.99, 2.99, 3.99]) AS products(price);
It is possible to unnest multiple arrays. If the arrays do not have the same number of elements, the shorter array(s) will have the missing values filled with NULL:
SELECT names.* FROM (VALUES (['Leslie', 'Edgar', 'fiver2'], ['Lamport', 'Codd'])) AS x(first, last), UNNEST(x.first, x.last) AS names(first, last);
When unnesting an object, key-value pairs will be returned as per OBJECT_ENTRIES. This behaviour is useful for manipulating collections:
SELECT * FROM UNNEST({original_price: 1.99, sale_price: 1.50, coupon_price: 1.40}) AS prices(price);
-- [{'price': ['sale_price', 1.5]},
-- {'price': ['coupon_price', 1.4]},
-- {'price': ['original_price', 1.99]}]
Unnesting nested data from a queried table is done with the form FROM <table>, UNNEST(<table>.<column>) AS foo(new_column).
For example:
INSERT INTO msgs
{text: "Here is some classic material",
user: "George",
workday: 2024-02-25,
media: [{type: "image", src: "dsm.png"},
{type: "video", src: "vldb.mp4"}]};
WITH m AS (SELECT * FROM msgs, UNNEST(msgs.media) AS m(media))
SELECT media FROM m WHERE media..type MATCH 'video';
WITH ORDINALITY
UNNEST can be suffixed with WITH ORDINALITY to append an ordinal column to the results.
SELECT * FROM UNNEST([1.99, 2.99, 3.99]) WITH ORDINALITY AS products(price, n);
-- [{'n': 0, 'price': 1.99}, {'n': 1, 'price': 2.99}, {'n': 2, 'price': 3.99}]
NOTE: Endb ordinals are zero-indexed.
GENERATE_SERIES
The GENERATE_SERIES function generates an array of numbers within a given interval. The first and second parameters are the start and end of the interval. The optional third parameter is a step value by which to increment each number. The result is returned as a single anonymous column (with the default name, column1) containing the array.
SELECT * FROM GENERATE_SERIES(0, 21) AS t(s);
SELECT * FROM GENERATE_SERIES(0, 21, 3) AS t(s);
It is possible to use the result of GENERATE_SERIES in other SQL expressions, like IN:
SELECT * FROM products WHERE product_no IN (SELECT column1 FROM generate_series(1000, 20000) AS foo);
Numeric Functions
RANDOM
The RANDOM function returns a random integer.
SELECT RANDOM();
Math
Endb provides standard SQL math functions based on SQLite's collection of math functions (a few usage sketches follow the list and notes below):
- ROUND
- SIN
- COS
- TAN
- SINH
- COSH
- TANH
- ASIN
- ACOS
- ATAN
- ASINH
- ACOSH
- ATANH
- ATAN2
- FLOOR
- CEILING, CEIL
- SIGN
- SQRT
- EXP
- POWER, POW
- LOG, LOG10
- LOG2
- LN
- DEGREES
- RADIANS
- PI
- ABS
NOTE: Endb follows the choice of most SQL databases and aliases LOG to LOG10 rather than LN (natural log), as specified by the SQL standard.
NOTE: Mathematical operators are documented under Operators.
Date/Time Functions
STRFTIME
The STRFTIME function formats a date or time value as a string.
SELECT strftime('%Y/%m/%d', date('2001-01-01'));
SELECT strftime('%Y %m %d at %H %M %S', datetime('2001-01-01 03:04:05'));
UNIXEPOCH
The UNIXEPOCH function returns the number of seconds since the UNIX epoch. It accepts a DATE, TIMESTAMP, or STRING.
SELECT UNIXEPOCH('2023-01-01');
SELECT UNIXEPOCH(1970-01-01T00:00:00Z);
JULIANDAY
The JULIANDAY function returns the Julian Day, which is the number of days since noon in UTC on November 24, 4714 B.C. It accepts a DATE, TIMESTAMP, or STRING.
SELECT JULIANDAY(1970-01-01);
EXTRACT
The EXTRACT pseudo-function provides a way to access one named, numerical portion of a date, time, or timestamp. Portions of dates can only be extracted from dates or timestamps. Portions of times can only be extracted from timestamps or times.
SELECT EXTRACT(YEAR FROM CURRENT_DATE);
SELECT EXTRACT(MONTH FROM CURRENT_DATE);
SELECT EXTRACT(DAY FROM CURRENT_TIMESTAMP);
SELECT EXTRACT(HOUR FROM CURRENT_TIMESTAMP);
SELECT EXTRACT(MINUTE FROM CURRENT_TIME);
SELECT EXTRACT(SECOND FROM CURRENT_TIME);
NOTE: EXTRACT is a "pseudo-function" because internally it uses custom syntax of the form <named-portion> FROM <date>.
PERIOD
The PERIOD function creates a new Period. It is equivalent to using Period literals.
PERIOD(2001-04-01T00:00:00Z, 2001-05-01)
Aggregate Functions
MIN, MAX
The MIN and MAX functions return the minimum and maximum values for an expression, respectively.
SELECT MIN(price) FROM products;
SELECT MAX(price) FROM products;
NOTE: MIN and MAX also have non-aggregate equivalents, which are 2-arity. When used that way, they each return the minimum or maximum of the two values provided.
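A sketch of the 2-arity form:
SELECT MIN(2, 3);
-- expected: 2
SELECT MAX(2, 3);
-- expected: 3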
SUM
The SUM function returns the sum of all non-null values under the column given as a parameter.
SELECT SUM(price) FROM products;
If all values for the given column are NULL, SUM returns NULL.
TOTAL
The TOTAL function is equivalent to SUM, except that it returns 0.0 in the case where all input values are NULL.
AVG
The AVG function takes a numerical-type-agnostic average of all values under the column given as a parameter.
SELECT AVG(price) FROM products;
COUNT
The COUNT function returns the count of non-null, non-empty values for the specified column.
SELECT COUNT(price) FROM sales;
NOTE: Because null/empty values are ignored, the behaviour of COUNT will differ from other SQL dialects. Whether or not COUNT(price) and COUNT(1) are equivalent depends on whether the price attribute exists with a non-null value on each document.
ARRAY_AGG
The ARRAY_AGG function concatenates the results of an expression into an array. The parameter may be ordered within ARRAY_AGG.
SELECT ARRAY_AGG(price) FROM products;
SELECT ARRAY_AGG(name ORDER BY price DESC) FROM products;
Note that when operating on arrays, the arrays themselves will be concatenated, not the contents of the arrays. The result will be an array of one higher dimension:
SELECT ARRAY_AGG(x.column1) FROM (VALUES ([1,2]), ([3,4])) AS x;
-- [{'column1': [[1, 2], [3, 4]]}]
GROUP_CONCAT
The GROUP_CONCAT function returns a string of concatenated non-null values from a column or group. It defaults to a comma-delimited list, but the second (optional) parameter can override the delimiter.
SELECT GROUP_CONCAT(name) FROM products;
SELECT GROUP_CONCAT(name, ':') FROM products;
FILTER
All aggregate functions can have a filter applied before aggregation.
SELECT SUM(price) FILTER(WHERE price > 20) FROM products;
Data Type Functions
CAST
The CAST function forces a value into a particular data type. Note that not all types are cast-compatible with each other.
SELECT CAST(price AS INTEGER) FROM products;
TYPEOF
The TYPEOF function returns the type of the provided value.
SELECT TYPEOF('hi2u');
SELECT TYPEOF(1.12345678901234);
SELECT TYPEOF(2018-01-01T00:00:00);
Conditional Functions
IIF
The IIF function is a conditional shorthand. It returns the second parameter if the condition is true and the third parameter if the condition is false.
SELECT IIF(price > 5.99, 'Expensive!', 'Cheap') FROM products;
NULLIF
The NULLIF function returns NULL if the two supplied expressions are equal. Otherwise, it returns the first expression.
SELECT NULLIF(1, 1);
SELECT NULLIF(1, 'zig');
COALESCE
The COALESCE function returns its first non-null argument. The following example returns 'zig':
SELECT COALESCE(NULL, NULL, 'zig', 'zag');
Encoding Functions
BASE64
The BASE64 function takes a hexadecimal-encoded BLOB and returns a base64-encoded string, or vice-versa. BASE64 roundtrips its own data, so there is no BLOBFROMBASE64 function.
SELECT BASE64(x'010203');
SELECT BASE64('AQID');
UUID
The UUID function returns a universally-unique identifier, as a string. The UUID_BLOB function takes a string UUID and returns a BLOB. The UUID_STR function takes a BLOB UUID and returns a string. When given a parameter of their return type, UUID_BLOB and UUID_STR will format the UUID provided.
SELECT UUID();
SELECT UUID_BLOB('d2ce21c9-d268-409a-b1e0-49e1200bfa47');
SELECT UUID_STR(x'd2ce21c9d268409ab1e049e1200bfa47');
-- formatting:
SELECT UUID_BLOB(x'd2ce21c9d268409ab1e049e1200bfa47');
SELECT UUID_STR('d2ce21c9d268409ab1e049e1200bfa47');
SHA1
The SHA1 function takes either a hexadecimal-encoded BLOB, a string, or a number. It returns the SHA-1 hash of that value.
SELECT SHA1('2');
RANDOMBLOB, ZEROBLOB
The RANDOMBLOB function returns a random binary large object of the size given, in bytes. The ZEROBLOB function returns a zeroed-out binary large object of the size given, in bytes.
SELECT RANDOMBLOB(32);
SELECT ZEROBLOB(32);
HEX, UNHEX
The HEX function takes a BLOB (or coerces its argument into a UTF-8 string, which in turn is interpreted as a BLOB) and turns the BLOB into an upper-case hexadecimal string. The UNHEX function takes a hexadecimal string and turns it into a BLOB. The hexadecimal string provided must contain character pairs. UNHEX takes an optional second parameter: a string containing non-hexadecimal characters to be ignored in the first parameter. If non-hexadecimal characters are found in the first parameter but not ignored in the second parameter, UNHEX returns NULL.
SELECT HEX(15);
-- '3135'
SELECT UNHEX('3135');
-- b'15'
SELECT UNHEX('3135ZZ', 'Z');
-- b'15'
SELECT UNHEX('3135ZZ', 'M');
-- NULL
Vector Functions
Indexing
At the moment, Endb performs exact nearest neighbor search, which provides perfect recall. Endb may support approximate vector indexes in the future, which trade accuracy for speed.
L2_DISTANCE
The L2_DISTANCE
function returns the
Euclidean distance
between two vectors.
It is symmetrical to the <->
operator.
SELECT L2_DISTANCE([-0.7888,-0.7361,-0.6208,-0.5134,-0.4044], [0.8108,0.6671,0.5565,0.5449,0.4466]);
-- [{'column1': 2.7853052938591847}]
COSINE_DISTANCE
The COSINE_DISTANCE
function returns the complement of
Cosine Similarity
for two vectors.
It is symmetrical to the <=>
operator.
SELECT COSINE_DISTANCE([-0.7888,-0.7361,-0.6208,-0.5134,-0.4044], [0.8108,0.6671,0.5565,0.5449,0.4466]);
-- [{'column1': 1.9970250541178656}]
INNER_PRODUCT
The INNER_PRODUCT
function returns the
Inner Product
of two vectors.
It is the inverse of the <#>
operator.
SELECT INNER_PRODUCT([1,2], [3,4]);
-- [{'column1': 11.0}]
Time Queries
To make best use of Time Queries, it is a good idea to review the
time-related SQL data types, such as TIMESTAMP
, DATE
, TIME
,
and INTERVAL
.
These are covered in the SQL Data Types section.
It is also a good idea to review Endb's other time-related functions, in case they are helpful to you.
Note on SQL:2011 closed-open period model
All Endb temporal predicates (CONTAINS
, OVERLAPS
, PRECEDES
,
SUCCEEDS
, IMMEDIATELY PRECEDES
, and IMMEDIATELY SUCCEEDS
)
follow the SQL:2011 standard's "closed-open period model".
This means that a period represents all times starting from (and including)
the start time up to (but excluding) the end time.
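For example, two periods that merely share a boundary do not overlap, because a period excludes its own end time (a sketch using the OVERLAPS predicate described below):
SELECT {start: 2001-01-01, end: 2001-02-01} OVERLAPS {start: 2001-02-01, end: 2001-03-01};
-- FALSE: under the closed-open model, the two periods share no instant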
Note on timezones
Endb currently only supports times encoded as UTC.
Now
Endb provides access to the current value of the clock "now" in multiple date/time configurations.
CURRENT_TIMESTAMP
CURRENT_TIMESTAMP
gets the current date and time in UTC.
SELECT CURRENT_TIMESTAMP;
CURRENT_TIME
CURRENT_TIME
gets the current time in UTC.
SELECT CURRENT_TIME;
CURRENT_DATE
CURRENT_DATE
gets the current date in UTC.
Note that this may be different from your local date,
depending on the time of day when your query is run.
SELECT CURRENT_DATE;
System Time
All states an Endb database has ever seen are recorded, immutably.
Accessing these prior states is accomplished by querying System Time.
System Time is encoded in a special column named system_time
(lower case), which is invisible to most queries.
Because system_time
is invisible by default, it must be retrieved explicitly:
SELECT *, system_time FROM products;
AS OF (Time Travel)
Endb permits time-traveling queries with the SQL:2011-compatible
AS OF
operator.
The query will treat the DATE
or TIMESTAMP
supplied after AS OF
as if it were that time now.
SELECT * FROM products FOR SYSTEM_TIME AS OF 2023-08-25T00:00:00;
ALL (Time Omniscience)
Endb permits time-omniscient queries with the SQL:2011-compatible
ALL
operator.
All states, across the entire history of the relevant tables, are
visible to a query suffixed with FOR SYSTEM_TIME ALL
:
SELECT * FROM products FOR SYSTEM_TIME ALL;
BETWEEN
The syntax for time-aware BETWEEN
is the same as the
normal BETWEEN
operator.
Inspect System Time with the form FOR SYSTEM_TIME BETWEEN x AND y
.
SELECT * FROM products FOR SYSTEM_TIME BETWEEN 2023-08-24T00:00:00 AND 2023-08-25T00:00:00;
FROM ... TO
Selects rows which fall between the two times, similar to BETWEEN,
but exclusive of both the start and end times.
SELECT * FROM products FOR SYSTEM_TIME FROM 2023-08-24T00:00:00 TO 2023-08-30T00:00:00;
Period Predicates
The standard SQL:2011 period predicates are available.
CONTAINS
Returns TRUE
if the second period is contained within the first.
SELECT {start: 2001-01-01, end: 2001-04-01} CONTAINS {start: 2001-02-01, end: 2001-04-01};
OVERLAPS
Returns TRUE
if any part of the first period is found within the second.
SELECT {start: 2001-01-01, end: 2001-03-01} OVERLAPS {start: 2001-02-01, end: 2001-04-01};
PRECEDES
Returns TRUE
if the first period ends before the second period begins.
SELECT 2001-03-01 PRECEDES [2001-04-01T00:00:00Z, 2001-05-01];
SUCCEEDS
Returns TRUE
if the first period begins after the second period ends.
SELECT 2001-06-01 SUCCEEDS [2001-04-01T00:00:00Z, 2001-05-01];
IMMEDIATELY PRECEDES
Returns TRUE
if the first period ends exactly as the second period begins.
SELECT 2001-04-01 IMMEDIATELY PRECEDES [2001-04-01T00:00:00Z, 2001-05-01];
IMMEDIATELY SUCCEEDS
Returns TRUE
if the first period begins exactly as the second period ends.
SELECT 2001-05-01 IMMEDIATELY SUCCEEDS [2001-04-01T00:00:00Z, 2001-05-01];
Path Navigation
Because Endb is schemaless and semi-structured, it offers a number of powerful path-navigation primitives inspired by JSONPath, SQL/JSON, and their derivatives in legacy relational databases.
You will want to familiarize yourself with Endb's nested data types (arrays and objects) before learning about path navigation.
It will also be helpful to learn about other functions and operators that are frequently helpful in conjunction with path navigation:
- MATCH - filter by containment
- ARRAY_AGG - concatenate results into an array
- UNNEST - pull array elements into separate rows
- WITH - Common Table Expressions
In the examples below, most path examples are shown in the SELECT
clause,
but they are also valid in the WHERE
clause.
Nested Objects
If you are familiar with arrays and objects, try adding
some nested objects to the table paths
(the table name is arbitrary).
INSERT INTO paths {a: 2, b: {a: 3}, c: [{a: 1}, 2]};
Root Navigation
Navigating the root of any document (row) as a tree looks like standard SQL, because it is:
SELECT a FROM paths;
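-- [{'a': 2}]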
Similarly, it is possible to navigate documents listed in an array:
SELECT [{a: 2}, {a: 3}, {b: 4}].a;
-- [{'a': [2, 3]}]
It is also possible to navigate fields of sub-documents (columns of nested rows) with further dot notation:
SELECT b.a FROM paths;
-- [{'a': 3}]
Row Literals
It is possible (and helpful) to create a row literal to
represent rows returned, so they are easier to navigate.
The format of a row literal is { <table>.* }
:
SELECT { paths.* } FROM paths;
Or the shorthand:
SELECT paths FROM paths;
Recursive Paths
The double dot (..
) notation performs a "deep scan" by
recursively walking the document to match the name given.
SELECT { paths.* }..a FROM paths;
-- [{'a': [2, 3, 1]}]
SELECT b..a FROM paths;
-- [{'a': [3]}]
SELECT * FROM paths WHERE paths..a MATCH 3; -- finds a match
SELECT * FROM paths WHERE paths..a MATCH 4; -- does not
Named Child
The square bracket notation (['<NAME>']
) performs a lookup
of a single descendent child.
SELECT { paths.* }['b']['a'] FROM paths;
-- [{'a': 3}]
SELECT b['a'] FROM paths;
-- [{'a': 3}]
Named Children can be combined with recursive paths, though the default recursive path syntax is synonymous with named children:
SELECT { paths.* }..a FROM paths;
SELECT { paths.* }..['a'] FROM paths;
SELECT b..['a'] FROM paths;
Numbered Child
The square bracket notation ([<NUMBER>]
) can also perform indexed
lookups of a single descendent child.
SELECT { paths.* }['b'][0] FROM paths;
-- [{'column1': {'a': 3}}]
SELECT { paths.* }['c'][1] FROM paths;
-- [{'column1': 2}]
SELECT c[1] FROM paths;
-- [{'column1': 2}]
Numbered Children can be combined with recursive paths. This finds and returns all indexed values, counting backward:
SELECT { paths.* }..[-1] FROM paths;
-- [{'column1': [2]}]
Wildcard Child
The square bracket notation ([*]
) can also perform a wildcard
lookup of all descendent children.
SELECT [{a: 2}, {a: 3}, {b: 4}, 5][*];
Wildcards can be combined with recursive paths. This finds and returns all values:
SELECT { paths.* }..[*] FROM paths;
-- [{'column1': [2, {'a': 3}, [{'a': 1}, 2], 3, {'a': 1}, 2, 1]}]
SELECT c..[*] FROM paths;
-- [{'column1': [{'a': 1}, 2, 1]}]
Path Functions
Path editing is accomplished with an extended path syntax, where each
path begins with $
.
Endb's path editing functions are heavily inspired by SQLite's JSON Functions.
Path editing functions add functionality ($
, #
) to a subset of
Endb's normal path navigation syntax:
path editing functions do not support recursion or wildcards.
Path editing functions are available to
UPDATE ... SET
and
UPDATE ... UNSET/REMOVE
.
path_set
Takes an object, a path, and a new value. The new value will overwrite existing fields or add a new field if it doesn't already exist.
SELECT path_set({a: 2, c: 4}, $.c, [97,96]);
-- {'a': 2, 'c': [97, 96]}
path_replace
Takes an object, a path, and a new value. The new value is ignored if the path does not match an existing field.
SELECT path_replace({a: 2, c: 4}, $.a, 99);
-- {'a': 99, 'c': 4}
SELECT path_replace({a: 2, c: 4}, $.e, 99);
-- {'a': 2, 'c': 4}
path_insert
Takes an object, a path, and a new value. The new value is ignored if the path matches an existing field.
SELECT path_insert({a: 2, c: 4}, $.e, 99);
-- {'a': 2, 'c': 4, 'e': 99}
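Conversely, by the semantics above, a path which matches an existing field leaves the object unchanged:
SELECT path_insert({a: 2, c: 4}, $.a, 99);
-- {'a': 2, 'c': 4}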
path_remove
Takes an object and a variable number of arguments specifying which paths to remove.
If a path is not found, nothing is removed for that argument.
#
represents the last element in a collection.
SELECT path_remove([0,1,2,3,4], $[#-1], $[0]);
-- [1, 2, 3]
SELECT path_remove({x: 25, y: 42}, $.y);
-- {'x': 25}
path_extract
Takes an object and a variable number of path arguments.
Returns the value found at each path, if any, otherwise NULL
.
If only a single path is provided, a scalar is returned.
If multiple paths are provided, an array is returned.
SELECT path_extract({a: 2, c: [4, 5, {f: 7}]}, $.c[2].f);
-- 7
SELECT path_extract({a: 2, c: [4, 5], f: 7}, $.x, $.a);
-- [NULL, 2]
Schema
Endb allows introspection of its information schema.
The Endb information schema does not describe the structure of
each table.
Because Endb is a document database, each document (row) is responsible
for its own schema.
The information schema is used by Endb to describe database objects
at a high level and is used for schemaless queries, such as SELECT *
.
Note that all information schema tables are hard-coded to lower-case names and must be queried as such.
Tables
-> SELECT * FROM information_schema.tables;
[{'table_catalog': None,
'table_name': 'stores',
'table_schema': 'main',
'table_type': 'BASE TABLE'},
{...
'table_name': 'products',
... },
{...
'table_name': 'sales',
... }]
Columns
-> SELECT * FROM information_schema.columns;
[{'column_name': 'addresses',
'ordinal_position': 0,
'table_catalog': None,
'table_name': 'stores',
'table_schema': 'main'},
{'column_name': 'brand',
... },
{'column_name': 'price',
... },
... ]
Views
-> SELECT * FROM information_schema.views;
[{'table_catalog': None,
'table_name': 'sold_products',
'table_schema': 'main',
'view_definition': 'SELECT * FROM products p JOIN sales s ON p.id = s.p_id'}]
Check Constraints
The check_constraints
table in Endb is used to store assertions.
-> SELECT * FROM information_schema.check_constraints;
[{'check_clause': "(NOT EXISTS (SELECT * FROM users WHERE TYPEOF(email) != 'text'))",
'constraint_catalog': None,
'constraint_name': 'string_email',
'constraint_schema': 'main'}]
Views
Endb provides basic view functionality.
CREATE VIEW
CREATE VIEW
creates a non-materialized view based on the query
which follows the AS
operator.
Column names are listed in parentheses after the view name.
CREATE VIEW simple_products(name, price) AS SELECT name, ROUND(price) FROM products;
Alternatively, named columns can each immediately follow queried columns.
CREATE VIEW easy_products AS SELECT name label, ROUND(price) easy_price FROM products;
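Once created, a view can be queried like any other table:
SELECT * FROM easy_products;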
NOTE: To modify a view, use DROP VIEW
then re-create the view with the desired columns.
DROP VIEW
DROP VIEW
deletes a view based on its name.
DROP VIEW easy_products;
Assertions
CREATE ASSERTION
Creates a checked, deferred assertion which executes on commit
for inserts and updates.
Although CREATE ASSERTION
(normally) refers to the table
it is asserting on, that table need not exist for the assertion to be created.
Type Constraint
CREATE ASSERTION string_email CHECK (NOT EXISTS (SELECT * FROM users WHERE TYPEOF(email) != 'text'));
INSERT INTO users {name: 'Steven', email: 123};
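-- fails on commit: email is not of type 'text'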
Unique Constraint
There are a number of possible ways to create the equivalent of a UNIQUE
constraint,
as found in schema-on-write databases.
One sensible approach is given below.
CREATE ASSERTION unique_email CHECK (1 >= (SELECT MAX(c.cnt) FROM (SELECT COUNT(*) AS cnt FROM users GROUP BY email) AS c));
INSERT INTO users {name: 'Steven', email: 's@endatabas.com'};
INSERT INTO users {name: 'Sarah', email: 's@endatabas.com'};
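-- the second insert fails on commit: email is no longer unique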
Not Null Constraint
There are multiple possible meanings for "not null" columns in Endb:
- Columns can be strictly checked for literal NULL in the value position.
- Rows can be forbidden from eliding a column (a missing value).
- Both literal NULL and elided columns can be prevented.
This example checks for literal NULL
only:
CREATE ASSERTION notnull_email CHECK (NOT EXISTS (SELECT * FROM users WHERE email IS NULL));
INSERT INTO users {name: 'Tian Tian', email: NULL}; -- check fails
INSERT INTO users {name: 'Tian Tian'}; -- permitted
DROP ASSERTION
Removes an assertion from the database based on its name.
DROP ASSERTION string_email;
Clients
Endatabas provides a variety of tools to communicate with an Endb instance.
The Console is a command-line user interface to send SQL to Endb manually. The client libraries are used in the data access layer of an application which uses Endb for storage.
The client libraries support both HTTP and WebSocket APIs.
To access Endb over HTTP, create an Endb
instance
and call the sql
method on it.
To access Endb over WebSockets, create an EndbWebSocket
instance and call the sql
method on it.
See the language-specific documentation below for further details.
NOTE: For any programming language not listed above, it is possible to create small client libraries such as these using the HTTP and WebSocket APIs. All official Endatabas clients are permissively-licensed and you are welcome to use them as examples to create your own client library. Their source code is available here: https://github.com/endatabas/endb/tree/main/clients
Console
A small console is provided to learn Endb SQL interactively and run experimental queries. It is distributed in the same Python package as the Python client.
Install
pip install endb
Usage
endb_console [--url URL] [-u USERNAME] [-p PASSWORD]
Show additional command line options with endb_console --help
:
usage: endb_console [-h] [--url URL] [-u USERNAME] [-p PASSWORD] [sql ...]
positional arguments:
sql SQL statement or file
options:
-h, --help show this help message and exit
--url URL
-u USERNAME, --username USERNAME
-p PASSWORD, --password PASSWORD
Prompt
When the Endb console starts, it will provide a multiline prompt (->
)
where you can enter SQL statements and queries.
The prompt will extend onto new lines (..
) until you enter a semicolon.
-> INSERT INTO users
.. {name: 'Conrad', email: 'c1o2n@shell.com'};
-> SELECT * FROM users;
The console has history, which you can navigate with the up and down arrows. History does not persist between console sessions.
Learn more about Endb SQL in the SQL Reference.
To quit, type CTRL+D
.
Commands
The console comes with a number of special, non-SQL commands.
URL
Set the URL at runtime with the url
command.
-> url https://192.168.1.200:3803/sql
Accept
Set the accept header content type at runtime with the accept
command.
-> accept text/csv
Username
Set the username at runtime with the username
command.
-> username efcodd
Password
Set the password at runtime with the password
command.
-> password equ1valenc3
Timer
All SQL commands can be timed by toggling the timer on: timer on
.
Toggle the timer back off with timer off
.
-> timer on
-> SELECT * FROM users;
[{'name': 'Aaron'}
{'name': 'Irene'}
... ]
Elapsed: 0.003904 ms
Quit
Quit the console by typing quit
.
-> quit
Data Types
The console communicates with Endb over the HTTP API. Data is returned as LD-JSON documents and marshalled into strongly-typed Python objects. For example:
[{'date': None, 'email': 'c1o2n@shell.com', 'name': 'Conrad'},
{'date': datetime.datetime(2024, 1, 29, 18, 18, 30, 129159, tzinfo=datetime.timezone.utc),
'email': 'kitty@tramline.in',
'name': 'Akshay'}]
The Data Types page talks about types in more detail.
JavaScript
The JavaScript and TypeScript client library is used to communicate with an Endb instance from a JavaScript or TypeScript application. Type declarations are provided in the package for TypeScript.
Install
npm install @endatabas/endb
npm install ws
Usage Examples
Import
import { Endb, EndbWebSocket } from '@endatabas/endb';
Endb
Use the Endb
class to communicate with Endb over HTTP.
It accepts an optional url
parameter.
Options can be supplied for accept
, username
, and password
.
Accept headers default to LD-JSON and can be set to any valid
content type listed in the HTTP API.
If you choose application/vnd.apache.arrow.file
or application/vnd.apache.arrow.stream
,
the raw response body will be returned from sql()
.
var e = new Endb();
var e = new Endb('http://localhost:3803/sql');
var e = new Endb('http://localhost:3803/sql', {accept: 'text/csv'});
var e = new Endb('http://localhost:3803/sql', {accept: 'application/json', username: 'zig', password: 'zag'});
NOTE: Choosing accept headers other than LD-JSON will return
JavaScript data structures symmetrical with those returned from
the respective accept header provided to the HTTP API.
text/csv
returns comma-delimited strings, application/json
returns tuples as arrays, and so on.
EndbWebSocket
Use the EndbWebSocket
class to communicate with Endb over WebSockets.
It accepts an optional url
parameter.
Options can be supplied for ws
(any implementation of the
JavaScript WebSocket interface definition),
username
, and password
.
In a web browser, ws
will default to the web browser's WebSocket implementation.
EndbWebSocket
only communicates in LD-JSON, so the accept header cannot be set.
// in web browser:
var ews = new EndbWebSocket();
var ews = new EndbWebSocket('ws://localhost:3803/sql', {username: 'zig', password: 'zag'});
// in node.js:
import WebSocket from 'ws';
var ews = new EndbWebSocket({ws: WebSocket});
var ews = new EndbWebSocket('ws://localhost:3803/sql', {ws: WebSocket, username: 'zig', password: 'zag'});
sql()
The asynchronous sql
method is available to both Endb
and EndbWebSocket
.
It accepts q
, and optional p
, m
, and accept
parameters
and returns an array of strongly-typed documents.
(See JavaScript API Reference.)
To ignore the p
or m
parameters but still supply an accept header,
supply the default values or null
.
e.sql("SELECT * FROM users;");
e.sql("INSERT INTO USERS (date, name, email) VALUES (?, ?, ?);", [new Date(), 'Aaron', 'aaron@canadahealth.ca']);
e.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true);
e.sql("SELECT * FROM users;", [], false, 'text/csv');
e.sql("SELECT * FROM users;", null, null, 'application/json');
e.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true, 'text/csv');
Template Literals
It is possible to use Template Literals (string templating) to pass named SQL parameters. The parameter passed to the Template Literal is only valid when used in a position where a positional SQL parameter is also valid. The signature which accepts Template Literals does not accept any other parameters to the method.
var n = 'Michael';
e.sql`INSERT INTO users (name) VALUES (${n});`;
var u = {name: 'Radha', roles: ['artist', 'marketing']};
e.sql`INSERT INTO users ${u}`;
Data Types
When an LD-JSON (default) accept header is used, strongly typed data is returned according to this mapping:
- null - null
- xsd:date - Date
- xsd:dateTime - Date
- xsd:base64Binary - Uint8Array
- xsd:integer - BigInt
For more information on Endb data types, see the Data Types doc.
Complete Examples
import { Endb } from '@endatabas/endb';
var e = new Endb();
await e.sql("INSERT INTO users {name: 'Thupil'};");
var result = await e.sql("SELECT * FROM users;");
console.log(result);
var e2 = new Endb('http://localhost:3803/sql', {accept: 'application/json', username: 'zig', password: 'zag'});
await e2.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true, 'text/csv');
result = await e2.sql("SELECT * FROM users;", null, null, 'application/json');
console.log(result);
import WebSocket from 'ws';
import { EndbWebSocket } from '@endatabas/endb';
var ews = new EndbWebSocket({ws: WebSocket});
await ews.sql("insert into users {name: 'Lydia'};");
var ws_result = await ews.sql("select * from users;");
console.log(ws_result);
var ews2 = new EndbWebSocket({ws: WebSocket, username: 'zig', password: 'zag'});
await ews2.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true, 'text/csv');
ws_result = await ews2.sql("SELECT * FROM users;", null, null, 'application/json');
console.log(ws_result);
JavaScript API Reference
NOTE: The following API documentation is generated from source code docstrings in the
endb
repository.
Classes
- Endb
Endatabas client for the HTTP API
- EndbWebSocket
Endatabas client for the WebSocket API
Endb
Endatabas client for the HTTP API
Kind: global class
- Endb
  - new Endb([url], [opt])
  - .sql(q, [p], [m], [accept]) ⇒ Promise.<Array>
new Endb([url], [opt])
Create an Endb object (Endatabas client for the HTTP API)
Param | Type | Default | Description |
---|---|---|---|
[url] | string | "http://localhost:3803/sql" | HTTP URL to the Endatabas /sql API |
[opt] | Object | | HTTP options |
[opt.accept] | string | "application/ld+json" | Accept Header content type |
[opt.username] | string | | username for HTTP Basic Auth |
[opt.password] | string | | password for HTTP Basic Auth |
endb.sql(q, [p], [m], [accept])
⇒ Promise.<Array>
Execute a SQL statement over HTTP
Kind: instance method of Endb
Returns: Promise.<Array> - Array of documents
Param | Type | Description |
---|---|---|
q | string | SQL query as string or Template Literal |
[p] | Array | Object | Positional parameters, named parameters, or an array of either |
[m] | boolean | many parameters flag |
[accept] | string | Accept Header content type |
Example
// Simple query
sql("SELECT * FROM users;");
// Positional parameters
sql("INSERT INTO users (date, name) VALUES (?, ?);", [new Date(), 'Aaron']);
// Named parameters
sql("INSERT INTO users {date: :d, name: :n};", {d: new Date(), n: 'Aaron'});
// Many positional parameters (batches)
sql("INSERT INTO users (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true);
// Many named parameters (batches)
sql("INSERT INTO users {name: :n};", [{n: 'Judy'}, {n: 'Addis'}], true);
// All parameters (many parameters and accept header)
sql("INSERT INTO users (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true, 'text/csv');
// Named parameters via Template Literals
sql(`INSERT INTO users (name) VALUES (${u});`, [{u: 'Michael'}]);
EndbWebSocket
Endatabas client for the WebSocket API
Kind: global class
- EndbWebSocket
  - new EndbWebSocket([url], [opt])
  - .close()
  - .sql(q, [p], [m]) ⇒ Promise.<Array>
new EndbWebSocket([url], [opt])
Create an EndbWebSocket object (Endatabas client for the WebSocket API)
Param | Type | Default | Description |
---|---|---|---|
[url] | string | "ws://localhost:3803/sql" | WebSocket URL to the Endatabas /sql API |
[opt] | Object | | WebSocket options |
[opt.ws] | string | | WebSocket implementation |
[opt.username] | string | | username for Basic Auth |
[opt.password] | string | | password for Basic Auth |
endbWebSocket.close()
Close the WebSocket connection
Kind: instance method of EndbWebSocket
endbWebSocket.sql(q, [p], [m])
⇒ Promise.<Array>
Execute a SQL statement over a WebSocket with LD-JSON
Kind: instance method of EndbWebSocket
Returns: Promise.<Array> - Array of documents
Param | Type | Description |
---|---|---|
q | string | SQL query as string or Template Literal |
[p] | Array | Object | Positional parameters, named parameters, or an array of either |
[m] | boolean | many parameters flag |
Example
// Simple query
sql("SELECT * FROM users;");
// Positional parameters
sql("INSERT INTO users (date, name) VALUES (?, ?);", [new Date(), 'Aaron']);
// Named parameters
sql("INSERT INTO users {date: :d, name: :n};", {d: new Date(), n: 'Aaron'});
// Many positional parameters (batches)
sql("INSERT INTO users (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true);
// Many named parameters (batches)
sql("INSERT INTO users {name: :n};", [{n: 'Judy'}, {n: 'Addis'}], true);
// All parameters (many parameters and accept header)
sql("INSERT INTO users (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true, 'text/csv');
// Named parameters via Template Literals
sql(`INSERT INTO users (name) VALUES (${u});`, [{u: 'Michael'}]);
Python
The Python client library is used to communicate with an Endb instance from a Python application.
Install
pip install endb
pip install websockets # optional WebSocket support
pip install pyarrow # optional Apache Arrow support
Usage Examples
Import
from endb import (Endb, EndbWebSocket)
Endb
Use Endb
to communicate with Endb over HTTP.
It accepts optional url
, accept
, username
, and password
parameters.
Accept headers default to LD-JSON and can be set to any valid
content type listed in the HTTP API.
If you have pyarrow
installed, you can also use application/vnd.apache.arrow.file
and application/vnd.apache.arrow.stream
.
e = Endb()
e = Endb('http://localhost:3803/sql')
e = Endb('http://localhost:3803/sql', 'text/csv')
e = Endb('http://localhost:3803/sql', 'application/json', 'zig', 'zag')
NOTE: Choosing accept headers other than LD-JSON will return
Python data structures symmetrical with those returned from
the respective accept header provided to the HTTP API.
text/csv
returns comma-delimited strings, application/json
returns tuples as lists, and so on.
EndbWebSocket
Use the EndbWebSocket
class to communicate with Endb over WebSockets.
It accepts optional url
, username
, and password
parameters.
ews = EndbWebSocket()
ews = EndbWebSocket('ws://localhost:3803/sql', 'zig', 'zag')
sql()
The sql
method is available to both Endb
and EndbWebSocket
.
It accepts q
, and optional p
, m
, and accept
parameters
and returns a list of strongly-typed documents.
(See Python API Reference.)
It is synchronous for Endb
and asynchronous for EndbWebSocket
.
To ignore the p
or m
parameters but still supply an accept header,
supply the default values or use a named (accept
) parameter.
from datetime import date, datetime, timezone
e.sql("INSERT INTO USERS (date, name, email) VALUES (?, ?, ?);", [datetime.now(timezone.utc), 'Aaron', 'aaron@canadahealth.ca'])
e.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], True)
e.sql("SELECT * FROM users;", [], False, 'text/csv')
e.sql("SELECT * FROM users;", accept = 'text/csv')
e.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], True, 'text/csv')
Data Types
When an LD-JSON (default) accept header is used, strongly typed data is returned according to this mapping:
- null - None
- xsd:date - datetime.date
- xsd:time - datetime.time
- xsd:dateTime - datetime.datetime
- xsd:base64Binary - bytes (from base64)
- xsd:integer - int
For more information on Endb data types, see the Data Types doc.
Complete Examples
from endb import Endb
from datetime import date, datetime, timezone
e = Endb()
e.sql("INSERT INTO users {name: 'Yuvi'}")
e.sql("SELECT * FROM users;")
e2 = Endb('http://localhost:3803/sql', 'application/json', 'zig', 'zag')
e2.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], True, 'text/csv')
e2.sql("SELECT * FROM users;", [], False, 'application/json')
When the websockets
dependency is installed, it is possible to
return asynchronous results to the Python interactive shell
directly if you start it with python3 -m asyncio
:
from endb import EndbWebSocket
ews = EndbWebSocket()
await ews.sql("INSERT INTO users {name: 'Lydia'}")
ws_result = await ews.sql("SELECT * FROM users;")
print(ws_result)
ews2 = EndbWebSocket(username = 'zig', password = 'zag')
await ews2.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], True, 'text/csv')
ws_result = await ews2.sql("SELECT * FROM users;", [], False, 'application/json')
print(ws_result)
Python API Reference
NOTE: The following API documentation is generated from source code docstrings in the
endb
repository.
endb module
class endb.Endb(url='http://localhost:3803/sql', accept='application/ld+json', username=None, password=None)
Bases: AbstractEndb
An Endatabas client for the HTTP API
url
HTTP URL of an Endatabas instance
- Type: str
accept
Accept header content type
- Type: str
username
Username for HTTP Basic Auth
- Type: str
password
Password for HTTP Basic Auth
- Type: str
sql(q, p=[], m=False, accept=None)
Executes a SQL statement
The SQL statement is sent to Endb.url over HTTP.
- Parameters:
- q (str) – SQL statement or query to execute
- p (array_like , default= [ ]) – Positional or named SQL parameters (or an array of either, if using many parameters)
- m (bool , default=False) – Many parameters flag
- accept (str , optional) – Accept header content type (defaults to Endb.accept)
- Raises: TypeError – Internal error if attempting to translate an unknown type to LD-JSON.
class endb.EndbWebSocket(url='ws://localhost:3803/sql', username=None, password=None)
Bases: AbstractEndb
An Endatabas client for the WebSocket API
url
WebSocket URL of an Endatabas instance
- Type: str
username
Username for HTTP Basic Auth
- Type: str
password
Password for HTTP Basic Auth
- Type: str
async close()
Closes the WebSocket connection
async sql(q, p=[], m=False, accept=None)
Executes a SQL statement
The SQL statement is sent to Endb.url over WebSockets.
- Parameters:
- q (str) – SQL statement or query to execute
- p (array_like , default= [ ]) – Positional or named SQL parameters (or an array of either, if using many parameters)
- m (bool , default=False) – Many parameters flag
- accept (str , optional) – Ignored. WebSocket communication is always in LD-JSON.
- Raises:
- AssertionError – If ‘id’ of request and response do not match.
- RuntimeError – If response from server contains an error.
Reference
Reference documentation exists as a detailed perspective on each feature of Endb. It does not need to be read from beginning to end.
Installation
You only need one of the following installation options. If you followed the Quickstart, you already have the Docker Hub version installed.
Warning: Early Beta
Endb is still in early beta.
While in beta, new versions are not guaranteed to have binary-compatible storage with previous versions. We encourage you to experiment with Endb but please do not use it in production until a General Availability release.
Installing from Docker Hub
There are two major release channels available:
stable and nightly.
To run the nightly builds, replace endatabas/endb
with endatabas/endb:nightly
in the commands below.
If you run Docker, you can use the default command below.
--rm
cleans up (optional), -p
exposes the default endb
port,
and -v
mounts a local volume so your data persists even if you shut down the container.
mkdir -p endb_data
docker pull endatabas/endb
docker run --rm -p 3803:3803 -v endb_data:/app/endb_data endatabas/endb
If you run Podman, you'll need to specify the docker.io
repo explicitly:
mkdir -p endb_data
podman pull docker.io/endatabas/endb
podman run --rm -p 3803:3803 -v endb_data:/app/endb_data docker.io/endatabas/endb
Installing from Git: Docker
If you want to run endb
from the main branch, compile and build the Docker image yourself; see https://github.com/endatabas/endb/#building for build instructions.
Installing from Git: Binary
If you don't want Docker at all, you can compile and run the endb
binary:
- https://github.com/endatabas/endb/#building
./target/endb
NOTE: If you move the endb
binary, be sure to copy libendb.so
(Linux)
or libendb.dylib
(MacOS) into the same directory.
This is because endb
requires libendb
to run.
Operation
The endb
executable aims to provide self-explanatory help
for direct usage of the binary.
See Monitoring for more information about logging, metrics, and tracing.
$ endb --help
Usage: endb [OPTIONS]
Options:
-d, --data-directory <DATA_DIRECTORY> [env: ENDB_DATA_DIRECTORY=] [default: endb_data]
-h, --help Print help
-V, --version Print version
Authentication:
--username <USERNAME> [env: ENDB_USERNAME=]
--password <PASSWORD> [env: ENDB_PASSWORD=]
Network:
-p, --port <PORT> [env: ENDB_PORT=] [default: 3803]
--bind-address <BIND_ADDRESS> [env: ENDB_BIND_ADDRESS=] [default: 0.0.0.0]
--protocol <PROTOCOL> [env: ENDB_PROTOCOL=] [default: http] [possible values: http, https]
--cert-file <CERT_FILE> [env: ENDB_CERT_FILE=]
--key-file <KEY_FILE> [env: ENDB_KEY_FILE=]
The -d
option accepts a special value of :memory:
to run an in-memory node,
without persisting anything to disk.
The --cert-file
and --key-file
options are ignored when --protocol
is set to http
.
When --protocol
is set to https
, they are both required.
Backup and Restore
If you would like to back up your Endb data, you can use any commodity copy or sync tool
(such as rsync
) to maintain a copy of your --data-directory
(ENDB_DATA_DIRECTORY
).
To restore that directory, stop Endb, replace or sync that directory where Endb is running, and restart Endb.
In the future, a backup option will be provided for an object store separated from compute.
Monitoring
Endb exposes plaintext logs, Prometheus metrics, and optional OpenTelemetry tracing. Any one, or a combination, of these can be used to monitor an Endb instance.
Some monitoring options are offered as flags.
Set flags by setting the environment variable to 0
or 1
(ex. ENDB_LOG_ANSI=0
).
Logging
By default, endb
logs to STDOUT with a log level of endb=INFO
.
Adjust the log level with the ENDB_LOG_LEVEL
environment variable.
Endb uses Rust Logging
and more details about log levels are available in that document.
Other flags include:
- ENDB_LOG_ANSI - turns ANSI terminal colour output on or off (on by default)
- ENDB_LOG_THREAD_IDS - enables or disables logging of thread ids (enabled by default)
Example:
docker run --rm -e ENDB_LOG_LEVEL=endb=DEBUG -e ENDB_LOG_ANSI=0 -e ENDB_LOG_THREAD_IDS=1 -p 3803:3803 -v demo_data:/app/endb_data docker.io/endatabas/endb:latest
Prometheus
A Prometheus endpoint is available at /metrics
.
This endpoint is enabled by default.
If your Endb instance is running locally, you can view metrics in a browser at
http://localhost:3803/metrics
The Prometheus tracing level defaults to endb=DEBUG
.
Set the tracing level with ENDB_TRACING_LEVEL
.
OpenTelemetry
To enable OpenTelemetry, set ENDB_TRACING_OTEL=1
and
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317/
Configuration options for the OTLP exporter and
additional OTLP environment variables
may also prove useful for configuration.
The OpenTelemetry tracing level defaults to endb=DEBUG
.
Set the tracing level with ENDB_TRACING_LEVEL
.
At the moment, only gRPC metrics are exposed, not HTTP.
Metrics
Endb exposes a number of metrics to the Prometheus endpoint and OpenTelemetry. These can be useful in debugging performance of applications connected to Endb.
Histograms
query_real_time_duration_seconds
query_gc_run_time_duration_seconds
query_consed_bytes
http_request_duration_seconds
Counters
queries_active
interactive_transactions_active
buffer_pool_usage_bytes
dynamic_space_usage_bytes
Monotonic Counters
websocket_message_internal_errors_total
object_store_read_bytes_total
object_store_written_bytes_total
queries_total
transactions_conflicted_total
transactions_committed_total
transactions_prepared_total
transactions_retried_total
wal_read_bytes_total
wal_written_bytes_total
http_requests_total
Tracing
Endb exposes a variety of tracing spans to OpenTelemetry. Tracing data is mostly useful if your Endb instance is not performing in the way you expect.
buffer_pool_eviction
build_info
commit
compaction
constraints
gc
index
log_replay
log_rotation
object_store_delete
object_store_get
object_store_list
object_store_put
query
shutdown
snapshot
startup
wal_append_entry
wal_read_next_entry
wal_fsync
websocket_connections_active
websocket_message_duration_seconds
websocket_messages_total
HTTP API
If a client is
not available for your programming language yet, your app can interact
directly with the Endb HTTP API.
Any HTTP client may be used but in the examples below we'll use curl
to demonstrate the API without writing any code.
You can send SQL statements to endb
over HTTP:
curl -d "INSERT INTO users (name) VALUES ('Tianyu')" -H "Content-Type: application/sql" -X POST http://localhost:3803/sql
curl -d "SELECT * FROM users" -H "Content-Type: application/sql" -X POST http://localhost:3803/sql
You can send SQL to endb
with standard HTTP Query Parameters, Verbs,
Content Types, Accept Headers, and HTTP Basic Authentication.
Each one is outlined below.
HTTP Query Parameters
The query parameters Endb's HTTP endpoint accepts are:
- q - (q)uery: a SQL query, optionally parameterized
- p - (p)arameters: named or positional parameters
- m - (m)any parameter lists: bulk parameters, used for bulk insert/update
HTTP Verbs
POST
allows explicit Content Types and Accept headers:
curl -d 'SELECT 1' -H "Content-Type: application/sql" -H "Accept: text/csv" -X POST http://localhost:3803/sql
GET
allows a single, simple URL.
GET
does not permit DML.
curl -X GET "http://localhost:3803/sql?q=SELECT%201"
Content Types
The HTTP Content-Type
header specifies the format in which the client sends data to Endb.
application/json
:
curl -d '{"q": "SELECT * from products;"}' -H "Content-Type: application/json" -X POST http://localhost:3803/sql
curl -d '{"q": "SELECT * from products WHERE name = ?;", "p": ["Salt"]}' -H "Content-Type: application/json" -X POST http://localhost:3803/sql
curl -d '{"q": "INSERT INTO products {name: :name};", "p": {"name": "Paprika"}}' -H "Content-Type: application/json" -X POST http://localhost:3803/sql
NOTE: To enable strongly-typed values, payloads sent with
the application/json
content type have values resolved with JSON-LD scalars.
Standard JSON values are a subset of JSON-LD scalars, so data sent as regular
JSON is unaffected by this behaviour.
application/ld+json
Although values in the application/json
content type are resolved using
JSON-LD scalars, you can explicitly specify an application/ld+json
content type to avoid all ambiguity.
See JSON-LD.
curl -d '{"q": "INSERT INTO events {start: :start};", "p": {"start": {"@type": "xsd:dateTime", "@value": "2011-04-09T20:00:00Z"}}}' -H "Content-Type: application/ld+json" -X POST http://localhost:3803/sql
application/sql
:
curl -d 'SELECT 1' -H "Content-Type: application/sql" -X POST http://localhost:3803/sql
Submit parameters to application/sql
by providing form data or query parameters.
Form data and query parameters can be combined, though it is not necessarily recommended.
curl -F q="INSERT INTO sauces {name: ?, color: ?};" -X POST http://localhost:3803/sql?p=%5B%22ketchup%22%2C%22purple%22%5D
multipart/form-data
curl -F q="SELECT * from products;" -H "Content-Type: multipart/form-data" -X POST http://localhost:3803/sql
curl -F q="INSERT INTO products {name: ?};" -F p='["Sriracha"]' -X POST http://localhost:3803/sql
NOTE: Many HTTP clients (including curl
) automatically assume a content type of
multipart/form-data
when form fields are provided.
This is true for curl
when the -F
(--form
) argument is used and it has
been elided from further examples.
application/x-www-form-urlencoded
Although the other content types are preferable for obvious reasons,
application/x-www-form-urlencoded
is offered for completeness.
curl -d 'q=SELECT%20*%20FROM%20products;' -H "Content-Type: application/x-www-form-urlencoded" -X POST http://localhost:3803/sql
Accept Headers
The HTTP Accept
header is used to specify how data is returned to
the Endb client.
The default Accept
header content type is application/json
.
text/csv
text/csv
returns comma-separated rows.
Column order from the SELECT
clause is maintained.
curl -d "SELECT * FROM (VALUES (1,'hello'), (2,'csv')) t1" -H "Content-Type: application/sql" -H "Accept: text/csv" -X POST http://localhost:3803/sql
returns:
"column1","column2"
2,"csv"
1,"hello"
application/json
application/json
returns rows as an array of JSON tuples.
Column order from the SELECT
clause is maintained.
curl -d "SELECT * FROM (VALUES (1,'hello'), (2,DATE('2023-07-22'))) t1" -H "Content-Type: application/sql" -H "Accept: application/json" -X POST http://localhost:3803/sql
returns:
[[2,"2023-07-22"],[1,"hello"]]
application/x-ndjson
application/x-ndjson
returns newline-delimited JSON documents.
Column order from the SELECT
clause is not maintained.
JSON documents cannot guarantee column order.
curl -d "SELECT * FROM (VALUES (1,'hello'), (2,DATE('2023-07-22'))) t1" -H "Content-Type: application/sql" -H "Accept: application/x-ndjson" -X POST http://localhost:3803/sql
returns:
{"column1":2,"column2":"2023-07-22"}
{"column1":1,"column2":"hello"}
application/ld+json
application/ld+json
returns documents of strongly-typed ("Linked Data") JSON records.
Column order from the SELECT
clause is not maintained.
JSON documents cannot guarantee column order.
curl -d "SELECT * FROM (VALUES (1,'hello'), (2,DATE('2023-07-22'))) t1" -H "Content-Type: application/sql" -H "Accept: application/ld+json" -X POST http://localhost:3803/sql
returns:
{"@context":{"xsd":"http://www.w3.org/2001/XMLSchema#","@vocab":"http://endb.io/"},"@graph":[{"column1":2,"column2":{"@value":"2023-07-22","@type":"xsd:date"}},{"column1":1,"column2":"hello"}]}
See JSON-LD.
multipart/mixed
multipart/mixed
returns documents as a valid multipart/mixed
response,
albeit all as LD-JSON.
curl -d "SELECT * FROM (VALUES (1,'hello'), (2,DATE('2023-07-22'))) t1" -H "Content-Type: application/sql" -H "Accept: multipart/mixed" -X POST http://localhost:3803/sql
returns:
--369d7885-c6e9-4b01-93c9-9449a5fcd637
Content-Type: application/ld+json
{"@context":{"xsd":"http://www.w3.org/2001/XMLSchema#","@vocab":"http://endb.io/"},"@graph":[{"column1":2,"column2":{"@value":"2023-07-22","@type":"xsd:date"}},{"column1":1,"column2":"hello"}]}
--369d7885-c6e9-4b01-93c9-9449a5fcd637--
application/vnd.apache.arrow.file
application/vnd.apache.arrow.file
returns columnar data as an Apache Arrow file.
curl -d "SELECT * FROM (VALUES (1,'hello'), (2,DATE('2023-07-22'))) t1" -H "Content-Type: application/sql" -H "Accept: application/vnd.apache.arrow.file" -X POST http://localhost:3803/sql --output hello.arrow
The above command returns a file containing a single RecordBatch
in an Apache Arrow file in IPC format.
You can examine the file with functions like
pyarrow.ipc.open_file
,
as seen in this gist.
application/vnd.apache.arrow.stream
application/vnd.apache.arrow.stream
returns columnar data as an Apache Arrow stream.
curl -d "SELECT * FROM (VALUES (1,'hello'), (2,DATE('2023-07-22'))) t1" -H "Content-Type: application/sql" -H "Accept: application/vnd.apache.arrow.stream" -X POST http://localhost:3803/sql --output streamed.arrows
The above command returns a file containing an Apache Arrow IPC stream.
You can examine the file with functions like
pyarrow.ipc.open_stream
,
as seen in this gist.
HTTP Basic Authentication
Endb supports HTTP Basic Authentication as defined by
RFC 7235.
Pass --username
and --password
arguments to the endb
binary to force
basic authentication for HTTP connections.
(See Operation for more details, including environment variables
which can be passed to Docker images.)
./target/endb --username zig --password zag
Then, from any HTTP client, provide the username and password combination to execute queries.
curl --user zig:zag -d "SELECT 'Hello World';" -H "Content-Type: application/sql" -X POST http://localhost:3803/sql
If the client passes an incorrect username or password, it will receive a
401 Authorization Required
HTTP status code as a result, but no body.
Be aware of this to ensure client code is written to detect 401 status codes.
$ curl -i --user zig:wrong -d "SELECT 'Hello World';" -H "Content-Type: application/sql" -X POST http://localhost:3803/sql
HTTP/1.1 401 Authorization Required
NOTE: HTTP Basic Auth is not secure without the additional use of HTTPS.
Parameters
SQL parameters are available to:
- application/json and application/ld+json as part of the POST body
- multipart/form-data as form data
- application/x-www-form-urlencoded as URL query parameters
- application/sql as form data and/or URL query parameters
Parameters can be JSON literals, JSON-LD scalars, or SQL literals.
A JSON-LD scalar always has the form: {"@type": "xsd:TYPE", "@value": "DATA"}
.
JSON-LD types are listed under the Data Types table.
Named Parameters
Named parameters substitute parameter placeholders of the form :param
with the value of the correspondingly-named parameter key.
Named parameters are represented as a JSON object.
curl -d '{"q": "INSERT INTO products {name: :name, price: :price};", "p": {"name": "Paprika", "price": 2.99}}' -H "Content-Type: application/json" -X POST http://localhost:3803/sql
curl -d '{"q": "INSERT INTO events {start: :start};", "p": {"start": {"@type": "xsd:dateTime", "@value": "2011-04-09T20:00:00Z"}}}' -H "Content-Type: application/ld+json" -X POST http://localhost:3803/sql
curl -F q="INSERT INTO products {name: :sauce};" -F p='{"sauce": "Sriracha"}' -X POST http://localhost:3803/sql
Positional Parameters
Positional parameters substitute parameter placeholders of the form ?
with the parameter values, in the order they appear.
Positional parameters are represented as a JSON array.
curl -d '{"q": "SELECT * FROM products WHERE name = ? AND price > ?;", "p": ["Salt", 3.99]}' -H "Content-Type: application/json" -X POST http://localhost:3803/sql
curl -d '{"q": "INSERT INTO events {start: ?};", "p": [{"@type": "xsd:dateTime", "@value": "2011-04-09T20:00:00Z"}]}' -H "Content-Type: application/ld+json" -X POST http://localhost:3803/sql
curl -F q="INSERT INTO products {name: ?};" -F p='["Sriracha"]' -X POST http://localhost:3803/sql
Bulk Parameters
Bulk operations are possible by setting the m
flag to true
.
Bulk operations are available to both named and positional parameters.
The list of parameters supplied in bulk must be nested in an array.
curl -d '{"q": "INSERT INTO products {name: :name};", "p": [{"name": "Soda"}, {"name": "Tonic"}], "m": true}' -H "Content-Type: application/json" -X POST http://localhost:3803/sql
curl -F q="INSERT INTO sauces {name: ?, color: ?};" -F p='[["Mustard", "Yellow"], ["Ketchup", "Red"]]' -F m=true -X POST http://localhost:3803/sql
Apache Arrow File Parameters
As it is possible to receive Apache Arrow data from an Endb query,
it is possible to submit Apache Arrow as a statement parameter.
The example below assumes the existence of a table called names
,
which only contains one column (name
).
Apache Arrow Streams can also be used as parameters in the same way.
# create a sample Arrow file:
curl -d "SELECT * FROM names;" -H "Content-Type: application/sql" -H "Accept: application/vnd.apache.arrow.file" -X POST http://localhost:3803/sql --output names.arrow
# use the sample Arrow file:
curl -F m=true -F q="INSERT INTO projects {name: :name};" -F "p=@names.arrow;type=application/vnd.apache.arrow.file" -X POST http://localhost:3803/sql
NOTE: This feature should be used with caution. Do not submit arbitrary Arrow files as parameters. If a malformed Arrow file is submitted, the error message returned (if any) is unlikely to provide guidance. Preferably, Arrow files should be created using Endb itself, as in the example above. Most users will prefer to use a human-readable file format instead, such as a JSON variant or static SQL statements.
Bulk Insert
Bulk inserts are possible by combining the tools mentioned under Parameters.
For example, the OBJECTS
keyword can insert an
array of object literals.
Note that each object used as a positional parameter must be wrapped in a JSON array,
since there may be more than one positional parameter supplied.
Similarly, each named parameter must be wrapped in an object containing a key of the
corresponding name.
curl -F m=true -F q="INSERT INTO products OBJECTS ?" -F p="[[{name: 'jam'}], [{name: 'butter'}]]" -X POST http://localhost:3803/sql
curl -F m=true -F q="INSERT INTO products OBJECTS :product" -F p="[{product: {name: 'jelly'}}, {product: {name: 'ghee'}}]" -X POST http://localhost:3803/sql
Multiple Statements
It is possible to pass multiple SQL statements to Endb by delimiting them with semicolons. Parameters will be passed to all statements in order.
Only the result of the last statement is returned to the client.
In the following example, the INSERT
will be successful but will not
return a result.
The SELECT
will return to the client.
curl --form-string q="INSERT INTO sauces {name: ?, color: ?}; SELECT {namo: ?, colour: ?};" -F p='["Mustard", "Yellow", "Ketchup", "Red"]' -X POST http://localhost:3803/sql
NOTE: --form-string
is required instead of --form
to send semicolon-delimited
statements with curl
.
WebSocket API
If a client is
not available for your programming language yet, your app can interact
directly with the Endb WebSocket API.
Any WebSocket client can be used but in the examples below we'll use
websocat
to demonstrate
the API without writing any code.
Connect to Endb with:
websocat ws://localhost:3803/sql
You can send SQL statements to endb
over
WebSockets
with JSON-RPC 2.0.
{"jsonrpc": "2.0", "id": 1, "method": "sql", "params": {"q": "INSERT INTO users (name) VALUES (?);", "p": [["Tsehay"], ["Iniku"]], "m": true}}
{"jsonrpc": "2.0", "id": 2, "method": "sql", "params": {"q": "SELECT * FROM users;", "p": [], "m": false}}
NOTE: To send the method calls above, paste them into a websocat
session one-at-a-time and press <enter>
.
Further examples will assume JSON-RPC strings are sent in an existing
websocat
session.
JSON-RPC Request Object
Following the JSON-RPC 2.0 spec, a SQL statement or query to the Endb WebSocket API requires these components:
- id - a string or integer identifier used to correlate requests to responses
- method - always the string "sql"
- params - request params are sent as JSON-RPC by-name request parameters, as listed below
JSON-RPC Request Parameters
- q - (q)uery: a SQL query, optionally parameterized
- p - (p)arameters: named or positional SQL parameters
- m - (m)any parameter lists: bulk parameters, used for bulk insert/update
JSON-RPC Response Object
A response from the Endb WebSocket API will include these components:
- id - the id provided in the request object, so it can be correlated
- result - the response, encoded as JSON-LD, as in this example:
{"jsonrpc":"2.0", "id":111, "result":{"@context":{"xsd":"http://www.w3.org/2001/XMLSchema#","@vocab":"http://endb.io/"},"@graph":[{"name":"Hing","price":2.99}]}}
If your query is invalid or causes an internal error, you will receive an error instead, which includes:
- id - the id of the request object, if one was provided (otherwise null)
- error - a code and message, as returned by Endb
{"jsonrpc":"2.0", "id":222, "error":{"code":-32603,"message":"Error: Unknown table: users\n ╭─[<unknown>:1:15]\n │\n 1 │ SELECT * FROM users;\n │ ──┬── \n │ ╰──── Unknown table\n───╯\n"}}
WebSocket Basic Authentication
Endb supports HTTP Basic Authentication, as defined by RFC 7235, on top of RFC 6455. This is because the WebSocket Protocol RFC (6455) does not define authentication mechanisms itself.
Pass --username
and --password
arguments to the endb
binary to force
basic authentication for HTTP connections.
(See Operation for more details, including environment variables
which can be passed to Docker images.)
./target/endb --username zig --password zag
Then, use Basic Auth headers to connect to Endb:
websocat -H='Authorization: Basic' ws://zig:zag@localhost:3803/sql
Rather than Authorization: Basic
, the Sec-WebSocket-Protocol
header
may be used.
It is offered because web browsers do not support the Basic Auth
header over WebSockets.
Parameters
SQL parameters to a WebSocket request are always JSON-LD.
A JSON-LD scalar always has the form: {"@type": "xsd:TYPE", "@value": "DATA"}
.
JSON-LD types are listed under the Data Types table.
For example, if a strongly-typed DateTime is required, it can be submitted like so:
{"jsonrpc": "2.0", "id": 11, "method": "sql", "params": {"q": "INSERT INTO users (name, date) VALUES (?, ?);", "p": ["dosha", {"@value": "2024-01-29T18:18:30.129159", "@type": "xsd:dateTime"}], "m": false}}
Named Parameters
Named SQL parameters substitute parameter placeholders of the form :param
with the value of the correspondingly-named parameter key.
Named parameters are represented as a JSON object.
{"jsonrpc": "2.0", "id": 111, "method": "sql", "params": {"q": "INSERT INTO products {name: :name, price: :price};", "p": {"name": "Hing", "price": 2.99}, "m": false}}
{"jsonrpc": "2.0", "id": 112, "method": "sql", "params": {"q": "INSERT INTO events {start: :start};", "p": {"start": {"@type": "xsd:dateTime", "@value": "2021-04-09T20:00:00Z"}}, "m": false}}
Positional Parameters
Positional SQL parameters substitute parameter placeholders of the form ?
with the parameter values, in the order they appear.
Positional parameters are represented as a JSON array.
{"jsonrpc": "2.0", "id": 211, "method": "sql", "params": {"q": "SELECT * FROM products WHERE name = ? AND price > ?;", "p": ["Hing", 2.00], "m": false}}
{"jsonrpc": "2.0", "id": 213, "method": "sql", "params": {"q": "INSERT INTO events {start: ?};", "p": [{"@type": "xsd:dateTime", "@value": "2021-04-09T20:00:00Z"}], "m": false}}
Bulk Parameters
Bulk operations are possible by setting the m
flag to true
.
Bulk operations are available to both named and positional parameters.
The list of parameters supplied in bulk must be nested in an array.
{"jsonrpc": "2.0", "id": 311, "method": "sql", "params": {"q": "INSERT INTO users {name: :name};", "p": [{"name": "Sungwon"}, {"name": "Olga"}], "m": true}}
{"jsonrpc": "2.0", "id": 312, "method": "sql", "params": {"q": "INSERT INTO sauces {name: ?, color: ?};", "p": [["Sriracha", "#FE6F5E"], ["Gochujang", "#FF8066"]], "m": true}}
Transactions
Unlike the HTTP API, the WebSocket API is stateful and thus capable of explicit transactions. See the Transactions documentation for further details on SQL syntax.
{"jsonrpc": "2.0", "id": 411, "method": "sql", "params": {"q": "BEGIN TRANSACTION;", "p": [], "m": false}}
{"jsonrpc": "2.0", "id": 412, "method": "sql", "params": {"q": "INSERT INTO delete_me {name: 'Roll Me Back'};", "p": [], "m": false}}
{"jsonrpc": "2.0", "id": 413, "method": "sql", "params": {"q": "ROLLBACK;", "p": [], "m": false}}
{"jsonrpc": "2.0", "id": 414, "method": "sql", "params": {"q": "SELECT * FROM delete_me;", "p": [], "m": false}}
Data Types
Scalars
Endb supports the following native scalar types for now.
JSON-LD values with a type of @json
will be returned from Endb as JSON literals, verbatim.
The @type
listed in parentheses is for reference purposes only.
JSON literal types are implicit so a @value/@type
pair is not returned for a JSON literal.
SQL | JSON-LD | Example Literal | Description |
---|---|---|---|
NULL | @json | null | Null, nil, undefined, or missing |
TEXT | @json (xsd:string ) | "hello" | UTF-8 string |
BOOLEAN | @json (xsd:boolean ) | FALSE | 8-bit boolean |
INTEGER | @json (xsd:int[eger] ) | 42 | 64/128-bit two's complement integer |
REAL | @json (xsd:double ) | 9007199254740992.123 | 64-bit IEEE 754 floating point number |
TIMESTAMP | xsd:dateTime | 2007-01-01T00:00:00 | ISO microsecond precision timestamp |
DATE | xsd:date | 2007-01-01 | ISO date |
TIME | xsd:time | 23:30:00 | ISO time |
INTERVAL | xsd:duration | PT12H34M56.789S | ISO interval |
BLOB | xsd:base64Binary | x'DEADBEEF' | Binary large object |
SQL Data Types are covered in detail in the SQL Reference.
Collections
SQL | JSON-LD | Example Literal | Description |
---|---|---|---|
ARRAY | @json | ["Joe", "Dan", "Dwayne"] | Zero-based array |
OBJECT | @json | {n: 3, b: 2023-01-01} | Object, map, dict, document |
Unsupported Scalar Types
SQL | JSON-LD | Description |
---|---|---|
DECIMAL | xsd:decimal | Arbitrary precision decimal. Use 2 BIGINTs or VARCHAR |
URI | xsd:anyURI | Unsupported. Use VARCHAR |
UUID | @json (xsd:string ) | Unsupported. Use VARCHAR or BLOB |
If you strongly feel you need a native representation of one of these types, email us: hello@endatabas.com
Appendix
What is Endatabas?
From the outside, Endatabas (or Endb) is an open source SQL document database with full history.
On the inside, this means all records in Endatabas are immutable.
An ERASE
statement is provided for compliance with laws and policies like GDPR.
Time-travel is possible, but queries default to "as-of-now".
(Which is the thing you want 97% of the time.)
Rows are "schema-last" documents and joins can be arbitrary, but queries are written in SQL.
Endb stores data in Apache Arrow: scalars are strongly typed, the on-disk format is columnar, and the execution engine understands rows and columns.
Endb separates storage from compute to provide unlimited disk space.
In Swedish, Endatabas means both "a database" and "one database". This One Database Dream is twofold: Endb hopes to provide HTAP, so a secondary analytics database is not required for most users. Endb plans to use AI (adaptive indexing) to provide fast OLTP and OLAP queries on cheap, elastic infrastructure.
Who wants One Database?
After years of market research, the complaint of database users is universal: "I want to stop babysitting the database." Babysitting can mean many things, but all of them are expensive and complex.
The database equivalent of Greenspun's 10th Rule might be "any sufficiently complicated backend system contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half a database." This was true for much of the work we did in the 2010s and 2020s. "Babysitting" is sometimes the energy wasted by repeatedly building and maintaining ad-hoc databases for ourselves instead of building tools for customers.
Buying data products also requires babysitting. DBAs babysit Oracle indexes. Developers babysit Postgres query optimizations. Data engineers babysit ETL pipelines. Analysts babysit Redshift history. SREs babysit Aurora costs.
Endb can't solve all these problems, but it attempts to be a jack-of-all-trades database that solves as many as it can — for as many people as it can.
When is One Database possible?
After years of Computer Science research, it's also clear a sea change in database tech is due... right about now. (Give or take ten years. Our timing may be off.)
Hellerstein and Stonebraker's What Goes Around Comes Around remains true, decade after decade, since it was published in 2004. As always, the relational data model is still king and SQL is still the lingua franca of databases. Together, they assimilate new rivals every decade or so. Endatabas tries to stand right at the center of this upcoming collapse of the data toolchain.
If we, as an industry, can drop a decade's vestigial growth in favour of a tighter, simpler solution? Wonderful. But what if we could shed a half-century's vestiges? Perhaps event streams, relations, documents, graphs, temporal data, ETL and CDC can all live under one roof for many businesses.
Let's see. We're not sure if we can make this work. But it's exciting to try.
—
One Database: Clean. Simple. Less.
(Read more in Why Endatabas?)
Why Endatabas?
(Read more in What is Endatabas?)
Why did we build Endatabas (aka Endb) at all? Isn't one of the many (many) existing databases good enough?
Many incumbent databases serve their use cases and markets well. But the demands placed on databases are growing rapidly. These demands pull in multiple directions, all at once, and existing technology cannot support them without introducing enormous complexity. Metamorphosis is required.
Endb takes good ideas and makes them easier to access, while reducing operational headache. It does not try to be flashy or unnecessarily revolutionary. Instead, it tries to be simple and familiar on the surface while providing a lot of new power under the covers.
Let's talk about what that means in clear, concrete terms.
What is Endatabas, anyway?
The tagline "SQL Document Database With Full History" says a lot, but it doesn't say everything. Endatabas is, first and foremost, an immutable database. That's the Full History part. But storing all your data, forever, has clear implications.
We consider these implications to be the pillars of Endatabas. In 3D geometry, the legs of a tripod are mutually supportive; as long as all three feet are in contact with the ground, the tripod will not wobble or collapse. So it is with the pillars. Each supports and implies the others. The pillars are as follows:
- Full History (requires: immutable data and erasure)
- Timeline (requires: time-traveling queries)
- Separation of Storage from Compute (requires: light and adaptive indexing)
- Documents (requires: schemaless tables, "schema-per-row", arbitrary joins)
- Analytics (requires: columnar storage and access)
At the top of this five-dimensional structure is SQL, the lingua franca of database queries. We believe it is the right time to integrate these ideas. But first let's go back a few decades to see how we got here.
History
None of the ideas in Endatabas are new.
George Copeland's What if mass storage were free? asked, back in 1980, what an immutable database might look like. His prescient vision for a database with full history enjoys the clarity of a researcher at the beginning of the database era. People have occasionally asked of Endatabas, "why bother retaining all history?" But this is the wrong question. The real question is: "why bother destroying data?" Copeland answers: "The deletion concept was invented to reuse expensive computer storage." The software industry has grown so accustomed to the arbitrary deletion of historical data that we now take destroying data for granted.
Mass storage is not free yet — but it is cheap. Copeland himself addresses "a more realistic argument: if the cost of mass storage were low enough, then deletion would become undesirable." Any system that exploits the separation of storage and compute can enjoy these low costs.
An immutable dataset and a timeline of changing states are two sides of the same coin. Previous states carry the innate property of time (whether defined by familiar wall clocks or versions or logical clocks). Jensen and Snodgrass have thoroughly researched time-related database queries. Much of their work was published in the 1990s and early 2000s. Storing time, querying across time, time as a value ... these challenging subjects eventually grew to form SQL:2011. Most SQL databases have struggled to implement SQL:2011. Incorporating time as a core concept in mutable databases (those which support destructive updates and deletes) amplifies existing complexity. Time should simplify the database, not complicate it.
Document databases have a more convoluted story. Attempts at "schemaless", semi-structured, document, and object databases stretch from Smalltalk in the 1980s to C++ in the 1990s to Java and graphs in the 2000s to JSON in the 2010s. Despite all this, the most successful semi-structured document store, as of 2023, is a Postgres database with JSON columns. Database users desire flexible storage and querying — but yesterday's weather says they desire SQL more. Can't we have both?
SQL has four identities, four histories. There is an SQL of academia, born of Codd's relational algebra (1970) and Chamberlin/Boyce SEQUEL (1974), grown over decades with research like Snodgrass/Jensen's TSQL2. Then there is the SQL of industry, the many-tentacled leviathan of IBM, Oracle, and Microsoft: the SQL sold to businesses and governments, ceaselessly bifurcated into new dialects with each version and implementation. Between these two rests the SQL of the ISO specification — unified across 11 published standards, from SQL-86 to SQL:2023, spanning thousands of pages, adhered to by no single database.1 Last, there is colloquial SQL, the language one refers to by the question, "do you know SQL?" These four threads are intertwined across four decades, making it very difficult to clearly define what is meant by "SQL", even in very narrow contexts. Colloquial SQL is of greatest interest to us. This is the ubiquitous query language any new database must implement to succeed.
Khoshafian and Copeland introduced the Decomposition Storage Model (DSM) in 1985. The four decades that followed saw any number of approaches to data analytics. Most of the time, however, these demanded expensive data acrobatics: data was piped, streamed, dumped, and copied into denormalized cubes and time-series databases. As humanity grew out of the batch processing of the 1980s into the always-online society of the 2020s, analytics data became another form of operational data and this pipeline was looped back to users and customers. Hybrid Transactional/Analytical Processing (HTAP) promises a simpler, natural successor to OLTP and OLAP systems. For many businesses, the transactional/analytical divide is as arbitrary as destroying data with every state change.
Humanity will arbitrarily destroy data in 2026 because hard disks were expensive in 1986. Similarly, we will wastefully query data with multiple systems in 2026 because CPUs were slow in 1986.
Why SQL?
When you approach Endb from a distance, you won't see the pillars at first. You'll see the structure they support. This structure also binds the pillars together. The query language of a database is its glue. It is the user interface. It defines the user experience. It is at once a programming language, a dialogue, an envelope, a protocol. But a query engine is not divorced from its query language, and so the language choice also informs system design and behaviour.
There are mundane reasons for choosing SQL. If asked "do you know SQL?" there's a high probability the answer is "yes." SQL is the language of data science and third-party tools. If any sufficiently successful database has to provide SQL access anyway, one may as well start there.
But this is not why we chose SQL. We believe SQL can be beautiful.
As mentioned in History, we are primarily concerned with colloquial SQL. The other threads of SQL are relevant but Endb will never implement the entire SQL specification. Instead, the Endb SQL dialect unifies the pillars under one strongly-dynamically-typed umbrella, a variation of the SQLite dialect. SQLite's applications are quite different from those of Endatabas, so Endb SQL is not weakly-typed and Endb types are not a free-for-all. Instead, Endb internally adopts types from Apache Arrow, which it uses as a storage medium.
When considering alternatives, there are no direct contenders. Cypher, Datalog, MongoDB query documents, and other schemaless query languages work well for one database implementation but lack both mindshare and standards. PartiQL, SQL++, and other NewSQL languages that depart from SQL suffer precisely because they are almost SQL. One can fantasize about designing a query language from scratch but it's very easy to get wrong and takes decades to build mindshare. SQL has been through this decades-long gauntlet.
Just as PL/SQL and T-SQL differ, so will Endb SQL from other dialects. However, colloquial SQL is comparable to colloquial Hindi — at higher levels, it bifurcates into Urdu and Sanskrit but speakers of both lineages understand one another. Endb SQL will be familiar to users of other SQL dialects.
With its long, rich history, SQL not only has the necessary theoretical underpinnings but also the battle scars of technology that lasts. It sits alongside POSIX, TCP/IP, LISP, C, and the QWERTY keyboard layout. It will see its centenary.
Why Full History?
Even if we ignore Copeland's dream of mass storage from 1980, it is easy to see why destroying data is harmful. To destroy data is to destroy facts — to lie about the truth of what happened.
Few modern systems permit the total destruction of data for this obvious reason.
Some choose to create audit tables: users_audits, sales_audits, and so on.
Some choose to log anything and everything.
"It's on disk somewhere."
It's not uncommon for companies to extract metrics from logs to create invoices and reports, turning a log file into a bespoke immutable database.
Industries which take their data very seriously (banking, healthcare) already store immutable records. They just do so in a mutable database. Append-only tables are not new, but they're an order of magnitude easier to work with — for both users and operators — if the database is append-only from the ground up.
These same industries will resist the destruction of data unless absolutely necessary, but they will bend to necessity.
Erasure is concomitant with immutability — we cannot have one without the other.
The existing designs of databases create serious problems for privacy.
DELETE, when overloaded to mean both "save disk space" and "explicitly remove this data", becomes opaque.
It does not leave any queryable record of the deletion.
Removing data should keep tombstones so it's at least known that some data was removed.
Why a timeline?
Keeping your data's entire history is the write-side of the equation. If you didn't care about getting this data back, you could just dump it into an unintelligible pile. But you not only want your data back, you want to query it in the easiest way possible.
One very sensible way to see and query immutable data is along a timeline. Endb makes no assumptions about your desire to participate in this timeline. By default, everything is visible as-of-now but querying the past should feel effortless.
-- without time travel:
SELECT * FROM products;
-- time travel to 2020:
SELECT * FROM products FOR SYSTEM_TIME AS OF 2020-08-25T00:00:00;
Why Separation of Storage and Compute?
Separating storage from compute is an implementation detail. AWS customers don't choose Amazon Aurora because they're craving this separation. Decoupling storage from compute makes scale (both up and down) trivial. It also introduces the possibility of "reducing network traffic, ... fast crash recovery, failovers to replicas without loss of data, and fault-tolerant, self-healing storage."
This decoupling is concomitant with Light and Adaptive Indexing. It is undesirable to manually construct expensive indexes for unknown future schemas over (effectively) infinite data. Instead, we should let machine learning handle this job.
Why documents?
It can be argued that "why documents?" is really multiple questions: why schemaless? why nested data? why dynamic SQL?
First, the challenges. It is extremely difficult to force global schema onto every row in a table in an immutable world. Even if there were a simple mechanism in SQL to alter table schema only for certain durations (there isn't), the database user would still be burdened with querying based on a particular schema at a particular time in the history of the table. This complexity is compounded by the fact that static schemas have less and less meaning in a temporal world. Endb introduces SQL:2011 time-travel and period predicates. The difficulty, mentioned earlier, that other databases encounter when introducing SQL:2011 is twofold: dealing with "time" in a world where history can be violently rewritten and managing an unbending schema across time.
Nested data is equally unnatural in incumbent databases. SQL:99, SQL:2016, and SQL:2023 all offer some way of shoehorning nested data into flat tables. DynamoDB, Couchbase, and MongoDB are the most common document databases but they don't speak SQL.2 Not only will no one ever decompose relational data into 6NF, it is unlikely we'll ever see a return to the classic BCNF of business entity-relationship diagrams. Nested data is here to stay. Foreign, embedded JSON (or XML) is little more than a band-aid. Nested data should be native to SQL queries.
Second, the joys. Schema-per-row can be incredibly liberating. Not only does this feel more natural, it embraces the messy truth of the real world. Schema-on-write can be added later, when the business is ready to lock down what it knows about a domain. But many use cases demand flexible schemas. What if this week's project requires downloading huge amounts of semi-structured JSON pricing data from the APIs of competing online stores to compare them? Endb can handle this case out of the box. Most databases would require a great deal of manipulation first.
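For instance, here is a sketch with hypothetical data: rows with entirely different shapes can land in the same table, with no migrations and no prior schema work:

INSERT INTO prices {sku: 'X-100', store: 'acme', usd: 129.99};
INSERT INTO prices {sku: 'X-100', store: 'globex',
                    price: {amount: 119.99, currency: 'EUR'}, on_sale: TRUE};
SELECT * FROM prices;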
Dynamic SQL is required to support schemaless, nested data — but it also brings its own joys. When dynamic SQL rests on top of a flexible data model which can ingest any data, it is capable of exploring nested data and easily constructing arbitrary joins users could normally only construct in a graph database.
Why "One Database"?
It is often the job of analytics databases to record and query all the data of a business, denormalized for speed. There will always be analytical jobs which require data to be transformed. But many businesses already allow data scientists, analysts, and even CEOs read-only access to an OLTP replica.
HTAP is right on the horizon. Endb hopes to participate in that evolution by making all business data accessible to easy-to-produce, cheap, ephemeral read replicas.
Why Commercial Open Source?
We cannot reconcile building and selling a product we would never buy ourselves.
Why Now?
All of the pillars outlined above have their own strike-when-the-iron-is-hot moment. The SQLite we know today began to materialize in 2001. Amazon S3 was launched in 2006. MongoDB was first released in 2007. Immutable data and functional programming reached the mainstream in the 2010s. Datomic (arguably the first immutable OLTP database) was released in 2012. Amazon Aurora was released in 2015 and Google's AlloyDB in 2022. Apache Arrow saw its first release in 2016. Over the past decade, many financial firms built their own in-house temporal databases. SQL:2011, SQL:2016, and SQL:2023 were ratified in their respective eponymous years. HTAP hasn't quite happened yet. AI-driven indexes haven't quite happened yet.
The moment for something like Endatabas is now... but it is a very long moment. Endatabas cannot be built in a Postgres-compatible fashion. Technically, it would be prohibitively expensive to build Endatabas from existing components. It's time for something new.
1. The complete SQL specification is very difficult to implement in full, though Mimer likely comes closest to this goal.
2. There is XQuery, of course. But most businesses today do not build their nested data storage on XQuery.
Architecture
Immutable
All records in Endatabas are immutable.
Historical data is not lost when an UPDATE or DELETE statement is executed. You can think of endb as one giant log of transactions, with fast queries made possible by adaptive indexing.
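As a minimal sketch (hypothetical table and a placeholder timestamp): an UPDATE changes what SELECT sees as of now, but the earlier version remains reachable with the time-travel syntax described in "Why a timeline?":

UPDATE products SET price = 20 WHERE id = 1;
-- the pre-update row is still there:
SELECT * FROM products FOR SYSTEM_TIME AS OF 2023-01-01T00:00:00;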
Erasure
The only time data in Endatabas is truly deleted is with the ERASE statement.
This is used for user safety and compliance with laws like GDPR.
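A sketch of the statement's shape (the table and predicate are hypothetical):

-- permanently remove all trace of the matching documents:
ERASE FROM users WHERE name = 'Forget Me';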
Dynamic SQL
Both the heart and "UI" of Endatabas is a dynamic SQL engine which natively understands strongly-typed, semi-structured rows (documents). Here is an example to illustrate what that means:
INSERT INTO stores {brand: "Alonzo's Analog Synthesizers",
addresses: [{city: "New Jersey",
country: "United States",
opened: 1929-09-01},
{city: "Göttingen",
country: "Germany",
opened: 1928-09-01}]};
-- recursive query of ISO dates from the nested document:
SELECT addresses..opened FROM stores;
Endb SQL bases its core SQL dialect on SQLite. It also draws inspiration from SQL:99, SQL:2011, SQL:2016, and SQL:2023. Read more in our bibliography.
Columnar: Hybrid Transactional Analytic Processing (HTAP)
Endatabas stores and processes data as columns. Endb's columnar storage is built on Apache Arrow. Endb SQL allows users to retrieve data as documents. The ultimate goal is for Endatabas to serve many (hybrid) purposes: day-to-day transactions and analytical jobs.
Query Execution
Queries are compiled to Common Lisp and use hash joins. Join ordering is done heuristically, by base table size. Correlated queries are memoized by building indexes in-memory during query execution.
Future
More detailed information about the future of Endb can be found in the roadmap.
Cloud: Separation of Storage from Compute
If you're not sure what this is, think Amazon Aurora, Google AlloyDB, and Neon. Compute nodes are the classic database (Postgres, MongoDB, etc.) — in an immutable world, these are just caches. Storage is elastic object or blob storage (S3, Azure Blobs, etc.).
Endatabas is built to live in the clouds, alongside infinite disk.
Although the groundwork for separating storage from compute exists in Endb today, elastic storage backends are not yet implemented.
Adaptive Indexing
For more information on light and adaptive indexing, you can watch Håkan's talk from 2022: "Light and Adaptive Indexing for Immutable Databases"
Endb does not yet support adaptive indexing.
Roadmap
You can read our high-level roadmap on GitHub.
License
All Endatabas code is Copyright Håkan Råberg and Steven Deobald and licensed under the GNU Affero General Public License v3.0, unless otherwise specified.
To contribute to Endatabas, please refer to CONTRIBUTING.