Introduction

Welcome to Endatabas!

Endatabas is a SQL document database with complete history. It will store anything, forever.

The best way to understand Endatabas is to use it. Head on over to the Quickstart to try it out.

If you're not ready to run your first query yet, there's plenty of explanatory material in our What? and Why? documents.

Elsewhere

If you came to the Endatabas book directly, you may also be interested in:

Tutorial

This quick tutorial intends to walk you through the core aspects of Endb. By the end, you will have endb installed (if you want) and you will have used it to learn some of what it can do.

While in beta, new versions are not guaranteed to have binary-compatible storage with previous versions. We encourage you to experiment with Endb but please do not use it in production until a General Availability release.

Quickstart

Install Endb


mkdir -p endb_data
docker pull endatabas/endb
docker run --rm -p 3803:3803 -v endb_data:/app/endb_data docker.io/endatabas/endb

See Installation for more ways to install and build endb.

See Try It! for methods of connecting to Endb and running queries.

Try Online

If you do not want to install a full Endb instance, you can try the Live Demo instead.

The Wasm Console runs Endb directly in your web browser, so you can ignore the steps in Try It! and jump straight to Endb SQL Basics.

NOTE: The Wasm Console is not a full Endb instance. It does not provide APIs and has other performance and infrastructure limitations.

Try It!

You can send SQL statements to endb over HTTP or WebSockets. Any HTTP or WebSocket client will do, and Endb ships with a console and client libraries.

curl

Our first couple examples will use curl, which you probably already have installed, to send queries directly to the API.


curl -d "INSERT INTO users (name) VALUES ('Tianyu')" -H "Content-Type: application/sql" -X POST http://localhost:3803/sql
curl -d "SELECT * FROM users" -H "Content-Type: application/sql" -X POST http://localhost:3803/sql

You don't need to use the API diretly if there is a client library available for your language of choice, but it is helpful to know that the underlying API is human-readable and based on open standards. Read more in the full HTTP API docs.

Console

Endb provides a small console for writing Endb SQL directly:


pip install endb_console
endb_console # connects to localhost by default

Assuming you inserted a user with curl above, you can query that table directly:


-> SELECT * FROM users;

Client Libraries

Endb ships with permissively-licensed (MIT) client libraries:


pip install endb
npm install @endatabas/endb

You can copy and modify their source code for any purpose. Read more about how to use them in the client libraries docs.

Learning Endb SQL

You can use any of these tools (or any other HTTP client you prefer) for the rest of this tutorial.

Endb SQL Basics

If you know SQL, Endb SQL will feel instantly familiar. It is not "SQL-like". It is SQL. However, Endb SQL is dynamic, strongly-typed, time-aware, and shuns language embedding. Hopefully it is pleasant to use without feeling foreign.

Just Begin

Endb is a schemaless document database. You do not need CREATE TABLE — tables are dynamically created when you insert data. The following SQL is valid as soon as you start endb:


INSERT INTO posts (id, user_id, text) VALUES (123, 456, 'Hello World');

Try querying it out:


SELECT * FROM posts;

Immutable

Endb is immutable, so it does not permit destructive UPDATE or DELETE. For example, if you run an UPDATE, your previous INSERT isn't lost.

Before we update the record, we'll make note of the current time, according to the database. (Any time after the INSERT and before the UPDATE would suffice.)


-- make note of this time to use below
SELECT CURRENT_TIMESTAMP;

Multiple statements can be separated by semicolons. This time, we'll update the record and view it at once:


UPDATE posts SET text = 'Hello Immutable World' WHERE id = 123;

SELECT * from posts;

You'll note that Hello World from your original insert isn't visible. That's because it only exists in the past and, by default, SELECT will show the state of the database as of now.

To see the old version, you can time-travel back to a time when the old record was visible. Copy the timestamp you noted, without the quotes, something like SELECT * from posts FOR SYSTEM_TIME AS OF 2024-01-01T00:00:00.000000Z;:


SELECT * from posts FOR SYSTEM_TIME AS OF {YOUR_NOTED_TIMESTAMP};

NOTE: Although there is no destructive DELETE in an immutable database, there is ERASE, which exists to remove data for user safety and compliance with laws like GDPR.

Dynamic Joins

Relationships are also dynamic. You can join any two tables on any two columns. Adding a user with id 456 allows a join with the previous posts table.


INSERT INTO users (id, name) VALUES (456, 'Vikram');

SELECT * from posts p JOIN users u ON p.user_id = u.id;

Semi-Structured Data

Endb allows you to insert asymmetrical, jagged data. Let's add another user with more columns.


INSERT INTO users (id, name, email) VALUES (789, 'Daniela', 'daniela@endatabas.com');

SELECT * from users;

Note that the SELECT * is an implicitly dynamic query. It doesn't have any difficulty with the previous user document, even though it lacked an email column. In practice, most applications and SQL queries should specify exactly the columns they want to query. SELECT * is really only for exploratory queries, so it shows you everything visible in the table.

Data "Migration"

It may seem strange to leave jagged columns lying around. Endb doesn't discourage you from cleaning up your data, if you can:


UPDATE users SET email = 'vikram@stockholm.se' WHERE name = 'Vikram';

SELECT * from users;

The difference in Endb is that we haven't "migrated" the old data — it's still there. If you query for Vikram's user document as of 2 minutes ago, you will see the old record without an email. Queries in Endb always default to as-of-now, which is why the results of the query above shouldn't be surprising.

Nested Data

Endb eschews JSONB columns in favour of a native, strongly-typed, document-relational model.


INSERT INTO users (id, name, friends)
VALUES (123, 'Anastasia', [{name: 'Heikki', country: 'Finland'},
                           {name: 'Amit', country: 'Japan'}]);

SELECT users.friends[1] FROM users WHERE id = 123;

The users.friends[1] expression above is a path expression. A detailed explanation of Endb's path navigation is provided in the SQL Reference Path Navigation docs.

Documents

Because of Endb's native document-relational model, rows are documents and vice-versa. You can use an INSERT statement to add a document directly to the database:


INSERT INTO users {id: 890,
                   name: 'Aaron',
                   friends: [{name: 'Jeff', country: 'Canada'},
                             {name: 'Kaia', country: 'Japan'}]};

Error Messages

Endb will always do its best to provide you with meaningful error messages that point you to a solution:


SELECT * FROM im_not_here;

Learn More

Much more detail on Endb SQL is provided in the SQL Reference.

SQL Reference

The SQL Reference contains details and edge cases about Endb SQL.

NOTE: The Endb SQL dialect is under active development. While most major features of Endb SQL are stable, some more esoteric features (MATCH, path navigation, etc.) may experience naming and semantics changes before Endb 1.0 is released.

Intention

The goal of Endb's SQL dialect is to be small, coherent, and powerful. The SQL specification is massive, with pages ordering in the thousands. Rather than implement the entire SQL specification from scratch (a gargantuan task) or mimic the SQL dialect of another database, Endb chooses a tiny core and builds powerful, composable features on top of that.

This tiny core draws inspiration from many sources, but SQLite in particular. If SQLite supports a particular operator or function, Endb SQL also tries to.

Endb SQL also draws strong inspiration from the SQL specification itself (and its predecessors) and from PostgreSQL. Endb SQL's nested data is also heavily inspired by JSONPath, SQL/JSON, and their derivatives found in major SQL databases.

Light inspiration is drawn from PartiQL, SQL++, and XQuery.

For more information on Endb's influences, please see our bibliography.

Data Manipulation

Creating, updating, and deleting data in Endb is done using standard SQL Data Manipulation Language (DML). Endb is also immutable and schemaless, so it contains a number of shorthands and document-oriented conveniences.

Endb does not require any Data Definition Language (DDL), such as CREATE TABLE.

INSERT

To create a new document, you can use the standard SQL INSERT command.


INSERT INTO products (product_no, name, price) VALUES (1, 'Tofu', 7.99);

To create multiple new documents at once, delimit their value lists with commas:


INSERT INTO products (product_no, name, price) VALUES (1, 'Butter', 5.99), (2, 'Margarine', 4.99);

It is also possible to insert a document directly using an OBJECT literal.


INSERT INTO products {product_no: 3, name: 'Tea', price: 3.99};

To insert multiple documents directly, delimit documents with commas:


INSERT INTO products {name: 'Coffee', price: 3.99}, {name: 'Croissant', price: 2.99};

It is possible to insert the results of a query:


INSERT INTO cheap_products SELECT * FROM products WHERE price < 4.00;

UPDATE

To update an existing row, you can use the standard SQL UPDATE command:


UPDATE products SET price = 4.99 WHERE name = 'Coffee';

Set multiple columns by separating them with commads;


UPDATE products SET price = 4.99, name = 'Kaapi' WHERE name = 'Coffee';

Because Endb is schemaless, each document (or row) has its own schema. As a result, you may want to remove a column from an individual row. You can do this with the UNSET operator:


UPDATE products UNSET product_no WHERE name = 'Coffee';
UPDATE products REMOVE product_no WHERE name = 'Coffee';

REMOVE is an alias for UNSET.

It is possible to set and unset values in a single update. Unsetting a column which doesn't exist is not an error:


UPDATE products SET price = 5.98 UNSET product_no WHERE name = 'Coffee';

UPDATE PATCH

Endb provides a PATCH operator, similar to the PATCH function. The PATCH operator is used in conjunction with UPDATE to set fields on a document (columns on a row) in a declarative fashion.


UPDATE products PATCH {price: 1.98, product_no: products.product_no + 1000} WHERE price = 2.00;

PATCH is based on RFC 7386: JSON Merge Patch.

UPDATE SET $path

The SET operator permits paths on its left-hand side. The behaviour of the form UPDATE <table> SET <path> = <value> is identical to that of the path_set function.


UPDATE users SET $.addresses[0].city = 'Chicago' WHERE name = 'Steven';

UPDATE UNSET $path

The UNSET (synonym: REMOVE) operator permits paths on its left-hand side. The behaviour of the form UPDATE <table> UNSET <path> = <value> is identical to that of the path_remove function.


UPDATE users UNSET $.addresses[0].city WHERE name = 'Steven';

DELETE

To delete an existing row, use the standard SQL DELETE command.


DELETE FROM products WHERE price = 5.98;

You may delete all rows from a table by eliding the WHERE clause:


DELETE FROM products;

Note: In Endb, DELETE does not remove any data. It is always possible to view data prior to the DELETE with time queries. If you need to remove data for compliance (with laws such as GDPR or PIPEDA), use ERASE.

ON CONFLICT (Upsert)

Endb provides flexible upserts with the common ON CONFLICT clause. When the INSERT command detects a conflict, it will perform the instructions in the DO clause. The following command needs to be executed twice to see the upsert effect.


INSERT INTO products {name: 'Pepper', price: 9.99} ON CONFLICT (name, price) DO UPDATE SET v = 2;

To specify no operation on conflict, use DO NOTHING:


INSERT INTO products {name: 'Pepper', price: 9.99} ON CONFLICT (name, price) DO NOTHING;

To reference the document currently being inserted, the DO clause provides a statement-local relation named excluded.


INSERT INTO products {name: 'Salt', price: 6};
INSERT INTO products {name: 'Salt', price: 7} ON CONFLICT (name) DO UPDATE SET price = excluded.price;

Similarly, the existing table is still available in the DO clause to provide further filtering:


INSERT INTO products {product_no: 99, name: 'Cumin', price: 3.00, v: 5};
INSERT INTO products {product_no: 99, name: 'Cumin', price: 5.00, v: 6} ON CONFLICT (product_no, name) DO UPDATE SET price = excluded.price, v = excluded.v WHERE products.v < 6;

ERASE

ERASE completely removes documents (rows) from visibility to any queries. Once a document has been erased, it is no longer possible to query for it at all.


ERASE FROM products WHERE name = 'Salt';

WITH (Common Table Expressions)

The WITH keyword can create Common Table Expressions (CTEs) for DML, just as it can for queries.


WITH top_margin_products AS (SELECT product_no, name FROM products WHERE (price - cost) > 5.00)
INSERT INTO banner_products SELECT product_no, name FROM top_margin_products;

Parameters

Parameters to DML are documented under the HTTP API.

Transactions

Transactions in Endb are implicit. Run multiple DML statements in a single transaction by providing multiple statements (delimited by semicolons) to a single POST to the HTTP API.

Queries

Endb SQL strives to empower the query author. Basic SQL queries should be familiar. Advanced SQL queries should be simple.

SELECT *

The most common hand-written SQL query tends to be the easiest:


SELECT * FROM products;

Without a WHERE clause (discussed below) the entire table is returned. Because Endb is schemaless, * has a special meaning. Each document (row) in Endb carries with it its own schema. As a result, * refers to the widest possible set of columns, across all the rows returned.

In many SQL dialects, columns are strictly ordered. Because Endb columns cannot have a strict order, * returns them in alphabetical order.

You can select the widest set of columns for a specific table with <table>.*:


SELECT p.* FROM products p JOIN coupons c ON p.price = c.price;

SELECT

In programmatic environments, it is almost always preferable to query for specific columns by name:


SELECT product_no, price FROM products;

Select a list of columns by delimiting them with commas.


SELECT product_no, v, price, name FROM products;

Select a column with spaces by delimiting it with backticks or square braces:


SELECT products['product no'] FROM products;
SELECT `product no` FROM products;

NOTE: Whether or not your Endb client respects column ordering is dependent on the content type it uses in Accept Headers. It is worth reading over the Accept Header documentation, in this regard.

Limit query results to distinct rows or expand to all rows with DISTINCT and ALL.


SELECT DISTINCT name FROM products;
SELECT ALL product_no FROM products p JOIN sales s ON p.name = s.name;

FROM

Alias Tables

For convenience, tables can be given aliases immediately following their name in the FROM clause.


SELECT p.name FROM products p;

The AS operator can also (optionally) be used to provide an alias for a table.


SELECT p.name FROM products AS p;

More usefully, it can give a temporary table name to an expression. The temporary table name can either have anonymous columns or named columns. (The VALUES keyword is explained in the following VALUES Lists section.)


SELECT p.column1 FROM (VALUES ('Paprika', 4.77)) AS p;
SELECT p.price FROM (VALUES ('Paprika', 4.77)) AS p(name, price);

Alias Columns

The AS keyword is also used to alias columns. This is useful when column names conflict in a join. (Joins are explained below.) If the same column is specified more than once, the last reference to that column name is the one which will be returned:


SELECT c.price, p.price FROM products p JOIN coupons c ON p.name = c.name;

If both columns are required, AS can be used to rename one or both of the columns:


SELECT p.price AS regular_price, c.price FROM products p JOIN coupons c ON p.name = c.name;

JOIN

Because Endb is schemaless, documents (rows) can be joined on any fields (columns) which have equivalent values. Joins are written in the form <table1> JOIN <table2> ON <table1>.<column1> <operator> <table2>.<column2>. Join operators are generally comparisons or booleans.


INSERT INTO coupons {name: 'Salt', price: 3.0};
SELECT * FROM products p JOIN coupons c ON p.name = c.name;

LEFT JOIN, LEFT OUTER JOIN, INNER JOIN, and CROSS JOIN are all supported.

NOTE: Endb does not restrict name clashes in unqualified column selection or SELECT *. If two tables share the same column name, the results from the column of the second table in the join will be visible in the returned result, but not the results of the first.

USING

When the columns to join share the same name between two tables, USING is a shorthand that permits joining on the equality of those two columns. A USING query also supports unambiguous use of the unqualified column name in the SELECT clause.


SELECT project_name, users.name, projects.budget FROM users JOIN projects USING (project_name);

WHERE (Filtering)

Rather than returning the entire table, documents (rows) can be filtered with a WHERE clause.


SELECT * FROM products WHERE price > 4;

Advanced Filtering

More advanced filters are documented in Operators and Functions.

ORDER BY (Sorting Results)

Results from queries can be ordered with standard SQL ORDER BY.


SELECT * FROM products ORDER BY price;

By default, ordering is ascending. For descending order, suffix the ORDER BY clause with DESC:


SELECT * FROM products ORDER BY price DESC;

To force ascending order, use ASC:


SELECT * FROM products ORDER BY price ASC;

It is also possible to order by an expression:


SELECT * FROM products ORDER BY LENGTH(name);
SELECT * FROM products ORDER BY -price;

In the example above, LENGTH is an example of a function. A complete list of functions can be found in the Functions documentation.

GROUP BY

GROUP BY accepts a list of columns and creates aggregated rows based on each of those columns, in order. Each aggregate is returned as a single row. Each column returned must either be a column specified in the GROUP BY clause or a column created with an aggregate function, such as SUM.


SELECT name, price FROM products GROUP BY name, price;
SELECT name, SUM(price) FROM products GROUP BY name;

HAVING

HAVING adds a search condition to an aggregate query.


SELECT name, SUM(price) FROM products GROUP BY name HAVING LENGTH(name) > 4;

It is most often used with GROUP BY (seen above), but it is also legal with other aggregates:


SELECT SUM(products.price) FROM products HAVING SUM(products.price) = 13;

LIMIT

LIMIT specifies the maximum number of rows to be returned by the query.


SELECT * FROM products LIMIT 2;

It always makes sense to control the order of returned rows so LIMIT always returns the same rows for the same query -- unless you don't care which rows are returned.


SELECT * FROM products ORDER BY price ASC LIMIT 2;

OFFSET allows queries to skip rows before returning a limited set.


SELECT * FROM products ORDER BY price ASC LIMIT 2 OFFSET 2;

VALUES Lists

The VALUES keyword is used to create a static table of documents (rows). Each row is denoted by a pair of parentheses. All rows must have Union Compatibility which, for Endb, means they have the same number of columns.


VALUES (1, 'Salt'), (2, 'Pepper'), (3, 'Vinegar');

Endb assigns anonymous columns the names column1, column2, etc. Columns can instead be given names with a table alias:


SELECT * FROM (VALUES (1, 'Salt'), (2, 'Pepper'), (3, 'Vinegar')) AS t (product_no, name);

OBJECTS Lists

The OBJECTS keyword is used to create a static table comprised of object literals, each representing a document (row). Each row is directly denoted by an object literal. OBJECTS lists do not require Union Compatibility, so jagged lists are permitted.


OBJECTS {product_no: 1, name: 'Salt'}, {product_no: 2, name: 'Pepper', price: 3.99};
SELECT * FROM (OBJECTS {product_no: 1, name: 'Salt'}, {product_no: 2, name: 'Pepper'}) as t;

Set Operations: UNION, INTERSECT, EXCEPT

The set operations union, intersection, and difference are available to the results of two queries.

UNION

Append the results of one query to another. Duplicate rows are removed.


SELECT * FROM products UNION SELECT * FROM new_products;

To keep duplicate rows, use UNION ALL:


SELECT * FROM products UNION ALL SELECT * FROM new_products;

INTERSECT

The intersection of two queries returns results which are found in both.


SELECT * FROM products INTERSECT SELECT * FROM new_products;

EXCEPT

The difference of two queries returns only results from the first query which are not found in the second. Another way of thinking about this is that results of the second query are removed from the first.


SELECT * FROM products EXCEPT SELECT * FROM ignored_products;

Union-Compatibility

"Union Compatibility" refers to the ability of two queries to be used in a union, intersection, or difference. Because Endb is dynamically-typed, the only constraint on union compatibility is the number of columns returned.

In general, it only makes sense to use set operations on two queries which return either: (1) explicit columns, so order and naming are respected or (2) columns with the same names, so they are guaranteed to return in order, in the case of * queries. When applying set operations to * queries, keep in mind that the widest column set (across the entire history of the table) will be returned.

If the queries return a different number of columns, set operations will result in an error:


-> SELECT * FROM products UNION SELECT * FROM new_products;
400 Bad Request
Number of UNION left columns: 3 does not match right columns: 2

WITH (Common Table Expressions)

The WITH keyword is used to create Common Table Expressions, or CTEs. CTEs act like temporary tables or views within the context of a query or DML statement. CTEs are used in place of a sub-select to simplify the appearance of a query. WITH clauses take the form WITH <cte-name> AS (<cte-select>).


WITH top_margin_products AS (SELECT product_no FROM products WHERE (price - cost) > 5.00)
SELECT name FROM products
WHERE product_no IN (SELECT product_no FROM top_margin_products);

WITH RECURSIVE

The RECURSIVE keyword can be added to WITH to create recursive CTEs which can refer to themselves. Recursive CTEs will always have the form <initial-select> UNION <recursive-select> or <initial-select> UNION ALL <recursive-select>.

Here is a naive example, demonstrating the recursive construction of a Fibonacci Sequence:


WITH RECURSIVE fib(previous, current) AS (
  VALUES (0, 1)
    UNION ALL
  SELECT fib.current, fib.previous + fib.current
  FROM fib
  WHERE fib.previous + fib.current < 5000
)
SELECT * FROM fib;

The most beneficial uses for WITH RECURSIVE are walking hierarchical and graph-shaped data sets -- capabilities ordinary SQL lacks. However, Endb recursive queries are also capable of solving Sudoku puzzles and constructing fractals, as seen in the test suite. (Credit goes to SQLite's delightful Outlandish Recursive Query Examples.)

LATERAL

Lateral queries permit access to named values from the FROM clause in the SELECT clause. The LATERAL keyword is optional in Endb as all queries are lateral queries by default.


SELECT halved FROM products, LATERAL (SELECT price / 2.0 AS halved) h;
SELECT halved FROM products, (SELECT price / 2.0 AS halved) h;

Repeatable Reads: SAVEPOINT, ROLLBACK, RELEASE

Repeatable reads are achieved in Endb by creating savepoints, which queries can return to later. When returning to a savepoint in the future, queries will return results according to the state of the database when the savepoint was captured.

By default, savepoints have a 60-second lifespan, after which they expire. Use of a savepoint during its lifespan will renew its lease, allowing it to be used for another 60 seconds. Savepoints can be forcibly expired with RELEASE. Attempting to use a savepoint that has been expired or released will return No active savepoint: <savepoint>.

NOTE: Repeatable reads are only available to the HTTP API.

Minimal Example

Other SQL dialects have more transaction-specific semantics for SAVEPOINT, ROLLBACK, and RELEASE. Because this difference in behaviour may be confusing to users familiar with other databases, we provide the example below. The result of the final query returns the first two dungeons, which were inserted prior to the savepoint, but not "Tower of Hera".


INSERT INTO dungeons {name: 'Eastern Palace'};
INSERT INTO dungeons {name: 'Desert Palace'};
SAVEPOINT desert_palace;
INSERT INTO dungeons {name: 'Tower of Hera'};
ROLLBACK TO desert_palace; SELECT * FROM dungeons;

NOTE: The ROLLBACK and SELECT above must be executed together, in the same stateless "transaction". (Normally this will mean executing both statements in a single HTTP request.)

SAVEPOINT

The SAVEPOINT operator captures a new savepoint at the time it executes. Savepoints can be named or anonymous. Anonymous savepoints are named with a UUID. The name of the savepoint is returned as an attribute named result.


SAVEPOINT desert_palace;
-- [{'result': 'desert_palace'}]
SAVEPOINT;
-- [{'result': '0b12de43-1c92-4d92-ab7c-51c5a5129074'}]

ROLLBACK

Inside the scope of a stateless "transaction" (normally a single HTTP request), ROLLBACK is used to return to a savepoint. ROLLBACK TO <savepoint> returns to a named savepoint (by name) or an anonymous savepoint (by string UUID). Queries executed inside such a transaction return results according to the state of the database when the savepoint was created. The use of ROLLBACK renews the lease of the associated savepoint.


ROLLBACK TO desert_palace; SELECT * FROM dungeons;
ROLLBACK TO 'eab07765-de6f-4f74-8052-838dd29ee8e7'; SELECT * FROM dungeons;

RELEASE

The RELEASE keyword expires a savepoint so it can no longer be used. Anonymous savepoints can be released by string UUID.


RELEASE desert_palace;
RELEASE 'f7c314dd-47b9-4c85-9502-b8e35c82b935';

Transactions

Standard SQL keywords are used to begin, commit, and rollback explicit transactions.

NOTE: Explicit transactions are only available to the WebSocket API.

Minimal Example

This minimal example assumes a connection via the WebSocket API.


BEGIN;
INSERT INTO dungeons {name: 'Dark Palace'};
INSERT INTO dungeons {name: 'Swamp Palace'};
INSERT INTO dungeons {name: 'Skull Woods'};
ROLLBACK;
SELECT * FROM dungeons;

BEGIN [TRANSACTION]

The BEGIN keyword starts a new transaction.


BEGIN TRANSACTION;

COMMIT [TRANSACTION]

The COMMIT keyword commits the open transaction to disk. END is a synonym for COMMIT.


COMMIT TRANSACTION;

ROLLBACK [TRANSACTION]

The ROLLBACK keyword throws away any uncommitted changes since the transaction began.


ROLLBACK TRANSACTION;

NOTE: The ROLLBACK keyword should not be confused with the ROLLBACK keyword for savepoints. Both can be called without mentioning a savepoint or transaction, but their behaviour differs based on whether the client is in an explicit transaction or not.

SQL Data Types

NULL

Null serves a variety of purposes in Endb.

Explicit: You can provide an explicit NULL value
"Unknown": As with any SQL, you will receive a null when 3-Valued Logic cannot determine if a statement is true or false
"Missing": Jagged rows will return NULL for columns projected for a document which does not contain them

TEXT (CHAR, VARCHAR)

Endb accepts unbounded, variable-length strings with either single or double quotes. CHAR and VARCHAR are synonyms for TEXT.


INSERT INTO users (name, title) VALUES ('River', "Prodigy");

When casting using the CHAR synonym, an optional integer argument indicates the desired length of the resulting string, padded with spaces:


SELECT CAST(123 AS CHAR(10));
-- [{'column1': '123       '}]

BOOLEAN

Boolean values can be TRUE, FALSE, or NULL.

INTEGER (BIGINT)

64-bit integer capable of auto-promotion to 128-bit integer.

REAL (DOUBLE)

A 64-bit IEEE 754 floating point number, or double.

TIMESTAMP

Timestamps can be represented as either SQL timestamps, according to the SQL Specification, or ISO timestamps. The following are legal timestamp literals:

2007-01-01T00:00:00
2007-01-01T00:00:00.123Z
2007-01-01T00:00:00.000000Z
TIMESTAMP '2007-01-01 00:00:00'
TIMESTAMP '2007-01-01 00:00:00.000000Z'

DATE

Dates can be represented as either SQL dates, according to the SQL Specification, or ISO dates. The following are legal date literals:

2007-01-01
DATE '2007-01-01'

TIME

Times can be represented as either SQL times, according to the SQL Specification, or ISO times. The following are legal time literals:

23:59:12
23:59:12.12345
TIME '23:59:12'
TIME '23:59:12.12345'

PERIOD

A Period is a window of time between two date/time types. Periods are used with Period Predicates.

{start: 2001-01-01, end: 2001-04-01}
[2001-04-01T00:00:00Z, 2001-05-01]

It is also possible to create a Period with the PERIOD constructor function.

INTERVAL (DURATION)

An interval (or duration) is created whenever two times are subtracted.


SELECT 2001-01-02 - 2001-01-01;

Interval literals can be constructed with ISO 8601 syntax:

PT12H30M5S
P1Y2M10DT2H30M

This is equivalent to the same ISO 8601 syntax provided as a string to the DURATION constructor:

DURATION('PT12H30M5S')
DURATION('P1Y2M10DT2H30M')

Interval literals can also be constructed with the classic SQL intervals DSL:

INTERVAL '1-2' YEAR TO MONTH
INTERVAL '0 12:34:56.789' DAY TO SECOND

BLOB (VARBINARY)

Binary Large Objects can be encoded as hexidecimal literals or cast from strings.

x'DEADBEEF'
CAST("hello" AS BLOB)

ARRAY

Endb SQL uses zero-based arrays. Arrays can be created with array literals similar to JSON.

["one", "two", "three"]

Alternatively, arrays can also be created using a literal syntax similar to that in the SQL Specification or a constructor function.

ARRAY ["one", "two", "three"]
ARRAY("one", "two", "three")

Array literals can contain the spread operator


SELECT [1, 2, ...[3, 4], 5];
-- [{'column1': [1, 2, 3, 4, 5]}]

Array equality is tested lexicographically.

OBJECT

Objects (which can also be thought of as documents, or rows) can be created with object literals enclosed in curly braces, similar to JSON. Keys in object literals can be quoted or unquoted.

{name: "Hanna", birthday: 1982-12-31}
{'name': "Hanna", 'birthday': 1982-12-31}

Alternatively, objects can be created using either an OBJECT constructor keyword, similar to that in the SQL Specification.

OBJECT(name: 'Hanna', birthday: 1982-12-31)

Object literals can contain spreads, computed fields, shorthands, and row literals.


SELECT { a: 1, ...[2, 3] };
-- [{'column1': {'0': 2, '1': 3, 'a': 1}}]
SELECT { foo: 2, ['foo' || 2]: 5 };
-- [{'column1': {'foo': 2, 'foo2': 5}}]
SELECT {p.name, c.discounted} FROM products p JOIN coupons c ON p.name = c.name;
-- [{'column1': {'discounted': 2.99, 'name': 'Salt'}}]
SELECT {product: {p.*}, discounted: c.discounted} FROM products p JOIN coupons c ON p.name = c.name;
-- [{'column1': {'discounted': 2.99, 'product': {'name': 'Salt', 'price': 5.99}}}]

Object equality is tested by comparing each key-value pair as an array.

Dynamic Literals

Row Literals

It is possible return an entire document (row) as a single literal value. The syntax is akin to Postgres ROW literals. Unlike table.*, which pads non-existent columns with NULL, a row literal returns exactly the schema specified for each individual row.

{ table.* }

Example usage:


SELECT { products.* } FROM products;

As a shorthand, a table's name may be used in the SELECT clause to return entire rows as documents:


-> SELECT users FROM users;
-- [{'users': {'email': 'patrick@oracle.com', 'name': 'Patrick'}},
--  {'users': {'email': 'preethi@shopify.ca', 'name': 'Preethi'}}]

NOTE: When a table contains a column of the same name, the column takes precedence and a single column is returned, as usual:


-> SELECT status FROM status;
-- [{'status': 'error'}, {'status': 'ok'}]

Spread

The Spread Operator (..., sometimes known as "splat") can be used to directly flatten/unnest one collection (an array, object, or row literal) into another. Strings are treated as character collections.


SELECT [1, 2, ...[3, 4], 5];
-- [{'column1': [1, 2, 3, 4, 5]}]

SELECT [1, 2, ..."foo", 5];
-- [{'column1': [1, 2, 'f', 'o', 'o', 5]}]

If an array is spread into an object, its ordinals will be used as properties:


SELECT { a: 1, ...{b: 2} };
-- [{'column1': {'a': 1, 'b': 2}}]

SELECT { a: 1, ...[2, 3] };
-- [{'column1': {'0': 2, '1': 3, 'a': 1}}]

Computed Fields

In the key/property position, square brackets are used to construct computed fields in object literals. Computed fields are implicitly cast to string.


SELECT { foo: 2, [2 + 2]: 5 };
-- [{'column1': {'4': 5, 'foo': 2}}]
SELECT { foo: 2, ['foo' || 2]: 5 };
-- [{'column1': {'foo': 2, 'foo2': 5}}]

Shorthands

Column names can be referred to in place of key-value pairs in object literals.


SELECT {p.name, c.discounted} FROM products p JOIN coupons c ON p.name = c.name;
-- [{'column1': {'discounted': 2.99, 'name': 'Salt'}}]

Note on timezones

Endb date/time data types currently only support times encoded as UTC.

Note on type widening

Operations performed on scalars will attempt to widen those scalars for the purpose of the operation, if reasonable.


-> select 2.0 = 2;
[{'column1': True}]

This widening includes joins on scalars, but not collections (ARRAY or OBJECT):


-> INSERT INTO zig {at: 2023-12-21, val: 2}
[{'result': 1}]
-> INSERT INTO zag {at: 2023-12-21T00:00:00, val: 2.0}
[{'result': 1}]
-> SELECT * FROM zig i JOIN zag a ON i.at = a.at;
[{'at': datetime.datetime(2023, 12, 21, 0, 0, tzinfo=datetime.timezone.utc),
  'val': 2.0}]
-> SELECT * FROM zig i JOIN zag a ON i.val = a.val;
[{'at': datetime.datetime(2023, 12, 21, 0, 0, tzinfo=datetime.timezone.utc),
  'val': 2.0}]

Operators

Comparison

Two values can be compared using standard SQL comparison operators:

=, == (equals)
> (greater than)
< (less than)
>= (greater than or equal to)
<= (less than or equal to)
<>, != (not equal to)


SELECT * FROM products WHERE NOT name = 'Coffee';
SELECT * FROM products WHERE name = 'Coffee' AND name <> 'Kaapi';
SELECT * FROM products WHERE name > 'Cake' AND price >= 5.00;

BETWEEN

BETWEEN returns TRUE when a value is greater-than-or-equal-to the first limit and less-than-or-equal-to the second. It has the form BETWEEN x AND y. It can be negated with the form NOT BETWEEN x AND y.


SELECT * FROM products WHERE price BETWEEN 2.00 AND 4.00;
SELECT * FROM products WHERE price NOT BETWEEN 2.00 AND 4.00;

NOTE: BETWEEN can also be used with System Time.

Boolean Operators

WHERE and HAVING clauses can be modified and combined with standard SQL boolean operators.

IS, IS NOT

IS and IS NOT behave like = (==) and <> (!=), respectively. They are usually used to augment equality checks to test for NULL, which is the third boolean value, representing "unknown". The literal UNKNOWN is permitted in IS / IS NOT expressions in place of NULL.

When both sides of IS evaluate to NULL it returns TRUE.
When only one side of IS NOT evaluates to NULL it returns TRUE,
When only one side of IS evaluates to NUll it returns FALSE.
When both sides of IS NOT evaluates to NULL it returns FALSE.


SELECT * FROM products WHERE product_no IS NULL;
SELECT * FROM products WHERE product_no IS UNKNOWN;
SELECT * FROM products WHERE product_no IS NOT NULL;
SELECT * FROM products WHERE product_no IS 386;
SELECT * FROM products WHERE product_no IS NOT 444;

NOTE: A WHERE clause of the form <column> IS NULL will not return rows for which <column> does not exist, as positive equality is only tested against extant columns. For example, the query SELECT * FROM products WHERE name IS NULL; will not return rows for which the column name does not exist. However, SELECT * FROM products WHERE name IS NOT NULL; will not return either rows where the name column has a value of NULL or the name column is missing. Thus, IS and IS NOT are not symmetrical for jagged data.

IS [NOT] DISTINCT FROM

IS DISTINCT FROM is a synonym for IS NOT. IS NOT DISTINCT FROM is a synonym for IS.

NOTE: The IS [NOT] DISTINCT FROM form is provided for SQL specification compatibility and is not recommended, as it tends to be verbose and confusing.

NOT, AND, OR

NOT can be prefixed to any clause to negate it:


SELECT * FROM products WHERE NOT (name = 'Coffee');

AND returns true if two clauses both return true:


SELECT * FROM products WHERE name = 'Coffee' AND price > 2.99;

OR returns true if either of two clauses return true:


SELECT * FROM products WHERE name = 'Coffee' OR name = 'Kaapi';

Math

Standard SQL mathemetical operators are available to any two numeric values:

+ (addition)
- (subtraction)
* (multiplication)
/ (division)
% (modulo; integer remainder of division)
<< (left bit shift)
>> (right bit shift)
+NUMBER (unary plus)
-NUMBER (unary minus)


SELECT 1 + 3.555;
SELECT 1 - 3.555;
SELECT 2 * 3.555;
SELECT 2 / 3.555;
SELECT 2 % 3.555;
SELECT 62 << 2;
SELECT 62 >> 2;
SELECT +128.5;
SELECT -128.5;

NOTE: Mathematical functions are documented under Functions.

Bitwise Operators

Standard SQL bitwise manipulation operators are available to any two values.

& (bitwise and)
| (bitwise or)

The bitwise not operator is also available to a single value:

~ (bitwise not)


SELECT 1 & 2;
SELECT 1 | 2;
SELECT ~1;

LIKE

LIKE is the operator equivalent of the LIKE function.

LIKE returns TRUE if a string matches the supplied LIKE pattern, as defined below:

A pattern can be a string literal. It can also contain underscores (_) and/or percentage symbols (%). An underscore matches exactly one character. A percentage symbol matches zero or more characters.

Backslash escapes the following character to make it a literal. Use ESCAPE to override the default backslash escape character.


SELECT * FROM products WHERE name LIKE 'Tofu';
SELECT * FROM products WHERE name LIKE 'Tof_';
SELECT * FROM products WHERE name LIKE '%of%';
SELECT * FROM products WHERE name LIKE '\%of\%';
SELECT * FROM products WHERE name LIKE 'X%ofX%' ESCAPE 'X';

NOT LIKE is used to invert the results of the match.


SELECT * FROM products WHERE name NOT LIKE '%of%';

NOTE: Endb LIKE is case-sensitive.

REGEXP

REGEXP returns TRUE if a string matches the supplied regular expression. REGEXP may be prefixed with NOT.


SELECT * FROM products WHERE name REGEXP '.*ee|.*ea';
SELECT * FROM products WHERE name NOT REGEXP '.*[fst]+.*';

GLOB

GLOB returns TRUE if a string matches the supplied UNIX glob. GLOB may be prefixed with NOT.


SELECT * FROM products WHERE name GLOB '*of*';
SELECT * FROM avatars WHERE filename NOT GLOB '/opt/local/avatars/*/*.png';

NOTE: GLOB is case-sensitive. It conforms to standard UNIX globs and thus does not support "globstar" (recursive directory) expansion like **/*.png.

MATCH (Containment)

MATCH returns TRUE if the value on the left contains the value on the right, at the top level. Note that a top-level array to the right of the MATCH refers to a set of values that all need to match, not a literal array.

The following expressions return TRUE:


SELECT 'foo' MATCH 'foo';
SELECT [1, 2, 3] MATCH [3, 1];
SELECT {user: 'foo', age: 42} MATCH {age: 42};
SELECT {a: [1, 2, {c: 3, x: 4}], c: 'b'} MATCH {a: [{x: 4}, 1]};

The following expressions return FALSE:


SELECT [1, 2, [1, 3]] MATCH [1, 3];
SELECT {foo: {bar: 'baz'}} MATCH {bar: 'baz'};
SELECT {a: [1, 2, {c: 3, x: 4}], c: 'b'} MATCH {a: [{x: 4}, 3]};

NOTE: The @> operator is a synonym for MATCH. It is provided as a convenience for users accustomed to the equivalent JSON Containment Operator in Postgres. It also has a symmetric operator, <@, which returns TRUE if the value on the right contains the value on the left, at the top level. No symmetric keyword exists for MATCH.

ANY, SOME

SOME is a synonym for ANY. ANY qualifies a subquery by comparing a single column or literal value with the result of that subquery. ANY is used in the form <expression> <operator> ANY (<subquery>). It returns true if the subquery returns a one or more values for which the operator is true. The operator must return a boolean and the subquery must return a single column.


SELECT 1500 < SOME (SELECT price FROM products);

ALL

ALL qualifies a subquery by comparing a single column or literal value with the result of that subquery. ALL is used in the form <expression> <operator> ALL (<subquery>). It returns true only if all values returned by the subquery are true for the operator provided. The operator must return a boolean and the subquery must return a single column.


SELECT "ok" = ALL (SELECT status_code FROM statuses);

EXISTS

EXISTS returns TRUE if the subquery which follows it returns at least one row.


SELECT name FROM products WHERE EXISTS (SELECT 1 FROM coupons WHERE name = products.name);

IN

The standard SQL IN clause can be used to test lists and subqueries for containment of a value.


SELECT * FROM products WHERE price IN (5.00, 5.99);
SELECT * FROM products WHERE price IN (SELECT price FROM coupons);

NOTE: Use MATCH to test for containment of a value in an array.

NOT IN

The standard SQL NOT IN clause can be used to test lists and subqueries for absence of a value.


SELECT * FROM products WHERE price NOT IN (5.00, 5.99);
SELECT * FROM products WHERE price NOT IN (SELECT price FROM coupons);

NOTE: Use MATCH to test for absence of a value in an array.

`||` (Concatenation)

The || operator concatenates two strings or arrays supplied as arguments. When concatenating to an array element: other elements, arrays, and blobs are accepted as the second argument. When concatenating to an array: arrays, blobs, and array elements are accepted as the second argument. Elements other than strings are cast to strings when concatenated with each other. Multiple operators can be chained together.


SELECT "Hello" || "World";
SELECT [1, 2, 3] || [4, 5, 6];
SELECT 1 || 2;
SELECT "Hello" || ["World"];
SELECT ["Hello"] || "World";
SELECT "Hello" || "World" || "And" || "Friends";

The Concatenation Operator is equivalent to the CONCAT function.

Vector Operators

Indexing

At the moment, Endb performs exact nearest neighbor search, which provides perfect recall. Endb may support approximate vector indexes in the future, which trade accuracy for speed.

`<->` (L2 or Euclidean Distance)

The L2 Distance operator (<->) compares two vectors by Euclidean distance. It is symmetrical to the L2_DISTANCE function.


SELECT * FROM (VALUES ([0,0,0]), ([1,2,3]), ([1,1,1]), (NULL), ([1,2,4])) AS t(val) WHERE val NOT NULL ORDER BY t.val <-> [3,3,3];
-- [{'val': [1, 2, 3]},
--  {'val': [1, 2, 4]},
--  {'val': [1, 1, 1]},
--  {'val': [0, 0, 0]}]

`<=>` (Cosine Distance)

The Cosine Distance operator (<=>) compares two vectors by the complement of their Cosine Similarity. It is symmetrical to the COSINE_DISTANCE function.


SELECT val FROM (VALUES ([0,0,0]), ([1,2,3]), ([1,1,1]), ([1,2,4])) AS t(val) WHERE t.val <=> [3,3,3] NOT NULL ORDER BY t.val <=> [3,3,3];
-- [{'val': [1, 1, 1]},
--  {'val': [1, 2, 3]},
--  {'val': [1, 2, 4]}]

`<#>` (Inverse Inner Product)

The Inverse Inner Product operator (<#>) compares two vectors by the inverse of their Inner Product. It is the inverse of the INNER_PRODUCT function.


SELECT val FROM (VALUES ([0,0,0]), ([1,2,3]), ([1,1,1]), (NULL), ([1,2,4])) AS t(val) WHERE val IS NOT NULL ORDER BY t.val <#> [3,3,3];
-- [{'val': [1, 2, 4]},
--  {'val': [1, 2, 3]},
--  {'val': [1, 1, 1]},
--  {'val': [0, 0, 0]}]

Functions

String Functions

CHARACTER_LENGTH

The CHARACTER_LENGTH function returns the number of unicode characters in a string.


SELECT CHARACTER_LENGTH('josé');
-- 4

SELECT CHARACTER_LENGTH('❤️🥫');
-- 3

OCTET_LENGTH

The OCTET_LENGTH function returns the length of a string, in bytes (octets).


SELECT OCTET_LENGTH('josé');
-- 5

SELECT OCTET_LENGTH('❤️🥫');
-- 10

TRIM, LTRIM, RTRIM

The TRIM, LTRIM, and RTRIM functions trim surrounding whitespace, whitespace to the left, and whitespace to the right of a string, respectively.


SELECT TRIM('  hello  ');
-- 'hello'

SELECT LTRIM('  hello  ');
-- 'hello  '

SELECT RTRIM('  hello  ');
-- '  hello'

LOWER, UPPER

The LOWER and UPPER functions downcase and upcase a string, respectively.


SELECT LOWER('Relatable Algebra');
-- 'relatable algebra'

SELECT UPPER('Shouting Calculus');
-- 'SHOUTING CALCULUS'

REPLACE

The REPLACE function returns the string in the first parameter, with the second parameter (if found) replaced by the third.


SELECT REPLACE('Relatable Algebra', 'Rela', 'Infla');

INSTR

The INSTR function returns the first character of a substring match on the second parameter, if found, and 0 if it is not found.


SELECT INSTR('Coffee', 'ee');

SUBSTRING

The SUBSTRING function returns the substring starting from the index provided as the second parameter. If the (optional) third parameter is provided, the substring will be of that length (or less, if the end of the source string is reached). SUBSTR is a synonym for SUBSTRING.


SELECT SUBSTRING('Hello Edgar', 4);
SELECT SUBSTR('Hello Edgar', 4, 2);

POSITION

The POSITION pseudo-function returns the first position of the first character of the first matched substring in another string. If the substring is not detected, POSITION returns 0.


SELECT POSITION("h" IN "Hawaii");
SELECT POSITION("i" IN "Hawaii");
SELECT POSITION("Ha" IN "Hawaii");

NOTE: POSITION is a "pseudo-function" because internally it uses custom syntax of the form <substring> IN <string>.

UNICODE

The UNICODE function returns an integer unicode value for the first character of the parameter given.


SELECT UNICODE('Adam');

CHAR

The CHAR function returns a string corresponding to the supplied integer character codes.


SELECT CHAR(65, 66, 67);

CONCAT

CONCAT is equivalent to the Concatenation Operator (||) except that CONCAT is limited to 2-arity applications and || can be chained.

LIKE

The LIKE function serves the same purpose as the LIKE operator. However, the argument order is (effectively) reversed for the LIKE function, to match the signature used in SQLite. For the function version, the pattern is the first argument. Optionally, an alternative escape character can be provided as a third argument.


SELECT * FROM users WHERE LIKE('Stev%', name);
SELECT * FROM users WHERE LIKE('EdgarX%', name, 'X');

Collection Functions

LENGTH

The LENGTH function counts the number of entries in a collection. When supplied with a string, it is a synonym for CHARACTER_LENGTH.


SELECT LENGTH([3, 2]);
SELECT LENGTH({name: 'Peas', price: 8.99, product_no: 77});
SELECT LENGTH('josé');

NOTE: CARDINALITY is an synonym for LENGTH.

OBJECT_KEYS

An object's keys can be selected using OBJECT_KEYS.


SELECT OBJECT_KEYS({original_price: 1.99, sale_price: 1.50, coupon_price: 1.40});

OBJECT_VALUES

An object's values can be selected using OBJECT_VALUES.


SELECT OBJECT_VALUES({original_price: 1.99, sale_price: 1.50, coupon_price: 1.40});

OBJECT_ENTRIES

Returns an array of key-value pairs representing the given object.


SELECT OBJECT_ENTRIES({a: 1, b: 2, c: 3});
-- [['a': 1], ['b': 2], ['c': 3]]

OBJECT_FROM_ENTRIES

Constructs an object from an array of key-value pairs.


SELECT OBJECT_FROM_ENTRIES([['a', 1], ['b', 2], ['c', 3]]);
-- {a: 1, b: 2, c: 3}

PATCH

The PATCH function takes two documents. The document returned is the first document "patched" with any fields found in the second document. If the second document does not specify a field, that field is left untouched. If the second document specifies any fields with values of NULL, those fields are removed.


SELECT PATCH(
  {name: 'Salt', nutrition: {sodium: 100, ingredients: 'Kosher Salt'}},
  {name: 'Sea Salt', nutrition: {ingredients: NULL}}
);

The PATCH function has an equivalent operator for data manipulation: UPDATE PATCH

Table Value Functions

Table Value Functions are only valid within the FROM clause.

UNNEST

The UNNEST function can be thought of as the inverse of ARRAY_AGG, although it offers more power than just unlinking elements. It takes an array or object and pulls its elements into separate rows.


SELECT * FROM UNNEST([1.99, 2.99, 3.99]) AS products(price);

It is possible to unnest multiple arrays. If the arrays do not have the same number of elements, the shorter array(s) will have those values filled with NULL:


SELECT names.* FROM (VALUES (['Leslie', 'Edgar', 'fiver2'], ['Lamport', 'Codd'])) AS x(first, last), UNNEST(x.first, x.last) AS names(first, last);

When unnesting an object, keys-value pairs will be returned as per object_entries. This behaviour is useful for manipulating collections:


SELECT * FROM UNNEST({original_price: 1.99, sale_price: 1.50, coupon_price: 1.40}) AS prices(price);
-- [{'price': ['sale_price', 1.5]},
--  {'price': ['coupon_price', 1.4]},
--  {'price': ['original_price', 1.99]}]

Unnesting nested data from a queried table is done with the form FROM <table>, UNNEST(<table>.<column>) AS foo(new_column). For example:


INSERT INTO msgs
  {text: "Here is some classic material",
   user: "George",
   workday: 2024-02-25,
   media: [{type: "image", src: "dsm.png"},
           {type: "video", src: "vldb.mp4"}]};

WITH m AS (SELECT * FROM msgs, UNNEST(msgs.media) AS m(media))
SELECT media FROM m WHERE media..type MATCH 'video';

WITH ORDINALITY

UNNEST can be suffixed with WITH ORDINALITY to append an ordinal column to the results.


SELECT * FROM UNNEST([1.99, 2.99, 3.99]) WITH ORDINALITY AS products(price, n);
-- [{'n': 0, 'price': 1.99}, {'n': 1, 'price': 2.99}, {'n': 2, 'price': 3.99}]

NOTE: Endb ordinals are zero-indexed.

GENERATE_SERIES

The GENERATE_SERIES function generates an array of numbers within a given interval. The first and second parameters are the start and end of the interval. The optional third parameter is a step value by which to increment each number. The result is returned as a single anonymous column (with the default name, column1) containing the array.


SELECT * FROM GENERATE_SERIES(0, 21) AS t(s);
SELECT * FROM GENERATE_SERIES(0, 21, 3) AS t(s);

It is possible to use the result of GENERATE_SERIES in other SQL expressions, like IN:


SELECT * FROM products WHERE product_no IN (SELECT column1 FROM generate_series(1000, 20000) AS foo);

Numeric Functions

RANDOM

The RANDOM function returns a random integer.


SELECT RANDOM();

Math

Endb provides standard SQL math functions based on SQLite's collection of math functions:

ROUND
SIN
COS
TAN
SINH
COSH
TANH
ASIN
ACOS
ATAN
ASINH
ACOSH
ATANH
ATAN2
FLOOR
CEILING, CEIL
SIGN
SQRT
EXP
POWER, POW
LOG, LOG10
LOG2
LN
DEGREES
RADIANS
PI
ABS

NOTE: Endb follows the choice of most SQL databases and aliases LOG to LOG10 rather than LN (natural log), as specified by the SQL standard.

NOTE: Mathematical operators are documented under Operators.

Date/Time Functions

STRFTIME

The STRFTIME function formats a date or time value as a string.


SELECT strftime('%Y/%m/%d', date('2001-01-01'));
SELECT strftime('%Y %m %d at %H %M %S', datetime('2001-01-01 03:04:05'));

UNIXEPOCH

The UNIXEPOCH function returns the number of seconds since the UNIX epoch. Accepts a DATE, TIMESTAMP, or STRING.


SELECT UNIXEPOCH('2023-01-01');
SELECT UNIXEPOCH(1970-01-01T00:00:00Z);

JULIANDAY

The JULIANDAY function returns the Julian Day, which is the number of days since noon in UTC on November 24, 4714 B.C. Accepts a DATE, TIMESTAMP, or STRING.


SELECT JULIANDAY(1970-01-01);

EXTRACT

The EXTRACT pseudo-function provides a way to access one named, numerical portion of a date, time, or timestamp. Portions of dates can only be extracted from dates or timestamps. Portions of times can only be extracted from timestamps or times.


SELECT EXTRACT(YEAR FROM CURRENT_DATE);
SELECT EXTRACT(MONTH FROM CURRENT_DATE);
SELECT EXTRACT(DAY FROM CURRENT_TIMESTAMP);
SELECT EXTRACT(HOUR FROM CURRENT_TIMESTAMP);
SELECT EXTRACT(MINUTE FROM CURRENT_TIME);
SELECT EXTRACT(SECOND FROM CURRENT_TIME);

NOTE: EXTRACT is a "pseudo-function" because internally it uses custom syntax of the form <named-portion> FROM <date>.

PERIOD

The PERIOD function creates a new Period. It is equivalent to using Period literals.


PERIOD(2001-04-01T00:00:00Z, 2001-05-01)

Aggregate Functions

MIN, MAX

The MIN and MAX functions return the minimum and maximum values for an expression, respectively.


SELECT MIN(price) FROM products;
SELECT MAX(price) FROM products;

NOTE: MIN and MAX also have non-aggregate equivalents, which are 2-arity. When used that way, they each return the minimum or maximum value of the two values provided.

SUM

The SUM function returns the sum of all non-null values under the column given as a parameter.


SELECT SUM(price) FROM products;

If all values for the given column are NULL, SUM returns NULL.

TOTAL

The TOTAL function is equivalent to SUM except that it returns 0.0 in the case where all input values are NULL.

AVG

The AVG function takes a numerical-type-agnostic average of all values under the column given as a parameter.


SELECT AVG(price) FROM products;

COUNT

The COUNT function returns the count of non-null, non-empty values for the specified column.


SELECT COUNT(price) FROM sales;

NOTE: Because null/empty values are ignored, the behaviour of COUNT will differ from other SQL dialects. Whether or not COUNT(price) and COUNT(1) are equivalent is dependent on whether the price attribute exists with a non-null value on each document.

ARRAY_AGG

The ARRAY_AGG function concatenates the results of an expression into an array. The parameter may be ordered within ARRAY_AGG.


SELECT ARRAY_AGG(price) FROM products;
SELECT ARRAY_AGG(name ORDER BY price DESC) FROM products;

Note that when operating on arrays, the arrays themselves will be concatenated, not the contents of the arrays. The result will be an array of one higher dimension:


SELECT ARRAY_AGG(x.column1) FROM (VALUES ([1,2]), ([3,4])) AS x;
-- [{'column1': [[1, 2], [3, 4]]}]

GROUP_CONCAT

The GROUP_CONCAT function returns a string with concatenated non-null values from a column or group. Given a second parameter, It defaults to a comma-delimited list, but the second (optional) parameter can override the delimiter.


SELECT GROUP_CONCAT(name) FROM products;
SELECT GROUP_CONCAT(name, ':') FROM products;

FILTER

All aggregate functions can have a filter applied before aggregation.


SELECT SUM(price) FILTER(WHERE price > 20) FROM products;

Data Type Functions

CAST

The CAST function forces a value into a particular data type. Note that not all types are cast-compatible with each other.


SELECT CAST(price AS INTEGER) FROM products;

TYPEOF

The TYPEOF function returns the type of the provided value.


SELECT TYPEOF('hi2u');
SELECT TYPEOF(1.12345678901234);
SELECT TYPEOF(2018-01-01T00:00:00);

Conditional Functions

IIF

The IIF function is a conditional shorthand. It returns the second parameter if the condition is true and the third parameter if the condition is false.


SELECT IIF(price > 5.99, 'Expensive!', 'Cheap') FROM products;

NULLIF

The NULLIF function returns TRUE if the two supplied expressions are equal.


SELECT NULLIF(1, 1);
SELECT NULLIF(1, 'zig');

COALESCE

The COALESCE function returns its first non-null argument. The following example returns 'zig':


SELECT COALESCE(NULL, NULL, 'zig', 'zag');

Encoding Functions

BASE64

The BASE64 function takes a hexadecimal-encoded BLOB and returns a base64-encoded string, or vice-versa. BASE64 roundtrips its own data. There is therefore no BLOBFROMBASE64 function.


SELECT BASE64(x'010203');
SELECT BASE64('AQID');

UUID

The UUID function returns a universally-unique identifier, as a string. The UUID_BLOB function takes a string UUID and returns a BLOB. The UUID_STR function takes a BLOB UUID and returns a string. When given a parameter of their return type, UUID_BLOB and UUID_STR will format the UUID provided.


SELECT UUID();
SELECT UUID_BLOB('d2ce21c9-d268-409a-b1e0-49e1200bfa47');
SELECT UUID_STR(x'd2ce21c9d268409ab1e049e1200bfa47');

-- formatting:
SELECT UUID_BLOB(x'd2ce21c9d268409ab1e049e1200bfa47');
SELECT UUID_STR('d2ce21c9d268409ab1e049e1200bfa47');

SHA1

The SHA1 function takes either a hexadecimal-encoded BLOB, a string, or a number. It returns the SHA-1 encoding of that value.


SELECT SHA1('2');

RANDOMBLOB, ZEROBLOB

The RANDOMBLOB function returns a random binary large object of the size given, in bytes. The ZEROBLOB function returns a zeroed-out binary large object of the size given, in bytes.


SELECT RANDOMBLOB(32);
SELECT ZEROBLOB(32);

HEX, UNHEX

The HEX function takes a BLOB (or coerces its argument into a UTF-8 string, which in turn is interpreted as a BLOB) and turns the BLOB into an upper-case hexadecimal string.

The UNHEX function takes a hexadecimal string and turns it into a BLOB. The hexadecimal string provided must contain character pairs. UNHEX takes an optional second parameter: a string containing non-hexadecimal characters to be ignored in the first parameter. If non-hexadecimal characters are found in the first parameter but not ignored in the second parameter, UNHEX returns NULL.


SELECT HEX(15);
-- '3135'

SELECT UNHEX('3135');
-- b'15'

SELECT UNHEX('3135ZZ', 'Z');
-- b'15'

SELECT UNHEX('3135ZZ', 'M');
-- NULL

Vector Functions

Indexing

At the moment, Endb performs exact nearest neighbor search, which provides perfect recall. Endb may support approximate vector indexes in the future, which trade accuracy for speed.

L2_DISTANCE

The L2_DISTANCE function returns the Euclidean distance between two vectors. It is symmetrical to the <-> operator.


SELECT L2_DISTANCE([-0.7888,-0.7361,-0.6208,-0.5134,-0.4044], [0.8108,0.6671,0.5565,0.5449,0.4466]);
-- [{'column1': 2.7853052938591847}]

COSINE_DISTANCE

The COSINE_DISTANCE function returns the complement of Cosine Similarity for two vectors. It is symmetrical to the <=> operator.


SELECT COSINE_DISTANCE([-0.7888,-0.7361,-0.6208,-0.5134,-0.4044], [0.8108,0.6671,0.5565,0.5449,0.4466]);
-- [{'column1': 1.9970250541178656}]

INNER_PRODUCT

The INNER_PRODUCT function returns the Inner Product of two vectors. It is inverse of the <#> operator.


SELECT INNER_PRODUCT([1,2], [3,4]);
-- [{'column1': 11.0}]

Time Queries

To make best use of Time Queries, it is a good idea to review the time-related SQL data types, such as TIMESTAMP, DATE, TIME, and INTERVAL. These are covered in the SQL Data Types section.

It is also a good idea to review Endb's other time-related functions, in case they are helpful to you:

Note on SQL:2011 closed-open period model

All Endb temporal predicates (CONTAINS, OVERLAPS, PRECEDES, SUCCEEDS, IMMEDIATELY PRECEDES, and IMMEDIATELY SUCCEEDS) follow the SQL:2011 standard's "closed-open period model". This means that a period represents all times starting from (and including) the start time up to (but excluding) the end time.

Note on timezones

Endb currently only supports times encoded as UTC.

Now

Endb provides access to the current value of the clock "now" in multiple date/time configurations.

CURRENT_TIMESTAMP

CURRENT_TIMESTAMP gets the current date and time in UTC.


SELECT CURRENT_TIMESTAMP;

CURRENT_TIME

CURRENT_TIME gets the current time in UTC.


SELECT CURRENT_TIME;

CURRENT_DATE

CURRENT_DATE gets the current date in UTC. Note that this may be different from your local date, depending on the time of day when your query is run.


SELECT CURRENT_DATE;

System Time

All states an Endb database has ever seen are recorded, immutably. Accessing these prior states is accomplished by querying System Time. System Time is encoded in a special column, which is normally invisible to most queries, named system_time (lower case). Because system_time is invisible by default, it must be retrieved explicitly:


SELECT *, system_time FROM products;

AS OF (Time Travel)

Endb permits time-traveling queries with the SQL:2011-compatible AS OF operator. The query will treat the DATE or TIMESTAMP supplied after AS OF as if it were that time now.


SELECT * FROM products FOR SYSTEM_TIME AS OF 2023-08-25T00:00:00;

ALL (Time Omniscience)

Endb permits time-omniscient queries with the SQL:2011-compatible ALL operator. All states, across the entire history of the relevant tables, are visible to a query suffixed with FOR SYSTEM_TIME ALL:


SELECT * FROM products FOR SYSTEM_TIME ALL;

BETWEEN

The syntax for time-aware BETWEEN is the same as the normal BETWEEN operator. Inspect System Time with the form FOR SYSTEM_TIME BETWEEN x AND y.


SELECT * FROM products FOR SYSTEM_TIME BETWEEN 2023-08-24T00:00:00 AND 2023-08-25T00:00:00;

FROM ... TO

Selects rows which fall between the two times, similar to BETWEEN, but is exclusive of both the start and end times.


SELECT * FROM products FOR SYSTEM_TIME FROM 2023-08-24T00:00:00 TO 2023-08-30T00:00:00;

Period Predicates

The standard SQL:2011 period predicates are available.

CONTAINS

Returns TRUE if the second period is contained within the first.


SELECT {start: 2001-01-01, end: 2001-04-01} CONTAINS {start: 2001-02-01, end: 2001-04-01};

OVERLAPS

Returns TRUE if any part of the first period is found within the second.


SELECT {start: 2001-01-01, end: 2001-03-01} OVERLAPS {start: 2001-02-01, end: 2001-04-01};

PRECEDES

Returns TRUE if the first period ends before the second period begins.


SELECT 2001-03-01 PRECEDES [2001-04-01T00:00:00Z, 2001-05-01];

SUCCEEDS

Returns TRUE if the first period begins after the second period ends.


SELECT 2001-06-01 SUCCEEDS [2001-04-01T00:00:00Z, 2001-05-01];

IMMEDIATELY PRECEDES

Returns TRUE if the first period ends exactly as the second period begins.


SELECT 2001-04-01 IMMEDIATELY PRECEDES [2001-04-01T00:00:00Z, 2001-05-01];

IMMEDIATELY SUCCEEDS

Returns TRUE if the first period begins exactly as the second period ends.


SELECT 2001-05-01 IMMEDIATELY SUCCEEDS [2001-04-01T00:00:00Z, 2001-05-01];

Because Endb is schemaless and semi-structured, it offers a number of powerful path-nativation primitives inspired by JSONPath, SQL/JSON, and their derivatives in legacy relational databases.

You will want to familiarize yourself with Endb's nested data types (arrays and objects) before learning about path navigation.

It will also be helpful to learn about other functions and operators that are frequently helpful in conjunction with path navigation:

MATCH - filter by containment
ARRAY_AGG - concatenate results into an array
UNNEST - pull array elements into separate rows
WITH - Common Table Expressions

In the examples below, most path examples are shown in the SELECT clause, but they are also valid in the WHERE clause.

Nested Objects

If you are familiar with arrays and objects, try adding some nested objects to the table paths (the table name is arbitrary).


INSERT INTO paths {a: 2, b: {a: 3}, c: [{a: 1}, 2]};

Navigating the root of any document (row) as a tree looks like standard SQL, because it is:


SELECT a FROM paths;

Similarly, however, it is possible to navigate documents listed in an array:


SELECT [{a: 2}, {a: 3}, {b: 4}].a;
-- [{'a': [2, 3]}]

It is also possible to navigate fields of sub-documents (columns of nested rows) with further dot notation:


SELECT b.a FROM paths;
-- [{'a': 3}]

Row Literals

It is possible (and helpful) to create a row literal to represent rows returned, so they are easier to navigate. The format of a row literal is { <table>.* }:


SELECT { paths.* } FROM paths;

Or the shorthand:


SELECT paths FROM paths;

See Row Literal Data Type

Recursive Paths

The double dot (..) notation performs a "deep scan" by recursively walking the document to match the name given.


SELECT { paths.* }..a FROM paths;
-- [{'a': [2, 3, 1]}]

SELECT b..a FROM paths;
-- [{'a': [3]}]

SELECT * FROM paths WHERE paths..a MATCH 3; -- finds a match
SELECT * FROM paths WHERE paths..a MATCH 4; -- does not

Named Child

The square bracket notation (['<NAME>']) performs a lookup of a single descendent child.


SELECT { paths.* }['b']['a'] FROM paths;
-- [{'a': 3}]

SELECT b['a'] FROM paths;
-- [{'a': 3}]

Named Children can be combined with recursive paths, though the default recursive path syntax is synonymous with named children:


SELECT { paths.* }..a FROM paths;
SELECT { paths.* }..['a'] FROM paths;
SELECT b..['a'] FROM paths;

Numbered Child

The square bracket notation ([<NUMBER>]) can also perform indexed lookups of a single descendent child.


SELECT { paths.* }['b'][0] FROM paths;
-- [{'column1': {'a': 3}}]

SELECT { paths.* }['c'][1] FROM paths;
-- [{'column1': 2}]

SELECT c[1] FROM paths;
-- [{'column1': 2}]

Numbered Children can be combined with recursive paths. This finds and returns all indexed values, counting backward:


SELECT { paths.* }..[-1] FROM paths;
-- [{'column1': [2]}]

Wildcard Child

The square bracket notation ([*]) can also perform a wildcard lookup of all descendent children.


SELECT [{a: 2}, {a: 3}, {b: 4}, 5][*];

Wildcards can be combined with recursive paths. This finds and returns all values:


SELECT { paths.* }..[*] FROM paths;
-- [{'column1': [2, {'a': 3}, [{'a': 1}, 2], 3, {'a': 1}, 2, 1]}]

SELECT c..[*] FROM paths;
-- [{'column1': [{'a': 1}, 2, 1]}]

Path Functions

Path editing is accomplished with an extended path syntax, where each path begins with $.

Endb's path editing functions are heavily inspired by SQLite's JSON Functions.

Path editing functions add functionality ($, #) to a subset Endb's normal path navigation syntax: path editing functions do not support recursion or wildcards.

Path editing functions are available to UPDATE ... SET and UPDATE ... UNSET/REMOVE.

`path_set`

Takes an object, a path, and a new value. The new value will overwrite existing fields or add a new field if it doesn't already exist.


SELECT path_set({a: 2, c: 4}, $.c, [97,96]);
-- {'a': 2, 'c': [97, 96]}

`path_replace`

Takes an object, a path, and a new value. The new value is ignored if the path does not match an existing field.


SELECT path_replace({a: 2, c: 4}, $.a, 99);
-- {'a': 99, 'c': 4}

SELECT path_replace({a: 2, c: 4}, $.e, 99);
-- {'a': 2, 'c': 4}

`path_insert`

Takes an object, a path, and a new value. The new value is ignored if the path matches an existing field.


SELECT path_insert({a: 2, c: 4}, $.e, 99);
-- {'a': 2, 'c': 4, 'e': 99}

`path_remove`

Takes an object and a variable number of arguments specifying which paths to remove. If a path is not found, nothing is removed for that argument. # represents the last element in a collection.


SELECT path_remove([0,1,2,3,4], $[#-1], $[0]);
-- [1, 2, 3]

SELECT path_remove({x: 25, y: 42}, $.y);
-- {'x': 25}

`path_extract`

Takes an object and a variable number of path arguments. Returns the value found at each path, if any, otherwise NULL. If only a single path is provided, a scalar is returned. If multiple paths are provided, an array is returned.


SELECT path_extract({a: 2, c: [4, 5, {f: 7}]}, $.c[2].f);
-- 7

SELECT path_extract({a: 2, c: [4, 5], f: 7}, $.x, $.a);
-- [NULL, 2]

Schema

Endb allows introspection of its information schema. The Endb information schema does not describe the structure of each table. Because Endb is a document database, each document (row) is responsible for its own schema. The information schema is used by Endb to describe database objects at a high level and is used for schemaless queries, such as SELECT *.

Note that all information schema tables are hard-coded to lower-case names and must be queried as such.

Tables


-> SELECT * FROM information_schema.tables;
[{'table_catalog': None,
  'table_name': 'stores',
  'table_schema': 'main',
  'table_type': 'BASE TABLE'},
 {...
  'table_name': 'products',
  ... },
 {...
  'table_name': 'sales',
  ... }]

Columns


-> SELECT * FROM information_schema.columns;
[{'column_name': 'addresses',
  'ordinal_position': 0,
  'table_catalog': None,
  'table_name': 'stores',
  'table_schema': 'main'},
 {'column_name': 'brand',
  ... },
 {'column_name': 'price',
  ... },
 ... ]

Views


-> SELECT * FROM information_schema.views;
[{'table_catalog': None,
  'table_name': 'sold_products',
  'table_schema': 'main',
  'view_definition': 'SELECT * FROM products p JOIN sales s ON p.id = s.p_id'}]

Check Constraints

The check_constraints table in Endb is used to store assertions.


-> SELECT * FROM information_schema.check_constraints;
[{'check_clause': "(NOT EXISTS (SELECT * FROM users WHERE TYPEOF(email) != 'text'))",
  'constraint_catalog': None,
  'constraint_name': 'string_email',
  'constraint_schema': 'main'}]

Views

Endb provides basic view functionality.

CREATE VIEW

CREATE VIEW creates a non-materialized view based on the query which follows the AS operator. Column names are listed in parentheses after the view name.


CREATE VIEW simple_products(name, price) AS SELECT name, ROUND(price) FROM products;

Alternatively, named columns can each immediately follow queried columns.


CREATE VIEW easy_products AS SELECT name label, ROUND(price) easy_price FROM products;

NOTE: To modify a view, use DROP VIEW then re-create the view with the desired columns.

DROP VIEW

DROP VIEW deletes a view based on its name.


DROP VIEW easy_products;

Assertions

CREATE ASSERTION

Creates a checked, deferred assertion which executes on commit for inserts and updates. Although CREATE ASSERTION (normally) refers to the table it is asserting on, that table need not exist for the assertion to be created.

Type Constraint


CREATE ASSERTION string_email CHECK (NOT EXISTS (SELECT * FROM users WHERE TYPEOF(email) != 'text'));
INSERT INTO users {name: 'Steven', email: 123};

Unique Constraint

There are a number of possible ways to create the equivalent of a UNIQUE constraint, as found in schema-on-write databases. One sensible approach is given below.


CREATE ASSERTION unique_email CHECK (1 >= (SELECT MAX(c.cnt) FROM (SELECT COUNT(*) AS cnt FROM users GROUP BY email) AS c));
INSERT INTO users {name: 'Steven', email: 's@endatabas.com'};
INSERT INTO users {name: 'Sarah', email: 's@endatabas.com'};

Not Null Constraint

There are multiple possible meanings to "not null" columns in Endb: Columns can be strictly checked for literal NULL in the value position. Rows can be forbidden from eliding a column (a missing value). Both literal NULL and elided columns can both be prevented.

This example checks for literal NULL only:


CREATE ASSERTION notnull_email CHECK (NOT EXISTS (SELECT * FROM users WHERE email IS NULL));
INSERT INTO users {name: 'Tian Tian', email: NULL}; -- check fails
INSERT INTO users {name: 'Tian Tian'}; -- permitted

DROP ASSERTION

Removes an assertion from the database based on its name.


DROP ASSERTION string_email;

Clients

Endatabas provides a variety of tools to communicate with an Endb instance.

The Console is a command-line user interface to send SQL to Endb manually. The client libraries are used in the data access layer of an application which uses Endb for storage.

The client libraries support both HTTP and WebSocket APIs. To access Endb over HTTP, create an Endb instance and call the sql method on it. To access Endb over WebSockets, create an EndbWebSocket instance and call the sql method on it. See the language-specific documentation below for further details.

NOTE: For any programming language not listed above, it is possible to create small client libraries such as these using the HTTP and WebSocket APIs. All official Endatabas clients are permissively-licensed and you are welcome to use them as examples to create your own client library. Their source code is available here: https://github.com/endatabas/endb/tree/main/clients

Console

A small console is provided to learn Endb SQL interactively and run experimental queries. It is distributed in the same Python package as the Python client.

Install


pip install endb

Usage


endb_console [--url URL] [-u USERNAME] [-p PASSWORD]

Show additional command line options with endb_console --help:


usage: endb_console [-h] [--url URL] [-u USERNAME] [-p PASSWORD] [sql ...]

positional arguments:
  sql                   SQL statement or file

options:
  -h, --help            show this help message and exit
  --url URL
  -u USERNAME, --username USERNAME
  -p PASSWORD, --password PASSWORD

Prompt

When the Endb console starts, it will provide a multiline prompt (->) where you can enter SQL statements and queries. The prompt will extend onto new lines (..) until you enter a semicolon.


-> INSERT INTO users
.. {name: 'Conrad', email: 'c1o2n@shell.com'};
-> SELECT * FROM users;

The console has history which you can use the up and down arrows to navigate. History does not persist between console sessions.

Learn more about Endb SQL in the SQL Reference.

To quit, type CTRL+D.

Commands

The console comes with a number of special, non-SQL commands.

URL

Set the URL at runtime with the url command.


-> url https://192.168.1.200:3803/sql

Accept

Set the accept header content type at runtime with the accept command.


-> accept text/csv

Username

Set the username at runtime with the username command.


-> username efcodd

Password

Set the password at runtime with the password command.


-> password equ1valenc3

Timer

All SQL commands can be timed by toggling the timer on: timer on. Toggle the timer back off with timer off.


-> timer on
-> SELECT * FROM users;
[{'name': 'Aaron'}
 {'name': 'Irene'}
 ... ]
Elapsed: 0.003904 ms

Quit

Quit the console by typing quit.


-> quit

Data Types

The console communicates with Endb over the HTTP API. Data is returned as LD-JSON documents and marshalled into strongly-typed Python objects. For example:


[{'date': None, 'email': 'c1o2n@shell.com', 'name': 'Conrad'},
 {'date': datetime.datetime(2024, 1, 29, 18, 18, 30, 129159, tzinfo=datetime.timezone.utc),
  'email': 'kitty@tramline.in',
  'name': 'Akshay'}]

The Data Types page talks about types in more detail.

JavaScript

The JavaScript and TypeScript client library is used to communicate with an Endb instance from a JavaScript or TypeScript application. Type declarations are provided in the package for TypeScript.

NPM Package: @endatabas/endb

Install


npm install @endatabas/endb
npm install ws

Usage Examples

Import


import { Endb, EndbWebSocket } from '@endatabas/endb';

Endb

Use the Endb class to communicate with Endb over HTTP. It accepts an optional url parameter. Options can be supplied for accept, username, and password. Accept headers default to LD-JSON and can be set to any valid content type listed in the HTTP API. If you choose application/vnd.apache.arrow.file or application/vnd.apache.arrow.stream, the raw response body will be be returned from sql().


var e = new Endb();
var e = new Endb('http://localhost:3803/sql');
var e = new Endb('http://localhost:3803/sql', {accept: 'text/csv'});
var e = new Endb('http://localhost:3803/sql', {accept: 'application/json', username: 'zig', password: 'zag'});

NOTE: Choosing accept headers other than LD-JSON will return JavaScript data structures symmetrical with those returned from the respective accept header provided to the HTTP API. text/csv returns comma-delimited strings, application/json returns tuples as arrays, and so on.

EndbWebSocket

Use the EndbWebSocket class to communicate with Endb over WebSockets. It accepts an optional url parameter. Options can be supplied for ws (any implementation of the JavaScript WebSocket interface definition), username, and password. In a web browser, ws will default to the web browser's WebSocket implementation. EndbWebSocket only communicates in LD-JSON, so the accept header cannot be set.


// in web browser:
var ews = new EndbWebSocket();
var ews = new EndbWebSocket('ws://localhost:3803/sql', {username: 'zig', password: 'zag'});

// in node.js:
import WebSocket from 'ws';
var ews = new EndbWebSocket({ws: WebSocket});
var ews = new EndbWebSocket('ws://localhost:3803/sql', {ws: WebSocket, username: 'zig', password: 'zag'});

sql()

The asynchronous sql method is available to both Endb and EndbWebSocket. It accepts q, and optional p, m, and accept parameters and returns an array of strongly-typed documents. (See JavaScript API Reference.)

To ignore the p or m parameters but still supply an accept header, supply the default values or null.


e.sql("SELECT * FROM users;");
e.sql("INSERT INTO USERS (date, name, email) VALUES (?, ?, ?);", [new Date(), 'Aaron', 'aaron@canadahealth.ca']);
e.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true);
e.sql("SELECT * FROM users;", [], false, 'text/csv');
e.sql("SELECT * FROM users;", null, null, 'application/json');
e.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true, 'text/csv');

Template Literals

It is possible to use Template Literals (string templating) to pass named SQL parameters. The parameter passed to the Template Literal is only valid when used in a position where a positional SQL parameter is also valid. The signature which accepts Template Literals does not accept any other parameters to the method.


var n = 'Michael';
e.sql`INSERT INTO users (name) VALUES (${n});`;

var u = {name: 'Radha', roles: ['artist', 'marketing']};
e.sql`INSERT INTO users ${u}`;

Data Types

When an LD-JSON (default) accept header is used, strongly typed data is returned according to this mapping:

null - null
xsd:date - Date
xsd:dateTime - Date
xsd:base64Binary - Uint8Array
xsd:integer - BigInt

For more information on Endb data types, see the Data Types doc.

Complete Examples


import { Endb } from '@endatabas/endb';

var e = new Endb();
await e.sql("INSERT INTO users {name: 'Thupil'};");
var result = await e.sql("SELECT * FROM users;");
console.log(result);

var e2 = new Endb('http://localhost:3803/sql', {accept: 'application/json', username: 'zig', password: 'zag'});
await e.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true, 'text/csv');
result = await e.sql("SELECT * FROM users;", null, null, 'application/json');
console.log(result);


import WebSocket from 'ws';
import { EndbWebSocket } from '@endatabas/endb';

var ews = new EndbWebSocket({ws: WebSocket});
await ews.sql("insert into users {name: 'Lydia'};");
var ws_result = await ews.sql("select * from users;");
console.log(ws_result);

var ews2 = new EndbWebSocket({ws: WebSocket, username: 'zig', password: 'zag'});
await ews2.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true, 'text/csv');
ws_result = await ews2.sql("SELECT * FROM users;", null, null, 'application/json');
console.log(ws_result);

JavaScript API Reference

NOTE: The following API documentation is generated from source code docstrings in the endb repository.

Classes

Endb: Endatabas client for the HTTP API
EndbWebSocket: Endatabas client for the WebSocket API

Endb

Endatabas client for the HTTP API

Kind: global class

Endb
- new Endb([url], [opt])
- .sql(q, [p], [m], [accept]) ⇒ Promise.<Array>

`new Endb([url], [opt])`

Create an Endb object (Endatabas client for the HTTP API)

Param	Type	Default	Description
[url]	`string`	`"http://localhost:3803/sql"`	HTTP URL to the Endatabas /sql API
[opt]	`Object`		HTTP options
[opt.accept]	`string`	`"application/ld+json"`	Accept Header content type
[opt.username]	`string`		username for HTTP Basic Auth
[opt.password]	`string`		password for HTTP Basic Auth

`endb.sql(q, [p], [m], [accept])` ⇒ `Promise.<Array>`

Execute a SQL statement over HTTP

Kind: instance method of Endb
Returns: Promise.<Array> - - Array of documents

Param	Type	Description
q	`string`	SQL query as string or Template Literal
[p]	`Array` \| `Object`	Positional parameters, named parameters, or an array of either
[m]	`boolean`	many parameters flag
[accept]	`string`	Accept Header content type

Example


// Simple query
sql("SELECT * FROM users;");
// Positional parameters
sql("INSERT INTO users (date, name) VALUES (?, ?);", [new Date(), 'Aaron']);
// Named parameters
sql("INSERT INTO users {date: :d, name: :n};", {d: new Date(), n: 'Aaron'});
// Many positional parameters (batches)
sql("INSERT INTO users (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true);
// Many named parameters (batches)
sql("INSERT INTO users {name: :n};", [{n: 'Judy'}, {n: 'Addis'}], true);
// All parameters (many parameters and accept header)
sql("INSERT INTO users (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true, 'text/csv');
// Named parameters via Template Literals
sql(`INSERT INTO users (name) VALUES (${u});`, [{u: 'Michael'}]);

EndbWebSocket

Endatabas client for the WebSocket API

Kind: global class

EndbWebSocket
- new EndbWebSocket([url], [opt])
- .close()
- .sql(q, [p], [m]) ⇒ Promise.<Array>

`new EndbWebSocket([url], [opt])`

Create an EndbWebSocket object (Endatabas client for the WebSocket API)

Param	Type	Default	Description
[url]	`string`	`"ws://localhost:3803/sql"`	WebSocket URL to the Endatabas /sql API
[opt]	`Object`		WebSocket options
[opt.ws]	`string`		WebSocket implementation
[opt.username]	`string`		username for Basic Auth
[opt.password]	`string`		password for Basic Auth

`endbWebSocket.close()`

Close the WebSocket connection

Kind: instance method of EndbWebSocket

`endbWebSocket.sql(q, [p], [m])` ⇒ `Promise.<Array>`

Execute a SQL statement over a WebSocket with LD-JSON

Kind: instance method of EndbWebSocket
Returns: Promise.<Array> - - Array of documents

Param	Type	Description
q	`string`	SQL query as string or Template Literal
[p]	`Array` \| `Object`	Positional parameters, named parameters, or an array of either
[m]	`boolean`	many parameters flag

Example


// Simple query
sql("SELECT * FROM users;");
// Positional parameters
sql("INSERT INTO users (date, name) VALUES (?, ?);", [new Date(), 'Aaron']);
// Named parameters
sql("INSERT INTO users {date: :d, name: :n};", {d: new Date(), n: 'Aaron'});
// Many positional parameters (batches)
sql("INSERT INTO users (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true);
// Many named parameters (batches)
sql("INSERT INTO users {name: :n};", [{n: 'Judy'}, {n: 'Addis'}], true);
// All parameters (many parameters and accept header)
sql("INSERT INTO users (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], true, 'text/csv');
// Named parameters via Template Literals
sql(`INSERT INTO users (name) VALUES (${u});`, [{u: 'Michael'}]);

Python

The Python client library is used to communicate with an Endb instance from a Python application.

PyPI Package: endb

Install


pip install endb
pip install websockets  # optional WebSocket support
pip install pyarrow     # optional Apache Arrow support

Usage Examples

Import


from endb import (Endb, EndbWebSocket)

Endb

Use Endb to communicate with Endb over HTTP. It accepts optional url, accept, username, and password parameters. Accept headers default to LD-JSON and can be set to any valid content type listed in the HTTP API. If you have pyarrow installed, you can also use application/vnd.apache.arrow.file and application/vnd.apache.arrow.stream.


e = Endb()
e = Endb('http://localhost:3803/sql')
e = Endb('http://localhost:3803/sql', 'text/csv')
e = Endb('http://localhost:3803/sql', 'application/json', 'zig', 'zag')

NOTE: Choosing accept headers other than LD-JSON will return Python data structures symmetrical with those returned from the respective accept header provided to the HTTP API. text/csv returns comma-delimited strings, application/json returns tuples as lists, and so on.

EndbWebSocket

Use the EndbWebSocket class to communicate with Endb over WebSockets. It accepts optional url, username, and password parameters.


ews = EndbWebSocket()
ews = EndbWebSocket('ws://localhost:3803/sql', 'zig', 'zag')

sql()

The sql method is available to both Endb and EndbWebSocket. It accepts q, and optional p, m, and accept parameters and returns an list of strongly-typed documents. (See Python API Reference.)

It is sychronous for Endb and asynchronous for EndbWebSocket. To ignore the p or m parameters but still supply an accept header, supply the default values or use a named (accept) parameter.


from datetime import date, datetime, timezone
e.sql("INSERT INTO USERS (date, name, email) VALUES (?, ?, ?);", [datetime.now(timezone.utc), 'Aaron', 'aaron@canadahealth.ca'])
e.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], True)
e.sql("SELECT * FROM users;", [], False, 'text/csv')
e.sql("SELECT * FROM users;", accept = 'text/csv')
e.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], True, 'text/csv')

Data Types

When an LD-JSON (default) accept header is used, strongly typed data is returned according to this mapping:

null - None
xsd:date - datetime.date
xsd:time - datetime.time
xsd:dateTime - datetime.datetime
xsd:base64Binary - bytes (from base64)
xsd:integer - int

For more information on Endb data types, see the Data Types doc.

Complete Examples


from endb import Endb
from datetime import date, datetime, timezone
e = Endb()
e.sql("INSERT INTO users {name: 'Yuvi'}")
e.sql("SELECT * FROM users;")

e2 = Endb('http://localhost:3803/sql', 'application/json', 'zig', 'zag')
e2.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], True, 'text/csv')
e2.sql("SELECT * FROM users;", [], False, 'application/json')

When the websockets dependency is installed, it is possible to return asynchronous results to the Python interactive shell directly if you start it with python3 -m asyncio:


from endb import EndbWebSocket
ews = EndbWebSocket()
await ews.sql("INSERT INTO users {name: 'Lydia'}")
ws_result = await ews.sql("SELECT * FROM users;")
print(ws_result)

ews2 = EndbWebSocket(username = 'zig', password = 'zag')
await ews2.sql("INSERT INTO USERS (name) VALUES (?);", [['Aaron'], ['Kurt'], ['Cindy']], True, 'text/csv')
ws_result = await ews2.sql("SELECT * FROM users;", [], False, 'application/json')
print(ws_result)

Python API Reference

NOTE: The following API documentation is generated from source code docstrings in the endb repository.

endb module

class endb.Endb(url='http://localhost:3803/sql', accept='application/ld+json', username=None, password=None)

Bases: AbstractEndb

An Endatabas client for the HTTP API

url

HTTP URL of an Endatabas instance

Type: str

accept

Accept header content type

Type: str

username

Username for HTTP Basic Auth

Type: str

password

Password for HTTP Basic Auth

Type: str

sql(q, p=[], m=False, accept=None)

Executes a SQL statement

The SQL statement is sent to Endb.url over HTTP.

Parameters:
- q (str) – SQL statement or query to execute
- p (array_like , default= [ ]) – Positional or named SQL parameters (or an array of either, if using many parameters)
- m (bool , default=False) – Many parameters flag
- accept (str , optional) – Accept header content type (defaults to Endb.accept)
Raises: TypeError – Internal error if attempting to translate an unknown type to LD-JSON.

class endb.EndbWebSocket(url='ws://localhost:3803/sql', username=None, password=None)

Bases: AbstractEndb

An Endatabas client for the HTTP API

url

HTTP URL of an Endatabas instance

Type: str

username

Username for HTTP Basic Auth

Type: str

password

Password for HTTP Basic Auth

Type: str

async close()

Closes the WebSocket connection

async sql(q, p=[], m=False, accept=None)

Executes a SQL statement

The SQL statement is sent to Endb.url over WebSockets.

Parameters:
- q (str) – SQL statement or query to execute
- p (array_like , default= [ ]) – Positional or named SQL parameters (or an array of either, if using many parameters)
- m (bool , default=False) – Many parameters flag
- accept (str , optional) – Ignored. WebSocket communication is always in LD-JSON.
Raises:
- AssertionError – If ‘id’ of request and response do not match.
- RuntimeError – If response from server contains an error.

Reference

Reference documentation exists as a detailed perspective on each feature of Endb. It does not need to be read from beginning to end.

Installation

You only need one of the following installation options. If you followed the Quickstart, you already have the Docker Hub version installed.

Warning: Early Beta

Endb is still in early beta.

Installing from Docker Hub

There are two major release channels available: stable and nightly. To run the nightly builds, replace endatabas/endb with endatabas/endb:nightly in the commands below.

If you run Docker, you can use the default command below. --rm cleans up (optional), -p exposes the default endb port, and -v mounts a local volume so your data persists even if you shut down the Docker image.


mkdir -p endb_data
docker pull endatabas/endb
docker run --rm -p 3803:3803 -v endb_data:/app/endb_data endatabas/endb

If you run Podman, you'll need to specify the docker.io repo explicitly:


mkdir -p endb_data
podman pull docker.io/endatabas/endb
podman run --rm -p 3803:3803 -v endb_data:/app/endb_data docker.io/endatabas/endb

Installing from Git: Docker

If you want to run endb from the main branch, compile and build the Docker image:

Installing from Git: Binary

If you don't want Docker at all, you can compile and run the endb binary:

https://github.com/endatabas/endb/#building
./target/endb

NOTE: If you move the endb binary, be sure to copy libendb.so (Linux) or libendb.dylib (MacOS) into the same directory. This is because endb requires libendb to run.

Operation

The endb executable aims to provide self-explanatory help for direct usage of the binary.

See Monitoring for more information about logging, metrics, and tracing.


$ endb --help
Usage: endb [OPTIONS]

Options:
  -d, --data-directory <DATA_DIRECTORY>  [env: ENDB_DATA_DIRECTORY=] [default: endb_data]
  -h, --help                             Print help
  -V, --version                          Print version

Authentication:
      --username <USERNAME>  [env: ENDB_USERNAME=]
      --password <PASSWORD>  [env: ENDB_PASSWORD=]

Network:
  -p, --port <PORT>                  [env: ENDB_PORT=] [default: 3803]
      --bind-address <BIND_ADDRESS>  [env: ENDB_BIND_ADDRESS=] [default: 0.0.0.0]
      --protocol <PROTOCOL>          [env: ENDB_PROTOCOL=] [default: http] [possible values: http, https]
      --cert-file <CERT_FILE>        [env: ENDB_CERT_FILE=]
      --key-file <KEY_FILE>          [env: ENDB_KEY_FILE=]

The -d option accepts a special value of :memory: to run an in-memory node, without persisting anything to disk.

The --cert-file and --key-file options are ignored when --protocol is set to http. When --protocol is set to https, they are both required.

Backup and Restore

If you would like to back up your Endb data, you can use any commodity copy or sync tool (such as rsync) to maintain a copy of your --data-directory (ENDB_DATA_DIRECTORY).

To restore that directory, stop Endb, replace or sync that directory where Endb is running, and restart Endb.

In the future, a backup option will be provided for an object store separated from compute.

Monitoring

Endb exposes plaintext logs, Prometheus metrics, and optional OpenTelemetry tracing. Any one, or a combination, of these can be used to monitor an Endb instance.

Some monitoring options are offered as flags. Set flags by setting the environment variable to 0 or 1 (ex. ENDB_LOG_ANSI=0).

Logging

By default, endb logs to STDOUT with a log level of endb=INFO.

Adjust the log level with the ENDB_LOG_LEVEL environment variable. Endb uses Rust Logging and more details about log levels are available in that document.

Other flags include:

ENDB_LOG_ANSI - turns ANSI terminal colour output on or off (on by default)
ENDB_LOG_THREAD_IDS - set to enable or disable logging of thread ids (enabled by default)

Example:


docker run --rm -e ENDB_LOG_LEVEL=endb=DEBUG -e ENDB_LOG_ANSI=0 -e ENDB_LOG_THREAD_IDS=1 -p 3803:3803 -v demo_data:/app/endb_data docker.io/endatabas/endb:latest

Prometheus

A Prometheus endpoint is available at /metrics. This endpoint is enabled by default. If your Endb instance is running locally, you can view metrics in a browser at http://localhost:3803/metrics

The Prometheus tracing level defaults to endb=DEBUG. Set the tracing level with ENDB_TRACING_LEVEL.

OpenTelemetry

To enable OpenTelemetry, set ENDB_TRACING_OTEL=1 and OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317/ Configuration options for the OTLP exporter and additional OTLP environment variables may also prove useful for configuration.

The OpenTelemetry tracing level defaults to endb=DEBUG. Set the tracing level with ENDB_TRACING_LEVEL.

At the moment, only gRPC metrics are exposed, not HTTP.

Metrics

Endb exposes a number of metrics to the Prometheus endpoint and OpenTelemetry. These can be useful in debugging performance of applications connected to Endb.

Histograms

query_real_time_duration_seconds
query_gc_run_time_duration_seconds
query_consed_bytes
http_request_duration_seconds

Counters

queries_active
interactive_transactions_active
buffer_pool_usage_bytes
dynamic_space_usage_bytes

Monotonic Counters

websocket_message_internal_errors_total
object_store_read_bytes_total
object_store_written_bytes_total
queries_total
transactions_conflicted_total
transactions_committed_total
transactions_prepared_total
transactions_retried_total
wal_read_bytes_total
wal_written_bytes_total
http_requests_total

Tracing

Endb exposes a variety of tracing spans to OpenTelemetry. Tracing data is mostly useful if your Endb instance is not performing in the way you expect.

buffer_pool_eviction
build_info
commit
compaction
constraints
gc
index
log_replay
log_rotation
object_store_delete
object_store_get
object_store_list
object_store_put
query
shutdown
snapshot
startup
wal_append_entry
wal_read_next_entry
wal_fsync
websocket_connections_active
websocket_message_duration_seconds
websocket_messages_total

HTTP API

If a client is not available for your programming language yet, your app can interact directly with the Endb HTTP API. Any HTTP client may be used but in the examples below we'll use curl to demonstrate the API without writing any code.

You can send SQL statements to endb over HTTP:


curl -d "INSERT INTO users (name) VALUES ('Tianyu')" -H "Content-Type: application/sql" -X POST http://localhost:3803/sql
curl -d "SELECT * FROM users" -H "Content-Type: application/sql" -X POST http://localhost:3803/sql

You can send SQL to endb with standard HTTP Query Parameters, Verbs, Content Types, Accept Headers, and HTTP Basic Authentication. Each one is outlined below.

HTTP Query Parameters

The query parameters Endb's HTTP endpoint accepts are:

q - (q)uery: a SQL query, optionally parameterized
p - (p)arameters: named or positional parameters
m - (m)any parameter lists: bulk parameters, used for bulk insert/update

HTTP Verbs

POST allows explicit Content Types and Accept headers:


curl -d 'SELECT 1' -H "Content-Type: application/sql" -H "Accept: text/csv" -X POST http://localhost:3803/sql

GET allows a single, simple URL. GET does not permit DML.


curl -X GET "http://localhost:3803/sql?q=SELECT%201"

Content Types

The HTTP Content-Type header is used to specify what format the client is sending data to Endb.

`application/json`:


curl -d '{"q": "SELECT * from products;"}' -H "Content-Type: application/json" -X POST http://localhost:3803/sql
curl -d '{"q": "SELECT * from products WHERE name = ?;", "p": ["Salt"]}' -H "Content-Type: application/json" -X POST http://localhost:3803/sql
curl -d '{"q": "INSERT INTO products {name: :name};", "p": {"name": "Paprika"}}' -H "Content-Type: application/json" -X POST http://localhost:3803/sql

NOTE: To enable strongly-typed values, payloads sent with the application/json content type have values resolved with JSON-LD scalars. Standard JSON values are a subset of JSON-LD scalars, so data sent as regular JSON is unaffected by this behaviour.

`application/ld+json`

Although values in the application/json content type are resolved using JSON-LD scalars, you can explicitly specify an application/ld+json content type to avoid all ambiguity. See JSON-LD.


curl -d '{"q": "INSERT INTO events {start: :start};", "p": {"start": {"@type": "xsd:dateTime", "@value": "2011-04-09T20:00:00Z"}}}' -H "Content-Type: application/ld+json" -X POST http://localhost:3803/sql

`application/sql`:


curl -d 'SELECT 1' -H "Content-Type: application/sql" -X POST http://localhost:3803/sql

Submit parameters to application/sql by providing form data or query parameters. Form data and query parameters can be combined, though it is not necessarily recommended.


curl -F q="INSERT INTO sauces {name: ?, color: ?};" -X POST http://localhost:3803/sql?p=%5B%22ketchup%22%2C%22purple%22%5D

`multipart/form-data`


curl -F q="SELECT * from products;" -H "Content-Type: multipart/form-data" -X POST http://localhost:3803/sql
curl -F q="INSERT INTO products {name: ?};" -F p='["Sriracha"]' -X POST http://localhost:3803/sql

NOTE: Many HTTP clients (including curl) automatically assume a content type of multipart/form-data when form fields are provided. This is true for curl when the -F (--form) argument is used and it has been elided from further examples.

`application/x-www-form-urlencoded`

Although the other content types are preferable for obvious reasons, application/x-www-form-urlencoded is offered for completeness.


curl -d 'q=SELECT%20*%20FROM%20products;' -H "Content-Type: application/x-www-form-urlencoded" -X POST http://localhost:3803/sql

Accept Headers

The HTTP Accept header is used to specify how data is returned to the Endb client. The default Accept header content type is application/json.

text/csv

text/csv returns comma-separated rows. Column order from the SELECT clause is maintained.


curl -d "SELECT * FROM (VALUES (1,'hello'), (2,'csv')) t1" -H "Content-Type: application/sql" -H "Accept: text/csv" -X POST http://localhost:3803/sql

returns:


"column1","column2"
2,"csv"
1,"hello"

application/json

application/json returns rows as an array of JSON tuples. Column order from the SELECT clause is maintained.


curl -d "SELECT * FROM (VALUES (1,'hello'), (2,DATE('2023-07-22'))) t1" -H "Content-Type: application/sql" -H "Accept: application/json" -X POST http://localhost:3803/sql

returns:


[[2,"2023-07-22"],[1,"hello"]]

application/x-ndjson

application/x-ndjson returns newline-delimited JSON documents. Column order from the SELECT clause is not maintained. JSON documents cannot guarantee column order.


curl -d "SELECT * FROM (VALUES (1,'hello'), (2,DATE('2023-07-22'))) t1" -H "Content-Type: application/sql" -H "Accept: application/x-ndjson" -X POST http://localhost:3803/sql

returns:


{"column1":2,"column2":"2023-07-22"}
{"column1":1,"column2":"hello"}

application/ld+json

application/ld+json returns documents of strongly-typed ("Linking Data") JSON records. Column order from the SELECT clause is not maintained. JSON documents cannot guarantee column order.


curl -d "SELECT * FROM (VALUES (1,'hello'), (2,DATE('2023-07-22'))) t1" -H "Content-Type: application/sql" -H "Accept: application/ld+json" -X POST http://localhost:3803/sql

returns:


{"@context":{"xsd":"http://www.w3.org/2001/XMLSchema#","@vocab":"http://endb.io/"},"@graph":[{"column1":2,"column2":{"@value":"2023-07-22","@type":"xsd:date"}},{"column1":1,"column2":"hello"}]}

See JSON-LD.

multipart/mixed

multipart/mixed returns documents as a valid multipart/mixed response, albeit all as LD-JSON.


curl -d "SELECT * FROM (VALUES (1,'hello'), (2,DATE('2023-07-22'))) t1" -H "Content-Type: application/sql" -H "Accept: multipart/mixed" -X POST http://localhost:3803/sql

returns:


--369d7885-c6e9-4b01-93c9-9449a5fcd637
Content-Type: application/ld+json

{"@context":{"xsd":"http://www.w3.org/2001/XMLSchema#","@vocab":"http://endb.io/"},"@graph":[{"column1":2,"column2":{"@value":"2023-07-22","@type":"xsd:date"}},{"column1":1,"column2":"hello"}]}

--369d7885-c6e9-4b01-93c9-9449a5fcd637--

application/vnd.apache.arrow.file

application/vnd.apache.arrow.file returns columnar data as an Apache Arrow file.


curl -d "SELECT * FROM (VALUES (1,'hello'), (2,DATE('2023-07-22'))) t1" -H "Content-Type: application/sql" -H "Accept: application/vnd.apache.arrow.file" -X POST http://localhost:3803/sql --output hello.arrow

The above command returns a file containing a single RecordBatch in an Apache Arrow file in IPC format. You can examine the file with functions like pyarrow.ipc.open_file, as seen in this gist.

application/vnd.apache.arrow.stream

application/vnd.apache.arrow.stream returns columnar data as an Apache Arrow stream.


curl -d "SELECT * FROM (VALUES (1,'hello'), (2,DATE('2023-07-22'))) t1" -H "Content-Type: application/sql" -H "Accept: application/vnd.apache.arrow.stream" -X POST http://localhost:3803/sql --output streamed.arrows

The above command returns a file containing an Apache Arrow IPC stream. You can examine the file with functions like pyarrow.ipc.open_stream, as seen in this gist.

HTTP Basic Authentication

Endb supports HTTP Basic Authentication as defined by RFC 7235. Pass --username and --password arguments to the endb binary to force basic authentication for HTTP connections. (See Operation for more details, including environment variables which can be passed to Docker images.)


./target/endb --username zig --password zag

Then, from any HTTP client, provide the username and password combination to execute queries.


curl --user zig:zag -d "SELECT 'Hello World';" -H "Content-Type: application/sql" -X POST http://localhost:3803/sql

If the client passes an incorrect username or password, it will receive a 401 Authorization Required HTTP status code as a result, but no body. Be aware of this to ensure client code is written to detect 401 status codes.


$ curl -i --user zig:wrong -d "SELECT 'Hello World';" -H "Content-Type: application/sql" -X POST http://localhost:3803/sql
HTTP/1.1 401 Authorization Required

NOTE: HTTP Basic Auth is not secure without the additional use of HTTPS.

Parameters

SQL parameters are available to:

application/json and application/ld+json as part of the POST body
multipart/form-data as form data
application/x-www-form-urlencoded as URL query parameters
application/sql as form data and/or URL query parameters

Parameters can be JSON literals, JSON-LD scalars, or SQL literals. A JSON-LD scalar always has the form: {"@type": "xsd:TYPE", "@value": "DATA"}. JSON-LD types are listed under the Data Types table.

Named Parameters

Named parameters substitute parameter placeholders with the form :param by the parameter key with the corresponding name. Named parameters are represented as a JSON object.


curl -d '{"q": "INSERT INTO products {name: :name, price: :price};", "p": {"name": "Paprika", "price": 2.99}}' -H "Content-Type: application/json" -X POST http://localhost:3803/sql
curl -d '{"q": "INSERT INTO events {start: :start};", "p": {"start": {"@type": "xsd:dateTime", "@value": "2011-04-09T20:00:00Z"}}}' -H "Content-Type: application/ld+json" -X POST http://localhost:3803/sql
curl -F q="INSERT INTO products {name: :sauce};" -F p='{"sauce": "Sriracha"}' -X POST http://localhost:3803/sql

Positional Parameters

Positional parameters substitute parameter placeholders with the form ? by the parameter values, in the order they appear. Positional parameters are respresented as a JSON array.


curl -d '{"q": "SELECT * FROM products WHERE name = ? AND price > ?;", "p": ["Salt", 3.99]}' -H "Content-Type: application/json" -X POST http://localhost:3803/sql
curl -d '{"q": "INSERT INTO events {start: ?};", "p": [{"@type": "xsd:dateTime", "@value": "2011-04-09T20:00:00Z"}]}' -H "Content-Type: application/ld+json" -X POST http://localhost:3803/sql
curl -F q="INSERT INTO products {name: ?};" -F p='["Sriracha"]' -X POST http://localhost:3803/sql

Bulk Parameters

Bulk operations are possible by setting the m flag to true. Bulk operations are available to both named and positional parameters. The list of parameters supplied in bulk must be nested in an array.


curl -d '{"q": "INSERT INTO products {name: :name};", "p": [{"name": "Soda"}, {"name": "Tonic"}], "m": true}' -H "Content-Type: application/json" -X POST http://localhost:3803/sql
curl -F q="INSERT INTO sauces {name: ?, color: ?};" -F p='[["Mustard", "Yellow"], ["Ketchup", "Red"]]' -F m=true -X POST http://localhost:3803/sql

Apache Arrow File Parameters

As it is possible to receive Apache Arrow data from an Endb query, it is possible to submit Apache Arrow as a statement parameter. The example below assumes the existence of a a table called names, which only contains one column (name). Apache Arrow Streams can also be used as parameters in the same way.


# create a sample Arrow file:
curl -d "SELECT * FROM names;" -H "Content-Type: application/sql" -H "Accept: application/vnd.apache.arrow.file" -X POST http://localhost:3803/sql --output names.arrow
# use the sample Arrow file:
curl -F m=true -F q="INSERT INTO projects {name: :name};" -F "p=@names.arrow;type=application/vnd.apache.arrow.file" -X POST http://localhost:3803/sql

NOTE: This feature should be used with caution. Do not submit arbitrary Arrow files as parameters. If a malformed Arrow file is submitted, the error message returned (if any) is unlikely to provide guidance. Preferably, Arrow files should be created using Endb itself, as in the example above. Most users will prefer to use a human-readable file format instead, such as a JSON variant or static SQL statements.

Bulk Insert

Bulk inserts are possible by combining the tools mentioned under Parameters.

For example, the OBJECTS keyword can insert an array of object literals. Note that each object used as a positional parameter must be wrapped in a JSON array, since there may be more than one positional parameter supplied. Similarly, each named parameter must be wrapped in an object containing a key of the corresponding name.


curl -F m=true -F q="INSERT INTO products OBJECTS ?" -F p="[[{name: 'jam'}], [{name: 'butter'}]]" -X POST http://localhost:3803/sql
curl -F m=true -F q="INSERT INTO products OBJECTS :product" -F p="[{product: {name: 'jelly'}}, {product: {name: 'ghee'}}]" -X POST http://localhost:3803/sql

Multiple Statements

It is possible to pass multiple SQL statements to Endb by delimiting them with semicolons. Parameters will be passed to all statements in order.

Only the result of the last statement is returned to the client. In the following example, the INSERT will be successful but will not return a result. The SELECT will return to the client.


curl --form-string q="INSERT INTO sauces {name: ?, color: ?}; SELECT {namo: ?, colour: ?};" -F p='["Mustard", "Yellow", "Ketchup", "Red"]' -X POST http://localhost:3803/sql

NOTE: --form-string is required instead of --form to send semicolon-delimited statements with curl.

WebSocket API

If a client is not available for your programming language yet, your app can interact directly with the Endb WebSocket API. Any WebSocket client can be used but in the examples below we'll use websocat to demonstrate the API without writing any code. Connect to Endb with:


websocat ws://localhost:3803/sql

You can send SQL statements to endb over WebSockets with JSON-RPC 2.0.


{"jsonrpc": "2.0", "id": 1, "method": "sql", "params": {"q": "INSERT INTO users (name) VALUES (?);", "p": [["Tsehay"], ["Iniku"]], "m": true}}
{"jsonrpc": "2.0", "id": 2, "method": "sql", "params": {"q": "SELECT * FROM users;", "p": [], "m": false}}

NOTE: To send the method calls above, paste them into a websocat session one-at-a-time and press <enter>. Further examples will assume JSON-RPC strings are sent in an existing websocat session.

JSON-RPC Request Object

Following the JSON-RPC 2.0 spec, a SQL statement or query to the Endb WebSocket API requires these components:

id - a string or integer identifier used to correlate requests to responses
method - always the string "sql"
params - request params are sent as JSON-RPC by-name request parameters, as listed below

JSON-RPC Request Parameters

q - (q)uery: a SQL query, optionally parameterized
p - (p)arameters: named or positional SQL parameters
m - (m)any parameter lists: bulk parameters, used for bulk insert/update

JSON-RPC Response Object

A response from the Endb WebSocket API will include these components:

id - the id provided in the request object, so it can be correlated
result - the response, encoded as JSON-LD, as in this example:


{"jsonrpc":"2.0", "id":111, "result":{"@context":{"xsd":"http://www.w3.org/2001/XMLSchema#","@vocab":"http://endb.io/"},"@graph":[{"name":"Hing","price":2.99}]}}

If your query is invalid or causes an internal error, you will receive an error instead, which includes:

id - the id of the request object, if one was provided (otherwise null)
error - a code and message, as returned by Endb


{"jsonrpc":"2.0", "id":222, "error":{"code":-32603,"message":"Error: Unknown table: users\n   ╭─[<unknown>:1:15]\n   │\n 1 │ SELECT * FROM users;\n   │               ──┬──  \n   │                 ╰──── Unknown table\n───╯\n"}}

WebSocket Basic Authentication

Endb supports HTTP Basic Authentication, as defined by RFC 7235, on top of RFC 6455. This is because the WebSocket Protocol RFC (6455) does not define authentication mechanisms itself.

Pass --username and --password arguments to the endb binary to force basic authentication for HTTP connections. (See Operation for more details, including environment variables which can be passed to Docker images.)


./target/endb --username zig --password zag

Then, use Basic Auth headers to connect to Endb:


websocat -H='Authorization: Basic' ws://zig:zag@localhost:3803/sql

Rather than Authorization: Basic, the Sec-WebSocket-Protocol header may be used. It is offered because web browsers do not support the Basic Auth header over WebSockets.

Parameters

SQL parameters to a WebSocket request are always JSON-LD. A JSON-LD scalar always has the form: {"@type": "xsd:TYPE", "@value": "DATA"}. JSON-LD types are listed under the Data Types table.

For example, if a strongly-typed DateTime is required, it can be submitted like so:


{"jsonrpc": "2.0", "id": 11, "method": "sql", "params": {"q": "INSERT INTO users (name, date) VALUES (?, ?);", "p": ["dosha", {"@value": "2024-01-29T18:18:30.129159", "@type": "xsd:dateTime"}], "m": false}}

Named Parameters

Named SQL parameters substitute parameter placeholders with the form :param by the parameter key with the corresponding name. Named parameters are represented as a JSON object.


{"jsonrpc": "2.0", "id": 111, "method": "sql", "params": {"q": "INSERT INTO products {name: :name, price: :price};", "p": {"name": "Hing", "price": 2.99}, "m": false}}
{"jsonrpc": "2.0", "id": 112, "method": "sql", "params": {"q": "INSERT INTO events {start: :start};", "p": {"start": {"@type": "xsd:dateTime", "@value": "2021-04-09T20:00:00Z"}}, "m": false}}

Positional Parameters

Positional SQL parameters substitute parameter placeholders with the form ? by the parameter values, in the order they appear. Positional parameters are respresented as a JSON array.


{"jsonrpc": "2.0", "id": 211, "method": "sql", "params": {"q": "SELECT * FROM products WHERE name = ? AND price > ?;", "p": ["Hing", 2.00], "m": false}}
{"jsonrpc": "2.0", "id": 213, "method": "sql", "params": {"q": "INSERT INTO events {start: ?};", "p": [{"@type": "xsd:dateTime", "@value": "2021-04-09T20:00:00Z"}], "m": false}}

Bulk Parameters


{"jsonrpc": "2.0", "id": 311, "method": "sql", "params": {"q": "INSERT INTO users {name: :name};", "p": [{"name": "Sungwon"}, {"name": "Olga"}], "m": true}}
{"jsonrpc": "2.0", "id": 312, "method": "sql", "params": {"q": "INSERT INTO sauces {name: ?, color: ?};", "p": [["Sriracha", "#FE6F5E"], ["Gochujang", "#FF8066"]], "m": true}}

Transactions

Unlike the HTTP API, the WebSocket API is stateful and thus capable of explicit transactions. See the Transactions documentation for further details on SQL syntax.


{"jsonrpc": "2.0", "id": 411, "method": "sql", "params": {"q": "BEGIN TRANSACTION;", "p": [], "m": false}}
{"jsonrpc": "2.0", "id": 412, "method": "sql", "params": {"q": "INSERT INTO delete_me {name: 'Roll Me Back'};", "p": [], "m": false}}
{"jsonrpc": "2.0", "id": 413, "method": "sql", "params": {"q": "ROLLBACK;", "p": [], "m": false}}
{"jsonrpc": "2.0", "id": 414, "method": "sql", "params": {"q": "SELECT * FROM delete_me;", "p": [], "m": false}}

Data Types

Scalars

Endb supports the following native scalar types for now.

JSON-LD values with a type of @json will be returned from Endb as JSON literals, verbatim. The @type listed in parentheses is for reference purposes only. JSON literal types are implicit so a @value/@type pair is not returned for a JSON literal.

SQL	JSON-LD	Example Literal	Description
`NULL`	`@json`	`null`	Null, nil, undefined, or missing
`TEXT`	`@json` (`xsd:string`)	`"hello"`	UTF-8 string
`BOOLEAN`	`@json` (`xsd:boolean`)	`FALSE`	8-bit boolean
`INTEGER`	`@json` (`xsd:int[eger]`)	`42`	64/128-bit two's complement integer
`REAL`	`@json` (`xsd:double`)	`9007199254740992.123`	64-bit IEEE 754 floating point number
`TIMESTAMP`	`xsd:dateTime`	`2007-01-01T00:00:00`	ISO microsecond precision timestamp
`DATE`	`xsd:date`	`2007-01-01`	ISO date
`TIME`	`xsd:time`	`23:30:00`	ISO time
`INTERVAL`	`xsd:duration`	`PT12H34M56.789S`	ISO interval
`BLOB`	`xsd:base64Binary`	`x'DEADBEEF'`	Binary large object

SQL Data Types are covered in detail in the SQL Reference.

Collections

SQL	JSON-LD	Example Literal	Description
`ARRAY`	`@json`	`["Joe", "Dan", "Dwayne"]`	Zero-based array
`OBJECT`	`@json`	`{n: 3, b: 2023-01-01}`	Object, map, dict, document

Unsupported Scalar Types

SQL	JSON-LD	Description
`DECIMAL`	`xsd:decimal`	Arbitrary precision decimal. Use 2 `BIGINT`s or `VARCHAR`
`URI`	`xsd:anyURI`	Unsupported. Use `VARCHAR`
`UUID`	`@json` (`xsd:string`)	Unsupported. Use `VARCHAR` or `BLOB`

If you strongly feel you need a native representation of one of these types, email us: hello@endatabas.com

Appendix

What is Endatabas?

From the outside, Endatabas (or Endb) is an open source SQL document database with full history.

On the inside, this means all records in Endatabas are immutable. An ERASE statement is provided for compliance with laws and policies like GDPR. Time-travel is possible, but queries default to "as-of-now". (Which is the thing you want 97% of the time.) Rows are "schema-last" documents and joins can be arbitrary, but queries are written in SQL. Endb stores data in Apache Arrow: scalars are strongly typed, the on-disk format is columnar, and the execution engine understands rows and columns. Endb separates storage from compute to provide unlimited disk space.

In Swedish, Endatabas means both "a database" and "one database". This One Database Dream is twofold: Endb hopes to provide HTAP, so a secondary analytics database is not required for most users. Endb plans to use AI (adaptive indexing) to provide fast OLTP and OLAP queries on cheap, elastic infrastructure.

Who wants One Database?

After years of market research, the complaint of database users is universal: "I want to stop babysitting the database." This can mean many things but they're all expensive and complex.

The database equivalent of Greenspun's 10th Rule might be "any sufficiently complicated backend system contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half a database." This was true for much of the work we did in the 2010s and 2020s. "Babysitting" is sometimes the energy wasted by repeatedly building and maintaining ad-hoc databases for ourselves instead of building tools for customers.

Buying data products also requires babysitting. DBAs babysit Oracle indexes. Developers babysit Postgres query optimizations. Data engineers babysit ETL pipelines. Analysts baybsit Redshift history. SREs babysit Aurora costs.

Endb can't solve all these problems, but it attempts to be a jack-of-all-trades database that solves as many as it can — for as many people as it can.

When is One Database possible?

After years of Computer Science research, it's also clear a sea change in database tech is due... right about now. (Give or take ten years. Our timing may be off.)

Hellerstein and Stonebraker's What Goes Around Comes Around remains true, decade after decade, since it was published in 2004. As always, the relational data model is still king and SQL is still the lingua franca of databases. Together, they assimilate new rivals every decade or so. Endatabas tries to stand right at the center of this upcoming collapse of the data toolchain.

If we, as an industry, can drop a decade's vestigial growth in favour of a tighter, simpler solution? Wonderful. But what if we could shed a half-century's vestiges? Perhaps event streams, relations, documents, graphs, temporal data, ETL and CDC can all live under one roof for many businesses.

Let's see. We're not sure if we can make this work. But it's exciting to try.

—

One Database: Clean. Simple. Less.

(Read more in Why Endatabas?)

Why Endatabas?

(Read more in What is Endatabas?)

Why did we build Endatabas (aka Endb) at all? Isn't one of the many (many) existing databases good enough?

Many incumbent databases serve their use cases and markets well. But the demands placed on databases are growing rapidly. These demands pull in multiple directions, all at once, and existing technology cannot support them without introducing enormous complexity. Metamorphosis is required.

Endb takes good ideas and makes them easier to access, while reducing operational headache. It does not try to be flashy or unnecessarily revolutionary. Instead, it tries to be simple and familiar on the surface while providing a lot of new power under the covers.

Let's talk about what that means in clear, concrete terms.

What is Endatabas, anyway?

The tagline "SQL Document Database With Full History" says a lot, but it doesn't say everything. Endatabas is, first and foremost, an immutable database. That's the Full History part. But storing all your data, forever, has clear implications.

We consider these implications to be the pillars of Endatabas. In 3D geometry, the legs of a tripod are mutually supportive; as long as all three feet are in contact with the ground, the tripod will not wobble or collapse. So it is with the pillars. Each supports and implies the others. The pillars are as follows:

Full History (requires: immutable data and erasure)
Timeline (requires: time-traveling queries)
Separation of Storage from Compute (requires: light and adaptive indexing)
Documents (requires: schemaless tables, "schema-per-row", arbitrary joins)
Analytics (requires: columnar storage and access)

At the top of this five-dimensional structure is SQL, the lingua franca of database queries. We believe it is the right time to integrate these ideas. But first let's go back a few decades to see how we got here.

History

None of the ideas in Endatabas are new.

George Copeland's What if mass storage were free? asked, back in 1980, what an immutable database might look like. His prescient vision for a database with full history enjoys the clarity of a researcher at the beginning of the database era. People have occasionally asked of Endatabas, "why bother retaining all history?" But this is the wrong question. The real question is: "why bother destroying data?" Copeland's answers, "The deletion concept was invented to reuse expensive computer storage." The software industry has grown so accustomed to the arbitrary deletion of historical data that we now take destroying data for granted.

Mass storage is not free yet — but it is cheap. Copeland himself addresses "a more realistic argument: if the cost of mass storage were low enough, then deletion would become undesirable." Any system that exploits the separation of storage and compute can enjoy these low costs.

An immutable dataset and a timeline of changing states are two sides of the same coin. Previous states carry the innate property of time (whether defined by familiar wall clocks or versions or logical clocks). Jensen and Snodgrass have thoroughly researched time-related database queries. Much of their work was published in the 1990s and early 2000s. Storing time, querying across time, time as a value ... these challenging subjects eventually grew to form SQL:2011. Most SQL databases have struggled to implement SQL:2011. Incorporating time as a core concept in mutable databases (those which support destructive updates and deletes) amplifies existing complexity. Time should simplify the database, not complicate it.

Document databases have a more convoluted story. Attempts at "schemaless", semi-structured, document, and object databases stretch from Smalltalk in the 1980s to C++ in the 1990s to Java and graphs in the 2000s to JSON in the 2010s. Despite all this, the most successful semi-structured document store, as of 2023, is a Postgres database with JSON columns. Database users desire flexible storage and querying — but yesterday's weather says they desire SQL more. Can't we have both?

SQL has four identities, four histories. There is an SQL of academia, born of Codd's relational algebra (1970) and Chamberlin/Boyce SEQUEL (1974), grown over decades with research like Snodgrass/Jensen's TSQL2. Then there is the SQL of industry, the many-tentacled leviathan of IBM, Oracle, and Microsoft: the SQL sold to businesses and governments, ceaselessly bifurcated into new dialects with each version and implementation. Between these two rests the SQL of the ISO specification — unified across 11 published standards, from SQL-86 to SQL:2023, spanning thousands of pages, adhered to by no single database.¹ Last, there is colloquial SQL, the language one refers to by the question, "do you know SQL?" These four threads are intertwined across four decades, making it very difficult to clearly define what is meant by "SQL", even in very narrow contexts. Colloquial SQL is of greatest interest to us. This is the ubiquitous query language any new database must implement to succeed.

Khoshafian and Copeland introduced the Decomposition Storage Model (DSM) in 1985. The four decades that followed saw any number of approaches to data analytics. Most of the time, however, these demanded expensive data acrobatics: data was piped, streamed, dumped, and copied into denormalized cubes and time-series databases. As humanity grew out of the batch processing of the 1980s into the always-online society of the 2020s, analytics data became another form of operational data and this pipeline was looped back to users and customers. Hybrid Transactional/Analytical Processing (HTAP) promises a simpler, natural successor to OLTP and OLAP systems. For many businesses, the transactional/analytical divide is as arbitrary as destroying data with every state change.

Humanity will arbitrarily destroy data in 2026 because hard disks were expensive in 1986. Similarly, we will wastefully query data with multiple systems in 2026 because CPUs were slow in 1986.

Why SQL?

When you approach Endb from a distance, you won't see the pillars at first. You'll see the structure they support. This structure also binds the pillars together. The query language of a database is its glue. It is the user interface. It defines the user experience. It is at once a programming language, a dialogue, an envelope, a protocol. But a query engine is not divorced from its query language, and so the language choice also informs system design and behaviour.

There are mundane reasons for choosing SQL. If asked "do you know SQL?" there's a high probability the answer is "yes." SQL is the language of data science and third-party tools. If any sufficiently successful database has to provide SQL access anyway, one may as well start there.

But this is not why we chose SQL. We believe SQL can be beautiful.

As mentioned in History, we are primarily concerned with colloquial SQL. The other threads of SQL are relevant but Endb will never implement the entire SQL specification. Instead, the Endb SQL dialect unifies the pillars under one strongly-dynamically-typed umbrella, a variation of the SQLite dialect. SQLite's applications are quite different from those of Endatabas, so Endb SQL is not weakly-typed and Endb types are not a free-for-all. Instead, Endb internally adopts types from Apache Arrow, which it uses as a storage medium.

When considering alternatives, there are no direct contenders. Cypher, Datalog, MongoDB query documents, and other schemaless query languages work well for one database implementation but lack both mindshare and standards. PartiQL, SQL++, and other NewSQL languages that depart from SQL suffer precisely because they are almost SQL. One can fantasize about designing a query language from scratch but it's very easy to get wrong and takes decades to build mindshare. SQL has been through this decades-long gauntlet.

Just as PL/SQL and T-SQL differ, so will Endb SQL from other dialects. However, colloquial SQL is comparable to colloquial Hindi — at higher levels, it bifurcates into Urdu and Sanskrit but speakers of both lineages understand one another. Endb SQL will be familiar to users of other SQL dialects.

With its long, rich history SQL not only has the necessary theoretical underpinnings but the battle scars of technology that lasts. It sits alongside POSIX, TCP/IP, LISP, C, and the QWERTY keyboard layout. It will see its centenary.

Why Full History?

Even if we ignore Copeland's dream of mass storage from 1980, it is easy to see why destroying data is harmful. To destroy data is to destroy facts — to lie about the truth of what happened.

Few modern systems permit the total destruction of data for this obvious reason. Some choose to create audit tables: users_audits, sales_audits, and so on. Some choose to log anything and everything. "It's on disk somewhere." It's not uncommon for companies to extract metrics from logs to create invoices and reports, turning a log file into a bespoke immutable database.

Industries which take their data very seriously (banking, healthcare) already store immutable records. They just do so in a mutable database. Append-only tables are not new, but they're an order of magnitude easier to work with — for both users and operators — if the database is append-only from the ground up.

These same industries will resist the destruction of data unless absolutely necessary, but they will bend to necessity. Erasure is concomitant with immutability — we cannot have one without the other. The existing designs of databases create serious problems for privacy. DELETE, when overloaded to mean both "save disk space" and "explicitly remove this data", becomes opaque. It does not leave any queryable record of the deletion. Removing data should keep tombstones so it's at least known that some data was removed.

Why a timeline?

Keeping your data's entire history is the write-side of the equation. If you didn't care about getting this data back, you could just dump it into an unintelligible pile. But you not only want your data back, you want to query it in the easiest way possible.

One very sensible way to see and query immutable data is along a timeline. Endb makes no assumptions about your desire to participate in this timeline. By default, everything is visible as-of-now but querying the past should feel effortless.


-- without time travel:
SELECT * FROM products;
-- time travel to 2020:
SELECT * FROM products FOR SYSTEM_TIME AS OF 2020-08-25T00:00:00;

Why Separation of Storage and Compute?

Separating storage from compute is an implementation detail. AWS customers don't choose Amazon Aurora because they're craving this separation. Decoupling storage from compute makes scale (both up and down) trivial. It also introduces the possibility of "reducing network traffic, ... fast crash recovery, failovers to replicas without loss of data, and fault-tolerant, self-healing storage."

This decoupling is concomitant with Light and Adaptive Indexing. It is undesirable to manually construct expensive indexes for unknown future schemas over (effectively) infinite data. Instead, we should let machine learning handle this job.

Why documents?

It can be argued that "why documents?" is really multiple questions: why schemaless? why nested data? why dynamic SQL?

First, the challenges. It is extremely difficult to force global schema onto every row in a table in an immutable world. Even if there were a simple mechanism in SQL to alter table schema only for certain durations (there isn't), the database user would still be burdened with querying based on a particular schema at a particular time in the history of the table. This complexity is compounded by the fact that static schemas have less and less meaning in a temporal world. Endb introduces SQL:2011 time-travel and period predicates. The difficulty, mentioned earlier, that other databases encounter when introducing SQL:2011 is twofold: dealing with "time" in a world where history can be violently rewritten and managing an unbending schema across time.

Nested data is equally unnatural in incumbent databases. SQL:99, SQL:2016, SQL:2023 all offer some way of shoehorning nested data into flat tables. DynamoDB, Couchbase, and MongoDB are the most common document databases but they don't speak SQL.² Not only will no one ever decompose relational data into 6NF, it is unlikely we'll ever see a return to the classic BCNF of business entity-relationship diagrams. Nested data is here to stay. Foreign, embedded JSON (or XML) is little more than a band-aid. Nested data should be native to SQL queries.

Second, the joys. Schema-per-row can be incredibly liberating. Not only does this feel more natural, it embraces the messy truth of the real world. Schema-on-write can be added later, when the business is ready to lock down what it knows about a domain. But many use cases demand flexible schemas. What if this week's project requires downloading huge amounts of semi-structured JSON pricing data from the APIs of competing online stores to compare them? Endb can handle this case out of the box. Most databases would require a great deal of manipulation first.

Dynamic SQL is required to support schemaless, nested data — but it also brings its own joys. When dynamic SQL rests on top of a flexible data model which can ingest any data, it is capable of exploring nested data and easily constructing arbitrary joins users could normally only construct in a graph database.

Why "One Database"?

It is often the job of analytics databases to record and query all the data of a business, denormalized for speed. There will always be analytical jobs which require data to be transformed. But many businesses already allow data scientists, analysts, and even CEOs read-only access to an OLTP replica.

HTAP is right on the horizon. Endb hopes to participate in that evolution by making all business data accessible to easy-to-produce, cheap, ephemeral read replicas.

Why Commercial Open Source?

We cannot reconcile building and selling a product we would never buy ourselves.

Why Now?

All of the pillars outlined above have their own strike-when-the-iron-is-hot moment. The SQLite we know today began to materialize in 2001. Amazon S3 was launched in 2006. MongoDB was first released in 2007. Immutable data and functional programming reached the mainstream in the 2010s. Datomic (arguably the first immutable OLTP database) was release in 2012. Amazon Aurora was released in 2015 and Google's AlloyDB in 2022. Apache Arrow saw its first release in 2016. Over the past decade, many financial firms built their own in-house temporal databases. SQL:2011, SQL:2016, and SQL:2023 were ratified in their respective eponymous years. HTAP hasn't quite happened yet. AI-driven indexes haven't quite happened yet.

The moment for something like Endatabas is now... but it is a very long moment. Endatabas cannot be built in a Postgres-compatible fashion. Technically, it would be prohibitively expensive to build Endatabas from existing components. It's time for something new.

The complete SQL specification is very difficult to implement in full, though Mimer likely comes closest to this goal.

There is XQuery, of course. But most businesses today do not build their nested data storage on XQuery.

Architecture

Immutable

All records in Endatabas are immutable. Historical data is not lost when an UPDATE or DELETE statement is executed. You can think of endb as one giant log of transactions with fast queries made possible by adaptive indexing.

Erasure

The only time data in Endatabas is truly deleted is with the ERASE statement. This is used for user safety and compliance with laws like GDPR.

Dynamic SQL

Both the heart and "UI" of Endatabas is a dynamic SQL engine which natively understands strongly-typed, semi-structured rows (documents). Here is an example to illustrate what that means:


INSERT INTO stores {brand: "Alonzo's Analog Synthesizers",
                    addresses: [{city: "New Jersey",
                                 country: "United States",
                                 opened: 1929-09-01},
                                {city: "Göttingen",
                                 country: "Germany",
                                 opened: 1928-09-01}]};

-- recursive query of ISO dates from the nested document:
SELECT addresses..opened FROM stores;

Endb SQL bases its core SQL dialect on SQLite. It also draws inspiration from SQL:99, SQL:2011, SQL:2016, and SQL:2023. Read more in our bibliography.

Columnar: Hybrid Transactional Analytic Processing (HTAP)

Endatabas stores and processes data as columns. Endb's columnar storage is built on Apache Arrow. Endb SQL allows users to retrieve data as documents. The ultimate goal is for Endatabas to serve many (hybrid) purposes: day-to-day transactions and analytical jobs.

Query Execution

Queries are compiled to Common Lisp and use hash joins. Join ordering is done heuristically, by base table size. Correlated queries are memoized by building indexes in-memory during query execution.

Future

More detailed information about the future of Endb can be found in the roadmap.

Cloud: Separation of Storage from Compute

If you're not sure what this is, think Amazon Aurora, Google AlloyDB, and Neon. Compute nodes are the classic database (Postgres, MongoDB, etc.) — in an immutable world, these are just caches. Storage is elastic object or blob storage (S3, Azure Blobs, etc.).

Endatabas is built to live in the clouds, alongside infinite disk.

Although the groundwork for separating storage from compute exists in Endb today, elastic storage backends are not yet implemented.

Adaptive Indexing

For more information on light and adaptive indexing, you can watch Håkan's talk from 2022: "Light and Adaptive Indexing for Immutable Databases"

Endb does not yet support adaptive indexing.

Roadmap

You can read our high-level roadmap on GitHub.

License

All Endatabas code is Copyright Håkan Råberg and Steven Deobald and licensed GNU Affero General Public License v3.0, unless otherwise specified.

To contribute to Endatabas, please refer to CONTRIBUTING.

The Endatabas Book