ncruces 2 days ago

Taking full advantage of SQLite's extensibility is a center piece of my Go driver.

It started due to having to reimplement the OS layer in Go because of tech constraints, but it means you _can_ implement VFSes, UDFs (scalar, aggregates and windows), and virtual tables in Go, with reasonable performance and nice APIs.

I also made a point of dogfooding this as much as possible, with a bunch of extensions and a few custom VFSes that use the same APIs available to clients of the library.

https://github.com/ncruces/go-sqlite3/tree/main/ext

https://github.com/ncruces/go-sqlite3/tree/main/vfs#custom-v...

  • oefrha 2 days ago

    Interesting, I need to take a look at your vfs implementations. Some of my projects could benefit from a transparent zstd vfs layer, but compiling and loading one of the existing C options in a Go project is yucky even if possible.

    • ncruces a day ago

      I've never done it, but for the readonly case, I think all you need to is to combine the existing reader VFS with zstd-seekable-format-go.

      https://github.com/ncruces/go-sqlite3/discussions/117#discus...

      For read/write, I'm honestly not sure. The Zipvfs is an… erm… architectural mess that only really works because it accesses private SQLite APIs. Which is fine, but history has shown the SQLite team is willing to break those APIs, as they did for the ones they build the SQLite encryption extension on.

      https://sqlite.org/zipvfs/doc/trunk/www/howitworks.wiki

      The zstandard alternative is sqlite_zstd_vfs, which faces the same architectural issues. So, I'd rather not go there. But should be doable, as long as you're not needing private APIs.

      https://github.com/mlin/sqlite_zstd_vfs

      • hinkley 39 minutes ago

        The problem with those API breaks is rarely the fact that they got broken, it's the fire drill that happens after they do.

        If you get lucky, the 2 necessary upgrades happen at a time that fits well into your schedule. If you don't get lucky, then the SQLite upgrade you need contains a CERT advisory for a zero day attack, and not only does that not fit into your schedule but it also doesn't fit into the schedule of the person who did the customization.

        These are rare events but over the course of a project, that low priority taken to the exponent of the number of vendors you decide to play that game with, approaches or exceeds a probability of 1.00 (>1 meaning 'happened to us twice')

      • oefrha a day ago

        Thanks for the pointers!

  • newusertoday 19 hours ago

    i wish there was something like this for wasm. There are wasm builds available but its not easy to extend them with your own functions.

bob1029 a day ago

You can hypothetically write your entire product in SQL with appropriate bindings to SQLite. For me, SQL with CTEs is a very compelling way to model tricky business logic. Building a SQL-based rules engine is trivial if your host language has a good SQLite library.

This thing starts to grow legs once you realize you can recursively get into the rabbit hole by binding something like an Execute_Sql UDF - You can store the actual scripts within the same schema they operate on. Treating your code as data means you can do things like transactional updates of business logic while the system is serving live requests. You also get simple reflection & search over the business logic.

tobyhinloopen 2 days ago

Ive nothing to say but “that’s cool”. I want to try this in NodeJS!

I suppose using functions defined by the host with SQLite is cheaper than using similarly defined functions in databases that are separated by a network. I wonder what the overhead is.

Also, what’s with the weird UUIDs? Why not use UUIDv7 if you want time-ordered UUIDs?

  • julik a day ago

    > Why not use UUIDv7 if you want time-ordered UUIDs?

    It is our flavour of NIH, that said - Tou has a finer-resolution timestamp. We also didn't do our homework right and assumed the v7 UUIDs won't be accepted by Postgres because of a different "version" value.

  • masklinn a day ago

    > I suppose using functions defined by the host with SQLite is cheaper than using similarly defined functions in databases that are separated by a network.

    “Infeasible” is very fast. sqlite runs in process so you can register a function pointer or five, with a trampoline back into the runtime.

    Can’t do that over the network, you can create functions but only using the database’s procedural langage(s in the case of Postgres).

  • dragonwriter 2 days ago

    > Also, what’s with the weird UUIDs? Why not use UUIDv7 if you want time-ordered UUIDs?

    It looks like the rationale for the Tou library is that some systems do not accept unfamiliar UUID variants as UUIDs, so a time-ordered ID that looks like a UUIDv4 is safer for some legacy systems than a (newer, and less likely recognized) UUIDv7.

    • julik a day ago

      This. We did discover, however, that Postgres will, in fact, swallow a UUIDv7 just fine. After having written that library :-)

      • dragonwriter a day ago

        I kind of thought it would ingest but not generate unknown UUID versions that nonetheless fit the broader UUID structure, but not having tried I didn't want to bring that up.

  • julik a day ago

    It is much cheaper, because you won't have roundtrips or requirements for the availability of extensions on the database server end. It's really a very very sweet capability that SQLite is able to provide exactly because it is hosted by the application.

  • relistan 2 days ago

    > Also, what’s with the weird UUIDs? Why not use UUIDv7 if you want time-ordered UUIDs?

    I’m not the author but UUIDv7 came out in about 2022. Guessing this is legacy stuff that long predated that. There were lots of solutions to solve this problem before there was a standard.

    • dragonwriter 2 days ago

      > Guessing this is legacy stuff that long predated that.

      I’m guessing its not, since the tou library seems to be 8 months old and mentions avoiding the need for extensions if you are using it with Postgres as an advantage over using UUIDv7.

  • postepowanieadm 2 days ago

    better-sqlite3 got you covered!

    • tobyhinloopen 2 days ago

      better-sqlite3 is (was?) my favorite sqlite3 lib, it is incredibly fast.

      NodeJS has SQLITE3 support these days!

      https://nodejs.org/api/sqlite.html

      Interestingly it is NOT async, like better-sqlite3. I wonder why. I've been looking for any public remarks about it, but found nothing.

      • darthShadow 2 days ago

        These may be useful reading:

        https://github.com/TryGhost/node-sqlite3/issues/408#issue-57...

        https://github.com/WiseLibs/better-sqlite3/issues/32#issueco...

        Copying a quote from the second:

        > The sqlite3 C API serializes all operations (even reads) within a single process. You can parallelize reads to the database but only by having multiple processes, in which case one process being blocked doesn't affect the other processes anyways. In other words, because sqlite3 serializes everything, doing things asynchronously won't speed up database access within a process. It would only free up time for your app to do other things (like HTTP requests to other servers). Unfortunately, the overhead imposed on sqlite3 to serialize asynchronous operations is quite high, making it disadvantageous 95% of the time.

        • ncruces 2 days ago

          The sqlite3 C API very much does not serialize "all operations within a single process."

          The way threading and concurrency work in SQLite may not mesh well with NodeJS's concurrency model. I dunno, I'm not an NodeJS/libuv expert.

          But at the C API level that statement is just wrong. Normally you cannot share a single connection across threads. If you compile SQLite to allow this, yes, it'll serialize operations using locks. The solution is to create additional database connections, not (necessarily) launch another process. With multiple database connections, you can have concurrency, with or without threads.

          https://sqlite.org/threadsafe.html

          Again, whether this is viable in NodeJS, I have no idea. But it's a Node issue, not a C API issue.

          BTW, we're commenting on a Ruby article, and SQLite in Ruby has seen "recent" advances that increase concurrency through implementing SQLite's BUSY handler in Ruby, which allows the GVL lock to be released, and other Ruby and SQLite code to run while waiting on a BUSY connection.

          https://fractaledmind.github.io/2023/12/11/sqlite-on-rails-i...

          • julik a day ago

            The user-supplied busy handler has been available for a long while, it's just that the Rails connection adapters did not quite use it right. Indeed, there is elevated interest in SQLite these days.

dennisvdvliet 7 hours ago

That is pretty cool. And good to see you still doing talks.

melvinroest 2 days ago

I happen to work with BigQuery since recently and heard you can do UDFs with JavaScript. Good to know that this is a thing with more databases. I didn't need to use UDFs up until this point but now that I know about it a bit more, I just might as JS happens to be a strong language of mine (currently not using it professionally).

  • julik a day ago

    We actually use a couple Ruby UDFs with BigQuery, deployed as Google Cloud Functions. Works pretty well.

Alifatisk a day ago

What an interesting find

igravious a day ago

from the comments:

   [Python](https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.create_function)
https://docs.python.org/3/library/sqlite3.html#sqlite3.Conne...

   [Lua](http://lua.sqlite.org/index.cgi/doc/tip/doc/lsqlite3.wiki#db_create_function)
http://lua.sqlite.org/index.cgi/doc/tip/doc/lsqlite3.wiki#db...

   [Node.js](https://nodejs.org/api/sqlite.html#databasefunctionname-options-function)
https://nodejs.org/api/sqlite.html#databasefunctionname-opti...

   [PHP](https://www.php.net/manual/en/sqlite3.createfunction.php)
https://www.php.net/manual/en/sqlite3.createfunction.php