Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

MySQL 5.1 Plugin: HTML Storage Engine—Reads and Writes

Save for later
  • 480 min read
  • 2010-08-31 00:00:00

(For more resources on MySQL, see here.)

An idea of the HTML engine

Ever thought about what your tables might look like? Why not represent a table as a <TABLE>? You would be able to see it, visually, in any browser. Sounds cool. But how could we make it work?

We want a simple engine, not an all-purpose Swiss Army Knife HTML-to-SQL converter, which means we will not need any existing universal HTML or XML parsers, but can rely on a fixed file format. For example, something like this:

<html><head><title>t1</title></head><body><table border=1>
<tr><th>col1</th><th>other col</th><th>more cols</th></tr>
<tr><td>data</td><td>more data</td><td>more data</td></tr>
<!-- this row was deleted ... -->
<tr><td>data</td><td>more data</td><td>more data</td></tr>
... and so on ...
</table></body></html>

But even then this engine is way more complex than the previous example, and it makes sense to split the code. The engine could stay, as usual, in the ha_html.cc file, the declarations in ha_html.h, and if we need any utility functions to work with HTML we can put them in the htmlutils.cc file.

Flashback

A storage engine needs to declare a plugin and an initialization function that fills a handlerton structure. Again, the only handlerton method that we need here is a create() method.

#include "ha_html.h"
static handler* html_create_handler(handlerton *hton,
TABLE_SHARE *table, MEM_ROOT *mem_root)
{
return new (mem_root) ha_html(hton, table);
}
static int html_init(void *p)
{
handlerton *html_hton = (handlerton *)p;
html_hton->create = html_create_handler;
return 0;
}
struct st_mysql_storage_engine html_storage_engine =
{ MYSQL_HANDLERTON_INTERFACE_VERSION };
mysql_declare_plugin(html)
{
MYSQL_STORAGE_ENGINE_PLUGIN,
&html_storage_engine,
"HTML",
"Sergei Golubchik",
"An example HTML storage engine",
PLUGIN_LICENSE_GPL,
html_init,
NULL,
0x0001,
NULL,
NULL,
NULL
}
mysql_declare_plugin_end;

Now we need to implement all of the required handler class methods. Let's start with simple ones:

const char *ha_html::table_type() const
{
return "HTML";
}
const char **ha_html::bas_ext() const
{
static const char *exts[] = { ".html", 0 };
return exts;
}
ulong ha_html::index_flags(uint inx, uint part, bool all_parts) const
{
return 0;
}
ulonglong ha_html::table_flags() const
{
return HA_NO_TRANSACTIONS | HA_REC_NOT_IN_SEQ | HA_NO_BLOBS;
}
THR_LOCK_DATA **ha_html::store_lock(THD *thd,
THR_LOCK_DATA **to, enum thr_lock_type lock_type)
{
if (lock_type != TL_IGNORE && lock.type == TL_UNLOCK)
lock.type = lock_type;
*to ++= &lock;
return to;
}

These methods are familiar to us. They say that the engine is called "HTML", it stores the table data in files with the .html extension, the tables are not transactional, the position for ha_html::rnd_pos() is obtained by calling ha_html::position(), and that it does not support BLOBs. Also, we need a function to create and initialize an HTML_SHARE structure:

static HTML_SHARE *find_or_create_share(
const char *table_name, TABLE *table)
{
HTML_SHARE *share;
for (share = (HTML_SHARE*)table->s->ha_data;
share; share = share->next)
if (my_strcasecmp(table_alias_charset,
table_name, share->name) == 0)
return share;
share = (HTML_SHARE*)alloc_root(&table->s->mem_root,
sizeof(*share));
bzero(share, sizeof(*share));
share->name = strdup_root(&table->s->mem_root, table_name);
share->next = (HTML_SHARE*)table->s->ha_data;
table->s->ha_data = share;
return share;
}

It is exactly the same function, only the structure is now called HTML_SHARE, not STATIC_SHARE.

Creating, opening, and closing the table

Having done the basics, we can start working with the tables. The first operation, of course, is the table creation. To be able to read, update, or even open the table we need to create it first, right? Now, the table is just an HTML file and to create a table we only need to create an HTML file with our header and footer, but with no data between them. We do not need to create any TABLE or Field objects, or anything else—MySQL does it automatically. To avoid repeating the same HTML tags over and over we will define the header and the footer in the ha_html.h file as follows:

#define HEADER1 "<html><head><title>"
#define HEADER2 "</title></head><body><table border=1>\n"
#define FOOTER "</table></body></html>"
#define FOOTER_LEN ((int)(sizeof(FOOTER)-1))

As we want a header to include a table name we have split it in two parts. Now, we can create our table:

int ha_html::create(const char *name, TABLE *table_arg,
HA_CREATE_INFO *create_info)
{
char buf[FN_REFLEN+10];
strcpy(buf, name);
strcat(buf, *bas_ext());

Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at ₹800/month. Cancel anytime

We start by generating a filename. The "table name" that the storage engine gets is not the original table name, it is converted to be a safe filename. All "troublesome" characters are encoded, and the database name is included and separated from the table name with a slash. It means we can safely use name as the filename and all we need to do is to append an extension. Having the filename, we open it and write our data:

FILE *f = fopen(buf, "w");
if (f == 0)
return errno;
fprintf(f, HEADER1);
write_html(f, table_arg->s->table_name.str);
fprintf(f, HEADER2 "<tr>");

First, we write the header and the table name. Note that we did not write the value of the name argument into the header, but took the table name from the TABLE_SHARE structure (as table_arg->s->table_name.str), because name is mangled to be a safe filename, and we would like to see the original table name in the HTML page title. Also, we did not just write it into the file, we used a write_html() function—this is our utility method that performs the necessary entity encoding to get a well-formed HTML. But let's not think about it too much now, just remember that we need to write it, it can be done later.

Now, we iterate over all fields and write their names wrapped in <th>...</th> tags. Again, we rely on our write_html() function here:

for (uint i = 0; i < table_arg->s->fields; i++) {
fprintf(f, "<th>");
write_html(f, table_arg->field[i]->field_name);
fprintf(f, "</th>");
}
fprintf(f, "</tr>");
fprintf(f, FOOTER);
fclose(f);
return 0;
}

Done, an empty table is created.

Opening it is easy too. We generate the filename and open the file just as in the create() method. The only difference is that we need to remember the FILE pointer to be able to read the data later, and we store it in fhtml, which has to be a member of the ha_html object:

int ha_html::open(const char *name, int mode,
uint test_if_locked)
{
char buf[FN_REFLEN+10];
strcpy(buf, name);
strcat(buf, *bas_ext());
fhtml = fopen(buf, "r+");
if (fhtml == 0)
return errno;

When parsing an HTML file we will often need to skip over known patterns in the text. Instead of using a special library or a custom pattern parser for that, let's try to use scanf()—it exists everywhere, has a built-in pattern matching language, and it is powerful enough for our purposes. For convenience, we will wrap it in a skip_html() function that takes a scanf() format and returns the number of bytes skipped. Assuming we have such a function, we can finish opening the table:

skip_html(fhtml, HEADER1 "%*[^<]" HEADER2 "<tr>");
for (uint i = 0; i < table->s->fields; i++) {
skip_html(fhtml, "<th>%*[^<]</th>");
}
skip_html(fhtml, "</tr>");
data_start = ftell(fhtml);

We skip the first part of the header, then "everything up to the opening angle bracket", which eats up the table name, and the second part of the header. Then we skip individual row headers in a loop and the end of row </tr> tag. In order not to repeat this parsing again we remember the offset where the row data starts. At the end we allocate an HTML_SHARE and initialize lock objects:

share = find_or_create_share(name, table);
if (share->use_count++ == 0)
thr_lock_init(&share->lock);
thr_lock_data_init(&share->lock,&lock,NULL);
return 0;
}

Closing the table is simple, and should not come as a surprise to us:

int ha_html::close(void)
{
fclose(fhtml);
if (--share->use_count == 0)
thr_lock_delete(&share->lock);
return 0;
}