oid vs bytea in Postgresql

Okay, here is the deal, I am working on a project which requires me to store files in my database. But I have recently just switched to postgresql from mysql, so I had to do some research on what methods postgresql offers for file storage. In mysql GLOB would have been my choice, but postgresql offers two methods, BYTEA which is postgresql’s answer to GLOB, or store the files in Large Objects (oid).

I asked around to find out what method would be the fastes, but I could not find the answer I was looking for, so I ended up spending the night testing it for myself, this blog will show the results.

The scripts

byteaTest_save.php

ini_set('max_execution_time', '0');
ini_set('memory_limit', '128M');
$time = microtime();
$time = explode(' ', $time);
$time = $time[1] + $time[0];
$begintime = $time;

/* bytea test
DB:
CREATE TABLE images
(
id serial NOT NULL,
"name" text,
image bytea,
CONSTRAINT images_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
ALTER TABLE images OWNER TO "storeTest";
*/

function saveBytea($Conn, $Filename)
{
global $Dir;
$FileContent = pg_escape_bytea(file_get_contents($Dir.'/'.$Filename));
$Query = "INSERT INTO images (name, image) VALUES ('".$Filename."', '".$FileContent."');";
pg_query($Conn, $Query);
unset($FileContent);
}
$strConn = "host=localhost dbname=byteaTest user=storeTest password=test sslmode=require";


$Conn = pg_connect($strConn);
global $Dir;


$Dir = "/mnt/hdb/Pictures/BU billeder";
$Handle = opendir($Dir);
while(false !== ($File = readdir($Handle)))
{
if($File !== "." && $File !== "..")
{
saveBytea($Conn, $File);
}
}
closedir($Handle);
pg_close($Conn);
$time = microtime();


$time = explode(" ", $time);
$time = $time[1] + $time[0];
$endtime = $time;
$totaltime = ($endtime - $begintime);
$totaltime = sprintf("%.4f", $totaltime);
$Memory = memory_get_peak_usage();
echo "Execution time: " .$totaltime. " seconds.
";
echo "Peak memory usage: " . $Memory/1024/1024 . " mb.";

byteaTest_load.php

ini_set('max_execution_time', '0');     
ini_set('memory_limit', '128M');
$time = microtime();    
$time = explode(' ', $time);     
$time = $time[1] + $time[0];     
$begintime = $time;
/* bytea test    
DB:     
CREATE TABLE images     
(     
id serial NOT NULL,     
"name" text,     
image bytea,     
CONSTRAINT images_pkey PRIMARY KEY (id)     
)     
WITH (OIDS=FALSE);     
ALTER TABLE images OWNER TO "storeTest";
*/
function loadBytea($Conn, $ID)    
{     
global $Dir;     
$Query = pg_query($Conn, "SELECT name, image FROM images WHERE id=".$ID.";");     
$Row = pg_fetch_row($Query);     
$FileContent = pg_unescape_bytea($Row[1]);     
file_put_contents($Dir.'/'.$Row[0], $FileContent);     
unset($FileContent);     
}
$strConn = "host=localhost dbname=byteaTest user=storeTest password=test sslmode=require";    
$Conn = pg_connect($strConn);
global $Dir;    
$Dir = "/mnt/hdb/oidVsbytea";     
$Query = pg_query("SELECT id FROM images;") or die(pg_last_error($Conn));     
while($Row = pg_fetch_assoc($Query))     
{     
loadBytea($Conn, $Row['id']);     
}
pg_close($Conn);
$time = microtime();    
$time = explode(" ", $time);     
$time = $time[1] + $time[0];     
$endtime = $time;     
$totaltime = ($endtime - $begintime);     
$totaltime = sprintf("%.4f", $totaltime);     
$Memory = memory_get_peak_usage();     
echo "Execution time: " .$totaltime. " seconds.\n";     
echo "Peak memory usage: " . $Memory/1024/1024 . " mb.";

oidTest_save.php

ini_set('max_execution_time', '0');     
ini_set('memory_limit', '128M');
$time = microtime();    
$time = explode(' ', $time);     
$time = $time[1] + $time[0];     
$begintime = $time;
/* oid test    
DB:     
CREATE TABLE images     
(     
id serial NOT NULL,     
"name" text,     
image oid,     
CONSTRAINT images_pkey PRIMARY KEY (id)     
)     
WITH (OIDS=FALSE);     
ALTER TABLE images OWNER TO "storeTest";
*/
function saveOid($Conn, $Filename)    
{     
global $Dir;     
$FileContent = file_get_contents($Dir.'/'.$Filename);     
pg_query($Conn, "begin;");     
$Oid = pg_lo_create($Conn);     
$Query = "INSERT INTO images (name, image) VALUES ('".$Filename."', '".$Oid."');";     
pg_query($Conn, $Query);
$OidHandle = pg_lo_open($Conn, $Oid, "w");    
pg_lo_write($OidHandle, $FileContent);     
pg_lo_close($OidHandle);     
pg_query($Conn, "commit;");
unset($FileContent);    
}
$strConn = "host=localhost dbname=oidTest user=storeTest password=test sslmode=require";    
$Conn = pg_connect($strConn);
global $Dir;    
$Dir = "/mnt/hdb/Pictures/BU billeder";     
$Handle = opendir($Dir);     
while(false !== ($File = readdir($Handle)))     
{     
if($File !== "." && $File !== "..")     
{     
saveOid($Conn, $File);     
}     
}     
closedir($Handle);     
pg_close($Conn);
$time = microtime();    
$time = explode(" ", $time);     
$time = $time[1] + $time[0];     
$endtime = $time;     
$totaltime = ($endtime - $begintime);     
$totaltime = sprintf("%.4f", $totaltime);     
$Memory = memory_get_peak_usage();     
echo "Execution time: " .$totaltime. " seconds.\n";     
echo "Peak memory usage: " . $Memory/1024/1024 . " mb.";

oidTest_load.php

ini_set('max_execution_time', '0');     
ini_set('memory_limit', '128M');
$time = microtime();    
$time = explode(' ', $time);     
$time = $time[1] + $time[0];     
$begintime = $time;
/* bytea test    
DB:     
CREATE TABLE images     
(     
id serial NOT NULL,     
"name" text,     
image bytea,     
CONSTRAINT images_pkey PRIMARY KEY (id)     
)     
WITH (OIDS=FALSE);     
ALTER TABLE images OWNER TO "storeTest";
*/
function loadOid($Conn, $ID)    
{     
global $Dir;     
$Query = pg_query($Conn, "SELECT name, image FROM images WHERE id=".$ID.";");     
$Row = pg_fetch_row($Query);     
pg_query($Conn, "begin;");     
$Handle = pg_lo_open($Conn, $Row[1], "r");     
$FileContent = pg_lo_read($Handle, 4194304);     
pg_query($Conn, "commit;");     
file_put_contents($Dir.'/'.$Row[0], $FileContent);     
unset($FileContent);     
}
$strConn = "host=localhost dbname=oidTest user=storeTest password=test sslmode=require";    
$Conn = pg_connect($strConn);
global $Dir;    
$Dir = "/mnt/hdb/oidVsbytea";     
$Query = pg_query("SELECT id FROM images;") or die(pg_last_error($Conn));     
while($Row = pg_fetch_assoc($Query))     
{     
loadOid($Conn, $Row['id']);     
}
pg_close($Conn);
$time = microtime();    
$time = explode(" ", $time);     
$time = $time[1] + $time[0];     
$endtime = $time;     
$totaltime = ($endtime - $begintime);     
$totaltime = sprintf("%.4f", $totaltime);     
$Memory = memory_get_peak_usage();     
echo "Execution time: " .$totaltime. " seconds.\n";     
echo "Peak memory usage: " . $Memory/1024/1024 . " mb.";

oidTestClear.php

$strConn = "host=localhost dbname=oidTest user=storeTest password=test sslmode=require";     
$Conn = pg_connect($strConn);
$Query = pg_query($Conn, "SELECT image FROM images");    
while($R = pg_fetch_assoc($Query))     
{     
pg_lo_unlink($Conn, $R['image']);     
}

NB: These scripts might / or might not be optimized, if you got any changes that might give a better result, please leave a comment.

The test

Every script was run five times, everytime with a clean database and a database restart. The oidTest took abit longer, since I had to unlink every oid that was created and then clear the database (oidTestClear.php).

The scripts was using a dir I have with 1075 pictures, filesizes is from 6kb to 3724kb.

Server hardware:

AMD Athlon XP 2200+ (Clocked to 1.62Ghz)

256Mb Ram

The results

1 2 3 4 5 Average Memory
bytea – save 402,79s 379,63s 380,25s 378,39s 377,82s 383,78s 41,21MB
bytea – load 228,42s 228,09s 227,85s 228,35s 228,48s 228,24s 18,51MB
oid – save 114,17s 133,75s 121,21s 138,75s 127,59s 127,09s 3,7MB
oid – load 45,53s 47,45s 45,58s 46,97s 46,6s 46,43s 7,7MB

Conclusion

From my tests, I most say I am suprised, I was expecting that bytea would be a little slower then oid, since it’s data had to be escaped before inserting it to the database, but I was not expecting it to be 3-4 times slower then oid.

Another thing I realized was that bytea creates a huge overhead while processing the data, 41,21MB vs 3,7MB.

From my point of view, oid is at all time the best choice if data storage in a postgresql database is a requirement, bytea is just too slow and memory hungry.

Notes

Bytea might have an advantage over oid, bytea is, from what I know, easier to maintaine and handle, since it’s data is stored inside of the table, instead of in separate place in the database. But then again, if you lose your table with oid’s, your data would still be present, this is not the case with bytea.

4 kommentarer til “oid vs bytea in Postgresql”

  1. Does delete from a row from your table images generate a delete cascade to the pg_largeobject table associated to the deleted oid?
    Thanks

  2. This is most likely one #of the# best article that ever cross my reference. I do not see why anybody should disagree. It may be too easy #for them# to comprehend…anyway nice work i’m coming again right here for More Great Stuff!!

Skriv et svar

Din e-mailadresse vil ikke blive publiceret. Krævede felter er markeret med *