boardix


Post New Topic  New Poll  Post A Reply
my profile | directory login | | search | faq | forum home
  next oldest topic   next newest topic
» boardix   » Groups   » Codecrave   » Compression

 - UBBFriend: Email this page to someone!    
Author Topic: Compression
bigO
He Who Knows All
Member # 7

 - posted      Profile for bigO     Send New Private Message       Edit/Delete Post   Reply With Quote 
Hey hey hey, is it time to add another feature to the CodeCraveCnnctn standard?

"But Mike," you say, "You haven't even finished encryption!"

Well BACK OFF, I say. I'm getting there, I just need to add certificate support.

Anyway, me and dave have recently been discussion changing the file-sending portion of our chat program, due to ridiculously low upload speeds and inspired by the new form of high speed dialup that is becoming popular. We realized "Hey, modern computers are fast enough to compress and decompress information just as fast as they can send and receive, we could probably save a lot of time if we compress large files before sending them."

So we started looking into lossless data compression. Dave found an algorithm in his math book, but I found one that claims to be better, an algorithm used by most popular compression programs. However, ours may turn out a little different.

Here's the page for that: http://www.data-compression.com/lossless.html#lz . The method that dave descibed is above (Huffmann) but the lower one seems better, and probably easier to do.

However, we need a way to maximize the efficiency, since we are compressing only temporarily to save time. If the file is large enough, it may take a long time on one end to compress it, and then decompress it on the other end. What I think may be ideal is to be doing the compression while sending the file. The problem with this is that the "library" is not finished until the file is fully compressed. gay. So maybe do it in chunks? but no, then we'd need to send a new library everytime, and the compression would suffer greatly.

I think I am on the tip of a breakthrough, but alas, i have to work.

--------------------
Anybody can be cool, but AWESOME takes practice.

Posts: 4522 | From: VP luv | Registered: Apr 2002  |  IP: Logged
SkyRat
Resident Director of Personal
Member # 10

 - posted      Profile for SkyRat     Send New Private Message       Edit/Delete Post   Reply With Quote 
Haven't looked at the link yet but first impression is, can't we make the library without encrypting the file, and then do little bits at a time when we send with the same library?

With the method in the math book one should be able to do this. Ah, I'll get back to this after finals.

--------------------
 -  -
quote:
With the stupid limit on sigs my googlism won't fit!!


Posts: 2518 | From: BFH | Registered: Apr 2002  |  IP: Logged
SkyRat
Resident Director of Personal
Member # 10

 - posted      Profile for SkyRat     Send New Private Message       Edit/Delete Post   Reply With Quote 
Ah, I see. You can't because it encrypts as it builds the "library".

--------------------
 -  -
quote:
With the stupid limit on sigs my googlism won't fit!!


Posts: 2518 | From: BFH | Registered: Apr 2002  |  IP: Logged
bigO
He Who Knows All
Member # 7

 - posted      Profile for bigO     Send New Private Message       Edit/Delete Post   Reply With Quote 
Right, and doing the huffman method is similar. Either way, if you build a library to act as standard, you lose a lot of efficiency, the same patterns dont necesarily appear as much in different files.

One thing I was thinking about is that the recver can decompress as he receives, but the sender can't do the same, and really the sender is the limiting factor. However, we could just compress on the sender end before even notifying the receiver. Or...send the notification and compress while we wait for confirmation.

Or.......

I'm wondering homw much disk read/write time is a factor in compression. i understand that it is a lot slower than memory read/write...So...waht if we scan the file, create the library in memory, send the library, and then insted of ever re-writing the compressed file to disk, we read chunks of the file, compress them in memory, and then send. That way we are never writing the compressed garbage to the disk as a temp file, and it may sve a ton of time, depending on the size of the file.

Note: This probably isn't a breakthrough, but i put it into words now.

Also, I would just like to say that compression needs to be an option, as it would slow things down if you are on the same network, and there should be a size limit. Small files transfer in seconds, no need to compress them.

--------------------
Anybody can be cool, but AWESOME takes practice.

Posts: 4522 | From: VP luv | Registered: Apr 2002  |  IP: Logged
SkyRat
Resident Director of Personal
Member # 10

 - posted      Profile for SkyRat     Send New Private Message       Edit/Delete Post   Reply With Quote 
The huffman method, I guess what its called, doesn't require you to compress as you build. And I didn't mean it to sound as though you have one library for ever. The library would be file dependent, remade with every file.

And yes, staying the hell away from disk would be the best. And reading large portions of the disk would be beneficial rather than tiny amounts bit by bit (meaning the chunks we send via sockets are too small to be efficient). So we'd probably want to read a couple of megabytes of the file into memory, compress it into memory, and then send it in peices.

--------------------
 -  -
quote:
With the stupid limit on sigs my googlism won't fit!!


Posts: 2518 | From: BFH | Registered: Apr 2002  |  IP: Logged
bigO
He Who Knows All
Member # 7

 - posted      Profile for bigO     Send New Private Message       Edit/Delete Post   Reply With Quote 
That's the thing with the LZ method though, it is possible to mix it with the huffman method. The huffman method builds a library, then compresses. The advantage of that is having the library before compression. Though the lz method builds the library as it compresses, we could modify it to not compress it, just build the library. Then we're good to go.

--------------------
Anybody can be cool, but AWESOME takes practice.

Posts: 4522 | From: VP luv | Registered: Apr 2002  |  IP: Logged
SkyRat
Resident Director of Personal
Member # 10

 - posted      Profile for SkyRat     Send New Private Message       Edit/Delete Post   Reply With Quote 
Heyo!

Long time no post.

Anyhow, I had a couple of free hours tonight and found a compression "library" but not really, just some guys code, that uses what he refers to it as LZRW1. It works very well and appears to work very fast. I compressed the CnC generals bin in the time it took me to take a leak + maybe another 20 seconds. Since the bin is already compressed a decent amount I was supprised that it shrunk it by about 10MB, from 734MB to 724MB. Binaries and other such things seem to compress very well. I even got a divx'ed scrubs episode to shrink 1MB. Any tarballz don't compress at all and sometimes are slightly (~2KB for a 500KB file) larger. Looks good but its in C. I figured if we want to take a quick way into this we would try and dissect this code and make/port his code to C++ and also make some changes so we can implement this in hunks. I took a quick look at the source and I had a hard time following some of it, manily because of the foreign types used.

When you get time, let me know what you think mike, and anyone else for that matter who is interested. We could at least use this source (the cleanest and simplest source I've seen out of about 10 libraries I downloaded) as a diving/reference board.

--------------------
 -  -
quote:
With the stupid limit on sigs my googlism won't fit!!


Posts: 2518 | From: BFH | Registered: Apr 2002  |  IP: Logged
Mystery Man
vote YES to the Heroes bill
Member # 1

 - posted      Profile for Mystery Man     Send New Private Message       Edit/Delete Post   Reply With Quote 
if you haven't started it yet, port to Java/C# instead. new languages for new tech.

--------------------
 -

Posts: 2726 | From: VP | Registered: Apr 2002  |  IP: Logged
bigO
He Who Knows All
Member # 7

 - posted      Profile for bigO     Send New Private Message       Edit/Delete Post   Reply With Quote 
What?
Java sucks. It is a slow, bulky language. Both Dave and I hate it. No java ports.

Dave uses Linux. C# is a windows only language, and is just like java, without being cross platform. Maybe there are some improvements to the slowness. Still no where near C++ in awesomeness.

Java *cannot* ever be better than c++. I think of java as a learning tool. It assumes you are a crappy programmer and therefore doesn't allow you to do things that you might screw up, such as direct access to memory or overloaded operators. Plus, passing variables in not explicit, you have to look up whether it is passed by value or reference, and don't have the choice to do the other. I hate it, i hate it, I hate it. And I hate Sun for making crappy keyboards.

So really, why port it to java/c#?

--------------------
Anybody can be cool, but AWESOME takes practice.

Posts: 4522 | From: VP luv | Registered: Apr 2002  |  IP: Logged
Atomic-Ant
In Real Life Now
Member # 2

 - posted      Profile for Atomic-Ant     Send New Private Message       Edit/Delete Post   Reply With Quote 
how about a web app port? now that's a cool idea. Like AIMexpress or something.

Also, don't some cell phones have java interpretters?

--------------------
 -

Posts: 1768 | From: Nashville, TN | Registered: Apr 2002  |  IP: Logged
bigO
He Who Knows All
Member # 7

 - posted      Profile for bigO     Send New Private Message       Edit/Delete Post   Reply With Quote 
I thought that was your job, sir good [Wink] but i know you've got 1000 other things to do. adam also mentioned something about doing that, but no more than a mention

--------------------
Anybody can be cool, but AWESOME takes practice.

Posts: 4522 | From: VP luv | Registered: Apr 2002  |  IP: Logged
Atomic-Ant
In Real Life Now
Member # 2

 - posted      Profile for Atomic-Ant     Send New Private Message       Edit/Delete Post   Reply With Quote 
haha OH CRAP... I had forgotten about that.

--------------------
 -

Posts: 1768 | From: Nashville, TN | Registered: Apr 2002  |  IP: Logged
SkyRat
Resident Director of Personal
Member # 10

 - posted      Profile for SkyRat     Send New Private Message       Edit/Delete Post   Reply With Quote 
Java is poopy.

I was going to say something about why I think so but my statement stands on its own. Besides, I know I got mike who will back me up.

I just hate java like the plague. Java is like a class file written as a wrapper for c++.

--------------------
 -  -
quote:
With the stupid limit on sigs my googlism won't fit!!


Posts: 2518 | From: BFH | Registered: Apr 2002  |  IP: Logged
SkyRat
Resident Director of Personal
Member # 10

 - posted      Profile for SkyRat     Send New Private Message       Edit/Delete Post   Reply With Quote 
I'll post the code in here tomorrow when I find it on my laptop for all to see.

Mike you still haven't answered my inquires in my post. What chew think?

--------------------
 -  -
quote:
With the stupid limit on sigs my googlism won't fit!!


Posts: 2518 | From: BFH | Registered: Apr 2002  |  IP: Logged
bigO
He Who Knows All
Member # 7

 - posted      Profile for bigO     Send New Private Message       Edit/Delete Post   Reply With Quote 
Yeah, i was gonna post, but then i figured i'd just talk to you in person, but since i forgot, yes, let's take a looksy at the codesy.

--------------------
Anybody can be cool, but AWESOME takes practice.

Posts: 4522 | From: VP luv | Registered: Apr 2002  |  IP: Logged
j0n
resident cyclist (aka - dude w/ the pole)
Member # 11

 - posted      Profile for j0n     Send New Private Message       Edit/Delete Post   Reply With Quote 
lzrw1kh.h

code:
/*   ###################################################################

## ##

## ## ##### ##### ## ## ## ## ## ## ## ## ##

## ## ### ## ## ## # ## ### ## ## ## ## ## ##

## ## ### ##### ####### ## ## #### ###### ##

## ## ### ## ## ### ### ## ## ## ## ## ## ##

## ##### ##### ## ## ## ## #### ## ## ## ## ## ##

## ##

## EXTREMELY FAST AND EASY TO UNDERSTAND COMPRESSION ALGORITM ##

## ##

###################################################################

## ##

## This header file implements the LZRW1/KH algoritm which ##

## also implements some RLE coding which is usefull when ##

## compressing files containing a lot of consecutive bytes ##

## having the same value. The algoritm is not as good as ##

## LZH, but can compete with Lempel-Ziff. It's the fasted ##

## one I've encountered upto now. ##

## ##

## Kurt HAENEN ##

## ##

################################################################### */



#define FLAG_Copied 0x80

#define FLAG_Compress 0x40

#define TRUE 1

#define FALSE 0

typedef signed char byte;

typedef signed short word;

typedef unsigned char ubyte;

typedef unsigned short uword;

typedef unsigned long ulong;

typedef unsigned char bool;



bool GetMatch(Source,X,SourceSize,Hash,Size,Pos)

ubyte *Source;

uword X;

uword SourceSize;

word *Hash;

uword *Size;

word *Pos;

{ uword HashValue = (40543L*((((Source[X] << 4) ^

Source[X+1]) << 4) ^ Source[X+2]) >> 4) &

0xfff;

*Pos = Hash[HashValue];

Hash[HashValue] = X;

if ((*Pos != -1) && ((X-*Pos) < 4096))

{ for (*Size = 0; ((*Size < 18)

&& (Source[X+*Size] == Source[*Pos+*Size])

&& ((X+*Size) < SourceSize)); (*Size)++);

return(*Size >= 3);

}

return(FALSE);

}



uword Compression(Source,Dest,SourceSize)

ubyte *Source,*Dest;

uword SourceSize;

{ word Hash[4096];

uword Key,Size,Pos;

ubyte Bit = 0;

uword Command;

uword X = 0;

uword Y = 3;

uword Z = 1;

for (Key = 0; Key < 4096; Key++)

Hash[Key] = -1;

Dest[0] = FLAG_Compress;

for (; (X < SourceSize) && (Y <= SourceSize);)

{ //printf("X = %i Y = %i Z = %i",X,Y,Z);

if (Bit > 15)

{ Dest[Z++] = (Command >> 8) & 0x00ff;

Dest[Z] = Command & 0x00ff;

Z = Y;

Bit = 0;

Y += 2;

}

for (Size = 1; (Source[X] == Source[X+Size]) && (Size < 0x0fff)

&& (X+Size < SourceSize); Size++);

if (Size >= 16)

{ Dest[Y++] = 0;

Dest[Y++] = ((Size - 16) >> 8) & 0x00ff;

Dest[Y++] = (Size - 16) & 0x00ff;

Dest[Y++] = Source[X];

X += Size;

Command = (Command << 1) + 1;

}

else if (GetMatch(Source,X,SourceSize,Hash,&Size,&Pos))

{ Key = ((X-Pos) << 4) + (Size - 3);

Dest[Y++] = (Key >> 8) & 0x00ff;

Dest[Y++] = Key & 0x00ff;

X += Size;

Command = (Command << 1) + 1;

}

else { Dest[Y++] = Source[X++];

Command = (Command << 1);

}

Bit++;

}

Command <<= (16-Bit);

Dest[Z++] = (Command >> 8) & 0x00ff;

Dest[Z] = Command & 0x00ff;

if (Y > SourceSize)

{ for(Y = 0; Y < SourceSize; Dest[Y+1] = Source[Y++]);

Dest[0] = FLAG_Copied;

return(SourceSize+1);

}

return(Y);

}



uword Decompression(Source,Dest,SourceSize)

ubyte *Source,*Dest;

uword SourceSize;

{ uword X = 3;

uword Y = 0;

uword Pos,Size,K;

uword Command = (Source[1] << 8) + Source[2];

ubyte Bit = 16;

if (Source[0] == FLAG_Copied)

{ for (Y = 1; Y < SourceSize; Dest[Y-1] = Source[Y++]);

return(SourceSize-1);

}

for (; X < SourceSize;)

{ if (Bit == 0)

{ Command = (Source[X++] << 8);

Command += Source[X++];

Bit = 16;

}

if (Command & 0x8000)

{ Pos = (Source[X++] << 4);

Pos += (Source[X] >> 4);

if (Pos)

{ Size = (Source[X++] & 0x0f)+3;

for (K = 0; K < Size; K++)

Dest[Y+K] = Dest[Y-Pos+K];

Y += Size;

}

else { Size = (Source[X++] << 8);

Size += Source[X++] + 16;

for (K = 0; K < Size; Dest[Y+K++] =

Source[X]);

X++;

Y += Size;

}

}

else Dest[Y++] = Source[X++];

Command <<= 1;

Bit--;

}

return(Y);

}

Compression.c

code:
/***********************************************************************
** **
** COMPRESS.C A fast compression utility **
** **
***********************************************************************/

#include <stdio.h>
#include "lzrw1kh.h"

#define OurIdentifier 0xa5b6c7d8

#define Header "\
\n\
This program is only purpose is to demonstrate the working of the\n\
LZRW1KH (de)compression routines which can be found in the header\n\
file LZRW1KH.h ! This program simply takes two parameters :\n\
the name of a source file and the name of the destination file.\n\
If the source file has already been compressed, it will be de-\n\
compressed and stored into the destination file. Otherwise the\n\
process will compress the file !\n\
\n\
Usage : COMPRESS <Source> <Destination>\n\
\n\
Program and compression routines written by Kurt Haenen.\n\
\n"

main(ac,av)
int ac;
char *av[];
{ FILE *FP1,*FP2;
ulong Identifier;
ubyte Source[16388],Dest[16388];
uword SourceSize,DestSize;
printf(Header);
if (ac != 3)
{ printf("Illegal number of parameters !\n");
exit(1);
}
if ((FP1 = fopen(av[1],"rb")) == NULL)
{ printf("File %s not found !\n",av[1]);
exit(1);
}
if ((FP2 = fopen(av[2],"wb")) == NULL)
{ fclose(FP1);
printf("Couldn't open %s for output !\n",av[2]);
exit(1);
}
if (((SourceSize = fread(&Identifier,sizeof(ulong),1,FP1)) != 1)
|| (Identifier != OurIdentifier))
{ Identifier = OurIdentifier;
printf("Trying to compress %s to %s !\n",av[1],av[2]);
puts("writing filesize");
if (fwrite(&Identifier,sizeof(ulong),1,FP2) != 1)
{ fclose(FP1);
fclose(FP2);
printf("Couldn't write to %s !\n",av[2]);
exit(1);
}
if (fseek(FP1,0L,SEEK_SET) != 0)
{ fclose(FP1);
fclose(FP2);
printf("Couldn't seek in %s !\n",av[1]);
exit(1);
}
for (; (SourceSize = fread(Source,1,16384,FP1)) != 0;)
{ DestSize = Compression(Source,Dest,SourceSize);
//puts("writing block to disk");
if ((fwrite(&DestSize,sizeof(uword),1,FP2) != 1)
|| (fwrite(Dest,1,DestSize,FP2) != DestSize))
{ fclose(FP1);
fclose(FP2);
printf("Couldn't write to %s !\n",av[2]);
exit(1);
}
}
}
else { printf("Trying to decompress %s to %s !\n",av[1],av[2]);
for (; (fread(&SourceSize,sizeof(uword),1,FP1) == 1);)
{ if (fread(Source,1,SourceSize,FP1) != SourceSize)
{ fclose(FP1);
fclose(FP2);
printf("Unexspected EOF in file %s
!\n",av[1]);
exit(1);
}
DestSize = Decompression(Source,Dest,SourceSize);
if (fwrite(Dest,1,DestSize,FP2) != DestSize)
{ fclose(FP1);
fclose(FP2);
printf("Couldn't write to %s !\n",av[2]);
exit(1);
}
}
}
fclose(FP1);
fclose(FP2);
}



--------------------
Ninety percent of the time it doesn't happen, and probably it's just as well it doesn't, though if you can't get a level of candor on sex and you choose to behave instead as if this isn't ever on your mind, the male friendship is incomplete
P. Roth

Posts: 1349 | Registered: Apr 2002  |  IP: Logged
SkyRat
Resident Director of Personal
Member # 10

 - posted      Profile for SkyRat     Send New Private Message       Edit/Delete Post   Reply With Quote 
Stupid jon, that was me posting.

--------------------
 -  -
quote:
With the stupid limit on sigs my googlism won't fit!!


Posts: 2518 | From: BFH | Registered: Apr 2002  |  IP: Logged


 
Post New Topic  New Poll  Post A Reply Close Topic   Feature Topic   Move Topic   Delete Topic next oldest topic   next newest topic
 - Printer-friendly view of this topic
Hop To:

kordix.com

(C)2000-2011 The Boardix Community

Powered by Infopop Corporation
UBB.classicTM 6.5.0