Best Practices/Big Ideas

Protocol Redefinition

I think it is time we separated data into two general groups: "large, static data" and "small, dynamic data."


Satellite internet is OK for transferring large files. Latency isn't a concern, and you can theoretically pump a ton of data per cycle, moving a 1 GB file in a short amount of time. But satellite sucks for the small stuff, like the ever-changing datasets on a map during an online FPS. Because of satellite's roughly 750ms round-trip latency, real-time dataset transfers don't really work. This includes short TTLs for VPNs and other technologies that rely on the timing of the response.
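Some back-of-envelope arithmetic makes the point concrete (the 750ms figure is from above; the 25 Mbit/s link speed and the 100-byte message size are my own assumed numbers):

```python
# Rough arithmetic (assumed numbers): why satellite handles bulk data fine
# but chokes on latency-bound exchanges.

SAT_LATENCY_S = 0.750         # round-trip latency cited above
SAT_BANDWIDTH_BPS = 25e6 / 8  # assume a 25 Mbit/s satellite link, in bytes/s

# One 1 GB file: latency is paid once, bandwidth dominates.
bulk_time = SAT_LATENCY_S + 1e9 / SAT_BANDWIDTH_BPS

# 1000 tiny request/response exchanges (game state updates, DNS-style
# lookups) done sequentially: latency is paid on every round trip.
small_time = 1000 * (SAT_LATENCY_S + 100 / SAT_BANDWIDTH_BPS)

print(f"1 GB bulk transfer: ~{bulk_time:.0f} s "
      f"(latency is {SAT_LATENCY_S / bulk_time:.2%} of it)")
print(f"1000 sequential small exchanges: ~{small_time:.0f} s, almost all latency")
```

The bulk transfer barely notices the 750ms; the chatty workload is nothing *but* the 750ms.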


Wire-based internet does great for both the large and small stuff, although it starts to choke at millions of small files in a single data stream.


Wireless is almost as good as wired, but you still have that extra layer of protocols to deal with. As response times get better and data chunks get larger, wireless will probably be the way to go.


So why don't we define data in two ways: 'data' for the larger files, and 'communication' for all the small stuff such as link negotiations, DNS lookups, file syncs, protocol information, timestamps, etc. Most of this 'communication' should behave wonderfully over even a dial-up connection, as long as multiple streams can be run and the latency is as low as possible. If you could multi-stream over various networks at the same time (say, wireless for all your small transfers but satellite for your large files), you'd probably see some stability across all forms of internet links.


A full example is too cumbersome to write up here, but the basic idea is this: if you route each kind of data over the best possible pipe(s), you can theoretically eliminate a lot of the latency that single-technology setups suffer from.


This could also offload the transfers of 'data' from underperforming networks to efficient, purpose-built networks, while the 'communication' about that data stays on the underperforming networks, which are fine for latency but don't have the capacity for fast, large transfers.


It's like having a RAID 10 setup of 15k RPM SAS drives for your SQL data, but using an old Win2000 server with FAT32 and 5400 RPM drives to host the front end. It still looks pretty despite the old front-end hardware, but the data sits on a high-performing setup with response times the front-end server itself couldn't deliver. The result? Great multi-user access to hammer the heck out of the SQL data, with minimal impact on the front end, because performance and tuning are split across two different platforms. And the best part: losing one of the technologies doesn't completely break the system.


How do we do this? You got me, I'm just the idea guy :-)


We could also start using hash tables a bit more and granulate the data. If you have 100 10MB data blocks, each with a particular hash signature, you don't need the original file to match a given 10MB block, you just need an exact hash match, which may happen to sit on a higher-performing network infrastructure (think BitTorrent). But unlike BitTorrent, where a particular host has all or part of the same file you are looking for, these hosts would have that 10MB block inside a file completely unrelated to your target file (now think of monkeys typing the complete works of Shakespeare).

There is enough data on the internet to reconstruct any file from every public access point there is. I'd bet that if you downloaded all the front-facing pages of 100 random sites, you'd be able to piece together, with minimal data reorganization, at least one file on your computer, exact hash match included. Now take the billions of websites, the trillions of datasets, the ever-growing size... That sounds a lot like a hard drive. Now take the quadrillions of page indexes from sites like Google, Bing, Yahoo, etc.... And that sounds a lot like a FAT table or an NTFS index.

So why couldn't you build an iSCSI-type protocol that links directly to a Google index? You submit a call for a specific file, your computer breaks it into several blocks with their respective hashes, the hashes are submitted to the index, the index builds the location paths and returns them to your computer, which goes and captures that data and rebuilds the file locally(*). You could build in very good security so it would be (nearly) impossible to reconstruct the file from the hash request alone, since the sequence would be held locally by the requesting system. (*)There is a catch: the host system needs to have the generated hash sequence so the target system knows what to get.
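The core mechanic (split, hash, look up, rebuild) can be sketched in a few lines. This is a toy: the index is a local dict standing in for the imagined Google-scale one, and the blocks are 10 bytes rather than 10 MB so it runs instantly:

```python
# A toy sketch of the hash-index idea: split a file into fixed-size blocks,
# address each block by its SHA-256 hash, and rebuild the file from nothing
# but the ordered hash sequence plus an index mapping hashes to blocks.

import hashlib

BLOCK_SIZE = 10  # 10 bytes for the demo; the idea above imagines 10 MB


def split_and_index(data: bytes):
    """Return the ordered hash sequence and a hash -> block index."""
    hashes, index = [], {}
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        h = hashlib.sha256(block).hexdigest()
        hashes.append(h)
        index[h] = block  # any host holding a matching block would do
    return hashes, index


def rebuild(hashes, index):
    """Reassemble the original file from the hash sequence alone."""
    return b"".join(index[h] for h in hashes)


original = b"the quick brown fox jumps over the lazy dog"
hashes, index = split_and_index(original)
assert rebuild(hashes, index) == original
```

Note the ordered hash list is exactly the "generated hash sequence" from the catch above: without it, the blocks are just anonymous data.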
This could optimize the infrastructure currently in place AND enhance remote workplaces, corporate extranets, secured communications, long-distance data transfers, and satellite data communications (high-use datasets with known hashes could be stored locally, cutting travel time in half). In time, you could reverse the process: with all the pictures, music, documents, TV shows, movies, etc. stored on the local system, the target system could just transfer a hash sequence, and your computer would rebuild the data from locally stored files. Suddenly dial-up doesn't sound too bad if all that's being transferred is a string of hash values and nothing else.
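The dial-up point is easy to quantify. Using the 10MB block size from above and assuming SHA-256 (32 bytes per hash, my assumption):

```python
# Back-of-envelope: if the receiver already holds the blocks, only the
# hash sequence crosses the wire.

FILE_SIZE = 1_000_000_000  # a 1 GB file
BLOCK_SIZE = 10_000_000    # the 10 MB blocks discussed above
HASH_SIZE = 32             # bytes per SHA-256 hash (assumed hash function)

num_blocks = FILE_SIZE // BLOCK_SIZE  # 100 blocks
wire_bytes = num_blocks * HASH_SIZE   # bytes actually transmitted

print(f"{num_blocks} blocks -> {wire_bytes} bytes on the wire "
      f"instead of {FILE_SIZE}")
```

A few kilobytes of hashes in place of a gigabyte of payload fits comfortably inside even a dial-up link.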



Idea No. 204