split: And the curse of the asinine defaults

I have mentioned GNU split quite a few times since starting out on this little journey.

And for good reason. I use split quite a bit because my family members live on another continent, and the easiest way to relay personal files is to encrypt them, split them across cheap USB drives or SD cards, and send them through the post.

It probably sounds barbaric to you that, even within the past few years, I have relied on the sneakernet to send files around the world, when telecommunications have advanced so far in the past decade.

Just remember that not everyone in the world is as fortunate as you are, you who have enough technology to read this simple blog. 👿

Back to the task at hand. split is one of those programs, like top or dmesg or even sort, that is easy to overlook because so many johnny-come-latelies can catch your eye.

And to be honest, by default, split is a little obtuse. Consider:

kmandla@6m47421: ~$ split -n 4 test.txt.gpg

kmandla@6m47421: ~$ ls
xaa  xab  xac  xad

Okay, let’s be painfully honest. split’s defaults are just plain cockeyed. Who decided that the natural output would be an x, followed in quick succession by an alphabetic sequence? Is there no logic in keeping the original filename, or for that matter using numbers to show order? Doesn’t that make more sense? And why not use a dot separator, which is a classic mark for showing version difference, fractional parts and other simple numeric increments?

Sigh. Perhaps there is a reason. Regardless, this is one of those times when Linux seems to have fallen down for me. 😦

Luckily it is Linux though, and not some other, more obtuse operating system that would require me to buy an upgrade or third-party software to get the job done right. It’s just a matter of getting my options in place.

So here it is, step by step. You can check split --help or the man page if you want more information, but this should do the trick. As you saw above, if you just want to split a file into a certain number of pieces, -n is all you need to start. If you’re splitting by size instead (say, to fit a USB drive), you’ll need to crack your knuckles.
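
Here is a rough sketch of the two modes side by side, run in a scratch directory on a 10,000-byte file of random bytes. It assumes GNU coreutils split; sample.bin and the count. and size. prefixes are made-up names for illustration.

```shell
#!/bin/sh
# Sketch: split by number of pieces (-n) versus by piece size (-b).
# Assumes GNU coreutils; sample.bin, count. and size. are illustrative names.
set -eu
cd "$(mktemp -d)"                          # work in a throwaway directory

head -c 10000 /dev/urandom > sample.bin    # 10,000 bytes of test data

# By count: always 4 pieces, here 2,500 bytes each (count.aa .. count.ad).
split -n 4 sample.bin count.

# By size: as many 3,000-byte pieces as needed, so three full pieces
# plus a 1,000-byte remainder (size.aa .. size.ad).
split -b 3000 sample.bin size.

wc -c count.* size.*
```

With -n the piece size follows from the file size; with -b the number of pieces does, which is what you want when the hard limit is the capacity of the drive.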

First, let’s throw in the --numeric-suffixes flag, since alphabetical suffixes are just plain fascist. This flag also sets the starting number (it starts at 0 if you don’t give one), and 1 is the obvious choice.
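
To see what the starting number changes, here’s a small comparison on a 3,000-byte throwaway file. The short flag -d turns on numeric suffixes but always counts from 0; the file and prefix names are just for illustration.

```shell
#!/bin/sh
# Sketch: where numeric suffixes start counting. Assumes GNU coreutils split.
set -eu
cd "$(mktemp -d)"
head -c 3000 /dev/urandom > sample.bin

# -d: numeric suffixes from 0 -> zero.00, zero.01, zero.02
split -d -b 1000 sample.bin zero.

# --numeric-suffixes=1: numeric suffixes from 1 -> one.01, one.02, one.03
split --numeric-suffixes=1 -b 1000 sample.bin one.

ls zero.* one.*
```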

kmandla@6m47421: ~$ split --numeric-suffixes=1

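Now let’s add the --suffix-length flag, which sets how many characters each suffix gets; with numeric suffixes, that means the numbers come out zero-padded to that width. It’s set to 2 by default, but we’ll spell out the entire business here, just for illustration’s sake.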
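
Here’s the same idea with a suffix length of 3, on a 2,500-byte test file. Two digits starting at 1 leaves room for 99 pieces, so a longer suffix is worth considering if you expect more than that. Assumes GNU coreutils split; sample.bin is a throwaway name.

```shell
#!/bin/sh
# Sketch: --suffix-length sets the width of the zero-padded numbers.
# Assumes GNU coreutils split; sample.bin is a throwaway test file.
set -eu
cd "$(mktemp -d)"
head -c 2500 /dev/urandom > sample.bin

# Three-digit suffixes: sample.bin.001, sample.bin.002, sample.bin.003 (500 bytes)
split --numeric-suffixes=1 --suffix-length=3 -b 1000 sample.bin sample.bin.

ls sample.bin.*
```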

kmandla@6m47421: ~$ split --numeric-suffixes=1 --suffix-length=2

Now I’m going to tell it I want 1KB pieces. Note that split reads KB as 1,000 bytes and K as 1,024 bytes (likewise MB versus M, and GB versus G). If you’re sizing pieces for a real drive, scale back a little to account for the megabyte-mebibyte scam, and for the space the filesystem itself takes up. Storage device manufacturers: First against the wall when the revolution comes.

kmandla@6m47421: ~$ split --numeric-suffixes=1 --suffix-length=2 -b 1KB
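
A quick way to check which kind of kilobyte you’re getting is to cut the same 5,000-byte test file both ways and count the bytes. This sketch assumes GNU coreutils split; the file and prefix names are illustrative.

```shell
#!/bin/sh
# Sketch: decimal (KB) versus binary (K) size suffixes in GNU split.
set -eu
cd "$(mktemp -d)"
head -c 5000 /dev/urandom > sample.bin

split -b 1KB sample.bin dec.   # 1KB = 1,000 bytes: five pieces, dec.aa .. dec.ae
split -b 1K  sample.bin bin.   # 1K  = 1,024 bytes: four full pieces plus 904 bytes

wc -c dec.aa bin.aa bin.ae
```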

Lastly, I add the --verbose flag, so I can see what’s happening. The final command looks like this:

kmandla@6m47421: ~$ split --numeric-suffixes=1 --suffix-length=2 -b 1KB --verbose test.txt.gpg test.txt.gpg.

Notice that last bit, where the original file name is repeated with a dot after it. That’s not another input file; it’s the prefix for the new set of files. This is how we do away with that “xaa” garbage, once and for all.
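
To see the prefix at work, here’s a sketch that splits the same 2,000-byte stand-in file with and without it. The file is just random bytes named test.txt.gpg, not a real encrypted file.

```shell
#!/bin/sh
# Sketch: the optional second operand to split is the output prefix.
set -eu
cd "$(mktemp -d)"
head -c 2000 /dev/urandom > test.txt.gpg   # stand-in for a real encrypted file

split -b 1000 test.txt.gpg                 # no prefix: xaa, xab
split -b 1000 test.txt.gpg test.txt.gpg.   # prefix: test.txt.gpg.aa, test.txt.gpg.ab

ls
```

The trailing dot belongs to the prefix; leave it off and you get test.txt.gpgaa instead.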

And what’s the final output?

kmandla@6m47421: ~$ split --numeric-suffixes=1 --suffix-length=2 -b 1KB --verbose test.txt.gpg test.txt.gpg.
creating file ‘test.txt.gpg.01’
creating file ‘test.txt.gpg.02’
creating file ‘test.txt.gpg.03’
creating file ‘test.txt.gpg.04’
creating file ‘test.txt.gpg.05’
creating file ‘test.txt.gpg.06’
creating file ‘test.txt.gpg.07’
creating file ‘test.txt.gpg.08’
creating file ‘test.txt.gpg.09’
creating file ‘test.txt.gpg.10’

kmandla@6m47421: ~$ ls test.txt*
test.txt      test.txt.gpg.01  test.txt.gpg.03  test.txt.gpg.05  test.txt.gpg.07  test.txt.gpg.09
test.txt.gpg  test.txt.gpg.02  test.txt.gpg.04  test.txt.gpg.06  test.txt.gpg.08  test.txt.gpg.10

And all is right with the world.
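
On the receiving end, cat puts the pieces back together. A minimal sketch, again using random bytes as a stand-in for the encrypted file, with cmp to confirm the rebuilt copy matches byte for byte; restored.gpg is just an illustrative name.

```shell
#!/bin/sh
# Sketch: split a file as above, then reassemble and verify it.
set -eu
cd "$(mktemp -d)"
head -c 10000 /dev/urandom > test.txt.gpg   # stand-in for a real encrypted file

split --numeric-suffixes=1 --suffix-length=2 -b 1KB test.txt.gpg test.txt.gpg.

# The shell expands the glob in sorted order, and zero-padded numbers
# sort correctly, so cat joins the pieces in the right sequence.
cat test.txt.gpg.[0-9][0-9] > restored.gpg

# cmp exits 0 (and prints nothing) when the files are identical.
cmp test.txt.gpg restored.gpg && echo "restored OK"
```

This relies on sorted glob expansion, which is one more reason to prefer zero-padded numbers: without padding, .10 would sort between .1 and .2.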

split is good stuff, even if the default arrangement is just plain wacky. Hopefully with this and a little more tinkering, you can come up with the results you want.
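
One way to do that tinkering is to wrap the final command in a shell function. The name splitup and the 1KB fallback size are my own choices for illustration, not anything split provides; only the function itself belongs in a shell startup file, and the rest of the block is a test run.

```shell
#!/bin/sh
# Sketch: a wrapper around the options from this post.
# "splitup" is a hypothetical name; $2 (piece size) falls back to 1KB.
set -eu

splitup() {
    split --numeric-suffixes=1 --suffix-length=2 -b "${2:-1KB}" --verbose "$1" "$1."
}

# Example run in a scratch directory on 3,000 bytes of random data.
cd "$(mktemp -d)"
head -c 3000 /dev/urandom > test.txt.gpg
splitup test.txt.gpg            # test.txt.gpg.01 .. test.txt.gpg.03
```

Pass a size as the second argument (for example, splitup test.txt.gpg 2KB) to override the fallback.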

Oh, and one more thing: Where do we find split, along with so many other nifty console tools? In coreutils, of course. 😉

5 thoughts on “split: And the curse of the asinine defaults”

  1. darkstarsword

    I remember, many years ago when I still used Windows and floppy disks were the most common form of removable media, using a program called Chainsaw to split up files. Obviously that’s not particularly relevant for Linux users, but it did have a nice feature: it would produce a Windows batch file to put the chunks back together, so all the receiver had to do was copy the chunks to their hard drive and double-click the batch file.

    It would be trivial to do something similar on Linux. The Windows script was a bit convoluted, just to get that OS to put the chunks back together, but thanks to the cat command it would pretty much be a one-liner on Linux (maybe with a few extra lines to check that the files exist and tell the user what to do if they don’t).
