Tag Archives: split

truncate: Arbitrarily chopping things off

When I mentioned that there were useful and interesting tools in coreutils and util-linux and bsd-games (and I should probably add binutils), I wasn’t exactly thinking of truncate.

[Screenshot: truncate (2014-09-16)]

truncate wasn’t on my list when the list began five years ago, or even in later additions. I can see why: It’s a rather arbitrary and vicious tool, snapping off files at predetermined lengths and leaving the remainder to flutter away in the wind.

I can’t think of any real use for truncate aside from setting an exact, to-the-byte length for a file, perhaps for some sort of network testing or disk performance check. And considering the leftovers are summarily discarded, it’s a lethal decision to use it.

truncate follows the same flags for size and units as split and some other toys from coreutils. If you’re familiar with much of what’s in the suite, it will only take you a second to get used to truncate.
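As a minimal sketch of that to-the-byte use, assuming GNU coreutils (for truncate’s -s option and stat’s -c format), here is truncate chopping and then stretching a scratch file; sample.bin is a made-up name. One thing worth knowing: truncate will also extend a file, filling the new space with zero bytes, so it isn’t strictly a chopping tool.

```shell
#!/bin/sh
# Sketch: force a file to an exact byte length with GNU truncate.
# "sample.bin" is a hypothetical scratch file.
set -e

head -c 5000 /dev/urandom > sample.bin   # start with 5,000 random bytes

truncate -s 1000 sample.bin              # chop to exactly 1,000 bytes; the rest is gone
stat -c '%s' sample.bin                  # prints 1000

truncate -s 2K sample.bin                # extend to 2,048 bytes; the new tail reads as zeros
stat -c '%s' sample.bin                  # prints 2048

truncate -s -48 sample.bin               # relative sizes work too: shrink by 48 bytes
stat -c '%s' sample.bin                  # prints 2000
```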

And that’s about all I can think of to say about truncate. Use wisely. 😉

splitvt: Under the most dire of circumstances

Between tmux and screen, there’s really not much space for upstart splitscreen console tools, unless they can do things really, really well.

splitvt is really, really not that tool.

[Screenshot: splitvt (2014-05-07)]

Those are the prettiest, cleanest results I could get from splitvt, although I admit I hardly tried beyond the first few console tools that came to mind.

Yes, splitvt can run two console apps in the same frame. And yes, you can spin it up with two very different applications, and get both of them going with reasonable fidelity.

But anything outside of the simplest, most basic output gets sickeningly mangled, like a ten-car horrorcrash, or a drunken prom queen turned loose on a makeup counter in an abandoned department store.

htop came out looking like Van Gogh’s The Starry Night. Midnight Commander was a gob of scrambled eggs sliding off a plate. Don’t even ask me about elinks. I don’t like to think about it. 😯 😥

And splitvt is clearly intended for non-interactive programs. I find splitvt traps you in the upper half, meaning any input intended for the lower half is effectively ignored.

There’s some sort of “command mode” for splitvt, which I found by accident. If you hit CTRL+O and then enter a question mark, you’ll get a brief list of commands. From there you can adjust the property boundary, copy and paste (supposedly) or lock the screen. Other tips are listed.

But I’ve seen enough. I know it’s not fair to pick on a program that’s beyond its freshness date, and it may be that when splitvt was in its prime, it was neck-and-neck with the best that screen or tmux could offer.

But these days, with tmux leading the pack featurewise, and screen coming out of retirement to do battle with the usurper … splitvt is easy to dismiss. Try it, but only under the most dire of circumstances. 😐

split: And the curse of the asinine defaults

I have mentioned GNU split quite a few times since starting out on this little journey.

And for good reason. I use split quite a bit because my family members live on another continent, and the easiest way to relay personal files is to encrypt them, split them across cheap USB drives or SD cards, and send them through the post.

It probably sounds barbaric to you, to think that even within the past few years I have relied on the sneakernet to send files around the world, when telecommunications have advanced so far in the past decade.

Just remember not everyone in the world is as fortunate as you are, you who have enough technology to read this simple blog. 👿

Back to the task at hand. split is one of those programs, like top or dmesg or even sort, that is easy to overlook because so many johnny-come-latelies can catch your eye.

And to be honest, by default, split is a little obtuse. Consider:

kmandla@6m47421: ~$ split -n 4 test.txt.gpg

kmandla@6m47421: ~$ ls
xaa  xab  xac  xad

Okay, let’s be painfully honest. split’s defaults are just plain cockeyed. Who decided that the natural output would be an x, followed in quick succession by an alphabetic sequence? Is there no logic in keeping the original filename, or for that matter using numbers to show order? Doesn’t that make more sense? And why not use a dot separator, which is a classic mark for showing version difference, fractional parts and other simple numeric increments?

Sigh. Perhaps there is a reason. Regardless, this is one of those times when Linux seems to have fallen down for me. 😦

Luckily it is Linux though, and not some other, more obtuse operating system that would require me to buy an upgrade or third-party software to get the job done right. It’s just a matter of getting my options in place.

So here it is, step by step. You can look at the help flags for split if you want more information, but this should do the trick. First, as you saw above, if you just want to split out by a certain number of resulting files, -n is all you need to start. Otherwise, if it’s by size (like the size of a USB drive), you’ll need to crack your knuckles.
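For the count-based route, here’s a small sketch assuming GNU split, with hypothetical scratch files. Plain -n 4 cuts on byte boundaries, which is fine for an encrypted blob; for a text file, the l/4 form keeps whole lines together.

```shell
#!/bin/sh
# Sketch: split by number of pieces rather than by size (GNU split).
# "demo.bin" and "notes.txt" are hypothetical scratch files.
set -e

head -c 1000 /dev/urandom > demo.bin
split -n 4 demo.bin demo.bin.          # four equal byte chunks: demo.bin.aa .. demo.bin.ad
wc -c demo.bin.a?                      # each piece is 250 bytes

seq 1 10 > notes.txt
split -n l/4 notes.txt notes.txt.      # four pieces, but never break a line in half
```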

To start, let’s throw in the --numeric-suffixes flag, since alphabetical suffixes are just plain fascist. This flag also sets the starting number, and 1 is the obvious choice.

kmandla@6m47421: ~$ split --numeric-suffixes=1

Now let’s add the --suffix-length flag, which sets how many digits each suffix gets; numeric suffixes are zero-padded to that width. It’s 2 by default, but we’ll spell out the entire business here, just for illustration’s sake.

kmandla@6m47421: ~$ split --numeric-suffixes=1 --suffix-length=2

Now I’m going to tell it I want 1KB pieces. (GNU split reads KB as 1,000 bytes; a bare K means 1,024.) If you have a drive you’re writing to, you should scale back a little bit to account for the megabyte-mebibyte scam. Storage device manufacturers: First against the wall when the revolution comes.

kmandla@6m47421: ~$ split --numeric-suffixes=1 --suffix-length=2 -b 1KB
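Since the megabyte-mebibyte issue bites here too, a quick sketch of how GNU split reads its size suffixes: KB, MB and friends are powers of 1,000, while K, M (or KiB, MiB) are powers of 1,024. The file and prefixes are hypothetical.

```shell
#!/bin/sh
# Sketch: GNU split's size suffixes. KB, MB, ... are powers of 1000;
# K, M, ... (or KiB, MiB, ...) are powers of 1024.
set -e

head -c 4096 /dev/urandom > units.bin    # hypothetical 4 KiB scratch file

split -b 1KB units.bin dec.              # 1KB = 1000-byte pieces
split -b 1K  units.bin bin.              # 1K  = 1024-byte pieces

ls dec.* | wc -l                         # 5 pieces: 4 x 1000 bytes + 96 left over
ls bin.* | wc -l                         # 4 pieces of exactly 1024 bytes
```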

Lastly, I add the --verbose flag, so I can see what’s happening. The final command looks like this:

kmandla@6m47421: ~$ split --numeric-suffixes=1 --suffix-length=2 -b 1KB --verbose test.txt.gpg test.txt.gpg.

Notice that last bit, where the original file name is repeated with a dot after it. That’s not the filename, that’s the prefix of the new file set. This is how we do away with that “xaa” garbage, once and for all.

And what’s the final output?

kmandla@6m47421: ~$ split --numeric-suffixes=1 --suffix-length=2 -b 1KB --verbose test.txt.gpg test.txt.gpg.
creating file ‘test.txt.gpg.01’
creating file ‘test.txt.gpg.02’
creating file ‘test.txt.gpg.03’
creating file ‘test.txt.gpg.04’
creating file ‘test.txt.gpg.05’
creating file ‘test.txt.gpg.06’
creating file ‘test.txt.gpg.07’
creating file ‘test.txt.gpg.08’
creating file ‘test.txt.gpg.09’
creating file ‘test.txt.gpg.10’

kmandla@6m47421: ~$ ls test.txt*
test.txt      test.txt.gpg.01  test.txt.gpg.03  test.txt.gpg.05  test.txt.gpg.07  test.txt.gpg.09
test.txt.gpg  test.txt.gpg.02  test.txt.gpg.04  test.txt.gpg.06  test.txt.gpg.08  test.txt.gpg.10

And all is right with the world.

split is good stuff, even if the default arrangement is just plain wacky. Hopefully with this and a little more tinkering, you can come up with the results you want.
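For the sneakernet case, a sketch of the round trip, assuming GNU coreutils on both ends: checksum the file, split it, then cat the pieces back together and check the sum. The random-byte stand-in for test.txt.gpg and the .sha256 file name are my own assumptions, not part of the workflow above. -a is the short form of --suffix-length.

```shell
#!/bin/sh
# Sketch: split for the trip, then rejoin and verify on the far end.
# "test.txt.gpg" mirrors the name in the text; here it's a hypothetical
# stand-in made of random bytes.
set -e

head -c 9500 /dev/urandom > test.txt.gpg
sha256sum test.txt.gpg > test.txt.gpg.sha256     # checksum travels with the pieces

# -d alone would mean numeric suffixes starting at 00; =1 starts at 01 instead.
# -a 2 is short for --suffix-length=2.
split --numeric-suffixes=1 -a 2 -b 1KB test.txt.gpg test.txt.gpg.

# On the receiving end: zero-padded suffixes sort correctly, so a glob
# feeds the pieces to cat in the right order.
rm test.txt.gpg
cat test.txt.gpg.?? > test.txt.gpg
sha256sum -c test.txt.gpg.sha256                 # prints "test.txt.gpg: OK"
```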

Oh, and one more thing: Where do we find split, like every nifty console tool available to us? In coreutils, of course. 😉

mpgtx: Slicing and dicing, mpegs of all varieties

Having already cruised past such heavyweights as mplayer, mencoder, handbrake, avidemux, inkscape and imagemagick makes me a little more comfortable approaching mpgtx.

[Screenshot: mpgtx (2014-01-25)]

A tool specifically for carving away at mpeg files of all varieties is not intimidating in itself.

Knowing full well that it’s a console-only tool, with nothing graphical aside from specific control characters … that might be daunting to some.

mpgtx wisely subordinates some obvious functions to quick mnemonics, taking its biggest functions and relegating them to ancillary “programs.” So mpgtx -s is the same as mpgsplit, and so forth.

I mention that only because I enjoy little conveniences like those.

Once you get used to how mpgtx represents ranges and times, it becomes a piece of cake to get it to split or join as you like.

Knowing some of the mechanics of a video file is important too, though. I had a lot of false starts with split video files until I used the -P flag, to preserve the metadata between the original and the chunk output.

Of course, the result still isn’t quite correct, but it helps get playback started.

mpgtx claims it can handle mp3 files too, and tagmp3 (supposedly the same as mpgtx -T) has a lot of the same functions as mp3rename and similar tools.

So what you get with mpgtx is a wide variety of tools that approach a wide variety of media files. Not a bad tool to have around. 😐

One final note: I feel somehow obligated to mention that the last posted update to mpgtx was in 2005.

Ordinarily I don’t mind if a program is out of date, even if it stretches back to the late 1980s.

But part of me wonders how well mpgtx is keeping up with newer file types and media standards, and whether that would be an issue with more recently encoded files.

Take care and keep backups, would be my advice. Not to be a scaremonger, just that prudence is the better part of valor.

mp3wrap: What has been rent asunder shall be forged anew

Mentioning mp3splt before mp3wrap was putting things out of order, in a manner of speaking.

After all, mp3wrap bundles a series of mp3 files as a single, playable, continuous mp3 file, while retaining all the tag data. Magical.

[Screenshot: mp3wrap (2014-01-20)]

And of course, mp3splt reverses that action, like we talked about earlier today.

mp3wrap handles itself nicely, keeps flags to a minimum, gives you plenty of information while it’s working, and keeps the concatenated file around the same size as a compressed file of the same info.

I’m not sure why a wrapped mp3 file would be preferable to split singles, but that shouldn’t perplex me. To each his own.

But what will keep me awake tonight is the question … could I wrap several wrapped files, and retain the individual file information? Can I wrap every song in my collection? Is there a mega-mp3wrap solution to all my music backup needs?

The mind boggles. 😯

mp3splt: A rare animal indeed

I can attest to having personally used mp3splt. Once. About three years ago.

[Screenshot: mp3splt (2014-01-20)]

In its graphical form. 😳

[Screenshot: mp3splt-gtk (2014-01-20)]

Yes, I know, it’s a huge failure on my part. I suppose I just should have lied and said I was a die-hard terminal-only fan. No one would have known I was lying. On the Internet, nobody knows you’re a dog.

But I’ll tell the truth this time, since it’s only a misdemeanor and not a felony to use mp3splt graphically.

The context may also be relevant. I asked a friend to rip a CD for me. She was not a terrific computer person, and she came back with a single, enormous bin-cue set of all the audio tracks in one lump.

I tried not to be disappointed since she had done a favor for me, and you can imagine the rest of the story. mp3splt came to the rescue, and the credits rolled.

In my defense, splitting an entire album out of one file is a rather tedious task for the command line, and so again, I feel no compunction about admitting it was easier to import and split the bin-cue that way.

And it may be that the GTK rendition is really the main tool here, and text-only mp3splt is just a backend tool, like we talked about with inkscape’s text-only mode, or maybe mplayer or lilypond.

These are all just tools, not religious icons. Use them or don’t. Blah blah blah Linux freedom blah blah blah. You know the score. 😉

hoz: Speedy, but with quirks

I’m not exactly flush with file splitters these days, but I have hoz on my list as a program to revisit.

hoz is almost 10 years old now, and it was not a newcomer even when I first found it four years ago. As best I can tell, it hasn’t been updated since, so it’s much the same program it was then.

[Screenshot: hoz (2013-11-24)]

The original file there is just a dump of /dev/urandom, sized to 512MB. hoz is splitting it into fourths, which oddly leaves a single zero-byte file at the end. I’m not sure if that’s a glitch in the software or just a side effect of my math.

hoz is fast and fairly straightforward; everything you need to know is in the help flags. I can tell you to cut with -c and paste back together with -p, and I’ve given away probably half of what hoz has there.

It is not without its irritations though. For one, hoz doesn’t understand human-readable sizes; hence the ungainly string of numbers in the command near the top, to get something close to 512MB divided by four.

That’s a major inconvenience considering I usually want to split files into dimensions that will fit a CD, or squeeze onto a flash drive.

Second, hoz’s output files don’t seem reattachable, except with hoz. I tried to cat the same series of files back into a new file, and got a very different md5sum as a result.

Meaning, if you’re looking for something to split files and send them on to someone else, they’ll likely need something compatible with hoz to get them back together.

On the other hand that could be a good thing, if you’re looking for an obscure format to further obfuscate split files.

Third, and this might just be my complaint, but I prefer zero-padded numbers. hoz just tacks an unpadded number on the end, which spatters the pieces out of sequence in a directory listing. Not huge, but it rankles my OCPD. 😈
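To show why padding matters in practice, a hypothetical sketch (raw.N and pad.NN are invented names, not hoz’s actual output): a shell glob sorts names character by character, so unpadded pieces rejoined with cat come out in the wrong order.

```shell
#!/bin/sh
# Sketch: why zero-padded suffixes matter when rejoining with a glob.
# raw.N and pad.NN are hypothetical names imitating unpadded and padded
# numbering; LC_ALL=C makes the glob's sort order predictable.
set -e
export LC_ALL=C

for i in 1 2 3 9 10 11; do
    printf '%s\n' "$i" > "raw.$i"                      # unpadded names
    printf '%s\n' "$i" > "pad.$(printf '%02d' "$i")"   # zero-padded names
done

cat raw.* | tr '\n' ' '; echo    # 1 10 11 2 3 9   <- out of sequence
cat pad.* | tr '\n' ' '; echo    # 1 2 3 9 10 11   <- in sequence
```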

That’s probably enough about hoz now. For my money, GNU split or lxsplit are better solutions, just because they seem to fill the three gaps I mentioned above. hoz is an option though, if any of the others fail to satisfy.

csplit: split, but with a little more control

Remember lxsplit? I mentioned GNU split when I brought it up back in May. split is good stuff — breaks apart a file based on size or count, and you can reattach it with cat.

But what if you don’t care about the size, but want to break files depending on the contents?

That’s where csplit comes in.

[Screenshot: csplit (2013-09-26)]

Another goodie from coreutils, csplit skims through the contents of a file, looks for lines that match a pattern (a regular expression, or simply a line number), and breaks the file apart wherever it finds one.

In the screenshot above, csplit is skimming through (yet another) file of random words, and breaking every time it finds the letter x … as many times as it takes. Cool, huh?

csplit is marvelously flexible; you can give it patterns, repeat those patterns, control the output filenames, numbering system … you name it.
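A minimal sketch along the lines of the screenshot, assuming GNU csplit (as far as I know, the {*} repeat and -z are GNU extensions); words.txt and the chunk. prefix are hypothetical.

```shell
#!/bin/sh
# Sketch: break a word list every time a line contains the letter x.
# "words.txt" and the "chunk." prefix are hypothetical; the options
# are GNU csplit's.
set -e
printf '%s\n' apple box cherry fox grape lynx mango > words.txt

# /x/  : split before each line matching the regex x
# {*}  : repeat the previous pattern as many times as it matches
# -f   : output prefix (instead of the default "xx")
# -n 2 : two-digit, zero-padded numbering
# -z   : drop any empty pieces
# -s   : don't print the byte count of each piece
csplit -s -z -f chunk. -n 2 words.txt '/x/' '{*}'

head chunk.*    # chunk.00 = apple, chunk.01 = box cherry, and so on
```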

csplit is really worth investigating. split on its own is a great tool, but csplit gives the same effect, with a little more control. 🙂

lxsplit: A file splitter for like-minded people

GNU split is the gold standard for splitting standard files at the console, I will admit that. What I don’t like about GNU split though, is that its default behavior is a bit esoteric.

Maybe you don’t mind if your file, painstakingly named to suit a pattern of dates, sequences and hostnames, is mangled viciously and ends up as xaa, xab, xac and so forth. 👿

Yes, I know. This probably hearkens back to the Unix of the 1970s, and we all know how perfect they were. 🙄

Look, here’s what I want: a splitter that keeps the basename, and just tacks on a dot and a series of zero-padded numbers, to keep things in sequence. Is that so much to ask?

[Screenshot: lxsplit (2013-04-28)]

Well look at that. I guess someone out there had the same idea as me.

That’s lxsplit, if you haven’t read the title of this post yet. 🙄 Very small, very light, very quick and with sane defaults.

Now I know you’re going to tell me how to manage split to get those results. I am going to cut you off at the pass and tell you I know how to manage split. I can read its help message.

I’m just pining for something quicker … something less obtuse. I am allowed to do that, you know. 😐
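For the record, if you’d rather make GNU split quicker than replace it, a hypothetical shell function can bake in the base-name-plus-dot-plus-padded-number scheme. The name dotsplit and the three-digit width are my own choices for illustration; I’m not claiming this matches lxsplit’s output exactly.

```shell
#!/bin/sh
# Sketch: a hypothetical "dotsplit" wrapper around GNU split that keeps
# the base name and appends a dot plus three zero-padded digits
# (file.001, file.002, ...). Name and width are assumptions.
set -e

dotsplit() {
    # $1 = file to split, $2 = piece size in split's syntax (e.g. 15M)
    split --numeric-suffixes=1 --suffix-length=3 -b "$2" -- "$1" "$1."
}

head -c 3000 /dev/urandom > photo.tar   # hypothetical 3,000-byte file
dotsplit photo.tar 1000                 # creates photo.tar.001 .. photo.tar.003
```

Rejoining is still just cat photo.tar.??? > photo.tar, since the padded suffixes sort in order.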

P.S.: As you can see in the screenshot, lxsplit will rejoin files too.