Splitting Large Text Files

by Chris Bose on June 15, 2011

Our focused web crawler produces large raw data text files of up to 150 GB. These have to be split before they can be opened in a text editor such as BBEdit, whose upper limit is 300 MB.

You can use a utility like Split and Concat, but it can only be run manually and has no batch facility.

I wanted to schedule split to run overnight, and found the ‘at’ command.

However, the ‘at’ command only schedules commands; a separate utility, ‘atrun’, executes the queued jobs.

By default, the ‘atrun’ utility is not enabled on Mac OS X, so any scheduled jobs just sit in the queue and never run. To enable ‘atrun’, execute the following:
sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.atrun.plist

Now you can use the ‘at’ command.

Thanks to DeveloperCoach for this tip.
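
With ‘atrun’ enabled, an overnight split could be queued along the following lines. This is only a sketch: the input path, the chunk prefix and the 2:00 am start time are assumptions for illustration, not the values used for the crawler data, and the SUBMIT switch is a hypothetical safety catch so that the script prints the job instead of queueing it unless asked to.

```shell
#!/bin/sh
# Sketch: queue a split job with at so that it runs overnight.
# INPUT and PREFIX are hypothetical paths; adjust them for your own data.
INPUT="${INPUT:-$HOME/crawl/raw_data.txt}"
PREFIX="${PREFIX:-$HOME/crawl/chunk_}"

# The command at will run later: 300 MB pieces named chunk_aa, chunk_ab, ...
JOB="split -b 300m \"$INPUT\" \"$PREFIX\""

if [ "${SUBMIT:-0}" = "1" ]; then
  # at reads the commands to run from standard input; 0200 means 2:00 am.
  echo "$JOB" | at 0200
  # List pending jobs to confirm that the job was queued.
  atq
else
  # Default: only show what would be queued.
  echo "would queue for 02:00: $JOB"
fi
```

Running the script with SUBMIT=1 set in the environment queues the job. The job number printed by atq can be passed to atrm to cancel it.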

To stop atrun from running, you can again change the launchd settings via launchctl. You would do this if you were not actively using ‘at’ and wanted to avoid extraneous disk access and ensure the computer sleeps properly.

sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.atrun.plist

Splitting a 27 GB file into 300 MB chunks took 16 minutes on my desktop, running in the background.
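
For reference, here is a small sketch of the kind of split call involved, run on a throwaway sample file so it can be tried safely. The file names, the chunk_ prefix and the 32 KB chunk size are stand-ins for illustration; for the real data the size would be 300m. Note that -b cuts at exact byte counts, so a line can end up divided between two pieces; split -l with a line count avoids that, at the cost of less even piece sizes.

```shell
#!/bin/sh
# Sketch: split a file into fixed-size pieces, then check nothing was lost.
# The sample file and sizes are stand-ins; for real crawler output the call
# would look more like: split -b 300m /path/to/raw_data.txt chunk_
set -e

# Work in a scratch directory (left in place afterwards for inspection).
WORKDIR=$(mktemp -d "${TMPDIR:-/tmp}/splitdemo.XXXXXX")
cd "$WORKDIR"

# Build a small sample file (5000 lines) to stand in for the raw data.
awk 'BEGIN { for (i = 1; i <= 5000; i++) print "record " i " from the crawler" }' > sample.txt

# -b sets the piece size in bytes (k and m suffixes are accepted);
# the pieces are named chunk_aa, chunk_ab, and so on.
split -b 32k sample.txt chunk_

# Joining the pieces in name order should reproduce the original exactly.
cat chunk_* > rejoined.txt
cmp sample.txt rejoined.txt && echo "pieces rejoin cleanly"

# Show the pieces and their sizes.
ls -l chunk_*
```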
