GNU/Linux / File management
From WhyNotWiki
Also covers disk management.
Aliases: File commands in GNU/Linux, File management on GNU/Linux
[edit] Finding files
[edit] Finding all files that aren't named __
Say you want a list of all files and directories that are not named .svn...
You might think that the command to do this would simply be:
> find . \( -type d -name .svn -prune \)
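But that does the exact opposite of what you want. Because the expression contains no action, find applies an implicit -print to the whole expression, and the expression is true only for the .svn directories being pruned. So even though you said -prune, the result is the equivalent of find . -type d -name .svn:
rails_root> find . \( -type d -name .svn -prune \)
./public/javascripts/.svn
./public/.svn
./public/stylesheets/.svn
./public/images/.svn
Instead, you need to "or" it with an explicit action, such as -print or -exec, so that the action applies only to the paths that were not pruned: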
rails_root> find . \( -type d -name .svn -prune \) -o -exec echo {} \;
rails_root> find . \( -type d -name .svn -prune \) -o -print
.
./public
./public/dispatch.cgi
./public/javascripts
./public/javascripts/prototype.js
./public/javascripts/effects.js
./public/javascripts/dragdrop.js
./public/javascripts/controls.js
./public/javascripts/application.js
./public/favicon.ico
./public/index.html
./public/404.html
./public/robots.txt
./public/500.html
./public/dispatch.fcgi
./public/stylesheets
./public/.htaccess
./public/images
./public/images/rails.png
./public/dispatch.rb
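To see both behaviours side by side without touching a real working copy, here is a small sketch that builds a throwaway tree first. The directory and file names are made up for the demo, and mktemp -d is assumed to be available.

```shell
# Build a throwaway tree (all names here are made up for the demo).
demo=$(mktemp -d)
mkdir -p "$demo/public/.svn" "$demo/public/images/.svn"
touch "$demo/public/index.html" "$demo/public/.svn/entries"

# No action given: the implicit -print applies to the whole expression,
# so only the pruned .svn directories are listed.
only_svn=$(cd "$demo" && find . \( -type d -name .svn -prune \) | LC_ALL=C sort)

# With "-o -print": pruned directories are skipped, everything else is printed.
without_svn=$(cd "$demo" && find . \( -type d -name .svn -prune \) -o -print | LC_ALL=C sort)

printf 'without an action:\n%s\n' "$only_svn"
printf 'with -o -print:\n%s\n' "$without_svn"
rm -rf "$demo"
```

LC_ALL=C is only there to make the sort order predictable.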
[edit] Finding files based on modified date
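Files changed in the last 24 hours (-ctime 0), ordered by time. Note that -ctime tests the inode status-change time and %AT prints the last access time; to work strictly with modification times, use -mtime and %TT instead. (nosvn is presumably a local alias that filters out .svn paths, like the grep -v svn used further down.)
$ sudo find . -ctime 0 -printf "%AT %p\n" | sort | nosvn
09:28:43 ./.bash_history
09:44:12 ./.psql_history
10:29:01 ./public_html/logs/db_errors.log
10:29:01 ./public_html/logs/frontend.log
10:29:01 ./public_html/logs/test.log
10:29:01 ./public_html/logs/tests.log
14:09:07 ./.viminfo
14:25:28 ./devNotInRepo.tgz
14:40:55 ./.lesshst
14:41:38 .
14:41:38 ./data/faxes/tmp_attach
14:41:38 ./data/faxes/to_send
14:41:39 ./bin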
Files modified today in the 12:00 hour.
$ sudo find ~anthony -ctime 0 -printf "%AT %p\n" | grep "^12:" | grep -v svn
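A variant that sticks to modification time throughout might look like the sketch below. It assumes GNU find (for -printf) and GNU touch (for -d '3 days ago'); the file names are invented for the demo.

```shell
demo=$(mktemp -d)
touch "$demo/new.log"
touch -d '3 days ago' "$demo/old.log"   # GNU touch assumed

# -mtime 0 matches files modified less than 24 hours ago;
# %T+ prints the modification date and time, so a plain sort orders by it.
recent=$(find "$demo" -type f -mtime 0 -printf '%T+ %p\n' | sort)
echo "$recent"
rm -rf "$demo"
```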
[edit] Find files larger than a certain minimum size
> # Find files bigger than 10,000 KB (du -m reports megabytes; the older --megabytes long option is not accepted by current GNU coreutils):
> find / -xdev -size +10000k -exec du -m {} \;
26 /var/lib/slocate/slocate.db
13 /var/lib/rpm/Packages
[edit] Find all files that have CRLF line endings
> find . \( -type d -name .svn -prune \) -o -exec file {} \; | grep CRLF
./init.rb: ASCII text, with CRLF line terminators
Now that you've found all of those files, what if you wanted to convert them to Unix format (LF)?
Unfortunately, you can't pipe the output of the previous command straight into another command that will convert them for you.
Why is this? Because the text that file prints after the filename (ASCII text, with CRLF line terminators, for instance) will itself be interpreted as filenames by any program we pipe this output into. Demonstration:
> find . \( -type d -name .svn -prune \) -o -exec file {} \; | grep CRLF | xargs -n1 echo
./init.rb:
ASCII
text,
with
CRLF
line
terminators
None of those are valid filenames (even the first one still carries its trailing colon). So it would do no good for us to construct this command:
> find . \( -type d -name .svn -prune \) -o -exec file {} \; | grep CRLF | xargs -n1 dos2unix
How would we construct a command like that which works, then?
Well, we could write a Ruby script to do it...
...
Or we could put a sed command in the command pipe chain...
> find . \( -type d -name .svn -prune \) -o -exec file {} \; | grep CRLF | sed -e 's/^\(.*\): .*$/\1/g' | xargs -n1 dos2unix
Or we could use another filter program that filters out just the filename part... (http://svn.tylerrick.com/public/shell/bin/filename)
> find . \( -type d -name .svn -prune \) -o -exec file {} \; | grep CRLF | filename | xargs -n1 dos2unix
Or we can just convert the files individually by hand:
> dos2unix init.rb
dos2unix: converting file init.rb to UNIX format ...
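Another way around the parsing problem is to avoid file altogether and let grep -l print bare filenames. This is a different technique from the pipelines above, sketched under a few assumptions: GNU grep (for -I, which skips binary files), GNU sed (for -i), and filenames without newlines. The sample files are invented for the demo.

```shell
demo=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$demo/init.rb"   # CRLF line endings
printf 'one\ntwo\n'     > "$demo/unix.rb"   # LF line endings
cr=$(printf '\r')                            # a literal carriage return

# List regular files outside .svn that contain a line ending in CR.
crlf_files=$(find "$demo" \( -type d -name .svn -prune \) -o -type f -exec grep -lI "$cr\$" {} +)
echo "$crlf_files"

# Strip the trailing CR from each of those files in place.
echo "$crlf_files" | while IFS= read -r f; do
  sed -i "s/$cr\$//" "$f"
done

# Nothing should match any more.
remaining=$(find "$demo" -type f -exec grep -lI "$cr\$" {} + || true)
rm -rf "$demo"
```

If dos2unix is installed, it can replace the sed line; the point is only that grep -l gives one clean filename per line.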
[edit] Finding all hard links to a file
http://erik.thauvin.net/wiki/pages/viewpage.action?pageId=67.
Hard-linked files are identified by the second field of a long listing. For example:
> ls -l
total 4
-rw-r--r-- 1 erik erik 1 Feb 16 02:07 test
> ln test test2
> ls -l
total 8
-rw-r--r-- 2 erik erik 1 Feb 16 02:07 test
-rw-r--r-- 2 erik erik 1 Feb 16 02:07 test2
Notice field 2 changed from 1 to 2, indicating that a hard link has been created. There are now 2 filenames pointing to the same data.
To list all hard-linked files in the current directory:
> find . ! -type d -links +1 -ls | sort -n
18587 4 -rw-r--r-- 2 erik erik 1 Feb 16 02:07 ./test
18587 4 -rw-r--r-- 2 erik erik 1 Feb 16 02:07 ./test2
Note: Use: ls -i to also list directories.
To find all hard links to the same inode:
> find / -xdev -inum 18587
/home/erik/test/test
/home/erik/test/test2
Note: Hard links have to be located on the same file system; use df . to determine the current filesystem.
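The same steps can be scripted so the inode number never has to be copied by hand. A minimal sketch, using a scratch directory and the file names from the example above:

```shell
demo=$(mktemp -d)
echo x > "$demo/test"
ln "$demo/test" "$demo/test2"

# Field 2 of a long listing is the hard-link count.
link_count=$(ls -l "$demo/test" | awk '{print $2}')

# Field 1 of ls -i is the inode number; find every name that shares it.
inode=$(ls -i "$demo/test" | awk '{print $1}')
names=$(find "$demo" -xdev -inum "$inode" | LC_ALL=C sort)

echo "links: $link_count"
echo "$names"
rm -rf "$demo"
```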
[edit] Copying files
[edit] cp : Copy a file or directory
[edit] ecp
http://www.nongnu.org/ecp/manual.html
ecp is intended to be an extended version (extended cp) of the GNU coreutils cp. What it does do is the following:
- display progress indication as it copies (on a per file basis)
- copy to and from ftps (only single files at the moment)
- works like mv with the proper command line switches enabled
The idea behind ecp is that wget, scp and cp can efficiently be combined into one program, unifying the syntax and usage of them all.
[edit] comparison of rsync and scp
| | rsync | scp |
|---|---|---|
| Speed | Faster | |
| Symlinks | skipped unless -l, --links (or -a, --archive), which copies them as symlinks; -L, --copy-links dereferences them | always dereferenced |
| Power/flexibility | Seems superior | |
| Default verbosity | no output at all | shows progress meter for every file copied |
| command source_dir dest_dir | Results in dest_dir/source_dir/ (copies source_dir and its contents) | Results in dest_dir/source_dir/ |
| command source_dir/ dest_dir | Results in dest_dir/ (copies contents of source_dir) | Results in dest_dir/source_dir/? (copies source_dir and its contents) |
| command source_dir/ dest_dir/source_dir | Results in dest_dir/source_dir/ | Results in dest_dir/source_dir/? |
http://www.zmonkey.org/blog/node/132.
The question came up today about relative speeds of scp, tar and rsync (the latter two using ssh as a transport mechanism). [...]
I set up a script to copy a directory 5 times from my laptop to a server on the same subnet. I routinely pull 3MB/s from that server (over wifi), so bandwidth wasn't an issue. I used /var/lib/dpkg as my source directory. It weighed in a 57MB and contained 6896 files. Because rsync will compare changes between source and destination, I made sure to nuke the directory off the server after every run.
| Method | scp | rsync+ssh | tar+ssh |
|---|---|---|---|
| Average Time | 269.75s | 33.6s | 24.43s |
| Bandwidth (mbps) | 1.69 | 13.57 | 18.66 |
The results are what I expected, at least as far as scp is concerned. It does not do well with large numbers of small files. It copied each file over completely before it started with the next one. Tar of course put the whole thing together and then shipped it off. Rsync read all the files first, then compared them to the server and then shipped them all in one go. Apparently there were some significant I/O savings to be had that way.
One other important item of note is that scp did not handle symlinks the way tar and rsync did. It dereferenced the symlink and copied the contents of that link rather than copying the link itself. That was a problem because I had picked some self-referential directories before I settled on /var/lib/dpkg.
For your reference, here are the commands I ran to test:
for i in 1 2 3 4 5; do time scp -qrp /var/lib/dpkg [server]:/tmp; ssh [server] rm -fr /tmp/dpkg; done
for i in 1 2 3 4 5; do time rsync -ae ssh /var/lib/dpkg [server]:/tmp; ssh [server] rm -fr /tmp/dpkg; done
for i in 1 2 3 4 5; do time tar -cf - /var/lib/dpkg |ssh [server] tar -C /tmp -xf - ; ssh [server] rm -fr /tmp/dpkg; done
[edit] rsync
This is a great way to copy files and whole directories from one host to another.
Directories:
$ rsync -a sourcehost:/home/blah/test ./
Permissions: It will log into sourcehost with the same username you are logged in as on the local host. You may have to enter a password. You can tell it to use a different username, like this:
$ rsync -a otheruser@sourcehost:/home/blah/dir ./
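If the source is a directory named without a trailing slash, rsync copies the directory itself. So if /home/blah/dir contained files a and b, the command above would create ./dir/a and ./dir/b. To copy only the contents of the directory into a destination of your choosing, put a trailing slash on the source:
$ rsync -a otheruser@sourcehost:/home/blah/dir/ ./dir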
One problem I've had with rsync is that it will say things like
skipping non-regular file ".htaccess"
It does that for symlinks (among other non-regular files) when neither -l (--links) nor -a (--archive) is given.
- To make it dereference the symlink and copy the file it points to, use -L (--copy-links).
- scp always dereferences symlinks and copies the target's contents.
man rsync
rsync -avz foo:src/bar /data/tmp
This would recursively transfer all files from the directory src/bar on the machine foo into the /data/tmp/bar directory on the local machine. The
files are transferred in “archive” mode, which ensures that symbolic links, devices, attributes, permissions, ownerships, etc. are preserved in
the transfer. Additionally, compression will be used to reduce the size of data portions of the transfer.
rsync -avz foo:src/bar/ /data/tmp
A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning “copy the contents of this directory” as opposed to “copy the directory by name”, but in both cases the attributes of the containing directory are transferred to the containing directory on the destination. In other words, each of the following commands copies the files in the same way, including their setting of the attributes of /dest/foo:
rsync -av /src/foo /dest
rsync -av /src/foo/ /dest/foo
man rsync
-a, --archive
This is equivalent to -rlptgoD. It is a quick way of saying you want recursion and want to preserve almost everything (with -H being a
notable omission). The only exception to the above equivalence is when --files-from is specified, in which case -r is not implied.
Note that -a does not preserve hardlinks, because finding multiply-linked files is expensive. You must separately specify -H.
-r, --recursive
This tells rsync to copy directories recursively. See also --dirs (-d).
-l, --links
When symlinks are encountered, recreate the symlink on the destination.
-p, --perms
This option causes the receiving rsync to set the destination permissions to be the same as the source permissions. (See also the --chmod
option for a way to modify what rsync considers to be the source permissions.)
-t, --times
This tells rsync to transfer modification times along with the files and update them on the remote system. Note that if this option is not used, the optimization that excludes files that have not been modified cannot be effective; in other words, a missing -t or -a will cause the next transfer to behave as if it used -I, causing all files to be updated (though the rsync algorithm will make the update fairly efficient if the files haven’t actually changed, you’re much better off using -t).
-o, --owner
This option causes rsync to set the owner of the destination file to be the same as the source file, but only if the receiving rsync is
being run as the super-user (see also the --super option to force rsync to attempt super-user activities). Without this option, the owner
is set to the invoking user on the receiving side.
The preservation of ownership will associate matching names by default, but may fall back to using the ID number in some circumstances (see
also the --numeric-ids option for a full discussion).
-g, --group
This option causes rsync to set the group of the destination file to be the same as the source file. If the receiving program is not running as the super-user (or if --no-super was specified), only groups that the invoking user on the receiving side is a member of will be preserved. Without this option, the group is set to the default group of the invoking user on the receiving side.
The preservation of group information will associate matching names by default, but may fall back to using the ID number in some circumstances (see also the --numeric-ids option for a full discussion).
-D The -D option is equivalent to --devices --specials.
--devices
This option causes rsync to transfer character and block device files to the remote system to recreate these devices. This option has no
effect if the receiving rsync is not run as the super-user and --super is not specified.
--specials
This option causes rsync to transfer special files such as named sockets and fifos.
man rsync
-v, --verbose
This option increases the amount of information you are given during the transfer. By default, rsync works silently. A single -v will give
you information about what files are being transferred and a brief summary at the end. Two -v flags will give you information on what files
are being skipped and slightly more information at the end. More than two -v flags should only be used if you are debugging rsync.
Note that the names of the transferred files that are output are done using a default --out-format of "%n%L", which tells you just the name
of the file and, if the item is a link, where it points. At the single -v level of verbosity, this does not mention when a file gets its
attributes changed. If you ask for an itemized list of changed attributes (either --itemize-changes or adding "%i" to the --out-format
setting), the output (on the client) increases to mention all items that are changed in any way. See the --out-format option for more
details.
-z, --compress
With this option, rsync compresses the file data as it is sent to the destination machine, which reduces the amount of data being transmitted — something that is useful over a slow connection.
Note that this option typically achieves better compression ratios than can be achieved by using a compressing remote shell or a compressing transport because it takes advantage of the implicit information in the matching data blocks that are not explicitly sent over the connection.
[edit] Exclude directories
It is not uncommon that I want to copy an entire directory tree -- except for some large subdirectories containing logs or temporary files or something. This is where the --exclude option comes in...
One thing to note is that you need to specify paths relative to your source path, so:
$ rsync -a ~/source/ --exclude db_backups dest/
, not:
$ rsync -a ~/source/ --exclude ~/source/db_backups dest/
[edit] scp
I think I like scp slightly better than rsync -- at least for smaller transfers.
By default it shows a progress meter for every file transferred. This can be really annoying for large transfers; the -q option turns the progress meter off.
Example:
scp -r ./dir target_host:target_dir/
If target_dir is not an absolute path, it is resolved relative to the remote user's home directory.
Osamu Aoki. Debian Reference Chapter 8 - Debian tips (http://www.debian.org/doc/manuals/reference/ch-tips.en.html).
8.3.1 Basic commands for copying a whole subdirectory
If you need to rearrange file structure, move content including file links by:
Standard method:
# cp -a /source/directory /dest/directory # requires GNU cp
# (cd /source/directory && tar cf - . ) | \
  (cd /dest/directory && tar xvfp - )
If a hard link is involved, a pedantic method is needed:
# cd /path/to/old/directory
# find . -depth -print0 | afio -p -xv -0a /mount/point/of/new/directory
If remote (the remote command is quoted so that the local shell does not try to parse the parentheses):
# (cd /source/directory && tar cf - . ) | \
  ssh user@host.dom "(cd /dest/directory && tar xvfp - )"
If there are no linked files:
# scp -pr user1@host1.dom:/source/directory \
  user2@host2.dom:/dest/directory
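The tar pipe is easy to try locally before pointing it at a remote host. A minimal sketch with invented paths, checking that both a regular file and a symlink survive the copy:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src/sub" "$demo/dst"
echo hello > "$demo/src/sub/file"
ln -s file "$demo/src/sub/link"

# Pack the source tree to stdout and unpack it in the destination;
# p asks tar to preserve permissions on extraction.
(cd "$demo/src" && tar cf - .) | (cd "$demo/dst" && tar xpf -)

copied=$(cat "$demo/dst/sub/file")
if [ -L "$demo/dst/sub/link" ]; then link_kept=yes; else link_kept=no; fi
echo "content: $copied, symlink preserved: $link_kept"
rm -rf "$demo"
```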
[edit] Links
[edit] Hard links
http://en.wikipedia.org/wiki/Hard_link.
...
There are some issues with hard links that can sometimes make them unsuitable. First of all, because the link is identical to the thing it points to, it becomes difficult to give a command such as "list all the contents of this directory recursively but ignore any links". [...] Another drawback of hard links is that they have to be located within the same file system, and most large systems today consist of multiple file systems.
Symbolic links and hard links (http://www.wellho.net/mouth/334_Symbolic-links-and-hard-links.html).
A Hard Link is where a file has two names which are both on an equal weighting, and both of the file names in the "inode table" point directly to the blocks on the disc that contain the data.
...
A Symbolic Link is where a file has one main name, but there's an extra entry in the file name table that refers any accesses back to the main name. This is slightly slower at runtime than a hard link, but it's more flexible and much more often used in day to day admin work.
...
[edit] ls command: list files
The man page doesn't tell you what each of the columns in a long (ls -l) listing mean, nor are there any headers above the columns.
http://docweb.cns.ufl.edu/docs/d0107/ar07s04.html.
File Type. The first position/character in this column describes what type of entry each horizontal line represents. The first character will usually be either a - or a d. Symbols in the first (left most) position on the line represent:
- = Regular File
d = Directory
You can tell, just by looking at the output from ls -l which entries are files and which are directories.
Permissions. The rest of the first column (after the - or d) indicates access permissions which have been set either explicitly, by you, using the chmod command (which we'll look at later) or automatically, by the system, when the file (or directory) was created. We'll discuss this part of the listing in more detail when we get to the chmod command, which deals with setting file access permissions.
Directory Entries (and Hard-Link Count). The next column (immediately to the right of the access permissions) is a number which tells how many directory entries are under that item. For a regular file, this will typically be 1. For a directory, this will always be at least 2. The reason for this is that every directory always contains pointers to both itself, and its parent directory. You can see these two entries as the first two items in our example listing in Figure 12: the entries for . and .. (i.e. one period, and two periods). The single period ( . ), is the pointer to the current directory--the directory that you are "in" right now. Two periods ( .. ), point to the parent directory--the directory which contains the current directory. You can use these as convenient nicknames/shortcuts in some commands, when you want to describe a relative pathname.
[edit] Renaming a batch of files according to a pattern
[edit] Example 1: Changing a prefix
Say you want to rename a bunch of files beginning with IMG_:
> ls IMG_*
IMG_4731.jpg IMG_4732.jpg IMG_4733.jpg ...
and you want to change the "IMG_" part to "Image_". How would you do it?
First you should preview what your command will do to make sure it looks like what you want (the variables are quoted so that names containing spaces survive):
> for file in IMG* ; do echo mv "$file" "$(echo "$file" | sed 's/IMG\(.*\)\.jpg/Image\1.jpg/')" ; done
mv IMG_4731.jpg Image_4731.jpg
mv IMG_4732.jpg Image_4732.jpg
mv IMG_4733.jpg Image_4733.jpg
...
Then you can go ahead and proceed with confidence:
> for file in IMG* ; do mv "$file" "$(echo "$file" | sed 's/IMG\(.*\)\.jpg/Image\1.jpg/')" ; done
> ls IMG*
ls: IMG*: No such file or directory
[edit] Example 1b
> for file in Button* ; do echo mv "$file" "$(echo "$file" | sed 's/Button\(.*\)/button\1/')" ; done
[edit] Example 2: Change the extension
(Source: http://lab.artlung.com/unix-batch-file-rename/)
# change .htm files to .html
for file in *.htm ; do mv "$file" "$(echo "$file" | sed 's/\(.*\.\)htm$/\1html/')" ; done
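For simple prefix and suffix changes like these, the shell's own parameter expansion can stand in for sed. This is an alternative to the loops above, not what the original examples use; the file names are made up and the work happens in a scratch directory.

```shell
demo=$(mktemp -d)
touch "$demo/IMG_4731.jpg" "$demo/IMG_4732.jpg" "$demo/page.htm"

(
  cd "$demo" || exit 1
  # ${file#IMG_} strips the leading "IMG_" from the name.
  for file in IMG_*; do
    mv -- "$file" "Image_${file#IMG_}"
  done
  # ${file%.htm} strips the trailing ".htm".
  for file in *.htm; do
    mv -- "$file" "${file%.htm}.html"
  done
)

renamed=$(ls "$demo" | LC_ALL=C sort)
echo "$renamed"
rm -rf "$demo"
```

Putting echo in front of mv gives the same kind of preview as before.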
[edit] Merging two directory trees
[edit] Problem: you want to merge/blend the directories, not just move/copy dir1 into dir2
Neither mv nor cp -R merges the directories this way; both put dir1 inside dir2:
> mkdir dir1; touch dir1/file1; mkdir dir2; touch dir2/file2
> mv dir1 dir2
> ls dir2
dir1 file2
> mkdir dir1; touch dir1/file1; mkdir dir2; touch dir2/file2
> cp -R dir1 dir2
> ls dir2
dir1 file2
[edit] "the slash dot trick"
or: "the slashdot trick"
Merging two directory trees should be doable using the cp command.
I believe there is a way of using ``cp -R'' so that it populates an
existing directory. For example, the GNU cp seems to do it when you
specify the directory like this:
cp -R source/. target
Then rather than copying, say, ``source/myfile'' to
``target/source/myfile'', it copies it to target/myfile. No
subdirectory is created. I have no idea how portable that trick is to
other cp's but it seems to be specified by The Single UNIX
Specification:
The cp utility will copy the contents of each source_file to the
destination path named by the concatenation of target, a slash
character and the last component of source_file
—Source unknown (a mailing list, I think)
> mkdir dir1; touch dir1/file1; mkdir dir2; touch dir2/file2
> cp -R dir1/. dir2
> ls dir2
file1 file2
Also works with hidden files:
> mkdir dir1; touch dir1/file1; touch dir1/.file2; mkdir dir2; touch dir2/file3
> cp -R dir1/. dir2
> ls -a dir2
. .. file1 .file2 file3
In Ruby (http://ruby-doc.org/core/classes/FileUtils.html#M004350)
# If you want to copy all contents of a directory instead of the
# directory itself, c.f. src/x -> dest/x, src/y -> dest/y,
# use following code.
FileUtils.cp_r 'src/.', 'dest' # cp_r('src', 'dest') makes dest/src,
# but this doesn't.
[edit] How do you move everything from one directory into another?
mv dir1/. dir2 doesn't work the same way cp -R dir1/. dir2 does:
> mkdir dir1; touch dir1/file1; mkdir dir2; touch dir2/file2
> mv dir1/. dir2
mv: cannot move `dir1/.' to `dir2/.': Device or resource busy
mv dir1/* works, but it doesn't include hidden files:
> mkdir dir1; touch dir1/file1; touch dir1/.file2; mkdir dir2; touch dir2/file3
> mv dir1/* dir2
> ls -a dir2
. .. file1 file3
This sort of works:
> mkdir dir1; touch dir1/file1; touch dir1/.file2; mkdir dir2; touch dir2/file3
> mv dir1/* dir2; mv dir1/.* dir2
mv: cannot move `dir1/.' to `dir2/.': Device or resource busy
mv: cannot remove `dir1/..': Is a directory
> ls -a dir2
. .. file1 file3
Perhaps the best solution: cp, then rm (joined with && so that dir1 is only removed if the copy succeeded)
> mkdir dir1; touch dir1/file1; touch dir1/.file2; mkdir dir2; touch dir2/file3
> cp -R dir1/. dir2 && rm -r dir1
> ls -a dir2
. .. file1 .file2 file3
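If you would rather move than copy, find can hand each top-level entry to mv, hidden files included. This is a sketch of an alternative, assuming a find that supports -mindepth and -maxdepth (GNU and BSD find do). Unlike cp -R dir1/. dir2, mv will refuse to merge into a non-empty subdirectory of the same name in dir2.

```shell
demo=$(mktemp -d)
mkdir "$demo/dir1" "$demo/dir2"
touch "$demo/dir1/file1" "$demo/dir1/.file2" "$demo/dir2/file3"

# -mindepth 1 skips dir1 itself; -maxdepth 1 stops find from descending,
# so each top-level entry (hidden or not) is moved exactly once.
find "$demo/dir1" -mindepth 1 -maxdepth 1 -exec mv {} "$demo/dir2/" \;

moved=$(ls -A "$demo/dir2" | LC_ALL=C sort)
left=$(ls -A "$demo/dir1")
echo "$moved"
rm -rf "$demo"
```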
[edit] Creating directories
Create all components of a specified path:
> mkdir -p ~/path/to/somewhere
or
> install -d ~/path/to/somewhere
> ls ~/path/to/somewhere
[edit] Checking space usage/availability
[edit] How big is this directory? du
$ du -sh
660M .
$ du -sh /home/*
5.5G /home/tyler
306M /home/harry
775M /home/jane
Which folders are the biggest?
$ du --summarize --block-size=1M * | sort --numeric-sort --reverse | head
737 dev
729 Xsvndumps
298 svn
194 Tyler Rick.zip
158 dump
82 temp
62 public_html
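The short-option form of the same pipeline can be tried on a scratch tree. The directory names and sizes below are invented; random data is used so the files take up real space on disk.

```shell
demo=$(mktemp -d)
mkdir "$demo/small" "$demo/big"
head -c 1048576 /dev/urandom > "$demo/small/data"   # 1 MiB
head -c 4194304 /dev/urandom > "$demo/big/data"     # 4 MiB

# -s: one total per argument, -k: sizes in kilobytes;
# sort -rn puts the largest first, head -n 1 keeps only that line.
largest=$(cd "$demo" && du -sk -- * | sort -rn | head -n 1 | awk '{print $2}')
echo "largest: $largest"
rm -rf "$demo"
```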
[edit] How much space is available on my disks? df
$ df
Filesystem   1K-blocks     Used Available Use% Mounted on
/dev/sda2       252240   135968    116272  54% /
/dev/sda3       501244   118712    382532  24% /root
/dev/sda5      3906996  3738920    168076  96% /usr
/dev/sda10    19538240 16768916   2769324  86% /home
/dev/sda7       501216    23628    477588   5% /tmp
/dev/sda6       983164   392932    590232  40% /var
/dev/sda8      2935060  1421504   1513556  49% /var/log
/dev/sda11     1481180    20320   1460860   2% /var/tmp
/dev/md0      35278456 33480668      5744 100% /home/userdata
none            452396        0    452396   0% /dev/shm
$ df --block-size=1M
[edit] Working with .tgz files
Un-tar-zip something:
tar -xzvf filename.tar.gz
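For completeness, here is a round trip through create, list and extract on a scratch directory. The project name and paths are made up for the demo.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/project/docs" "$demo/out"
echo hello > "$demo/project/docs/readme.txt"

# c = create, z = gzip, f = archive file; -C changes directory first
tar -czf "$demo/project.tgz" -C "$demo" project
# t = list the contents without extracting
listing=$(tar -tzf "$demo/project.tgz")
# x = extract, here into a different directory
tar -xzf "$demo/project.tgz" -C "$demo/out"

extracted=$(cat "$demo/out/project/docs/readme.txt")
echo "$listing"
rm -rf "$demo"
```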
[edit] Mounts / mounting / unmounting / file system table
[edit] fstab
http://en.wikipedia.org/wiki/Fstab
http://www.tuxfiles.org/linuxhelp/fstab.html
[edit] Problem: Nautilus complains that the file is an "executable text file".
2008-01-15 18:29
Credits: https://bugs.launchpad.net/ubuntu/+source/partman-basicfilesystems/+bug/78505
It gives you these 4 options: Run in Terminal, Display, Cancel, Run
One wonders why it thinks it is "executable". By all appearances, it's just a normal text file, not a script or anything fancy. The answer is that when mounting the NTFS drive, it used umask=007, which set all files to have the x (executable) permission bit set.
The desired behavior is to just have it Display the file in your associated text editor. Here's how:
I changed my /etc/fstab from this:
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=C20C380D0C37FB4D /media/sdb6 ntfs defaults,umask=007,gid=46 0 1
to this:
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=C20C380D0C37FB4D /media/sdb6 ntfs defaults,dmask=007,fmask=117,gid=46 0 1
And then remounted it:
$ sudo umount /media/sdb6
$ sudo mount -a
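The masks work by clearing bits from full permissions (777), which is why fmask=117 drops the x bit from files while dmask=007 leaves it on directories. A quick sketch of the arithmetic in shell (the variable names are just for illustration):

```shell
# Resulting permission bits = 0777 with the mask bits cleared.
dir_mode=$(printf '%o' $(( 0777 & ~0007 )))    # dmask=007
file_mode=$(printf '%o' $(( 0777 & ~0117 )))   # fmask=117
echo "directories: $dir_mode  files: $file_mode"
```

So directories end up as 770 (rwxrwx---) and files as 660 (rw-rw----), which Nautilus no longer treats as executable.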
[edit] Problem: Can't unmount: what to do if you get "device is busy" errors
2008-01-15 18:29
http://lkml.org/lkml/2006/1/12/450.
fuser -m /mnt/data will list the process ID of any process using the mount.
If you're feeling brave, fuser -mk will just kill them.
$ fuser -m /media/sdb6
/media/sdb6: 9901
$ ps 9901
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
tyler 9901 0.0 3.2 287408 67768 ? Sl 00:10 0:07 evince file:///home/tyler/Data/Books/....pdf
http://wiki.linuxquestions.org/wiki/Umount
http://www.idevelopment.info/data/Unix/General_UNIX/GENERAL_Troubleshootingthedeviceisbusy.shtml
