GNU/Linux / File management

From WhyNotWiki



Also covers disk management.

Aliases: File commands in GNU/Linux, File management on GNU/Linux


[edit] Finding files


[edit] Finding all files that aren't named __

Say you want a list of all files and directories that are not named .svn...

You might think that the command to do this would simply be:

> find . \( -type d -name .svn -prune \)

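But that does pretty much the exact opposite of what you want. When the expression contains no action, find applies an implicit -print to the whole expression, and this expression is true only for the .svn directories themselves (-prune always returns true). So even though you said -prune, the command behaves like find . -type d -name .svn: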

rails_root> find . \( -type d -name .svn -prune \)
./public/javascripts/.svn
./public/.svn
./public/stylesheets/.svn
./public/images/.svn

Weird.

Instead, you need to "or" (-o) the pruning test with an explicit action, such as -print or -exec, so that the action applies only to the entries that were not pruned:

rails_root> find . \( -type d -name .svn -prune \) -o -exec echo {} \;
rails_root> find . \( -type d -name .svn -prune \) -o -print
.
./public
./public/dispatch.cgi
./public/javascripts
./public/javascripts/prototype.js
./public/javascripts/effects.js
./public/javascripts/dragdrop.js
./public/javascripts/controls.js
./public/javascripts/application.js
./public/favicon.ico
./public/index.html
./public/404.html
./public/robots.txt
./public/500.html
./public/dispatch.fcgi
./public/stylesheets
./public/.htaccess
./public/images
./public/images/rails.png
./public/dispatch.rb
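
The difference between the two forms can be checked safely in a scratch directory. This is a minimal sketch, assuming GNU find; the directory layout and file names are made up for the illustration and are not from the session above:

```shell
# Build a throwaway tree (hypothetical names) so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/public/.svn" "$demo/public/images/.svn"
touch "$demo/public/index.html" "$demo/public/.svn/entries"

# No explicit action: find acts as if the WHOLE expression ended in -print,
# so only the pruned .svn directories are listed.
only_svn=$(cd "$demo" && find . \( -type d -name .svn -prune \))

# Explicit -print to the right of -o: everything except the pruned directories.
without_svn=$(cd "$demo" && find . \( -type d -name .svn -prune \) -o -print)

printf 'implicit -print:\n%s\n\nexplicit -o -print:\n%s\n' "$only_svn" "$without_svn"
rm -rf "$demo"
```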


[edit] Finding files based on modified date

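Files changed in the last 24 hours (-ctime 0), ordered by time of day. Note that -ctime tests the inode status-change time and %AT prints the last-access time; to test and print the modification time itself, use -mtime 0 and %TT. (nosvn is presumably a local helper that filters out .svn paths, like the grep -v svn used further down.)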

$ sudo find . -ctime 0 -printf "%AT %p\n" | sort | nosvn
09:28:43 ./.bash_history
09:44:12 ./.psql_history
10:29:01 ./public_html/logs/db_errors.log
10:29:01 ./public_html/logs/frontend.log
10:29:01 ./public_html/logs/test.log
10:29:01 ./public_html/logs/tests.log
14:09:07 ./.viminfo
14:25:28 ./devNotInRepo.tgz
14:40:55 ./.lesshst
14:41:38 .
14:41:38 ./data/faxes/tmp_attach
14:41:38 ./data/faxes/to_send
14:41:39 ./bin

Files modified today in the 12:00 hour.

$ sudo find ~anthony -ctime 0 -printf "%AT %p\n" | grep "^12:" | grep -v svn
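
If the modification time is what matters, something along these lines may be closer to the intent. It is a sketch that assumes GNU find, GNU touch and GNU date; the two log file names are invented for the demonstration:

```shell
# Scratch directory with one fresh file and one back-dated file (hypothetical names).
demo=$(mktemp -d)
touch "$demo/new.log"
touch -d '3 days ago' "$demo/old.log"

# -mtime 0: modified less than 24 hours ago; %TT prints the modification time.
recent=$(find "$demo" -type f -mtime 0 -printf '%TT %p\n' | sort)

# -newermt: modified since midnight today (date +%F prints YYYY-MM-DD).
today=$(find "$demo" -type f -newermt "$(date +%F)" -printf '%p\n')

printf '%s\n' "$recent"
rm -rf "$demo"
```

The first form matches "within the last 24 hours", the second matches "since the start of today", which is not the same thing early in the morning.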

[edit] Find files larger than a certain minimum size

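> # Find files bigger than 10,000 KB and show their sizes in MB
> # (du -m replaces the old --megabytes long option, which current GNU du no longer accepts):
> find / -xdev -size +10000k -exec du -m {} \;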
26      /var/lib/slocate/slocate.db
13      /var/lib/rpm/Packages

[edit] Find all files that have CRLF line endings

> find . \( -type d -name .svn -prune \) -o -exec file {} \; | grep CRLF
./init.rb: ASCII text, with CRLF line terminators

Now that you've found all of those files, what if you wanted to convert them to Unix format (LF)?

Unfortunately, you can't pipe the output of the previous command straight into another command that will convert the files for you.

Why is this? Because the text that file prints after the filename (ASCII text, with CRLF line terminators, for instance) will itself be interpreted as filenames by any program we pipe this output into. Demonstration:

> find . \( -type d -name .svn -prune \) -o -exec file {} \; | grep CRLF | xargs -n1 echo
./init.rb:
ASCII
text,
with
CRLF
line
terminators

None of those are valid filenames, actually. So it would do no good for us to construct this command:

> find . \( -type d -name .svn -prune \) -o -exec file {} \; | grep CRLF | xargs -n1 dos2unix

How, then, would we construct a command like that which actually works?

Well, we could write a Ruby script to do it...

...

Or we could put a sed command in the pipeline to strip everything after the filename...

> find . \( -type d -name .svn -prune \) -o -exec file {} \; | grep CRLF | sed -e 's/^\(.*\): .*$/\1/g' | xargs -n1 dos2unix

Or we could use another filter program that filters out just the filename part... (http://svn.tylerrick.com/public/shell/bin/filename)

> find . \( -type d -name .svn -prune \) -o -exec file {} \; | grep CRLF | filename | xargs -n1 dos2unix


Or we can just convert the files individually by hand:

> dos2unix init.rb
dos2unix: converting file init.rb to UNIX format ...
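
Another option is to avoid parsing the output of file altogether and let grep report which files contain a carriage return. The sketch below assumes GNU grep, xargs and sed; it strips the trailing CR with sed -i instead of dos2unix so that it runs even where dos2unix is not installed. The file names are made up:

```shell
# Scratch directory with one CRLF file and one LF file (hypothetical names).
demo=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$demo/init.rb"
printf 'one\ntwo\n'     > "$demo/unix.rb"

cr=$(printf '\r')   # a literal carriage return to search for

# grep -l prints only the names of matching files; -I skips binary files.
crlf_files=$(find "$demo" -type d -name .svn -prune -o -type f -exec grep -lI "$cr" {} +)
printf 'CRLF files:\n%s\n' "$crlf_files"

# Same search, NUL-separated (-Z), piped into a converter that removes the CR.
find "$demo" -type d -name .svn -prune -o -type f -exec grep -lIZ "$cr" {} + |
    xargs -0 -r sed -i 's/\r$//'

# Nothing should be left after the conversion.
remaining=$(find "$demo" -type f -exec grep -lI "$cr" {} +)
rm -rf "$demo"
```

Because the names travel NUL-separated, file names containing spaces survive the trip through xargs.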

[edit] Finding all hard links to a file

http://erik.thauvin.net/wiki/pages/viewpage.action?pageId=67. Retrieved on 2007-05-11 11:18.

The hard-link count of a file is shown in the second field of a long listing. For example:

> ls -l
total 4
-rw-r--r--    1 erik     erik            1 Feb 16 02:07 test
> ln test test2
> ls -l
total 8
-rw-r--r--    2 erik     erik            1 Feb 16 02:07 test
-rw-r--r--    2 erik     erik            1 Feb 16 02:07 test2

Notice that field 2 changed from 1 to 2, indicating that a hard link has been created. There are now 2 filenames pointing to the same data.

To list all hardlinked files in the current directory:

> find . ! -type d -links +1 -ls | sort -n
 18587    4 -rw-r--r--   2 erik     erik            1 Feb 16 02:07 ./test
 18587    4 -rw-r--r--   2 erik     erik            1 Feb 16 02:07 ./test2

Note: ls -i also prints inode numbers, and it covers directories as well.

To find all hard links to the same inode:

> find / -xdev -inum 18587
/home/erik/test/test
/home/erik/test/test2

Note: Hard links have to be located on the same file system; use df . to determine the current file system.
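
The same steps can be tried without looking up the inode number by hand. A small sketch, assuming GNU stat and GNU find (for stat -c and -samefile); the scratch files mirror the test/test2 names used above:

```shell
# Create a file and a second hard link to it in a scratch directory.
demo=$(mktemp -d)
echo x > "$demo/test"
ln "$demo/test" "$demo/test2"

links=$(stat -c %h "$demo/test")   # hard-link count (field 2 of ls -l)
inode=$(stat -c %i "$demo/test")   # inode number (what ls -i shows)

# Two equivalent searches: by inode number, or by "same file as this one".
by_inum=$(find "$demo" -xdev -inum "$inode" | sort)
by_samefile=$(find "$demo" -xdev -samefile "$demo/test" | sort)

printf 'links=%s inode=%s\n%s\n' "$links" "$inode" "$by_samefile"
rm -rf "$demo"
```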

 


[edit] Copying files

[edit] cp : Copy a file or directory

[edit] ecp

http://www.nongnu.org/ecp/manual.html

ecp is intended to be an extended version (extended cp) of the GNU coreutils cp. What it does do is the following:

  • display progress indication as it copies (on a per file basis)
  • copy to and from ftps (only single files at the moment)
  • works like mv with the proper command line switches enabled

The idea behind ecp is that wget, scp and cp can efficiently be combined into one program, unifying the syntax and usage of them all.

[edit] comparison of rsync and scp

Feature                                 | rsync                                                                                                  | scp
Speed                                   | Faster                                                                                                 |
Symlinks                                | recreated as symlinks with -l, --links (or -a, --archive); otherwise skipped; -L, --copy-links dereferences | always dereferenced
Power/flexibility                       | Seems superior                                                                                         |
Default verbosity                       | no output at all                                                                                       | shows progress meter for every file copied
command source_dir dest_dir             | Results in dest_dir/source_dir/ (copies source_dir and its contents)                                   | Results in dest_dir/source_dir/
command source_dir/ dest_dir            | Results in dest_dir/ (copies contents of source_dir)                                                   | Results in dest_dir/source_dir/? (copies source_dir and its contents)
command source_dir/ dest_dir/source_dir | Results in dest_dir/source_dir/                                                                        | Results in dest_dir/source_dir/?



http://www.zmonkey.org/blog/node/132. Retrieved on 2007-05-11 11:18.

The question came up today about relative speeds of scp, tar and rsync (the latter two using ssh as a transport mechanism). [...]

I set up a script to copy a directory 5 times from my laptop to a server on the same subnet. I routinely pull 3MB/s from that server (over wifi), so bandwidth wasn't an issue. I used /var/lib/dpkg as my source directory. It weighed in at 57MB and contained 6896 files. Because rsync will compare changes between source and destination, I made sure to nuke the directory off the server after every run.

Method:            scp  rsync+ssh   tar+ssh
Average Time:  269.75s      33.6s    24.43s
Bandwidth (mbps): 1.69      13.57     18.66

The results are what I expected, at least as far as scp is concerned. It does not do well with large numbers of small files. It copied each file over completely before it started with the next one. Tar of course put the whole thing together and then shipped it off. Rsync read all the files first, then compared them to the server and then shipped them all in one go. Apparently there were some significant I/O savings to be had that way.

One other important item of note is that scp did not handle symlinks the way tar and rsync did. It dereferenced the symlink and copied the contents of that link rather than copying the link itself. That was a problem because I had picked some self-referential directories before I settled on /var/lib/dpkg.

For your reference, here are the commands I ran to test:

for i in 1 2 3 4 5; do time scp -qrp /var/lib/dpkg [server]:/tmp; ssh [server] rm -fr /tmp/dpkg; done
for i in 1 2 3 4 5; do time rsync -ae ssh /var/lib/dpkg [server]:/tmp; ssh [server] rm -fr /tmp/dpkg; done
for i in 1 2 3 4 5; do time tar -cf - /var/lib/dpkg |ssh [server] tar -C /tmp -xf - ; ssh [server] rm -fr /tmp/dpkg; done


[edit] rsync

This is a great way to copy files and whole directories from one host to another.

Directories:

$ rsync -a sourcehost:/home/blah/test ./

Permissions: rsync logs into sourcehost with the username you are currently using on the local host. You may have to enter a password. You can tell it to use a different username, like this:

$ rsync -a otheruser@sourcehost:/home/blah/dir ./

If the source is a directory named without a trailing slash, rsync copies the directory itself, so the command above creates ./dir with the files a and b inside it. To copy only the contents of /home/blah/dir (for example into a local directory of your own choosing), put a trailing slash on the source:

$ rsync -a otheruser@sourcehost:/home/blah/dir/ ./dir

One problem I've had with rsync is that it will say things like

skipping non-regular file ".htaccess"

It only does that for symlinks (and other special files), and only when neither -l (--links) nor -a is given.

To make rsync dereference the symlink and copy the file it points to, use -L (--copy-links).
scp, by contrast, always dereferences symlinks when copying recursively.


man rsync

              rsync -avz foo:src/bar /data/tmp

       This would recursively transfer all files from the directory src/bar on the machine foo into the /data/tmp/bar directory on the local machine. The
       files are transferred in “archive” mode, which ensures that symbolic links, devices, attributes, permissions, ownerships, etc.  are  preserved  in
       the transfer.  Additionally, compression will be used to reduce the size of data portions of the transfer.

              rsync -avz foo:src/bar/ /data/tmp

       A  trailing  slash  on  the  source  changes this behavior to avoid creating an additional directory level at the destination.  You can think of a
       trailing / on a source as meaning “copy the contents of this directory” as opposed to “copy  the  directory  by  name”,  but  in  both  cases  the
       attributes of the containing directory are transferred to the containing directory on the destination.  In other words, each of the following com‐
       mands copies the files in the same way, including their setting of the attributes of /dest/foo:

              rsync -av /src/foo /dest
              rsync -av /src/foo/ /dest/foo

man rsync

       -a, --archive
              This  is  equivalent  to  -rlptgoD.  It is a quick way of saying you want recursion and want to preserve almost everything (with -H being a
              notable omission).  The only exception to the above equivalence is when --files-from is specified, in which case -r is not implied.

              Note that -a does not preserve hardlinks, because finding multiply-linked files is expensive.  You must separately specify -H.

       -r, --recursive
              This tells rsync to copy directories recursively.  See also --dirs (-d).

       -l, --links
              When symlinks are encountered, recreate the symlink on the destination.

       -p, --perms
              This  option causes the receiving rsync to set the destination permissions to be the same as the source permissions.  (See also the --chmod
              option for a way to modify what rsync considers to be the source permissions.)

       -t, --times
              This tells rsync to transfer modification times along with the files and update them on the remote system.  Note that if this option is not
              used, the optimization that excludes files that have not been modified cannot be effective; in other words, a missing -t or -a  will  cause
              the next transfer to behave as if it used -I, causing all files to be updated (though the rsync algorithm will make the update fairly effi‐
              cient if the files haven’t actually changed, you’re much better off using -t).

       -o, --owner
              This option causes rsync to set the owner of the destination file to be the same as the source file, but only if  the  receiving  rsync  is
              being  run as the super-user (see also the --super option to force rsync to attempt super-user activities).  Without this option, the owner
              is set to the invoking user on the receiving side.

              The preservation of ownership will associate matching names by default, but may fall back to using the ID number in some circumstances (see
              also the --numeric-ids option for a full discussion).
       -g, --group
              This  option causes rsync to set the group of the destination file to be the same as the source file.  If the receiving program is not run‐
              ning as the super-user (or if --no-super was specified), only groups that the invoking user on the receiving side is a member  of  will  be
              preserved.  Without this option, the group is set to the default group of the invoking user on the receiving side.

              The  preservation  of  group information will associate matching names by default, but may fall back to using the ID number in some circum‐
              stances (see also the --numeric-ids option for a full discussion).


       -D     The -D option is equivalent to --devices --specials.
       --devices
              This option causes rsync to transfer character and block device files to the remote system to recreate these devices.  This option  has  no
              effect if the receiving rsync is not run as the super-user and --super is not specified.
       --specials
              This option causes rsync to transfer special files such as named sockets and fifos.


man rsync

       -v, --verbose
              This option increases the amount of information you are given during the transfer.  By default, rsync works silently. A single -v will give
              you information about what files are being transferred and a brief summary at the end. Two -v flags will give you information on what files
              are being skipped and slightly more information at the end. More than two -v flags should only be used if you are debugging rsync.

              Note that the names of the transferred files that are output are done using a default --out-format of "%n%L", which tells you just the name
              of  the  file and, if the item is a link, where it points.  At the single -v level of verbosity, this does not mention when a file gets its
              attributes changed.  If you ask for an itemized list of changed attributes (either --itemize-changes or adding  "%i"  to  the  --out-format
              setting),  the  output  (on  the  client) increases to mention all items that are changed in any way.  See the --out-format option for more
              details.
       -z, --compress
              With this option, rsync compresses the file data as it is sent to the destination machine, which reduces the amount of data being transmit‐
              ted — something that is useful over a slow connection.

              Note that this option typically achieves better compression ratios than can be achieved by using a compressing remote shell or a  compress‐
              ing transport because it takes advantage of the implicit information in the matching data blocks that are not explicitly sent over the con‐
              nection.

[edit] Exclude directories

It is not uncommon that I want to copy an entire directory tree -- except for some large subdirectories containing logs or temporary files or something. This is where the --exclude option comes in...

One thing to note is that you need to specify paths relative to your source path, so:

$ rsync -a ~/source/ --exclude db_backups dest/

, not:

$ rsync -a ~/source/ --exclude ~/source/db_backups dest/
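
Both the trailing-slash rule and the relative exclude pattern can be tried locally without a remote host. This is a sketch only: the directory and file names are invented, and the block does nothing when rsync is not installed:

```shell
# Sandbox with hypothetical names; skipped entirely when rsync is missing.
if command -v rsync >/dev/null 2>&1; then
    demo=$(mktemp -d)
    mkdir -p "$demo/source/db_backups"
    touch "$demo/source/keep.txt" "$demo/source/db_backups/dump.sql"

    # Trailing slash on the source: copy the CONTENTS of source/ into dest/.
    # The exclude pattern is relative to the source, not an absolute path.
    rsync -a --exclude db_backups "$demo/source/" "$demo/dest/"
    with_slash=$(ls -A "$demo/dest")

    # No trailing slash: the directory itself is created inside dest2/.
    rsync -a "$demo/source" "$demo/dest2/"
    without_slash=$(ls -A "$demo/dest2")

    printf 'with slash: %s\nwithout slash: %s\n' "$with_slash" "$without_slash"
    rm -rf "$demo"
fi
```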

[edit] scp

I think I like scp slightly better than rsync -- at least for smaller transfers.

By default it shows a progress meter for every file transferred. This can be really annoying for large transfers; pass -q to turn it off.

Example:

scp -r ./dir target_host:target_dir/

If target_dir is not an absolute path, it is interpreted relative to the remote user's home directory.



Osamu Aoki. Debian Reference Chapter 8 - Debian tips (http://www.debian.org/doc/manuals/reference/ch-tips.en.html). Retrieved on 2007-05-11 11:18.

8.3.1 Basic commands for copying a whole subdirectory

If you need to rearrange file structure, move content including file links by:

     Standard method:
     # cp -a /source/directory /dest/directory # requires GNU cp

     # (cd /source/directory && tar cf - . ) | \
             (cd /dest/directory && tar xvfp - )

     If a hard link is involved, a pedantic method is needed:
     # cd /path/to/old/directory
     # find . -depth -print0 | afio -p -xv -0a /mount/point/of/new/directory


     If remote:
     # (cd /source/directory && tar cf - . ) | \
             ssh user@host.dom '(cd /dest/directory && tar xvfp - )'

     If there are no linked files:
     # scp -pr user1@host1.dom:/source/directory \
               user2@host2.dom:/dest/directory

[edit] Links

[edit] Hard links

http://en.wikipedia.org/wiki/Hard_link. Retrieved on 2007-05-11 11:18.

...

There are some issues with hard links that can sometimes make them unsuitable. First of all, because the link is identical to the thing it points to, it becomes difficult to give a command such as "list all the contents of this directory recursively but ignore any links". [...] Another drawback of hard links is that they have to be located within the same file system, and most large systems today consist of multiple file systems.


Symbolic links and hard links (http://www.wellho.net/mouth/334_Symbolic-links-and-hard-links.html). Retrieved on 2007-05-11 11:18.

A Hard Link is where a file has two names which are both on an equal weighting, and both of the file names in the "inode table" point directly to the blocks on the disc that contain the data.

...

A Symbolic Link is where a file has one main name, but there's an extra entry in the file name table that refers any accesses back to the main name. This is slightly slower at runtime than a hard link, but it's more flexible and much more often used in day to day admin work.

...

[edit] ls command: list files

The man page doesn't tell you what each of the columns in a long (ls -l) listing means, nor are there any headers above the columns.

http://docweb.cns.ufl.edu/docs/d0107/ar07s04.html. Retrieved on 2007-05-11 11:18.

File Type. The first position/character in this column describes what type of entry each horizontal line represents. The first character will usually be either a - or a d. Symbols in the first (left most) position on the line represent:

-  = Regular File
d  = Directory

You can tell, just by looking at the output from ls -l which entries are files and which are directories.

Permissions. The rest of the first column (after the - or d) indicates access permissions which have been set either explicitly, by you, using the chmod command (which we'll look at later) or automatically, by the system, when the file (or directory) was created. We'll discuss this part of the listing in more detail when we get to the chmod command, which deals with setting file access permissions.

Directory Entries (and Hard-Link Count). The next column (immediately to the right of the access permissions) is a number which tells how many directory entries are under that item. For a regular file, this will typically be 1. For a directory, this will always be at least 2. The reason for this is that every directory always contains pointers to both itself, and its parent directory. You can see these two entries as the first two items in our example listing in Figure 12: the entries for . and .. (i.e. one period, and two periods). The single period ( . ), is the pointer to the current directory--the directory that you are "in" right now. Two periods ( .. ), point to the parent directory--the directory which contains the current directory. You can use these as convenient nicknames/shortcuts in some commands, when you want to describe a relative pathname.

[edit] Renaming a batch of files according to a pattern

[edit] Example 1: Changing a prefix

Say you want to rename a bunch of files beginning with IMG_:

> ls IMG_*
IMG_4731.jpg
IMG_4732.jpg
IMG_4733.jpg
...

and you want to change the "IMG_" part to "Image_". How would you do it?

First you should preview what your command will do to make sure it looks like what you want:

> for file in IMG* ; do echo mv "$file" "$(echo "$file" | sed 's/IMG\(.*\)\.jpg/Image\1.jpg/')" ; done
mv IMG_4731.jpg Image_4731.jpg
mv IMG_4732.jpg Image_4732.jpg
mv IMG_4733.jpg Image_4733.jpg
...

Then you can go ahead and proceed with confidence:

> for file in IMG* ; do mv "$file" "$(echo "$file" | sed 's/IMG\(.*\)\.jpg/Image\1.jpg/')" ; done
> ls IMG*
ls: IMG*: No such file or directory

[edit] Example 1b

> for file in Button* ; do echo mv "$file" "$(echo "$file" | sed 's/Button\(.*\)/button\1/')" ; done

[edit] Example 2: Change the extension

(Source: http://lab.artlung.com/unix-batch-file-rename/)

# change .htm files to .html
for file in *.htm ; do mv "$file" "$(echo "$file" | sed 's/\(.*\.\)htm$/\1html/')" ; done
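
For simple prefix and suffix changes like these, the shell's own parameter expansion can stand in for sed. This is an alternative sketch, not the method used above; it runs in a scratch directory with made-up file names:

```shell
# Scratch directory with a few sample files (hypothetical names).
demo=$(mktemp -d)
touch "$demo/IMG_4731.jpg" "$demo/IMG_4732.jpg" "$demo/page.htm"

(
    cd "$demo" || exit 1

    # ${file#IMG_} removes the leading "IMG_", which is then replaced by "Image_".
    for file in IMG_*.jpg; do
        mv -- "$file" "Image_${file#IMG_}"
    done

    # ${file%.htm} removes the trailing ".htm", which is then replaced by ".html".
    for file in *.htm; do
        mv -- "$file" "${file%.htm}.html"
    done
)

renamed=$(ls "$demo")
printf '%s\n' "$renamed"
rm -rf "$demo"
```

As with the sed version, putting echo in front of mv gives a preview before anything is renamed.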

[edit] Merging two directory trees

[edit] Problem: you want to merge/blend the directories, not just move/copy dir1 into dir2

This doesn't work to merge directories:

> mkdir dir1; touch dir1/file1; mkdir dir2; touch dir2/file2
> mv dir1 dir2
> ls dir2
dir1  file2
> mkdir dir1; touch dir1/file1; mkdir dir2; touch dir2/file2
> cp -R dir1 dir2
> ls dir2
dir1  file2

[edit] "the slash dot trick"

or: "the slashdot trick"

Merging two directory trees should be doable using the cp command.
I believe there is a way of using ``cp -R'' so that it populates an
existing directory. For example, the GNU cp seems to do it when you
specify the directory like this:

  cp -R source/. target

Then rather than copying, say, ``source/myfile'' to
``target/source/myfile'', it copies it to target/myfile.  No
subdirectory is created.  I have no idea how portable that trick is to
other cp's but it seems to be specified by The Single UNIX
Specification:

    The cp utility will copy the contents of each source_file to the
    destination path named by the concatenation of target, a slash
    character and the last component of source_file

—Source unknown (a mailing list, I think)

> mkdir dir1; touch dir1/file1; mkdir dir2; touch dir2/file2
> cp -R dir1/. dir2
> ls dir2
file1  file2

Also works with hidden files:

> mkdir dir1; touch dir1/file1; touch dir1/.file2; mkdir dir2; touch dir2/file3
> cp -R dir1/. dir2
> ls -a dir2
.  ..  file1  .file2  file3

In Ruby (http://ruby-doc.org/core/classes/FileUtils.html#M004350)

  # If you want to copy all contents of a directory instead of the
  # directory itself, c.f. src/x -> dest/x, src/y -> dest/y,
  # use following code.
  FileUtils.cp_r 'src/.', 'dest'     # cp_r('src', 'dest') makes src/dest,
                                     # but this doesn't.

[edit] How do you move everything from one directory into another?

mv dir1/. dir2 doesn't work the same way cp -R dir1/. dir2 does:

> mkdir dir1; touch dir1/file1; mkdir dir2; touch dir2/file2
> mv dir1/. dir2
mv: cannot move `dir1/.' to `dir2/.': Device or resource busy

mv dir1/* works, but it doesn't include hidden files:

> mkdir dir1; touch dir1/file1; touch dir1/.file2; mkdir dir2; touch dir2/file3
> mv dir1/* dir2
> ls -a dir2
.  ..  file1  file3

This sort of works:

> mkdir dir1; touch dir1/file1; touch dir1/.file2; mkdir dir2; touch dir2/file3
> mv dir1/* dir2; mv dir1/.* dir2
mv: cannot move `dir1/.' to `dir2/.': Device or resource busy
mv: cannot remove `dir1/..': Is a directory
> ls -a dir2
.  ..  file1  file3

Perhaps the best solution: cp, then rm (using && so that dir1 is removed only if the copy succeeded)

> mkdir dir1; touch dir1/file1; touch dir1/.file2; mkdir dir2; touch dir2/file3
> cp -R dir1/. dir2 && rm -r dir1
> ls -a dir2
.  ..  file1  .file2  file3
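
If the files should really be moved rather than copied and deleted, find can hand every direct child of dir1, hidden or not, to mv. This is a sketch that assumes GNU find (-mindepth, -maxdepth) and GNU mv (-t); it was not part of the experiments above:

```shell
# Recreate the dir1/dir2 layout from the examples in a scratch directory.
demo=$(mktemp -d)
mkdir "$demo/dir1" "$demo/dir2"
touch "$demo/dir1/file1" "$demo/dir1/.file2" "$demo/dir2/file3"

# -mindepth 1 skips dir1 itself; -maxdepth 1 stops find from descending further.
# mv -t names the target directory first so that find can append the sources.
find "$demo/dir1" -mindepth 1 -maxdepth 1 -exec mv -t "$demo/dir2" -- {} +

# rmdir only succeeds on an empty directory, so it doubles as a check.
rmdir "$demo/dir1" && dir1_removed=yes

moved=$(ls -A "$demo/dir2" | LC_ALL=C sort)
printf '%s\n' "$moved"
rm -rf "$demo"
```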

[edit] Creating directories

Create all components of a specified path:

> mkdir -p ~/path/to/somewhere
or
> install -d ~/path/to/somewhere

> ls  ~/path/to/somewhere

[edit] Checking space usage/availability

[edit] How big is this directory? du

$ du -sh
660M    .
$ du -sh /home/*
5.5G    /home/tyler
306M    /home/harry
775M    /home/jane

Which folders are the biggest?

$ du --summarize --block-size=1M * | sort --numeric-sort --reverse | head

737     dev
729     Xsvndumps
298     svn
194     Tyler Rick.zip
158     dump
82      temp
62      public_html

[edit] How much space is available on my disks? df

$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2               252240    135968    116272  54% /
/dev/sda3               501244    118712    382532  24% /root
/dev/sda5              3906996   3738920    168076  96% /usr
/dev/sda10            19538240  16768916   2769324  86% /home
/dev/sda7               501216     23628    477588   5% /tmp
/dev/sda6               983164    392932    590232  40% /var
/dev/sda8              2935060   1421504   1513556  49% /var/log
/dev/sda11             1481180     20320   1460860   2% /var/tmp
/dev/md0              35278456  33480668      5744 100% /home/userdata
none                    452396         0    452396   0% /dev/shm

$ df --block-size=1M

[edit] Working with .tgz files

Un-tar-zip something:

tar -xzvf filename.tar.gz


[edit] Mounts / mounting / unmounting / file system table

[edit] fstab

http://en.wikipedia.org/wiki/Fstab

http://www.tuxfiles.org/linuxhelp/fstab.html

[edit] Problem: Nautilus complains that the file is an "executable text file".

2008-01-15 18:29

Credits: https://bugs.launchpad.net/ubuntu/+source/partman-basicfilesystems/+bug/78505

It gives you these 4 options: Run in Terminal, Display, Cancel, Run

One wonders why it thinks it is "executable". By all appearances, it's just a normal text file, not a script or anything fancy. The answer is that when mounting the NTFS drive, it used umask=007, which set all files to have the x (executable) permission bit set.

The desired behavior is to just have it Display the file in your associated text editor. Here's how:

I changed my /etc/fstab from this:

# <file system>                             <mount point>   <type>  <options>                           <dump>  <pass>
UUID=C20C380D0C37FB4D                       /media/sdb6     ntfs    defaults,umask=007,gid=46           0       1

to this:

# <file system>                             <mount point>   <type>  <options>                           <dump>  <pass>
UUID=C20C380D0C37FB4D                       /media/sdb6     ntfs    defaults,dmask=007,fmask=117,gid=46 0       1

And then remounted it:

$ sudo umount /media/sdb6
$ sudo mount -a
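
Why these particular numbers help can be worked out with a little shell arithmetic. The sketch assumes that the mask options clear bits from a full 0777 permission set, the way a umask does; perm_for_mask is a hypothetical helper written for this illustration:

```shell
# Permission bits that remain once the mask bits are cleared from 0777.
perm_for_mask() {
    printf '%03o\n' $(( 0777 & ~$1 ))
}

old_files=$(perm_for_mask 0007)   # umask=007 applied to files: x bits still set
new_dirs=$(perm_for_mask 0007)    # dmask=007: directories stay traversable
new_files=$(perm_for_mask 0117)   # fmask=117: x bits removed from files

printf 'umask=007 -> files %s\n' "$old_files"
printf 'dmask=007 -> dirs  %s\n' "$new_dirs"
printf 'fmask=117 -> files %s\n' "$new_files"
```

Directories keep their x bit, which is needed to enter them, while regular files lose it and are no longer treated as executable.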

[edit] Problem: Can't unmount: what to do if you get "device is busy" errors

2008-01-15 18:29

http://lkml.org/lkml/2006/1/12/450. Retrieved on 2007-05-11 11:18.

fuser -m /mnt/data will list the process ID of any process using the mount.

If you're feeling brave, fuser -mk /mnt/data will just kill them.

$ fuser -m /media/sdb6
/media/sdb6:          9901

$ ps 9901
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
tyler     9901  0.0  3.2 287408 67768 ?        Sl   00:10   0:07 evince file:///home/tyler/Data/Books/....pdf

http://wiki.linuxquestions.org/wiki/Umount

http://www.idevelopment.info/data/Unix/General_UNIX/GENERAL_Troubleshootingthedeviceisbusy.shtml
