Ls Command (Linux)

Options which I find most useful, and use often.

[1] -l : long format, displaying Unix file types, permissions, number of hard links, owner, group, size, last-modified date and filename

[2] -a : lists all files in the given directory, including those whose names start with “.” (which are hidden files in Unix). By default, these files are excluded from the list.

[3] -R : recursively lists subdirectories. The command ls -R / would therefore list all files on the system

[4] -h : print sizes in human readable format. (e.g., 1K, 234M, 2G, etc.) This option is not part of the POSIX standard, although implemented in several systems, e.g., GNU coreutils in 1997,[1] FreeBSD 4.5 in 2002,[2] and Solaris 9 in 2002.[3]
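As a sketch, combining these options in one go (the directory and file names below are made up for illustration):

```shell
# Combine the options above: long format (-l), all files including
# hidden ones (-a), human-readable sizes (-h)
dir=$(mktemp -d)
touch "$dir/.hidden" "$dir/visible"
ls -lah "$dir"
```

The hidden file shows up only because of -a; without it, dot-files are silently skipped.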

 

Find Command (Linux)

I often use the Find command to look for filenames / directories which match some substring.

Note: Unlike grep, Find only looks at filenames. It doesn’t look at the contents of the files themselves.

Here’s a nice little use of find for getting a breakdown of how many files are in each dir under your current dir:
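The one-liner itself seems to have gone missing from these notes; here is one way to do it (a reconstruction, not necessarily the original):

```shell
# Count regular files in each immediate subdirectory of the current
# directory, largest first (assumes at least one subdirectory exists)
for d in */ ; do
  echo "$(find "$d" -type f | wc -l) $d"
done | sort -rn
```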

 

Tips:

[1]  Find can be quite tricky when it comes to matching substrings. Unlike grep, it doesn't match substrings unless you spell them out precisely using wildcards.

e.g.  There is a difference between the two:

  • find . -name 'my*'
    • This searches in the current directory (represented by the dot character) and below it, for files and directories with names starting with my.
  • find . -name '*my*'
    • This searches in the current directory (represented by the dot character) and below it, for files and directories with names containing the substring my.
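A quick way to see the difference in action (the scratch files here are made up for illustration):

```shell
# Two files: one starting with "my", one merely containing "my"
cd "$(mktemp -d)"
touch myfile.txt notmyfile.txt
find . -name 'my*'     # matches ./myfile.txt only
find . -name '*my*'    # matches both files
```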

[2]  Find can be especially useful with the -exec option.  Another possibility is to use xargs with find.

One usage I have found really helpful: finding and copying a certain type of file from one directory to another.

  • e.g.

find . -iname "*.dll" -exec cp {} dlls_master_02272019/ \;

  • Apparently it can be risky to use either -exec or xargs with the find command; filenames containing spaces or shell metacharacters are a common pitfall.
  • Almost always there will be other ways to achieve the same thing. Do explore those before reaching for the -exec option or combining find with xargs.

[3]  Ignore case. By default find is case-sensitive. However, recent versions of GNU find have an -iname flag for case-insensitive name searches:

$ find . -iname '*udo*' -type d
./Scope.UDOs
./Scope.UDOs/UDO_Training

 

Examples:

[1] From current directory

find . -name 'my*'

This searches in the current directory (represented by the dot character) and below it, for files and directories with names starting with my. The quotes avoid the shell expansion — without them the shell would replace my* with the list of files whose names begin with my in the current directory. In newer versions of the program, the directory may be omitted, and it will imply the current directory.

 

[2] Files only

find . -name 'my*' -type f

This limits the results of the above search to regular files only, therefore excluding directories, special files, pipes, symbolic links, etc. my* is enclosed in single quotes (apostrophes) as otherwise the shell would replace it with the list of files in the current directory starting with my.

 

 

[3] Search several directories

find local /tmp -name mydir -type d -print

This searches for directories named mydir in the local subdirectory of the current working directory and the /tmp directory.

 


Grep Command (Linux)

Pro-Tip:

The other day, I needed to look for a string recursively in my directory structure, but only in files with a .py extension (because there were other subfolders containing images etc., which I wanted to ignore).

  • It seems grep has an option for that : --include
  •  https://stackoverflow.com/questions/12516937/grep-but-only-certain-file-extensions
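A sketch of what that looks like (the pattern and directory layout here are made up):

```shell
# Recursive search, but only inside *.py files; the .txt hit is skipped
cd "$(mktemp -d)"
mkdir -p src images
echo 'import os' > src/app.py
echo 'import os' > images/readme.txt
grep -rn --include='*.py' 'import os' .
```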

Options I find useful:

[1]  -r : Read all files under each directory, recursively, following symbolic links only if they are on the command line

[2] -i : Ignore case distinctions in both the PATTERN and the input files

[3] -n : Prefix each line of output with the 1-based line number within its input file

[4] -a : Process a binary file as if it were text

[5] -l : Suppress normal output; instead print the name of each input file from which output would normally have been printed.

[6] -h :  Suppress the prefixing of file names on output.  This is the default when there is only one file (or only standard input) to search

Example using a zip directory:

$ unzip -l _RawData.zip

$ unzip _RawData.zip -d _RawData

$ grep -rinal India _RawData

$ grep India _RawData/900.csv _RawData/924.csv

Note: here I am searching for lines containing the substring India in just these two files.

Tips :

[1]   For BSD or GNU grep you can use -B num to set how many lines before the match and -A num for the number of lines after the match.

  • grep -B 3 -A 2 foo README.txt

If you want the same number of lines before and after you can use -C num.

  • grep -C 3 foo README.txt

This will show 3 lines before and 3 lines after.

[2]  egrep : equivalent to grep -E; the pattern is treated as an extended regular expression, so +, ? and | work without backslash-escaping.
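A quick sketch of extended-regex matching with grep -E (the egrep spelling behaves the same):

```shell
# Alternation with | needs no backslashes in extended-regex mode
printf 'cat\ndog\nbird\n' | grep -E 'cat|dog'
```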

 


Zip / Unzip / Tar Command (Linux)

I often end up getting zipped files which I am supposed to crack open and look into. Here are some tips for working with compression in Linux.

ZIP:

[1]  Use the -l option to see the files an archive contains.

$ unzip -l _RawData.zip

[2]  -d  to unzip into a new directory.

$ unzip _RawData.zip -d _RawData

[3]  To zip a directory recursively, use the -r option.

$ zip -r _NewData.zip _RawData

Note: looking at the sizes, I found zip does quite a good job.

$ ls -lh _NewData.zip
-rwxrwx---+ 1 agoswami Domain Users 109M Dec 16 13:10 _NewData.zip

$ ls -lh _RawData
total 1.2G

[4]  Zipping up all folders containing the word "azure", along with their contents:

zip -r azpfg.zip *azure*

To zip urllib along with all azure folders, I do :

agoswami@agoswami-msft2 /cygdrive/c/Users/agoswami/AppData/Local/Continuum/Anaconda/Lib/site-packages
$ zip -r azurllib3pkg.zip *azure* urllib3

 

TAR.GZ 

Note : For various reasons, some of which hearken back to the era of tape drives, Unix uses a program named tar to archive data, which can then be compressed with a compression program like gzip, bzip2, 7zip, etc.

In order to “zip” a directory, the correct command would be

$ tar -zcvf _rawdata.tar.gz _RawData

$ ls -lh

-rwxrwxr--+ 1 agoswami Domain Users 109M Dec 16 15:43 _rawdata.tar.gz

This tells tar to c (create) an archive from the files in the directory (tar is recursive by default), compress it using the z (gzip) algorithm, store the output as a f (file) named _rawdata.tar.gz, and v (verbosely) list all the files it adds to the archive.

To decompress and unpack the archive into the current directory you would use

$ tar -zxvf _rawdata.tar.gz
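A round-trip sketch, including the -C option for unpacking somewhere other than the current directory (all names here are scratch examples):

```shell
# Create, list, and extract an archive; -C chooses the extraction dir
cd "$(mktemp -d)"
mkdir -p data dest
echo hi > data/file.txt
tar -zcf data.tar.gz data       # create (quietly, no -v)
tar -tzf data.tar.gz            # t lists contents without extracting
tar -zxf data.tar.gz -C dest    # unpack into dest/ instead of .
```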

 

GZ

gzip is a utility to compress / decompress individual files

$ ls -lh latency.txt
-rwxrwx---+ 1 agoswami Domain Users 3.0K Nov 22 20:07 latency.txt

$ gzip latency.txt

$ ls -lh latency.txt.gz
-rwxrwx---+ 1 agoswami Domain Users 914 Nov 22 20:07 latency.txt.gz

To decompress : -d

$ gzip -d latency.txt.gz

To keep the original file : -k

$ gzip -d -k rcv1.test.raw.txt.gz
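Related sketch: zcat lets you peek at a compressed file without decompressing it on disk (the file name below is made up):

```shell
# Compress, keep the original with -k, then read the .gz directly
cd "$(mktemp -d)"
echo hello > note.txt
gzip -k note.txt     # leaves both note.txt and note.txt.gz behind
zcat note.txt.gz     # prints the contents without touching the .gz
```

Note that -k needs a reasonably recent gzip (1.6+); on older versions, gzip always replaces the original.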