A Unix Utility You Should Know About: Pipe Viewer

Hi all. I'm starting yet another article series here. This one is going to be about Unix utilities that you should know about. The articles will discuss one Unix program at a time. I'll try to write a good introduction to the tool and give as many examples
as I can think of.

Before I start, I want to clarify one thing - Why am I starting so many article series? The answer is that I want to write about many topics simultaneously and switch between them as I feel inspired.

The first post in this series is about a not so well-known Unix program called Pipe Viewer, or pv for short. Pipe Viewer is a terminal-based tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion.

Update: French translation available.

Pipe Viewer was written by Andrew Wood, an experienced Unix sysadmin. The homepage of the pv utility is here: pv utility.

If you feel like you are interested in this stuff, I suggest that you subscribe to
my rss feed to receive my future posts automatically.

How to use pv?

Ok, let's start with some really easy examples and progress to more complicated ones.

Suppose that you have a file "access.log" that is a few gigabytes in size and contains web logs. You want to compress it into a smaller file, let's say a gzip archive (.gz). The obvious way would be to do:

$ gzip -c access.log > access.log.gz

As the file is so huge (several gigabytes), you have no idea how long to wait. Will it finish soon? Or will it take another 30 mins?

By using pv you can watch the progress and get a running estimate of how long it will take. Take a look at doing the same through pv:

$ pv access.log | gzip > access.log.gz
611MB 0:00:11 [58.3MB/s] [=>      ] 15% ETA 0:00:59

Pipe Viewer acts as "cat" here, except it also adds a progress bar. We can see that 611MB of data went through to gzip in 11 seconds. That is 15% of all the data, and pv estimates it will take about 59 more seconds to finish.
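
The ETA can be roughly reproduced from the other numbers on that line. The following is only a back-of-the-envelope sketch, not necessarily how pv computes it internally. The helper name eta_seconds is made up for this illustration, and since the 15% is already rounded, the result is approximate.

```shell
# Hypothetical helper: estimate the remaining seconds from the numbers pv prints.
# Arguments: MB transferred so far, percent complete, current rate in MB/s.
eta_seconds() {
  awk -v sent="$1" -v pct="$2" -v rate="$3" 'BEGIN {
    total = sent * 100 / pct       # estimated total size in MB
    remaining = total - sent       # MB still to go
    printf "%.0f\n", remaining / rate
  }'
}

eta_seconds 611 15 58.3   # numbers from the pv output line above; prints 59
```

In other words, the estimate is simply the data still to go divided by the current speed, so it changes whenever the speed does.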

You may stick several pv processes into the same pipeline. For example, you can measure how fast the data is being read from the disk and how much data gzip is outputting:

$ pv -cN source access.log | gzip | pv -cN gzip > access.log.gz
source:  760MB 0:00:15 [37.4MB/s] [=>     ] 19% ETA 0:01:02
  gzip: 34.5MB 0:00:15 [1.74MB/s] [  <=>  ]

Here we specified the "-N" parameter to pv to create a named stream. The "-c" parameter makes sure the output is not garbled by one pv process writing over the other.

This example shows that the "access.log" file is being read at a speed of 37.4MB/s but gzip is writing data at only 1.74MB/s. We can immediately estimate the compression ratio. It's 37.4/1.74, or roughly 21x!
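
The same division can be done in the shell. This is a small sketch; ratio is a made-up helper name, and the inputs are the figures from the pv output above. The speeds are momentary readings, so the running totals (760MB in, 34.5MB out) give a slightly different answer.

```shell
# Hypothetical helper: divide the uncompressed figure by the compressed one.
ratio() {
  awk -v o="$1" -v c="$2" 'BEGIN { printf "%.1fx\n", o / c }'
}

ratio 37.4 1.74   # current speeds from the output above; prints 21.5x
ratio 760 34.5    # running totals from the same output; prints 22.0x

# Once the job is done, the real file sizes give the final ratio:
#   ratio "$(wc -c < access.log)" "$(wc -c < access.log.gz)"
```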

Notice how the gzip line does not show how much data is left or how soon it will finish. That's because the pv process after gzip has no idea how much data gzip will produce (it's just passing along the compressed data from the input stream). The first pv process, however, knows how much data is left, because it's reading the file itself.

Another similar example would be to pack the whole directory of files into a compressed tarball:

$ tar -czf - . | pv > out.tgz
 117MB 0:00:55 [2.7MB/s] [>         ]

In this example pv shows just the output rate of the "tar -czf" command. That's not very interesting, and it does not tell us how much data is left. To get that, we need to tell pv the total size of the data we are tarring. It's done this way:

$ tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz
 253MB 0:00:05 [46.7MB/s] [>     ]  1% ETA 0:04:49

What happens here is that we tell tar to create ("-c") an archive of all files in the current dir "." (recursively) and write the data to stdout ("-f -"). Next we pass pv the total size of all files in the current dir with "-s". The "du -sb . | awk '{print $1}'" command returns the number of bytes in the current dir, and it gets fed as the "-s" parameter to pv. Finally we gzip the whole stream and write the result to the out.tgz file. This way pv knows how much data is still left to be processed and shows us that it will take about another 4 mins 49 secs to finish.
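
If you do this often, the pipeline can be wrapped in a couple of shell functions. This is just a sketch: dir_bytes and tar_with_progress are names I made up, and the "-b" option assumes GNU du (the du on BSD and Mac OS X does not have it).

```shell
# Hypothetical helper: total size of a directory in bytes.
# Assumes GNU du; the -b flag is not available in BSD/macOS du.
dir_bytes() {
  du -sb "$1" | awk '{print $1}'
}

# Hypothetical wrapper around the pipeline above: tar a directory to stdout,
# show progress with pv, and gzip the stream into the output file.
tar_with_progress() {
  tar -cf - -C "$1" . | pv -s "$(dir_bytes "$1")" | gzip > "$2"
}

# Example: tar_with_progress /path/to/dir out.tgz
```

Expect the percentage to be slightly off, because the tar stream also contains per-file headers and padding that du does not count.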

Another fine example is copying large amounts of data over the network with the help of the "nc" utility, which I will write about some other time.

Suppose you have two computers A and B. You want to transfer a directory from A to B very quickly. A very fast way is to use tar and nc, and to time the operation with pv.

# on computer A, with IP address 192.168.1.100
$ tar -cf - /path/to/dir | pv | nc -l -p 6666 -q 5
# on computer B
$ nc 192.168.1.100 6666 | pv | tar -xf -

That's it. All the files in /path/to/dir on computer A will get transferred to computer B, and you'll be able to see how fast the operation is going.

If you want the progress bar, you have to do the "pv -s $(...)" trick from the previous example (only on computer A).

Another fun example comes from my blog reader alexandru. He shows how to measure how fast the computer reads from /dev/zero:

$ pv /dev/zero > /dev/null
 157GB 0:00:38 [4,17GB/s]

That's about it. I hope you enjoyed my examples and learned something new. I love explaining things and teaching! :)

How to install pv?

If you're on Debian or a Debian-based system such as Ubuntu, do the following:

$ sudo aptitude install pv

If you're on Fedora or a Fedora-based system such as CentOS, do:

$ sudo yum install pv

If you're on Slackware, go to the pv homepage, download the pv-version.tar.gz archive and do:

$ tar -zxf pv-version.tar.gz
$ cd pv-version
$ ./configure && sudo make install

If you're a Mac user (with MacPorts installed):

$ sudo port install pv

If you're an OpenSolaris user:

$ pfexec pkg install pv

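If you're a Windows user on Cygwin, download and unpack the source archive as in the Slackware instructions, then run the following from the pv-version directory: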

$ ./configure
$ export DESTDIR=/cygdrive/c/cygwin
$ make
$ make install
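
Whichever method you used, a quick check can confirm that pv ended up on your PATH. This is a small sketch; check_pv is a made-up function name.

```shell
# Print the installed pv version, or a hint if pv is not on the PATH.
check_pv() {
  if command -v pv >/dev/null 2>&1; then
    pv --version | head -n 1
  else
    echo "pv is not installed"
  fi
}

check_pv
```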

The manual of the utility can be found here: man pv.

Have fun measuring your pipes with pv, and until next time!

A question to my readers: what other not so well known Unix utilities do you use and/or know about?
