If I have a list of filenames (with paths), what's the best way to work out what package most likely provided them?
This will be offline, so I can't use dpkg -S etc.
The original O/S is Raspbian if that helps.
(Background: I'm trying to look at SD card backups and work out what packages were installed without booting them.)
dpkg -S doesn't use the network.
S
On 05/07/2019, Mark Rogers mark@more-solutions.co.uk wrote:
(Background: I'm trying to look at SD card backups and work out what packages were installed without booting them.)
-- Mark Rogers // More Solutions Ltd (Peterborough Office) // 0844 251 1450 Registered in England (0456 0902) 21 Drakes Mews, Milton Keynes, MK8 0ER
main@lists.alug.org.uk http://www.alug.org.uk/ https://lists.alug.org.uk/mailman/listinfo/main Unsubscribe? See message headers or the web site above!
On Fri, 5 Jul 2019 at 14:18, Steve Mynott steve.mynott@gmail.com wrote:
dpkg -S doesn't use the network.
Sorry, my question was badly worded.
By offline I mean that I won't have the system booted, but will be looking at a backup of the files on a different computer.
(Eg: Remove SD card from Raspberry Pi, insert into my desktop (Ubuntu), and look at the files on the SD card to determine what packages were installed.)
The actual use case is to compare tarballs taken from multiple Pis to see what changes between archives. On a complete OS backup a lot of files will have changed, but the vast majority of them are OS files that can be accounted for by a handful of packages. So for example I don't want to know that one backup has /usr/bin/411toppm, /usr/bin/anytopnm, /usr/bin/asciitopgm, etc in it but the other doesn't; I want to say that the first backup had netpbm installed but the second didn't. (netpbm is just one example: its .deb alone comprises 486 files, so installing it as one unit creates 486 file differences that can be accounted for by that one package install; scale that up to an entire O/S...)
Since I've expanded on this I'll add: ideally I want to do as much as possible from just the file listing that I get from tar -jt to avoid extracting multi-GB files to do the comparison. Hence specifically wanting to map "/usr/bin/411toppm" to "netpbm" by the filename/path alone.
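As a rough illustration of the starting point described above, here is a minimal Python sketch that compares two file listings taken straight from the tarballs, without extracting anything to disk. The function names and the tarball names in the usage note are made up for the example; the only library used is the standard tarfile module.

```python
import tarfile


def normalise(name):
    """Strip the leading './' or '/' that tar may put on member names."""
    while name.startswith("./"):
        name = name[2:]
    return name.lstrip("/")


def tar_listing(path):
    """Return the set of non-directory member names in a tarball."""
    # Mode "r:*" lets tarfile detect bzip2/gzip/xz compression itself.
    with tarfile.open(path, "r:*") as tar:
        return {normalise(m.name) for m in tar if not m.isdir()}


def compare_listings(old, new):
    """Return (removed, added) paths between two listings."""
    return sorted(old - new), sorted(new - old)
```

Usage would be something like compare_listings(tar_listing("pi-a.tar.bz2"), tar_listing("pi-b.tar.bz2")), with both file names being placeholders. This only gives the raw per-file differences; it does nothing yet to group them by package.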
Ah that makes sense.
I did a simple "strace kg -S /usr/bin/411toppm|grep open" and looked at some of the paths and came up with
basename `grep -rl "^/usr/bin/411toppm$" /var/lib/dpkg/info` .list
I've tended to use rsync -n to compare directories of files and a scripting language (or go) is probably a better tool than shell.
S
strace dpkg -S I mean
-- Steve Mynott steve.mynott@gmail.com cv25519/ECF8B611205B447E091246AF959E3D6197190DD5
On Sat, 6 Jul 2019 at 12:32, Steve Mynott steve.mynott@gmail.com wrote:
I did a simple "strace dpkg -S /usr/bin/411toppm|grep open" and looked at some of the paths and came up with
basename `grep -rl "^/usr/bin/411toppm$" /var/lib/dpkg/info` .list
Interesting, thanks. If I understand correctly this would only work on the live system (or at least a similar system with the same packages installed) but I could, at a push, look at the files in the var/lib/dpkg/info directory in the tarball. Whilst that would be slow if done for each file separately, I could just extract and parse all the .list files in that directory (of the tarball) up front which would be pretty workable.
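A sketch of the "parse all the .list files up front" idea, assuming the tarball stores the dpkg database under var/lib/dpkg/info/ (with or without a leading "./"). The function names are invented for illustration, and dropping the ":arch" suffix from names like "libc6:armhf.list" is a simplifying assumption rather than anything dpkg requires.

```python
import io
import posixpath
import tarfile

INFO_DIR = "var/lib/dpkg/info/"


def load_dpkg_lists(tar):
    """Map each package name to the set of paths in its .list file.

    `tar` is an open tarfile.TarFile; only the small .list members are read.
    """
    packages = {}
    for member in tar:
        name = member.name
        if name.startswith("./"):
            name = name[2:]
        name = name.lstrip("/")
        if not (member.isfile() and name.startswith(INFO_DIR)
                and name.endswith(".list")):
            continue
        # "netpbm.list" -> "netpbm"; "libc6:armhf.list" -> "libc6" (assumption)
        package = posixpath.basename(name)[:-len(".list")].split(":")[0]
        raw = tar.extractfile(member)
        with io.TextIOWrapper(raw, encoding="utf-8", errors="replace") as fh:
            paths = {line.rstrip("\n") for line in fh if line.strip()}
        packages.setdefault(package, set()).update(paths)
    return packages


def owner_index(packages):
    """Invert the mapping: path -> list of packages whose .list names it."""
    index = {}
    for package, paths in packages.items():
        for path in paths:
            index.setdefault(path, []).append(package)
    return index
```

Called as load_dpkg_lists(tarfile.open("pi-a.tar.bz2", "r:*")) (placeholder file name), this still has to decompress the whole archive once, but nothing is written to disk. Directories such as /usr/bin appear in many .list files, which is why the index maps each path to a list of packages rather than a single one.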
I've tended to use rsync -n to compare directories of files and a scripting language (or go) is probably a better tool than shell.
I'm working with a python script that can directly parse the compressed tarball (I found a sample script which could calculate checksums of all files in a tarball pretty quickly which I'm using as a base, because "tar -jtv" doesn't give enough information to detect if a file has changed), so my intention is to build on that. Parsing the .list files on the fly should be an easy add.
And in fact, I can compare just the list of .list files in that directory to get a diff of installed packages, so that's a good start. (But I do need to then exclude the files from each package from further comparisons, so I'd still need to parse them.)
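The two steps in that paragraph could be sketched like this, assuming each snapshot has already been reduced to a dict of package name to set of listed paths (for instance built from the .list files). Both function names are hypothetical.

```python
def package_diff(old_pkgs, new_pkgs):
    """Return (removed, added) package names between two snapshots."""
    old, new = set(old_pkgs), set(new_pkgs)
    return sorted(old - new), sorted(new - old)


def unowned(paths, pkgs):
    """Return the paths that no package's .list file accounts for."""
    owned = set()
    for files in pkgs.values():
        # .list entries start with "/", tar member names usually do not
        owned.update(f.lstrip("/") for f in files)
    return sorted(p for p in paths if p.lstrip("/") not in owned)
```

Whatever unowned() returns is the interesting residue: files that changed but are not explained by any package install, such as config edits and local scripts.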
[Actually on a quick dig, the .md5sums files are probably more useful than the .list files for my purposes.]
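For the .md5sums route, a minimal sketch, assuming the files use the usual md5sum layout (hex digest, two spaces, then a path relative to / with no leading slash). The helper names are invented; md5_of() is meant to be fed the file object returned by TarFile.extractfile().

```python
import hashlib


def parse_md5sums(text):
    """Parse a dpkg .md5sums file into {relative_path: md5_hex}."""
    sums = {}
    for line in text.splitlines():
        # md5sum format: digest, two spaces, path
        digest, _, path = line.partition("  ")
        if path:
            sums[path] = digest
    return sums


def md5_of(fileobj, chunk_size=1 << 20):
    """Hash a binary file-like object in chunks."""
    md5 = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        md5.update(chunk)
    return md5.hexdigest()


def modified(expected, actual):
    """Return paths whose checksum differs from what the package shipped."""
    return sorted(
        path for path, digest in actual.items()
        if path in expected and expected[path] != digest
    )
```

Here expected would come from parse_md5sums() and actual from hashing the tarball members, so a packaged file only shows up in the comparison if it was changed after installation.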
Thanks for that starting point.
GNU tar does have direct support for incremental backups see
https://www.gnu.org/software/tar/manual/html_node/Incremental-Dumps.html
But it's probably *much* easier and more reliable to use an existing system like restic, which uses better archives than tar, handles deltas very fast and supports encryption for offsite backups.
Systems like restic and rsync are likely to be much faster than a homebrew system (although these are fun to write).
S
On Sat, 6 Jul 2019 at 22:07, Steve Mynott steve.mynott@gmail.com wrote:
GNU tar does have direct support for incremental backups see
These aren't backups as such, but snapshots of different systems (all of which are based on a Pi). You could think of them as different branches from a version control point of view, but encompassing the entire O/S. (I have scripts which back up and restore SD cards as tarballs instead of disk images, because they result in much smaller and more flexible files - search this mailing list for "rpi-backup" and you'll see my starting point from a couple of years ago). So a raw Raspbian install will have several tweaks and installs for one project, and a new clean install several different tweaks for a different project. Sometimes it is useful to compare them - eg a problem I solved on one now needs solving on another - or I just want to recreate a project on a newer Raspbian version.
Also worth pointing out that visibility is the key - just because I made a change doesn't mean that with hindsight it was necessary, and over time I'll end up with packages installed that were never used. Knowing what they were makes it easier to make decisions about whether they should be installed or removed.
But it's probably *much* easier and more reliable to use an existing system like restic
I don't *think* restic will help me here but I will have a look as it would be a useful tool regardless.
Systems like restic and rsync are likely to be much faster than a homebrew system (although these are fun to write).
Agreed, and I won't deny that the latter is part of this...
On 05/07/2019 13:59, Mark Rogers wrote:
(Background: I'm trying to look at SD card backups and work out what packages were installed without booting them.)
I don't understand the underlying reason why you're trying to do what you're trying to do. What's the top-level reason for wanting to do this - i.e. what's your initial purpose?
On Sat, 6 Jul 2019 at 15:13, steve-ALUG@hst.me.uk wrote:
I don't understand the underlying reason why you're trying to do what you're trying to do. What's the top-level reason for wanting to do this - i.e. what's your initial purpose?
OK, fair question; I have explained some of it above but I'll go into more depth.
I do quite a lot of work on Raspberry Pi's. I back them up periodically as a tarball of the whole filesystem because it's the entire system I'm interested in, not just one or two files.
The result is dozens of tarballs and no easy way to get a simple view of what has changed between them. Changes fall into several categories:
- Mundane updates (apt upgrade etc) - lots of files change but nothing of substance
- Package installs (apt install xxx) - lots of files get added but can be summarised by the list of packages installed
- Configuration file changes
- Additions/Changes to my own scripts
Since what I am working with are tarballs, the easiest starting point is the file list; it's trivial to extract the file listing from a tarball and make comparisons between two of them, but when thousands of files can be changed with only a handful of actual changes this is really an exercise in seeing the woods for the trees.
Or consider this example:
- I have a Pi SD card based on Raspbian Jessie and several months of tweaks and updates (including package additions and removals)
- I want to summarise the changes I've made so I can re-run them on a clean install (or on Raspbian Buster), using the file listings from a clean install and my working copy as the starting point
Hope that's clearer. Bottom line: What I have is dozens of full OS tarballs that I need to compare, and "installed packages X, Y & Z" is far more useful to me than "added <list of hundreds of new files>".
OK that helps. I don't think it helps the current situation, but I think most people doing something like this would host a git repository to store their scripts & track changes to them. The same idea can be used to cope with changes to configurations for files in /etc/. I'd think it would be easier to keep a manual list of top level packages added/removed using apt-get, than to try and work it out backwards from a file list. I think there are also packages that keep track of installs & config changes so that you can deploy them on other systems, multiple times, but I can't remember the names.
None of that helps with where you are now though!
Searching for "apt installed packages vs dependant packages" provides lots of info, but it pretty much all relies on being run on a live system rather than on files in a tarball.
I did find this though that *may* be helpful.
https://unix.stackexchange.com/questions/381395/how-to-find-which-package-re...
"apt doesn't remember which reverse dependency caused it to install rsync, but it does log all its actions in /var/log/apt, so you might find the dependency there:
zgrep rsync /var/log/apt/history.log*
Look for a line saying that rsync was installed automatically; one of the non-automatic packages there should be the source of the installation."
The pertinent info is: "apt ... but it does log all its actions in /var/log/apt"
Looking, there are files history.log, history.log.?.gz and term.log, term.log.?.gz, and maybe a few others. History seems to be the top-level commands, eg apt-get upgrade or apt-get install wibble. Term seems to contain the output of those commands.
<Grandmother suck eggs mode>
view .log files using cat history.log
view log.?.gz files using zcat history.log.1.gz
similarly, you can use zgrep to grep a .gz file
</Grandmother suck eggs mode>
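Since the tarballs are already being read from Python, the same logs could be parsed there too. This is only a sketch: it assumes history.log consists of blank-line-separated blocks of "Key: value" lines (Start-Date, Commandline, Install, ...) and that dependencies pulled in automatically carry an "automatic" marker after the version. The function names are made up.

```python
import gzip


def open_log(path):
    """Open a plain or gzip-compressed log as text (the cat/zcat step)."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt", encoding="utf-8", errors="replace")


def parse_history(text):
    """Split apt history.log text into one dict of fields per transaction."""
    entries = []
    for block in text.split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                fields[key] = value
        if fields:
            entries.append(fields)
    return entries


def manual_installs(entries):
    """Return package names installed explicitly (not marked 'automatic')."""
    names = []
    for entry in entries:
        # Items look like "pkg:arch (version)" or "pkg:arch (version, automatic)"
        for item in entry.get("Install", "").split("), "):
            if item and "automatic" not in item:
                names.append(item.split(":")[0].split(" ")[0])
    return names
```

The Commandline field of each entry is probably the most useful part for rebuilding a system, as it records what was actually typed; manual_installs() is just a convenience on top of that.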
That may help a bit, but it depends on what period the log-rotated apt log files you have actually cover.
It seems to me it would be easier to work out using apt/aptitude etc on a live system (see the search I mentioned earlier - lots and lots of lovely links, methods etc). It seems to me it would also be easier to keep a manual list of top-level changes.
Anyway, I hope this helps.
Steve