Putting the past on disk
It was one of those nights: the music had blurred into noise and boredom was kicking in. Random memories flooded in; I needed a distraction. One of the things I've regretted is throwing data away. Not garbage files, but data with sentimental value.
Since I grew up with computers from a young age, most of my fond memories are encoded in bits. But I was always on the lookout for a clean break, so I often just deleted my history, internet and physical, especially during school days.

Scanner collage of random things
Back to that night: I wanted to find my old self on the web. With careful Google advanced-search queries built around specific references, I managed to dig up some of my old blogs. Lightheaded, I scrolled through them, trying to save everything I could. Memories long recessed resurfaced, with an old but familiar bad taste in my mouth.
Dusty photo albums
Another day. Working from home through the COVID-19 pandemic, I step out of my room for a breather. Outside, mum is rummaging through a box of old photo albums. I join her for a little roller-coaster ride through days past.
I hadn't seen most of these photos. Many of them are of me doing various things, with the occasional nude sprinkled in (why?). Fresh from my encounter with my teenage blogs, and with a strong need-for-archiving feeling, I asked my mum to take the photos out of the albums to get scanned.
Physical hard copies are the best form for these pictures to survive in, but remember: backups are important!
The scanning project
I returned to my room to pick work back up, but I was giddy to get this project started. During breaks between work calls, I googled my way towards the best approach for getting these photos down to disk. I set a mental mission for this project:
Get all the photos backed up as easily as possible at the best quality. Make them easily accessible.
With that in mind, I first needed a scanner. I don't have one myself, but my uncle has many. He's the sort of cool uncle who migrated to digital early on. He used to digitize a lot of images, but the backup mediums were poor, and he eventually tired of it.
The Scanner
I waited till my uncle got home from work. After a bit of small talk, I asked for a scanner. Target acquired! The Canon CanoScan LiDE 100 is mine now! It's a portable flatbed scanner from 2004, sleek and bus-powered through USB. My iMac picked it up with zero monkeying around. A moment of appreciation for plug-and-play.

The Canon CanoScan LiDE 100 with the rest of my setup
It has a TWAIN-compatible driver, so now I need SANE. If you're scratching your head right about now and reaching for a search engine, please refrain:
- TWAIN is an API for communication between software and digital imaging devices such as cameras and scanners.
- SANE stands for "Scanner Access Now Easy", what a lovely acronym! It's a standardised interface for accessing image scanners.
"Well Kaveen, why use SANE when you have TWAIN?" The SANE project home page explains it best:
"In summary, if TWAIN had been just a little better designed, there would have been no reason for SANE to exist, but things being the way they are, TWAIN simply isn't SANE."
Tech tangent aside, with SANE I can quickly get a TIFF out of the scanner without having to cry my way through GUIs.
```shell
scanimage -p \
  --mode color \
  --format tiff \
  --resolution 600 > "$(date +%s).tiff"
```
Some might say that is INSANE. The above invocation scans in colour at 600 DPI and writes the result to a TIFF named with the current Unix timestamp.
Scanning the Photos
Moving along. Armed with SANE, I had to decide how to make the most of my time. I had a couple of options for taking on the scanning.
⚠️ Important note: I don't want to do the manual labour of cropping pictures; I'll waste my time automating it instead.
- Scan a single image at a time: too slow, but I can quickly crop using a set of coordinates.
- Set up a scan layout with multiple images: fast, and I can establish multiple crop areas using coordinates, but it can't accommodate different photo sizes.
- Auto-mask and identify multiple images: AUTO-MAGICK.
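For a sense of what the first two options would have meant, here is a rough sketch of the hard-coded crop arithmetic, assuming a standard 4x6" print; the sizes and the commented `magick` invocation are illustrative, not values from my actual scans:

```shell
# At 600 DPI, a 4x6" print is 2400x3600 px. ImageMagick crop geometry
# is WIDTHxHEIGHT+X+Y, so a photo placed at the scanner origin would be:
dpi=600
w=$((4 * dpi))
h=$((6 * dpi))
geom="${w}x${h}+0+0"
echo "$geom"
# Hypothetical usage: magick in.tiff -crop "$geom" +repage out.tiff
```

Every photo size and position would need its own geometry like this, which is exactly the manual bookkeeping I wanted to avoid.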

The selected scanning layout. Photos are aligned with the edges of the scanner, but precision is not required as auto-masking will be used.
I went the auto-magick route; that's no misspelling, but let's revisit that part later in this account.
I wrote myself a little bash script to keep scanning continually. It simply lets me scan image after image, saving each TIFF with the current timestamp as its name.
```shell
while true; do
  scanimage -p \
    --mode color \
    --format tiff \
    --resolution 600 > "$(date +%s).tiff"
  echo "Press any key but ESC to continue scanning"
  read -r -n1 key
  if [[ $key == $'\e' ]]; then
    break
  fi
done
```
Splitting the Scanned TIFF Files
ImageMagick is what saved this project. It's mind-blowing to know that the ImageMagick project had its initial release almost 30 years ago!
During my initial research for this project, I came across a script called "multicrop" by Fred Weinhaus. He has a collection of amazing scripts utilising IM (ImageMagick). With multicrop, it's just:
```shell
./multicrop -f 20 \
  -u 1 \
  -d 300 \
  -p 10 \
  -b '#e9ecea' \
  in.tiff out.tiff
```
Just by running this bash script, it automatically masks out the blank spaces and spits out multiple output TIFF files! Multicrop was a godsend, but
ℹ️ it didn't work out of the box in my case. The results were all over the place.
Why wasn't multicrop working for me?
After scratching my head for some time, I found it was the scans! My scanner doesn't produce a uniform background: the backing of the scanner has deformed at the corners, and the deformations show up in the scans. To the script, they look like part of an image.
How did I catch this? The multicrop script has a `-m` switch that saves the mask for debugging purposes.
```shell
./multicrop -f 20 \
  -u 1 \
  -d 300 \
  -p 10 \
  -b '#e9ecea' \
  -m save \
  in.tiff out.tiff
```

The left image shows the raw mask of the scanner and the right is after being trimmed via IM.
This should be a fairly trivial fix, as I can afford to lose some pixels at the edges and be rid of the problem. IM back to the rescue!
```shell
# Shave 15 px off the left/right edges and 45 px off the top/bottom
magick convert in.tiff \
  -shave 15x45 \
  in_shaved.tiff
```
The result is apparent: the script can now separate the scans with a negligible number of failures. Full steam ahead on getting all these scanned images processed!
End of the line: writing a bash script to bulk-process the scans
Crisis averted. To wrap up, I established a folder structure that keeps the images grouped into albums for easy organisation. This helps with the last part, which is syncing the images to a cloud provider.
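As a sketch of that structure (the album names here are made up), scans live in one folder per album, with the year, or a full date, in parentheses so it can be recovered later when tagging the images:

```shell
# Illustrative album folders; the year (or full date) in parentheses
# is parsed back out later when embedding EXIF data.
mkdir -p "ImageScanning/Family Trip (2004)"
mkdir -p "ImageScanning/Sports Meet (2004:05:21)"
ls "ImageScanning"
```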
⚠️ Warning: please excuse the crudeness of these bash scripts.
```shell
imagePath="/Users/kaveenr/ImageScanning"

processIt () {
  echo "Shaving edges of $1"
  tmpFile="/tmp/sane_scan_$(date +%s).tiff"
  # Trim the deformed scanner edges before handing over to multicrop
  magick convert "$1" -shave 15x45 "$tmpFile"
  ./multicrop -f 20 \
    -u 1 \
    -d 300 \
    -p 10 \
    -b '#e9ecea' \
    "$tmpFile" "./$2/$3"
  rm "$tmpFile"
  echo "Processed under $2/$3"
}

find "$imagePath" -name "*.tiff" | while read -r line; do
  fileName=$(basename "$line")
  fileName=${fileName// /_}
  fileDir=$(dirname "$line")
  outPath="out${fileDir#$imagePath}"
  outPath=${outPath// /_}
  mkdir -p "$outPath"
  # Skip scans that have already been split by an earlier run
  checkPath="./$outPath/${fileName%.tiff}-*.tiff"
  echo "checking path $checkPath"
  if ls $checkPath 1> /dev/null 2>&1; then
    echo "files do exist for '$line'"
  else
    processIt "$line" "$outPath" "$fileName"
  fi
done
```
This first script takes the scanned images, does the aforementioned trimming, and runs them through multicrop.

Woo!
🥳 Profit! Now you have an output folder with albums and the split TIFF files.
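The path bookkeeping in that script is plain bash parameter expansion, and it can be tried in isolation. A minimal run-through, using a made-up file path:

```shell
imagePath="/Users/kaveenr/ImageScanning"
line="$imagePath/Family Trip (2004)/scan 01.tiff"  # hypothetical scan file

fileName=$(basename "$line")        # take just the file name
fileName=${fileName// /_}           # replace spaces with underscores
fileDir=$(dirname "$line")
outPath="out${fileDir#$imagePath}"  # strip the input prefix, root under "out"
outPath=${outPath// /_}
echo "$outPath/$fileName"
```

This keeps the album structure of the input tree mirrored under `out/`, with underscores in place of spaces to spare quoting headaches downstream.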
The easily accessible part of the mission
Hold on, the mission of this project included "Make them easily accessible." At this point, I've already synced these photos to a cloud drive and to another local, offline backup.
As for the ease of access, I decided to add the images to my cloud photo manager, Google Photos. Scripting time!
```shell
imagePath="/Users/kaveenr/Projects/sane-scan-photo/out"

ls "$imagePath" | while read -r album; do
  # Restore spaces and strip the trailing "(...)" part for the album title
  albumTitle=${album//_/ }
  albumTitle=${albumTitle%%(*)}
  albumTitle="${albumTitle%% }"
  echo "Syncing $albumTitle"
  rclone sync "$imagePath/$album" \
    Photos:album/"$albumTitle" -P
done
```
Apart from some album renaming, the heavy lifting is done by rclone and its Google Photos backend.
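For reference, the `Photos:` remote used above is just rclone's `google photos` backend. The stanza in `~/.config/rclone/rclone.conf` looks roughly like this (token redacted); it is normally generated by the interactive `rclone config` OAuth flow rather than written by hand:

```ini
[Photos]
type = google photos
# token = {...}  filled in by "rclone config"
```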

The end. Memories on Google Photos
🎉 Done! Now I'm able to browse the photos easily on Google Photos and share them with my parents!
Takeaways
It feels great to write something again. Feelings aside, the following points sum up this project:

Happiness: me feeding "Rollo" the cat some cake on my 7th birthday (2004)
- Back your shiz up: for images, as said before, physical is the way to go, but always have a secondary soft copy.
- Think twice before you delete anything; you might regret it later.
- ImageMagick is amazing. I've been converted. I actually manipulated all of the images for this post using IM.
Personally, this was my first real foray into bash scripting, as I usually reach for Python for automation tasks. With tools like IM, it's unbelievably easy to write image-related scripts in bash. Next up is ffmpeg…
Appendices
Heh, I have more if you're interested: some stuff that was too boring to cram into the main section.
Multicrop Parameter Notes
- `-b '#e9ecea'`: background colour to be masked out
- `-f 20`: fuzz threshold for matching the background (for non-uniform backgrounds)
- `-d 300`: discard identified regions smaller than 300 pixels
Adding EXIF Data & Compressing to HEIC
I created a script that extracts the date or year from each album's folder name and embeds it into the images as EXIF data. I also decided to compress the images to HEIC, which compresses much better than JPEG.
```shell
imagePath="/Users/kaveenr/Projects/sane-scan-photo/out"
yearRegex='\(([0-9]{4})\)'
dateRegex='\(([0-9]{4}):([0-9]{2}):([0-9]{2})\)'
targetFormat="heic"

ls "$imagePath" | while read -r album; do
  albumTitle=${album//_/ }
  # Album folders named like "... (2004)" get tagged with just the year
  if [[ "$albumTitle" =~ $yearRegex ]]; then
    guessYear=${BASH_REMATCH[1]}
    echo "Found Year in title $guessYear, Setting EXIF Data"
    exiftool -overwrite_original \
      -xmp:dateTimeOriginal="$guessYear:01:01" \
      -r "$imagePath/$album/"
  fi
  # Album folders named like "... (2004:05:21)" get the full date
  if [[ "$albumTitle" =~ $dateRegex ]]; then
    guessYear=${BASH_REMATCH[1]}
    guessMonth=${BASH_REMATCH[2]}
    guessDay=${BASH_REMATCH[3]}
    echo "Found Date in title $guessYear/$guessMonth/$guessDay, Setting EXIF Data"
    exiftool -overwrite_original \
      -xmp:dateTimeOriginal="$guessYear:$guessMonth:$guessDay" \
      -r "$imagePath/$album/"
  fi
  if ls "$imagePath/$album/compressed" 1> /dev/null 2>&1; then
    echo "Album $albumTitle already compressed"
  else
    echo "Compressing $albumTitle to $targetFormat"
    (cd "$imagePath/$album" && magick mogrify -quality 100 -format $targetFormat *.tiff)
    (cd "$imagePath/$album" && mkdir -p compressed && mv *.$targetFormat compressed)
  fi
done
```
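Since the whole appendix hinges on those regexes, here is the year capture checked in isolation; the album title is invented for the example:

```shell
# Sanity-check the year regex against a made-up album folder name
yearRegex='\(([0-9]{4})\)'
albumTitle="Family Trip (2004)"
if [[ "$albumTitle" =~ $yearRegex ]]; then
  guessYear=${BASH_REMATCH[1]}  # first capture group: the year
fi
echo "$guessYear"
```

The date variant works the same way, just with three capture groups for year, month, and day.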
Note: ultimately these tags were not picked up by Google Photos for some reason; I suspect it's the HEIC format.
Q: How many photos did I back up?
Not that many: 371.
```shell
$ find ./out -name "*.tiff" | wc -l
371
```