Connecting AWS Redshift and S3 Across Accounts

This is a problem that keeps coming up for me: I have solved it multiple times and still manage to fail at it. I keep either missing parts, getting confused partway through, or feeling as if I am blindly copy/pasting answers I don't understand. So this is my own attempt at writing out the solution in a way that teaches which relationships are being formed, in the hope that it may help others. In essence, this is a rewrite of the AWS documentation post in my own words, with personalized examples, in a format I personally find easier to comprehend.

The Problem

The situation this article solves is when you have two separate AWS accounts. One of these accounts is running a Redshift cluster and the other is hosting your S3 bucket storage. What you would like is for Redshift to write to the S3 buckets in the other account in a safe and secure manner, while continuing to run in its own account. Along with that, naming things is hard, so let's establish a naming convention that makes sense as we go along.

Parameters And Conditions

To keep things straight we are going use the following names for the various accounts, roles, and policies that we will be creating to solve this problem:

  • AWSRedshiftAccount - Account with Redshift Running on it. Account ID:110000000022
    • RedshiftClusterRole - The Role your redshift cluster assumes when doing anything
      • S3AWSS3AccountFromAWSRedshiftAccount-AssumePolicy - A policy allowing the RedshiftClusterRole to assume the S3AWSS3AccountFromAWSRedshiftAccountRole in the AWSS3Account
      • S3AWSS3AccountFromAWSRedshiftAccount-BucketPolicy - A policy specifying what buckets and what actions on those buckets can be done by redshift in AWSRedshiftAccount on the S3 buckets in AWSS3Account
  • AWSS3Account - Account hosting the S3 bucket storage. Account ID: 550000000066
    • RedshiftStorageBucket - The name of the S3 Bucket we will be storing our Redshift data in
    • S3AWSS3AccountFromAWSRedshiftAccountRole - A role assumed by the RedshiftClusterRole to access S3 Bucket resources
      • S3AWSS3AccountFromAWSRedshiftAccount-BucketPolicy - A policy specifying what buckets and what actions on those buckets can be done by redshift in AWSRedshiftAccount on the S3 buckets in AWSS3Account
    • S3AWSS3AccountFromAWSRedshiftAccountTrustPolicy - A trust policy applied to the S3AWSS3AccountFromAWSRedshiftAccountRole specifying that the RedshiftClusterRole in the AWSRedshiftAccount is a trusted source to login with
    • S3AWSS3AccountFromAWSRedshiftAccountBucketPolicy - A policy applied to the bucket itself in S3 of the S3AWSAccount specifying access permissions for the S3AWSS3AccountFromAWSRedshiftAccountRole role

The above essentially gives a high-level view of all the pieces we are going to create and their relationships to each other. As a visual helper, I have listed policies as sub-bullets of the role they belong to. In total we will be creating one Role in each account and adding a number of policies and trust policies to them to give them the appropriate permissions to communicate with each other. Then we create an additional policy that is applied explicitly to the S3 bucket. Note that some of these policies have identical names; this is intentional, as they do identical things. The way AWS IAM works requires this duplication, as inter-account communication is rather blind when it executes.

Also, as an additional note, everything here runs in us-east-1. In more generic terms, this means this solution is only tried and true for setups within the same region. If you are communicating across regions, then I make no promises.

The Solution

Note! The order of this tutorial matters. It has been written in the order listed in the overview to keep it sane and let you wrap your head around it, but there will be issues if you create things in exactly that order! AWS verifies Role and Policy relations when they are submitted, and if a Role or Policy references another Role or Policy that does not exist yet, it will error on submission. That ordering is not obvious to a user who is unaware of or confused about what they are doing, so read the tutorial first to make things clear, then come back to this paragraph for the order in which to actually complete it. To avoid receiving errors from AWS when creating the following Roles and Policies, create them in this order:

  1. Create RedshiftClusterRole in the AWSRedshiftAccount ("Create / Get Role Redshift Is/Will Use" Section)
  2. Create S3AWSS3AccountFromAWSRedshiftAccountRole and the S3AWSS3AccountFromAWSRedshiftAccount-BucketPolicy in the AWSS3Account ( "Create Role and BucketPolicy In AWSS3Account" Section)
  3. Create S3AWSS3AccountFromAWSRedshiftAccount-AssumePolicy and S3AWSS3AccountFromAWSRedshiftAccount-BucketPolicy inline policies in the AWSRedshiftAccount ("Add Policies To RedshiftClusterRole" Section)
  4. Edit the Trust Relationship in the AWSS3Account ("Edit Trust Relationship to Include Trust Relationship Policy" Section)
  5. Add the S3AWSS3AccountFromAWSRedshiftAccountBucketPolicy to the S3 Buckets in the AWSS3Account ("Add BucketPolicy to S3 Buckets" Section)

If you are a first-time reader, I would strongly recommend reading this top to bottom first if you would like to properly grasp what is going on and the relationships being created between these IAM roles and accounts. After that, you can come back and read and copy/paste each section in the order mentioned above. If you are confident in your IAM skills and want to jump through this, then make sure to read the Parameters And Conditions section for the naming convention and then follow the above listing.

Create / Get Role Redshift Is/Will Use:

Within the AWSRedshiftAccount, your Redshift cluster will have a Role it assumes when performing certain actions. If that doesn't exist, go create one. We're going to assume it is named RedshiftClusterRole.

Add Policies To RedshiftClusterRole

Still within your AWSRedshiftAccount, select the RedshiftClusterRole within IAM and select "Add inline policy". Paste the following policy information:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3AWSS3AccountFromAWSRedshiftAccountAssumePolicy",
                "Effect": "Allow",
                "Action": [
                    "sts:AssumeRole"
                ],
                "Resource": "arn:aws:iam::550000000066:role/S3AWSS3AccountFromAWSRedshiftAccountRole"
            }
        ]
    }
Name the above policy S3AWSS3AccountFromAWSRedshiftAccount-AssumePolicy. This policy allows the RedshiftClusterRole in the AWSRedshiftAccount to assume the S3AWSS3AccountFromAWSRedshiftAccountRole role in the AWSS3Account. Note that we use the full ARN value in the "Resource" parameter to specify both the account (using the account number) and the role.

Repeat this process, and create an additional inline policy to the RedshiftClusterRole and paste the following:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3AWSS3AccountFromAWSRedshiftAccountBucketPolicy",
                "Effect": "Allow",
                "Action": [
                    "s3:*"
                ],
                "Resource": [
                    "arn:aws:s3:::RedshiftStorageBucket",
                    "arn:aws:s3:::RedshiftStorageBucket/*"
                ]
            }
        ]
    }
Name the above policy S3AWSS3AccountFromAWSRedshiftAccount-BucketPolicy. This policy tells the RedshiftClusterRole that it is allowed to do any action (s3:*) on the RedshiftStorageBucket and its subfolders (RedshiftStorageBucket/*) in S3. Since all S3 bucket names are globally unique, there is no need to specify an account number, and thus it is not included in the ARN.

Create Role and BucketPolicy In AWSS3Account

Switch now to your AWSS3Account and go to IAM. Create a new role with the following settings:

  1. In "Select Type Of Trust Entity" choose "AWS Service"
  2. From the Service section choose "Redshift"
  3. Then, from the use case section below, choose "Redshift-Customizable"

On the policy page then select "Create Policy". For this policy paste the following:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3AWSS3AccountFromAWSRedshiftAccountBucketPolicy",
                "Effect": "Allow",
                "Action": [
                    "s3:*"
                ],
                "Resource": [
                    "arn:aws:s3:::RedshiftStorageBucket",
                    "arn:aws:s3:::RedshiftStorageBucket/*"
                ]
            }
        ]
    }
Name the above policy S3AWSS3AccountFromAWSRedshiftAccount-BucketPolicy. Note that this policy has the identical name to the inline policy we created for the RedshiftClusterRole in the AWSRedshiftAccount. This duplication is required by AWS and means each account now explicitly knows its permissions on the S3 buckets. Finish creation of the policy.

When creating this policy, AWS will have opened a new tab. Switch back to the other tab, which contains the Role we were creating. In the policy listing on that page, click the refresh button and then search for S3AWSS3AccountFromAWSRedshiftAccount-BucketPolicy. Add that policy to the role, select next, and then name this role S3AWSS3AccountFromAWSRedshiftAccountRole.

Edit Trust Relationship to Include Trust Relationship Policy

Select the S3AWSS3AccountFromAWSRedshiftAccountRole that you just created and select the "Trust Relationship" tab, then select "Edit Trust Relationship". Delete all JSON contents and paste the following:

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "AWS": "arn:aws:iam::110000000022:role/RedshiftClusterRole"
        },
        "Action": "sts:AssumeRole"
      }
    ]
  }
The above now specifies that the RedshiftClusterRole in the AWSRedshiftAccount is a trusted role that can assume the S3AWSS3AccountFromAWSRedshiftAccountRole in the AWSS3Account.

Add BucketPolicy to S3 Buckets

Now go to S3 within the AWSS3Account and select the bucket which will be storing the Redshift data. In our example the bucket is named RedshiftStorageBucket. Select the bucket, then Permissions, and then Bucket Policy. Within the bucket policy paste the following:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3AWSS3AccountFromAWSRedshiftAccountBucketPolicy",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::110000000022:role/RedshiftClusterRole"
                },
                "Action": "s3:*",
                "Resource": [
                    "arn:aws:s3:::RedshiftStorageBucket",
                    "arn:aws:s3:::RedshiftStorageBucket/*"
                ]
            }
        ]
    }
The above policy specifies that the RedshiftClusterRole in the AWSRedshiftAccount is allowed to perform any action (s3:*) on the RedshiftStorageBucket and its subfolders (RedshiftStorageBucket/*) in S3.

Upon completion of that, you will have successfully configured permissions between Redshift and S3 across your two accounts!
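With the roles and policies in place, Redshift can reach the bucket by chaining the two roles: COPY and UNLOAD accept a comma-separated list of role ARNs in their IAM_ROLE parameter, with the cluster's own role first and the assumed role second. As a minimal sketch (the table name and S3 prefix below are made-up examples, not part of the setup above), the chained string can be built like this:

```python
# Sketch: build a Redshift UNLOAD statement that chains RedshiftClusterRole
# into S3AWSS3AccountFromAWSRedshiftAccountRole. "events" and the S3 prefix
# are hypothetical examples.
REDSHIFT_ROLE = "arn:aws:iam::110000000022:role/RedshiftClusterRole"
S3_ROLE = "arn:aws:iam::550000000066:role/S3AWSS3AccountFromAWSRedshiftAccountRole"

def chained_iam_role(*role_arns):
    # Redshift expects the chain as one comma-separated string (no spaces),
    # ordered from the cluster's role to the final assumed role.
    return ",".join(role_arns)

def build_unload(table, s3_prefix):
    # Compose the UNLOAD statement with the chained IAM_ROLE parameter.
    return ("UNLOAD ('SELECT * FROM %s') TO '%s' IAM_ROLE '%s';"
            % (table, s3_prefix, chained_iam_role(REDSHIFT_ROLE, S3_ROLE)))

if __name__ == "__main__":
    print(build_unload("events", "s3://RedshiftStorageBucket/events/"))
```

Run from the Redshift cluster, that chained IAM_ROLE string is what lets the statement write into the bucket in the other account.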

How To Create Post-Install Hooks With Yum


CentOS and Fedora both use yum as their package manager. This gives you the benefit of using RPMs, which come with built-in options for actions such as pre- and post-installation hooks. But what if you want to trigger actions on any package, or a certain type of package, regardless of the RPM? What if you want yum to run a script or command after every package install? If you are using RPMs not under your control, you don't want to be tearing apart those RPMs to add your own functionality either. Some RPMs also have post actions configured when they install, and these can cause issues if your server is dealing with compliance or just general state consistency.

This is when you need the yum-plugin-post-transaction-actions plugin. This plugin adds the ability to execute scripts and commands after a certain yum action, triggered either only when certain packages are touched or on all packages. In this example we will configure a post action to run a script every time yum install <package> is run. The use case for this example is a compliance situation where the state of the server needs to be checked for any issues that may have been caused by the installed package. The installed package could be anything from an upgrade to a dev debugging tool.

Install The Plugin

Install the plugin with the following command:

sudo yum -y install yum-plugin-post-transaction-actions  

This will install the yum-plugin-post-transaction-actions plugin and configure its default configuration. Within the /etc/yum/pluginconf.d/post-transaction-actions.conf file the following will have been automatically configured with the install:

enabled = 1  
actiondir = /etc/yum/post-actions/  

This simply specifies whether the plugin is to be enabled and the location of our post-action scripts. Post-action scripts are the configuration files telling the plugin what to look for when a yum action occurs and what script or command to execute when it finds a match.

Configure A Post-Action Every Time Yum Install Is Called

Within the /etc/yum/post-actions/ folder create a file named install.actions and add the following:

*:install:sh /tmp/

The above configuration specifies to match any package (*) during a yum install call (install) and then to execute, with sh, the script in /tmp/. Now we simply fill in that script to automate the process we want. Nor is this limited to shell; we could alternatively make it a python script and execute it as such with the following post-action configuration in the install.actions file:

*:install:python /tmp/

Note that in this case a python script is being called instead of a shell script.

For more examples of configurations, see the sources for this post. Specifically, the Fedora documentation explains more variables that can be used for dynamic configuration within .actions files, as well as configurations tailored to specific yum actions or specific applications only.

Create Windows Service In Python 3

Python for Windows is pretty powerful, largely due to its ability to complete extensive tasks in very few lines of code. Legibility is maybe tougher, but it is a great tool for scripting or tool building.

Python gets even better with its support for running as a Windows Service. The main problem with it, though, is the shortage of documentation. Hopefully this post will change that.

Note the following tutorial was carried out on Windows 10 Pro x64. mborus was also able to carry out the following tutorial on Windows 7 x86 with only some minor path changes!

Installing Python 3 on Windows

First up is installing Python 3 for Windows. This is pretty easy - just download the installer from the official Python website. For this demo I used Python 3.6.1.

Next, Python lacks some native library support needed to create services in Windows - so you also need to install the pywin32 Python for Windows extension library. This process is quite a pain to sift through, as you need to dig through the version numbers and match your Python version and bitness. If you downloaded Python 3.6.1 as I did, it has likely installed an x86/32-bit version of Python 3.6.1. The corresponding extension I downloaded was in the Build 221 folder, from which I downloaded pywin32-221.win32-py3.6.exe. Note anything with amd64 in the name is x64/64-bit.

UPDATE - April 28/18: I've been building some more recent tools with newer versions of Python and pywin32, and additionally been working with 64-bit Python rather than 32-bit. The 32-bit version, I found, has memory limitations when working with multi-processing or multi-threading. For my latest setup I used Python 3.6.5 and pywin32 release 223. To download these I grabbed the 64-bit amd64 version of Python 3.6.5, then downloaded release 223 from pywin32's new GitHub page in their releases. The previous SourceForge location is still kept as an archive for older versions but is no longer maintained.

Up to this point everything is auto-magically installed into place. For your service to run, though, the Python installation is missing required PATH variable information, so you need to include a number of routes in your system and user PATH variables.

Open your Windows environment variables and add the following to the system (note it must be the system path, as Windows services don't run under your current user by default) and user PATH variables. You will need to tweak them to your computer, as Python is automatically installed in the AppData of the logged-in user - the roots below are examples, so adjust them to wherever your Python actually lives:

C:\Program Files\Python 3.6  
C:\Program Files\Python 3.6\DLLs  
C:\Program Files\Python 3.6\Scripts  

The DLL path is specifically needed so that Python can generate and run the Windows Service.

From mborus: for Windows 7 x86, only some of these paths are needed in order to get the python service to run.


Note that if you have installed Python on your computer differently, or it is not located in the AppData folder, you may need to do some more searching in order to correctly find the location of the DLL and Scripts folders.

Starting Your Service Project

Now that you have everything set up with Python 3, you can build your service. Documentation here is sparse, but I was able to find another person's blog and a Stack Overflow post with everything needed to get started.

From these two I was able to come up with the following starting template:

import win32serviceutil
import win32service
import win32event
import servicemanager
import configparser
import os
import inspect
import logging
from logging.handlers import RotatingFileHandler

class AppServerSvc (win32serviceutil.ServiceFramework):
    _svc_name_ = "MyService"
    _svc_display_name_ = "My Service"

    _config = configparser.ConfigParser()

    def __init__(self,args):
        win32serviceutil.ServiceFramework.__init__(self,args)
        self.hWaitStop = win32event.CreateEvent(None,0,0,None)

        # Load config.ini from the same directory as this script
        scriptDir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
        self._config.read(scriptDir + '/config.ini')

        logDir = self._config["DEFAULT"]["loggingDir"]
        logPath = logDir + "/service-log.log"

        self._logger = logging.getLogger("MyService")
        self._logger.setLevel(logging.INFO)
        handler = RotatingFileHandler(logPath, maxBytes=4096, backupCount=10)
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        handler.setFormatter(formatter)
        self._logger.addHandler(handler)

    def SvcStop(self):
        self.ReportServiceStatus(win32service.SERVICE_STOP_PENDING)
        win32event.SetEvent(self.hWaitStop)
        self._logger.info("Service Is Stopping")

    def SvcDoRun(self):
        servicemanager.LogMsg(servicemanager.EVENTLOG_INFORMATION_TYPE,
                              servicemanager.PYS_SERVICE_STARTED,
                              (self._svc_name_,''))
        self._logger.info("Service Is Starting")
        self.main()

    def main(self):
        # your code could go here
        rc = None
        while rc != win32event.WAIT_OBJECT_0:

            # your code here...

            # hang for 1 minute or until service is stopped - whichever comes first
            rc = win32event.WaitForSingleObject(self.hWaitStop, (1 * 60 * 1000))

            # your code also here ...

if __name__ == '__main__':
    win32serviceutil.HandleCommandLine(AppServerSvc)

The above code is pretty straightforward. The SvcStop method is called when your service is stopped in Windows; SvcDoRun is called when your service is started. To keep the code more sane there is also a main method which is called on start - all of your service's logic should be called from within this method. On top of these basic components I also added a logger and a configuration loader. The configuration loader looks for a config.ini located in the same directory as the service script. From this config the logging directory is read, and the logger writes to the file service-log.log in that folder. _svc_name_ and _svc_display_name_ are both required attributes needed for the service installation. These will act as your display names and calling names from the terminal.
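For completeness, the config.ini the template expects would sit next to the service script and, at minimum, contain a loggingDir entry (the path here is just an example):

```ini
[DEFAULT]
loggingDir=C:\ServiceLogs
```

Make sure the folder exists and is writable by the service account, or the logger setup in __init__ will fail.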

Install Your Service

Installing your service in python is extremely easy. Open a terminal with Administrator privileges and cd to the location of your service script. Type the following:

python <nameOfFile>.py install  

This will install your service. For additional options, simply run your script with no parameters. You can update or remove your service at any time using the appropriate update and remove commands, or even run it in debug mode with the debug command.

Extra Tips

One of the most common errors from Windows when starting your service is Error 1053: The service did not respond to the start or control request in a timely fashion. This can occur for multiple reasons, but there are a couple of things to check when you get it:

  • Make sure your service is actually stopping: Note the `main` method has an infinite loop. The template above will break the loop if the stop event occurs, but that will only happen if you call `win32event.WaitForSingleObject` somewhere within that loop, setting `rc` to the updated value
  • Make sure your service actually starts: Same as the first one - if your service starts and does not get stuck in the infinite loop, it will exit, terminating the service
  • Check your system and user PATH contains the necessary routes: The DLL path is extremely important for your python service as it's how the script interfaces with Windows to operate as a service. Additionally, if the service is unable to load Python - you are also hooped. Check by typing `echo %PATH%` in both a regular console and a console running with Administrator privileges to ensure all of your paths have been loaded
  • Give the service another restart: Changes to your `PATH` variable may not kick in immediately - it's a Windows thing

ProGuard, Gradle BuildTypes & Dynamic Class Loading In Android

Android has many great build tools for compiling and configuring your app in the most ideal and organized way possible, but not without some unknown tricks that can throw you (or mostly me) off when trying to get there. The problems I ran into were when configuring gradle build types with ProGuard in an app which contained dynamically loaded classes. Arguably, not loading classes dynamically would have avoided this entire issue, but that's not the point, and for the future functionality of my app it was not an option.

The problem started when configuring Firebase into my application. The recommended setup to separate your development event logs from production event logs is to have separate applicationIds. This led to configuring my build.gradle with the following (inside the android {} block):

defaultConfig {
    applicationId "com.projectterris.myapplication"
    minSdkVersion 21
    targetSdkVersion 23
    versionCode 1
    versionName "1.0.0"
    multiDexEnabled true
}

buildTypes {
    release {
        minifyEnabled true
        shrinkResources true
        proguardFiles getDefaultProguardFile("proguard-android.txt"), "proguard-rules.pro"
    }

    debug {
        applicationIdSuffix ".debug"
        debuggable true

        minifyEnabled true
        shrinkResources true
        proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.pro'
    }
}

Normally it's not ideal to run ProGuard in your debug builds, as it causes your stack traces to be obfuscated, which causes issues when debugging. When I discovered this issue, I was testing my ProGuard setup and wanted to see it working without having to create a release build.

After syncing and building I started getting these errors. Note my app was written in Kotlin; if you're in Java and you have this issue, you'll get roughly the same errors, as Kotlin compiles down to Java bytecode anyway:

Caused by: java.lang.ClassNotFoundException: Didn't find class "com.projectterris.myapplication.debug.datastore.dll.SQLiteHelper" on path: DexPathList[[zip file "/data/app/com.projectterris.myapplicaiton.debug-1/base.apk"],nativeLibraryDirectories=[/data/app/com.projectterris.myapplication.debug-1/lib/arm, /vendor/lib, /system/lib]]  

I eventually discovered this was because of how my SQLiteHelper class was being dynamically loaded:

val classLoader = javaClass.classLoader  
val aClass = classLoader.loadClass("com.projectterris.myapplication.datastore.dll.SQLiteHelper")  
val ctor = aClass.getDeclaredConstructor(Context::class.java)  
val instance = ctor.newInstance(context) as SQLiteHelper  

Now you may be thinking "oh, your fully qualified name does not contain 'debug' in it", but this causes an interesting problem. It seems that when you have different applicationIds configured in gradle, the package name for all your compiled classes changes as well. So for a release build the SQLiteHelper's canonical name is com.projectterris.myapplication.datastore.dll.SQLiteHelper and for debug it is com.projectterris.myapplication.debug.datastore.dll.SQLiteHelper.

The fix for this can be implemented in a couple of different ways. For one, every Android application has a BuildConfig object which contains a boolean that can be accessed as BuildConfig.DEBUG. This value returns true if your BuildType is debug. The consistency of this variable in Android Studio is argued a lot on StackOverflow, though, so an alternative is preferable. Another option is that you can set variables in gradle that can be accessed via the BuildConfig object.

Both of these options work in most cases, except when ProGuard gets involved, which brings us to the next bug I encountered. ProGuard is basically the minifier for Java/Kotlin, but in the process it also changes the names of your classes and methods. This makes using a static string in the classLoader.loadClass(String) method not an option. Regardless of whether .debug is included in the canonical name or not, the actual class name or package will not be equivalent to its uncompiled canonical name. The solution I found was to use reflection again to determine the canonical name. This way I always have the complete and proper canonical name, regardless of ProGuard or my build type. The code in the end looked like this:

val classLoader = javaClass.classLoader  
val aClass = classLoader.loadClass(SQLiteHelper::class.java.canonicalName)  
val ctor = aClass.getDeclaredConstructor(Context::class.java)  
val instance = ctor.newInstance(context) as SQLiteHelper  

The next problem that appeared was from my constructor, which ProGuard decided at compile time was no longer needed, and I ended up with errors like this:

Caused by: java.lang.NoSuchMethodException: <init> [class android.content.Context]  

This fortunately was one of the easier fixes and is well described in a StackOverflow solution.

Basically, what I needed to do was tell ProGuard not to remove the constructor entry, thus allowing me to dynamically instantiate it. Adding the following rule to my proguard-rules.pro file resolved this final issue:

-keepclassmembers class com.projectterris.myapplication.datastore.dll.SQLiteHelper {
    public <init>(android.content.Context);
}
Note that ProGuard rules can be used extensively and are quite powerful. The StackOverflow link above shows how you can also configure ProGuard to exempt any class that extends a common super class. This could be another alternative to using the reflective canonical name when loading dynamic classes with classLoader.loadClass(String). I haven't tested this, but telling ProGuard not to obfuscate the SQLiteHelper class could greatly simplify the implementation and reduce the reliance on reflection (which is known to be slow).
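As a sketch of that alternative - assuming SQLiteHelper extends Android's SQLiteOpenHelper, which is an assumption and not shown above - a rule along these lines would keep every such subclass and its members intact:

```
-keep class * extends android.database.sqlite.SQLiteOpenHelper { *; }
```

With this in place, a plain static string would again work in loadClass, at the cost of leaving those classes unobfuscated.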

During this process I also ran into AndroidManifest.xml permission errors between my .debug and normal applicationId apps, but later found out this was just intermediate files not being deleted properly. A clean and rebuild resolved that issue. And so my project continues...

Volume "filesystem root has only 0 bytes of disk space" Errors on Fedora 24


Recently I encountered an unpleasant system error on Fedora 24: "filesystem root has only 0 bytes of disk space". After some googling, this resolved quite simply to the root partition / being out of space. The problem, though, was finding the right places to delete the right things. If you are like me and installed Fedora by following the GUI installer and largely accepting default settings, you end up with your HDD heavily partitioned into various segments, including /dev, /home, /boot, and / being root. Any data on your computer not stored within the other segments is fair game to be taking up space in the / root segment. I was able to find a number of useful tools for confirming and clearing out old data stored on Fedora 24 and cleaning up the / root segment.

Viewing Your Segments

You can view all segments on your system using the df command. This is useful to confirm that the problem is space on your system.

At the time of writing, I had already been able to clear out some space from my / root partition (shown under the Mounted on column). The / root partition was given 50GB on my system, and I was able to cut the used space down to 82%, which was acceptable for the time being.
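If you prefer scripting the check, Python's standard library can report the same usage numbers as df. This is just a convenience sketch, not a replacement for df:

```python
# Report how full a mount point is, similar to df's Use% column.
import shutil

def usage_percent(path="/"):
    # shutil.disk_usage returns total, used, and free bytes for the
    # filesystem containing the given path.
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

if __name__ == "__main__":
    print("/ is %.0f%% full" % usage_percent("/"))
```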

Remove Orphan Packages

On Debian there is a tool called deborphan which can be installed to remove "orphaned packages" - packages that no other package depends on. In Fedora there are a couple of different commands to replicate this functionality.


package-cleanup

package-cleanup is part of the yum-utils package and still works on Fedora 24, even with yum no longer being used. You can clean up orphaned packages with the following command:

package-cleanup --quiet --leaves --exclude-bin | xargs dnf remove -y  

package-cleanup can sometimes be too bold, so the --exclude-bin parameter is included to avoid potentially important packages being removed. The results from package-cleanup are then piped into xargs and the dnf remove command as parameters. --quiet is used to minimize the output and --leaves is the option that looks for packages which aren't relied on by currently installed packages.


rpmorphan

For rpm-based systems there is an actual equivalent to deborphan, named rpmorphan. This tool does much the same as the commands mentioned above. If your system hits the "filesystem root has only 0 bytes of disk space" error, though, you will not have any space available to install this package, so you will likely have to execute the package-cleanup command above before you can install rpmorphan. rpmorphan will also look for unused libraries and can be configured to do broader, more aggressive sweeps of packages. To run rpmorphan, execute the following commands:

sudo dnf install rpmorphan  
sudo dnf remove `rpmorphan`  

The first command installs the package; the second executes rpmorphan and removes the packages it reports.


dnf autoremove

dnf includes a parameter called autoremove which operates similarly to apt-get autoremove, in that it removes unused dependency packages. The problem I found with this command is that it can include programs you may want to keep. Running it, I found dnf wanting to remove programs such as firefox, thunderbird, and wireshark. So I would recommend you use this command with caution, if at all. To execute an autoremove, simply run the following command:

sudo dnf autoremove  

dnf will prompt you with a list of all the packages it will remove; check that list carefully before answering Y to proceed.

Remove Old Package Versions

Fedora has a habit of keeping previously installed, outdated versions of packages after an upgrade. These obviously take up space. Removing them can be done simply enough using dnf and a number of commands available through it. The following command will fetch and remove all outdated copies of packages installed on your system:

sudo dnf repoquery --quiet --duplicated --latest-limit -1 | xargs -l1 dnf remove -y  

What's happening in the above command is that dnf makes a repoquery for all duplicate packages using the --duplicated flag. These results, though, would also include the latest version, so a filter is applied using the --latest-limit flag. By setting this to -1 we tell dnf not to include the latest version in its repoquery. This generates a list of duplicated packages on your system, excluding the latest installed versions, which is then piped into xargs and the dnf remove command. When I executed this command, older versions of systemd packages showed up. systemd packages are obviously important, but you don't have to worry about them, as dnf will not allow you to uninstall them and will simply skip them as it processes the list.

Remove PackageKit Meta

Using the GNOME Disk Usage Analyzer tool I discovered the /var/cache/PackageKit folder was taking up 16GB of space in the / root segment. This folder keeps cached copies of different packages on your system. These are used by GNOME, not by dnf, and are only referred to if you use the GNOME GUI to install your rpm applications. Clearing this cache will give you a large amount of space back in the / root segment. On Fedora, there is an included console client that will refresh the PackageKit cache. Execute the following command:

sudo pkcon refresh force -c -1  

The refresh action refreshes the cached information, and the -c flag sets the cache age - how old data can be before it is removed. By passing -1 we set the cache age to none, and thus delete the cache.

Interestingly, this command doesn't always work; after refreshing the GNOME Disk Usage Analyzer tool, I still had a large amount of data being stored in the PackageKit folder. In this case, some manual work needs to be done. You can manually clear the folder with the following command:

sudo rm -rf /var/cache/PackageKit/metadata/updates/packages/*  

After doing this, open the PackageKit config file located at /etc/PackageKit/PackageKit.conf and uncomment the following line within the file:

KeepCache=false  
This will stop PackageKit from caching any more packages in the future and hopefully mitigate the need to repeat this cleaning process.

Cherry Pick Broken and Orphaned Packages

During an upgrade, certain packages may not be easily removable using the above methods. Fortunately, dnf comes with a couple of tools to spot this information. The following commands will only list these odd packages. These lists will typically include packages you want to keep, so to use them you will have to run the commands and then remove unwanted packages individually.

sudo dnf list extras  

This command lists extra packages found on the system that the current version of dnf cannot do much with. For example, if you upgraded from Fedora 21 to 22, the above command will list packages from Fedora 21 that are no longer of use to Fedora 22. You can identify these packages by their suffix (in our example, ending with .fc21). This applies to any Fedora upgrade situation.

sudo dnf repoquery --unsatisfied  

This command lists any packages that are broken or have unsatisfied dependencies on your system. In most cases you will probably want to update or install the appropriate dependencies, assuming you want to keep using the unsatisfied package; if not, these packages are still taking up room in their broken state on your system and can be removed to save space.