Automatic critical CSS generators can be very finicky. You might have some luck if your site is fairly static (no JS modifying the DOM), but I found they never got better than roughly 80% right, and I was never personally satisfied with them. They will reduce your time to first interaction and your cumulative layout shift (CLS), but in my experience they will not get either to zero. I wanted 0 CLS (web font loading aside: a very small fraction of CLS may be unavoidable there, but it should be essentially zero).
So for my website, I’m using a gulp toolchain to concatenate and minify all my CSS/JS, plus a PostCSS plugin called “postcss-critical-split” that splits the CSS based on comments you’ve added throughout your CSS files.
Have a read through the postcss-critical-split docs here:
You add comments to designate which CSS rules are critical, and the tool pulls all of those rules out into their own file (optionally, you can also run it to pull out all the non-critical CSS rules). Using comments lets you keep the original CSS organized however you like: instead of butchering the Bootstrap CSS, you just annotate which bits matter to your above-the-fold content.
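To make the comment-driven idea concrete, here is a minimal sketch in plain Node.js. It is a simplified stand-in, not the plugin's actual code, and the `critical:start` / `critical:end` marker names are assumptions for this illustration; check the plugin docs for the markers and options it really supports.

```javascript
// Simplified stand-in for a comment-driven CSS splitter; NOT the plugin's code.
// The marker names below are assumptions made for this sketch.
const START_MARKER = '/* critical:start */';
const END_MARKER = '/* critical:end */';

function splitCritical(css) {
  let criticalCss = '';
  let restCss = '';
  let pos = 0;
  while (pos < css.length) {
    const start = css.indexOf(START_MARKER, pos);
    if (start === -1) {
      // No more marked regions: everything left is non-critical.
      restCss += css.slice(pos);
      break;
    }
    const end = css.indexOf(END_MARKER, start + START_MARKER.length);
    if (end === -1) {
      throw new Error('Unclosed critical:start marker');
    }
    restCss += css.slice(pos, start);
    criticalCss += css.slice(start + START_MARKER.length, end);
    pos = end + END_MARKER.length;
  }
  return { critical: criticalCss.trim(), rest: restCss.trim() };
}

const annotatedCss = [
  '/* critical:start */',
  '.navbar { height: 56px; }',
  '/* critical:end */',
  '.footer { padding: 2rem; }',
].join('\n');

const splitResult = splitCritical(annotatedCss);
console.log('critical:', splitResult.critical);
console.log('rest:', splitResult.rest);
```

The source file stays in its original order; only the annotations decide which output each rule lands in.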
To decide which rules are critical, you might start with a list of rules spit out by an automatic tool like the one you hinted at above. But then I always check the site manually: crank up the connection throttling in your browser's developer tools and watch the site load slowly. Does anything shift, load late, or blink? (That is CLS and FOUC.)
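Alongside eyeballing the throttled load, you can log layout shifts in the browser console. This is a rough sketch: `totalLayoutShift` is a hypothetical helper name, and it keeps a plain running total rather than the session-windowed CLS figure the testing tools report, so treat it as a "did anything move?" check.

```javascript
// Sum layout-shift scores, skipping shifts caused by recent user input.
// This is a plain running total, not the session-windowed CLS metric.
function totalLayoutShift(entries) {
  return entries
    .filter((entry) => !entry.hadRecentInput)
    .reduce((sum, entry) => sum + entry.value, 0);
}

// Browser-only hookup; skipped where layout-shift entries are unsupported.
if (typeof PerformanceObserver !== 'undefined' &&
    (PerformanceObserver.supportedEntryTypes || []).includes('layout-shift')) {
  const seenShifts = [];
  new PerformanceObserver((list) => {
    seenShifts.push(...list.getEntries());
    console.log('layout shift so far:', totalLayoutShift(seenShifts).toFixed(4));
  }).observe({ type: 'layout-shift', buffered: true });
}
```

If nothing is logged while the page loads under throttling, no layout shift was reported for that load.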
This takes a bit of hand tuning (i.e. time), but it is what gets you to those insanely low load times. With a fully warmed-up CDN, my homepage can now be fully loaded in <500ms. Whether that is worth the manual work is up to you to decide! I’d say I spent > 40 hours on the BensTechLab.com website to get it to this state (perfect WebPageTest score, perfect GTmetrix score and perfect PageSpeed Insights score).
As for your second question, I would avoid loading ALL of your CSS in the deferred load. It wastes bandwidth, makes the browser process duplicate CSS rules (CPU/rendering), and most of all it can produce some weird CSS bugs when you have duplicate rules and the order of equal-specificity rules changes after load. So again, I am using the above PostCSS plugin to spit out a “critical” CSS file and a “rest of the CSS” file.
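One way the two files could be wired into a page, as a sketch only: this assumes the critical file is inlined in the `<head>` and the rest is loaded with the common preload-then-swap pattern, which may not be exactly what my build does. `buildHeadCss` and the `/css/rest.min.css` path are hypothetical names for illustration.

```javascript
// Build <head> markup: critical rules inline, the rest loaded without blocking render.
// The preload/onload swap is one common pattern, assumed here for illustration.
function buildHeadCss(criticalCss, restHref) {
  return [
    `<style>${criticalCss}</style>`,
    `<link rel="preload" href="${restHref}" as="style" onload="this.onload=null;this.rel='stylesheet'">`,
    // Fallback for visitors with JavaScript disabled.
    `<noscript><link rel="stylesheet" href="${restHref}"></noscript>`,
  ].join('\n');
}

// Hypothetical inputs; in a real build these come from the split output files.
console.log(buildHeadCss('.navbar{height:56px}', '/css/rest.min.css'));
```

Because the deferred file contains only the non-critical rules, no rule is delivered twice and the cascade order does not change once it arrives.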