2000x performance win
I recently helped analyze a performance issue in an unexpected but common place, where the fix improved performance of a task by around 2000x (two thousand times faster). As this is short, interesting...
Flame Graphs
Determining why CPUs are busy is a routine task for performance analysis, which often involves profiling stack traces. Profiling by sampling at a fixed rate is a coarse but effective...
Visualizing Device Utilization
Device utilization is a key metric for performance analysis and capacity planning. In this post, I’ll illustrate different ways to visualize device utilization across multiple devices, and how that...
Activity of the ZFS ARC
Disk I/O is still a common source of performance issues, despite modern cloud environments, modern file systems and huge amounts of main memory serving as file system cache. Understanding how well that...
Performance Analysis talk at SCALE10x
Last week I gave a talk at the Southern California Linux Expo (SCALE) titled “Performance Analysis: New Tools and Concepts from The Cloud”. There was a great turnout for my talk, which was videoed by...
Visualizing Process Snapshots
In Visualizing the Cloud I showed processes and their parent-child hierarchy, across a cloud environment, exploring patterns at different scales. Here I’ll take this a little further and look at...
Visualizing Process Execution
In Visualizing Process Snapshots I showed processes and their parent-child hierarchy over time, using snapshots of process information. This approach misses short-lived processes that occur between the...
Linux Kernel Performance: Flame Graphs
To get the most out of your systems, you want detailed insight into what the operating system kernel is doing. A typical approach is to sample stack traces; however, the data collected can be time...
10 Performance Wins
I work on weird and challenging performance issues in the cloud, often deep inside the operating system kernel. These have made for some interesting blog posts in the past, but there’s a lot more I...
FISL13: The USE Method
In July, Bryan Cantrill, Deirdré Straughan and I spoke at FISL, one of the world’s largest open software conferences, in Porto Alegre, Brazil. I had a great time. My talk introduced the USE Method: a...
Active Benchmarking
Benchmarking is often done badly: tools are run ad-hoc, without understanding what they are testing or checking that the results are valid. This can lead to poor architectural choices that haunt you...
DTracing in Anger
My Macbook has becomeso sluggish that it feels like I’m typing ove a 9600 baud modem aagn. Or 2400. It’s alo droping keystokes – which is irritatng as hll – so please forgive theapparent tyos and...
Surge 2012: Real-time in the real world
In September, I attended and spoke at the Surge’12 conference in Baltimore. I highly recommend it for anyone interested in performance. The theme with most talks was problems encountered at scale – and...
USENIX LISA 2010: Visualizations for Performance Analysis
My USENIX LISA talk from 2010 is now available on YouTube, also embedded below. The title is Visualizations for Performance Analysis (and more), and it showed how the full distribution of data could be...
USENIX LISA 2012: Performance Analysis Methodology
At USENIX LISA 2012, I gave a talk titled Performance Analysis Methodology. This covered ten performance analysis anti-methodologies and methodologies, including the USE Method. I wrote about these in...
The USE Method: Linux Performance Checklist
The USE Method provides a strategy for performing a complete check of system health, identifying common bottlenecks and errors. For each system resource, metrics for utilization, saturation and errors...
The USE Method: SmartOS Performance Checklist
The USE Method provides a strategy for performing a complete check of system health, identifying common bottlenecks and errors. For each system resource, metrics for utilization, saturation and errors...
zfsday: ZFS Performance Analysis and Tools
At zfsday 2012, I gave a talk on ZFS performance analysis and tools, discussing the role of old and new observability tools for investigating ZFS, including many based on DTrace. This was a fun talk –...
Virtualization Performance: Zones, KVM, Xen
At Joyent we run a high-performance public cloud based on two different virtualization technologies: Zones and KVM. We have historically run Xen as well, but have phased it out for KVM on SmartOS. My...
Revealing Hidden Latency Patterns
Response time – or latency – is crucial to understand in detail, but many of the common presentations of this data hide important details and patterns. Latency heat maps are an...