http://highscalability.com/blog/2012/5/16/big-list-of-20-common-bottlenecks.html
Nothing specific, mostly code clean-up, refactoring and simplification; the performance boost was a surprise. <- This is a good one - http://bad-concurrency.blogspot.com.au/2012/07/disruptor-v3-faster-hopefully.html
Is there anything we can do about this when designing algorithms and data structures? Yes, there is a lot we can do. If we perform chunks of work on data that is co-located, and we stride around memory in a predictable fashion, then our algorithms can be many times faster. For example, rather than using bucket-and-chain hash tables, like those in the JDK, we can employ hash tables using open addressing with linear probing. Rather than using linked lists or trees with a single item in each node, we can store an array of many items in each node. - http://mechanical-sympathy.blogspot.com.au/2012/08/memory-access-patterns-are-important.html
MemSQL uses lock-free skip lists instead of B-trees because B-trees don't scale. - http://highscalability.com/blog/2012/8/14/memsql-architecture-the-fast-mvcc-inmem-lockfree-codegen-and.html
Beware of the performance cost of static initialization - http://stackoverflow.com/questions/14010906/given-that-hashmaps-in-jdk1-6-and-above-cause-problems-with-multi-threading-how
An experiment showing why ArrayList beats LinkedList in most cases - http://www.javaadvent.com/2013/12/arraylist-vs-linkedlist.html
How to design low-latency applications in Java - http://vanillajava.blogspot.com.au/2014/05/chronicle-and-low-latency-in-java.html
http://highscalability.com/blog/2014/5/21/9-principles-of-high-performance-programs.html http://blog.libtorrent.org/2012/12/principles-of-high-performance-programs/
How to determine the thread pool size of a web application - http://venkateshcm.com/2014/05/How-To-Determine-Web-Applications-Thread-Poll-Size/
Beware the performance penalty of logging - https://plumbr.eu/blog/locking-and-logging
Keep things dynamic - http://highscalability.com/blog/2014/5/21/9-principles-of-high-performance-programs.html
http://www.rationaljava.com/2015/01/first-rule-of-performance-optimisation.html
http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
http://highscalability.com/blog/2015/5/4/elements-of-scale-composing-and-scaling-data-platforms.html
Discussion of developing low-latency financial applications - http://queue.acm.org/detail.cfm?ref=rss&id=2770868
Discussion of object pooling - http://highscalability.com/blog/2015/7/29/a-well-known-but-forgotten-trick-object-pooling.html http://coffeenco.de/articles/jvm_performance_part_1_object_pooling.html
Efficiency: the amount of work you need to do. Performance: how fast you can do that work. Efficiency is governed by your algorithm; performance is governed by your data structures. - http://www.rationaljava.com/2015/07/the-difference-between-efficiency-and.html
Turning off power-save mode on the CPU brought the max latency down from 11 msec to 8 msec. Guaranteeing threads will always have CPU resources, using CPU isolation and thread affinity, brought the maximum latency down to 14 microseconds. - http://highscalability.com/blog/2015/9/30/strategy-taming-linux-scheduler-jitter-using-cpu-isolation-a.html http://epickrram.blogspot.co.uk/2015/09/reducing-system-jitter.html
Designing a web API for performance (low-latency design patterns) - http://tech.forter.com/9-5-low-latency-decision-as-a-service-design-patterns/
Checklist of performance engineering questions - http://techbeacon.com/102-performance-engineering-questions-every-software-development-team-should-ask
Beware of system utilization above 80%: wait time, and therefore latency, climbs steeply - http://www.infoq.com/cn/news/2016/02/utilisation-wait-latency http://robharrop.github.io/maths/performance/2016/02/20/service-latency-and-utilisation.html
Scalable I/O: events vs. multithreading - https://thetechsolo.wordpress.com/2016/02/29/scalable-io-events-vs-multithreading-based/
How to find the bottleneck - https://vanilla-java.github.io/2017/02/06/Improving-percentile-latencies-in-Chronicle-Queue.html
https://www.inkandswitch.com/slow-software.html
Compiler performance and LLVM - http://pling.jondgoodwin.com/post/compiler-performance/
Know Thy Complexities! - https://www.bigocheatsheet.com/
Top 10 gems of high-performance development (in Chinese) - https://xie.infoq.cn/article/a0d418bf29915ecad5d5eeab0
How to detect and fix I/O-related performance issues - https://blog.ycrash.io/2020/11/28/i-o-waiting-cpu-time-wa-in-top/
RangeBitmap produces a RoaringBitmap of the indexes which satisfy a predicate, and can take RoaringBitmap parameters as inputs to skip over rows already filtered out. The Streams API code used before is translated into RangeBitmap API calls. - https://richardstartin.github.io/posts/range-predicates
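The memory-access-patterns entry above recommends open addressing with linear probing over the JDK's bucket-and-chain hash tables. A minimal sketch of that idea, assuming int keys and values, key 0 reserved as the "empty slot" marker, and no removal support; the class name `IntIntLinearProbingMap` is hypothetical and not taken from the linked post:

```java
// Hypothetical illustration: an int -> int map using open addressing with
// linear probing. Keys and values live in flat arrays, so a lookup scans
// adjacent slots instead of chasing per-entry node pointers as the
// bucket chains in java.util.HashMap do.
// Assumptions: key 0 is reserved, no removal, table kept at most half full.
final class IntIntLinearProbingMap {
    private static final int EMPTY = 0; // reserved key marking a free slot
    private int[] keys;
    private int[] values;
    private int size;

    IntIntLinearProbingMap(int initialCapacity) {
        // Round up to a power of two so "hash & mask" can replace "hash % length".
        int capacity = Integer.highestOneBit(Math.max(2, initialCapacity) - 1) << 1;
        keys = new int[capacity];
        values = new int[capacity];
    }

    // Multiplicative (Fibonacci) hashing spreads sequential keys across the table.
    private static int mix(int key) {
        int h = key * 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    void put(int key, int value) {
        if (key == EMPTY) throw new IllegalArgumentException("key 0 is reserved");
        if (2 * (size + 1) > keys.length) resize(); // keep load factor <= 0.5
        int mask = keys.length - 1;
        int i = mix(key) & mask;
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask; // linear probe: try the next slot, wrapping around
        }
        if (keys[i] == EMPTY) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the value mapped to key, or defaultValue if the key is absent. */
    int getOrDefault(int key, int defaultValue) {
        int mask = keys.length - 1;
        int i = mix(key) & mask;
        while (keys[i] != EMPTY) { // an empty slot ends the probe sequence
            if (keys[i] == key) return values[i];
            i = (i + 1) & mask;
        }
        return defaultValue;
    }

    int size() {
        return size;
    }

    // Doubles the table and re-inserts every live entry.
    private void resize() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new int[oldValues.length * 2];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != EMPTY) put(oldKeys[j], oldValues[j]);
        }
    }
}
```

Keeping the table at most half full guarantees every probe sequence reaches an empty slot, which is what lets `getOrDefault` stop without a separate bound check.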
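The memory-access-patterns entry above also suggests storing an array of many items in each node instead of one item per node. An "unrolled" linked list is one way to do that. The sketch below is hypothetical: `UnrolledIntList` is an illustrative name, it is append-only, and the node capacity of 32 is an arbitrary assumption you would tune:

```java
// Hypothetical sketch of an unrolled linked list: each node holds a small
// array of ints, so a scan reads mostly contiguous memory and follows one
// pointer per NODE_CAPACITY elements instead of one per element.
final class UnrolledIntList {
    private static final int NODE_CAPACITY = 32; // assumption: tune per workload

    private static final class Node {
        final int[] items = new int[NODE_CAPACITY];
        int count; // number of used slots in items
        Node next;
    }

    private final Node head = new Node();
    private Node tail = head;
    private int size;

    void add(int value) {
        if (tail.count == NODE_CAPACITY) { // current node is full: link a new one
            Node n = new Node();
            tail.next = n;
            tail = n;
        }
        tail.items[tail.count++] = value;
        size++;
    }

    int get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        Node n = head;
        while (index >= n.count) { // skip whole nodes at a time
            index -= n.count;
            n = n.next;
        }
        return n.items[index];
    }

    long sum() {
        long total = 0;
        for (Node n = head; n != null; n = n.next) {
            for (int i = 0; i < n.count; i++) total += n.items[i]; // sequential scan within a node
        }
        return total;
    }

    int size() {
        return size;
    }
}
```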
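For the thread-pool sizing entry, one widely cited heuristic comes from Java Concurrency in Practice: threads = CPUs * target CPU utilization * (1 + wait time / compute time). Whether the linked article uses exactly this formula is an assumption; the helper `PoolSizing.threadCount` is hypothetical, and the wait and compute times are values you would have to measure for your own requests:

```java
// Hypothetical helper applying the sizing heuristic from "Java Concurrency
// in Practice": threads = cpus * targetUtilization * (1 + wait / compute).
// waitMillis and computeMillis are assumed per-request measurements.
final class PoolSizing {
    private PoolSizing() {}

    static int threadCount(int cpus, double targetUtilization,
                           double waitMillis, double computeMillis) {
        if (cpus < 1 || targetUtilization <= 0 || targetUtilization > 1
                || waitMillis < 0 || computeMillis <= 0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        double n = cpus * targetUtilization * (1 + waitMillis / computeMillis);
        return Math.max(1, (int) Math.ceil(n)); // never size the pool below one thread
    }
}
```

For example, 8 cores at full utilization, with requests that wait 90 ms on I/O for every 10 ms of CPU work, give `threadCount(8, 1.0, 90, 10) == 80`; the result could then be passed to `Executors.newFixedThreadPool`.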
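The object-pooling entries above describe reusing instances instead of allocating new ones. A minimal single-threaded sketch, assuming a caller-supplied factory and reset function; `ObjectPool` is a hypothetical name, not an API from the linked articles:

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical single-threaded object pool: hands out idle instances when
// available and creates new ones otherwise. Not thread-safe; a pool shared
// across threads would need its own synchronization.
final class ObjectPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final Consumer<T> reset;
    private final int maxIdle; // cap on retained instances, to bound memory

    ObjectPool(Supplier<T> factory, Consumer<T> reset, int maxIdle) {
        this.factory = factory;
        this.reset = reset;
        this.maxIdle = maxIdle;
    }

    /** Takes an idle instance, or creates one if the pool is empty. */
    T acquire() {
        T obj = free.pollFirst();
        return obj != null ? obj : factory.get();
    }

    /** Clears the instance's state and keeps it for reuse unless the pool is full. */
    void release(T obj) {
        reset.accept(obj);
        if (free.size() < maxIdle) free.addFirst(obj);
    }

    int idleCount() {
        return free.size();
    }
}
```

Pooling trades allocation and GC work for the risk of leaking state between uses, which is why `release` always runs the reset function; whether it pays off depends on how expensive the pooled objects are to create.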
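To see why utilization above 80% is a warning sign, here is a worked equation under the textbook M/M/1 assumption (Poisson arrivals, exponentially distributed service times, a single server); the linked articles may use a different model. With mean service time S, arrival rate lambda and utilization rho = lambda * S, the mean response time R (queueing plus service) is:

```latex
R = \frac{S}{1 - \rho}, \qquad \rho = \lambda S

\rho = 0.5 \;\Rightarrow\; R = 2S, \qquad
\rho = 0.8 \;\Rightarrow\; R = 5S, \qquad
\rho = 0.9 \;\Rightarrow\; R = 10S
```

Under this model, going from 80% to 90% utilization doubles the mean response time, and R grows without bound as rho approaches 1.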