Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

Sorry, you do not have permission to ask a question, You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please type your username.

Please type your E-Mail.

Please choose an appropriate title for the post.

Please choose the appropriate section so your post can be easily searched.

Please choose suitable Keywords Ex: post, video.

Browse

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

Querify Question Shop: Explore Expert Solutions and Unique Q&A Merchandise

Querify Question Shop: Explore Expert Solutions and Unique Q&A Merchandise Logo Querify Question Shop: Explore Expert Solutions and Unique Q&A Merchandise Logo

Querify Question Shop: Explore Expert Solutions and Unique Q&A Merchandise Navigation

  • Home
  • About Us
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • About Us
  • Contact Us
Home/ Questions/Q 9002

Querify Question Shop: Explore Expert Solutions and Unique Q&A Merchandise Latest Questions

Author
  • 60k
Author
Asked: November 28, 20242024-11-28T06:22:10+00:00 2024-11-28T06:22:10+00:00

Loop Unrolling in JavaScript?

  • 60k

JavaScript can feel very removed from the hardware it runs on, but thinking low-level can still be useful in limited cases.

A recent post of Kafeel Ahmad on loop optimization detailed a number of loop performance improvement techniques. That article got me thinking about the topic.

Premature Optimization

Just to get this out of the way, this is a technique very few will ever need to consider in web development. Also, focusing on optimization too early can make code harder to write and much harder maintain. Taking a peek at low-level techniques can give us insight into our tools and the work in general, even if we can't apply that knowledge directly.

What is Loop Unrolling?

Loop unrolling basically duplicates the logic inside a loop so you perform multiple operations during each, well, loop. In specific cases, making the code in the loop longer can make it faster.

By intentionally performing some operations in groups rather than one-by-one, the computer may be able to operate more efficiently.

Unrolling Example

Let's take a very simple example: summing values in an array.

// 1-to-1 looping const simpleSum = (data) => {   let sum = 0;   for(let i=0; i < data.length; i += 1) {     sum += data[i];   }   return sum; };  const parallelSum = (data) => {   let sum1 = 0;   let sum2 = 0;   for(let i=0; i < data.length; i += 2) {     sum1 += data[i];     sum2 += data[i + 1];   }   return sum1 + sum2; }; 
Enter fullscreen mode Exit fullscreen mode

This may look very strange at first. We're managing more variables and performing additional operations that don't happen in the simple example. How can this be faster?!

Measuring the Difference

I ran some comparisons over a variety of data sizes and multiple runs, as well as sequential or interleaved testing. The parallelSum performance varied, but was almost always better, excepting some odd results for very small data sizes. I tested this using RunJS, which is built on Chrome's V8 engine.

Different data sizes gave very roughly these results:

  • Small (< 10k): Rarely any difference
  • Medium (10k-100k): Typically ~20-80% faster
  • Large (> 1M): Consistently twice as fast

Then I created a JSPerf with 1 million records to try across different browsers. Try it yourself!

Chrome ran parallelSum twice as fast as simpleSum, as expected from the RunJS testing.

Safari was almost identical to Chrome, both in percents and operations per second.

Firefox on the same system performed almost the same for simpleSum but parallelSum was only about 15% faster, not twice as fast.

This variation sent me looking for more information. While it's nothing definitive, I found a StackOverflow comment from 2016 discussing some of the JS engine issues with loop unrolling. It's an interesting look at how engines and optimizations can affect code in ways we don't expect.

Variations

I tried a third version as well, which added two values in a single operation to see if there was a noticeable difference between one variable and two.

const parallelSum = (data) => {   let sum = 0   for(let i=0; i < data.length; i += 2) {     sum += data[i] + data[i + 1];   }   return sum; }; 
Enter fullscreen mode Exit fullscreen mode

Short answer: No. The two “parallel” versions were within the reported margin of error of each other.

So, How Does it Work?

While JavaScript is single-threaded, the interpreters, compilers, and hardware underneath can perform optimizations for us when certain conditions are met.

In the simple example, the operation needs the value i to know what data to fetch, and it needs the latest value of sum to update. Because both of these change in each loop, the computer has to wait for the loop to complete to get more data. While it may seem obvious to us what i += 1 will do, the computer mostly understands “the value will change, check back later”, so it has difficulty optimizing.

Our parallel versions load multiple data entries for each value of i. We still depend on sum for each loop, but we can load and process twice as much data per cycle. But that doesn't mean it runs twice as fast.

Deeper Dive

To understand why loop unrolling works we look to the low-level operation of a computer. Processors with super-scalar architectures can have multiple pipelines to perform simultaneous operations. They can support out-of-order execution so operations that don't depend on each other can happen as soon as possible. For some operations, SIMD can perform one action on multiple pieces of data at once. Beyond that we start getting into caching, data fetching, and branch prediction…

But this is a JavaScript article! We're not going that deep. If you want to know more about processor architectures, Anandtech has some excellent Deep Dives.

Limits and Drawbacks

Loop unrolling is not magic. There are limits and diminishing returns that appear because of program or data size, operation complexity, computer architecture, and more. But we've only tested one or two operations, and modern computers often support four or more threads.

To try some larger increments, I made another JSPerf with 1, 2, 4, and 10 records and ran it on an Apple M1 Max MacBook Pro running macOS 14.5 Sonoma, and an AMD Ryzen 9 3950X PC running Windows 11.

Ten records at a time was 2.5-3.5x faster than the base loop, but only 12-15% faster than processing four records on the Mac. On the PC we still saw the 2x improvement between one to two records, but ten records was just 2% faster than four records, which I would not have predicted for a 16-core processor.

Platforms and Updates

These different results remind us to be careful with optimization. Optimizing for your computer could create a worse experience on less-capable or just different hardware. Performance or functionality issues for older or entry-level hardware is a common issue when developers work on fast, powerful machines, and it's something I've been tasked with multiple times in my career.

For some performance scale, a currently-available entry-level Chromebook from HP has an Intel Celeron N4120 processor. This is roughly equivalent to my 2013 Core i5-4250U MacBook Air. It has just one ninth the performance of the M1 Max in a synthetic benchmark. On that 2013 MacBook Air, running the latest version of Chrome, the 4-record function was faster than the 10-record, but still only 60% faster than the single-record function!

Browsers and standards are constantly changing, too. A routine browser update or a different processor architecture could make optimized code slower than a regular loop. When you find yourself deeply optimizing, you may need to ensure your optimization is relevant to your consumers, and that it stays relevant.

It reminds me of the book High Performance JavaScript by Nicholas Zakas, which I read back in 2012. It was a great book and contained a lot of insight. However, by 2014 a number of the significant performance issues identified in the book had been resolved or substantially reduced by browser engine updates, and we were able to focus more effort on writing maintainable code.

If you are trying to stay on the edge of performance optimization, be prepared for change and regular validation.

Lessons from the Past

While researching this topic I came across a Linux Kernel Mailing List thread from the year 2000 about removing some loop unrolling optimizations which ultimately improved the application performance. It included this still-relevant point (emphasis mine):

The bottom line is that our intuitive assumptions of what's fast and what isn't can often be wrong, especially given how much CPU's have changed over the past couple of years.
– Theodore Ts'o

Conclusion

There are times you may need to squeeze performance out of a loop, and if you are processing enough items, this could be one of the ways you do that. It's good to know about these kind of optimizations, but for most work, You Aren't Gonna Need It™.

Still I hope you've enjoyed my rambling, and that maybe in the future your memory will be jogged about performance optimization considerations.

Thanks for reading!

javascriptperformanceprogrammingwebdev
  • 0 0 Answers
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

Sidebar

Ask A Question

Stats

  • Questions 4k
  • Answers 0
  • Best Answers 0
  • Users 1k
  • Popular
  • Answers
  • Author

    How to ensure that all the routes on my Symfony ...

    • 0 Answers
  • Author

    Insights into Forms in Flask

    • 0 Answers
  • Author

    Kick Start Your Next Project With Holo Theme

    • 0 Answers

Top Members

Samantha Carter

Samantha Carter

  • 0 Questions
  • 20 Points
Begginer
Ella Lewis

Ella Lewis

  • 0 Questions
  • 20 Points
Begginer
Isaac Anderson

Isaac Anderson

  • 0 Questions
  • 20 Points
Begginer

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help

Footer

Querify Question Shop: Explore Expert Solutions and Unique Q&A Merchandise

Querify Question Shop: Explore, ask, and connect. Join our vibrant Q&A community today!

About Us

  • About Us
  • Contact Us
  • All Users

Legal Stuff

  • Terms of Use
  • Privacy Policy
  • Cookie Policy

Help

  • Knowledge Base
  • Support

Follow

© 2022 Querify Question. All Rights Reserved

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.