stared 6 minutes ago

Beware - one step more and you get into the region of generating functions. I recommend a book by Herbert Wilf with the wonderful name of Generatingfunctionology (https://www2.math.upenn.edu/~wilf/gfology2.pdf).

  • eliben a minute ago

    Indeed, generating functions are mentioned in a footnote :) Very interesting topic

  • srean 2 minutes ago

    Indeed. Perturbation theory and Feynman integrals are not far behind.

incognito124 4 hours ago

My favourite use case for this: by the same derivation as in this blog post, one can prove that if you have any two independent random variables X and Y (their distributions can differ), the distribution of X+Y has a PMF/PDF equal to the convolution of the PMFs/PDFs of X and Y.
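
A quick numerical check of this, as a minimal sketch (the fair-dice PMFs are just an illustration):

    # PMF of the sum of two independent fair dice via convolution.
    import numpy as np

    die = np.full(6, 1 / 6)            # PMF of one die, values 1..6
    pmf_sum = np.convolve(die, die)    # PMF of the sum, values 2..12

    # Cross-check against brute-force enumeration of all 36 outcomes.
    brute = np.zeros(11)
    for a in range(1, 7):
        for b in range(1, 7):
            brute[a + b - 2] += 1 / 36
    assert np.allclose(pmf_sum, brute)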

  • srean 3 hours ago

    Along similar lines, the MAX operator on random variables becomes the PRODUCT operator on their distribution functions (CDFs).

    It's fun to play with the (Max, +) algebra of random variables and infer its distribution.

    This turns out to be quite useful in estimating the completion time of dependent parallel jobs.

    Spawning multiple parallel jobs becomes a Max operation and chaining sequential jobs becomes a '+' operation on the completion times. This expression tree of Maxes and Pluses can be processed algebraically to obtain bounds on the completion-time distribution.
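
    For discrete completion times this takes only a few lines of numpy; a minimal sketch, with made-up job PMFs:

        # (Max, +) algebra on discrete completion-time PMFs.
        import numpy as np

        def seq(p, q):      # chain jobs: '+' on times -> convolution of PMFs
            return np.convolve(p, q)

        def par(p, q):      # parallel jobs: Max on times -> product of CDFs
            n = max(len(p), len(q))
            P = np.cumsum(np.pad(p, (0, n - len(p))))
            Q = np.cumsum(np.pad(q, (0, n - len(q))))
            return np.diff(P * Q, prepend=0.0)

        job = np.array([0.0, 0.7, 0.2, 0.1])   # hypothetical per-job PMF
        total = seq(par(job, job), job)        # run (A || B), then C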

    One example is the straggler problem in mapreduce/Hadoop. In the naive case, the completion time is the max over the parallel subtasks' completion times. (*)

    If the tasks have a heavy tail, which they sometimes do, the straggler's completion time can be really bad. This can be mitigated by a k-out-of-n setup, where you encode the problem in such a way that only k out of n jobs need to finish to obtain the final result. One can play with this trade-off between potentially wasted computation and expected completion time.
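
    The k-out-of-n completion time is the k-th order statistic of the job times, so, assuming i.i.d. jobs, its CDF is a binomial sum over the per-job CDF; a sketch:

        # CDF of the time by which k of n i.i.d. jobs have finished.
        from math import comb

        def kth_of_n_cdf(F_t, k, n):
            # P(at least k of n jobs done by t), F_t = per-job CDF at t
            return sum(comb(n, j) * F_t**j * (1 - F_t) ** (n - j)
                       for j in range(k, n + 1))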

    For heavy-tailed distributions another simplification is possible: the tails of Max and + are of the same order, so one can switch between convolutions and products.
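
    This is the subexponential property: P(X + Y > t) and P(max(X, Y) > t) agree asymptotically. A Monte Carlo sketch, with an arbitrary Pareto tail:

        # For heavy tails, the tails of X + Y and max(X, Y) are comparable.
        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.pareto(1.5, 10_000_000)
        y = rng.pareto(1.5, 10_000_000)
        t = 100.0
        print((x + y > t).mean() / (np.maximum(x, y) > t).mean())  # ~ 1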

    (*) This shows up in microservices architectures too. The owner/maintainer of a microservice endpoint might be very happy with its tail latency. However, an end user who consumes and composes the results of many endpoints can experience really bad tail latency for the final output.

    • whatshisface 10 minutes ago

      That doesn't sound right. If P(X) is the vector {0.5,0,0.5} and P(Y) is {0.5,0.5,0}, P(X)P(Y) is {0.25,0,0} and that's both not normalized and clearly not the distribution for max(X,Y). Did you get that from an LLM?

      • srean 7 minutes ago

        You are using PMFs. I meant the distribution function, aka the cumulative distribution function (CDF). CDFs are closed under products.
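
        Concretely, with the vectors above, a quick check:

            # The CDF of max(X, Y) is the elementwise product of the CDFs.
            import numpy as np

            F_X = np.cumsum([0.5, 0.0, 0.5])       # [0.5, 0.5, 1.0]
            F_Y = np.cumsum([0.5, 0.5, 0.0])       # [0.5, 1.0, 1.0]
            F_max = F_X * F_Y                      # [0.25, 0.5, 1.0]
            pmf_max = np.diff(F_max, prepend=0.0)  # [0.25, 0.25, 0.5]
            # This matches enumerating all nine (x, y) pairs directly.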

        > Did you get that from an LLM?

        LOL. There must be a fun and guilty story lurking inside the accusation.

        On a more serious note, I would love it if LLMs could do such simplifications and estimations on their own.

    • ttoinou 43 minutes ago

      What? I've never heard of this math. Do you mean those are literally the resulting operations in the general case, or is that an approximate explanation, and we need to find more specific cases to make it true?

      • srean 30 minutes ago

        The math is exact. Max and Plus on the random-variable side become product and convolution on the distribution-function side.

            Distr(X + Y) = DistrX ° DistrY

            Distr(X ^ Y) = DistrX * DistrY

            where '^' denotes Max, '*' denotes product, and '°' denotes convolution.
        
        Note that *, +, ° and ^ are commutative and associative, so they can be chained. One can also use their distributive properties. This is really the math of semirings.

        However, one can, and in practice one does, resort to approximations to compute the desired end results.

        More specifically, people are often interested not in the full distribution but in some statistic: the mean, the standard deviation, a tail percentile, etc. To compute those statistics from the exact distributions, approximations can be employed.
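
        For instance, once a final PMF is in hand, pulling out the mean or a tail percentile is direct; a sketch:

            # Statistics from a discrete completion-time PMF.
            import numpy as np

            def stats(pmf, q=0.99):
                t = np.arange(len(pmf))
                mean = float(t @ pmf)                         # expected time
                pq = int(np.searchsorted(np.cumsum(pmf), q))  # q-th percentile
                return mean, pq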

    • incognito124 3 hours ago

      Thanks for sharing the name of that problem! I've encountered it before while optimizing batched LLM inference. The whole batch would last until every query in it was done, and by changing the batch size you'd trade off per-query speed (better in a larger batch) against overall performance (worse with a larger batch).

      Nowadays I think this is solved in an entirely different way, though.

Sourabhsss1 8 hours ago

The visualizations make the concept easy to grasp.

bjt12345 6 hours ago

This and complex analysis are fascinating topics in undergraduate studies.