# What is the significance of P(x;y)?

I was reading a machine learning book that uses probabilities like these:

`P(x;y),P(x;y,z),P(x,y;z)`

I couldn't find what they mean. How should I read and understand them?

The semicolon means "parameterized by".

First, we all agree on the idea of conditional probabilities:

`P(X|Y) = P(X,Y) / P(Y)`

That is, the probability that X happens given that we've seen Y happen is the fraction of worlds in which Y happened that also contain X. This is uncontroversial.
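As a quick numerical check of that definition, here is a minimal Python sketch that computes P(X|Y) from a small joint table. The table values and the helper names `marginal_y` and `conditional_x_given_y` are made up for illustration; `Fraction` just keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X, Y) over two binary variables,
# stored as {(x, y): probability}. The numbers are invented for illustration.
joint = {
    (0, 0): Fraction(3, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10),
    (1, 1): Fraction(4, 10),
}

def marginal_y(joint, y):
    """P(Y = y): sum the joint probability over every value of X."""
    return sum(p for (xi, yi), p in joint.items() if yi == y)

def conditional_x_given_y(joint, x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    p_y = marginal_y(joint, y)
    if p_y == 0:
        raise ValueError("P(Y = y) is zero, so the conditional is undefined")
    return joint[(x, y)] / p_y

# Among the "worlds" where Y = 1 (total mass 5/10), X = 1 holds in 4/10.
print(conditional_x_given_y(joint, 1, 1))  # prints 4/5
```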

If you're a Bayesian, you might view parameters themselves as random variables in a statistical model. You might then want to speak about the probability of a parameter taking on a certain value, or the probability of the data given that the parameters have taken on certain values. In that case, you might write something like P(D|θ) to denote the probability of the data D given a parameter θ.

If you're a frequentist, you might find notation like this unsettling, because under the frequentist view parameters are fixed (if unknown) values rather than random variables, so they don't have probabilities. Instead, you could talk about the probability of observing the data under a particular parameterization by defining a family of parameterized probability density functions and then writing something like P(D;θ). You might also use this notation as a Bayesian if you wanted to clearly distinguish model parameters from other quantities you might observe.
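A small sketch can make the distinction concrete. Assuming a hypothetical coin-flip data set and a Bernoulli model, the same function computes the number written P(D;θ) or P(D|θ); what changes is whether θ gets a distribution of its own. The data, the three-point prior, and the helper names `likelihood` and `posterior` are illustrative assumptions, not anything from the book.

```python
from fractions import Fraction

# Hypothetical observed coin flips D (1 = heads).
data = [1, 0, 1, 1]

def likelihood(data, theta):
    """Probability of the data under a Bernoulli model with heads
    probability theta, assuming the flips are independent."""
    p = Fraction(1)
    for x in data:
        p *= theta if x == 1 else 1 - theta
    return p

# Frequentist reading, P(D; theta): theta is just an input that picks
# one member of the family of distributions.
print(likelihood(data, Fraction(1, 2)))  # prints 1/16

# Bayesian reading, P(D | theta): theta is itself a random variable.
# Hypothetical prior P(theta): equal mass on three candidate values.
prior = {
    Fraction(1, 4): Fraction(1, 3),
    Fraction(1, 2): Fraction(1, 3),
    Fraction(3, 4): Fraction(1, 3),
}

def posterior(data, prior):
    """P(theta | D) = P(D | theta) P(theta) / P(D), over the candidates."""
    unnormalized = {t: likelihood(data, t) * p for t, p in prior.items()}
    evidence = sum(unnormalized.values())  # P(D), marginalizing over theta
    return {t: u / evidence for t, u in unnormalized.items()}

print(posterior(data, prior)[Fraction(3, 4)])  # prints 27/46
```

The number returned by `likelihood` is the same in both readings; the semicolon versus the bar only signals whether θ is treated as a fixed argument or as a random variable you can condition on (and, as in `posterior`, reason about with Bayes' rule).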

P(x;y), then, is read as "The probability of x under probability density function P, parameterized by y."

P(x;y,z) is "The probability of x under probability density function P, parameterized by y and z."

P(x,y;z) is "The probability of x and y jointly, under probability density function P, parameterized by z."
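To make these three readings concrete, here is a sketch with one Python function per pattern. The particular distributions (exponential, normal, and a pair of independent coin flips) are my own illustrative choices, not anything the book prescribes; for discrete variables like the coin flips, P is a probability mass function rather than a density.

```python
import math

def exponential_pdf(x, rate):
    """P(x; y) pattern: density of x, parameterized by one value (rate)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def normal_pdf(x, mu, sigma):
    """P(x; y, z) pattern: density of x, parameterized by two values
    (mean mu and standard deviation sigma > 0)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def two_flips_pmf(x, y, z):
    """P(x, y; z) pattern: joint probability of two variables x and y,
    parameterized by z. Hypothetical model: two independent coin flips
    that share the same heads probability z."""
    px = z if x == 1 else 1 - z
    py = z if y == 1 else 1 - z
    return px * py

# The variables before the semicolon are what we ask about;
# the arguments after it only select which distribution we mean.
print(normal_pdf(0.0, mu=0.0, sigma=1.0))
print(two_flips_pmf(1, 0, z=0.3))
```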

You could also write something like P(X|Y;z), which would be "The conditional probability of X given that we observe Y, under probability density function P, parameterized by z."

An example of the latter arises in logistic regression. We might wish to know P(L|D;θ), where L is the label of the data, D are the features we observe, and θ is the vector of regression coefficients.
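A minimal sketch of P(L|D;θ) for binary logistic regression follows. The coefficient values, the feature vector, and the separate `bias` term are hypothetical (many formulations fold the intercept into θ instead). L is the variable whose probability we compute, D sits after the bar because it is observed, and θ sits after the semicolon because it parameterizes the model.

```python
import math

def logistic_prob(label, features, theta, bias=0.0):
    """P(L = label | D = features; theta) for binary logistic regression.

    theta holds one coefficient per feature; bias is an assumed
    separate intercept term.
    """
    score = bias + sum(w * x for w, x in zip(theta, features))
    p_one = 1.0 / (1.0 + math.exp(-score))  # sigmoid gives P(L = 1 | D; theta)
    return p_one if label == 1 else 1.0 - p_one

theta = [0.8, -0.5]    # hypothetical fitted coefficients
features = [2.0, 1.0]  # one observed data point D
print(logistic_prob(1, features, theta))  # sigmoid(1.1), about 0.750
```

Changing `features` changes what we condition on; changing `theta` swaps in a different member of the model family.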