Feature suggestion #127

Open
1112114641 opened this issue Apr 11, 2024 · 3 comments

Comments

@1112114641

1112114641 commented Apr 11, 2024

Hi,

I had a quick look and quite like the library. I have a suggestion to extend it, specifically ols.rs / num.py: take query_lstsq() from predicting the current value to predicting pred_dist steps ahead. It also lets you switch on the fly between linear, quadratic, and higher-order polynomial prediction using the order kwarg.

use polars::prelude::*;
use polars::{
  datatypes::DataType,
  error::PolarsResult,
  series::Series,
};
use pyo3_polars::derive::polars_expr;
use ndarray::{Array, Array2, Dim, ShapeError};
use serde::Deserialize;
use ndarray_linalg::LeastSquaresSvdInto;

#[derive(Deserialize, Debug)]
pub(crate) struct LstsqKwargs {
    pub(crate) order: u8,
    pub(crate) pred_dist: f64,
}

fn pred_output(_: &[Field]) -> PolarsResult<Field> {
    Ok(Field::new("pred", DataType::Float64))
}
// fn pred_coef_output(_: &[Field]) -> PolarsResult<Field> {
//   Ok(Field::new("coeffs",DataType::List(Box::new(DataType::Float64)),))
// }

#[inline(always)]
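fn series_to_array1(series: &Series) -> PolarsResult<Array<f64, Dim<[usize; 1]>>> {
  // Cast first so that non-f64 columns (e.g. integers) do not panic in `.f64()`.
  let casted = series.cast(&DataType::Float64)?;
  let array = casted.f64()?;
  // Assumes the column contains no nulls.
  let y_data = array.into_no_null_iter().collect::<Vec<_>>();
  Ok(Array::from_vec(y_data))
}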

#[inline(always)]
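/// Builds the Vandermonde matrix of `series` with columns x^0 .. x^(degree - 1).
/// Note that `degree` is the number of columns, i.e. the polynomial order + 1.
fn series_to_vandermonde_array2(series: &Series, degree: usize) -> PolarsResult<Array2<f64>> {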
  let nrow = series.len();
  let tmp_arr = series.cast(&DataType::Float64)?;
  let array = tmp_arr.f64().unwrap();
  let data = array.into_no_null_iter()
      .map(|val| (0..degree).map(move |pow| val.powi(pow as i32)).collect::<Vec<_>>())
      .collect::<Vec<_>>();

  Ok(Array2::from_shape_vec((nrow, degree), data.into_iter().flatten().collect()).unwrap())
}

/// Takes inputs with x at index 0 and y at index 1, and returns a tuple of
/// (Vandermonde matrix of x with `order` columns, y as a 1-D array).
#[inline(always)]
fn mask_n_into_mat2(inputs: &[Series], order: usize) -> PolarsResult<(Array2<f64>, Array<f64, Dim<[usize; 1]>>)> {
  let mat = series_to_vandermonde_array2(&inputs[0], order).unwrap();
  let y = series_to_array1(&inputs[1]).unwrap();
  Ok((mat, y))
}


// #[polars_expr(output_type=Float64)]
#[polars_expr(output_type_func=pred_output)]
// #[polars_expr(output_type_func=pred_coef_output)]
pub fn lstsq_pred(inputs: &[Series], kwargs: Option<LstsqKwargs>) -> PolarsResult<Series> {
    let kwargs = kwargs.unwrap_or_else(|| LstsqKwargs {
        order: 1, // Default to linear if not specified
        pred_dist: 1.0,
    });

    let order = kwargs.order + 1; // Add 1 to order to account for the constant term

    match mask_n_into_mat2(inputs, order as usize) {
        Ok((mat, y)) => {

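          // Return a compute error instead of panicking if the solver fails.
          let result = mat
              .least_squares_into(y)
              .map_err(|e| PolarsError::ComputeError(format!("least squares failed: {e}").into()))?
              .solution;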
          let x_max: f64 = inputs[0].cast(&DataType::Float64)?.max()?.unwrap();
          let x_pow_vec = (0..order).map(|i| (x_max + kwargs.pred_dist).powi(i as i32)).collect::<Vec<_>>();
          let pred = Array::from_vec(x_pow_vec).dot(&result);

          // debug helper
          // let mut coef_build: ListPrimitiveChunkedBuilder<Float64Type> = ListPrimitiveChunkedBuilder::new("coefs", 1, result.len(), DataType::Float64);
          // coef_build.append_slice(&result.iter().map(|&val| val).collect::<Vec<f64>>());
          // let out = coef_build.finish();
          // Ok(out.into_series())

          Ok(Series::new("pred",vec![pred]))
        },
        Err(e) => Err(e),

    }

}

and the Python wrapper:

def lstsq_pred(
  x: IntoExpr,
  y: IntoExpr,
  order: int,
  pred_dist: float,
) -> pl.Expr:
  """
  This is an aggregation, hence returns_scalar=True and is_elementwise=False.

  order = 1 -> linear

  order = 2 -> quadratic

  order = 3 -> cubic

  Args:
      x (IntoExpr): string or pl.Expr
      y (IntoExpr): string or pl.Expr
      order (int): linear = 1, quadratic = 2, cubic = 3, ...
      pred_dist (float): distance beyond max(x) at which to predict

  Returns:
      pl.Expr: the predicted y value at max(x) + pred_dist
  """
  return register_plugin_function(
    args=[str_to_expr(x), str_to_expr(y)],
    is_elementwise=False,
    returns_scalar=True,
    function_name="lstsq_pred",
    plugin_path=Path(__file__).parent,
    kwargs={"order": order, "pred_dist": pred_dist},
  )
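
For anyone who prefers not to read the Rust, here is a rough NumPy sketch of what the expression above computes: build a Vandermonde matrix from x with order + 1 columns, solve the least-squares problem against y, then evaluate the fitted polynomial at max(x) + pred_dist. The function name lstsq_pred_reference is made up for this illustration and is not part of the library; it is only meant as a reference to compare against, not as the proposed implementation.

```python
import numpy as np


def lstsq_pred_reference(x, y, order=1, pred_dist=1.0):
    """Fit a polynomial of the given order and extrapolate past max(x).

    Hypothetical reference helper, not part of the library.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_coef = order + 1  # +1 for the constant term
    # Vandermonde matrix with columns x**0, x**1, ..., x**order
    vander = np.vander(x, N=n_coef, increasing=True)
    # Least-squares solution of vander @ coef ~= y
    coef, *_ = np.linalg.lstsq(vander, y, rcond=None)
    # Evaluate the fitted polynomial at max(x) + pred_dist
    x_new = x.max() + pred_dist
    powers = x_new ** np.arange(n_coef)
    return float(powers @ coef)


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]  # y = 2x + 1
print(lstsq_pred_reference(x, y, order=1, pred_dist=1.0))  # approximately 9.0
```

With order=1 this is a plain linear extrapolation; raising order only adds columns to the Vandermonde matrix, which is why one kwarg is enough to switch between linear, quadratic, and higher-order fits.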

cheers,
1112114641

@abstractqqq
Owner

Thanks for the request! Do you have a blog post or something similar I can read about this? Reading dry code doesn't help me understand the topic very well.

@1112114641
Author

Um, work in progress 😅

@abstractqqq
Owner

> Um, work in progress 😅

Let me know once it is more or less ready! Also I would like some references to the topics you are implementing. That would help me greatly and I can then start working on this too.
